h-sarhan / webserv

An HTTP web server written in C++

License: MIT License

C++ 86.10% Makefile 2.03% HTML 6.17% Python 2.64% JavaScript 1.70% PHP 1.36%

42school cpp98 http-server

webserv's Introduction

webserv

An RFC compliant HTTP web server written in C++98

42 Abu Dhabi
mfirdous · hsarhan

Table of Contents

About The Project
Features
Getting Started
- Prerequisites
- Usage
License

About The Project

This project is an implementation of an HTTP web server written in C++. The server can handle multiple concurrent incoming requests from web clients and serve them HTML files or other static resources, such as images or CSS files. The server also supports the HTTP/1.1 protocol, including features such as persistent connections, chunked encoding, and content compression. In addition, the server supports dynamic content generation through CGI scripts, allowing developers to write server-side scripts in languages such as Perl or Python.

This HTTP web server is designed to remain stable even under extreme loads and with limited system resources. Its efficient design lets it handle a large number of incoming requests without crashing or hanging.

Features

Able to handle multiple concurrent requests
Implements the HTTP protocol including GET, POST, and DELETE requests
HTTP/1.1 support with persistent connections and chunked encoding
Static file serving for HTML files, images, and other resources
CGI script support for dynamic content generation
Robust and resilient implementation
Parses a configuration file
Handles multiple CGIs
Supports cookies and session management

Getting Started

Prerequisites

The only prerequisites for this project are a C++ compiler and the Make build system.

This project produces no warnings even with strict warning flags on gcc and clang.

To compile
```
  make
```
To run our tests
```
  make test
```

Usage

To run the web server with default configuration
```
  ./webserv
```
To run the web server with a custom configuration
```
  ./webserv ./server.conf
```

License

Distributed under the MIT License. See LICENSE.txt for more information.

webserv's Issues

Handle error pages in config

We should check whether an error page is already specified in the config. If there is, we should serve that instead of the default one.

Fix partial send and recv issue with poll

The partial send function (sendAll) currently calls send in a while loop without going through poll. It must be rewritten so that every send goes through poll first and does not return an error.

Partial send
Partial recv
Refactor
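
One way to route every send through poll is to keep the unsent bytes per connection and perform a single send() each time poll reports POLLOUT. The sketch below is only an illustration of that idea: `PendingSend`, `sendOnce`, and `wantedEvents` are hypothetical names, not the project's actual classes.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cstddef>
#include <string>

// Hypothetical per-connection state: the response bytes and how many were sent.
struct PendingSend
{
    std::string data;
    std::size_t offset;

    explicit PendingSend(const std::string &bytes) : data(bytes), offset(0) {}
    bool done() const { return offset >= data.size(); }
};

// Call only after poll() reported POLLOUT for fd. Performs one send() and
// records how far it got. Returns false only if send() itself failed.
bool sendOnce(int fd, PendingSend &pending)
{
    if (pending.done())
        return true;
    ssize_t sent = send(fd, pending.data.data() + pending.offset,
                        pending.data.size() - pending.offset, 0);
    if (sent < 0)
        return false;
    pending.offset += static_cast<std::size_t>(sent);
    return true;
}

// Ask poll() for POLLOUT only while there is still something left to write.
short wantedEvents(const PendingSend &pending)
{
    if (pending.done())
        return POLLIN;
    return static_cast<short>(POLLIN | POLLOUT);
}
```

A partial send then simply leaves `done()` false, so the next poll iteration keeps POLLOUT set and the remaining bytes go out later without blocking the other clients.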

Handle max client body size

We should stop reading from a request if its body is too big. The max body size is specified in the config
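
A minimal sketch of the check, assuming the headers are stored in a map with lowercase keys and that the limit has already been read from the config. `BodyCheck`, `checkDeclaredLength`, and `checkReceived` are hypothetical names. The standard status code for an oversized body is 413.

```cpp
#include <cstdlib>
#include <map>
#include <string>

enum BodyCheck { BODY_OK, BODY_TOO_LARGE };

// Reject early when the client announces a body larger than the limit.
// `headers` is assumed to use lowercase header names.
BodyCheck checkDeclaredLength(const std::map<std::string, std::string> &headers,
                              unsigned long maxBodySize)
{
    std::map<std::string, std::string>::const_iterator it =
        headers.find("content-length");
    if (it == headers.end())
        return BODY_OK; // no declared length (e.g. chunked): checked while reading
    unsigned long declared = std::strtoul(it->second.c_str(), NULL, 10);
    return declared > maxBodySize ? BODY_TOO_LARGE : BODY_OK;
}

// For bodies without a declared length, count the bytes as they arrive
// and stop reading as soon as the limit is exceeded.
BodyCheck checkReceived(unsigned long receivedSoFar, unsigned long maxBodySize)
{
    return receivedSoFar > maxBodySize ? BODY_TOO_LARGE : BODY_OK;
}
```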

Fix CGI bugs

Check waitpid with WNOHANG before calling write in Response::sendCGIRequestBody to check the status of the child process
Test if 500 error is returned correctly when execve fails, there's a chance it won't be correct since we throw an exception now
When CGI doesn't set content-length make sure to stop reading on EOF
Check leaks, fds, trace children
Handle 504 (Gateway timeout) error

Improve parsing for CGI

Currently the cgi_extensions rule does not allow arbitrary file extensions. This will need to be changed to pass the tester at least.

Also there is no way for the Server class to know if it has to execute a CGI. So we will add another ResourceType that is a CGI
I also have to avoid trimming the query string for calls to execute a CGI

Prepare CGI examples

I am preparing some examples to demonstrate our server's CGI capabilities.

Basic CGI example in Python
CGI upload example using the POST method
CGI example demonstrating cookies/sessions
Basic CGI example in another language

Parse the mime_types.txt file into a map

Handle HEAD requests

This should be easy as a HEAD request is the same as a GET request but the response does not have a message body

#47
Respond to them

Make the resource object hold the full path to the resource along with the original requested resource

Send error response when max clients reached

Will require restructuring of how POLLIN and POLLOUT are checked
Make sure to check for both events at the same time
Run thorough tests of everything since this is a refactoring of core logic

Create Response class

Create a response class that will maintain the response string

Finish keep-alive
Different error codes

Remove unit tests

No Unit Tests

I have decided not to write unit tests for this project and to write end-to-end tests for the web server instead, possibly in another language or maybe even using Postman. These will be written later.

Delete all the current tests
Remove anything related to the unit tests from the Makefile

Clean up parsing code

The code for parsing the config can be cleaner, there is a lot of code duplication that I can encapsulate into a function. The code for request parsing is just messy and I don't handle query parameters. I think I just need to trim them from the GET request. I will not change any public method signatures so the classes will be used the same way.

Clean up config parsing
Clean up request parsing
Fix query parameters not being trimmed from request
Put the parsing tests in a separate file
Move RequestTarget to its own file
Document everything
Use adjacent_find to remove double slashes in paths
Use std::copy instead of memcpy
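
For the double-slash item, a small sketch of how std::adjacent_find could be used. The function name `collapseSlashes` is hypothetical and the real code may be structured differently.

```cpp
#include <algorithm>
#include <string>

// Predicate for adjacent_find: true when two neighbouring chars are both '/'.
static bool bothSlashes(char a, char b)
{
    return a == '/' && b == '/';
}

// Returns a copy of path in which every run of slashes is reduced to one.
std::string collapseSlashes(std::string path)
{
    std::string::iterator it =
        std::adjacent_find(path.begin(), path.end(), bothSlashes);
    while (it != path.end())
    {
        it = path.erase(it); // drop the first slash of the pair
        it = std::adjacent_find(it, path.end(), bothSlashes);
    }
    return path;
}
```

With this, `collapseSlashes("/a//b///c/")` yields `/a/b/c/`.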

Parse HEAD requests

Create a function to unchunk a message body received in chunks

Create an unchunking function
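
A rough sketch of what such a function could look like, assuming the whole chunked body is already in memory. The name and signature of `unchunk` are assumptions. Chunk extensions are skipped and trailers after the last chunk are ignored, and strtoul is more lenient than the HTTP grammar, so a stricter parser would be needed for full compliance.

```cpp
#include <cstdlib>
#include <string>

// Decodes a complete chunked body. Returns false if the input is truncated
// or malformed; on success the decoded bytes are stored in body.
bool unchunk(const std::string &chunked, std::string &body)
{
    std::string out;
    std::string::size_type pos = 0;
    while (true)
    {
        // Each chunk starts with its size in hexadecimal, ended by CRLF.
        std::string::size_type lineEnd = chunked.find("\r\n", pos);
        if (lineEnd == std::string::npos)
            return false;
        std::string sizeLine = chunked.substr(pos, lineEnd - pos);
        char *end = NULL;
        unsigned long size = std::strtoul(sizeLine.c_str(), &end, 16);
        if (end == sizeLine.c_str())
            return false; // no hex digits at all
        pos = lineEnd + 2;
        if (size == 0)
            break; // last chunk
        // The chunk data must be followed by CRLF.
        std::string::size_type remaining = chunked.size() - pos;
        if (size > remaining || remaining - size < 2)
            return false;
        if (chunked.compare(pos + size, 2, "\r\n") != 0)
            return false;
        out.append(chunked, pos, size);
        pos += size + 2;
    }
    body.swap(out);
    return true;
}
```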

Parse a configuration file

Config File

Our server has a lot of options that need to be configured. These options will be specified using a file that is read when the server starts. We need to

Understand all the config options.
Specify a format for the file
Implement a tokenizer to make parsing the file easier #15
Write a Configuration class that will be used by the webserver
Read and parse the contents of the config file into the Configuration class

Location header should send correct route to newly created resource

Currently, the Location header is set to the relative path of the new resource from the server's perspective. It should be the route that the user can request.

Return NO_MATCH instead of NOT_FOUND for non existent paths

What happens now

For a file that's not found in any route, the server returns 404 Not Found, unless it's a POST or a PUT request, in which case the file will be created. However, this assumes that the path up to the filename is a valid and existing path.
e.g. POST /posts/nonexistent/nonexistent.txt assumes nonexistent/ is a dir that exists in posts/ and tries to create nonexistent.txt in it, which will fail and return 500 Internal Server Error.

What should happen

A file-not-found case should return NOT_FOUND only if the path leading up to the filename is a valid route on the server. If the route isn't recognized, it should return NO_MATCH instead. Although this will make no difference to how GET, HEAD, and DELETE currently work, POST and PUT will correctly return 404 instead of 500.

Create a directory listing HTML page

Write a simple function to generate an HTML directory listing when the client requests a directory

The function will simply take in a directory path as a string and return an HTML page as a string listing the contents of the directory
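
A possible shape for this function, using POSIX opendir/readdir. This is only a sketch: the markup is an assumption, names are not HTML-escaped, and the relative links only resolve correctly when the requested URL ends with a slash.

```cpp
#include <dirent.h>
#include <cstddef>
#include <string>

// Returns an HTML page listing the entries of dirPath,
// or an empty string if the directory cannot be opened.
std::string directoryListing(const std::string &dirPath)
{
    DIR *dir = opendir(dirPath.c_str());
    if (dir == NULL)
        return "";
    std::string html = "<html><head><title>Index of " + dirPath +
                       "</title></head><body><h1>Index of " + dirPath +
                       "</h1><ul>\n";
    for (struct dirent *entry = readdir(dir); entry != NULL;
         entry = readdir(dir))
    {
        std::string name = entry->d_name;
        if (name == ".")
            continue; // linking a directory to itself is not useful
        html += "<li><a href=\"" + name + "\">" + name + "</a></li>\n";
    }
    closedir(dir);
    html += "</ul></body></html>";
    return html;
}
```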

Example of what it might look like:

Tokenizer

Write a general purpose tokenizer class that can produce a list of tokens. This list of tokens will help in parsing the configuration file (@h-sarhan)

Write a generalized Token class whose type is specified by an enum
The Tokenizer class will have rules to tokenize many types of tokens
This class is designed to be easily extendible, meaning that we should be able to add new token types easily
Add support for tokens that can be found in a config file
Fully document any files, classes, and methods written
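
To make the idea concrete, here is a minimal sketch of a token type driven by an enum and a tokenizer for config-style input. The token names and the free function `tokenize` are assumptions for illustration, and the real Tokenizer class may expose a different interface.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token categories for a config file.
enum TokenType { WORD, LEFT_BRACE, RIGHT_BRACE, SEMICOLON };

struct Token
{
    TokenType type;
    std::string text;

    Token(TokenType t, const std::string &s) : type(t), text(s) {}
};

static bool isDelimiter(char c)
{
    return c == '{' || c == '}' || c == ';' ||
           std::isspace(static_cast<unsigned char>(c));
}

// Splits input into braces, semicolons, and whitespace-separated words.
std::vector<Token> tokenize(const std::string &input)
{
    std::vector<Token> tokens;
    std::string::size_type i = 0;
    while (i < input.size())
    {
        char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c)))
            ++i;
        else if (c == '{')
            tokens.push_back(Token(LEFT_BRACE, "{")), ++i;
        else if (c == '}')
            tokens.push_back(Token(RIGHT_BRACE, "}")), ++i;
        else if (c == ';')
            tokens.push_back(Token(SEMICOLON, ";")), ++i;
        else
        {
            std::string::size_type start = i;
            while (i < input.size() && !isDelimiter(input[i]))
                ++i;
            tokens.push_back(Token(WORD, input.substr(start, i - start)));
        }
    }
    return tokens;
}
```

Adding a new token type in this layout means adding an enumerator and one more branch in `tokenize`.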

Make configBlocks map static

Make Server's configBlocks map static and make a static getter that returns a std::vector<ServerBlock*> given a listenerFd.

Add listenerFd as an attribute to Request

Makefile

This project's Makefile should

Compile the project (lol)
Build both a release and production build
Recompile with header file changes
Generate compile_commands.json for use with clangd
Run the tests
Display documentation with doxygen?

Parse a GET request

GET Request

Read the RFC to understand what I am responsible for
Create request class with appropriate member variables/functions
Prepare several examples of GET requests
Parse a request. This may involve tokenization based on the structure of a request
Handle invalid requests

Create logger class

Create a class that will print custom log messages with different categories: Error, Info, Warning (or more). Each category will have its own color.

This logger must be used to replace all debug messages being printed in Server.cpp.
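
A small sketch of what the logger could look like, assuming ANSI escape codes for the colors. The class name, level names, and color choices are all hypothetical.

```cpp
#include <iostream>
#include <string>

enum LogLevel { LOG_INFO, LOG_WARNING, LOG_ERROR };

class Logger
{
  public:
    // Builds the colored line without printing it, which keeps it testable.
    static std::string format(LogLevel level, const std::string &message)
    {
        // Indexed by LogLevel: green, yellow, red.
        static const char *const colors[] = {"\033[32m", "\033[33m", "\033[31m"};
        static const char *const labels[] = {"INFO", "WARNING", "ERROR"};
        return std::string(colors[level]) + "[" + labels[level] + "]" +
               "\033[0m " + message;
    }

    // Writes the formatted line to standard error.
    static void log(LogLevel level, const std::string &message)
    {
        std::cerr << format(level, message) << std::endl;
    }
};
```

A debug print in Server.cpp would then become something like `Logger::log(LOG_INFO, "New connection accepted");`.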

Handle multiple hostnames in a location block

This is not a necessary requirement of the project, but it is a simple change in parsing and makes writing config files easier.

So I would like to make a config block like this valid

server {
  server_name localhost 234.122.21.1 webserv.com;
}

Make unchunker a method of Request that acts on the entire request

Add an optional index file to the config file

When the client requests a resource and the request matches the name of the route, a specified index file is served instead.

This should be optional and only work with a location block that has a try_files rule in it

Redirects are handled incorrectly

We are not serving the correct response for a redirected request in most cases.

Let's say we have the following location blocks:

location / 
{
    return /tours;
}

location /tours 
{
    try_files /web;
}

If a user makes a request to /italy.html they should be redirected to /tours/italy.html. Currently they just get redirected to /tours. This should be an easy fix
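
The intended behaviour can be sketched as a function that replaces the matched route prefix with the redirect target and keeps the rest of the request path. `redirectLocation` and its parameters are hypothetical names, and the sketch assumes routes and targets are plain path prefixes.

```cpp
#include <string>

// route:     the location that matched, e.g. "/"
// target:    the value of its return rule, e.g. "/tours"
// requested: the path the client asked for, e.g. "/italy.html"
std::string redirectLocation(const std::string &route,
                             const std::string &target,
                             const std::string &requested)
{
    // If the request does not start with the route, fall back to the target.
    if (requested.compare(0, route.size(), route) != 0)
        return target;
    std::string rest = requested.substr(route.size());
    if (rest.empty())
        return target;
    // Join target and rest with exactly one slash between them.
    bool targetEndsWithSlash =
        !target.empty() && target[target.size() - 1] == '/';
    bool restStartsWithSlash = rest[0] == '/';
    if (targetEndsWithSlash && restStartsWithSlash)
        return target + rest.substr(1);
    if (!targetEndsWithSlash && !restStartsWithSlash)
        return target + "/" + rest;
    return target + rest;
}
```

For the blocks above, `redirectLocation("/", "/tours", "/italy.html")` gives `/tours/italy.html`.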

Handle Content-Type header

We need to be able to provide the client with the correct Content-Type header. We will do this by matching the resource's file extension against common values for Content-Type. We are storing the common file types in a file called mime_types.txt

#40
Use the map in Server or Response class to generate a correct header for the response
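
A sketch of the lookup, under the assumption that each line of mime_types.txt holds an extension followed by its MIME type, separated by whitespace. The actual file format may differ. The function names and the `application/octet-stream` fallback are also assumptions.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> MimeMap;

// Reads "extension type" pairs, e.g. "html text/html".
MimeMap parseMimeTypes(std::istream &in)
{
    MimeMap types;
    std::string extension;
    std::string type;
    while (in >> extension >> type)
        types[extension] = type;
    return types;
}

// Picks the Content-Type from the extension of the last path segment.
std::string contentTypeFor(const std::string &path, const MimeMap &types)
{
    const std::string fallback = "application/octet-stream";
    std::string::size_type dot = path.rfind('.');
    std::string::size_type slash = path.rfind('/');
    if (dot == std::string::npos ||
        (slash != std::string::npos && dot < slash))
        return fallback; // no extension in the file name
    MimeMap::const_iterator it = types.find(path.substr(dot + 1));
    return it == types.end() ? fallback : it->second;
}
```

Reading from a std::istream means the same function works with a std::ifstream for the real file and a std::istringstream in tests.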

Implement route matching in a more "logical" way

Route matching right now is kinda wrong with some requests that fully contain the route name.

If the request was /cgiupload, it would match on the /cgi route and be interpreted as /cgi/upload. It should instead match on / and be interpreted as /cgiupload.
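
One way to avoid this is to accept a route only when it ends on a path segment boundary. The sketch below also assumes that the longest matching route should win. `routeMatches` and `bestRoute` are hypothetical helpers, not the server's current code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if route is a prefix of path that ends on a segment boundary.
bool routeMatches(const std::string &route, const std::string &path)
{
    if (route.empty() || path.compare(0, route.size(), route) != 0)
        return false;
    if (route[route.size() - 1] == '/')
        return true; // "/" or "/cgi/" already end on a boundary
    return path.size() == route.size() || path[route.size()] == '/';
}

// Returns the longest matching route, or an empty string if none matches.
std::string bestRoute(const std::vector<std::string> &routes,
                      const std::string &path)
{
    std::string best;
    for (std::size_t i = 0; i < routes.size(); ++i)
    {
        if (routeMatches(routes[i], path) && routes[i].size() > best.size())
            best = routes[i];
    }
    return best;
}
```

With the routes `/` and `/cgi`, the request `/cgiupload` selects `/`, while `/cgi/upload` selects `/cgi`.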

Create function to form CGI env variables from HTTP headers

This function will convert an HTTP header like content-length to HTTP_CONTENT_LENGTH for use with CGI environment variables.
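
A minimal sketch of the conversion, with `headerToCgiVar` as a hypothetical name. Note that the CGI specification (RFC 3875) also defines dedicated CONTENT_LENGTH and CONTENT_TYPE variables without the HTTP_ prefix, which a CGI script will usually expect to be set as well.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// "content-length" -> "HTTP_CONTENT_LENGTH"
std::string headerToCgiVar(const std::string &header)
{
    std::string name = "HTTP_";
    for (std::size_t i = 0; i < header.size(); ++i)
    {
        char c = header[i];
        if (c == '-')
            name += '_';
        else
            name += static_cast<char>(
                std::toupper(static_cast<unsigned char>(c)));
    }
    return name;
}
```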

Decode body messages received with chunk encoding

HTTP request bodies can be encoded in chunks. We need to detect that we have received a chunked body and unchunk it.

#42
Use that function in the Server class to unchunk a request body

Fix empty file issue in responses

Either revert to using strings instead of char *
Protect tellg and return 0 as size (do this only if the file streams are the issue)

Fix generation of compilation database on Linux

Generation of a compilation database works on Mac but on a Linux environment there are some invalid '$' tokens.

The issue is in the Makefile produced in #2. The sed script that is used to combine the individual compilation flags together needs to be modified for Linux

Understand the subject

The subject

Do some research into what developing an HTTP web server entails.

Figure out what we are responsible for
Look into the allowed functions
Learn the basics of network programming
Read the RFC HTTP specification

For learning about network programming this resource looks helpful.

Boilerplate code

To introduce new team members to the project we will have some boilerplate/starter code.

This starter code should:

Show what our naming convention and code style will be
Implement dummy functions and classes
Have proper documentation
Setup project directory structure

Set up environment variables for CGI

Notes

CGI will usually be used with POST requests
URL must be sanitized, URL decoding has to be done: this is in case the CGI is being called on a GET
- Here the CGI will expect the QUERY_STRING env variable which is the part of the URL after the ?
GET, POST and HEAD are the types of requests that a CGI can process
PUT and DELETE if called on a CGI as target will update/create/delete the CGI file itself
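
To illustrate the QUERY_STRING note, here is a sketch that splits a request target at the first `?` and percent-decodes a string. `SplitTarget`, `splitTarget`, and `urlDecode` are hypothetical names. In this sketch only the path is meant to be decoded, and the query part is left exactly as received so it can be handed to the CGI unchanged.

```cpp
#include <cstddef>
#include <string>

struct SplitTarget
{
    std::string path;  // part before '?'
    std::string query; // part after '?', candidate for QUERY_STRING
};

SplitTarget splitTarget(const std::string &target)
{
    SplitTarget parts;
    std::string::size_type mark = target.find('?');
    parts.path = target.substr(0, mark);
    if (mark != std::string::npos)
        parts.query = target.substr(mark + 1);
    return parts;
}

// Returns the value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Replaces valid %XX sequences with the byte they encode;
// anything else is copied through unchanged.
std::string urlDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size() &&
            hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0)
        {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 +
                                     hexValue(in[i + 2]));
            i += 2;
        }
        else
            out += in[i];
    }
    return out;
}
```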

Fix parsing issues

An IP address is not parsed correctly when it is used as a hostname. And we segfault with an empty file.

Incorporate unit testing framework

Unit testing framework

To be able to write tests easily we should use a testing framework while developing

So before we start development we should

Decide on a unit testing framework to use. (Probably doctest)
Add it to the project
Write some example tests to show usage

Default error page

Create a default error page for our HTTP server. The function takes in an HTTP response code and returns an HTML page as a string
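
A possible sketch of such a function. The name `errorPage`, the markup, and the subset of status codes covered are assumptions.

```cpp
#include <sstream>
#include <string>

// Reason phrases for a few common status codes; others get a generic label.
static const char *reasonPhrase(int code)
{
    switch (code)
    {
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 405: return "Method Not Allowed";
    case 500: return "Internal Server Error";
    case 504: return "Gateway Timeout";
    default:  return "Error";
    }
}

// Builds a small HTML page showing the status code and its reason phrase.
std::string errorPage(int code)
{
    std::ostringstream page;
    page << "<html><head><title>" << code << " " << reasonPhrase(code)
         << "</title></head><body><h1>" << code << " " << reasonPhrase(code)
         << "</h1></body></html>";
    return page.str();
}
```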

Bug in directory listings with routes other than "/"

Directory listings provide the wrong URL with routes other than "/"

This issue can be fixed by providing the directory listing function the route of the request and from that we can form the URL.
Fixing this would be a lot easier if directoryListing() had access to the route the request matched on. So I will probably end up providing it another parameter

Test directory listing with nested routes like "/bob" "/bob/alice"
Fix any issues that come up
Provide directoryListing() with the route it needs to produce correct URLs
Potential issues that can come up are requests for nested directories in nested routes. In all cases this can be fixed by matching the requested directory against the route

README

The README should include

Improve README

Improved README

Our README is good but it can be better. Some ideas to improve it are to:

Refactor the Request class

The Request class is doing too much right now. It is responsible for parsing the request, managing the buffer, creating a Resource object, unchunking the request, and providing getters to the attributes of a request.

I want to split this functionality across multiple classes. My idea is to have the Request class only be responsible for providing getters and managing the buffer. The parsing of the actual request should be delegated to a dedicated RequestParser class and the generation of a Resource object can be handled in the RequestParser class.

I am not sure which class should handle unchunking the request.

The public API to the Request class will remain the same

CGI related parsing bugs

This issue is to keep track of various bugs in parsing CGI requests

The query string URL should not be decoded

Create a mini web server for testing

Mini web server

In order to learn about network programming and how a web server works, we will build a small web server that is capable of receiving any request sent to it and can send a basic response back. This web server will serve as the base of our final HTTP server.

The web server can receive and print out any amount of data including large files
The server must be able to send back a response. This can be anything as long as the data is transferred correctly
The server must be resilient, protect every system call
The server must poll fds in a non-blocking manner using poll, epoll, or kqueue