
webserv's Introduction



webserv

An RFC compliant HTTP web server written in C++98

42 Abu Dhabi
mfirdous · hsarhan

Table of Contents
  1. About The Project
  2. Features
  3. Getting Started
  4. License

About The Project


This project is an implementation of an HTTP web server written in C++. The server can handle multiple concurrent incoming requests from web clients and serve them HTML files or other static resources, such as images or CSS files. The server also supports the HTTP/1.1 protocol, including features such as persistent connections, chunked encoding, and content compression. In addition, the server supports dynamic content generation through CGI scripts, allowing developers to write server-side scripts in languages such as Perl or Python.

This HTTP web server is a highly reliable implementation that remains stable even under extreme loads and with limited system resources. Its efficient design ensures that the server can handle a large number of incoming requests without crashing or hanging.

Features

  • Able to handle multiple concurrent requests
  • Implements the HTTP protocol including GET, POST, and DELETE requests
  • HTTP/1.1 support with persistent connections and chunked encoding
  • Static file serving for HTML files, images, and other resources
  • CGI script support for dynamic content generation
  • Robust and resilient implementation
  • Parses a configuration file
  • Handles multiple CGIs
  • Supports cookies and session management


Getting Started

Prerequisites

The only prerequisites for this project are a C++ compiler and the Make build system.

This project produces no warnings, even with strict warning flags on gcc and clang.

  • To compile

      make
  • To run our tests

      make test

Usage

  • To run the web server with default configuration

      ./webserv
  • To run the web server with a custom configuration

      ./webserv ./server.conf

License

Distributed under the MIT License. See LICENSE.txt for more information.


webserv's People

Contributors

h-sarhan · mehrinfirdousi


webserv's Issues

Handle error pages in config

We should check whether an error page is already specified in the config. If there is one, we should serve it instead of the default one.

Fix partial send and recv issue with poll

The partial send function (sendAll) currently calls send in a while loop without going through poll. It must be rewritten so that every send goes through poll first and does not return an error.

  • Partial send
  • Partial recv
  • Refactor
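
One way to make every send go through poll is to keep the unsent bytes per client and issue a single send each time poll reports POLLOUT. The sketch below assumes that approach; SendBuffer and sendOnce are hypothetical names, not the project's actual sendAll replacement.

```cpp
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: holds the unsent part of one client's response.
class SendBuffer
{
  public:
    explicit SendBuffer(const std::string &data) : _data(data), _sent(0) {}

    // True once every byte has been handed to the kernel.
    bool done() const { return _sent == _data.size(); }

    // Number of bytes still waiting to be sent.
    std::size_t remaining() const { return _data.size() - _sent; }

    // Call only after poll() reported POLLOUT for fd. Issues a single send()
    // and records how many bytes were accepted, which may be fewer than
    // remaining(). Returns false if send() failed.
    bool sendOnce(int fd)
    {
        if (done())
            return true;
        ssize_t n = send(fd, _data.data() + _sent, remaining(), 0);
        if (n < 0)
            return false;
        _sent += static_cast<std::size_t>(n);
        return true;
    }

  private:
    std::string _data;
    std::size_t _sent;
};
```

In the poll loop, the server would call sendOnce only for fds whose revents include POLLOUT, and keep the buffer around until done() returns true. The same idea applies to partial recv: read once per POLLIN and append to a per-client buffer.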

Handle max client body size

We should stop reading from a request if its body is too big. The max body size is specified in the config.
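
A minimal sketch of such a check, assuming the limit from the config is available as a byte count. The names checkBodySize, BODY_OK and BODY_TOO_LARGE are hypothetical. Checking the bytes received so far, as well as the declared Content-Length, also covers chunked bodies, which carry no Content-Length.

```cpp
#include <cstddef>

enum BodyCheck
{
    BODY_OK,
    BODY_TOO_LARGE
};

// declaredLength: parsed Content-Length value (0 if the header is absent).
// received:       number of body bytes read so far.
// maxBodySize:    limit taken from the config.
BodyCheck checkBodySize(std::size_t declaredLength, std::size_t received,
                        std::size_t maxBodySize)
{
    if (declaredLength > maxBodySize || received > maxBodySize)
        return BODY_TOO_LARGE;
    return BODY_OK;
}
```

When the check fails, the server can stop reading and answer with status 413, the standard HTTP status for an oversized request body.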

Fix CGI bugs

  • Check waitpid with WNOHANG before calling write in Response::sendCGIRequestBody to check the status of the child process
  • Test if 500 error is returned correctly when execve fails, there's a chance it won't be correct since we throw an exception now
  • When CGI doesn't set content-length make sure to stop reading on EOF
  • Check leaks, fds, trace children
  • Handle 504 (Gateway timeout) error

Improve parsing for CGI

Currently the cgi_extensions rule does not allow arbitrary file extensions. This will need to be changed to pass the tester at least.

Also, there is currently no way for the Server class to know whether it has to execute a CGI, so we will add another ResourceType for CGI.
I also have to avoid trimming the query string for requests that execute a CGI.

Prepare CGI examples

I am preparing some examples to demonstrate our server's CGI capabilities.

  • Basic CGI example in Python
  • CGI upload example using the POST method
  • CGI example demonstrating cookies/sessions
  • Basic CGI example in another language

Handle HEAD requests

This should be easy as a HEAD request is the same as a GET request but the response does not have a message body

  • #47
  • Respond to them
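
As a rough illustration, the response can be built exactly as for GET, with the body left out when the method is HEAD. buildResponse is a hypothetical function, not part of the project's Response class, and it only handles the 200 case.

```cpp
#include <sstream>
#include <string>

// Builds a minimal 200 response. The headers are identical for GET and HEAD;
// only the message body is omitted for HEAD.
std::string buildResponse(const std::string &method, const std::string &body)
{
    std::ostringstream out;
    out << "HTTP/1.1 200 OK\r\n"
        << "Content-Length: " << body.size() << "\r\n"
        << "\r\n";
    if (method != "HEAD")
        out << body;
    return out.str();
}
```

Content-Length still reports the size the body would have had for a GET, so clients can learn the size without downloading the resource.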

Send error response when max clients reached

  • Will require restructuring of how POLLIN and POLLOUT are checked
  • Make sure to check for both events at the same time
  • Run thorough tests of everything since this is a refactoring of core logic

Create Response class

Create a response class that will maintain the response string

  • Finish keep-alive
  • Different error codes

Remove unit tests

No Unit Tests

I have decided not to write unit tests for this project and instead write end-to-end tests for the web server possibly in another language or maybe even using postman. But these will be written later.

  • Delete all the current tests
  • Remove anything related to the unit tests from the Makefile

Clean up parsing code

The code for parsing the config can be cleaner; there is a lot of code duplication that I can encapsulate into a function. The code for request parsing is just messy, and I don't handle query parameters. I think I just need to trim them from the GET request. I will not change any public method signatures, so the classes will be used the same way.

  • Clean up config parsing
  • Clean up request parsing
  • Fix query parameters not being trimmed from request
  • Put the parsing tests in a separate file
  • Move RequestTarget to its own file
  • Document everything
  • Use adjacent_find to remove double slashes in paths
  • Use std::copy instead of memcpy
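
The adjacent_find item could look roughly like this. collapseSlashes and bothSlashes are hypothetical names, and the code sticks to C++98.

```cpp
#include <algorithm>
#include <string>

// Predicate for adjacent_find: true when two neighbouring chars are both '/'.
static bool bothSlashes(char a, char b)
{
    return a == '/' && b == '/';
}

// Returns a copy of path with every run of slashes reduced to a single '/'.
std::string collapseSlashes(std::string path)
{
    std::string::iterator it =
        std::adjacent_find(path.begin(), path.end(), bothSlashes);
    while (it != path.end())
    {
        // Erase the first slash of the pair, then keep searching from the
        // same position so runs of three or more are also collapsed.
        it = path.erase(it);
        it = std::adjacent_find(it, path.end(), bothSlashes);
    }
    return path;
}
```

For example, collapseSlashes("/a//b///c") gives "/a/b/c".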

Parse a configuration file

Config File

Our server has a lot of options that need to be configured. These options will be specified using a file that is read when the server starts. We need to

  • Understand all the config options.
  • Specify a format for the file
  • Implement a tokenizer to make parsing the file easier #15
  • Write a Configuration class that will be used by the webserver
  • Read and parse the contents of the config file into the Configuration class

Return NO_MATCH instead of NOT_FOUND for non existent paths

What happens now

For a file that's not found in any route, the server returns 404 Not Found, unless it's a POST or a PUT request, in which case the file will be created. However, this assumes that the path up to the filename is a valid and existing path.
For example, POST /posts/nonexistent/nonexistent.txt assumes nonexistent/ is a directory that exists in posts/ and tries to create nonexistent.txt in it, which fails and returns 500 Internal Server Error.

What should happen

A file-not-found case should return NOT_FOUND only if the path leading up to the filename is a valid route on the server; if the route isn't recognized, it should return NO_MATCH instead. Although this will make no difference to how GET, HEAD, and DELETE currently work, POST and PUT will correctly return 404 instead of 500.

Create a directory listing HTML page

Write a simple function to generate an HTML directory listing when the client requests a directory

The function will simply take in a directory path as a string and return an HTML page as a string listing the contents of the directory

Example of what it might look like:
dir_listing
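
A minimal sketch of such a function, using opendir/readdir to collect the entries. Splitting it into readEntries and renderListing is an assumption made here for testability; only directoryListing matches the description above. Entry names are not HTML-escaped in this sketch, and the relative links assume the request URL ends with a slash.

```cpp
#include <cstddef>
#include <dirent.h>
#include <string>
#include <vector>

// Collects the entry names of a directory. Returns an empty list if the
// directory cannot be opened.
std::vector<std::string> readEntries(const std::string &dirPath)
{
    std::vector<std::string> names;
    DIR *dir = opendir(dirPath.c_str());
    if (dir == NULL)
        return names;
    for (struct dirent *entry = readdir(dir); entry != NULL; entry = readdir(dir))
        names.push_back(entry->d_name);
    closedir(dir);
    return names;
}

// Renders a plain HTML list with one link per entry.
std::string renderListing(const std::string &title,
                          const std::vector<std::string> &names)
{
    std::string html = "<html>\n<head><title>Index of " + title + "</title></head>\n<body>\n";
    html += "<h1>Index of " + title + "</h1>\n<ul>\n";
    for (std::size_t i = 0; i < names.size(); ++i)
        html += "<li><a href=\"" + names[i] + "\">" + names[i] + "</a></li>\n";
    html += "</ul>\n</body>\n</html>\n";
    return html;
}

// Takes a directory path and returns the listing page as a string.
std::string directoryListing(const std::string &dirPath)
{
    return renderListing(dirPath, readEntries(dirPath));
}
```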

Tokenizer

Tokenizer

Write a general purpose tokenizer class that can produce a list of tokens. This list of tokens will help in parsing the configuration file (@h-sarhan)

  • Write a generalized Token class whose type is specified by an enum
  • The Tokenizer class will have rules to tokenize many types of tokens
  • This class is designed to be easily extendible, meaning that we should be able to add new token types easily
  • Add support for tokens that can be found in a config file
  • Fully document any files, classes, and methods written
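
To make the idea concrete, here is a small sketch of a token type tagged by an enum and a tokenizer for config-style input. The token kinds (words, braces, semicolons) are assumptions based on the config snippets in these issues, and tokenize is a hypothetical free function rather than the planned Tokenizer class.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum TokenType
{
    TOKEN_WORD,
    TOKEN_OPEN_BRACE,
    TOKEN_CLOSE_BRACE,
    TOKEN_SEMICOLON
};

struct Token
{
    TokenType   type;
    std::string text;

    Token(TokenType t, const std::string &s) : type(t), text(s) {}
};

// True for characters that form a token on their own.
static bool isPunct(char c)
{
    return c == '{' || c == '}' || c == ';';
}

// Splits input into words and single-character punctuation tokens.
std::vector<Token> tokenize(const std::string &input)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size())
    {
        char c = input[i];
        if (std::isspace(static_cast<unsigned char>(c)))
            ++i;
        else if (c == '{')
            tokens.push_back(Token(TOKEN_OPEN_BRACE, std::string(1, input[i++])));
        else if (c == '}')
            tokens.push_back(Token(TOKEN_CLOSE_BRACE, std::string(1, input[i++])));
        else if (c == ';')
            tokens.push_back(Token(TOKEN_SEMICOLON, std::string(1, input[i++])));
        else
        {
            // A word runs until whitespace or punctuation.
            std::size_t start = i;
            while (i < input.size() && !isPunct(input[i])
                   && !std::isspace(static_cast<unsigned char>(input[i])))
                ++i;
            tokens.push_back(Token(TOKEN_WORD, input.substr(start, i - start)));
        }
    }
    return tokens;
}
```

Adding a new token type then means adding an enum value and one more branch, which keeps the tokenizer easy to extend.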

Make configBlocks map static

Make Server's configBlocks map static and make a static getter that returns a std::vector <ServerBlock*> given a listenerFd.

Add listenerFd as an attribute to Request

Makefile

Makefile

This project's Makefile should

  • Compile the project (lol)
  • Build both a release and production build
  • Recompile with header file changes
  • Generate compile_commands.json for use with clangd
  • Run the tests
  • Display documentation with doxygen?

Parse a GET request

GET Request

  • Read the RFC to understand what I am responsible for
  • Create request class with appropriate member variables/functions
  • Prepare several examples of GET requests
  • Parse a request. This may involve tokenization based on the structure of a request
  • Handle invalid requests

Create logger class

Create a class that will print custom log messages with different categories: Error, Info, Warning (or more). Each category will have its own color.

This logger must be used to replace all debug messages being printed in Server.cpp.
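
A minimal sketch of the idea, using ANSI escape codes for the colors. The function names, level names and color choices are all assumptions, and a real logger class would likely wrap this in a stream-like interface.

```cpp
#include <iostream>
#include <string>

enum LogLevel
{
    LEVEL_INFO,
    LEVEL_WARNING,
    LEVEL_ERROR
};

// Formats one log line: color code, category label, message, color reset.
std::string formatLog(LogLevel level, const std::string &message)
{
    const char *color = "\033[32m"; // green
    const char *label = "INFO";
    if (level == LEVEL_WARNING)
    {
        color = "\033[33m"; // yellow
        label = "WARNING";
    }
    else if (level == LEVEL_ERROR)
    {
        color = "\033[31m"; // red
        label = "ERROR";
    }
    return std::string(color) + "[" + label + "] " + message + "\033[0m";
}

// Prints the formatted line to standard error.
void logMessage(LogLevel level, const std::string &message)
{
    std::cerr << formatLog(level, message) << std::endl;
}
```

A debug print in Server.cpp would then become something like logMessage(LEVEL_INFO, "listening").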

Handle multiple hostnames in a location block

This is not a necessary requirement of the project, but it is a simple change in parsing and makes writing config files easier.

So I would like to make a config block like this valid

server {
  server_name localhost 234.122.21.1 webserv.com;
}

Add an optional index file to the config file

When a client's request matches the name of the route exactly, a specified index file is served instead.

This should be optional and only work with a location block that has a try_files rule in it.

Redirects are handled incorrectly

We are not serving the correct response for a redirected request in most cases.

Let's say we have the following location blocks:

location / 
{
    return /tours;
}

location /tours 
{
    try_files /web;
}

If a user makes a request to /italy.html, they should be redirected to /tours/italy.html. Currently they just get redirected to /tours. This should be an easy fix.
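
One possible fix is to append whatever follows the matched route to the redirect target. The sketch below assumes the matched route and the value of the return rule are both known when the redirect is built; redirectTarget is a hypothetical helper.

```cpp
#include <string>

// route:       the location that matched, e.g. "/"
// target:      the value of its return rule, e.g. "/tours"
// requestPath: the path the client asked for, e.g. "/italy.html"
std::string redirectTarget(const std::string &route, const std::string &target,
                           const std::string &requestPath)
{
    // If the request does not start with the route, fall back to the target.
    if (requestPath.compare(0, route.size(), route) != 0)
        return target;

    // Everything after the matched route is carried over to the new location.
    std::string rest = requestPath.substr(route.size());
    if (rest.empty())
        return target;

    bool targetEndsWithSlash = !target.empty() && target[target.size() - 1] == '/';
    bool restStartsWithSlash = rest[0] == '/';
    if (targetEndsWithSlash && restStartsWithSlash)
        return target + rest.substr(1);
    if (!targetEndsWithSlash && !restStartsWithSlash)
        return target + "/" + rest;
    return target + rest;
}
```

With the location blocks above, redirectTarget("/", "/tours", "/italy.html") gives "/tours/italy.html".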

Handle Content-Type header

We need to be able to provide the client with the correct Content-Type header. We will do this by matching the resource's file extension against common values for Content-Type. We are storing the common file types in a file called mime_types.txt

  • #40
  • Use the map in Server or Response class to generate a correct header for the response
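
A sketch of the lookup step. The format of mime_types.txt is not shown here, so the map is filled in code with a few common entries as a stand-in. contentTypeFor and defaultMimeTypes are hypothetical names, and falling back to application/octet-stream for unknown extensions is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> MimeMap;

// Stand-in for the map that would be loaded from mime_types.txt.
MimeMap defaultMimeTypes()
{
    MimeMap types;
    types["html"] = "text/html";
    types["css"] = "text/css";
    types["txt"] = "text/plain";
    types["png"] = "image/png";
    return types;
}

// Returns the Content-Type for a resource path based on its file extension.
std::string contentTypeFor(const std::string &path, const MimeMap &types)
{
    const std::string fallback = "application/octet-stream";
    std::size_t slash = path.rfind('/');
    std::size_t dot = path.rfind('.');

    // No extension, or the last dot belongs to a directory name.
    if (dot == std::string::npos)
        return fallback;
    if (slash != std::string::npos && dot < slash)
        return fallback;

    MimeMap::const_iterator it = types.find(path.substr(dot + 1));
    if (it == types.end())
        return fallback;
    return it->second;
}
```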

Implement route matching in a more "logical" way

Route matching currently gives the wrong result for some requests whose path merely starts with a route name.

If the request is /cgiupload, it matches the /cgi route and is interpreted as /cgi/upload, whereas it should match / and be interpreted as /cgiupload.
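
One way to get this behaviour is to accept a prefix match only when it ends on a path segment boundary, then pick the longest matching route. The sketch below assumes routes are plain path prefixes and that the query string has already been removed from the path; routeMatches and bestRoute are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if path is the route itself or lies below it, so that "/cgi" matches
// "/cgi" and "/cgi/upload" but not "/cgiupload".
bool routeMatches(const std::string &route, const std::string &path)
{
    if (path.compare(0, route.size(), route) != 0)
        return false;
    if (path.size() == route.size())
        return true;
    if (!route.empty() && route[route.size() - 1] == '/')
        return true;
    return path[route.size()] == '/';
}

// Returns the longest route that matches, or an empty string if none does.
std::string bestRoute(const std::vector<std::string> &routes,
                      const std::string &path)
{
    std::string best;
    for (std::size_t i = 0; i < routes.size(); ++i)
    {
        if (routeMatches(routes[i], path) && routes[i].size() > best.size())
            best = routes[i];
    }
    return best;
}
```

With routes / and /cgi, a request for /cgiupload resolves to /, while /cgi/upload resolves to /cgi.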

Fix empty file issue in responses

  • Either revert to using strings instead of char *
  • Protect tellg and return 0 as size (do this only if the file streams are the issue)

Fix generation of compilation database on Linux

Generation of a compilation database works on Mac but on a Linux environment there are some invalid '$' tokens.

The issue is in the Makefile produced in #2. The sed script that is used to combine the individual compilation flags together needs to be modified for Linux

Understand the subject

The subject

Do some research into what developing an HTTP web server entails.

  • Figure out what we are responsible for
  • Look into the allowed functions
  • Learn the basics of network programming
  • Read the RFC HTTP specification

For learning about network programming this resource looks helpful.

Boilerplate code

Boilerplate code

To introduce new team members to the project we will have some boilerplate/starter code.

This starter code should:

  • Show what our naming convention and code style will be
  • Implement dummy functions and classes
  • Have proper documentation
  • Setup project directory structure

Set up environment variables for CGI

Notes

  • CGI will usually be used with POST request
  • URL must be sanitized, URL decoding has to be done: this is in case the CGI is being called on a GET
    • Here the CGI will expect the QUERY_STRING env variable which is the part of the URL after the ?
  • GET, POST and HEAD are the types of requests that a CGI can process
  • PUT and DELETE if called on a CGI as target will update/create/delete the CGI file itself
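
A sketch of how the request target might be split before the environment is built. It assumes the script path is percent-decoded while the part after the ? is passed through untouched as QUERY_STRING, since CGI scripts expect that variable in its URL-encoded form. CgiTarget, percentDecode and splitCgiTarget are hypothetical names.

```cpp
#include <cstddef>
#include <string>

struct CgiTarget
{
    std::string scriptPath;  // decoded path of the script
    std::string queryString; // raw text after '?', left URL-encoded
};

// Value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

// Replaces valid %XX sequences with the byte they encode. Malformed
// sequences are copied through unchanged.
std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '%' && i + 2 < in.size())
        {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Splits a request target at the first '?'.
CgiTarget splitCgiTarget(const std::string &target)
{
    CgiTarget result;
    std::size_t q = target.find('?');
    result.scriptPath = percentDecode(target.substr(0, q));
    if (q != std::string::npos)
        result.queryString = target.substr(q + 1);
    return result;
}
```

The environment entry passed to execve would then be "QUERY_STRING=" followed by queryString.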

Fix parsing issues

An IP address is not parsed correctly when it is used as a hostname. And we segfault with an empty file.

Incorporate unit testing framework

Unit testing framework

To be able to write tests easily we should use a testing framework while developing

So before we start development we should

  • Decide on a unit testing framework to use. (Probably doctest)
  • Add it to the project
  • Write some example tests to show usage

Default error page

Create a default error page for our HTTP server. The function takes in an HTTP response code and returns an HTML page as a string
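
A minimal sketch of such a function. The set of status codes, the markup and the names reasonPhrase and errorPage are assumptions; std::ostringstream is used for the number formatting to stay within C++98.

```cpp
#include <sstream>
#include <string>

// Reason phrase for a few common status codes; "Error" for anything else.
const char *reasonPhrase(int code)
{
    switch (code)
    {
    case 400: return "Bad Request";
    case 403: return "Forbidden";
    case 404: return "Not Found";
    case 405: return "Method Not Allowed";
    case 500: return "Internal Server Error";
    case 504: return "Gateway Timeout";
    default:  return "Error";
    }
}

// Takes an HTTP response code and returns a small HTML page as a string.
std::string errorPage(int code)
{
    std::ostringstream page;
    page << "<html>\n<head><title>" << code << " " << reasonPhrase(code)
         << "</title></head>\n"
         << "<body><h1>" << code << " " << reasonPhrase(code)
         << "</h1></body>\n</html>\n";
    return page.str();
}
```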

Bug in directory listings with routes other than "/"

Directory listings provide the wrong URL with routes other than "/"

This issue can be fixed by providing the directory listing function the route of the request and from that we can form the URL.
Fixing this would be a lot easier if directoryListing() had access to the route the request matched on. So I will probably end up providing it another parameter

  • Test directory listing with nested routes like "/bob" "/bob/alice"
  • Fix any issues that come up
  • Provide directoryListing() with the route it needs to produce correct URLs
  • Potential issues that can come up are requests for nested directories in nested routes. In all cases this can be fixed by matching the requested directory against the route

README

README

The README should include

  • Project name
  • Good banner
  • Team members
  • Project description
  • Compilation and usage instructions
  • Project features

Improve README

Improved README

Our README is good but it can be better. Some ideas to improve it are to:

  • Add more icons/emojis
  • Better screenshots/images
  • More badges and shields
  • Add a team section using badges from this repo
  • Include more written content
  • Improve organization

Refactor the Request class

The Request class is doing too much right now. It is responsible for parsing the request, managing the buffer, creating a Resource object, unchunking the request, and providing getters to the attributes of a request.

I want to split this functionality across multiple classes. My idea is to have the Request class only be responsible for providing getters and managing the buffer. The parsing of the actual request should be delegated to a dedicated RequestParser class and the generation of a Resource object can be handled in the RequestParser class.

I am not sure which class should handle unchunking the request.

The public API to the Request class will remain the same

CGI related parsing bugs

This issue is to keep track of various bugs in parsing CGI requests

  • The query string URL should not be decoded

Create a mini web server for testing

Mini web server

In order to learn about network programming and how a web server works, we will build a small web server that is capable of receiving any request sent to it and can send a basic response back. This web server will serve as the base of our final HTTP server.

  • The web server can receive and print out any amount of data, including large files
  • The server must be able to send back a response. This can be anything as long as the data is transferred correctly
  • The server must be resilient, protect every system call
  • The server must poll fds in a non-blocking manner using poll, epoll, or kqueue
