httpxx's Introduction

httpxx --- HTTP Parser for C++

Authors: André Caron
Contact: [email protected]
Version: 0.1
Date: 2011-07-18

Description

This library is a simple C++ wrapper for the C library http-parser [1], which was itself derived from the HTTP parser code in NGINX. http-parser is a simple streaming HTTP parser (for those of you familiar with XML, it works much like a SAX parser). It knows nothing of sockets or streams: you feed it data and it invokes registered callbacks to notify you of available data. Because http-parser operates at the very lowest level, it does not buffer data or allocate memory dynamically. This library attempts to make http-parser easily usable from C++ programs by interpreting those callbacks and buffering data where needed.

[1]https://github.com/ry/http-parser.

Documentation

The API of the defined classes is documented using Doxygen [2]. You will need to run Doxygen from the project source folder to generate the HTML output.

[2]http://www.stack.nl/~dimitri/doxygen/

Compiled HTML documentation for official releases is available online. Check the project page.

Fetching the code

This project does not distribute the http-parser code directly. To fetch the entire source code, make sure you fetch the submodules [3] too:

$ git clone ...
$ cd httpxx
$ git submodule init
$ git submodule update
[3]http://book.git-scm.com/5_submodules.html

Portability

http-parser itself has no dependencies and compiles with C++ compilers. httpxx uses only standard library facilities (namely std::string and std::map) and introduces no additional dependencies.

The code should compile as is under a standard-compliant C++03 implementation.

Memory allocation policy

A good memory allocation policy is important in server programs, which typically run for a long time and suffer from memory fragmentation. httpxx does its best to avoid repeated allocation, but it needs a little help on your part.

http::Request and http::Response parser objects allocate memory as required because they buffer different parts of the incoming HTTP request/response in std::string instances. However, they are carefully implemented so as to use the growing property of std::string [4] to their advantage. In particular, you may re-use http::Request and http::Response parser objects to parse multiple requests/responses by calling their .clear() method. This method marks all header lengths as 0 but keeps the string instances, as well as the header map, intact. All this ensures that parsers avoid repeated memory allocation.

[4]std::string instances keep the allocated memory buffer even when you resize them such that their length decreases. In particular, std::string::clear() marks the string length as 0 but keeps the allocated buffer.

Samples / demos

Check out the sample programs in the demo/ subfolder.

httpxx's People

Contributors

andrelouiscaron, avitebskiy, eremiell, junfx, tribal-tec

httpxx's Issues

Message isn't totally polymorphic

Message is not totally polymorphic, i.e., not all functions overridden by Request or Response (or their buffered versions) are declared virtual (specifically: clear() and reset_buffers()).
This makes it impossible to clear a Response or Request through a reference or pointer to the base class.

Multiple Set-Cookie headers not supported

It is perfectly fine to receive multiple Set-Cookie headers; unfortunately, it seems that the library doesn't support that. I can only see the last Set-Cookie header from the original message. Further examination of the library code revealed that headers are stored in a std::map, which explains the current situation.

Problems with feed method

Hello Andre,

congrats on the great lib! I guess my problem is all about knowing how to use the feed method. I'm working with libtins, a library which abstracts network packets and provides a convenient method for following TCP streams and delivering their final payloads to a callback.

However, the payload is provided in a vector<unsigned char>. To extract it into a buffer, I'm doing the following:

if(!stream.client_payload().empty()){
    std::size_t size = stream.client_payload().size();
    unsigned char* buf = new unsigned char[size + 1];
    buf = reinterpret_cast<unsigned char*>(stream.client_payload().data());
    //buf[size + 1] = '\0';

    Request request;
    request.feed(buf, size);
}

When I run this piece of code, the program gives me the following runtime error, in the line where I call the feed method:

terminate called after throwing an instance of 'http::Error'
  what():  parser is paused

As you may have noticed, I commented out the line where I append a \0 terminator to the buffer. I've tried running the code with that line as well, but the problem persisted. Just to be clear, I can print the buffer without problems. Hope you can help me. Thanks in advance.

Cheers.

Where did the request::body() method go?

The documentation refers to a body() method, but I get the error below when using it:

Error 1 error C2039: 'body' : is not a member of 'http::Request' E:\WinApp\webserver\include\win\http_connection.h 117 1 webserver

I verified the source code and could not find it there either.

Incorrect parse error detection

At line 174 of Messages.cpp, you detect errors by checking if (used < size). However, if you call http_parser_execute with a size of 0 (indicating EOF), it returns 1 to indicate an error (i.e., used > size). For this reason, the check employed in the http_parser documentation is (used != size).

Impossible to parse pipelined HTTP 1.1 headers

When parsing multiple HTTP 1.0 headers in one buffer, feed() returns as soon as the first message was parsed (which is the behaviour I actually need).

However, if I just replace every "HTTP/1.0" with "HTTP/1.1" in the headers, http-parser doesn't return after the first message. It calls on_message_complete but doesn't return like it does for HTTP 1.0 headers.

That way, when using httpxx, it's impossible to parse all 3 messages without resorting to feeding it single bytes in a loop and checking every time whether the message is complete.

Maybe the simplest solution would be to return non-0 in Message::on_message_complete to abort parsing, copy over needed structures and reset the http-parser struct.

See the sample code below. It outputs all 3 messages without a problem. Replace every 1.0 with 1.1 and it outputs one message with body "HELLOWorld".

const char HEADER_TEST[] =

    "GET /get_funky_content_length_body_hello HTTP/1.0\r\n"
    "conTENT-Length: 5\r\n"
    "\r\n"
    "HELLO"

    "GET /get_one_header_no_body HTTP/1.0\r\n"
    "\r\n"

    "POST /post_identity_body_world?q=search#hey HTTP/1.0\r\n"
    "Accept: */*\r\n"
    "Transfer-Encoding: identity\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "World"
    ;

http::Request request;

const char* buf = HEADER_TEST;
size_t size = sizeof(HEADER_TEST) - 1;

int header = 1;

while(size > 0)
{

    size_t read = request.feed(buf, size);
    if(read == 0)
    {
        std::cout << "ERROR" << std::endl;
        break;
    }

    size -= read;
    buf += read;

    std::cout << "Header " << header << '\n';
    std::cout << "   Read: " << read << " bytes" << '\n';
    std::cout << "   Body: " << request.body().length() << " bytes" << '\n';
    std::cout << "   Body: " << request.body() << '\n';

    request.clear();
    header++;
}

The last header is missing if it follows a chunk

data:
POST /111.html HTTP/1.1
h1:h1
h2:h2

5
hello
0
h3:h3
h4:h4

h3 is stored, but h4 is not. Checking the code: on_headers_complete stores h2; following this logic, on_message_complete should store h4.

Fix README

None of the links provided in README.md are accessible.

Allow parsing messages without buffering (parsing?) the body

httpxx buffers the whole body of a message, with no way to erase it or prevent the buffering in the first place. Huge messages whose bodies are decoded by the parser can easily blow up the body string to several dozen MB.

There should be a way to discard the buffer (or suppress its creation), or the classes should be refactored to allow separate parsing of a message's header and body, possibly allowing the body size to be obtained from the received header (maybe with a flag for chunked/EOF messages).

Hide http-parser completely

As the CMake script is written, libhttp-parser.a won't be installed, so perhaps the header http-parser.h does not need to be exposed.

I guess this project is developed under Windows with MSVC, which allows a static lib to be linked into another static lib, and this is what httpxx expects. Most compilers do not allow this; MSVC is the exception.

If this is an issue, a possible solution is linking http-parser.o directly into libhttpxx.a.

Expose http_should_keep_alive

This would allow a simpler check of whether a connection should be kept alive, without having to check the HTTP version in addition to the flags.
E.g., HTTP/1.0 with neither F_CONNECTION_CLOSE nor F_CONNECTION_KEEP_ALIVE set means Close, while for HTTP/1.1 it means Keep-Alive.

http_parser_pause causes feed to return wrong read count

This is kinda weird and I don't know if it's a bug in http-parser or in the usage of http_parser_pause in httpxx.

I call recv(MSG_PEEK), which reads, say, 300 bytes. Request::feed returns 299 bytes read, although the header is really 300 bytes long (I double-checked). The complete flags are both set (it has no body).
As a result, I recv 299 bytes (to remove the header from the TCP buffer), but that's one byte too few. In my next loop (thinking this is a pipelined request), I call recv, which blocks and eventually returns with no data read once the connection is closed.

PS: I temporarily used the newest commit of http-parser but the bug is the same.

(I only noticed this now after adding tons of error checking. What looked like a slow connection was a timed-out select because of the missing byte. I know I'm a terrible bug reporter lol)

Allow setting Response's F_SKIPBODY flag

Not strictly necessary, but without it, Flags::skipbody() is useless. Normally, the application knows whether a response is for a HEAD request (meaning a complete header but no body); this would allow that information to be carried on the Response object without having to pass additional state variables around.
Maybe just add a bool is_head_response to the constructor; then the on_headers_complete callback should return 1. The infrastructure to override the callback in Response seems to be there.

Conan package

Hello,
Do you know about Conan?
Conan is modern dependency manager for C++. And will be great if your library will be available via package manager for other developers.

Here you can find example, how you can create package for the library.

If you have any questions, just ask :-)
