
Comments (21)

orangeduck commented on August 19, 2024

Thanks je-so. I will profile mpc and see what is going on.

from mpc.

orangeduck commented on August 19, 2024

You can take a look at the most recent version in the repo and see if it is any faster for you. In 4472dd4 and af81180 I made some minor tweaks to try to pick some low-hanging performance fruit. In that version I ran your test of 2000 lines of doge and it took ~0.3 seconds, although my computer may be faster than yours.

I'll continue looking into improving performance, but it looks like it will be hard to profile and find areas to target. This is just inherent in this kind of library. From my basic evaluation it seems it is spending almost all the time in the "interpreter" part of the code, where it executes the parsing computation stored in the parser data structures. There is definitely a bit more work to do, but ultimately there isn't much leeway here, as this isn't redundant work and is really what makes mpc generic and able to parse any language.

You can compare this to hand-made parsers, or parser code from a parser generator, which are built for only one language and so do not require an "interpreter" to run and are naturally much faster. The difference in design/performance here is a bit like that of a compiled language vs an interpreted language, and the trade-offs/solutions are the same, i.e. the only real way to target this is something complex like a bytecode JIT compiler or similar.

For example mpc can actually take the input string to mpca_lang at runtime! So it is like you are generating a parser on-the-fly. This is a bit like being able to eval in python or other dynamic languages.

That said, performance is improved if you make a predictive grammar (which the doge one isn't), as no backtracking is required. It should also be better if you use the parser combinator approach rather than the grammar definition approach, with your own syntax tree data type. If you have your own syntax tree data type you can also use numeric tags rather than strings, which should be faster.

Anyway I'll continue looking into it.

Thanks,

  • Dan

je-so commented on August 19, 2024

Yes, the new version is faster.

I played with it and changed the input of example/doge from "so c" to "so a"
and got "Segmentation fault". Changing the definition of noun to
noun : "lisp" | "language" | "a" | ...
worked, but now input "so c" produces a segfault.

It seems that erroneous input produces segfaults.

I've checked out version 81abc6f and it produces the correct error message
file:1:4: error: expected whitespace, "lisp", "language", "c", "book" or "build" at 'a'

It seems that the optimization does not work in error cases.
It would be a good idea to add this error case to your unit tests.

je-so commented on August 19, 2024

Found another problem in doge.c (using the older version, since the newer version segfaults):

Changing noun into
noun : "lisp" | "language" | /c+c/ | ...
does not work on input "so cc"

The shown error is
file:1:6: error: expected 'c' at newline

With noun defined as
noun : "lisp" | "language" | /cc/ | ...
it works just fine.

A good candidate for another unit test.

je-so commented on August 19, 2024

I think a good way to speed up the parser is to replace the calls to malloc/free
for small objects with calls to an optimized version which allocates memory
for 10000 small objects at once and uses a simple free list to recycle freed objects.

Something like:

static struct free_list free_list_type1;

struct type1 * malloc_type1(void) {
   if (free_list_type1.size == 0) {
      /* allocate another big block and add its nodes to the free list */
   }
   assert(free_list_type1.size != 0);
   /* remove the first node from the free list and return it */
}

orangeduck commented on August 19, 2024

Hi je-so,

Thanks for the comments and bug reports. I'll look into them.

Definitely the number of small allocations is wasteful. I was actually wondering how to fix this in the best way. The problem with using a custom allocator is that the objects we pass to the user have to remain freeable with the standard free function. Do you have any ideas?

je-so commented on August 19, 2024

Simple: export a generic mpc_free function in the interface.
The implementation calls the type-specific function free_type1, free_type2, ... according to the type of the
object, which must be stored just before the object (and set by malloc_type1, malloc_type2, ...):

((char*)object)[-1]

Another possibility is to store a destructor field in every type.

struct type1 {
  int  (*mpc_delete) (void* object);
  /* other fields */
};
struct type2 {
  int  (*mpc_delete) (void* object);
  /* other fields */
};
...

The generic mpc_free(void * object) could then be implemented as

((struct type1*)object)->mpc_delete(object);

With a type-specific destructor it is even possible to recursively free all objects this object points to.
A user of the library could supply their own object types plus destructor functions without affecting the implementation of mpc_free or the internal workings of mpc.

orangeduck commented on August 19, 2024

Latest commits should fix the crashes.

Using something like mpc_free might work in the future but it will break existing code which expects to be able to free parse results with free. I think what may be possible is to allocate a big block of memory for use with temporary internal strings and to ensure that these strings are converted to heap strings at any point where they exit the mpc internals and face users. This may be a little tricky but I think it should be possible.

je-so commented on August 19, 2024

I wondered how fast a simple LL(1) parser could be without building any AST at all.

To test it I have written a very simple interpreted parser which tries to parse 32000 lines of "so c so c so c so c so c so c so c so c so c" (see https://github.com/je-so/testcode/blob/master/parser/doge2.c).

Compared to examples/doge.c (switched to PREDICTIVE) the parser runs 65 times faster.

By the way the new version is faster!

To be honest the simple test parser does no memory management at all.
So the speed up is achieved due to the fact that no copy, malloc and free
operations are called.

orangeduck commented on August 19, 2024

Hey, that is an interesting experiment and a really nice little bit of code. I wonder how fast it would be building the AST too.

I've pushed up a new version. It uses the C function stack instead of a custom heap-allocated stack to run the interpreter and so is quite a bit faster (I think roughly 2x). It does mean that now you could potentially go very deep in the C stack trying to read a very large string with a badly structured grammar (i.e. recursion used in the grammar instead of *), but I think this shouldn't be a problem in most practical cases.

After this update I guess it should be around 20x-30x slower than your test project. This is actually quite encouraging and I suspect it could be a lot faster if we reduce the memory allocation (currently mpc will call malloc for each character it reads) so that is what I will look into next. Potentially it could only be 5-10x slower than a hand built parser which I think is quite fair performance wise.

je-so commented on August 19, 2024

The following simple grammar with input "cc" does not work.

a = <b>* <b>;
b = \"c\";

It seems that the parser engine is trying to match as many <b>* as possible
without checking that another <b> is expected afterwards.

This type of grammar is neither LL nor LR (I think) but backtracking should solve
it anyway even if it is extremely slow.

Rewriting the grammar as

a = <b> <b>*;
b = \"c\";

should work as expected.

orangeduck commented on August 19, 2024

mpc has never supported these types of grammars: https://github.com/orangeduck/mpc#backtracking-isnt-working

This is because it requires a different type of backtracking where, rather than just trying a different choice (such as with or), the backtracker "chooses to fail" for no reason. In this case, when parsing the rule <b>*, the backtracker must choose not to consume all <b>. As noted, this is actually some kind of exponential-factor backtracking, and I also think it acts unintuitively in some cases, as the parser is not acting greedily. When the parser is greedy the behaviour is easy to understand.

Anyhow this is something that might be supported eventually in mpc but it does add quite a bit of complexity.

je-so commented on August 19, 2024

OK! So regular expressions of the form c*c will not work if you map them internally to an mpc construct, will they?

orangeduck commented on August 19, 2024

No and actually you'll get the same behaviour in regex engines because * is usually greedy by default. You can make it not greedy in some engines by using *? but mpc doesn't support this.

je-so commented on August 19, 2024

I understand. Only engines which convert a regex into a deterministic construct (DFA) match c*c correctly (c*c == deterministic cc*).

This explanation invalidates my supposed error (see /c+c/ in 4th post).

orangeduck commented on August 19, 2024

I've pushed up a new version which is almost 2x faster again. I tried a couple of things this time. First I wrote a custom block allocator which mallocs a large block of memory and then, on each allocation, stores a pointer to the next block and the size of the allocation, a bit like a flattened linked list.

This turned out to be slower than just using malloc lol. So then I wrote an allocator which allocates fixed blocks of 64 bytes. The internal allocation function then just scans the allocations cyclically for the first free block. It starts just past the last block it allocated, so it almost always has a free block immediately ready. This one was faster than malloc for the many small allocations and deallocations that mpc does, so that was a success.

I also realized that mpc spends a fair amount of time deleting error messages which could never be shown to the user so I added a method in the parser to suppress error message generation when it is certain it will just be free'd later. This shaved off some more time.

There is one place this still happens a lot - which is in an or clause. Even for a valid, predictive input mpc will generate lots of error messages for each option of the or which fails, just to delete them later on. Instead really mpc should generate the error message for the whole clause once every possible option has failed rather than generating them as it goes along and either deleting them or combining them at the end depending on the success. But I'm still not sure how to fit this into the structure.

Re: c*c. This transformation is something we could potentially put into the mpc_optimise function but as I mentioned before I'm a bit wary of doing transformations which affect the behaviour of the parser.

je-so commented on August 19, 2024

It's getting faster 👍

I found line mpc.c:125 memset(i->mem_full, 0, sizeof(mpc_mem_t) * MPC_INPUT_MEM_NUM);
I think you meant memset(i->mem_full, 0, MPC_INPUT_MEM_NUM);
or better memset(i->mem_full, 0, sizeof(i->mem_full));.

Also mpc_realloc(mpc_input_t *i, void *p, size_t n) should check for p == NULL and call mpc_malloc in that case. Otherwise, when p == NULL, the current implementation always calls
realloc. And if you never call mpc_realloc(i, p, 0) to free memory, then calling mpc_free in the n == 0 case is not necessary.

Extract from the man page of void *realloc(void *ptr, size_t size):

If ptr is NULL, then the call is equivalent to malloc(size), ...
If size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr).

je-so commented on August 19, 2024

I wrote another raw test program which builds an AST.
This version is not fully functional: it can only parse simple prefix operators and an integer value.
++ + - ! ~
... (above line repeats 38692 times)
++ + - ! ~ 123

See https://github.com/je-so/testcode/blob/master/parser/context_oriented_parser.c

The time needed to parse 464331 bytes of input is less than 30 milliseconds.
The implementation uses its own memory management and avoids copying
strings, which is a real time saver!

je-so commented on August 19, 2024

Now https://github.com/je-so/testcode/blob/master/parser/context_oriented_parser.c supports right and left associativity and also the ternary operator '?:'.

It should be quite possible (for simple expressions) to use this type of parsing algorithm (it needs more parameterisation, of course) as the engine behind a front end like mpc,
provided the front end restricted grammars to simplified (restricted LL(1)) expressions.
The speed gain of 50-100 times should be worth the effort.

Maybe some sort of AST-defining syntax with a simple precompiler to generate
all the needed structs into a single ast.h header file.

orangeduck commented on August 19, 2024

While there are still lots of performance improvements to be made in mpc, I think it still does a lot more than your demo so 50-100 times sounds quite optimistic. For example now the main performance bottleneck in mpc is error message generation. This is tricky to get right for non LL(1) grammars and also difficult to make the code work in a performant way for LL(1) grammars without becoming a huge mess.

mpc supports various different inputs, such as reading from a file or a pipe, which complicates the process of reading characters. By design it has to allocate memory for each read and can't just slice up some input. It also has to allocate new memory when merging strings. All the parsing structures are bigger because of all the possible parser types, and the parsing function is also much more complex.

All these small changes add up quite significantly. For example, if I add a couple more variables to the main parsing function it can increase the runtime by almost half. This is just because it is forced to spill a few more variables into memory and because so much time is spent in this function.

The only real way to get this performance would be to call out to a completely different parsing routine under some conditions.

Anyway - thanks for working on the demo. I'm quite busy right now but I was working on it a bit more and I'll try to finish off that work later.

je-so commented on August 19, 2024

I've done another experiment.

I've compared cop with another version of cop which calls (*node)->mem = malloc(10) for every created node of type expr_t.

Parsing a file with 120744 lines with every line containing + 10 + 200 * 3 * 500 + 20 has given the following results:
Time needed normal: 213 millisec (flag -g), 72 millisec (flag -O3)
Time needed with malloc: 336 millisec (flag -g), 170 millisec (flag -O3)

A single malloc call for every created node adds 50% overhead for the unoptimized version and
more than 130% overhead for the optimized version. The test created 1207440 nodes.
