Comments (19)
I like the simplicity of that one, although I think it'd be better to stick closer to the R6RS specification of byte vectors: http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-3.html
I'd start by just implementing the functions needed by the compiler itself, which I think is basically just creating byte vectors and then `bytevector-u8-ref` and `bytevector-u8-set!`. Strictly speaking, we probably don't really even need `bytevector-u8-ref`.
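For reference, the operations discussed map naturally onto a JavaScript `Uint8Array` on the host side. Here's a minimal sketch of their semantics; the function names are illustrative, not Schism's actual runtime API:

```javascript
// Toy model of the three bytevector operations the compiler needs,
// using a Uint8Array as the backing store. Illustrative only; Schism's
// actual runtime representation may differ.
function makeBytevector(length, fill = 0) {
  return new Uint8Array(length).fill(fill);   // (make-bytevector k fill)
}
function bytevectorU8Ref(bv, k) {
  return bv[k];                               // (bytevector-u8-ref bv k)
}
function bytevectorU8Set(bv, k, byte) {
  bv[k] = byte & 0xff;                        // (bytevector-u8-set! bv k byte)
}

const bv = makeBytevector(3, 7);
bytevectorU8Set(bv, 1, 42);
console.log(bytevectorU8Ref(bv, 1)); // 42
console.log(bytevectorU8Ref(bv, 0)); // 7
```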
from schism.
I've been using `%` generally to signify that it is a compiler intrinsic, not part of any Scheme standard, and frequently unsafe. Basically, things that users shouldn't be using directly, but rather things that the compiler uses to implement the user-facing features. I'd probably do `%time-start` in this case, although I'm fine with either way.
Bootstrapping the compiler takes less than 10 seconds now, so I'd consider this fixed.
We'll want to keep an eye on this in case it gets out of control again, but I think we're good for now.
I've been sad about compile time lately too. This is something that should be fixable in Schism. I have a few ideas.
I think one of the easier options that will help most is changing how Schism writes its output. Right now this happens byte by byte, by calling `write-char` in the runtime over and over. This means we're paying the cost of transitioning from Wasm to JS for every single byte of output, which I expect is significant. A better option would be to add support for bytevectors and let Schism write out the whole bytevector at once, or at least write out larger chunks at a time.
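The batching idea could look something like this on the host side. This is only a sketch: the class and method names are invented here, and the real fix would live in Schism's runtime rather than a standalone buffer:

```javascript
// Sketch: instead of crossing the Wasm/JS boundary once per output byte,
// accumulate bytes in a buffer and hand the whole thing to the host in
// one call. Names are illustrative, not Schism's actual interface.
class OutputBuffer {
  constructor() { this.bytes = []; }
  emitByte(b) { this.bytes.push(b & 0xff); }  // cheap, no boundary crossing
  flush(write) {                               // one expensive call at the end
    write(Uint8Array.from(this.bytes));
    this.bytes.length = 0;
  }
}

const out = new OutputBuffer();
[0x00, 0x61, 0x73, 0x6d].forEach(b => out.emitByte(b)); // Wasm magic "\0asm"
let calls = 0;
out.flush(chunk => { calls += 1; console.log(chunk.length); }); // prints 4
console.log(calls); // 1: a single host call for all four bytes
```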
The other thing I'd like to see is to have Schism start doing some optimizations. For example, inlining, constant folding and constant propagation will probably have a pretty strong effect.
Is there a specification of byte vectors that you have in mind? Otherwise this API seems good to me: https://srfi.schemers.org/srfi-66/srfi-66-1.3.html
I tested how long it takes to do the writing. I went from the first byte written until compilation was complete and then compared that to the total compilation time. This is what I got running a test:
```
write: 1.069ms
compile: 64749.019ms
stage1 succeeded
write: 0.913ms
compile: 67400.288ms
stage2 succeeded
```
I pushed the changes to a commit here: matthewp@7e91d22 if you want to verify I didn't do something wrong.
I then tested against a pet project using the stage0 snapshot:
```
write: 4.725ms
compile: 3063.516ms
```
So from this I take it that writing is not a significant part of the problem.
Side note: 3 seconds is not bad! So that's good news. Is there a reason `run-schism.mjs` uses the stage1 compiler? Depending on why that is, I say we either make it use stage0 or provide a CLI flag to use a specific stage. Happy to submit a PR for any of the above changes :)
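The flag could be parsed along these lines in `run-schism.mjs`. Purely a sketch: the flag name, the stage-0 default, and the valid range are suggestions taken from this thread, not anything implemented:

```javascript
// Sketch of parsing a --stage flag, defaulting to stage 0.
// (Flag shape and default are assumptions from this discussion.)
function parseStage(argv) {
  const i = argv.indexOf('--stage');
  if (i === -1 || i + 1 >= argv.length) return 0;   // default: stage0
  const n = Number(argv[i + 1]);
  if (!Number.isInteger(n) || n < 0 || n > 2) {
    throw new Error(`invalid --stage value: ${argv[i + 1]}`);
  }
  return n;
}

console.log(parseStage([]));               // 0
console.log(parseStage(['--stage', '1'])); // 1
```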
Thanks for gathering this data! I guess my theory that the time to write everything out was the problem is wrong. Good to know :)
I had run-schism.mjs default to stage1 because during development stage1 tends to support more features than stage0 (basically, anything that's been added since the last snapshot). I'm fine with changing the default though. The compiler is more fully featured now, and I tend to update the snapshot as soon as I add a feature. Having command line options would be awesome!
Do you think you'd be interested in breaking down the timing for the rest of the compiler passes? I'd suggest exposing some version of `console.time` and `console.timeEnd` to Schism as runtime functions, and then adding calls to them around each of the compiler passes. Some more information about what's going on would be very helpful!
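On the JS side, exposing timers to the Wasm module could be a thin wrapper over `Date.now`. A sketch, with one caveat: a Wasm import can't receive a JS string directly, so here each timer is identified by a small integer id mapped to a label. The import names and label table are assumptions for illustration:

```javascript
// Sketch: host-side timing functions a Wasm module could import.
// Each pass gets an integer id; the label table is illustrative.
const labels = ['parse-library', 'convert-closures', 'compile-functions'];
const started = new Map();

const timerImports = {
  'time-start': (id) => { started.set(id, Date.now()); },
  'time-end': (id) => {
    const ms = Date.now() - started.get(id);
    started.delete(id);
    console.log(`${labels[id]}: ${ms}ms`);   // e.g. "parse-library: 13252ms"
  },
};

// The Wasm side would call these around each pass:
timerImports['time-start'](0);
// ... run the parse-library pass ...
timerImports['time-end'](0);
```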
Ok, I'll submit a PR later that changes the default to stage 0 and adds a `--stage` flag.
Exactly, adding a `time-start` and `time-end` so we can break down the compiler passes was going to be the next thing I did here. By the way, should it be `%time-start`? Does the `%` signify that it is side-effectual?
Need to do some more investigation, but just as a start these numbers are for compiling the stage 2 compiler:
```
parse-library: 13252.056ms
convert-closures: 24765.116ms
functions->names: 8.949ms
functions->types: 26.728ms
functions->type-ids: 34.074ms
compile-functions: 14048.685ms
compile-library->module-package: 52260.581ms
```
Here are some slightly better numbers:
```
parse-library: 14168.960ms
convert-closures: 26090.513ms
functions->names: 9.552ms
functions->types: 25.474ms
functions->type-ids: 33.270ms
compile-functions: 15084.660ms
build-exports: 9.574ms
gather-imports: 10.181ms
compile-library->module-package: 55539.606ms
```
Probably not a surprise that parsing, creating closures, and compiling are the most expensive parts.
@eholk Anything particularly interesting in this data? Am I timing the right things? Any of these you'd be interested in my drilling down into and getting numbers on?
Awesome! Thanks for getting these numbers. It's nice to see that some of the simpler passes are in fact quite cheap to run.
I was thinking some more about the cause of the slowness, and I think a lot of it has to do with how symbols are compiled. Basically, if Schism sees `'foo`, this gets translated in place to `(string->symbol "foo")`, which gets further translated to `(string->symbol (list->string '(#\f #\o #\o)))`, which then becomes `(string->symbol (list->string (cons #\f (cons #\o (cons #\o '())))))`.
The most expensive part is the `string->symbol` step, because it has an O(n) traversal of the symbol table. The worst part is that none of this can be cached or even precomputed. Every check of a tag throughout the compiler involves creating a brand new symbol from a string.
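To see why that hurts, here's a toy JavaScript model of a linear-scan `string->symbol`: every intern walks the symbol table front to back, so re-interning the same tag over and over does O(n) comparisons each time. The linked-list table is invented for illustration, not Schism's actual representation:

```javascript
// Toy model of an O(n) intern: the symbol table is a linked list,
// searched front to back on every string->symbol call.
let symbolTable = null; // cells of shape { name, next }
let scans = 0;          // counts cells visited, to expose the O(n) cost

function stringToSymbol(name) {
  for (let cell = symbolTable; cell !== null; cell = cell.next) {
    scans += 1;
    if (cell.name === name) return cell;       // found: reuse existing symbol
  }
  symbolTable = { name, next: symbolTable };   // not found: prepend new symbol
  return symbolTable;
}

stringToSymbol('if');
stringToSymbol('call');
scans = 0;
// Re-interning 'if' now scans past 'call' first: 2 comparisons per lookup.
const a = stringToSymbol('if');
const b = stringToSymbol('if');
console.log(a === b);  // true: both lookups yield the same symbol object
console.log(scans);    // 4: two lookups, two cells scanned each
```

With hundreds of symbols in the table and a tag check on nearly every expression, this cost multiplies across the whole compile.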
There are a couple of ways to address this.
The proper solution is to generate a data section with all the symbol data precomputed and then replace `'foo` with the actual symbol pointer value. This is a fair amount of work, though.
A shorter-term solution, which also might help validate the hypothesis, is to use a manual tagging system. Basically, instead of having a bunch of `cond` clauses like `(eq? tag 'if)`, we'd replace those with `(eq? tag (if-tag))`, where `if-tag` is defined somewhere like `(define (if-tag) 0)`, and we'd have similar things like `(define (call-tag) 1)`, and so on. I think this approach will be less maintainable in the long run, but it should work to validate the hypothesis that tag construction is taking the bulk of the time.
Actually, an even simpler option would be to track the cumulative time spent in `string->symbol`.
Thanks @eholk! I don't think I'm capable of implementing the proper solution at this time, but I'm happy to try the other approaches. I'll start by measuring the cumulative time spent in `string->symbol` and report back to you.
By the way, have you ever tried converting the compiler's wasm to wat format and inspecting that? I looked at it briefly; what I was able to see is that there are indeed a lot of calls to `cons`, but I guess that is to be expected for any Lisp language. Worth looking at if you have some time.
I've spent a little time looking at the disassembly, but usually just to fix issues with generating correct Wasm bytes. I'm not surprised about the large number of calls to `cons`, especially because the compiler inserts quite a few of its own.
I'd like to add a pass that does a combination of inlining, constant folding, constant propagation, and dead code elimination. I expect this will inline some of the calls to `cons`, although it'll probably just move the work around. In some cases, it may be able to entirely remove an allocation.
@eholk Can confirm your theory, see below:
```
parse-library: 13714.420ms
convert-closures: 25603.515ms
functions->names: 9.427ms
functions->types: 26.791ms
functions->type-ids: 35.433ms
compile-functions: 14860.114ms
build-exports: 9.707ms
gather-imports: 11.063ms
compile-library->module-package: 54382.075ms
[string->symbol] total: 52388.65443624556ms
```
So the majority of the time is spent in `string->symbol`. If you want to check my work: matthewp@e9bbfc7
Awesome, this data is really useful!
I think the easiest thing in the short term is to stop comparing against quoted symbols and just use integers. That'll make the compiler faster until we have proper support for static data.
About that idea, I admit that I don't quite get how it works. Are you saying to replace the quoted symbols in `parse-expr` and `compile-expr` with static integers? Anywhere else?
Yeah, in `parse-expr`, `compile-expr`, and pretty much everywhere else.

Basically, before `parse-expr` (I think all that's before `parse-expr` is the reader, macro expander, and quote/quasiquote expander), we'd still use symbols, since we're processing human-readable input. After `parse-expr`, the idea is that the program is converted to a form that's easier for the compiler to handle.
So for example, take the number line in `parse-expr`. This would get rewritten from:

```scheme
((number? expr) `(number ,expr))
```

to:

```scheme
((number? expr) `(,(number-tag) ,expr))
```

`number-tag` would be a function that just returns a number that's different from all the other tags, so something like:

```scheme
(define (number-tag) 42)
```
Then anywhere we compare against `'number`, we'd instead compare against `(number-tag)`. Take line 807 for example, in `apply-representation-expr`:

```scheme
((eq? tag 'number) `(ptr ,(bitwise-arithmetic-shift-left (cadr expr) (tag-size))))
```

This would get rewritten as:

```scheme
((eq? tag (number-tag)) `(,(ptr-tag) ,(bitwise-arithmetic-shift-left (cadr expr) (tag-size))))
```

Does that make more sense? It's unfortunately a pretty big change, but it's mostly find-and-replace.
Yeah, that does make sense. I think I was more confused about where not to make the change because, as you said, before `parse-expr` it's human-readable code.
I think I can try my hand at this. My only other question is: will there be a problem with making all of the changes at once? When adding functions, you need to write the function before actually using it in the compiler (iirc, I tried this once), but inside of the parser I'm not sure... my brain can't figure out if I need to work on one function first and then do the others or not.
If you want to try doing this, that would be awesome!
Making the changes all at once shouldn't be a problem. In fact, you pretty much have to, or else you'd have to add a step to convert between the new representation and the old one for the places that haven't been updated. I wish it were possible to do this a little more piecemeal.
You shouldn't have to make any changes to the primitives list, and this shouldn't change the syntax the compiler supports, so staging shouldn't be a problem. Basically, any functions you add as top-level functions in `compiler.ss` are immediately available to the compiler. With primitives (i.e. these functions), you have to wait for a new snapshot to be able to use those in the compiler.