iucompilercourse / essentials-of-compilation Goto Github PK
View Code? Open in Web Editor NEWA book about compiling Racket and Python to x86-64 assembly
A book about compiling Racket and Python to x86-64 assembly
Section 4.8 uses the term "control flow graph", when really what we are building is a map from labels to basic blocks, which makes it confusing. Furthermore, the actual CFG is built in the register allocation section 4.10
Wouldn't label->block
be a better phrasing here?
It would be nice to include information on running the passes in chapter 5. For example, I had to figure out the way to run expose allocation is to use
("expose allocation" ,expose-allocation ,interp-Rvec-prime ,type-check-Rvec)
by looking around the codebase.
Biber warning: [609] Utils.pm:395> WARN - legacy month field 'October' in entry 'Matz:2013aa' is not an integer - this will probably not sort properly
The variable names in different passes for the running example are different in chapter 5 (e.g. what is called collectret7974
in output of the expose allocation pass is called collectret9023
in the output of the select instruction pass). This makes it hard to follow what is going on, using consistent variable names would be better.
Hello! I would first like to thank you for making such a wonderful resource available for free.
While I was going through the code, I noticed that the contract for the Let struct restricts the variable binding to any arbitrary symbol instead of just Var
structs. Ins't the latter choice more appropriate here?
applies to Vectors/Register Allcoation
The grammar figures in the book don't currently explicitly mention side conditions like having a start
label, or the characters allowable in identifiers (or even what is allowed in the "info" field). It would be good if each figure was a self-contained and detailed description of the grammar, including some of these extra constraints and definitions.
Latex translates ` and ' into curved open and close single quotes. It took me a while to figure out in the code listings that the open single quotes were backticks. While I'm not sure if it covers all the cases, adding these two lines to the included packages in book.tex should make most of the backticks look like backticks, and apostrophes like apostrophes:
\usepackage{upquote}
\usepackage[straightquotes]{newtxtt}
Here, it is not mentioned that TailJmp
also reads from its arg
field (the register containing the function address), in addition to callee saved registers.
This isn't going to hold on all compilers, OS's, architectures (or even most).
Students ran into this with some difficult debugging:
Solution: just shift the darn procedure pointers when doing R7.
based on Nov. 12 2020 lecture notes
Currently we have the following command
clean:
$(LATEXMK) -C book
rm -f book.log book.aux book.bbl book.lof book.out book.toc book.blg book.pdf book.ilg book.ind book.lot book.run.xml book.bcf book.fls book.fdb_latexmk authors.idx authors.ilg authors.ind subject.idx subject.ilg subject.ind
which is massive.
Is it OK with you if we could reduce it to
clean:
$(LATEXMK) -C book
$(RM) \
*.idx *.ilg *.ind *.aux *.bbl *.bcf \
*.blg *.fdb_latexmk *.fls *.lof *.log \
*.lot *.out *.run.xml *.toc book.pdf
which looks cleaner and much more tight?
Consider the following example in R_{Var}
(+ 1 (+ 2 (+ 3 4)))
From the description of rco-exp
and rco-atom
, it appears that the book wanted the following translation to R_{Var}^{ANF}
:
;; Code 1
(let ([tmp.1 (+ 3 4)])
(let ([tmp.2 (+ 2 tmp.1)])
(+ 1 tmp.2)))
However, there is another way to do this:
;; Code 2
(let ([tmp.1
(let ([tmp.2 (+ 3 4)])
(+ 2 tmp.2))])
(+ 1 tmp.1))
which I believe is also a valid R_{Var}^{ANF}
program. But I feel Code 2
is harder to evaluate, which would need a stack, while Code 1
can be evaluated without a stack.
Should we restrict the R_{Var}^{ANF}
such that Code 1
is valid but Code 2
is invalid, for example, by not allowing e
in (Let x e body)
to contain any Let
expression?
This makes it mysterious that the compiler will produce rbp-offsets like "-40" for the first local variable, when the student is expecting a smaller offset (not taking callee-saves into account).
after register allocation, but before patching, leaq should handle Deref in the target, but currently does not.
make
command defaults to Python on my system which currently misses chapters 9 to 12.
How do I tell it to build the Racket version which is complete?
The description for the shrink pass in chapter 6 does not mention the relationship between Exp and Exp'. I had to watch the lecture video to figure out that Exp' = (shrink-exp Exp), would be nice to have it in the book itself.
Here, the book does not say what lhs'
is supposed to be.
the point is to clarify the addition of variables to x86* with respect to x86
This was a sticking point in class this year.
Some questions came up last week in class about chapter 6 and figure 6.4. The figure shows "rsp" as storing the base address for the local m
in the callee, but it says rbp
for local 1
. That seems like a discrepancy.
In the functions.rkt
reference implementation, there are comments about the stack-arg
form possibly being obsolete. The stack-arg
references relative to rsp
, whereas the code generated in the preamble of a function def references rbp
.
It currently using listings, which is also my goto.
But is there a good way to make sure that the code is syntactically correct / runnable?
At the beginning of section 5.7 it is mentioned that compiling void must be discussed. However the section never discusses it. The generated code for the running example, however, has movq $0
s for the void assignments.
Hello.
When compiling the head of master as of 2021-05-11 on an XUbuntu 20.04 x86_64 LTS machine with a texlive-full installed, I get:
! Use of \\titlecap doesn't match its definition.
\update@series@target@value #1->\def \reserved@a {
#1}\ifx \target@meta@famil...
l.402 ...tract Syntax Trees and Racket Structures}
Thank you!
Regarding the typo "...we assignment of each..." in the description of the assign-homes compiler pass shown below.
Essentials-of-Compilation/book.tex
Lines 1603 to 1605 in 9c77ff6
I wasn't sure what phrasing might be best here, but I was thinking "...we map/translate the assignment of each variable..."
The book references this file in several places but it's not mentioned where to find this file.
The grammar for Rvec's ANF is missing the cases for vector-ref
and vector-set!
primitives.
In one part of Chapter 3, it states:
The logical operations
not
andand
behave as you might expect, but note that theand
operation is short-circuiting. That is, the second expressione2
is not evaluated ife1
evaluates to#f
.
But in another part of Chapter 3:
Recall that the
and
operator of C1 does not perform short circuiting, but evaluates both arguments unconditionally.
Which is right?
Hi,
I'm a bit confused about why the second requirement is there for the register allocator("if a vector-typed variable is live during a call to the collector, it must be spilled to ensure it is visible to the collector"). Doesn't requirement 1 already ensure that vector-typed variables are always spilled to the root stack?
I'm following the book, and in section 2.5, Remove Complex Operands, the example for rco-atom
is to transform (- 10)
to tmp.1
:
I was confused as (- 10)
is already atomic (it's a (Prim - (Int 10))
). Then I went to check the lecture notes, there the example is actually:
(+ (+ 42 10) (- 10))
=>
(let ([tmp.1 (+ 42 10)])
(let ([tmp.2 (- 10)])
(+ tmp.1 tmp.2)))
which makes sense as (- 10)
is the operand here hence needs to be transformed to atomic.
Perhaps we can use the same example in the book text?
An suggestion: racket version or python version move to a branch (like apt-ocaml ?)
Why do we use S-expressions for syntax? It has a certain convenience, but the students pay for it in any number of small ways. How many times do they end up with "extra parentheses", because non-type-safe operations are applied to expressions in list form?
If anything, struct-based pattern matching is actually cleaner in racket than quasiquoted list patterns:
#lang racket
(struct foo (a b))
(match (foo 1 2)
[(foo a b) a])
Struct based ASTs would be a major change to the book, so it shouldn't be undertaken lightly. As a stop-gap I have recommended that students make more defensive helper functions like "append two lists of instructions" that use a contract to (deeply) check the validity of their inputs.
Presently, list-based S-expressions combine with the slightly underspecified grammars (#29) to create deviations between the student compilers and the textbook (more importantly, the expectations of the testing-framework / interpreters).
Aside from the cost of the change itself:
In section 4.11 patch instructions it is suggested to use a temporary mov
into rax
when the second argument to a cmpq
is an immediate. However, it is possible that rax
is live at this point and this mov
will overwrite the value in rax
.
One possible solution is to do this temporary mov
during the select instructions pass, so that the liveness analysis pass (which happens after select instructions, but before patch instructions) takes care of such situations.
The original text said
(movq (int 10) (var tmp.1))
(negq (var tmp.1))
(movq (var tmp.1) (var tmp.2))
(addq (int 52) (var tmp.2))
(movq (var tmp.2) (reg rax)))
The variable tmp.1 is assigned to stack location -8(%rbp), and tmp.2 is
assign to -16(%rbp), so the assign-homes pass translates the above to
(movq (int 10) (deref rbp -16))
(negq (deref rbp -16))
(movq (deref rbp -16) (deref rbp -8))
(addq (int 52) (deref rbp -8))
(movq (deref rbp -8) (reg rax)))
But shouldn't it be
(movq (int 10) (deref rbp -8))
(negq (deref rbp -8))
(movq (deref rbp -8) (deref rbp -16))
(addq (int 52) (deref rbp -16))
(movq (deref rbp -16) (reg rax)))
?
The outputted ASTS do not match the starting input program. It seems that there might have been a mix up, since the outputted ASTS are found earlier in the chapter.
The phrase "garbage collection" is repeated twice here https://github.com/IUCompilerCourse/Essentials-of-Compilation/blob/master/book.tex#L6123
The text mentions that the expose-allocation
pass comes before flatten
.
Pr #26 added an int alternation to the exp non-terminal, I don’t think it is correct.
I think I have a solution that doesn’t require the int alternation in the exp non-terminal.
The idea is that we flatten the expression tree into an addition chain. Then partition the terms in the chain into two groups:
(read)
s and (- (read))
s.Each group could be empty. Then we compute the sum of the first group, and add the second group to the result.
The exp non-terminal corresponds to the second group in my solution, which doesn’t need int alternation since they are all in the first group.
Is there something I missed? Is the int alternation really necessary?
This causes additional confusion with the students, and it really has no benefit.
IMHO dotted pairs should only be used if the cdr
is non-list data (and really not even then because there's always a better data structure). One major problem is that we currently lose read-write read/write invariance for ASTs:
> '((start . (seq _ (return _))))
'((start seq _ (return _)))
Aside from being ugly, "(start seq _
" requires the students understand in detail the source, memory, and printed representations of cons-lists, and distinctions between them.
Eliminating dotted pairs would be a relatively small change (in contrast w the larger overhaul proposed by #30) would be to just eliminate dotted pairs and use proper lists.
I am on Debian testing 64-bit and got the following error while running make
:
Running 'makeindex -o "book.ind" "book.idx"'
------------
This is makeindex, version 2.15 [TeX Live 2022/dev] (kpathsea + Thai support).
Scanning input file book.idx...done (0 entries accepted, 0 rejected).
Nothing written in book.ind.
Transcript written in book.ilg.
Latexmk: applying rule 'biber book'...
Rule 'biber book': File changes, etc:
Changed files, or newly in use since previous run(s):
'book.bcf'
------------
Run number 1 of rule 'biber book'
------------
------------
Running 'biber "book.bcf"'
------------
DateTime.c: loadable library and perl binaries are mismatched (got handshake key 0xce00080, needed 0xed00080)
Latexmk: Summary of warnings from last run of *latex:
Latex failed to resolve 324 reference(s)
Latex failed to resolve 149 citation(s)
Latexmk: Errors, so I did not complete making targets
Collected error summary (may duplicate other messages):
biber book: Could not open biber log file for 'book'
Latexmk: Use the -f option to force complete processing,
unless error was exceeding maximum runs, or warnings treated as errors.
make: *** [Makefile:7: all] Error 12
biber
version is 2.16-1
and texlive
version is 2021.20210921-1
.
Which versions do I have to have installed in order to compile the book successfully?
P.S.: Three generated files are not included in .gitignore
thus polluting the locally cloned repository:
$ git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
book.fdb_latexmk
book.fls
book.pdf
nothing added to commit but untracked files present (use "git add" to track)
It is out of date relative to what the compiler produces. And it has some elements that are confusing people.
E.g. where did movq %rbx, 0(%r15)
come from?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.