A book about compiling Racket and Python to x86-64 assembly

TeX 99.55% Racket 0.30% Makefile 0.12% Nix 0.03%

essentials-of-compilation's People

Contributors

Stargazers

Watchers

Forkers

cgswords vikraman projanen cderici pruthvishetty mattwollf chubbymaggie rrnewton localchart tianji-zhang jeapostrophe abduld venantius samrat topzhangye neo4reo jellyr indieyourself cano159 vtheno jdreutt gordonmcshane crcc haskellstudio lambda7xx graydon cgag capfredf jnear jimbog qiuwei fuath aymanosman simple555a dvanhorn jbclements t0350 whoback trajafri efanzh thisisian ncihnegn yourushenzhu nvasilakis stjordanis chuanwen onelharrison wordhardqi marsmxm monkeymq beyonddream-productions cshonig arthurgleckler scottwloyd gisairo alinab xuejwei xc42 kaizhiyu zpzgone noahstorym sehriff fdotli weichen6613 mitchellmolnar arsdragonfly umatani atsuzaki kcaliban cc-uni-freiburg cs-course-lab nilslice zenspider saltedf ksromanov runningk virtualjpeg destinyson7 davisson2003 badtyro yudistrange collectrobot skogsbaer felixonmars cnguoyj temyurchenko rundong08 moningz hanke98 joshuacrotts ra223036m adwinwhite gr-qft zeta1999 guagua777 unshorn-forks jiningjlu drruisseau xdgg1005 startupinstitutechicago

essentials-of-compilation's Issues

Section 4.8, `label->block` rather than CFG?

Section 4.8 uses the term "control flow graph", when really what we are building is a map from labels to basic blocks, which makes it confusing. Furthermore, the actual CFG is built in the register allocation section 4.10

Wouldn't label->block be a better phrasing here?

Missing documentation on running passes in chapter 5

It would be nice to include information on running the passes in chapter 5. For example, I had to figure out the way to run expose allocation is to use

("expose allocation" ,expose-allocation ,interp-Rvec-prime ,type-check-Rvec)

by looking around the codebase.

Month fields in book.bib are using legacy format

Biber warning: [609] Utils.pm:395> WARN - legacy month field 'October' in entry 'Matz:2013aa' is not an integer - this will probably not sort properly

Inconsistent variable names across passes for running example in chapter 5

The variable names in different passes for the running example are different in chapter 5 (e.g. what is called collectret7974 in output of the expose allocation pass is called collectret9023 in the output of the select instruction pass). This makes it hard to follow what is going on, using consistent variable names would be better.

`Let` struct first field contract should be a `Var`

Hello! I would first like to thank you for making such a wonderful resource available for free.

While I was going through the code, I noticed that the contract for the Let struct restricts the variable binding to any arbitrary symbol instead of just Var structs. Ins't the latter choice more appropriate here?

add a flag to switch between printing AST's using AST notation versus concrete syntax

Give hint for color -> location mapping for spills and root spills

applies to Vectors/Register Allcoation

Document grammar "side conditions"

The grammar figures in the book don't currently explicitly mention side conditions like having a start label, or the characters allowable in identifiers (or even what is allowed in the "info" field). It would be good if each figure was a self-contained and detailed description of the grammar, including some of these extra constraints and definitions.

Backticks don't look like grave accent

Latex translates ` and ' into curved open and close single quotes. It took me a while to figure out in the code listings that the open single quotes were backticks. While I'm not sure if it covers all the cases, adding these two lines to the included packages in book.tex should make most of the backticks look like backticks, and apostrophes like apostrophes:
\usepackage{upquote}
\usepackage[straightquotes]{newtxtt}

Section 6.9.1, `TailJmp`'s read from `arg` not mentioned

Here, it is not mentioned that TailJmp also reads from its arg field (the register containing the function address), in addition to callee saved registers.

Add exercise numbers to later chapters!

Book seems to assume 8-byte aligned functions.

This isn't going to hold on all compilers, OS's, architectures (or even most).

Students ran into this with some difficult debugging:

Solution: just shift the darn procedure pointers when doing R7.

challenge: optimize closures

based on Nov. 12 2020 lecture notes

Suggestion for readable `make clean` command

Currently we have the following command

clean:
	$(LATEXMK) -C book
	rm -f book.log book.aux book.bbl book.lof book.out book.toc book.blg book.pdf book.ilg book.ind book.lot book.run.xml book.bcf book.fls book.fdb_latexmk authors.idx authors.ilg authors.ind subject.idx subject.ilg subject.ind

which is massive.

Is it OK with you if we could reduce it to

clean:
	$(LATEXMK) -C book
	$(RM) \
		*.idx *.ilg *.ind *.aux *.bbl *.bcf \
		*.blg *.fdb_latexmk *.fls *.lof *.log \
		*.lot *.out *.run.xml *.toc book.pdf

which looks cleaner and much more tight?

Is the `R_{Var}^{ANF}` language defined in Figure 2.12 too broad?

Consider the following example in R_{Var}

(+ 1 (+ 2 (+ 3 4)))

From the description of rco-exp and rco-atom, it appears that the book wanted the following translation to R_{Var}^{ANF}:

;; Code 1
(let ([tmp.1 (+ 3 4)])
  (let ([tmp.2 (+ 2 tmp.1)])
    (+ 1 tmp.2)))

However, there is another way to do this:

;; Code 2
(let ([tmp.1
       (let ([tmp.2 (+ 3 4)])
         (+ 2 tmp.2))])
  (+ 1 tmp.1))

which I believe is also a valid R_{Var}^{ANF} program. But I feel Code 2 is harder to evaluate, which would need a stack, while Code 1 can be evaluated without a stack.

Should we restrict the R_{Var}^{ANF} such that Code 1 is valid but Code 2 is invalid, for example, by not allowing e in (Let x e body) to contain any Let expression?

write chapter about gradual typing

Figure 6.4 does not show callee save registers

This makes it mysterious that the compiler will produce rbp-offsets like "-40" for the first local variable, when the student is expecting a smaller offset (not taking callee-saves into account).

fix leaq in interpreter

after register allocation, but before patching, leaq should handle Deref in the target, but currently does not.

How to build the Racket version

make command defaults to Python on my system which currently misses chapters 9 to 12.

How do I tell it to build the Racket version which is complete?

Chapter 6 shrink description incomplete

The description for the shrink pass in chapter 6 does not mention the relationship between Exp and Exp'. I had to watch the lecture video to figure out that Exp' = (shrink-exp Exp), would be nice to have it in the book itself.

Section 6.8 uses undefined `lhs'`

Here, the book does not say what lhs' is supposed to be.

add "further reading" sections to all the chapters

Add syntax definition of pseudo-x86 (x86*)

the point is to clarify the addition of variables to x86* with respect to x86

Closure conversion needs better explanation

This was a sticking point in class this year.

issues with rbp/rsp in chapter 6

Some questions came up last week in class about chapter 6 and figure 6.4. The figure shows "rsp" as storing the base address for the local m in the callee, but it says rbp for local 1. That seems like a discrepancy.

In the functions.rkt reference implementation, there are comments about the stack-arg form possibly being obsolete. The stack-arg references relative to rsp, whereas the code generated in the preamble of a function def references rbp.

Any method to run the code in the book?

It currently using listings, which is also my goto.

But is there a good way to make sure that the code is syntactically correct / runnable?

Section 5.7 missing information on compiling void

At the beginning of section 5.7 it is mentioned that compiling void must be discussed. However the section never discusses it. The generated code for the running example, however, has movq $0s for the void assignments.

Compilation error.

Hello.

When compiling the head of master as of 2021-05-11 on an XUbuntu 20.04 x86_64 LTS machine with a texlive-full installed, I get:

! Use of \\titlecap doesn't match its definition.
\update@series@target@value #1->\def \reserved@a {
                                                  #1}\ifx \target@meta@famil...
l.402 ...tract Syntax Trees and Racket Structures}

Thank you!

Typo in the description of assign-homes compiler pass

Regarding the typo "...we assignment of each..." in the description of the assign-homes compiler pass shown below.

Essentials-of-Compilation/book.tex

Lines 1603 to 1605 in 9c77ff6

 \item[Pass \key{assign-homes}] To handle the difference between the 

 variables in $R_1$ versus the registers and stack locations in x86, 

 we assignment of each variable to a register or stack location.

I wasn't sure what phrasing might be best here, but I was thinking "...we map/translate the assignment of each variable..."

write chapter about parametric polymorphism

Where can I find `interp.rkt`?

The book references this file in several places but it's not mentioned where to find this file.

Missing primitives in Rvec ANF grammar

The grammar for Rvec's ANF is missing the cases for vector-ref and vector-set! primitives.

Book is self-contradictory about whether (and e1 e2) short-circuits

In one part of Chapter 3, it states:

The logical operations not and and behave as you might expect, but note that the and operation is short-circuiting. That is, the second expression e2 is not evaluated if e1 evaluates to #f.

But in another part of Chapter 3:

Recall that the and operator of C1 does not perform short circuiting, but evaluates both arguments unconditionally.

Which is right?

Confusion about Sec. 5.3.4 (Register Allocation in R3)

Hi,
I'm a bit confused about why the second requirement is there for the register allocator("if a vector-typed variable is live during a call to the collector, it must be spilled to ensure it is visible to the collector"). Doesn't requirement 1 already ensure that vector-typed variables are always spilled to the root stack?

confusing example for `rco-atom` in section 2.5

I'm following the book, and in section 2.5, Remove Complex Operands, the example for rco-atom is to transform (- 10) to tmp.1:

I was confused as (- 10) is already atomic (it's a (Prim - (Int 10))). Then I went to check the lecture notes, there the example is actually:

(+ (+ 42 10) (- 10))
=>
(let ([tmp.1 (+ 42 10)])
  (let ([tmp.2 (- 10)])
    (+ tmp.1 tmp.2)))

which makes sense as (- 10) is the operand here hence needs to be transformed to atomic.

Perhaps we can use the same example in the book text?

The python version don't have public-student-support-code

An suggestion: racket version or python version move to a branch (like apt-ocaml ?)

Struct-based syntax?

Why do we use S-expressions for syntax? It has a certain convenience, but the students pay for it in any number of small ways. How many times do they end up with "extra parentheses", because non-type-safe operations are applied to expressions in list form?

If anything, struct-based pattern matching is actually cleaner in racket than quasiquoted list patterns:

#lang racket
(struct foo (a b))
(match (foo 1 2)
  [(foo a b) a])

Struct based ASTs would be a major change to the book, so it shouldn't be undertaken lightly. As a stop-gap I have recommended that students make more defensive helper functions like "append two lists of instructions" that use a contract to (deeply) check the validity of their inputs.
Presently, list-based S-expressions combine with the slightly underspecified grammars (#29) to create deviations between the student compilers and the textbook (more importantly, the expectations of the testing-framework / interpreters).

Disadvantages / Alternatives

Aside from the cost of the change itself:

Reading input programs from SExpressions would require a basic "parser" pass that convers sexp->structs. This is probably a good practice anyway, like the old "parse-scheme" pass.

Erroneous patch instructions

In section 4.11 patch instructions it is suggested to use a temporary mov into rax when the second argument to a cmpq is an immediate. However, it is possible that rax is live at this point and this mov will overwrite the value in rax.

One possible solution is to do this temporary mov during the select instructions pass, so that the liveness analysis pass (which happens after select instructions, but before patch instructions) takes care of such situations.

A possible error in Chapter 2.7

The original text said

(movq (int 10) (var tmp.1))
(negq (var tmp.1))
(movq (var tmp.1) (var tmp.2))
(addq (int 52) (var tmp.2))
(movq (var tmp.2) (reg rax)))

The variable tmp.1 is assigned to stack location -8(%rbp), and tmp.2 is
assign to -16(%rbp), so the assign-homes pass translates the above to

(movq (int 10) (deref rbp -16))
(negq (deref rbp -16))
(movq (deref rbp -16) (deref rbp -8))
(addq (int 52) (deref rbp -8))
(movq (deref rbp -8) (reg rax)))

But shouldn't it be

(movq (int 10) (deref rbp -8))
(negq (deref rbp -8))
(movq (deref rbp -8) (deref rbp -16))
(addq (int 52) (deref rbp -16))
(movq (deref rbp -16) (reg rax)))

Page 62 shows incorrect program transformation

The outputted ASTS do not match the starting input program. It seems that there might have been a mix up, since the outputted ASTS are found earlier in the chapter.

Typo

The phrase "garbage collection" is repeated twice here https://github.com/IUCompilerCourse/Essentials-of-Compilation/blob/master/book.tex#L6123

Fig 5.16: `expose-allocation` should come before `flatten`

The text mentions that the expose-allocation pass comes before flatten.

Is the grammar in exercise 1 correct?

Pr #26 added an int alternation to the exp non-terminal, I don’t think it is correct.

I think I have a solution that doesn’t require the int alternation in the exp non-terminal.

Click here to view my solution. (I am not sure that whether I should put my solution here. But how should I explain myself?)

The idea is that we flatten the expression tree into an addition chain. Then partition the terms in the chain into two groups:

The group that only contains ints.
The group that only contains (read)s and (- (read))s.

Each group could be empty. Then we compute the sum of the first group, and add the second group to the result.

The exp non-terminal corresponds to the second group in my solution, which doesn’t need int alternation since they are all in the first group.

Is there something I missed? Is the int alternation really necessary?

replace VectorOf with any-vector-ref, etc.

Eliminate dotted pairs from the grammars

This causes additional confusion with the students, and it really has no benefit.

IMHO dotted pairs should only be used if the cdr is non-list data (and really not even then because there's always a better data structure). One major problem is that we currently lose read-write read/write invariance for ASTs:

> '((start . (seq _ (return _))))
'((start seq _ (return _)))

Aside from being ugly, "(start seq _" requires the students understand in detail the source, memory, and printed representations of cons-lists, and distinctions between them.

Eliminating dotted pairs would be a relatively small change (in contrast w the larger overhaul proposed by #30) would be to just eliminate dotted pairs and use proper lists.

Cannot compile it due to an error with biber

I am on Debian testing 64-bit and got the following error while running make:

Running 'makeindex  -o "book.ind" "book.idx"'
------------
This is makeindex, version 2.15 [TeX Live 2022/dev] (kpathsea + Thai support).
Scanning input file book.idx...done (0 entries accepted, 0 rejected).
Nothing written in book.ind.
Transcript written in book.ilg.
Latexmk: applying rule 'biber book'...
Rule 'biber book': File changes, etc:
   Changed files, or newly in use since previous run(s):
      'book.bcf'
------------
Run number 1 of rule 'biber book'
------------
------------
Running 'biber  "book.bcf"'
------------
DateTime.c: loadable library and perl binaries are mismatched (got handshake key 0xce00080, needed 0xed00080)
Latexmk: Summary of warnings from last run of *latex:
  Latex failed to resolve 324 reference(s)
  Latex failed to resolve 149 citation(s)
Latexmk: Errors, so I did not complete making targets
Collected error summary (may duplicate other messages):
  biber book: Could not open biber log file for 'book'
Latexmk: Use the -f option to force complete processing,
 unless error was exceeding maximum runs, or warnings treated as errors.
make: *** [Makefile:7: all] Error 12

biber version is 2.16-1 and texlive version is 2021.20210921-1.

Which versions do I have to have installed in order to compile the book successfully?

P.S.: Three generated files are not included in .gitignore thus polluting the locally cloned repository:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	book.fdb_latexmk
	book.fls
	book.pdf

nothing added to commit but untracked files present (use "git add" to track)

	\item[Pass \key{assign-homes}] To handle the difference between the
	variables in $R_1$ versus the registers and stack locations in x86,
	we assignment of each variable to a register or stack location.

iucompilercourse / essentials-of-compilation Goto Github PK