My experiments with compilers.
-
Indent parsing
Parses the source into a tree whose scopes are defined by indentation (Python-like).
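A minimal sketch of indentation-based scoping, assuming spaces only and one statement per line. The `Node` class and `parse_indent` function are hypothetical names used for illustration; they are not taken from this repository.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One source line plus the lines nested under it (hypothetical structure)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def parse_indent(source: str) -> Node:
    """Build a tree whose nesting follows the leading-space count of each line."""
    root = Node("<root>")
    # Stack of (indent, node); the root sits below any real indentation level.
    stack = [(-1, root)]
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect scoping
        indent = len(raw) - len(raw.lstrip(" "))
        # Close every scope indented at least as deep as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = Node(raw.strip())
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root
```

Keeping the open scopes on a stack means a dedent of any depth is handled by popping until a shallower line is found, so no explicit INDENT/DEDENT tokens are needed in this sketch.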
-
Grammar parsing
Builds an abstract syntax tree over the previous tree. It's a ``top-down'' parser. There is no semantic analysis yet; the tree matches the original program one-to-one.
-
Semantic analysis
Find functions, loops, branches and other main building blocks.
-
Type inference
The compiler tries to guess the types of variables.
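As a rough illustration of what guessing types can mean, the sketch below infers a type name for variables assigned a literal or a previously seen variable. It uses Python's standard `ast` module on Python source as a stand-in for this compiler's own tree; `infer_types` is a hypothetical name, and a real inference pass would cover far more cases.

```python
import ast


def infer_types(source: str) -> dict[str, str]:
    """Guess a type name for each simple `name = value` assignment."""
    types: dict[str, str] = {}
    for stmt in ast.parse(source).body:
        is_simple = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not is_simple:
            continue
        name = stmt.targets[0].id
        value = stmt.value
        if isinstance(value, ast.Constant):
            # A literal carries its type directly.
            types[name] = type(value.value).__name__
        elif isinstance(value, ast.Name) and value.id in types:
            # Copying a variable copies its inferred type.
            types[name] = types[value.id]
        else:
            types[name] = "unknown"
    return types
```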
-
Sanity check
Checks that, e.g., main() has correct arguments and so on.
-
Code generation
Traverses the AST and generates ``LLVM intermediate representation''.
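A small sketch of the idea, assuming a toy AST of integer constants and binary operators. `Num`, `BinOp`, `emit_expr` and `emit_main` are hypothetical names, not this project's classes; the output is textual LLVM IR using only the `add`, `sub`, `mul` and `ret` instructions on `i32`.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Num | BinOp"
    right: "Num | BinOp"


# Map source operators to LLVM integer instructions.
LLVM_OPS = {"+": "add", "-": "sub", "*": "mul"}


def emit_expr(node, lines):
    """Append IR for `node` to `lines`; return the operand holding its value."""
    if isinstance(node, Num):
        return str(node.value)
    # Post-order traversal: operands are emitted before the instruction using them.
    left = emit_expr(node.left, lines)
    right = emit_expr(node.right, lines)
    name = f"%t{len(lines)}"  # one fresh temporary per emitted instruction
    lines.append(f"  {name} = {LLVM_OPS[node.op]} i32 {left}, {right}")
    return name


def emit_main(expr):
    """Wrap the expression in a `main` function that returns its value."""
    body = []
    result = emit_expr(expr, body)
    return "\n".join(
        ["define i32 @main() {", "entry:", *body, f"  ret i32 {result}", "}"]
    )
```

For example, `emit_main(BinOp("*", BinOp("+", Num(1), Num(2)), Num(3)))` produces an `add` into `%t0`, a `mul` into `%t1`, and `ret i32 %t1`.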
-
Compilation
Invoke LLVM to build the program.
What do computer programs do? They apply a sequence of transformations to your data to produce meaningful output. The goal of a computer language is to support writing such transformations.
- Be safe, compact and friendly
- Static typing
- ML-like syntax (inspired by LiveScript)
- Public/Protected/Private attributes of the classes
- Built-in regexp support
- Built-in shell commands invocation
- Function overloading
- Custom operators
- Garbage collection
- Warns about useless statements (such as forgetting to call a function)
- Substitute vars in strings: "Hello, {username}!"
- UTF8 strings
- Assignments in if-clauses (the assignment must evaluate to bool, as a safety measure)
- Supports comments: shell-style (# blah), C++-style (// here is the comment) and C-style (/* Hi! */)
- All programs can be opened as libraries
- No header files needed; everything is in the ELF file (possibly in compressed format).
- Keep it simple (to learn, to read, to extend)
- Error-resistant coding
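The "useless statements" item in the list above can be illustrated with a short check. This sketch runs on Python source through the standard `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in for this compiler's own tree; `useless_statements` is a hypothetical name. It flags statements that consist only of a bare name or attribute, which is the usual shape of a forgotten call.

```python
import ast


def useless_statements(source: str) -> list[tuple[int, str]]:
    """Return (line, text) for statements that are only a bare name or attribute."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # An expression statement whose value is never used and has no call.
        if isinstance(node, ast.Expr) and isinstance(
            node.value, (ast.Name, ast.Attribute)
        ):
            found.append((node.lineno, ast.unparse(node.value)))
    return sorted(found)
```

Here `conn.close` on its own line would be reported, while `conn.close()` would not.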
- dead.py -- just a launcher for everything else
- peg.py -- PEG parser that lets you define a grammar in a BNF-like way
- pratt.py -- Pratt parser, used to parse expressions
- tokenizer.py -- splits input into tokens; uses PEG
- ast.py -- abstract syntax tree and rewrite tools
- codegen.py -- a small helper script to write correctly-indented code
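To show what a helper for writing correctly indented code might look like, here is a minimal sketch. The `CodeWriter` class and its methods are assumptions made for illustration; the actual interface of codegen.py is not described here and may differ.

```python
from contextlib import contextmanager


class CodeWriter:
    """Collects lines and prefixes each with the current indentation (hypothetical API)."""

    def __init__(self, indent_unit: str = "    "):
        self.indent_unit = indent_unit
        self.level = 0
        self.lines = []

    def line(self, text: str = "") -> None:
        # Blank lines stay empty so no trailing whitespace is emitted.
        self.lines.append(self.indent_unit * self.level + text if text else "")

    @contextmanager
    def block(self, header: str):
        """Write `header`, then indent everything emitted inside the `with` body."""
        self.line(header)
        self.level += 1
        try:
            yield self
        finally:
            self.level -= 1

    def render(self) -> str:
        return "\n".join(self.lines) + "\n"
```

Usage: nest `with w.block("def f(x):"):` and `w.line("return 0")` calls, then call `w.render()`. Tying the indentation level to a context manager keeps it balanced even if an exception interrupts generation.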
The normal assumption is that memory allocation never fails, because most programs don't know how to handle such errors anyway. If a program must not fail silently, there is a way to provisionally allocate the required amount of memory.
Why static: Just today I found typing bugs in pypeg and modgrammar. I see typing problems almost every day in many programs and libraries!
Why methods instead of functions: Python's namespace is heavily polluted with abs, len, sum, all, vars, min/max, next, list, id, to, dict, etc...
- Parser (definitions are from https://siod.svn.codeplex.com/svn/winsiod/pratt.scm, A simple Pratt-Parser for SIOD: 2-FEB-90, George Carrette, [email protected]):
  1. NUD -- NUll left Denotation (op has nothing to its left (prefix))
  2. LED -- LEft Denotation (op has something to left (postfix or infix))
  3. LBP -- Left Binding Power (the stickiness to the left)
  4. RBP -- Right Binding Power (the stickiness to the right)
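The four terms above fit together as in the following sketch of a Pratt parser for arithmetic. It is not the code in pratt.py: the `Parser` class, the binding-power numbers and the tuple-based tree are all assumptions chosen for illustration.

```python
import re

# LBP: left binding power of each infix operator; other tokens bind at 0.
LBP = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}
PREFIX_RBP = 25  # binding power used by unary minus for its operand


def tokenize(text):
    return re.findall(r"\d+|[-+*/^()]", text)


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def nud(self, token):
        """NUD: the token has nothing to its left (literal, prefix op, paren)."""
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "-":
            return ("neg", self.expression(PREFIX_RBP))
        if token == "(":
            inner = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {token!r}")

    def led(self, token, left):
        """LED: the token has an operand to its left (infix op)."""
        lbp = LBP[token]
        # RBP: a right-associative operator binds its right side slightly less
        # tightly, so an equal operator to the right is absorbed first.
        rbp = lbp - 1 if token in RIGHT_ASSOC else lbp
        return (token, left, self.expression(rbp))

    def expression(self, rbp=0):
        left = self.nud(self.advance())
        # Keep extending `left` while the next operator binds tighter than rbp.
        while LBP.get(self.peek(), 0) > rbp:
            left = self.led(self.advance(), left)
        return left


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

The whole precedence logic lives in the single comparison inside `expression`: an operator is consumed only if its LBP exceeds the RBP passed down by whoever is waiting for a right operand.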
- Brainduck (busy)
- Concrete mixer
- http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
- http://effbot.org/zone/simple-top-down-parsing.htm
- http://roscidus.com/blog/blog/2013/06/20/replacing-python-round-2/#syntax
- http://en.wikipedia.org/wiki/Linear_type_system
property? -- boolean property
Functions should either return a value or raise an exception.