elzawawy / compiler-frontend Goto Github PK

First two front-end phases of a language-independent compiler implemented in C++

License: MIT License

CMake 0.05% C++ 99.89% C 0.06%

compiler compiler-frontend compiler-design compilers compiler-theory frontend-phases

compiler-frontend's Introduction

Language-Independent Compiler Frontend

Frontend phases of a language-independent compiler, that takes in language specifications, source code, and emits a parse tree for the code, and any errors found. This is a project developed for the PLT (Programming Language Translation) course at Faculty of Engineering, Alexandria University in Spring 2020 Offering.

Project Objectives And Goals Achieved

Be able to build a compiler for a (simplified) programming language.
Study the theory behind compilers and langugaes translation.
Know how to use compiler construction tools, such as generators of scanners and parsers.
Practice how to work on a large software project: using Agile Project managment (Check the Github Projects on Repository) and clean git branching/PR model.
Experience using C++ heavily where the compiler phases is fully developed in C++ language.

Project Overview

We built the first two phases of the compiler frontend (Follow links):
- Lexical Analysis Generator
- Syntatic Analysis Generator

In each of the directories, is the source code of each module separted with it's unit test cases and documentation. Refer to each README inside the directories to get to know each phase well and how to use it. Both phases are run in one pass only, to avoid multiple and wasted computations.

The input to the program is three files:

Language Grammar Specifiction.
Language Tokens/Terminals Specification.
Source code (follows the above two files).

The output of the program is as follows:

A representation of the parse the of the source code.
Tokens recognized in the source code.
Any errors found in the source code.

Made with ❤️

compiler-frontend's People

Contributors

Stargazers

Watchers

Forkers

prakritishrivastava oswidan97

compiler-frontend's Issues

[PRSGEN-4] Eliminate Left Factoring

Goal: Automatically eliminating grammar left factoring before generating the parser.
Reference: Slides 23 through 28 in Lecture 6

Add Running Executable For Compiler

The compiler currently executes using the PRSGEN executable, let's add an executable for both components and make it easier to run it from outside.
Provide a command line (one-line) for running/testing the compiler instead from running from the IDE.
Add notes for Running/Testing in README.

[LEXGEN-5] Build LexDriver Class

Component 4: LexDriver

It is the lexical analyzer program that simulates the resulting DFA machine.
that simulates the resulting DFA machine.

[LEXGEN-1] Build Language Parser

The Language Parser Class

In other words, it is a file reader, that parses lexems specification.
Takes an input file with format specified in requirements document.
Emits an output with (Table of Inputs - List of Regular Expressions).

Token

It is a pair consisting of a token name (class) and an optional attribute value. The token names are the input symbols that the parser processes.

Pattern

It is a description of the form that the lexemes of a token may take.

Lexeme

It is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

Some notes:

One token for each keyword. The pattern for a keyword is the same as the keyword itself.
Tokens for the operators, either individually or in classes.
One token representing all identifiers.
One or more tokens representing constants, such as numbers and literal strings.
Tokens for each punctuation symbol, such as left and right parentheses,comma, and semicolon.

Alphabet

It is any finite set of symbols. The set {0,1} is the binary alphabet.

String

A string over an alphabet is a finite sequence of symbols drawn from that alphabet. The empty string, denoted e, is the string of length zero.

Language

It is any countable set of strings over some fixed alphabet.
In lexical analysis, the most important operations on languages are Union, Concatenation, Kleene Closure, and Positive Closure

RE (Regular Expressions)

A convenient notation for specifying lexeme patterns.
While they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.
Regular expressions notation has come into common use for describing all the languages that can be built from language operators applied to the symbols of some alphabet.
Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's sub-expressions.
Regular expressions often contain unnecessary pairs of parentheses. We may drop certain pairs of parentheses if we adopt the conventions that:

The unary operator * has highest precedence and is left associative.
Concatenation has second highest precedence and is left associative.
Union | has lowest precedence and is left associative.
We may replace the regular expression(a)|((b)*(c)) by a|b*c.

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s.

RD (Regular Definitions)

We may wish to give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.

[PRSGEN-9] Write Parsing Table

Closed by PRSGEN-7 #29

[LEXGEN-7] Write Transition Table

[LEXGEN-2] Build NFA-GEN Class

Component 2: NFA-GEN

This component is required to construct a non-deterministic finite automata (NFA) for the given regular expressions, combine these NFAs together with a new starting state to produce one final NFA.

[LEXGEN-4] Build MinDFA Class

[LEXGEN-9] Handle Null Token

[LEXGEN-8] Write Flex Tutorial

Flex is a tool for generating scanners: programs which recognize lexical patterns in text.
As a bonus task, we would like to add a tutorial of usage of Flex to automatically generate a lexical analyzer for the given regular expressions.
The tutorial should include a detailed description of the required steps together with screenshots for using the tool.

Handle Testing Files Properly

There are multiple unit/integration testing files in the repository. Let's find a better way to organize them.
Add the input file auto-generated file to .gitignore if it's necessary.

[PRSGEN-6] Compute FOLLOW

Goal:

ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
Uses them to construct a predictive parsing table for the grammar.
The table is to be used to drive a predictive top-down parser.

This task is concerned with computing First for each non terminal.
Reference: Slides 28 through 35 from Lecture 7

[PRSGEN-5] Compute FIRST

Goal:

ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
Uses them to construct a predictive parsing table for the grammar.
The table is to be used to drive a predictive top-down parser.

This task is concerned with computing First for each non terminal.
Reference: Slides 28 through 35 from Lecture 7

[PRSGEN-7] Construct Parsing Table

Goal:

ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
Uses them to construct a predictive parsing table for the grammar.
The table is to be used to drive a predictive top-down parser.

This task is concerned with constructing the parsing table from follow and first sets.
Reference: Slides 28 through 35 from Lecture 7

Also we consider the Error Recovery Panic Mode:
For each nonterminal A, mark the entries M[A,a] as synch if a is in the follow set of A. Reference: Slides 41 through 46 from Lecture 7

[LEXGEN-6] Write A README

Write a README for the LEXGEN Project based on the steps conducted for the implementation.
Make sure that README contains at least the following:
- A description of the used data structures.
- Explanation of all algorithms and techniques used
- The resultant transition table for the minimalDFA.
- The resultant stream of tokens for the example test program.
- Any assumptions made and their justification.

[PRSGEN-10] Create README

[LEXGEN-3] Build DFA-GEN Class

Component 3: DFAGen

That component convert the resultingNFA to a DFA.

[PRSGEN-3] Eliminate Left Recursion

Goal: Automatically eliminating grammar left recursion before generating the parser.
Reference: Slides 16 through 22 in Lecture 6

[PRSGEN-2] Parse Grammar File

The parser generator expects an LL(1) grammar (CFG) as input.

CFG Input File Format

CFG input file is a text file.
Production rules are lines in the form LHS ::= RHS
Production rule can be expanded over many lines.
Terminal symbols are enclosed in single quotes.
\L represents Lambda symbol.
The symbol | is used in RHS of production rules with the meaning discussed in class.
Any reserved symbol needed to be used within the language, is preceded by an escape
backslash character.

The GrammarParser Object exposes a parseFile method that takes in a file of the mentioned format and fills the data-structures to be used throughout this project.

List of Non-terminals.
List of Terminals.

Required in this task is to implement this method.

[PRSGEN-8] Build Non-Recursive Predictive Parser

The Parsing table is to be used to drive a predictive top-down parser.
If the input grammar is not LL(1), an appropriate error message should be produced.

Goal: The generated parser is required to produce some representation of the leftmost derivation for a correct input. If an error is encountered, a panic-mode error recovery routine is to be called to print an error message and to resume parsing.

Take care of Output Format mentioned in PDF:
METHOD_BODY
STATEMENT_LIST
STATEMENT_LIST STATEMENT
STATEMENT_LIST STATEMENT STATEMENT
STATEMENT STATEMENT STATEMENT DECLARATION
STATEMENT STATEMENT PRIMITIVE_TYPE IDENTIFIER;
STATEMENT STATEMENT int IDENTIFIER; STATEMENT
STATEMENT
....to be continued

Task: Implement the Flowchart of the parser we discussed and the photo can be found in Media of Team Chat since no photos can be appended here.
**Reference: ** Slides 20 through 46 from Lecture 7

elzawawy / compiler-frontend Goto Github PK

compiler-frontend's Introduction

Language-Independent Compiler Frontend

Project Objectives And Goals Achieved

Project Overview

Made with ❤️

compiler-frontend's People

Contributors

Stargazers

Watchers

Forkers

compiler-frontend's Issues

Component 4: LexDriver

The Language Parser Class

Token

Pattern

Lexeme

Some notes:

Alphabet

String

Language

RE (Regular Expressions)

RD (Regular Definitions)

Component 2: NFA-GEN

Component 3: DFAGen

CFG Input File Format

Recommend Projects

Recommend Topics

Recommend Org