Git Product home page Git Product logo

compiler-frontend's Introduction

Language-Independent Compiler Frontend

Frontend phases of a language-independent compiler, that takes in language specifications, source code, and emits a parse tree for the code, and any errors found. This is a project developed for the PLT (Programming Language Translation) course at Faculty of Engineering, Alexandria University in Spring 2020 Offering.

Project Objectives And Goals Achieved

  • Be able to build a compiler for a (simplified) programming language.
  • Study the theory behind compilers and langugaes translation.
  • Know how to use compiler construction tools, such as generators of scanners and parsers.
  • Practice how to work on a large software project: using Agile Project managment (Check the Github Projects on Repository) and clean git branching/PR model.
  • Experience using C++ heavily where the compiler phases is fully developed in C++ language.

Project Overview

In each of the directories, is the source code of each module separted with it's unit test cases and documentation. Refer to each README inside the directories to get to know each phase well and how to use it. Both phases are run in one pass only, to avoid multiple and wasted computations.

The input to the program is three files:

  • Language Grammar Specifiction.
  • Language Tokens/Terminals Specification.
  • Source code (follows the above two files).

The output of the program is as follows:

  • A representation of the parse the of the source code.
  • Tokens recognized in the source code.
  • Any errors found in the source code.

Made with ❤️

compiler-frontend's People

Contributors

abdallahmaradny avatar abdod97 avatar mostafayousry avatar oswidan97 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

compiler-frontend's Issues

Add Running Executable For Compiler

  • The compiler currently executes using the PRSGEN executable, let's add an executable for both components and make it easier to run it from outside.
  • Provide a command line (one-line) for running/testing the compiler instead from running from the IDE.
  • Add notes for Running/Testing in README.

[LEXGEN-5] Build LexDriver Class

Component 4: LexDriver

It is the lexical analyzer program that simulates the resulting DFA machine.
that simulates the resulting DFA machine.

[LEXGEN-1] Build Language Parser

The Language Parser Class

  • In other words, it is a file reader, that parses lexems specification.
  • Takes an input file with format specified in requirements document.
  • Emits an output with (Table of Inputs - List of Regular Expressions).

Token

It is a pair consisting of a token name (class) and an optional attribute value. The token names are the input symbols that the parser processes.

Pattern

It is a description of the form that the lexemes of a token may take.

Lexeme

It is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.

Some notes:

  1. One token for each keyword. The pattern for a keyword is the same as the keyword itself.
  2. Tokens for the operators, either individually or in classes.
  3. One token representing all identifiers.
  4. One or more tokens representing constants, such as numbers and literal strings.
  5. Tokens for each punctuation symbol, such as left and right parentheses,comma, and semicolon.

Alphabet

It is any finite set of symbols. The set {0,1} is the binary alphabet.

String

A string over an alphabet is a finite sequence of symbols drawn from that alphabet. The empty string, denoted e, is the string of length zero.

Language

It is any countable set of strings over some fixed alphabet.
In lexical analysis, the most important operations on languages are Union, Concatenation, Kleene Closure, and Positive Closure

RE (Regular Expressions)

  • A convenient notation for specifying lexeme patterns.
  • While they cannot express all possible patterns, they are very effective in specifying those types of patterns that we actually need for tokens.
  • Regular expressions notation has come into common use for describing all the languages that can be built from language operators applied to the symbols of some alphabet.
  • Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by r's sub-expressions.
  • Regular expressions often contain unnecessary pairs of parentheses. We may drop certain pairs of parentheses if we adopt the conventions that:
  • The unary operator * has highest precedence and is left associative.
  • Concatenation has second highest precedence and is left associative.
  • Union | has lowest precedence and is left associative.
  • We may replace the regular expression(a)|((b)*(c)) by a|b*c.
  • A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same regular set, we say they are equivalent and write r = s.

RD (Regular Definitions)

  • We may wish to give names to certain regular expressions and use those names in subsequent expressions, as if the names were themselves symbols.

[LEXGEN-2] Build NFA-GEN Class

Component 2: NFA-GEN

This component is required to construct a non-deterministic finite automata (NFA) for the given regular expressions, combine these NFAs together with a new starting state to produce one final NFA.

[LEXGEN-8] Write Flex Tutorial

  • Flex is a tool for generating scanners: programs which recognize lexical patterns in text.
  • As a bonus task, we would like to add a tutorial of usage of Flex to automatically generate a lexical analyzer for the given regular expressions.
  • The tutorial should include a detailed description of the required steps together with screenshots for using the tool.

Handle Testing Files Properly

  • There are multiple unit/integration testing files in the repository. Let's find a better way to organize them.
  • Add the input file auto-generated file to .gitignore if it's necessary.

[PRSGEN-6] Compute FOLLOW

Goal:

  • ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
  • Uses them to construct a predictive parsing table for the grammar.
  • The table is to be used to drive a predictive top-down parser.

This task is concerned with computing First for each non terminal.
Reference: Slides 28 through 35 from Lecture 7

[PRSGEN-5] Compute FIRST

Goal:

  • ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
  • Uses them to construct a predictive parsing table for the grammar.
  • The table is to be used to drive a predictive top-down parser.

This task is concerned with computing First for each non terminal.
Reference: Slides 28 through 35 from Lecture 7

[PRSGEN-7] Construct Parsing Table

Goal:

  • ParsingTableGenerator should compute First and Follow sets for each Non Terminal in grammar.
  • Uses them to construct a predictive parsing table for the grammar.
  • The table is to be used to drive a predictive top-down parser.

This task is concerned with constructing the parsing table from follow and first sets.
Reference: Slides 28 through 35 from Lecture 7

Also we consider the Error Recovery Panic Mode:
For each nonterminal A, mark the entries M[A,a] as synch if a is in the follow set of A. Reference: Slides 41 through 46 from Lecture 7

[LEXGEN-6] Write A README

  • Write a README for the LEXGEN Project based on the steps conducted for the implementation.
  • Make sure that README contains at least the following:
    • A description of the used data structures.
    • Explanation of all algorithms and techniques used
    • The resultant transition table for the minimalDFA.
    • The resultant stream of tokens for the example test program.
    • Any assumptions made and their justification.

[PRSGEN-2] Parse Grammar File

The parser generator expects an LL(1) grammar (CFG) as input.

CFG Input File Format

  • CFG input file is a text file.
  • Production rules are lines in the form LHS ::= RHS
  • Production rule can be expanded over many lines.
  • Terminal symbols are enclosed in single quotes.
  • \L represents Lambda symbol.
  • The symbol | is used in RHS of production rules with the meaning discussed in class.
  • Any reserved symbol needed to be used within the language, is preceded by an escape
    backslash character.

The GrammarParser Object exposes a parseFile method that takes in a file of the mentioned format and fills the data-structures to be used throughout this project.

  • List of Non-terminals.
  • List of Terminals.

Required in this task is to implement this method.

[PRSGEN-8] Build Non-Recursive Predictive Parser

The Parsing table is to be used to drive a predictive top-down parser.
If the input grammar is not LL(1), an appropriate error message should be produced.

Goal: The generated parser is required to produce some representation of the leftmost derivation for a correct input. If an error is encountered, a panic-mode error recovery routine is to be called to print an error message and to resume parsing.

Take care of Output Format mentioned in PDF:
METHOD_BODY
STATEMENT_LIST
STATEMENT_LIST STATEMENT
STATEMENT_LIST STATEMENT STATEMENT
STATEMENT STATEMENT STATEMENT DECLARATION
STATEMENT STATEMENT PRIMITIVE_TYPE IDENTIFIER;
STATEMENT STATEMENT int IDENTIFIER; STATEMENT
STATEMENT
....to be continued

Task: Implement the Flowchart of the parser we discussed and the photo can be found in Media of Team Chat since no photos can be appended here.
**Reference: ** Slides 20 through 46 from Lecture 7

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.