Git Product home page Git Product logo

compiler-construction's Introduction

Harpinder Jot Singh

I am Harpinder, working at DevRev almost since the start of the company. I have experience in building products 0->1, am a generalist by nature, and have worked on multiple projects here at DevRev including the Snap-in platform, and ML/LLMOps solution for our AI needs.

I did my B.tech in Computer Science from BITS Pilani.

Recently, I have started diving into LLMs era, and tinker around with them in my free time.

Let's talk!

Harpinder | Gmail Harpinder | LinkedIn

compiler-construction's People

Contributors

adhyay2000 avatar lakshmiteja17 avatar singhcoder avatar vismit2000 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

compiler-construction's Issues

Testing required

Please write test cases and verify correctness for following

  1. Symbol table scoping works correctly
    Selecting 7 (and 8 also) from menu while running program generates symbol table
    Interpret output as :
  • When a new scope is opened
    • It prints New Scope opened at
      • (non-terminal is the one involved in AST which opens up a scope)
  • Whenever it encounters a declaration of a variable,
    • Declaration statement
    • For loop opening {right now it is assumed that whenever a FORLOOP is encountered, the index variable involved is declared on the fly so it's added to symbol table as type int}
    • Input/output parameters of a function definition
  • It prints "Inserting in hashtable '' "
  • When the id is accessed and it exists in symbol table, it prints "Received entry for '' , type_name = ARRAY|BOOLEAN|REAL|INTEGER "
    • If array, it also prints range and primitive type
      • if the range contains an identifier, 0x3f3f3f3f is printed at that place
  • When a function definition is encountered
    • Inserting in hashtable <fun_id>
    • Received entry for , type_name = MODULE
    • is_declared : 1(true)
    • Input_types_list : [<num_of_inp_args>] - arg1 type | arg2 type | ... | argn type
    • Output_types_list : same as above
  1. Type checker module implementation

Whenever it encounters an assignment statement, it prints :

  • "id" = .. type checking

  • LHS_TYPE : REAL | BOOLEAN | ARRAY | INTEGER | TYPE_ERROR

  • RHS_TYPE : same as above

  • If types match

    • =======Type matches======== is printed
  • Else

    • SEMANTIC ERROR: lhs and rhs types don't match
  • If an identifier is not defined,

    • SEMANTIC ERROR: ID not defined is printed
  • If array is not dynamic,

    • and index is out of bounds, it prints,
      • SEMANTIC ERROR: INDEX OUT OF BOUNDS
    • if index is not int type
      • SEMANTIC ERROR: INVALID ARRAY ACCESS
  • Description of type checking rules here

Type Extractor and Checker:
Type of an identifier is extracted from the declaration statement that declares the identifier. The data types supported in the language you are implementing are: integer, real, boolean and array. The type of an element of an array is the type of the data the array refers to. For instance, consider the statement declare a: array[1..15] of integer; the type of an element a[5], say is integer. The type checker verifies the type of an expression appearing at the right hand side of the assignment statement such as value := (a+bโ€c ) and checks if it matches with that of the identifier on the left hand side. An arithmetic operator can have two operands of the similar type, where types can be integer and real data types. Example : let a,b, c and d be declared as integer then

An expression in a statement a := (b2โ€4c)+5*d; its RHS has a type integer and since the identifier a is also of the same type, the type checker approves the expression assignment

An expression in a statement a:= (b2 +c)/d <= c10; has a boolean valued expression which is assigned to the real valued identifier, hence the expression value assignment is wrong.

The Static type checking rules are:

  1. The type of an identifier is the type appearing while declaring the variable.

  2. The type of NUM is integer.

  3. The type of RNUM is real.

  4. The type of true or false value is boolean.

  5. The type of an array variable A of type array [12..20] of real (say) is defined as an expression <real, 12, 20>. The type of an array element A[13] (say) is real if 13 is within the bounds [12, 20]

  6. The type of a simple expression (say E) of the form expression(say E1) Expression(say E2)

    is integer, if both expressions are of type integer and the operator is arithmetic operator.
    is real, if both the expressions are of type real and the operator is arithmetic operator.
    is boolean, if both expressions are of type integer and the operator is a relational operator.
    is boolean, if both expressions are of type real and the operator is relational.
    is boolean, if both expressions are of type boolean and the operator is logical.

The type of the expression is ERROR, if the above rules do not derive the type of E appropriately.

Type checking rules for array construct are as follows:

  1. The operations +, -, *, / and all relational and logic operators cannot be applied on array variables of type <type, 12, 20> (say)

  2. The assignment operator applied to two array variables of the same type is valid. For example, if A and B are the array variables of type array[12..20] of real, then A:= B; is a valid statement. This applies to dynamic arrays of type array[a..b] of real as well.

  3. Consider array elements with index represented by integer identifier say A[k]. Here type checking of variable k is done at compile time. If the type of k is integer, then it is valid else it is reported as an error. Also, the type checking of A[13], for type of index (NUM), is done at compile time.

  4. The bound checking of A[13] where A is a static array is done at compile time. If A is a dynamic array, then bound checking of A[13] is done at run time [see below]

  5. The type of an identifier or an expression is computed by traversing the abstract syntax tree.

Things left in semantic analyzer module

Currently, Single AST pass has been implemented for semantic rules checking

  • Second pass is required to
    - check function calls validity where declaration existed but definition could not be verified due to inexistence

Storing of lexeme in heap

We are storing lexeme in heap here, which could create a problem when function and other invocation are done. String local variable has value in the heap which is not "reset" until the program terminates. Also when we change it to storing lexeme within token then we have to pass the information of lexeme appropriately.

Minor Implementation problems/bugs

  1. Array bounds, if of type a..b were kept as a constant OBTAIN_DYNAMICALLY while installing in symbol table to tell that it will be obtained dynamically.
    But it isn't working as expected, and gives some other integral values check it out
    Maybe keep a label is_dynamic to ensure it?

  2. While making hash_table to static instead of dynamic added a #define NUM_TABLE_ENTRIES but it's not needed due to presence of HASH_SIZE already, so remove it

  3. Currently each error printing again and again types code for printing colors, make a separate module print_error with following declaration

void print_error( char *error_type /*LEXICAL / SYNTAX / SEMANTIC */, char *error_msg);
  1. Right now, to get second/third child, it's hardcoded to as leftmost_child->sibling->sibling->...
    Define a function giving nth child

  2. Store number of children of a tree_node also in a node. Will help in generating codes later

  3. There is an unnecessary error_present variable present in parser.h/c which is not used anywhere later. Remove it

Some Final changes for creating AST

  • Added a non-terminal NT_ARRAY and a rule DATATYPE->NT_ARRAY to be able to distinguish that it's an array type
  • Removed OTHERCASE stmt and made CASESTMT rule right recursive so that we get a linked list of form value -> stmts -> value -> stmts.. rather than again a tree like structure
  • Changes MODULEREUSESTMT name to FUNCTIONCALLSTMT for sake of clearity

Offset and width calculations

  • Try to incorporate changes to the code base to implement offset and width calculations and store them in the symbol table
  • Offset calculations is required everytime an id is installed in symbol table
  • For width you can directly go inside function inserting an id in symbol table and check the token and store width
  • Give some #defines defining widths of types in bytes so that we can later change them if needed accordingly

You can find the changes done to incorporate symbol table in these two commits 9aef702 and 59f8140

Hashtable recursive search

Checkout functionality for searching for a lexeme in hash table(key_present_in_table),

  • Right now it searches only in current table and returns false if not found
  • Implement a recursive version so as to search in parents (outer scopes) if not found in current scope and return false only if not found anywhere.

Semantic Rules implementation

Implement these semantic rules for ERPLAG

  1. An identifier cannot be declared multiple times in the same scope.
    While installing in symbol table, make a check if already present, flag an error (make sure that while checking you only search current symbol table and not parent too)

  2. An identifier must be declared before its use.
    Whenever there is a usage and symbol table says no entry, flag an error

  3. The types and the number of parameters returned by a function must be the same as that of the parameters used in invoking the function.
    Check semantics of FUNCTIONCALLSTMT

  4. The parameters being returned by a function must be assigned a value. If a parameter does not get a value assigned within the function definition, it should be reported as an error.
    There must be a assignmentstmt for each output parameter type within function

  5. The function that does not return any value, must be invoked appropriately.
    Same as 3

  6. Function input parameters passed while invoking it should be of the same type as those used in the function definition.
    Semantics of FUNCTIONCALLSTMT

  7. A switch statement with an integer typed identifier associated with it, can have case statement with case keyword followed by an integer only and the case statements must be followed by a default statement.
    Semantics of SWITCH stmt

  8. A switch statement with an identifier of type real is not valid and an error should be reported.
    same as 7

  9. A switch statement with a boolean type identifier can have the case statements with labels true and false only. The switch statement then should not have a default statement.
    same as 7

  10. Function overloading is not allowed.
    Same as identifier multiple installations

  11. A function declaration for a function being used (say F1) by another (say F2) must precede the definition of the function using it(i.e. F2), only if the function definition of F1 does not precede the definition of F2.
    might require 2 passes of AST

  12. If the function definition of F1 precedes function definition of F2(the one which uses F1), then the function declaration of F1 is redundant and is not valid.

  13. A for statement must not redefine the variable that participates in the iterating over the range.

  14. The function cannot be invoked recursively.

  15. An identifier used beyond its scope must be viewed as undefined

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.