ifj_proj_2022's Introduction

ifj_proj_2022's People

Contributors

defancet, lenamarochkina, nicksettler, pasynkovnikita

Forkers

gmh5225

ifj_proj_2022's Issues

Fix lexical analyzer operator error

The lexical analyzer currently parses the expression below into the token table that follows it. Such code must instead produce a lexical error. More tests are needed for different operator combinations, including failing ones. Example combinations: */, +*, -*-, *+ and so on.

$a = 2 */ 3;
type value
IDENTIFIER $a
ASSIGN =
INTEGER 2
MULTIPLY *
INTEGER 3

Add binary tree tests

The binary tree needs to be tested. The following tests need to be created:

  • Insert item into tree
  • Delete item from tree
  • Find item in tree
  • Some advanced tests that work on a multilevel tree and combine tree operations (insert, delete, find)
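
The test cases listed above could look roughly like the following sketch. It uses a small stand-in binary search tree written in Python; the names Node, insert, find, delete and inorder are hypothetical and are not the project's API, so the real tests would call the project's own tree functions instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def find(root, key):
    """Return the node holding key, or None if it is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def delete(root, key):
    """Delete key and return the new root of the subtree."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the in-order successor, then remove it.
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def inorder(root):
    """Return all keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

An advanced test would build a multilevel tree, delete a node with two children, and then check both the in-order traversal and the results of find.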

Fix allowed letters in lexical analyzer in numbers parsing

The lexical analyzer currently parses the source string

5 - 3.4asd4;

as the following tokens

value type
5 INTEGER
- MINUS
3.4 FLOAT
asd4 IDENTIFIER
; SEMICOLON

The same behaviour occurs with integers, for example for the code 5 - 3asd4;

This behaviour is wrong. Parsing these lines should raise a lexical error stating that a number can contain only digits.
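
A minimal sketch of the intended check, assuming a regex-based scanner: after the longest run of digits (with an optional fractional part), the next character must not be a letter or an underscore. The names scan_number and LexicalError are hypothetical, and exponent notation is deliberately left out of this sketch.

```python
import re


class LexicalError(Exception):
    """Raised when the source cannot be split into valid tokens."""


# Digits with an optional fractional part; exponents are not handled here.
NUMBER = re.compile(r"\d+(\.\d+)?")


def scan_number(source, pos):
    """Scan a number starting at pos and return (type, text, next_pos)."""
    match = NUMBER.match(source, pos)
    if match is None:
        raise LexicalError(f"expected a digit at offset {pos}")
    end = match.end()
    # A letter or underscore glued to the number is an error,
    # not the start of a new IDENTIFIER token.
    if end < len(source) and (source[end].isalpha() or source[end] == "_"):
        raise LexicalError(
            f"number can contain only digits, got {source[end]!r} at offset {end}"
        )
    kind = "FLOAT" if match.group(1) else "INTEGER"
    return kind, match.group(0), end
```

With this check, 3.4asd4 and 3asd4 are rejected, while 3.4; still yields a FLOAT token.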

Add more useful error output

To print more useful error messages, each token needs to carry information about where it starts and ends. This information can be stored in each token produced by the lexical analyzer and passed on to the syntax analyzer. Syntax analyzer nodes can also keep it in their attributes so that errors found during analysis can be reported with a position.
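
One possible shape for such position data is sketched below, assuming 1-based line and column numbers. The Token fields and the format_error helper are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    value: str
    line: int   # 1-based line number
    start: int  # 1-based column of the first character
    end: int    # 1-based column of the last character (inclusive)


def format_error(source, token, message):
    """Build an error message that points at the token in its source line."""
    line_text = source.splitlines()[token.line - 1]
    marker = " " * (token.start - 1) + "^" * (token.end - token.start + 1)
    return f"{token.line}:{token.start}: {message}\n{line_text}\n{marker}"
```

A syntax tree node could hold a reference to its first and last token, which is enough to print the same kind of message during later analysis phases.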

Add text filtering before lexical analyzer

Source code must be filtered before lexical analysis:

  • One line comments starting with // ... must be removed
  • Multiline comments starting with /* and ending with */ must be removed

Such filtering may be implemented using regular expressions.
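
A minimal sketch of regex-based filtering follows. It makes two assumptions that go beyond the issue text: string literals are matched first so that comment markers inside them are preserved, and removed multiline comments keep their newlines so that line numbers in later error messages still match the original file. The name strip_comments is hypothetical.

```python
import re

COMMENT = re.compile(
    r'"(?:\\.|[^"\\])*"'   # string literal, kept unchanged
    r"|//[^\n]*"           # one-line comment up to the end of the line
    r"|/\*.*?\*/",         # multiline comment (non-greedy)
    re.DOTALL,
)


def strip_comments(source):
    """Remove // and /* */ comments while leaving string literals intact."""

    def replace(match):
        text = match.group(0)
        if text.startswith('"'):
            return text
        if text.startswith("/*"):
            # Keep the newlines; use one space if the comment was on one line
            # so that neighbouring tokens are not glued together.
            return "\n" * text.count("\n") or " "
        return ""

    return COMMENT.sub(replace, source)
```

An unterminated /* comment is left in place by this sketch, so the lexical analyzer would still need to report it.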

Lexical analyser close php bracket bug

The following piece of code is accepted by the compiler, although it should not be. This happens because one of the tokens produced by the lexical analyzer is ?>write with type IDENTIFIER. The source code is still parsed successfully, because PHP standards allow programmers to omit the closing PHP bracket (?>). As a result, the last statement is parsed as a function call with no closing PHP bracket after it, and the EOF after the last ; token simply finishes syntax analysis.

<?php
  $a = 1;
  write($a);
?>write(1);

This bug is caused by incorrect lexical analyzer behaviour. For the last line, the lexical analyzer must return the following separate tokens:

  1. ?> (CLOSE_PHP_BRACKET)
  2. write (IDENTIFIER)
  3. ( (LEFT_PARENTHESIS)
  4. 1 (INTEGER)
  5. ) (RIGHT_PARENTHESIS)
  6. ; (SEMICOLON)

Code generator functions addition

The project will contain code generator functions. These functions write instructions to the target program source code. All instructions are listed in chapter 10.4 of task.pdf. A file with all code generator functions needs to be created. Each function that prints an instruction should append it to the string_t data type from PR #1.

Return type mismatch check in semantic analyzer

The current implementation of semantic analysis does not check return types against the function declaration. The type of a value returned from a function must match the return type in the function declaration. This check should be added.

Add file comments according to the task

According to section 12.4 of task.pdf, each source code file must start with a comment that includes:

  • project name
  • logins and names of the students who worked on the file
  • brief file description

The beginning of every source file must contain, in a comment, the project name and the logins and names of the students who actually contributed to that file as authors.

For future files it might be useful to create a File Template in CLion: Settings → Editor → File and Code Templates → Includes tab. Set the content of C File Header to the following, changing the #set directives to your VUT login and your full name.

#set( $Login = "xmoise01" )
#set( $Fullname = "Nikita Moiseev" )
/**
 * Implementace překladače imperativního jazyka IFJ22.
 * @authors
 *   ${Login}, ${Fullname}
 *
 * @file ${FILE_NAME}
 * @brief 
 * @date ${DATE}
 */

Syntax analyzer if and while brackets fix

The current implementation parses the following code into a SYN_NODE_IF whose right branch contains SYN_NODE_ASSIGN instead of SYN_NODE_SEQUENCE. This prevents semantic analysis from checking whether the variables in the assignment expression are defined. This behaviour should be changed so that an if statement contains SYN_NODE_SEQUENCE in the right branch even without curly brackets.

if ($a == 1) $b = 0;

Fix recursive function call in semantic analysis

Semantic analysis does not support recursive function calls. For example, the following code throws a SEMANTIC UNDEF VAR ERROR saying that the factorial function is used before its declaration.

function factorial(int $n) : int {
  if ($n < 2) {
    $result = 1;
  } else {
    $decremented_n = $n - 1;
    $temp_result = factorial($decremented_n);
    $result = $n * $temp_result;
  }
  return $result;
}

The same error is raised if two functions call each other recursively, as in the following code.

function abc(int $a) {
  return def($a - 1);
}

function def(int $a) {
  if ($a != 0) abc($a);
}
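
One common way to support both cases is to analyse in two passes: first collect every declared function name, then check the calls inside the bodies. The sketch below assumes a simplified program model in which each function is a pair of its name and the list of functions it calls; check_calls and SemanticError are hypothetical names.

```python
class SemanticError(Exception):
    """Raised when a semantic rule is violated."""


def check_calls(functions):
    """functions: list of (name, [called function names]) pairs."""
    # Pass 1: register every declaration before looking at any body.
    declared = {name for name, _ in functions}
    # Pass 2: a call is valid if the callee is declared anywhere in the program.
    for name, calls in functions:
        for callee in calls:
            if callee not in declared:
                raise SemanticError(f"{name} calls undefined function {callee}")
    return declared
```

Because all names are known before any body is checked, factorial calling itself and abc/def calling each other both pass, while a call to a function that is never declared is still rejected.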

Optimiser logic addition. Simplification of mathematical expressions

The current compiler implementation does not perform any code optimisations. Mathematical expressions should be optimised where possible. The optimiser must be able to simplify expressions that contain only constant values, as in the examples below. A float can be replaced with an integer where the float type is not needed, that is, where the decimal part of the number is 0.

$a = 1 + 2;
# Simplify to $a = 3;

$b = 3.0 * 2;
# Simplify to $b = 6;

$c = 2.5 * 4;
# Simplify to $c = 10;

$d = 6.0 / 3.0;
# Simplify to $d = 2;

$e = 1.0 * 2 + 4.5 * 4 - (2 - 2.0);
# Simplify to $e = 20;
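
The simplification can be sketched as a bottom-up fold over the expression tree. The sketch assumes a toy AST in which an operation is a tuple (operator, left, right), numeric literals are Python numbers and variables are strings; the fold function is a hypothetical name. Whether a float result may be turned into an integer depends on the typing rules of the target language, so treat that step as an assumption taken from the examples above.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def fold(node):
    """Fold constant subexpressions of a tuple-based expression tree."""
    if not isinstance(node, tuple):
        return node  # literal or variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        if op == "/" and right == 0:
            return (op, left, right)  # leave division by zero untouched
        result = OPS[op](left, right)
        # Drop the float type when the decimal part is 0.
        if isinstance(result, float) and result.is_integer():
            return int(result)
        return result
    return (op, left, right)
```

Subtrees that contain a variable are kept, but their constant parts are still folded, so $a + 2 * 3 becomes $a + 6.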

Grammar rules design

Before the syntax analyzer can be created, the grammar rules need to be written down in a text file.

Add function declaration parsing to syntax analyzer

The syntax analyzer must be able to parse function declarations.

Examples of code that should be parsed

Function declaration without arguments

function f() {
  # statements
}

Function declaration with non-typed arguments and no return type

function sum($a, $b) {
  return $a + $b;
}

Function declaration with non-typed arguments and return type

function sum($a, $b): int {
  return $a + $b;
}

Function declaration with typed arguments and no return type

function sum(int $a, int $b) {
  return $a + $b;
}

Function declaration with typed arguments and return type

function sum(int $a, int $b): int {
  return $a + $b;
}

Optimiser logic addition. Deletion of unused variables

Deletion of unused variables needs to be added to the code optimiser. A variable can be deleted in the following cases:

  • It is not used anywhere in the code
  • It is used only in assignment expressions whose left side is the variable that is going to be deleted
// First case example
$a = 1;         # this line can be deleted
write("Hello world");

// Second case example
$i = 0;
$b = 0;         # this line can be deleted
while ($i < 10) {
  $b = $b + 1;  # this line can be deleted
  $i = $i + 1;
}
write($i);

Lexical analyzer tests addition

The lexical analyzer needs better tests for error handling. The following errors should be tested:

  • strict_types error. Must be thrown when declare contains something other than strict_types, or when its value is not 0 or 1
  • Identifier declaration error. Must be thrown when an identifier name starts with something which is not a letter ([a-zA-Z]) or an underscore (_)
  • Float parsing error. Must be thrown when a float contains a character which is not a digit (\d). Must be implemented in #21
  • Integer parsing error. Must be thrown when an integer contains a character which is not a digit (\d). Must be implemented in #21

Optimiser logic addition. Unreachable code elimination

The code optimiser must be able to remove unreachable code. Code is unreachable in the following cases:

  • Statements after a return statement in a function / statements list
  • The body of an always-false condition in an if..else construction / the else body of an always-true condition in an if..else construction
  • The body of a loop whose condition is always false

Examples of unreachable code are presented below.
function sum($a, $b) {
  return $a + $b;
  write($a, $b); # this code is unreachable and can be removed
}

$s = 1;
# The body of the condition below is unreachable - the following 3 lines of code can be removed.
if ($s == 2) {
  write("Hello");
}

$s = 2;
# Else body in the following if..else construction is unreachable and can be removed
if ($s == 2) {
  write("Hello");
} else {
  write("Bye");
}

$s = 10;
# The following while loop body is unreachable and can be removed
while ($s < 10)
  write($s);
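
The three cases can be sketched as a single pass over a statement list. The sketch assumes a toy model in which statements are tuples such as ("return", expr), ("if", cond, then_body, else_body) and ("while", cond, body), and in which conditions that are known at compile time have already been reduced to True or False by an earlier step. The name prune is hypothetical.

```python
def prune(statements):
    """Return a copy of the statement list without unreachable code."""
    result = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "if":
            _, cond, then_body, else_body = stmt
            if cond is True:
                result.extend(prune(then_body))   # else body is unreachable
            elif cond is False:
                result.extend(prune(else_body))   # then body is unreachable
            else:
                result.append(("if", cond, prune(then_body), prune(else_body)))
        elif kind == "while":
            _, cond, body = stmt
            if cond is not False:                 # always-false loop is dropped
                result.append(("while", cond, prune(body)))
        else:
            result.append(stmt)
        # Nothing after a return on this level can ever run.
        if result and result[-1][0] == "return":
            break
    return result
```

Conditions such as $s == 2 in the examples above only become True or False after the value of $s has been propagated, so this pass would have to run after constant propagation.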

Optimiser logic addition. Constants elimination

The code optimiser must be able to remove unnecessary variable declarations. A variable declaration is unnecessary when the variable only ever stores constant values. Any use of such a variable can be replaced with its value, as in the example below.

$a = 1;
$a = $a + 2;  # Can be changed to $a = 1 + 2; First line can be deleted
write($a);    # Can be changed to write(3); Second line can be deleted

Write production bash script

The source code contains a lot of redundant files and code that must not be included in the final archive with the completed task. Everything that needs to be done before creating the archive is described on a page in the repository wiki. A bash script should be used to simplify the execution of these steps.

It might also be reasonable to run this bash script in a CI pipeline. The pipeline should be triggered when a milestone is closed or when a tag is created (for creating unplanned versions). The pipeline must create a release that includes the archive with the completed task as a release attachment.

Syntax analyzer return parsing

The current syntax analyzer implementation does not support parsing the return command in a function or in any other block of code. This feature should be added.

Change syntax analyzer to parse same level commands to the same level in AST

Current syntax analyzer implementation will process the following program

$a = 1;
$b = 2;

as the following tree

graph TD
  R(SYN_NODE_SEQUENCE) --L--> RL(SYN_NODE_SEQUENCE)
  R --R--> RR(SYN_NODE_ASSIGN)
  RL --L--> RLL(NULL)
  RL --R--> RLR(SYN_NODE_ASSIGN)
  RLR --L--> RLRL(SYN_NODE_INDENTIFIER)
  RLR --R--> RLRR(SYN_NODE_INTEGER)
  RR --L--> RRL(SYN_NODE_INDENTIFIER)
  RR --R--> RRR(SYN_NODE_INTEGER)

The current implementation parses the program into a kind of binary tree.

This should be changed so that the syntax tree can store multiple nodes on a single level. The program above would then be parsed into the following tree, which reduces the nesting of the whole tree. This representation is preferable to the alternative shown after it.

graph TD
  R(PROGRAM) --1--> R1(SYN_NODE_ASSIGN)
  R --2--> R2(SYN_NODE_ASSIGN)
  R --"..."--> RN(...)
  R1 --1--> R1L(SYN_NODE_INDENTIFIER)
  R1 --2--> R1R(SYN_NODE_INTEGER)
  R2 --1--> R2L(SYN_NODE_INDENTIFIER)
  R2 --2--> R2R(SYN_NODE_INTEGER)

Alternatively, the program can be parsed into the following tree, but such an implementation will be hard to debug.

graph TD
  R(PROGRAM) --N--> R1(SYN_NODE_ASSIGN)
  R1 --L--> R1L(SYN_NODE_INDENTIFIER)
  R1 --R--> R1R(SYN_NODE_INTEGER)
  R1 --N--> R2(SYN_NODE_ASSIGN)
  R2 --L--> R2L(SYN_NODE_INDENTIFIER)
  R2 --R--> R2R(SYN_NODE_INTEGER)
  R2 --N--> R3(...)
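
The relationship between the nested SYN_NODE_SEQUENCE tree and the flat PROGRAM node can be sketched as a simple flattening step. The node kinds come from the diagrams above; the BinNode and ListNode classes and the flatten and to_program functions are hypothetical and only model the two tree shapes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BinNode:
    """Node of the current binary syntax tree."""
    kind: str
    left: Optional["BinNode"] = None
    right: Optional["BinNode"] = None
    value: Optional[str] = None


@dataclass
class ListNode:
    """Node that stores any number of children on one level."""
    kind: str
    children: list = field(default_factory=list)


def flatten(node):
    """Collect the statements of nested sequence nodes in source order."""
    if node is None:
        return []
    if node.kind == "SYN_NODE_SEQUENCE":
        return flatten(node.left) + flatten(node.right)
    return [node]


def to_program(root):
    """Wrap the flattened statements in a single PROGRAM node."""
    return ListNode("PROGRAM", flatten(root))
```

In a real implementation the parser would append statements to the PROGRAM node directly instead of converting an existing tree; the conversion only illustrates that both shapes carry the same statements in the same order.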

Add code generator bash tests

The code generator should have tests. These tests can be written in bash, because testing the generator through GoogleTest is complicated: the target code, input and output may differ between tests. It is better to test it from bash using the existing language interpreter together with prepared source codes, inputs and expected outputs. A bash script is also useful for tracking interpreter return codes.

Add semantic expressions analyzing

The syntax tree may contain expressions in different parts of the code. Semantic analysis should be able to determine the type of each expression (INT, FLOAT, STRING).

Example cases:

  • If an expression is assigned to a variable, the variable must have the type of the expression.
$a = 1 + 2.5; // typeof $a = "float"
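
A minimal sketch of such type inference follows. It assumes a toy expression model in which an operation is a tuple (operator, left, right), variables are strings starting with $, and any other string is a string literal. It also assumes the simple rule that an arithmetic result is a float as soon as one operand is a float. The names type_of and SemanticError are hypothetical.

```python
class SemanticError(Exception):
    """Raised when an expression cannot be typed."""


def type_of(node, variables):
    """Return "int", "float" or "string" for an expression node."""
    if isinstance(node, tuple):
        op, left, right = node
        left_type = type_of(left, variables)
        right_type = type_of(right, variables)
        if "string" in (left_type, right_type):
            raise SemanticError(f"operator {op} does not accept strings")
        return "float" if "float" in (left_type, right_type) else "int"
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if node.startswith("$"):
        if node not in variables:
            raise SemanticError(f"undefined variable {node}")
        return variables[node]
    return "string"


# $a = 1 + 2.5; the variable takes the type of the expression.
variables = {}
variables["$a"] = type_of(("+", 1, 2.5), variables)
```

After the assignment is analysed, variables maps $a to "float", so later uses of $a are typed without re-evaluating the expression.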

Optimiser logic addition. Functions elimination

The code optimiser must be able to remove unused function declarations. A function is unused when it is not called anywhere in the code outside its own scope. For example, the fib function can be removed from the example below: it is called inside its own body, but nowhere else in the code, so it is unused.

function fib(int $x): int {
  if ($x == 1 || $x == 0) {
    return $x;
  } else {
    return fib($x - 1) + fib($x - 2);
  }
}

$a = 2;
write($a);
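
One way to find such functions is a reachability search over the call graph that starts from the calls made by top-level code. The sketch assumes the call graph is already available as a dictionary from each declared function to the functions it calls; reachable_functions is a hypothetical name.

```python
def reachable_functions(calls_by_function, top_level_calls):
    """Return the declared functions that top-level code can reach."""
    reachable = set()
    pending = list(top_level_calls)
    while pending:
        name = pending.pop()
        # Skip built-ins such as write and functions already visited.
        if name in reachable or name not in calls_by_function:
            continue
        reachable.add(name)
        pending.extend(calls_by_function[name])
    return reachable


# The example above: fib only calls itself, top-level code only calls write.
calls = {"fib": ["fib"]}
unused = set(calls) - reachable_functions(calls, ["write"])
```

Here unused contains fib. Starting the search from top-level code also handles groups of functions that only call each other, which a simple "is it called anywhere" check would keep.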

Creation of lexical analyzer FSM diagram

According to chapter 12.3 of the task, the project documentation must contain an FSM diagram of the lexical analyzer. The diagram can be created using any diagramming tool and should be exported in .png format. It is also a good idea to save the diagram in the native format of the editor it was made in. These files should be stored in the docs folder of the code repository.

Add type casting to semantic analysis

Expressions, variables and function calls always have a type or return type. Some kinds of AST leaves might need type casting to handle values of a type other than the expected one.

Example cases:

  • Expressions in function call arguments must match the parameter types in the function definition. They can also be type cast where possible.
function test(string $a) {
  // ...
}

$a = 1; // typeof $a = "int"

test($a); // typeof $a = "int" => int can be type casted to string => $a = "1"
  • Relational expressions in loop and if conditions must be integers (1/0). Other types can also be type cast to int where possible.
$a = "Hello"; // typeof $a = "string"

if ($a) // typeof $a = "string" => string can be type cast to int (bool): 1 if $a is not NULL and not empty, 0 otherwise => if-condition: $a != NULL && $a != ""
  // ...

$a = "";

while ($a) // typeof $a = "string" => string can be type cast to int (bool) => while-condition: $a != NULL && $a != ""
  // ...

Move semantic defined check to separate method

The check for defined/undefined variables using the symbol table is currently implemented in syntax tree creation, and tokens are inserted into the symtable during lexical analysis. This logic does not belong in those methods. It should be refactored and moved to a separate file with all semantic analysis functions.

Add loop parsing to syntax analyzer

The syntax analyzer must be able to parse while loops.

Examples of code that should be parsed

Simple while loop

while ($a != 0) 
  $a = $a - 1;

# OR

while ($a != 0) {
  $a = $a - 1;
}

While loop

while ($a != 0) {
  $a = $a - 1;
  $b = $b + 1;
}

Add lexical parsing of <?php and ?>

The lexical analyser cannot recognise <?php and ?> in the input string. These are the opening and closing brackets of PHP code, and the lexical analyser must turn them into tokens so that the subsequent analysers can process them.

Add full conditions parsing to syntax analyzer

The syntax analyzer must be able to parse multiple if..else conditions.

Examples of code that should be parsed

Simple if conditions

if ($a == 1) $b = 2;

# OR

if ($a == 1) {
  $b = 2;
}

If..else conditions

if ($a == 1) $b = 2; else $b = 3;

# OR

if ($a == 1) {
 $b = 2;
} else {
 $b = 3;
}

Multiple if..else conditions

if ($a == 1) $b = 2; else if ($a == 2) $b = 3; else $b = 4;

# OR

if ($a == 1) {
 $b = 2;
} else if ($a == 2) {
 $b = 3;
} else {
 $b = 4;
}

Add function parsing to syntax analyzer

The syntax analyzer must be able to parse function calls with any parameters.

Examples of code that should be parsed

Simple function call

test();

Function call with simple parameters

test(1);

#OR

test(1, 2);

#OR

test(1, 2, 3, ...);    // function with any number of parameters

Function call with parameters

test(1 + 3, $a, f());

Syntax analyzer typed equal implementation

The current syntax analyzer implementation supports parsing only the non-typed equality operator. It does not support the typed equality operator and throws an error when it encounters one. This bug should be fixed.

Syntax analyzer php brackets bug

The current implementation parses any code, even without PHP brackets. This must be changed to follow the grammar rules. Source code must start with the open PHP bracket (<?php/<?) and is then parsed as usual. The end of the program may contain the close PHP bracket (?>), which is optional according to PHP standards. If a close PHP bracket is present at the end of the source code, no tokens other than END_OF_FILE may follow it.
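
These rules can be sketched as a check over the sequence of token types. CLOSE_PHP_BRACKET and END_OF_FILE are the token names used in these issues; OPEN_PHP_BRACKET, check_php_brackets and ParseError are hypothetical names introduced for this sketch.

```python
class ParseError(Exception):
    """Raised when the token stream violates the grammar."""


def check_php_brackets(kinds):
    """kinds: list of token type names for the whole program, in order."""
    if not kinds or kinds[0] != "OPEN_PHP_BRACKET":
        raise ParseError("source code must start with the open PHP bracket")
    if "CLOSE_PHP_BRACKET" in kinds:
        rest = kinds[kinds.index("CLOSE_PHP_BRACKET") + 1:]
        if rest != ["END_OF_FILE"]:
            raise ParseError("only END_OF_FILE may follow the close PHP bracket")
```

In the real parser the same conditions would more naturally be expressed as grammar rules for the program start and end rather than as a separate pass.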
