Lexical Analyzer (Scanner)

This assignment is to write a scanner for the μC language (NOT the C language) using lex. This document gives the lexical definition of the language; the syntactic definition and code generation will follow in subsequent assignments.

1. Lexical Definitions

Tokens are divided into two classes:

  • tokens that will be passed to the parser, and
  • tokens that will be discarded by the scanner (i.e., recognized but not passed to the parser).

1.1 Tokens that will be passed to the parser

The following tokens are recognized by the scanner and eventually passed to the parser.

1.1.1 Delimiters

1.1.2 Arithmetic, Relational, and Logical Operators

1.1.3 Keywords

1.1.4 Identifiers

An identifier is a string of letters ( a ~ z , A ~ Z ), underscores ( _ ), and digits ( 0 ~ 9 ) that begins with a letter or an underscore. Identifiers are case-sensitive; for example, ident , Ident , and IDENT are three distinct identifiers. Note that keywords are not identifiers.
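The identifier rule can be sketched as a regular expression. The version below is a minimal Python illustration rather than the lex specification itself; the pattern carries over to lex almost unchanged. The names IDENT_RE, KEYWORDS, and is_identifier are made up for this example, and the keyword set is a hypothetical placeholder because the actual keyword list belongs to section 1.1.3.

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, or underscores.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical keyword subset, for illustration only. The real list is
# defined in section 1.1.3 of the assignment.
KEYWORDS = {"int", "float", "if", "else", "while", "return"}


def is_identifier(text):
    """Return True if text is a well-formed identifier and not a keyword."""
    return IDENT_RE.fullmatch(text) is not None and text not in KEYWORDS
```

Because matching is case-sensitive, a spelling such as While would still count as an identifier under this assumed keyword set, while while would not.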

1.1.5 Integer Literals and Floating-Point Literals

  • Integer literals: a sequence of one or more digits, such as 1 , 23 , and 666 .
  • Floating-point literals: numbers that contain a decimal point, such as 0.2 and 3.141 .
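As a rough sketch, the two numeric classes can be told apart with two patterns. This Python example assumes that a floating-point literal has digits on both sides of the decimal point, which the text above does not state explicitly. The token names INT_LIT and FLOAT_LIT and the function classify_number are hypothetical.

```python
import re

# One or more digits.
INT_RE = re.compile(r"[0-9]+")

# Assumption: at least one digit is required on each side of the point.
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+")


def classify_number(text):
    """Return a hypothetical token name for a numeric literal, or None."""
    if FLOAT_RE.fullmatch(text):
        return "FLOAT_LIT"
    if INT_RE.fullmatch(text):
        return "INT_LIT"
    return None
```

In a lex specification the same distinction usually comes for free from longest-match: on input 3.141 the floating-point pattern matches more characters than the integer pattern, so it wins.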

1.1.6 String Literals

A string literal is a sequence of zero or more ASCII characters appearing between double-quote ( " ) delimiters. A double-quote appearing within a string must be preceded by a backslash ( \ ). Examples: "abc" , "Hello world" , and "She is a \"girl\"" .
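One possible way to express the escaped-quote rule is a pattern whose body accepts either an ordinary character or a backslash followed by one character. The Python sketch below assumes that a backslash escapes whatever single character follows it; the assignment text only mentions the escaped double-quote. STRING_RE and find_strings are names invented for this example.

```python
import re

# Opening quote, then any number of:
#   - a character that is neither a double-quote nor a backslash, or
#   - a backslash followed by one character (assumed escape rule),
# then the closing quote.
STRING_RE = re.compile(r'"(?:[^"\\]|\\.)*"')


def find_strings(src):
    """Return every string literal found in src, quotes included."""
    return [m.group(0) for m in STRING_RE.finditer(src)]
```

The escaped quote is consumed as part of the body, so it never terminates the literal early.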

1.2 Tokens that will be discarded

The following tokens are recognized by the scanner but should be discarded rather than returned to the parser.

1.2.1 Whitespace

A sequence of blanks (spaces), tabs, and newlines.

1.2.2 Comments

Comments can be added in several ways:

  • C-style comments are text surrounded by /* and */ delimiters, which may span more than one line.
  • C++-style comments are text following a // delimiter, running up to the end of the line.

Whichever comment style is encountered first remains in effect until the appropriate comment close is encountered. For example,

// this is a comment // line */ /* with /* delimiters */ before the end

and

/* this is a comment // line with some and C delimiters */

are both valid comments.
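The "first style wins" rule can be illustrated with a small hand-written loop. This is a Python sketch under two assumptions not stated in the text: an unterminated /* comment runs to the end of the input, and string literals are ignored (so a // inside a string would be treated as a comment here). The function name strip_comments is hypothetical. In lex, the same behaviour is commonly implemented with start conditions.

```python
def strip_comments(src):
    """Return src with C-style and C++-style comments removed."""
    out = []
    i = 0
    n = len(src)
    while i < n:
        if src.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            # Any /* or */ inside it is plain comment text.
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif src.startswith("/*", i):
            # Block comment: skip through the first */ that follows.
            # Any // or /* inside it is plain comment text (no nesting).
            end = src.find("*/", i + 2)
            # Assumption: an unterminated comment runs to end of input.
            i = n if end == -1 else end + 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Checking for the two openers at the current position, and then searching only for the matching closer, is what keeps the other style's delimiters inert inside a comment.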

1.2.3 Other characters

Any other undefined characters or strings should be discarded by your scanner while it scans the input.

2. What should we do in this Scanner

2.1 Assignment Requirements

We have prepared 11 μC programs to test the functionality of your scanner. Each one can be checked with the judge script, for example:

python3 judge/judge.py input/in01_arithmetic.c output/in01.out

2.2 Output

2.3 Debug

Build with make

$ make clean && make

Execute

$ ./myscanner < input/in01_arithmetic.c > output/in01.out

Check diff

$ diff -y tmp.out answer/in01_arithmetic.out
$ od -c answer/in05_comment.out
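When diff reports a difference that is hard to see, such as a trailing space or a missing final newline, it can help to print the first mismatching line with its invisible characters made explicit, much as od -c does. The following Python helper is a minimal sketch; first_mismatch is a hypothetical name and is not part of the provided judge.

```python
def first_mismatch(actual, expected):
    """Return (line_no, actual_line, expected_line) for the first
    differing line, or None if the two texts are identical.

    A side that has run out of lines is reported as None.
    """
    a_lines = actual.splitlines(keepends=True)
    e_lines = expected.splitlines(keepends=True)
    for idx in range(max(len(a_lines), len(e_lines))):
        a = a_lines[idx] if idx < len(a_lines) else None
        e = e_lines[idx] if idx < len(e_lines) else None
        if a != e:
            return (idx + 1, a, e)
    return None
```

Printing the result with repr() shows tabs, trailing blanks, and newline characters literally, which makes whitespace-only differences easy to spot.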

3. Environmental Setup

  • Ubuntu 20.04 LTS
  • Install dependencies: $ sudo apt install gcc flex bison python3 git
