pilif0 / basilisk

LLVM frontend for my pet programming language

License: MIT License

Languages: C++ 98.43%, CMake 1.53%, Shell 0.04%

cmake compiler-frontend cpp cpp17 llvm llvm-frontend programming-language

basilisk's Issues

Simple function definition variant

A feature I enjoyed in my work with Kotlin was defining functions with an expression body (see Kotlin reference). These make the code cleaner and easier to read, and shouldn't be too hard to implement.

In essence, a function definition:

f(x, y) = x + y;

would be equivalent to:

f(x, y) {
    return x + y;
}
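One way to support this is to desugar the expression form into the block form while parsing, so that later stages only ever see the block form. The following is a minimal sketch under that assumption. The types `Expression`, `ReturnStatement` and `FunctionDefinition` and the function `desugarExpressionBody` are hypothetical stand-ins, not the actual basilisk AST.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST; the real basilisk node types may differ.
struct Expression {
    std::string text;  // stand-in for a real expression tree
};

struct ReturnStatement {
    std::unique_ptr<Expression> value;
};

struct FunctionDefinition {
    std::string name;
    std::vector<std::string> parameters;
    std::vector<ReturnStatement> body;  // only return statements, for brevity
};

// Turns `name(parameters) = expr;` into `name(parameters) { return expr; }`.
FunctionDefinition desugarExpressionBody(std::string name,
                                         std::vector<std::string> parameters,
                                         std::unique_ptr<Expression> expr) {
    assert(expr != nullptr && "an expression body needs an expression");
    FunctionDefinition definition;
    definition.name = std::move(name);
    definition.parameters = std::move(parameters);
    // The whole body is a single return of the given expression.
    definition.body.push_back(ReturnStatement{std::move(expr)});
    return definition;
}
```

With this approach the code generator would need no changes, because both spellings produce the same node.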

Rich tokens

More information (e.g. line number and character) should be included in tokens. This information should then be used in the parser to improve error reporting.
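As a rough illustration, a token could carry the position of its first character, and error messages could be built from it. The sketch below is not the basilisk lexer: it only splits on whitespace, counts a tab as one column, and the names `SourcePosition`, `Token`, `tokenizeWithPositions` and `formatUnexpectedToken` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types; basilisk's real token class may look different.
struct SourcePosition {
    std::size_t line;    // 1-based line number
    std::size_t column;  // 1-based column of the token's first character
};

struct Token {
    std::string text;
    SourcePosition start;
};

bool isWhitespace(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Splits the input on whitespace and records where each token starts.
std::vector<Token> tokenizeWithPositions(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t line = 1;
    std::size_t column = 1;
    std::size_t i = 0;
    while (i < input.size()) {
        const char c = input[i];
        if (c == '\n') {
            // A newline starts the next line at column 1.
            ++line;
            column = 1;
            ++i;
        } else if (isWhitespace(c)) {
            ++column;
            ++i;
        } else {
            Token token{std::string(), SourcePosition{line, column}};
            while (i < input.size() && !isWhitespace(input[i])) {
                token.text += input[i];
                ++i;
                ++column;
            }
            tokens.push_back(token);
        }
    }
    return tokens;
}

// Builds a parser error message in a "line:column: message" layout.
std::string formatUnexpectedToken(const Token& token) {
    assert(token.start.line >= 1 && token.start.column >= 1);
    return std::to_string(token.start.line) + ":" +
           std::to_string(token.start.column) +
           ": unexpected token '" + token.text + "'";
}
```

For the input `"x = 1;\n  return y"`, the token `return` starts at line 2, column 3, so the message would read `2:3: unexpected token 'return'`.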

Can order of definitions in a program not matter?

As discussed in the LLVM IR generation pull request (#4), there is a question of whether order of definitions in a program should matter. At the time of that pull request, making definition order not matter would produce ambiguities and would require handling of special cases. Therefore the decision was taken to make the order matter.

This issue is created to continuously examine when the change to definition order not mattering could be made, and what it would entail.

Data types

The language needs to have more data types than just doubles. I propose at least the following types based on the types in LLVM:

Void
Boolean
Integers of widths 8, 16, 32 and 64 bits (byte, short, int, long)
Floating-point values of widths 32 and 64 bits (float, double)
Array
Structure

It might also be good to implement further types while this is being done, such as vectors, and prepare for later implementation of pointers.

With these new types, it seems appropriate to expand the set of valid literals:

Decimal, binary, hexadecimal integer literals - e.g. 7, 0b111, 0x7
Floating point literals - e.g. 3.14
Scientific notation - e.g. 3.6e2, 2e-4
Boolean literals - true and false

Furthermore, underscores should be allowed and discarded in literals, allowing more readable formatting, for example 0xffff_f0f0_abcd_1234 instead of 0xfffff0f0abcd1234.
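The integer part of this proposal could be handled along the following lines. This is only a sketch under several assumptions: the functions `digitValue` and `parseIntegerLiteral` are hypothetical, underscores are skipped anywhere after the optional prefix, only lowercase `0x` and `0b` prefixes are recognised, hexadecimal digits may be in either case, and values that do not fit in 64 bits are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Value of a digit character in the given base, or -1 if it is not valid there.
int digitValue(char c, unsigned base) {
    assert(base == 2 || base == 10 || base == 16);
    int value = -1;
    if (c >= '0' && c <= '9') value = c - '0';
    else if (c >= 'a' && c <= 'f') value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') value = c - 'A' + 10;
    return (value >= 0 && static_cast<unsigned>(value) < base) ? value : -1;
}

// Parses decimal, 0b-prefixed binary and 0x-prefixed hexadecimal literals,
// discarding underscores. Returns std::nullopt for malformed or too-large input.
std::optional<std::uint64_t> parseIntegerLiteral(const std::string& literal) {
    unsigned base = 10;
    std::size_t i = 0;
    if (literal.size() >= 2 && literal[0] == '0') {
        if (literal[1] == 'x') { base = 16; i = 2; }
        else if (literal[1] == 'b') { base = 2; i = 2; }
    }
    const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t value = 0;
    bool sawDigit = false;
    for (; i < literal.size(); ++i) {
        if (literal[i] == '_') continue;  // underscores carry no meaning
        const int digit = digitValue(literal[i], base);
        if (digit < 0) return std::nullopt;
        const std::uint64_t d = static_cast<std::uint64_t>(digit);
        if (value > (max - d) / base) return std::nullopt;  // would overflow
        value = value * base + d;
        sawDigit = true;
    }
    if (!sawDigit) return std::nullopt;  // e.g. "", "0x" or "_"
    return value;
}
```

Under these assumptions `0xffff_f0f0_abcd_1234` and `0xfffff0f0abcd1234` parse to the same value, and `7`, `0b111` and `0x7` all parse to 7.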

Unsigned versions of the types should also be considered. LLVM doesn't distinguish between signed and unsigned integer types; the distinction is made when selecting which instruction to use.

Nested blocks

If we regard a block of statements as a statement in itself, they can naturally be nested. This would allow better management of scope as well as prepare for implementation of conditional statements and loops.

Tasks:

Adjust grammar to consider a block of statements as a statement
Add AST node type Block containing a set of statements to reflect this
Generate code from this node type by pushing a scope on the named values stack, executing the statements in sequence and popping the scope
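The scope handling in the last task could look roughly like the sketch below. It is an illustration only: the classes `NamedValues` and `ScopeGuard` are hypothetical, and a `double` stands in for whatever the code generator actually stores per name (most likely an LLVM value).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A stack of scopes; the innermost scope is at the back.
class NamedValues {
public:
    NamedValues() { scopes_.emplace_back(); }  // outermost scope

    void pushScope() { scopes_.emplace_back(); }

    void popScope() {
        assert(scopes_.size() > 1 && "the outermost scope is never popped");
        scopes_.pop_back();
    }

    // Defines (or shadows) a name in the innermost scope.
    void define(const std::string& name, double value) {
        scopes_.back()[name] = value;
    }

    // Searches from the innermost scope outwards.
    std::optional<double> lookup(const std::string& name) const {
        for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope) {
            const auto found = scope->find(name);
            if (found != scope->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::map<std::string, double>> scopes_;
};

// Pushes a scope on construction and pops it on destruction, so generating
// code for a Block cannot forget the pop, even on an early return.
class ScopeGuard {
public:
    explicit ScopeGuard(NamedValues& values) : values_(values) { values_.pushScope(); }
    ~ScopeGuard() { values_.popScope(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    NamedValues& values_;
};
```

Code generation for a Block node would then create a `ScopeGuard`, generate each contained statement in sequence, and let the guard pop the scope when it goes out of scope.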

Error token recognition

Currently, error tokens are picked up by the parser as unexpected tokens (as error tokens are never expected). It would be better if a unified way of intercepting error tokens were added. They could then be reported better, possibly with recommendations based on the context. One of the main requirements for the solution is that it interferes as little as possible with the actual parsing, in order to keep the parser as easy to extend as possible.

Remove main function wrapper

Due to everything being a double, there needs to be a wrapper around the main function that converts the double it returns into the integer that the system expects. Once more data types are added this wrapper can be removed.

Possibility of dropping semicolon requirement

Currently all statements are required to end in a semicolon. While thinking about designs for other features, I started wondering whether this requirement is really necessary or could be dropped.

The semicolon currently serves to separate statements. One of my main principles for basilisk is that whitespace should not matter beyond separating tokens, so I can't replace the semicolon with a newline and force each statement onto a separate line. That would give meaning to whitespace and make it less suitable for formatting code without affecting behaviour.

This issue is focused on simply dropping the semicolon and seeing what ambiguities are produced and if they can be reconciled. If it seems that all possible ambiguities can be easily solved, I would proceed with removing the requirement while keeping the option to include a semicolon there in case it is preferable for readability.

Global variable multiple initializers

Multiple definitions of the same global variable currently produce multiple initializers, with the variable taking on the value of the last initializer for the full execution. This behaviour is unintuitive and should be removed. A good time to fix this would be when adding more data types and differentiating variable definitions from assignments.

Identifiers starting with underscore

Currently all identifiers have to start with a letter. I think it would be good to expand this to allow identifiers starting with an underscore, which is often used in other languages.
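In the lexer this would likely come down to widening the check on the first character. The sketch below assumes that the remaining characters of an identifier may be ASCII letters, digits or underscores, which the issue does not state. The helpers `isIdentifierStart`, `isIdentifierPart` and `readIdentifier` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Proposed rule: a letter or an underscore may start an identifier.
bool isIdentifierStart(char c) {
    return isAsciiLetter(c) || c == '_';
}

// Assumed rule for the remaining characters: letters, digits and underscores.
bool isIdentifierPart(char c) {
    return isIdentifierStart(c) || (c >= '0' && c <= '9');
}

// Returns the identifier starting at `pos`, or an empty string if none starts there.
std::string readIdentifier(const std::string& input, std::size_t pos) {
    assert(pos <= input.size());
    if (pos == input.size() || !isIdentifierStart(input[pos])) return std::string();
    std::size_t end = pos + 1;
    while (end < input.size() && isIdentifierPart(input[end])) ++end;
    return input.substr(pos, end - pos);
}
```

With this change `_tmp1` would be read as one identifier, while `1abc` would still not start one.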
