Git Product home page Git Product logo

sage's Introduction

🌿🌱The Sage Programming Language🪴🍃

Sage advice for your coding conundrums!

Here's a link to the online compiler playground!

Table of Contents

Community

Join the Discord server to chat about Sage! Let us know if you have any thoughts or comments about the language!

What is Sage?

Sage is a programming language that tries to be maximally portable, expressive, and intuitive. It borrows some aspects of Rust, C, and Python. It currently has an x86 compiler backend, a C source backend, and a VM interpreter backend which can run on the web.

Sage is licensed under the MIT License, and has been under development since April 2022.

Why Sage?

Sage is very portable -- run it on your thermostat! Here's the complete list of core virtual machine instructions and their C equivalents:

Instruction C Equivalent
while while (reg) {
if if (reg) {
else } else {
end }
set N reg = N;
call funs[reg]();
ret return;
save *tape_ptr = reg;
res reg = *tape_ptr;
move N tape_ptr += N;
where reg = tape_ptr;
deref push(tape_ptr); tape_ptr = *tape_ptr;
refer tape_ptr = pop();
index reg = (cell*)(reg) + *tape_ptr;
add reg += *tape_ptr;
sub reg -= *tape_ptr;
mul reg *= *tape_ptr;
div reg /= *tape_ptr;
rem reg %= *tape_ptr;
nand reg = ~(reg & *tape_ptr);
gez reg = reg >= 0;

You could fit this instruction set on a T-shirt!

The compiler can target this limited "core" instruction set, with an expanded "standard" instruction set for floating point operations and foreign functions. The core instruction set is designed to be as simple as possible for anyone to implement their own backend. Try to see if you can implement it yourself for your backend of choice!

The virtual machine has some important optimization properties: Although Sage's VM is a very simple zero-address-code representation, it preserves all the information to reconstruct an LLVM-like three-address-code representation of the original higher level IR. This makes the instruction set capable of applying LLVM's optimizations while being much easier to implement. Sage's innovation is in the backend, not the frontend.

This combination of simplicity and capacity for optimization was my motivation for creating Sage. I wanted to create a virtual machine with the largest speed + expression + portability to implementation difficulty ratio, and a high level language that could compile to it. I think Sage is a good solution to this problem.

This project is based on some ideas I had while working on Harbor for a hackathon.

How useful is Sage?

Sage is a very young project, and is not ready for production. It's still possible to write very useful programs in it, though.

SageOS is an operating system with a userspace written in Sage. Its graphical shell and presentation app (both written in Sage) use the FFI to draw to the screen, receive input from the mouse and keyboard, interact with the filesystem, and schedule new processes. You can look at the shell code here.

Shell1 Shell2

The presentation app parses PPM image files from the filesystem and renders them to the screen. You can look at the presentation code here.

Presentation

Sage's foreign function interface is simple and can directly call C functions or backend-specific builtins. Check out the web-demo's foreign function interface example that calls some JavaScript code to draw graphics or alert the user!

How do I use Sage?

To start using sage, install it with cargo:

$ cargo install --git https://github.com/adam-mcdaniel/sage

Then, you can run a sage file with the sage command:

$ sage examples/frontend/interactive-calculator.sg

You can also compile a sage file to C with the --target flag:

$ sage examples/frontend/interactive-calculator.sg --target c
$ # Or `-t c` for short
$ sage examples/frontend/interactive-calculator.sg -tc
$ gcc out.c -o out
$ ./out

Check out the code for the web-demo to see how to use Sage in a web page.

What does Sage look like?

Here's an example of a polymorphic linked list in Sage using Rust-like enums! It's straightforward to implement operations like map with just a few lines.

Here's an example of Sage's structural typing: a Rectangle can be created by concatenating the fields of a Position and a Size!

Here's an example of Sage's pattern matching: it's easy to deconstruct a value using match, if let, or a simple let binding. Sage's match expressions are very powerful!

Go to the web-demo or the examples/frontend folder to see more code examples.

Feature Roadmap

  • Compiler Backends
    • x86 (semi-implemented and unoptimized)
    • RISC-V
    • ARM
    • LLVM (highly desired!)
    • C (fully-implemented but unoptimized)
    • Interpreter (fully-implemented but unoptimized)
    • Web Backend
  • Static variables and constant expressions
  • Conditional compilation
  • Polymorphic functions
  • Mutability checks
  • Rust-like enums
  • Pattern matching
  • Structural typing
  • Associated constants and methods
  • Recursive polymorphic types
  • Iterators and list/vector/array comprehensions
  • Hindley-Milner type inference
  • VSCode extension (syntax highlighting, code completion, etc.)
  • Typeclasses
  • no-std implementation of compiler
  • Modules
  • A standard library
    • Type Reflection Module
    • Collections Module
    • Networking Module
    • Filesystem Module
    • Graphics Module
    • Audio Module
    • GUI Module
    • WebAssembly Module
    • Foreign Function Interface Module (create backend with .toml file)
    • Memory Management Module
  • Better frontend parser (switch to Nom?)
  • A package manager
  • AST Macros
  • C frontend (compile C to Sage VM)
  • Self-hosting implementation

Where can I learn more?

You can read my blog post (~20 minute read) about the programming language to learn more about the implementation!

Join the Discord server to chat about Sage!

How do I contribute?

If you want to contribute, you can open an issue or a pull request. Adding backends for other architectures is a great way to contribute! We also need a VSCode syntax highlighting extension!

About the Author

I'm a 21 year old computer science graduate student at the University of Tennessee, Knoxville🍊. Rust is my favorite language, and I've written many other compilers. This is the last project I started as a teenager, and I was the only author to touch any of the code up to version v0.0.2-alpha (12/25/2023)! I'm looking for work opportunities for Summer 2024 (after I finish my Masters degree), so if you're interested in hiring me, please reach out to me at [email protected]!

sage's People

Contributors

adam-mcdaniel avatar bichanna avatar ronsor avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

sage's Issues

Parameterized Types in Lower Intermediate Representation

I want to add parameterized types to the lower intermediate representation. I expect the architecture to look something like this.

// src/lir/types.rs

enum Param {
    Type(Type),
    Const(ConstExpr),
}

// ...

enum Type {
    // ...
    
    LetParam(Vec<(String, Vec<(String, Option<Type>)>, Box<Type>>)>, Box<Self>),
    Param(Box<Type>, Vec<Param>)
}

This way, the LetParam will be simplified in terms of the Let variant of the Type type.

let StackVec<T, N: Int> = unit StackVec = [T; N],
    List<T> = struct {
        data: T,
        next: &List<T>
    } in ...

Parameterized types can be substituted directly to an equivalent Let expression of types by converting them like so:

let List<T> = struct {
        data: T,
        next: &List<T>
    } in
        let x: List<Int> = ...

// becomes

let x: (let T = Int, MANGLED_LIST_T_NAME = struct { data: T, next: &MANGLED_LIST_T_NAME } in MANGLED_LIST_T_NAME) in ...

Whenever a parameterized type is instantiated, it checks if that specific instance of the type has already been instantiated. This prevents creating several copies of the type and possibly recursing infinitely.

If not, we drop in the type definition. This will create an in-line type definition with a mangled, unique name for that specific combination of type / const arguments.

Migrate from LALRPOP to Pest

LALRPOP has been a pain. Use Pest instead. Right now the frontend uses Pest, and the stages of IR use LALRPOP.

Change diagrams and table for new instructions

I changed the Inc and Dec operators to two new operators:

  1. Index
  2. Swap

Index looks at the tape, reads the value, and uses that as an index for a pointer stored in the register. The address of the cell at that index is stored in the register. For implementations that use tape indexes as pointers, like the interpreter, this operation is simply addition. For implementations where pointers do not increment by 1 (an implementation with real pointers), this is addition with a constant multiplier.

Swap simply swaps the value at the tape with the value in the register.

FFI acronym in readme.md may be unrecognized

... or, maybe I'm the only person interested in computer programming languages who doesn't know what FFI stands for. In which case, just close this issue, I guess. I actually suspect that might be true, and didn't even look to see if anyone had opened an equivalent issue, because I thought it very unlikely. The trouble with "just [search] it" is that it's work, even if not very much.

Change `crate::lir::types::Type::Unit(Name, Ty)` to a better name, like `..::Type::Nominal(..)`

A Unit type traditionally means the None type in common nomenclature.

Currently, the Unit type in Sage is a type that enforces structural equality and nominal equality. A Unit(Meter, Int) is not equal to an Int, or a Unit(Kilometer, Int); it is only equal to another Unit(Meter, Int), where the inner types are structurally equal. An Int can be cast to a Unit(Meter, Int) and vice versa. This adds functionality to the type system to typecheck against using a "Meter" where a "Kilometer" is required, even though the data used to represent both the types is the same. It forces the user to perform the unit conversion at the type system level.

I'm thinking a name like Unique, or Nominal could be used instead.

Function Flattening

Right now, the compiler will generate code with nested function definitions. This is fine by the virtual machine specification, but it makes it harder to compile the output code. Right now, neither the interpreter or C target call functions properly. In both implementations, functions are only defined when their definition is executed.

To fix this, we simply need to flatten the functions like so:

fn f0() {
    fn f1() {
        fn f2() {

        }
        fn f3() {
            fn f4() {
            }
        }
    }
}
// Becomes:
fn f0() {}
fn f1() {}
fn f2() {}
fn f3() {}
fn f4() {}

This can be done in the VM code, it doesn't have to be done in the frontend. This should just be a simple pass before we compile the code to a specific target.

Add LLVM backend

Add LLVM backend as a target. Make it available through an optional feature flag like llvm-support or something; LLVM is an utterly massive dependency that I don't want to rely on. If performance isn't good enough, we can directly lower the psuedo-assembly-language to LLVM; this should achieve performance on par with other languages like C.

Cargo clippy

Fix all the dumb clippy warnings for src/target/c.rs!

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

Hi!

I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (with a lot of other PGO-related information). Since PGO helps with achieving better performance with many compilers (like Rustc, GCC, Clang, etc.) I think trying to optimize Sage with PGO can be a good idea. I did some benchmarks and want to share my results.

Test environment

  • Fedora 39
  • Linux kernel 6.6.9
  • AMD Ryzen 9 5900x
  • 48 Gib RAM
  • SSD Samsung 980 Pro 2 Tib
  • Compiler - Rustc 1.75
  • Sage version: the latest for now from the main branch on commit 72b536f61ebb6332c57cf57fab9fe53b1e878c1d
  • Disabled Turbo boost (for more stable results across runs)

Benchmarks

As a benchmark, I use built-in benchmarks with cargo bench command. For the PGO optimization phase, I use cargo-pgo with cargo pgo optimize bench. For the PGO training phase, I use the same benchmark with cargo pgo bench.

Results

I got the following results:

According to the tests, PGO makes things faster in Sage.

I need to note that enabling Link-Time Optimization (LTO) is generally a good idea too - this optimization works well together with PGO. I even performed some benchmarks where I compared LTO to Release: https://gist.github.com/zamazan4ik/6be63330d2c97b510fdfc6b7aa7988c5 . However, in some cases, there are performance regressions - that need to be investigated. LTO was enabled with codegen-units = 1 and lto = "fat" for the corresponding profiles in Cargo.toml file.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on Sage. If it shows improvements - add a note to the documentation about possible improvements in Sage performance with PGO.
  • Providing an easier way (e.g. a build option) to build scripts with PGO can be helpful for the end-users and maintainers since they will be able to optimize Sage according to their workloads.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT as an addition to PGO) but I recommend starting from the usual PGO.

Here are some examples of how PGO optimization is integrated into other projects:

Please treat the issue just as a benchmark report - it's not an actual error, crash, or something like that. I don't know how much you care about performance in Sage so I don't know how important these improvements are for the project. I hope we can use these benchmarks at least as an additional data point about PGO efficiency for compilers.

CLI

Add a CLI so I can compile and run from the terminal instead of messing with main.rs.

Add more comments for debugging, and also add debugging symbols

It would be nice to insert more comments in the generated assembly at the LIR level. This is straightforward.

It would also be nice to have debugging symbols in the assembly (and possibly VM code as well) to allow a program to step through the compiled code and reference against the source code.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.