adam-mcdaniel / sage Goto Github PK

View Code? Open in Web Editor NEW

259.0 6.0 7.0 27.7 MB

A programming language that's wise beyond its bytes!🌱🌿🪴

Home Page: https://adam-mcdaniel.net/sage

License: MIT License

Rust 100.00%

algebraic-data-types c compiler pattern-matching polymorphism portable rust structural-typing turing-tarpit wasm

sage's Introduction

🌿🌱The Sage Programming Language🪴🍃

Sage advice for your coding conundrums!

Here's a link to the online compiler playground!

Community
What is Sage?
Why Sage?
How useful is Sage?
How do I use Sage?
What does Sage look like?
Feature Roadmap
Where can I learn more?
How do I contribute?
About the Author

Community

Join the Discord server to chat about Sage! Let us know if you have any thoughts or comments about the language!

What is Sage?

Sage is a programming language that tries to be maximally portable, expressive, and intuitive. It borrows some aspects of Rust, C, and Python. It currently has an x86 compiler backend, a C source backend, and a VM interpreter backend which can run on the web.

Sage is licensed under the MIT License, and has been under development since April 2022.

Why Sage?

Sage is very portable -- run it on your thermostat! Here's the complete list of core virtual machine instructions and their C equivalents:

Instruction	C Equivalent
`while`	`while (reg) {`
`if`	`if (reg) {`
`else`	`} else {`
`end`	`}`
`set N`	`reg = N;`
`call`	`funs[reg]();`
`ret`	`return;`
`save`	`*tape_ptr = reg;`
`res`	`reg = *tape_ptr;`
`move N`	`tape_ptr += N;`
`where`	`reg = tape_ptr;`
`deref`	`push(tape_ptr); tape_ptr = *tape_ptr;`
`refer`	`tape_ptr = pop();`
`index`	`reg = (cell)(reg) + tape_ptr;`
`add`	`reg += *tape_ptr;`
`sub`	`reg -= *tape_ptr;`
`mul`	`reg = tape_ptr;`
`div`	`reg /= *tape_ptr;`
`rem`	`reg %= *tape_ptr;`
`nand`	`reg = ~(reg & *tape_ptr);`
`gez`	`reg = reg >= 0;`

You could fit this instruction set on a T-shirt!

The compiler can target this limited "core" instruction set, with an expanded "standard" instruction set for floating point operations and foreign functions. The core instruction set is designed to be as simple as possible for anyone to implement their own backend. Try to see if you can implement it yourself for your backend of choice!

The virtual machine has some important optimization properties: Although Sage's VM is a very simple zero-address-code representation, it preserves all the information to reconstruct an LLVM-like three-address-code representation of the original higher level IR. This makes the instruction set capable of applying LLVM's optimizations while being much easier to implement. Sage's innovation is in the backend, not the frontend.

This combination of simplicity and capacity for optimization was my motivation for creating Sage. I wanted to create a virtual machine with the largest speed + expression + portability to implementation difficulty ratio, and a high level language that could compile to it. I think Sage is a good solution to this problem.

This project is based on some ideas I had while working on Harbor for a hackathon.

How useful is Sage?

Sage is a very young project, and is not ready for production. It's still possible to write very useful programs in it, though.

SageOS is an operating system with a userspace written in Sage. Its graphical shell and presentation app (both written in Sage) use the FFI to draw to the screen, receive input from the mouse and keyboard, interact with the filesystem, and schedule new processes. You can look at the shell code here.

The presentation app parses PPM image files from the filesystem and renders them to the screen. You can look at the presentation code here.

Sage's foreign function interface is simple and can directly call C functions or backend-specific builtins. Check out the web-demo's foreign function interface example that calls some JavaScript code to draw graphics or alert the user!

How do I use Sage?

To start using sage, install it with cargo:

$ cargo install --git https://github.com/adam-mcdaniel/sage

Then, you can run a sage file with the sage command:

$ sage examples/frontend/interactive-calculator.sg

You can also compile a sage file to C with the --target flag:

$ sage examples/frontend/interactive-calculator.sg --target c
$ # Or `-t c` for short
$ sage examples/frontend/interactive-calculator.sg -tc
$ gcc out.c -o out
$ ./out

Check out the code for the web-demo to see how to use Sage in a web page.

What does Sage look like?

Here's an example of a polymorphic linked list in Sage using Rust-like enums! It's straightforward to implement operations like map with just a few lines.

Here's an example of Sage's structural typing: a Rectangle can be created by concatenating the fields of a Position and a Size!

Here's an example of Sage's pattern matching: it's easy to deconstruct a value using match, if let, or a simple let binding. Sage's match expressions are very powerful!

Go to the web-demo or the examples/frontend folder to see more code examples.

Feature Roadmap

Where can I learn more?

You can read my blog post (~20 minute read) about the programming language to learn more about the implementation!

Join the Discord server to chat about Sage!

How do I contribute?

If you want to contribute, you can open an issue or a pull request. Adding backends for other architectures is a great way to contribute! We also need a VSCode syntax highlighting extension!

About the Author

I'm a 21 year old computer science graduate student at the University of Tennessee, Knoxville🍊. Rust is my favorite language, and I've written many other compilers. This is the last project I started as a teenager, and I was the only author to touch any of the code up to version v0.0.2-alpha (12/25/2023)! I'm looking for work opportunities for Summer 2024 (after I finish my Masters degree), so if you're interested in hiring me, please reach out to me at [email protected]!

sage's People

Contributors

Stargazers

Watchers

Forkers

isgasho bichanna 9bany ronsor gaohu0425 samuelmcgowan

sage's Issues

Add `BitwiseNand` to README and CORE documents

I have swapped the Swap instruction for the BitwiseNand instruction.

Add mutability checks

Add mutability checks so that immutable data can't be written to.

Add `examples/` files to the test cases

Add a test case to run all the examples and check against expected outputs.

Add LIR examples to the README

LIR is cool, demonstrate some!

Add new operators and use them in the LIR examples

The LIR examples use several inlined-assembly procedures to access comparison, logical, and bitwise operators. Add the missing operators and swap out the new ones in the LIR examples.

Parameterized Types in Lower Intermediate Representation

I want to add parameterized types to the lower intermediate representation. I expect the architecture to look something like this.

// src/lir/types.rs

enum Param {
    Type(Type),
    Const(ConstExpr),
}

// ...

enum Type {
    // ...
    
    LetParam(Vec<(String, Vec<(String, Option<Type>)>, Box<Type>>)>, Box<Self>),
    Param(Box<Type>, Vec<Param>)
}

This way, the LetParam will be simplified in terms of the Let variant of the Type type.

let StackVec<T, N: Int> = unit StackVec = [T; N],
    List<T> = struct {
        data: T,
        next: &List<T>
    } in ...

Parameterized types can be substituted directly to an equivalent Let expression of types by converting them like so:

let List<T> = struct {
        data: T,
        next: &List<T>
    } in
        let x: List<Int> = ...

// becomes

let x: (let T = Int, MANGLED_LIST_T_NAME = struct { data: T, next: &MANGLED_LIST_T_NAME } in MANGLED_LIST_T_NAME) in ...

Whenever a parameterized type is instantiated, it checks if that specific instance of the type has already been instantiated. This prevents creating several copies of the type and possibly recursing infinitely.

If not, we drop in the type definition. This will create an in-line type definition with a mangled, unique name for that specific combination of type / const arguments.

Add Standard instructions for parallelism

Add instructions like Spawn which creates another tape head with its own independent state elsewhere on the tape which runs concurrently asynchronously.

Migrate from LALRPOP to Pest

LALRPOP has been a pain. Use Pest instead. Right now the frontend uses Pest, and the stages of IR use LALRPOP.

Join `FFI` and `Input` + `Output` into a `side_effect` module

See tile

Change diagrams and table for new instructions

I changed the Inc and Dec operators to two new operators:

Index
Swap

Index looks at the tape, reads the value, and uses that as an index for a pointer stored in the register. The address of the cell at that index is stored in the register. For implementations that use tape indexes as pointers, like the interpreter, this operation is simply addition. For implementations where pointers do not increment by 1 (an implementation with real pointers), this is addition with a constant multiplier.

Swap simply swaps the value at the tape with the value in the register.

FFI acronym in readme.md may be unrecognized

... or, maybe I'm the only person interested in computer programming languages who doesn't know what FFI stands for. In which case, just close this issue, I guess. I actually suspect that might be true, and didn't even look to see if anyone had opened an equivalent issue, because I thought it very unlikely. The trouble with "just [search] it" is that it's work, even if not very much.

VM Needs a Formalized Bytecode Specification

Create nice Turing machine interpreter in web for backend

I want to be able to see the turing tape in operation with a tape head using a neat graphic. Something like https://adam-mcdaniel.net/harbor

Change `crate::lir::types::Type::Unit(Name, Ty)` to a better name, like `..::Type::Nominal(..)`

A Unit type traditionally means the None type in common nomenclature.

Currently, the Unit type in Sage is a type that enforces structural equality and nominal equality. A Unit(Meter, Int) is not equal to an Int, or a Unit(Kilometer, Int); it is only equal to another Unit(Meter, Int), where the inner types are structurally equal. An Int can be cast to a Unit(Meter, Int) and vice versa. This adds functionality to the type system to typecheck against using a "Meter" where a "Kilometer" is required, even though the data used to represent both the types is the same. It forces the user to perform the unit conversion at the type system level.

I'm thinking a name like Unique, or Nominal could be used instead.

Function Flattening

Right now, the compiler will generate code with nested function definitions. This is fine by the virtual machine specification, but it makes it harder to compile the output code. Right now, neither the interpreter or C target call functions properly. In both implementations, functions are only defined when their definition is executed.

To fix this, we simply need to flatten the functions like so:

fn f0() {
    fn f1() {
        fn f2() {

        }
        fn f3() {
            fn f4() {
            }
        }
    }
}
// Becomes:
fn f0() {}
fn f1() {}
fn f2() {}
fn f3() {}
fn f4() {}

This can be done in the VM code, it doesn't have to be done in the frontend. This should just be a simple pass before we compile the code to a specific target.

Add LLVM backend

Add LLVM backend as a target. Make it available through an optional feature flag like llvm-support or something; LLVM is an utterly massive dependency that I don't want to rely on. If performance isn't good enough, we can directly lower the psuedo-assembly-language to LLVM; this should achieve performance on par with other languages like C.

Cargo clippy

Fix all the dumb clippy warnings for src/target/c.rs!

Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO)

Hi!

I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (with a lot of other PGO-related information). Since PGO helps with achieving better performance with many compilers (like Rustc, GCC, Clang, etc.) I think trying to optimize Sage with PGO can be a good idea. I did some benchmarks and want to share my results.

Test environment

Fedora 39
Linux kernel 6.6.9
AMD Ryzen 9 5900x
48 Gib RAM
SSD Samsung 980 Pro 2 Tib
Compiler - Rustc 1.75
Sage version: the latest for now from the main branch on commit 72b536f61ebb6332c57cf57fab9fe53b1e878c1d
Disabled Turbo boost (for more stable results across runs)

Benchmarks

As a benchmark, I use built-in benchmarks with cargo bench command. For the PGO optimization phase, I use cargo-pgo with cargo pgo optimize bench. For the PGO training phase, I use the same benchmark with cargo pgo bench.

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/09666344f7cb0ee92a69d4a14a8b50e6
PGO-optimized compared to Release: https://gist.github.com/zamazan4ik/2adb489319886015c98e393cac5e2e57
(just for reference) PGO-instrumented compared to Release: https://gist.github.com/zamazan4ik/ae10d0fa65fb4be599876735b7ef15a6

According to the tests, PGO makes things faster in Sage.

I need to note that enabling Link-Time Optimization (LTO) is generally a good idea too - this optimization works well together with PGO. I even performed some benchmarks where I compared LTO to Release: https://gist.github.com/zamazan4ik/6be63330d2c97b510fdfc6b7aa7988c5 . However, in some cases, there are performance regressions - that need to be investigated. LTO was enabled with codegen-units = 1 and lto = "fat" for the corresponding profiles in Cargo.toml file.

Further steps

I can suggest the following action points:

Perform more PGO benchmarks on Sage. If it shows improvements - add a note to the documentation about possible improvements in Sage performance with PGO.
Providing an easier way (e.g. a build option) to build scripts with PGO can be helpful for the end-users and maintainers since they will be able to optimize Sage according to their workloads.

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT as an addition to PGO) but I recommend starting from the usual PGO.

Here are some examples of how PGO optimization is integrated into other projects:

Rustc: a CI script for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script
Clang: Docs
Python:
- CPython: README
- Pyston: README
Go: Bash script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP - Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag

Please treat the issue just as a benchmark report - it's not an actual error, crash, or something like that. I don't know how much you care about performance in Sage so I don't know how important these improvements are for the project. I hope we can use these benchmarks at least as an additional data point about PGO efficiency for compilers.

CLI

Add a CLI so I can compile and run from the terminal instead of messing with main.rs.

Add more comments for debugging, and also add debugging symbols

It would be nice to insert more comments in the generated assembly at the LIR level. This is straightforward.

It would also be nice to have debugging symbols in the assembly (and possibly VM code as well) to allow a program to step through the compiled code and reference against the source code.