arua-meta's Issues

RFC: Terminology

TODO

  • zone
  • unit
  • trait
  • struct
  • behavior

This could also be called "[Core] Concepts".

RFC: Primitive Types and Type Qualifiers

Arua's primitive types consist of integers (unsigned/signed), floating point numbers, and arrays of those types. To be clear, arrays of types (including arrays of primitives) are their own individual types and are not directly coercible to their primitive counterparts.

Type Qualifiers

All declarations (foo Type) are considered immutable without the mutable type qualifier (foo !Type). As well, all declarations are considered mandatory (cannot be nil) unless they have the optional type qualifier (foo Type?). These qualifiers can be combined (foo !Type?). Type qualifiers do not create or affect any type (they describe individual usages of existing types within a statement, rather than define the type itself).

Implicit Conversions

Implicit Conversions are intuitive and simple in Arua. For a unit with a type Child that is a sub-type of Base (e.g. via a typedef):

row = column   Base  Child  !Base  !Child  Base?  Child?  !Base?  !Child?  nil
Base           ✓*    ✓*     ✓*     ✓*
Child                ✓*            ✓*
!Base
!Child
Base?          ✓*    ✓*     ✓*     ✓*      ✓*     ✓*      ✓*      ✓*       ✓*
Child?               ✓*            ✓*             ✓*              ✓*       ✓*
!Base?
!Child?

* Only if it's the initial assignment

Optional Qualifiers

Optional qualifiers force a check of nil for all types before they can be referenced.

foo Type? = getFoo()
foo.someMethod() # compilation error

# ---

foo Type? = getFoo()
if foo != nil
    foo.someMethod() # compiles OK

This check honors visibility (#19). Calls into functions that operate on optional types must be in code paths that have already checked for nil. Unit-public functions always assume such checks have not been made.

pub fn publicFunction(foo Type?)
    privateFunction(foo) # error - privateFunction() argument `bar` is not optional protected

fn privateFunction(bar Type?)
    bar.someMethod() # doesn't check for `nil`

# ---

pub fn publicFunction(foo Type?)
    if foo != nil
        privateFunction(foo) # compiles okay

fn privateFunction(bar Type?)
    bar.someMethod() # doesn't check for `nil`
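
For comparison, the rule resembles the flow-sensitive narrowing that static checkers apply to Python's Optional. The sketch below is only an analogy, not Arua semantics: Foo, public_function and _private_function are hypothetical names, and Python itself enforces nothing at runtime. Unlike the Arua example, the private helper here declares a non-optional parameter, since Python has no notion of an "optional protected" argument.

```python
from typing import Optional


class Foo:
    def some_method(self) -> str:
        return "called"


def _private_function(bar: Foo) -> str:
    # Private helper: relies on callers having ruled out None already.
    return bar.some_method()


def public_function(foo: Optional[Foo]) -> str:
    # Public entry point: must assume no nil check has happened yet.
    if foo is None:
        return "nil"
    # Past the check, `foo` is known to be a Foo on this code path.
    return _private_function(foo)
```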

Signed-ness

All signed types (i32, i16, i8, iXX) are Two's Complement representations.

Floating Point

All floating point types (f32, f64, fXX) are IEEE 754 compliant floating point numbers.

RFC: Method Forwarding

Arua is planned to have no inheritance, only composition. Method forwarding exists to make that composition easy: a struct can expose a member's method as its own.

struct Eye
    # ...

on Eye
    fn look(...)
        # look at something

struct Nose
    # ...

on Nose
    fn sniff(...)
        # sniff something

struct Face
    leftEye Eye
    rightEye Eye
    nose Nose

on Face
    fn look(...)
        this.leftEye.look(...)
        this.rightEye.look(...)

    use this.nose.sniff as sniff # (Face()).sniff() now calls (Face()).nose.sniff()
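
To make the intent concrete, here is roughly what the same composition looks like when the forwarding is written by hand in Python. This is a sketch for comparison only; the return strings and the constructor arguments are made up for illustration, and the hand-written sniff method is what the `use ... as` line would replace.

```python
class Eye:
    def __init__(self, name):
        self.name = name

    def look(self, target):
        return f"{self.name} eye looks at {target}"


class Nose:
    def sniff(self, target):
        return f"nose sniffs {target}"


class Face:
    def __init__(self):
        self.left_eye = Eye("left")
        self.right_eye = Eye("right")
        self.nose = Nose()

    def look(self, target):
        # Custom forwarding: one call fans out to both eyes.
        return [self.left_eye.look(target), self.right_eye.look(target)]

    def sniff(self, *args, **kwargs):
        # Rough equivalent of `use this.nose.sniff as sniff`.
        return self.nose.sniff(*args, **kwargs)
```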

RFC: Enumerations

TODO

A note about enums: Should have nested enum support:

enum AstType
    STATEMENT
    TYPE
        SCALAR
        ARRAY
        POINTER

type AstType = TYPE::SCALAR

Or something.

RFC: StdLib: std.math (Math library)

The std.math library is a standard library that provides functions and operators.

TODO

I want to provide standard implementations of FFT, operators that use mathematical symbols, etc. For instance,

use std.math.ops.*

a i32 = 4
b f32 = √a

Any ideas on this would be nice. Fleshing out the standard library isn't something I want to mess with yet as I have a strong opinion about separating the language from the libraries.

RFC: Forced zones

Should all code be forced into a zone (namespace)? The dependency system already uses a folder structure to segregate code into zones, similar to Java. However, Java allows a default (unnamed) package, whose members cannot be referenced by code in any named package.

RFC: Pluggable Semantics

TODO

Should talk about switching out how string interpolation is implemented on a per-module basis, or per block via compiler details, etc.

Idea is that I don't want to run into the 'magic'-ness of Python without having complete control over what the magic does.

Syntax is Style philosophy?

With Python, we have PEP8 which is a 'governmental' style guide imposed (or at least heavily pushed) by the Python org itself. It seems to do its job of making others look poorly upon you if you don't use it.

This raises the question: should a language impose a style on its users? My instinctive response is "no", but thinking about it further I have concluded it's really up to the language.

For example, there are a lot of subjective things about braced languages: do braces go on the same line, or the next line? Do closing braces go on the last line or on their own line? These are things often enforced by a style guide or a linter rather than the compiler itself.

What if that wasn't the case?

I feel like a lot of decisions Arua is making about the language itself are resulting in fairly clean syntax. A lot of symbols not essential to the language have been taken out. One example is function definitions:

fn someFunction(arg1 i32, arg2 str) f32
    # ...

In the above example, there isn't much to really argue about - it boils down to just a few things:

  • space before the ( after someFunction?
  • space after # in the comment?
  • tabs, or spaces?
  • how many tabs or spaces?

While these four things are important, there are far fewer of them than in other languages.

With an option to disable it, should Arua enforce a style by default?

RFC: Custom operators

Swift has a very interesting semantic for creating custom operators. I like the idea, though I am a skeptical C++ adept and have seen a few horrific examples of it being done. However, I'm not opposed to the idea completely and think it could have some amazing results.

One of the constraints in place would be that custom operators can only be defined where the associated type itself is defined. For builtins or the standard library, this constraint can be turned off (as it can be with regular code via compiler details).

This would allow us to write a lot of the language in Arua directly, and open the door for a feature I mentioned once but originally discarded, complex/advanced math operators (#16).

RFC: Comments, Docs and Headers

Arua provides three types of comments: # for general comments, #: for documentation comments, and ## for header comments.

All comments must start with one of the three variations above, with no spaces between them, and extend to the end of the line (\n).


#

The singular # prefix is used for all generic comments, notes, explanations, and other remarks regarding any piece of code.

These comments are ignored completely. The parser may choose to skip to the next line upon encountering a # <space>.

Generic comments do not affect ordering requirements.

#:

The #: prefix is a documentation comment, and must come directly before a member declaration.

These comments can contain one or more docstring tags.

#: This is the main entry point
#: param args: The arguments passed to the application from the command line
#: return: An exit code, where 0 indicates success and all other numbers indicate failure
fn main(args [str]) i32
    ret 0

There are a few special docstring tags used by the Arua compiler to assist in debugging or error messages.

  • param <name>: - documents a parameter's intended use
  • return [val[, ...]]: - documents the meaning of the return value(s) (see below for explanation)

Docstring tags must be alphanumeric.

Any tag can have associated values, separated by commas, that should be treated as 'conditional' or 'identifying' strings to supplement the intended meaning of the tag itself.

return: tags may provide values that indicate what an exceptional return value means. This is useful in cases of "If X then this action occurred; otherwise, another action occurred."

For example, in the case of a theoretical .get() method that could return nil if the key doesn't exist or the value itself if the key does exist, the return docstring might look like:

#: return: The value associated with the key
#: return nil: The key could not be found

Docstring tags are unique based on their tag name and any associated value, including the lack of value. This means that param A: and param B: are unique to each other and thus can coexist in the same documentation comment.

Non-unique docstring tags are not allowed.
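
The uniqueness rule can be sketched as a small checker. This is an illustration under assumptions, not the Arua compiler's behavior: parse_doc_tags is a hypothetical name, only the two special tags (param and return) are recognized, and any other line is treated as free-form description.

```python
def parse_doc_tags(lines):
    """Collect (tag, value) -> text pairs from `#:` lines, rejecting duplicates."""
    tags = {}
    for line in lines:
        if not line.startswith("#:"):
            raise ValueError(f"not a documentation comment: {line!r}")
        body = line[2:].strip()
        head, sep, text = body.partition(":")
        words = head.split()
        # Assumption: only `param` and `return` count as tags here;
        # everything else is free-form description and is skipped.
        if not sep or not words or words[0] not in ("param", "return"):
            continue
        # A tag is identified by its name plus its value (None if absent).
        key = (words[0], " ".join(words[1:]) or None)
        if key in tags:
            raise ValueError(f"duplicate docstring tag: {key}")
        tags[key] = text.strip()
    return tags
```

With this keying, `return:` and `return nil:` coexist because their values differ, while two `param args:` lines in one comment are rejected.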

##

The double ## prefix indicates a header comment. These comments are used to discern things such as licenses, contact points, etc.

Header comments must come before any statement in the file.

Similar to documentation comments, header comments may contain tags that describe the content of the module. Refer to documentation comments regarding the format of tags.

These are the special header comment tag(s):

  • license: - an SPDX identifier that pertains to a license, or proprietary for proprietary software with custom or non-open-source licenses. In the event of the latter, it should be expected that a license is provided within the header or the associated repository, though this isn't enforced by Arua.

Header comments are intended to be used primarily by license validation tools, IDEs (to either hide or display differently the header comment), or by formatting utilities to ease the pain of determining what's a header and what is file-specific documentation.

RFC: Testing

I want testing in Arua to be built directly into the language, and not an afterthought.

The language is designed in a way to show intent. This gives us so many advantages, such as that we don't need to worry too much about the implementation behind the code.

Given this philosophy, we can do some cool things with tests.

struct SomeStruct
    c i32

on SomeStruct
    fn someMethod(a i32, b i32) i32
        this.c += a + b
        ret a * b

test SomeStruct.someMethod()
    this.c = 15

    case(10, 20)
        assert(result == 200)
        assert(this.c == 45)

    case(5, 15)
        assert(result == 75)
        assert(this.c == 35)

Thoughts? The above is just an idea. I want to be able to test side effects in a sane and easy way, without having to use mocks.

"Globals" or this would be pseudo objects that can be assigned any properties (similar to dynamic objects). Such constructs don't exist in normal code, but are provided by the test harness Arua employs.

// @sindresorhus

RFC: ABI

The generated Arua ABI is intended to be fully compatible with C.

For example, if the module tree (#4 (comment)) looks like this:

mymodule/foo.ar:

use std.io

pub fn someFunction(foo i32, bar str, qux f128)
    io.out.println("#{foo} #{bar} #{qux}")

then the generated C symbol would look like

_mymodule_foo__someFunction___i32tstd_lang_str8__f128

  • one _ separates the modules
  • two __ separates types (incl. nested types)
  • three ___ separates the symbol from the overload, denoting the start of a prototype
  • two __ or the end of the symbol after three ___ indicates the end of a type in a prototype

This does force the constraint of not allowing underscores in type names, public identifiers or zone names. If we also required that zone names contain no upper case letters, this could be cleaned up even more:

_mymodule_foo__someFunction___i32tstdZlangZstr8_f128

Or something to that effect.

The goal here is to create a type naming system that can easily be compiled down to C-compatible symbols -- including the ability to patch in C code as Arua functions or method implementations through linking .ar -> .o and .c -> .o code together.


Another thing to think about is the ability to specify other languages as plugin bindings. This is something I'd like to see in a language personally.

#[lang=c++, symbol="some::namespace::SomeFunction"]
fn someFunction(foo i32, bar str)
    # do something; can be accessed from C++ as `some::namespace::SomeFunction(...)`
#[lang=java, symbol="com.arua.package.SomeFunction"]
fn someFunction(foo i32, bar str)
    # do something; implements the Java native function `com.arua.package.SomeFunction`

etc.

Out of box compatibility with JVM

Up until today the plan was to support only LLVM out of the box at first, but I'd also like to look at out-of-the-box support for the JVM via a project such as LLJVM.

There is a lot of overlap between dependency management and resolution, including semantics (zone names, etc). Realistically, we could map primitives to JVM types and dependencies to JVM packages/classes, but that might require some work.

I'm not up to snuff with JVM architecture or internals, but it would be very useful to be able to cross-compile Arua code as JVM and use JVM languages interoperably - perhaps even bridging that native interface gap by removing the need for explicitly different code units when having native functionality in JVM programs.

RFC: Numeric Literals

Numeric literals in Arua take a unique approach in terms of showing intent.

"Intent"

Arua aims to show intent. "Intent" can mean a few things, but with numerics (as well as all primitive types) we aim to convey how the number is to be used.

Some examples of the intent of numbers:

  • A boolean is either true or false. It fits into 1 bit.
  • An IPv4 port is anywhere from 0 to 65535. It fits into 16 bits.
  • A basic terminal color is one of 8 values. It fits into 3 bits.
  • A git commit lookup starts with the first byte in the SHA digest. It fits into 8 bits.

The point of intent is to represent these values as close to their intended size as possible.
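
The bit counts in the list above follow from the number of distinct values each item has to represent. A small Python check, where bits_needed is a hypothetical helper written for this illustration:

```python
def bits_needed(distinct_values):
    """Smallest number of bits that can tell `distinct_values` values apart."""
    if distinct_values < 1:
        raise ValueError("need at least one value")
    # n values are numbered 0..n-1, so the width is that of n-1.
    return max(1, (distinct_values - 1).bit_length())


boolean_bits = bits_needed(2)       # true / false
port_bits = bits_needed(65536)      # ports 0..65535
color_bits = bits_needed(8)         # 8 basic terminal colors
byte_bits = bits_needed(256)        # one byte of a SHA digest
```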

Primitives Refresher

All types in Arua can be boiled down to a single numeric type, or a collection of numeric types. There are three primitive types:

  • unsigned integer
  • signed integer
  • floating point number

As well, each of the primitive types comes in a few collective or descriptive states:

  • array ([T]) <#10>
  • mutable (!T) <#7>
  • optional (T?) <#7>
  • tuple ((T,)) <#9>

As well, these types can be typedef'd (#3) to create new types.

Decay of Common Types

In common languages, types such as boolean exist to express a single binary value. In Arua, it's simply u1. The type u1 shows immediate intent, and has the added bonus of being easily packed and optimized if used within structures.

Another common type is string. Arua has native unicode support (#11) and exposes such functionality through typedefs of [u8], [u16], [u32], and [u64] as str8 (aliased to str), str16, str32 and str64 respectively. This has the added bonus of allowing functions that take arrays of these types (or of any type) to also take strings, and allows the ability to index them using the subscript operator.

See #3 for a better explanation of type and alias.

Literal Notation

There are three representations of a numeric literal:

  • Basic (12345.34)
  • Scientific (144.3e76)
  • Radix (0xDEADBEEF)

Basic Notation

Basic notation is your simple notation. It supports both integers and floats in the following formats:

  • 1234
  • 1234.567
  • .1234

Negative values are prefixed with a -:

  • -1234
  • -1234.567
  • -.1234

Scientific Notation

Scientific notation is similar to basic notation, but allows for either base-2 or base-10 exponents to be specified:

  • 1234e15 - 1234 * 10 ^ 15
  • 1234b24 - 1234 * 2 ^ 24

Radix Notation

Radix notation expands upon the classic hexadecimal notation to allow for any base to be used in place of the 0x up to 36 ([0-9][A-Z]). 0x is still treated as 16x.

  • 0xAA / 16xAA = 170
  • 1x0000 = 4 (unary/tally system)
  • 2x0110 = 6 (binary)
  • 5x4311 = 581
  • 8x666 = 438
  • 10x123 = 123
  • 20xAG33FB0 = 691710220
  • 36xYZX1 = 1632853

All radixes whose character domains contain letters (hexadecimal, etc.) require that such letters be uppercase. This is to disambiguate them from literal format specifiers (below).

Radix numbers cannot be negative; however, since signed numbers are two's complement, they can be represented as negative by ensuring the first bit is set to 1 and the type specifier (below) is i.
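
The radix examples above can be checked with a short parser. This is a sketch of the notation as described here, not a reference implementation: parse_radix is a hypothetical name, and treating base 1 as "count the 0 marks" is an assumption drawn from the single 1x0000 example.

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # "0-9" then "A-Z"


def parse_radix(literal):
    """Parse `<base>x<digits>` with bases 1..36; `0x` means base 16."""
    base_text, sep, digits = literal.partition("x")
    if not sep or not base_text.isdigit() or not digits:
        raise ValueError(f"malformed radix literal: {literal!r}")
    base = int(base_text) or 16          # `0x` is shorthand for `16x`
    if base > 36:
        raise ValueError("base must be at most 36")
    if base == 1:
        # Unary/tally (assumed): the value is the number of marks.
        if set(digits) != {"0"}:
            raise ValueError("unary literals use only the digit 0")
        return len(digits)
    value = 0
    for ch in digits:
        d = DIGITS.find(ch)              # lowercase letters yield -1
        if d < 0 or d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d
    return value
```

Because the separator is a lowercase x and digits must be uppercase, a literal such as 36xYZX1 splits unambiguously at the first x.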

Literal Format

Each numeric literal has an optional format suffix it can supply. In the event a format is not specified, one of two things occurs:

  • R-values assume the type and width of the L-value
  • L-values cause an error (rare cases where this actually happens)

Literal formats consist of a type specifier character and a bit width.

The type specifiers are as follows:

  • i - signed integer
  • u - unsigned integer
  • f - floating point number

As of now, preliminary concept implementations of numeric literals cap bit widths at 4096, as anything beyond that is simply absurd for classical computers (as opposed to, say, quantum computers). Bit widths must be greater than 0.

Floating point width specifiers must be one of 16, 32, 64, or 128.

Literal values and their suffixes are separated by a colon (:).

Some example numbers with their format suffixes:

  • 1234:u16
  • 166.9:f64
  • 0xDEADBEEF:u32
  • 36xZY:u64
  • 0.1:f128
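
The suffix rules stated above (a type specifier of i, u or f, a width from 1 to 4096, and float widths limited to 16, 32, 64 or 128) can be sketched as a validator. parse_format_suffix is a hypothetical helper; it only splits and checks the suffix and does not interpret the numeric part.

```python
def parse_format_suffix(literal):
    """Split `<value>:<kind><width>` and validate the suffix."""
    value, sep, suffix = literal.rpartition(":")
    if not sep or not value or len(suffix) < 2:
        raise ValueError(f"missing format suffix: {literal!r}")
    kind, width_text = suffix[0], suffix[1:]
    if kind not in ("i", "u", "f") or not width_text.isdigit():
        raise ValueError(f"bad format suffix: {suffix!r}")
    width = int(width_text)
    if not 0 < width <= 4096:
        raise ValueError("bit width must be between 1 and 4096")
    if kind == "f" and width not in (16, 32, 64, 128):
        raise ValueError("float width must be 16, 32, 64 or 128")
    return value, kind, width
```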

Builtins

Currently, there are two builtins: true and false.

  • const true u1 = 1:u1
  • const false u1 = 0:u1

Perks in Semantics

At first, the advantages of such extensive notations and width specification may not be clear. However, bitwise operations benefit greatly from such flexibility:

## Allow writes from 
fn allowWrites(mode u16)
    return mode | 8x222:u9
    # -- or ---
    return mode | 2x010010010:u9

AruaDoc comment RFC #13

Unlike C-family languages, no longer do you have to guess or assert how big an integer is. Just use it how you need to and let the compiler optimize for you.
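
The two literals in allowWrites denote the same mask, which is easy to confirm with Python's own octal and binary literals. This is only a cross-check of the arithmetic; WRITE_BITS and allow_writes are names chosen for the illustration, and Python integers carry no bit width.

```python
WRITE_BITS = 0o222  # Python spelling of Arua's 8x222


def allow_writes(mode):
    # Set the write bit in each of the three permission groups.
    return mode | WRITE_BITS


same_mask = (0o222 == 0b010010010)
read_write = allow_writes(0o444)  # read-only for everyone, plus write bits
```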

Perks in Optimization

Some of these points are better described in the Bit-Field RFC at #6 (#6 (comment))

As well as semantic benefits, when numeric types are clustered together (e.g. in structs), we can do some pretty extensive "tetris"-like packing optimizations for data that won't be persisted. It also gives us flexibility to optimize for size, or for speed, since we can perform some tricky alignment strategies or generate bitwise instructions in order to access those properties.

Since we perform these optimizations ourselves, we can generate C-family struct source code, with bit-fields or other alignment optimizations in place, that describes compatible data structures using the same property identifiers. That source can then be compiled into existing code bases, which would allow very flexible protocol implementations, for example.

Optimizations can also occur on systems with uncommon word sizes or systems that might provide better alignment strategies.

Bounds and Defined Behavior

Unlike C, integer overflow and conversion are well defined.

Conversion

The golden rule is to remember that type casting performs logic; assignment does not. Below are some examples and their C equivalents.

Signed to Unsigned (assignment):

foo i32 = -15
bar u32 = foo # 4294967281

int foo = -15;
unsigned int bar = *((unsigned int *)&foo); // 4294967281 - preserves sign bit but now read as unsigned integer

Signed to Unsigned (typecast):

foo i32 = -15
bar u32 = foo as u32

int foo = -15;
unsigned int bar = abs(foo);

Signed to Signed narrowing (assignment):

foo i32 = -15
bar i16 = foo # error - cannot narrow

Signed to Signed narrowing (typecast):

foo i32 = -70000
bar i16 = foo as i16 # -4464 - sign is preserved, but modulo (2 ** sizeof(type) / 2) - 1 is used

int foo = -70000;
short bar = (short) foo;
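
The two concrete numbers quoted in the comments above (4294967281 and -4464) can be reproduced with plain bit masking. The sketch assumes that assignment keeps the two's-complement bit pattern and that the narrowing typecast truncates to the low bits; the second point is an assumption that happens to match the -4464 given here. Both function names are hypothetical.

```python
def reinterpret_unsigned(value, bits):
    """Keep the two's-complement bit pattern, read it as unsigned."""
    return value & ((1 << bits) - 1)


def narrow_signed(value, bits):
    """Truncate to `bits` bits and read the result as two's complement."""
    low = value & ((1 << bits) - 1)
    # If the top remaining bit is set, the narrowed value is negative.
    return low - (1 << bits) if low >> (bits - 1) else low


as_u32 = reinterpret_unsigned(-15, 32)
as_i16 = narrow_signed(-70000, 16)
```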

RFC: Support for Bit Fields

I just had an idea that I know is in C/C++ that isn't mentioned in the spec.
Will Arua have support for bit fields? Or will that be controlled via the type specifically, as mentioned in the README and https://github.com/arua-lang/proposal/blob/master/arua.grammar#L9? Or will Arua have support for syntax like:

i32 varName : 1; // Fill 4 bytes
f16 floatName : 4; // Fill to 8 bytes

Or something to this effect?
I'm asking because I use bit fields quite a bit in C and find them helpful for controlling bit spacing in memory.

Verbosity of `this`

I want to enforce the use of this. in methods, but I recognize that it's very much a burden to type it out all the time.

However, Ruby has a similar syntactic approach, the @ sign - which, when I used it in CoffeeScript, I quite liked.

It's succinct, unique and easy for TextMate-based highlighters to show different colors for too.

RFC: Generics

This is a continuation of a comment stream on #3.

Considering the allowance of generic types on functions and/or types.

Both C++ and Java support per-function generics, so this wouldn't be anything new.

#include <memory>

template <typename T>
std::shared_ptr<T> makeNew() {
        return std::shared_ptr<T>(new T());
}

and

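import java.util.function.Supplier;

public static <T extends SomeType> T makeNewSomeType(Supplier<T> factory) {
        return factory.get(); // `new T()` is not legal Java because of type erasure
}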

But the differences lie in how they're conveyed and how they're actually implemented. I think the result would be like a love-child between the two.

fn calculateChecksum<T>(object T) i64
        # ...

This works in a pretty obvious manner in inheritance-based languages, but trait-based languages are somewhat new (Rust tries to do it, but the syntax is incredibly unclear at first glance).

We have a few ways that one could go about this. Let's take string coercion (i.e. toString()) as an example.

With traits, we could easily create a ToString or Stringable trait within the standard library. Any type implementing this trait would then be able to coerce itself to a string. Where would we need generics here?

On second thought this might be a poor example; C++ does this in a semi-standard way by doing a global overload of operator<< on basic_ostream, and Java does this by using an Object method that can be overridden.

As of this writing I can't think of an example where generics would be more powerful than traits, or would solve a problem that traits simply cannot. I'd love to see some examples.

RFC: type vs alias

type and alias both create a new type identifier in the unit scope. However, type creates a new type, whereas alias creates a new name for an existing type.


type [u8] as str8

type creates a 'copy' of an existing type and gives it a new name. New traits can be applied to new type definitions without affecting the original type.

As seen above, this is the critical element used by the str8 type - a UTF-8 string. str8 now inherits the array property .length, and now allows more traits and methods to be applied to it without those traits being applied to the [u8] type itself.

Remember that array types are their own type - [u8] is not the same type as u8 (see #12)


alias str8 as str

alias gives a new name to an existing type without creating a new type altogether. All applications of traits on str will affect str8.

The above example is what creates the built in str type - str is really just str8.


Here is a truth table for type vs alias:

T as U                     type   alias
T is U                     False  True
U assignable to T          True   True
(U.foo = fn()) == T.foo    False  True
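
A loose Python analogy for the distinction, offered as a sketch rather than as Arua semantics: a subclass stands in for `type` (a distinct type derived from an existing one), and a plain name binding stands in for `alias`. Str8, Str and shout are names chosen for the illustration.

```python
class Str8(list):
    """Stands in for `type [u8] as str8`: a distinct type built on list."""


Str = Str8  # stands in for `alias str8 as str`: a second name, same type


def shout(self):
    # Decode the stored bytes as UTF-8 and upper-case the text.
    return bytes(self).decode("utf-8").upper()


# Attaching a method through the alias changes Str8 as well,
# but leaves the original list type untouched.
Str.shout = shout

greeting = Str8(b"hi")
```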
