qix- / arua-meta
Standards, RFCs and discussion of the Arua language
TODO
This could also be called "[Core] Concepts".
TODO
Arua's primitive types consist of integers (unsigned/signed), floating point numbers, and arrays of those types. To be clear, arrays of types (including arrays of primitives) are their own individual types and are not directly coercible to their primitive counterparts.
All declarations (`foo Type`) are considered immutable without the mutable type qualifier (`foo !Type`). As well, all declarations are considered mandatory (cannot be `nil`) unless they have the optional type qualifier (`foo Type?`). These qualifiers can be combined (`foo !Type?`). Type qualifiers do not create or affect any type (they describe individual usages of existing types within a statement, rather than define the type itself).
Implicit conversions are intuitive and simple in Arua. For a unit with a type `Child` that is a sub-type of `Base` (e.g. via a `typedef`):
row = column | Base | Child | !Base | !Child | Base? | Child? | !Base? | !Child? | nil |
---|---|---|---|---|---|---|---|---|---|
Base | ✓* | ✓* | ✓* | ✓* | | | | | |
Child | | ✓* | | ✓* | | | | | |
!Base | ✓ | ✓ | ✓ | ✓ | | | | | |
!Child | | ✓ | | ✓ | | | | | |
Base? | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* | ✓* |
Child? | | ✓* | | ✓* | | ✓* | | ✓* | ✓* |
!Base? | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
!Child? | | ✓ | | ✓ | | ✓ | | ✓ | ✓ |
* Only if it's the initial assignment
Optional qualifiers force a check of nil for all types before they can be referenced.
foo Type? = getFoo()
foo.someMethod() # compilation error
# ---
foo Type? = getFoo()
if foo != nil
    foo.someMethod() # compiles OK
This check honors visibility (#19). Calls into functions that operate on optional types must be in code paths that have already checked for `nil`. Unit-public functions always assume such checks have not been made.
pub fn publicFunction(foo Type?)
    privateFunction(foo) # error - privateFunction() argument `bar` is not optional protected
fn privateFunction(bar Type?)
    bar.someMethod() # doesn't check for `nil`
# ---
pub fn publicFunction(foo Type?)
    if foo != nil
        privateFunction(foo) # compiles okay
fn privateFunction(bar Type?)
    bar.someMethod() # doesn't check for `nil`
All signed types (`i32`, `i16`, `i8`, `iXX`) are two's complement representations.
All floating point types (`f32`, `f64`, `fXX`) are IEEE 754 compliant floating point numbers.
Method forwarding in Arua follows from the plan that there is no such thing as inheritance, only composition: since objects are built from parts, forwarding a method to one of those parts should be very easy.
struct Eye
    # ...
on Eye
    fn look(...)
        # look at something
struct Nose
    # ...
on Nose
    fn sniff(...)
        # sniff something
struct Face
    leftEye Eye
    rightEye Eye
    nose Nose
on Face
    fn look(...)
        this.leftEye.look(...)
        this.rightEye.look(...)
    use this.nose.sniff as sniff # (Face()).sniff() now calls (Face()).nose.sniff()
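For readers who want to poke at the forwarding idea today, here is a rough Python analogue of the `Face` example. It is only a sketch: the attribute names (`looked_at`, `left_eye`, etc.) and the return value of `sniff` are made up for illustration, and binding `self.sniff` in the constructor merely approximates what `use ... as ...` might do in Arua.

```python
class Eye:
    def __init__(self):
        self.looked_at = []  # hypothetical state, used only to observe calls

    def look(self, target):
        self.looked_at.append(target)


class Nose:
    def sniff(self, thing):
        return f"sniffed {thing}"


class Face:
    def __init__(self):
        self.left_eye = Eye()
        self.right_eye = Eye()
        self.nose = Nose()
        # Rough analogue of `use this.nose.sniff as sniff`: expose the
        # component's bound method under the Face's own name.
        self.sniff = self.nose.sniff

    def look(self, target):
        # Hand-written forwarding to both components, as in `on Face`.
        self.left_eye.look(target)
        self.right_eye.look(target)


face = Face()
face.look("tree")
print(face.sniff("rose"))  # forwarded to face.nose.sniff
```

The point of the comparison is that `look` needs a hand-written body because it fans out to two components, while `sniff` is a pure one-to-one forward and needs no body at all.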
TODO
A note about enums: Should have nested enum support:
enum AstType
    STATEMENT
    TYPE
        SCALAR
        ARRAY
        POINTER
type AstType = TYPE::SCALAR
Or something.
The `std.math` library is a standard library that provides functions and operators.
TODO
I want to provide standard implementations of FFT, operators that use mathematical symbols, etc. For instance,
use std.math.ops.*
a i32 = 4
b f32 = √a
Any ideas on this would be nice. Fleshing out the standard library isn't something I want to mess with yet as I have a strong opinion about separating the language from the libraries.
Should all code be forced into a zone (namespace)? The dependency system already uses a folder structure to segregate into zones, similar to Java. However, Java allows the use of a default package, which doesn't allow any other code to reference members in the default package.
TODO
Should talk about switching out how string interpolation is implemented on a per-module basis, or per block via compiler details, etc.
Idea is that I don't want to run into the 'magic'-ness of Python without having complete control over what the magic does.
With Python, we have PEP8 which is a 'governmental' style guide imposed (or at least heavily pushed) by the Python org itself. It seems to do its job of making others look poorly upon you if you don't use it.
This raises the question: should a language impose a style on its users? My knee-jerk response is "no", but thinking about it further I have concluded it's really up to the language.
For example, there are a lot of subjective things about braced languages: do braces go on the same line, or the next line? Do closing braces go on the last line or on their own line? These are things often enforced by a style guide or a linter rather than the compiler itself.
What if that wasn't the case?
I feel like a lot of decisions Arua is making about the language itself are resulting in fairly clean syntax. A lot of symbols not essential to the language have been taken out. An example is function definitions:
fn someFunction(arg1 i32, arg2 str) f32
    # ...
In the above example, there isn't much to really argue about - it boils down to just a few things:
- `(` after `someFunction`?
- `#` in the comment?
While these four things are important, there are far fewer of them than in other languages.
With an option to disable it, should Arua enforce a style by default?
TODO
TODO
Swift has a very interesting semantic for creating custom operators. I like the idea, though I am a skeptical C++ adept and have seen a few horrific examples of it being done. However, I'm not opposed to the idea completely and think it could have some amazing results.
One of the constraints in place would be that custom operators can only be defined where the associative type itself is defined. For builtins or the standard library, this constraint can be turned off (as it can be with regular code via compiler details).
This would allow us to write a lot of the language in Arua directly, and open the door for a feature I mentioned once but originally discarded, complex/advanced math operators (#16).
TODO
- `T?` (similar to the option pattern)
- `!T`
Otherwise, required and immutable. Can be combined.
Arua provides three types of comments: `#` for general comments, `#:` for documentation comments, and `##` for header comments.
All comments must start with one of the three variations above, with no spaces between them, and extend to the end of the line (`\n`).
The singular `#` prefix is used for all generic comments, notes, explanations, and other remarks regarding any piece of code.
These comments are ignored completely. The parser may choose to skip to the next line upon encountering a `# <space>`.
Generic comments do not affect ordering requirements.
The `#:` prefix is a documentation comment, and must come directly before a member declaration.
These comments can contain one or more docstring tags.
#: This is the main entry point
#: param args: The arguments passed to the application from the command line
#: return: An exit code, where 0 indicates success and all other numbers indicate failure
fn main(args [str]) i32
    ret 0
There are a few special docstring tags used by the Arua compiler to assist in debugging or error messages.
- `param <name>:` - documents a parameter's intended use
- `return [val[, ...]]:` - documents the meaning of the return value(s) (see below for explanation)
Docstring tags must be alphanumeric.
Any tag can have associated values, separated by commas, that should be treated as 'conditional' or 'identifying' strings to supplement the intended meaning of the tag itself.
`return:` tags may provide values that indicate what an exceptional return value means. This is useful in cases of "If X then this action occurred; otherwise, another action occurred."
For example, in the case of a theoretical `.get()` method that could return `nil` if the key doesn't exist or the value itself if the key does exist, the return docstring might look like:
#: return: The value associated with the key
#: return nil: The key could not be found
Docstring tags are unique based on their tag name and any associated value, including the lack of value. This means that `param A:` and `param B:` are unique to each other and thus can coexist in the same documentation comment.
Non-unique docstring tags are not allowed.
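To make the uniqueness rule concrete, here is a small Python sketch of how a tool might collect tags from `#:` lines. It is not how the Arua compiler does it. The regular expression is an assumed reading of the `<name> [value[, ...]]: text` shape described above, and `parse_doc_tags` is a hypothetical helper name.

```python
import re

# Assumed tag shape: "<name> [value[, value...]]: text"; name is alphanumeric.
TAG_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9]+)(?:\s+(?P<values>[^:]+))?:\s*(?P<text>.*)$"
)


def parse_doc_tags(lines):
    """Collect docstring tags from `#:` lines, rejecting non-unique tags."""
    tags = {}
    for line in lines:
        if not line.startswith("#:"):
            continue
        match = TAG_RE.match(line[2:].strip())
        if match is None:
            continue  # free-form description line, not a tag
        raw_values = match.group("values") or ""
        values = tuple(v.strip() for v in raw_values.split(",") if v.strip())
        key = (match.group("name"), values)  # name + values, incl. no value
        if key in tags:
            raise ValueError(f"non-unique docstring tag: {key}")
        tags[key] = match.group("text")
    return tags


doc = [
    "#: This is the main entry point",
    "#: param args: The arguments passed to the application",
    "#: return: The value associated with the key",
    "#: return nil: The key could not be found",
]
tags = parse_doc_tags(doc)
```

Keying on the pair of tag name and value tuple is what lets `return:` and `return nil:` coexist while a second plain `return:` is rejected.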
The double `##` prefix indicates a header comment. These comments are used to discern things such as licenses, contact points, etc.
Header comments must come before any statement in the file.
Similar to documentation comments, header comments may contain tags that describe the content of the module. Refer to documentation comments regarding the format of tags.
These are the special header comment tag(s):
- `license:` - an SPDX identifier that pertains to a license, or `proprietary` for proprietary software with custom or non-open-source licenses. In the event of the latter, it should be expected that a license is provided within the header or the associated repository, though this isn't enforced by Arua.
Header comments are intended to be used primarily by license validation tools, IDEs (to either hide or display differently the header comment), or by formatting utilities to ease the pain of determining what's a header and what is file-specific documentation.
I want testing in Arua to be built directly into the language, and not an afterthought.
The language is designed in a way to show intent. This gives us so many advantages, such as that we don't need to worry too much about the implementation behind the code.
Given this philosophy, we can do some cool things with tests.
struct SomeStruct
    c i32
on SomeStruct
    fn someMethod(a i32, b i32) i32
        this.c += a + b
        ret a * b
test SomeStruct.someMethod()
    this.c = 15
    case(10, 20)
        assert(result == 200)
        assert(this.c == 45)
    case(5, 15)
        assert(result == 75)
        assert(this.c == 35)
Thoughts? The above is just an idea. I want to be able to test side effects in a sane and easy way, without having to use mocks.
"Globals" or `this` would be pseudo objects that can be assigned any properties (similar to dynamic objects). Such constructs don't exist in normal code, but are provided by the test harness Arua employs.
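One way to read the `test` block above is that the setup (`this.c = 15`) is re-applied before every `case`, which is the only reading under which both sets of asserts hold. A minimal Python sketch of that interpretation follows; `run_cases` is a hypothetical helper, not part of any proposed harness.

```python
class SomeStruct:
    def __init__(self, c=0):
        self.c = c

    def some_method(self, a, b):
        self.c += a + b
        return a * b


def run_cases(setup, cases):
    """Run each case against a fresh object so side effects do not leak."""
    outcomes = []
    for args in cases:
        this = setup()                    # analogue of `this.c = 15`
        result = this.some_method(*args)  # analogue of `case(...)`
        outcomes.append((result, this.c))
    return outcomes


outcomes = run_cases(lambda: SomeStruct(c=15), [(10, 20), (5, 15)])
print(outcomes)
```

If the setup were shared across cases instead, the second case would observe `c == 65` rather than `35`, so per-case isolation seems to be what the example intends.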
The generated Arua ABI is intended to be fully compatible with C.
For example, if the module tree (#4 (comment)) looks like this:
mymodule/foo.ar:
use std.io
pub fn someFunction(foo i32, bar str, qux f128)
    io.out.println("#{foo} #{bar} #{qux}")
then the generated C symbol would look like
_mymodule_foo__someFunction___i32tstd_lang_str8__f128
- `_` separates the modules
- `__` separates types (incl. nested types)
- `___` separates the symbol from the overload, denoting the start of a prototype
- `__` or the end of the symbol after three `___` indicates the end of a type in a prototype
Though this forces the constraint of not allowing underscores in type names, public identifiers or zone names. If we even forced that zone names could not contain upper case letters, this could be cleaned up even more:
_mymodule_foo__someFunction___i32tstdZlangZstr8_f128
Or something to that effect.
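As a sanity check on the first scheme, here is a Python sketch that reproduces the example symbol. Several details are guesses read off that single example rather than stated rules: primitives (`i32`, `f128`) are assumed to be self-delimiting, named types are assumed to carry a `t` prefix, and `__` is only emitted after a named type that is not the last parameter. `mangle` is a hypothetical name.

```python
def mangle(module_path, function, params):
    """Build a C-compatible symbol under the assumptions listed above.

    `params` holds primitive names as strings and named types as tuples
    of path segments, e.g. ("std", "lang", "str8").
    """
    parts = []
    for index, param in enumerate(params):
        is_last = index == len(params) - 1
        if isinstance(param, str):
            parts.append(param)  # primitive: assumed self-delimiting
        else:
            parts.append("t" + "_".join(param))  # assumed `t` type marker
            if not is_last:
                parts.append("__")  # end of a type inside the prototype
    prototype = "".join(parts)
    return "_" + "_".join(module_path) + "__" + function + "___" + prototype


symbol = mangle(
    ["mymodule", "foo"],
    "someFunction",
    ["i32", ("std", "lang", "str8"), "f128"],
)
print(symbol)
```

Writing it out this way also makes the underscore constraint visible: any `_` inside a module, type or function name would make the output ambiguous to demangle.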
The goal here is to create a type naming system that can easily be compiled down to C-compatible symbols -- including the ability to patch in C code as Arua functions or method implementations through linking `.ar -> .o` and `.c -> .o` code together.
Another thing to think about is the ability to specify other languages as plugin bindings. This is something I'd like to see in a language personally.
#[lang=c++, symbol="some::namespace::SomeFunction"]
fn someFunction(foo i32, bar str)
    # do something; can be accessed from C++ as `some::namespace::SomeFunction(...)`
#[lang=java, symbol="com.arua.package.SomeFunction"]
fn someFunction(foo i32, bar str)
    # do something; implements the Java native function `com.arua.package.SomeFunction`
etc.
TODO
Up until today the plan was to support LLVM out of the box initially, but I'd also like to look at out-of-the-box support for the JVM via a project such as LLJVM.
There is a lot of overlap between dependency management and resolution, including semantics (zone names, etc). Realistically, we could map primitives to JVM types and dependencies to JVM packages/classes, but that might require some work.
I'm not up to snuff with JVM architecture or internals, but it would be very useful to be able to cross-compile Arua code as JVM and use JVM languages interoperably - perhaps even bridging that native interface gap by removing the need for explicitly different code units when having native functionality in JVM programs.
TODO
Numeric literals in Arua take a unique approach in terms of showing intent.
Arua aims to show intent. "Intent" can mean a few things, but with numerics (as well as all primitive types) we aim to convey how the number is to be used.
Some examples of the intent of numbers:
The point of intent is to represent these values as close to their intended size as possible.
All types in Arua can be boiled down to a single numeric type, or a collection of numeric types. There are three primitive types:
- `u`nsigned integer
- `i`nteger
- `f`loating point number
As well, each of the primitive types comes in a few collective or descriptive states:
As well, these types can be `typedef`'d (#3) to create new types.
In common languages, types such as `boolean` exist to express a single binary value. In Arua, it's simply `u1`. The type `u1` shows immediate intent, and has the added bonus of being easily packed and optimized if used within structures.
Another common type is `string`. Arua has native unicode support (#11) and exposes such functionality through `typedef`s of `[u8]`, `[u16]`, `[u32]`, and `[u64]` as `str8` (`alias`ed to `str`), `str16`, `str32` and `str64` respectively. This has the added bonus of allowing functions that take arrays of these types (or of any type) to also take strings, and allows the ability to index them using the subscript operator.
See #3 for a better explanation of `typeof` and `alias`.
There are three representations of a numeric literal:
- Basic notation (`12345.34`)
- Scientific notation (`144.3e76`)
- Radix notation (`0xDEADBEEF`)
Basic notation is your simple notation. It supports both integers and floats in the following formats:
1234
1234.567
.1234
Negative values are prefixed with a `-`:
-1234
-1234.567
-.1234
Scientific notation is similar to simple notation, but allows for either base-2 or base-10 exponents to be specified:
- `1234e15` - 1234 * 10 ^ 15
- `1234b24` - 1234 * 2 ^ 24
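Here is a short Python sketch of evaluating these literals, taking the prose at its word that `e` means a base-10 exponent and `b` a base-2 exponent. Negative exponents are not mentioned in the text, so this sketch deliberately rejects them. `parse_scientific` is a hypothetical name, and `Fraction` is used only to keep the arithmetic exact.

```python
import re
from fractions import Fraction

# mantissa, then `e` (base 10) or `b` (base 2), then a non-negative exponent
SCI_RE = re.compile(r"(-?\d*\.?\d+)([eb])(\d+)")


def parse_scientific(literal):
    """Return the exact value of a scientific-notation literal."""
    match = SCI_RE.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a scientific literal: {literal!r}")
    mantissa, marker, exponent = match.groups()
    base = 10 if marker == "e" else 2
    return Fraction(mantissa) * base ** int(exponent)


print(parse_scientific("1234e15"))
print(parse_scientific("1234b24"))
```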
Radix notation expands upon the classic hexadecimal notation to allow for any base to be used in place of the `0x` up to 36 (`[0-9][A-Z]`). `0x` is still treated as `16x`.
- `0xAA` / `16xAA` = 170
- `1x0000` = 4 (unary/tally system)
- `2x0110` = 6 (binary)
- `5x4311` = 581
- `8x666` = 438
- `10x123` = 123
- `20xAG33FB0` = 691710220
- `36xYZX1` = 1632853
All radix spaces with character domains containing letters (hexadecimal, etc.) require that such letters are uppercase. This is to disambiguate literal format specifiers (below).
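The radix examples can be checked with a few lines of Python. This is a sketch rather than a reference lexer: `parse_radix` is a hypothetical name, and the unary case is handled by counting digits, which is my reading of the `1x0000` = 4 example.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # uppercase only, per the text


def parse_radix(literal):
    """Evaluate a `<base>x<digits>` literal; `0x` is treated as `16x`."""
    prefix, separator, digits = literal.partition("x")
    if not separator or not digits or not prefix.isdigit():
        raise ValueError(f"not a radix literal: {literal!r}")
    base = 16 if prefix == "0" else int(prefix)
    if not 1 <= base <= 36:
        raise ValueError(f"unsupported base: {base}")
    if base == 1:
        # Unary/tally: the value is the number of marks.
        if set(digits) != {"0"}:
            raise ValueError("unary literals may only contain 0")
        return len(digits)
    value = 0
    for char in digits:
        digit = DIGITS.find(char)
        if digit < 0 or digit >= base:
            raise ValueError(f"digit {char!r} is invalid in base {base}")
        value = value * base + digit
    return value


for text in ["0xAA", "16xAA", "1x0000", "2x0110", "36xYZX1"]:
    print(text, "=", parse_radix(text))
```

Because the separator `x` is lowercase and digits must be uppercase, splitting on the first `x` stays unambiguous even for literals such as `36xYZX1`.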
Radix numbers cannot be negative; however, since signed numbers are two's complement they can be represented as negative by ensuring the first bit is set to `1` and the type specifier (below) is `i`.
Each numeric literal has an optional format suffix it can supply. In the event a format is not specified, one of two things occurs:
Literal formats consist of a type specifier character and a bit width.
The type specifiers are as follows:
- `i` - signed `i`nteger
- `u` - `u`nsigned integer
- `f` - `f`loating point number
As of now, preliminary concept implementations of numeric literals cap bit widths at 4096, as anything beyond that is simply absurd for classical computers (as opposed to, say, quantum computers). Bit widths must be greater than 0.
Floating point width specifiers must be one of `16`, `32`, `64`, or `128`.
Literal values and their suffixes are separated by a colon (`:`).
Some example numbers with their format suffixes:
1234:u16
0xDEADBEEF:u32
36xZY:u64
0.1:f128
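The suffix rules above are simple enough to sketch as a validator. The following Python is illustrative only: `split_literal` is a hypothetical name, and it checks just the constraints stated in this section (specifier `i`/`u`/`f`, width between 1 and 4096, float widths limited to 16/32/64/128) without interpreting the value part.

```python
import re

SUFFIX_RE = re.compile(r"([iuf])(\d+)")
FLOAT_WIDTHS = {16, 32, 64, 128}
MAX_WIDTH = 4096  # cap mentioned for preliminary implementations


def split_literal(literal):
    """Split `value:suffix` and validate the suffix; suffix may be absent."""
    value, separator, suffix = literal.partition(":")
    if not separator:
        return value, None
    match = SUFFIX_RE.fullmatch(suffix)
    if match is None:
        raise ValueError(f"malformed format suffix: {suffix!r}")
    kind, width = match.group(1), int(match.group(2))
    if not 0 < width <= MAX_WIDTH:
        raise ValueError(f"bit width out of range: {width}")
    if kind == "f" and width not in FLOAT_WIDTHS:
        raise ValueError(f"unsupported float width: {width}")
    return value, (kind, width)


for text in ["1234:u16", "0xDEADBEEF:u32", "36xZY:u64", "0.1:f128"]:
    print(split_literal(text))
```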
Currently, there are two builtins: `true` and `false`.
const true u1 = 1:u1
const false u1 = 0:u1
At first, the advantages of such extensive notations and width specification may not be clear. However, bitwise operations benefit greatly from such flexibility:
## Allow writes from
fn allowWrites(mode u16)
    return mode | 8x222:u9
    # -- or ---
    return mode | 2x010010010:u9
AruaDoc comment RFC #13
Unlike C-family languages, no longer do you have to guess or assert how big an integer is. Just use it how you need to and let the compiler optimize for you.
Some of these points are better described in the Bit-Field RFC at #6 (#6 (comment))
As well as semantic benefits, when numeric types are clustered together (e.g. in `struct`s), we can do some pretty extensive "tetris"-like packing optimizations for data that won't be persisted. It also gives us flexibility to optimize for size, or for speed, since we can perform some tricky alignment strategies or generate bitwise instructions in order to access those properties.
Since we perform these optimizations ourselves, we can then begin to generate C-family `struct` source code with bit-fields or other alignment optimizations in place to create compatible data structures with the same identifiers given to the properties to be compiled into existing code bases, allowing very flexible protocol implementations to be built for example.
Optimizations can also occur on systems with uncommon word sizes or systems that might provide better alignment strategies.
Unlike C, integer overflow and conversion are well defined.
The golden rule is to remember that type casting performs logic; assignment does not. Below are some examples and their C equivalents.
Signed to Unsigned (assignment):
foo i32 = -15
bar u32 = foo # 4294967281 - bits are preserved, now read as unsigned
int foo = -15;
unsigned int bar = *((unsigned int *)&foo); // 4294967281 - preserves sign bit but now read as unsigned integer
Signed to Unsigned (typecast):
foo i32 = -15
bar u32 = foo as u32 # 15
int foo = -15;
unsigned int bar = abs(foo); // 15 - abs() is declared in <stdlib.h>
Signed to Signed narrowing (assignment):
foo i32 = -15
bar i16 = foo # error - cannot narrow
Signed to Signed narrowing (typecast):
foo i32 = -70000
bar i16 = foo as i16 # -4464 - sign is preserved, and the magnitude is taken modulo 2 ** (bits - 1), i.e. 70000 % 32768
int foo = -70000;
short bar = (short) foo; // implementation-defined in C; typically -4464 with a 16-bit short
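The three conversions with documented results can be simulated in Python, whose integers are unbounded, by masking explicitly. This is a sketch under assumptions: the function names are hypothetical, and the narrowing cast is modelled as "keep the sign, reduce the magnitude modulo 2 ** (bits - 1)", which is the reading that reproduces the documented -4464.

```python
def reinterpret_unsigned(value, bits):
    """Assignment: keep the two's complement bit pattern, read as unsigned."""
    return value & ((1 << bits) - 1)


def cast_to_unsigned(value):
    """Typecast: performs logic; mirrors the abs() in the C equivalent."""
    return abs(value)


def narrow_signed_cast(value, bits):
    """Typecast to a narrower signed type, preserving the sign (assumed rule)."""
    magnitude = abs(value) % (1 << (bits - 1))
    return -magnitude if value < 0 else magnitude


print(reinterpret_unsigned(-15, 32))   # i32 -> u32 by assignment
print(cast_to_unsigned(-15))           # i32 -> u32 by `as u32`
print(narrow_signed_cast(-70000, 16))  # i32 -> i16 by `as i16`
```

For this particular input, plain C-style truncation to the low 16 bits yields the same -4464, so the example alone does not distinguish the two rules; other inputs would.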
I just had an idea that I know is in C/C++ that isn't mentioned in the spec.
Will Arua have support for bit fields? Or will that be controlled via the type specifically, as mentioned in the README and https://github.com/arua-lang/proposal/blob/master/arua.grammar#L9? Or will Arua have support for syntax like:
i32 varName : 1; // Fill 4 bytes
f16 floatName : 4; // Fill to 8 bytes
Or something to this effect?
I'm asking because I use bit fields quite a bit in C and find them helpful for controlling bit spacing in memory.
I want to enforce the use of `this.` in methods, but I recognize that it's very much a burden to type it out all the time.
However, Ruby has a similar syntactical approach, the `@` sign - which, when I used it in Coffeescript, I quite liked.
It's succinct, unique and easy for TextMate-based highlighters to show different colors for too.
TODO
This is a continuation of a comment stream on #3.
Considering the allowance of generic types on functions and/or types.
Both C++ and Java support per-function generics, so this wouldn't be anything new.
#include <memory>
template <typename T>
std::shared_ptr<T> makeNew() {
    return std::shared_ptr<T>(new T());
}
and
public static <T extends SomeType> T makeNewSomeType() {
    return new T(); // illustrative only: Java does not allow instantiating a type parameter
}
But the differences lie in how they're conveyed and how they're actually implemented. I think the result would be like a love-child between the two.
fn calculateChecksum<T>(object T) i64
    # ...
This works in a pretty obvious manner with inheritance-based languages, but trait-based languages are somewhat new (Rust tries to do it, but the syntax is incredibly unclear at first glance).
We have a few ways that one could go about this. Let's take string coercion (i.e. `toString()`) as an example.
With traits, we could easily create a `ToString` or `Stringable` trait within the standard library. Any type implementing this trait would then be able to coerce itself to a string. Where would we need generics here?
On second thought this might be a poor example; C++ does this in a semi-standard way by doing a global overload of `operator<<` on `basic_ostream`, and Java does this by using an `Object` method that can be overridden.
As of this writing I can't think of an example where generics would be more powerful than traits, or would solve a problem that traits simply cannot. I'd love to see some examples.
`type` and `alias` both create a new type identifier in the unit scope; however, `type` creates a new type whereas `alias` creates a new name for an existing type.
type [u8] as str8
`type` creates a 'copy' of an existing type and gives it a new name. New traits can be applied to new type definitions without affecting the original type.
As seen above, this is the critical element used by the `str8` type - a UTF-8 string. `str8` now inherits the array property `.length`, and allows more traits and methods to be applied to it without those being applied to the `[u8]` type itself.
Remember that array types are their own type - `[u8]` is not the same type as `u8` (see #12)
alias str8 as str
`alias` gives a new name to an existing type without creating a new type altogether. All applications of traits on `str` will affect `str8`.
The above example is what creates the built in `str` type - `str` is really just `str8`.
Here is a truth table for `type` vs `alias`:
`T as U` | `type` | `alias` |
---|---|---|
T is U | False | True |
U assignable to T | True | True |
(U.foo = fn()) == T.foo | False | True |
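The truth table can be mimicked loosely in Python, where binding a second name to a class behaves like `alias` and deriving a new class behaves a little like `type`. This is an analogy only: Python subclassing is inheritance, which Arua intends to avoid, and the `describe` method is invented here to stand in for "applying a trait".

```python
class Str8(list):
    """Rough analogue of `type [u8] as str8`: a distinct type built on list."""


Str = Str8  # rough analogue of `alias str8 as str`: same type, second name

# Attach a method through the alias; it lands on the one shared type.
Str.describe = lambda self: f"{len(self)} code units"

text = Str8([104, 105])

print(Str8 is list)            # type: T is U does not hold
print(Str is Str8)             # alias: T is U holds
print(isinstance(text, list))  # type: the new type is still usable as the base
print(hasattr(list, "describe"))  # type: the base type is unaffected
print(text.describe())
```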
TODO Both method visibility as well as unit visibility