starlark's Introduction

Starlark

Overview

Starlark (formerly known as Skylark) is a language intended for use as a configuration language. It was designed for the Bazel build system, but may be useful for other projects as well. This repository is where Starlark features are proposed, discussed, and specified. It contains information about the language, including the specification. There are multiple implementations of Starlark.

Starlark is a dialect of Python. Like Python, it is a dynamically typed language with high-level data types, first-class functions with lexical scope, and garbage collection. Independent Starlark threads execute in parallel, so Starlark workloads scale well on parallel machines. Starlark is a small and simple language with a familiar and highly readable syntax. You can use it as an expressive notation for structured data, defining functions to eliminate repetition, or you can use it to add scripting capabilities to an existing application.

A Starlark interpreter is typically embedded within a larger application, and the application may define additional domain-specific functions and data types beyond those provided by the core language. For example, Starlark was originally developed for the Bazel build tool. Bazel uses Starlark as the notation both for its BUILD files (like Makefiles, these declare the executables, libraries, and tests in a directory) and for its macro language, through which Bazel is extended with custom logic to support new languages and compilers.

Design Principles

  • Deterministic evaluation. Executing the same code twice will give the same results.
  • Hermetic execution. Execution cannot access the file system, network, or system clock. It is safe to execute untrusted code.
  • Parallel evaluation. Modules can be loaded in parallel. To guarantee a thread-safe execution, shared data becomes immutable.
  • Simplicity. We try to limit the number of concepts needed to understand the code. Users should be able to quickly read and write code, even if they are not experts. The language should avoid pitfalls as much as possible.
  • Focus on tooling. We recognize that the source code will be read, analyzed, and modified by both humans and tools.
  • Python-like. Python is a widely used language. Keeping the language similar to Python can reduce the learning curve and make the semantics more obvious to users.

Tour

The following code provides an example of Starlark syntax:

# Define a number
number = 18

# Define a dictionary
people = {
    "Alice": 22,
    "Bob": 40,
    "Charlie": 55,
    "Dave": 14,
}

names = ", ".join(people.keys())  # Alice, Bob, Charlie, Dave

# Define a function
def greet(name):
    """Return a greeting."""
    return "Hello {}!".format(name)

greeting = greet(names)

above30 = [name for name, age in people.items() if age >= 30]

print("{} people are above 30.".format(len(above30)))

def fizz_buzz(n):
    """Print Fizz Buzz numbers from 1 to n."""
    for i in range(1, n + 1):
        s = ""
        if i % 3 == 0:
            s += "Fizz"
        if i % 5 == 0:
            s += "Buzz"
        print(s if s else i)

fizz_buzz(20)

If you've ever used Python, this should look very familiar. In fact, the code above is also valid Python code. Still, this short example shows most of the language. Starlark is indeed a very small language.

For more information, see:

Build API

The first use-case of the Starlark language is to describe builds: how to compile a C++ or a Scala library, how to build a project and its dependencies, how to run tests. Describing a build can be surprisingly complex, especially as a codebase mixes multiple languages and targets multiple platforms.

In the future, this repository will contain a complete description of the build API used in Bazel. The goal is to have a clear specification and precise semantics, in order to interoperate with other systems. Ideally, other tools will be able to understand the build API and take advantage of it.

Evolution

Read about the design process if you want to suggest improvements to the specification. Follow the mailing-list to discuss the evolution of Starlark.

Implementations, tools, and users

See the Starlark implementations, tools, and users page.

starlark's People

Contributors

0xflotus, adonovan, aeldidi, albinkc, austince, brandjon, clashthebunny, colindean, dslomov, eltociear, evie404, fmeum, hfm, illicitonion, jornh, junhan-z, kelvins, laurentlb, lgalfaso, mattbroussard, mpiekutowski, ndmitchell, nguyentruongtho, philwo, quarz0, stepancheg, tetromino, vladmos

starlark's Issues

Starlark mailing list is private or missing (part deux)

This is a duplicate of: #74

I'm only creating a new issue because the old issue is closed but people are still reporting this problem.

Here's what people see when they try to click the link:

This group either doesn't exist, or you don't have permission to access it. If you're sure this group exists, contact the Owner of the group and ask them to give you access.

Clarify which symbols are exported

A change in Bazel (bazelbuild/bazel#5636) affects which symbols from a Starlark module are exposed and can be loaded. More specifically, with this change, symbols that are loaded in a module are not exported (another module cannot load it).

@alandonovan asked that we clarify what is considered as "exported".

The behavior we're going to use in Bazel is: a global value is exported if:

  • it was initialized with = or def (but not load)
  • its name doesn't start with _.

Example:

load("module", "a")

b, c, _d = 2, 3, 4

def e(): pass
def _f(): pass

Symbols b, c, and e are exported. They can be loaded from another module.
Symbols a, _d, and _f are not exported.
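
The export rule above can be sketched in a few lines of Python. This is only an illustration: `exported_symbols` and its two parameters are hypothetical names, not part of Bazel or of any Starlark implementation.

```python
def exported_symbols(global_names, loaded_names):
    """Return the globals that another module could load under the rule above."""
    loaded = set(loaded_names)
    return [
        name
        for name in global_names
        if name not in loaded          # bound by load(), so not exported
        and not name.startswith("_")   # a leading underscore marks it private
    ]

# Globals of the example module, in definition order.
module_globals = ["a", "b", "c", "_d", "e", "_f"]
exported = exported_symbols(module_globals, loaded_names=["a"])
print(exported)  # ['b', 'c', 'e']
```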

Formatted string literals (PEP 498)

PEP 498 introduced formatted string literals which make this possible:

>>> name = "Star-Lord"
>>> f"Starlark, say hi to {name}"
'Starlark, say hi to Star-Lord'

It would be nice to have it as an alternative to the regular string formatting, especially since modern languages like Kotlin and Swift make this formatting convenient and less error-prone.

val name = "Star-Lord"
val greeting = "Starlark, say hi to $name"
let name = "Star-Lord"
let greeting = "Starlark, say hi to \(name)"

However, I can completely understand that sometimes a language should be simple. So this is just an idea.
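
For comparison, here is how the same greeting is written today with the regular string formatting that the proposal would complement. The snippet runs as plain Python; the variable names are illustrative only.

```python
name = "Star-Lord"

# Positional placeholder.
greeting = "Starlark, say hi to {}".format(name)

# Named placeholder, closer in spirit to the f-string form.
named_greeting = "Starlark, say hi to {who}".format(who=name)

print(greeting)
```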

spec: should print accept keyword arguments?

The spec https://github.com/bazelbuild/starlark/blob/master/spec.md#print says print accepts keyword arguments. This text came from the Go implementation, whose behavior it describes. The Java implementation rejects keyword arguments to print. We should decide whether we prefer the spec as is, or the Java implementation, and make all docs and code consistent with that. Neither version of Python allows kwargs, so probably the right thing to do is disallow it.

Add list.sort

The mutating list.sort method is occasionally useful for efficiency reasons.

In particular, under --incompatible_depset_is_not_iterable, there doesn't seem to be any way to go from a nested set to a sorted collection with less than 2 copies. One must write sorted(a_depset.to_list()).
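
The difference between the two operations, shown in Python (where list.sort already exists): sorted allocates a new list, while list.sort reorders the existing one. The list contents are made up for the example.

```python
values = [3, 1, 2]

# sorted() leaves the input alone and returns a fresh list (one extra copy).
sorted_copy = sorted(values)

# list.sort() reorders the list in place and returns None.
in_place_result = values.sort()

print(sorted_copy, values, in_place_result)
```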

Specify evaluation order of function arguments

Reported by @alandonovan

This program prints each argument in the call to f as it is evaluated.

def id(x):
        print(x)
        return x

def f(*args, **kwargs):
        print(args, kwargs)

f(id(1), id(2), x=id(3), *[id(4)], y=id(5), **dict(z=id(6)))

Its results vary across all implementations:

Python2: 1 2 3 5 4 6 (*args and **kwargs evaluated last)
Python3: 1 2 4 3 5 6 (positional args evaluated before named args)
Starlark-in-Java: 1 2 3 4 5 6 (lexical order)
Starlark-in-Go: crashes (github.com/google/skylark/issues/135)

The spec currently says nothing about argument evaluation order. The Starlark-in-Java semantics are the cleanest of the three but are the most complicated and least efficient to implement in a compiler, which is why the Python compilers do what they do. The problem with the Java semantics is that it requires the compiler to generate code for a sequence of positional arguments, named arguments, an optional *args, more named arguments, and an optional **kwargs, and to provide the callee with a map of the terrain. By contrast, both of the Python approaches are simple to compile: Python2 always passes *args and **kwargs at the end, and Python3 always pushes the *args list before the first named.

One way to finesse the problem without loss of generality would be to specify Python2 semantics but to statically forbid named arguments after *args, such as y=id(5) in the example. (This mirrors the static check on the def statement, which requires that *args and **kwargs come last.)
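
A rough sketch of such a static check, written against Python's own `ast` module rather than any Starlark parser. The function name `named_after_star_args` is hypothetical; a real implementation would live in the Starlark resolver.

```python
import ast

def named_after_star_args(source):
    """Return the names of keyword arguments that appear after *args in a call."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Source positions of every *expr argument in this call.
        stars = [(a.lineno, a.col_offset) for a in node.args
                 if isinstance(a, ast.Starred)]
        if not stars:
            continue
        first_star = min(stars)
        for kw in node.keywords:
            # kw.arg is None for **kwargs, which is allowed to come last.
            if kw.arg is not None and (kw.value.lineno, kw.value.col_offset) > first_star:
                offenders.append(kw.arg)
    return offenders

call = "f(id(1), id(2), x=id(3), *[id(4)], y=id(5), **dict(z=id(6)))"
offenders = named_after_star_args(call)
print(offenders)  # ['y']
```

Under the proposed rule, the call from the example would be rejected because of y=id(5), while x=id(3) would remain legal.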

Add "generator_function_location" attribute

Terminology:
generator = rule or macro.

Request:
Add a "generator_function_location" attribute that would contain the path of the .bzl file which defined the generator which generated the current rule. Line number is not really needed.

Justification:
The location of the .bzl file containing the generator together with the name of the generator results in a fully qualified name. If the generator_function_location attribute was added, it is possible to figure out that two rule instances were actually generated by different generators.

Eg.

foo/one.bzl: defines a rule "abc"
foo/two.bzl: defines a rule "abc"
foo/both.bzl:
load("//foo:one.bzl", one_abc = "abc")
load("//foo:two.bzl", two_abc = "abc")

bar/BUILD
load("//foo:both.bzl", "one_abc", "two_abc")
load("//foo:one.bzl", "abc")
abc(name="p", ...)
one_abc(name="q", ...)
two_abc(name="r", ...)

Without a generator_function_location it is not possible to know that the rules "p" and "q" were generated by the same rule, but "r" was not. Currently all three rules will have generator_function = "abc".

Request for feedback: Indented template/render literals

For my own purposes I require some sort of improved version of python fstrings, consider the following:

Render literal

def some_template(a):
  return ``
    <html>
      <p>echo `{a}</p>
    <html>
  ``

def another_template(cfile):
  return ``
    #! /bin/sh
    cc -c "`{cfile}"
  ``

sugar for

def some_template(a):
  return ["<html>\n  <p>echo ", a, "</p>\n<html>"]

def another_template(cfile):
  return ["#! /bin/sh\ncc -c \"", cfile, "\"\n"]

Postfix Render operators

html_escape ``
    <html>
      <p>echo `{a+"<script>..."}</p>
    <html>
  ``

sugar for

html_escape(["<html>\n  <p>echo ", a+"<script>...", "  </p>\n<html>"])

The key features being embedding of expressions and taking indentation of the code into account so code can be formatted nicely.
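
To make the indentation handling concrete, here is an approximation in today's Python using `textwrap.dedent`. It is not the proposed syntax, only a sketch showing that stripping the common indentation of the literal yields the same text as the desugared list of parts; the value of `a` is made up.

```python
import textwrap

a = "hi"

# The template as it would be written, indented along with the code.
template = """
    <html>
      <p>echo {a}</p>
    <html>
"""

# Remove the common leading indentation and the surrounding blank lines.
dedented = textwrap.dedent(template).strip("\n")
rendered = dedented.format(a=a)

# The desugared form from the proposal: literal fragments and values in a list.
parts = ["<html>\n  <p>echo ", a, "</p>\n<html>"]
joined = "".join(parts)

print(rendered == joined)  # True
```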

I personally think this would be extremely useful for generation of many types of text, especially code, and replace many uses of string % or format where the number of arguments grows.

The utility is also improved as you are not required to render to string (another one of my requirements).

So my questions are:

Is Starlark willing to break with Python for things like this? What is people's general feedback or thoughts on the idea?

Thanks for reading

Spec is unclear on / operator

The spec states that / is a binary operator (a Binop), but doesn't give it any meaning (it's not considered an Arithmetic operation).

It is included as an Augmented assignment (/=), but without behavioural specification.

These all appear in the grammar reference.

Should / be removed from the grammar, or should its behaviour be specified?
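
For reference, this is how Python 3 distinguishes the two division operators. Whether Starlark should give / the same meaning is exactly the open question; the snippet only documents the Python 3 behavior.

```python
floor_quotient = 7 // 2     # floored integer division
true_quotient = 7 / 2       # true division, always a float in Python 3
negative_floor = -7 // 2    # floors toward negative infinity

print(floor_quotient, true_quotient, negative_floor)  # 3 3.5 -4
```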

hash definition

Rust implementation:

$ target/debug/starlark-repl -c 'print(hash("ab"))'
7391266126405785627

Java implementation:

$ bazel run :Starlark -- -c 'print(hash("ab"))'
3105

Go implementation:

$ starlark -c 'print(hash("ab"))'
2230358275

For the Java implementation, the behavior is documented here: https://docs.bazel.build/versions/master/skylark/lib/globals.html#hash

Should other implementations use the same definition?
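
The Java result is consistent with the `java.lang.String.hashCode` formula (97 * 31 + 98 = 3105 for "ab"). Assuming that is the intended definition, a portable version can be sketched as follows. The name `java_string_hash` is hypothetical, and the sketch iterates over code points, so it only agrees with Java for strings without surrogate pairs.

```python
def java_string_hash(s):
    """32-bit signed hash: s[0]*31^(n-1) + ... + s[n-1], with wraparound."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF   # keep the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed int, as Java does.
    return h - (1 << 32) if h >= (1 << 31) else h

print(java_string_hash("ab"))  # 3105
```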

should string·splitlines argument be a bool or be interpreted as a bool?

The spec for string·splitlines reads:

The optional argument, keepends, is interpreted as a Boolean.

Python and starlark-rust require that keepends be a boolean. starlark-go interprets keepends as a boolean:

$ python2 -c "''.splitlines('0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: an integer is required
(exit 1)

$ python3 -c "''.splitlines('0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: an integer is required (got type str)
(exit 1)

$ starlark-go -c "''.splitlines('0')"

(exit 0)

$ starlark-rust -c "''.splitlines('0')"
error[CV02]: string.splitlines() expect a bool as first parameter whilegot a value of type string.
 --> [command flag]:1:1
  |
1 | ''.splitlines('0')
  | ^^^^^^^^^^^^^^^^^^ type string while expected bool

(exit 2)

Should we switch the spec to match the most common implementation, and then fix the Go implementation?
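
The two readings of the spec can be contrasted with a pair of small helpers. Both `strict_keepends` and `lenient_keepends` are hypothetical names used only to illustrate the choice; they are not taken from any implementation.

```python
text = "one\ntwo\n"
without_ends = text.splitlines()       # line terminators dropped
with_ends = text.splitlines(True)      # line terminators kept

def strict_keepends(value):
    """Reading 1: keepends must be a bool (Python, starlark-rust)."""
    if not isinstance(value, bool):
        raise TypeError("keepends must be a bool, got " + type(value).__name__)
    return value

def lenient_keepends(value):
    """Reading 2: keepends is interpreted as a Boolean (starlark-go)."""
    return bool(value)

print(without_ends, with_ends, lenient_keepends("0"))
```

Note that under the lenient reading the string '0' is truthy, which is a likely source of confusion.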

spec: clarify comprehension scope issues

The Go implementation of Starlark follows the Python3 semantics---specifically, the outermost for loop's sequence expression is evaluated outside the new lexical block---and its spec describes this in some detail (https://github.com/bazelbuild/starlark/blob/master/spec.md#comprehensions). By contrast, the Java implementation evaluates the for-operand sequence in the outer lexical block, which resembles Python2.

So:

$ cat a.star
x = [1]
dummy = [(1,)]
print([x for x in x])
print([x for x in x for _ in dummy])

$ python2 a.star
[1]
Traceback (most recent call last):
  File "a.star", line 4, in <module>
    print([x for x in x for _ in dummy])
TypeError: 'int' object is not iterable

$ python3 a.star
[1]
[1]

$ starlark a.star # Go
[1]
[1]

$ blaze-bin/third_party/bazel/src/main/java/com/google/devtools/starlark/Starlark a.star
[1]
Traceback (most recent call last):
        File "", line 4
                print([x for x in x for _ in dummy])
        File "", line 4, in print
                x
local variable 'x' is referenced before assignment.

We should decide on the semantics and make the implementations agree. I propose we aim for the Python3-compatible behavior, for "least surprise". (The difference is entirely in the name resolver.)
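
The Python3-compatible behavior proposed here can be checked directly in Python 3, using the same program as above with the results captured in variables:

```python
x = [1]
dummy = [(1,)]

# The outermost iterable is evaluated in the enclosing scope, so the inner
# x (the loop variable) never shadows the list being iterated.
first = [x for x in x]
second = [x for x in x for _ in dummy]

print(first, second, x)  # [1] [1] [1]
```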

inconsistent acceptance of trailing commas in argument lists

Copying @josharian's post at google/starlark-go#83:

$ starlark
Welcome to Starlark (go.starlark.net)
>>> def f(*args, **kwargs): pass
... 
>>> f(1,)
>>> f(*(1,2),)
... 
<stdin>:1:11: got ')', want argument
>>> f(a=1,)
>>> f(**{"a": 1},)
... 
<stdin>:1:15: got ')', want argument
>>>  

Observe that for plain positional and keyword args, a trailing comma is accepted. The other two cases are not.

Python 2 behaves the same as starlark-go; Python 3 accepts all forms.

I prefer the Python 3 behavior.

$ python2
Python 2.7.15 (default, Jun 17 2018, 12:46:58) 
>>> def f(*args, **kwargs): pass
... 
>>> f(1,)
>>> f(*(1,2),)
  File "<stdin>", line 1
    f(*(1,2),)
             ^
SyntaxError: invalid syntax
>>> f(a=1,)
>>> f(**{"a": 1},)
  File "<stdin>", line 1
    f(**{"a": 1},)
                ^
SyntaxError: invalid syntax
$ python3
Python 3.7.1 (default, Nov 14 2018, 12:09:13) 
>>> def f(*args, **kwargs): pass
... 
>>> f(1,)
>>> f(*(1,2),)
>>> f(a=1,)
>>> f(**{"a": 1},)

I haven't checked yet what the spec mandates.
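
The Python 3 behavior can also be checked programmatically by compiling each call form without running it. The helper name `parses` is made up for this sketch.

```python
def parses(source):
    """Return True if source is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

calls = ["f(1,)", "f(*(1, 2),)", "f(a=1,)", 'f(**{"a": 1},)']
results = {call: parses(call) for call in calls}
print(results)
```

On Python 3 every entry is True, matching the interactive session above.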

spec: hash and freeze

The spec is self-contradictory on the interaction of hashing and freezing. Using language from the documentation of the Go implementation, it says this:

Lists are not hashable, so may not be used in the keys of a dictionary.

but differing from the Go implementation it also says:

Most mutable values, such as lists, and dictionaries, are not hashable, unless they are frozen.

It would be easy to change the first line to match the second, but I think this would be a mistake.

There are two reasons Python disallows hashing of lists and dicts.

The first is that they are mutable, so any non-trivial hash function consistent with == would have to reflect mutations. If a mutable key is inserted into a dict then the key is mutated, the dict would no longer appear to contain the key (even given the identical reference) because its hash no longer matches the one saved in the table. Starlark finesses this problem by allowing hashing only of frozen values.

The second reason is that mutable data structures may be cyclic. A list may contain itself, for example, which means the hash computation must detect cycles to avoid getting stuck in an endless loop:

$ Skylark # in Java
>> x = [None]; x[0] = x; {x: None}
Exception in thread "main" java.lang.StackOverflowError
        at java.util.ArrayList$Itr.<init>(ArrayList.java:852)
        at java.util.ArrayList.iterator(ArrayList.java:841)
        at java.util.AbstractList.hashCode(AbstractList.java:540)
        ...etc ad infinitum...

In order to implement cycle detection, the hash method of a Starlark value must pass a parameter which contains all references to mutable objects in the hash operation's current depth-first visitation stack. The implementation of the hash method for a List x must first inspect this stack to ensure that it does not contain x, then push x on the stack, compute the hash of the elements, and then pop x. If the stack does contain x, the hash method should return an error, or alternatively return a constant.

This is feasible. It's reasonably efficient, in that hashing lists is far from the norm and the stack depth is unlikely to be large. But it cannot be implemented in Java using the existing hashCode method without Java thread-local storage. Go has no thread-local storage, so it would require a major incompatible API change to the Hash method to add a new parameter. (An alternative implementation is to limit the visitation depth to some constant k, but it doesn't materially change the problem.)
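
A minimal sketch of the cycle-detecting hash described above, in Python. `structural_hash` and its explicit `stack` parameter are hypothetical; the mixing function is arbitrary, and freezing is ignored entirely.

```python
def structural_hash(value, stack=None):
    """Hash lists by content, failing on cycles via a visitation stack."""
    if stack is None:
        stack = []
    if isinstance(value, list):
        # Identity check: is this very list already being hashed?
        if any(seen is value for seen in stack):
            raise ValueError("cannot hash a cyclic list")
        stack.append(value)
        try:
            h = 17
            for element in value:
                h = (31 * h + structural_hash(element, stack)) & 0xFFFFFFFF
            return h
        finally:
            stack.pop()
    return hash(value) & 0xFFFFFFFF

cyclic = [None]
cyclic[0] = cyclic
try:
    structural_hash(cyclic)
    cycle_detected = False
except ValueError:
    cycle_detected = True

print(cycle_detected)  # True
```

The extra `stack` parameter is the API change the text refers to: it has to be threaded through every nested hash call.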

It seems like an ugly feature. How important is it?

Forbid C-style octal notation (0123)

This change was discussed a long time ago within the Bazel team.

The notation 0123 is error-prone. In many languages 0123 == 123 (e.g. C#, F#, OCaml, Perl6, etc.). In many other languages, 0123 == 83.

The proposed fix is to forbid this notation. This would align with Python3:

$ python3 -c 'print(0123)'
  File "<string>", line 1
    print(0123)
            ^
SyntaxError: invalid token

Users should prefer the more explicit octal notation, i.e. replace 0123 with 0o123 (which is already accepted by all Starlark implementations).
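
The Python 3 behavior that the proposal aligns with can be verified without triggering the syntax error at import time, by compiling the literals from strings. `is_valid_literal` is a made-up helper name.

```python
def is_valid_literal(text):
    """Return True if text compiles as a Python 3 expression."""
    try:
        compile(text, "<literal>", "eval")
        return True
    except SyntaxError:
        return False

explicit_octal = 0o123  # unambiguous octal notation

print(explicit_octal, is_valid_literal("0123"), is_valid_literal("0o123"))
# 83 False True
```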

Keyword-only arguments

Keyword-only arguments are described in a Python PEP: https://www.python.org/dev/peps/pep-3102/

Examples:

def compare(a, b, *, key=None): ...

def sortwords(*wordlist, case_sensitive=False): ...

It is supported in the Java implementation. There was a desire to remove this feature, to keep Starlark a subset of Python 2 and Python 3 syntax. However, this feature is useful and often requested:

  • Many builtins use it
  • Documentation in Bazel uses that syntax
  • Macros in Bazel shouldn't allow positional arguments, for maintenance reasons

This doesn't conflict with #13 (comment), because #13 concerns only the call sites. The current issue concerns only the definition sites.
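
A short runnable illustration of what the bare * buys at the definition site, using the compare signature from the PEP. The function body is invented for the example.

```python
def compare(a, b, *, key=None):
    """Three-way comparison; key can only be passed by name."""
    if key is not None:
        a, b = key(a), key(b)
    return (a > b) - (a < b)

by_length = compare("aaa", "b", key=len)

# Passing key positionally is rejected at call time.
try:
    compare("aaa", "b", len)
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(by_length, positional_rejected)  # 1 True
```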

spec: allow parentheses on LHS of augmented assignment?

Consider:

a = 1
(a) += 2

Python 2 and 3 both accept this, yielding a == 3.

Starlark rejects it. The spec (over in the Go implementation repo) says of augmented assignments: "The left-hand side must be a simple target: a name, an index expression, or a dot expression."

Should Starlark accept this code, to better match Python?

I originally filed this at google/starlark-go#25, in response to the Go implementation panicking when evaluating this code.
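
The Python behavior can be confirmed by executing the two statements from a string, which keeps the unusual syntax out of the surrounding file:

```python
namespace = {}
# Same program as above: a parenthesized name on the left of +=.
exec("a = 1\n(a) += 2", namespace)
result = namespace["a"]
print(result)  # 3
```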

Expose File.is_directory

File.is_directory has been around for a while but is undocumented (and therefore unsupported). Yet directories (internally, TreeArtifacts) have increasing support in the Starlark action API. In particular, if they are passed to ctx.actions.args objects they are now (pending an upcoming release) automatically expanded to their contents.

Allowing users to easily inspect whether a File is in fact a directory makes sense, given that they are expected to behave differently in such situations. (At the same time, most code shouldn't care whether something's a directory or not.)

spec: is dict[k] an error if k is unhashable?

The Go implementation of Starlark, following Python2 and 3, rejects a dictionary operation in which the key is unhashable:

$ starlark -c '{}.get([])'
Traceback (most recent call last):
  cmdline:1:7: in <toplevel>
  <builtin>: in get
Error: get: unhashable type: list

$ python2 -c '{}.get([])'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unhashable type: 'list'

$ python3 -c '{}.get([])'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unhashable type: 'list'

By contrast, the Java implementation permits a lookup with any value, including unhashable ones. The lookup fails, though it is not necessarily an error, so execution may continue:

$ blaze-bin/third_party/bazel/src/main/java/com/google/devtools/starlark/Starlark  
>> {}.get([])
None

(This behavior occurs even when the dict is non-empty, so it can't be explained as the implementation taking a shortcut for empty dicts.)

Clearly, the Java implementation is in fact hashing the key, so the error message ("unhashable type") issued by an update operation such as {}.update([([], 1)]) seems not to tell the whole story.

I think the spec should state that all dict operations attempt to hash the key (even when unnecessary, such as {}.get(k)) and fail if the key is unhashable.
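
In Python terms, the proposed rule means a lookup with an unhashable key fails even when the dict is empty. A small check, with `lookup_is_rejected` as a made-up helper name:

```python
def lookup_is_rejected(key):
    """Return True if {}.get(key) raises because key is unhashable."""
    try:
        {}.get(key)
        return False
    except TypeError:
        return True

print(lookup_is_rejected([]), lookup_is_rejected((1, 2)))  # True False
```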

support dict concatenation: dict | dict

Bazel has been supporting dictionary concatenation with the + operator for a long time. Python doesn't.

We've deprecated the operator for a long time, mostly because of issues in Skydoc. But the original Skydoc is going away, and it's unclear to me if we should proceed with the removal of the operator (as we rely less and less on Python).

bazelbuild/bazel#6461

Any opinion?

dict.update should return None, not self

In .bzl files, the expression dict(a=1).update(b=2) evaluates to {"a": 1, "b": 2} but according to
https://github.com/bazelbuild/starlark/blob/master/spec.md#dictupdate
it should return None.

Both Pythons and Starlark-in-Go return None:

$ python # 2 or 3

x = {}
x.update(k=1)
x
{'k': 1}

$ starlark # in Go

x = {}
x.update(k=1)
x
{"k": 1}

The Java standalone interpreter gives a strange error --- I have no idea what it means:
% blaze-bin/third_party/bazel/src/main/java/com/google/devtools/skylark/Skylark

x = {}
x.update(k=1)
parameter 'other' has no default value, in method call update(int k) of 'dict'

Googlers: see L23 of dart_dev_ctx.bzl in cr/223363742 for a real-world example.
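
The behavior the spec asks for, demonstrated in Python: update mutates the receiver and the call itself evaluates to None.

```python
x = {"a": 1}
result = x.update(b=2)   # mutates x in place

print(result, x)  # None {'a': 1, 'b': 2}
```

Code that relies on the return value, such as chaining dict(a=1).update(b=2) into another expression, would therefore break once the Bazel behavior is aligned with the spec.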

spec: statically reject repeated keyword arguments in call

I notice that both Python2 and Python3 statically reject repeated keyword arguments in a call:

% python2 -c 'if False: f(a=1, a=1)'
  File "<string>", line 1
SyntaxError: keyword argument repeated
% python3 -c 'if False: f(a=1, a=1)'
  File "<string>", line 1
SyntaxError: keyword argument repeated

Perhaps Starlark should also. Neither the Go nor Java implementation currently does.

(Note: this issue is independent of the question of whether repeated arguments are checked dynamically when a **kwargs is present in the call.)
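
The static nature of the Python check can be seen by compiling, not running, a call with a repeated keyword. `compiles` is a hypothetical helper name.

```python
def compiles(source):
    """Return True if source compiles; nothing is executed."""
    try:
        compile(source, "<call>", "exec")
        return True
    except SyntaxError:
        return False

repeated = compiles("if False: f(a=1, a=1)")
distinct = compiles("if False: f(a=1, b=1)")
print(repeated, distinct)  # False True
```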

Skylark's min() doesn't support the key parameter

I'm using these hacks to pull in PyPI libraries for a Python project. Yesterday, I added in py.test (a common testing framework) as a dependency, but ran into issues because they have several setup.pys in their repo, and the hack just picks the first one from an arbitrary ordering.

So, I was fixing that, and part of it involved using min() with the key keyword argument. Unfortunately, that doesn't appear to be supported in Skylark, as it produces an "unexpected keyword 'key' in call to min(*args)" error. I'll probably just work around this for now, but it'd be nice to have support for it.

I think this is the chunk of code that defines the function. I considered trying to hack out a patch for it, but first I'd like to know if that's something that would be accepted, or if that's the kind of thing that's intentionally restricted. I'm also not really sure how much work it'd be, since I'm not familiar at all with the Bazel codebase.
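
For illustration, here is the Python call that was wanted, next to one possible workaround that avoids the key argument by comparing (key, value) tuples. The paths are invented; they are not the ones from the original project.

```python
paths = ["pkg/sub/setup.py", "setup.py", "pkg/setup.py"]

# What the reporter wanted to write.
shortest = min(paths, key=len)

# Workaround without key: decorate, take the minimum, undecorate.
shortest_no_key = min([(len(p), p) for p in paths])[1]

print(shortest, shortest_no_key)  # setup.py setup.py
```

The two forms can differ on ties: the tuple version breaks ties by comparing the strings, while key=len keeps the first match.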

spec: make reversed return an iterable instead of a list?

[originally discussed at https://github.com/google/starlark-go/issues/127; it may be fruitful to read that issue first]

The spec says:

reversed(x) returns a new list [...]

Python 2 and 3 made a different decision: there, reversed returns an iterable instead.

This affords possible performance optimizations by doing work lazily. See
google/starlark-go#127 (comment) for an example.

On the other hand, starlark lacks next, which means that making reversed return an iterable removes the most convenient idiom for getting the last item of an iterable: In Python, it is next(reversed(x)), whereas for starlark it is reversed(x)[0].

One conservative decision is to change the spec to return an iterable, even if in practice most implementations return lists. That makes it possible to add optimizations down the line.

Discuss. :)
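
The two idioms mentioned above, side by side in Python. The second one works whether reversed returns a list or a lazy object, as long as the result is materialized first.

```python
items = [1, 2, 3]

# Python: reversed() is lazy, so next() fetches the last item cheaply.
last_via_next = next(reversed(items))

# Portable form: materialize, then index.
last_via_list = list(reversed(items))[0]

returns_list = isinstance(reversed(items), list)
print(last_via_next, last_via_list, returns_list)  # 3 3 False
```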

RFE: document differences from Python

As reported by google/starlark-rust#10, Starlark does not support bool * string, contrary to Python. There are two issues in the spec that led to that bug:

  • The specification describes part of a type's behavior (in this case string) in generic sections (e.g. arithmetic operations) and the rest in a type-specific section (e.g. string). I would recommend adding cross-references in the string section pointing at all the sections that one needs to remember when implementing the type, or simply repeating the generic material in those sections.
  • The specification does not contain any warning about differences from Python (in this case, that the operation string * bool should not be supported).
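
The Python behavior that Starlark deliberately does not share is easy to show. In Python, bool is a subclass of int, so multiplying a string by a bool is legal:

```python
repeated_true = True * "ab"     # True behaves like 1
repeated_false = "ab" * False   # False behaves like 0

print(repr(repeated_true), repr(repeated_false))  # 'ab' ''
```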

Centralize conformance tests in this repository

Hi,

The Go, Rust, and Java implementations each have their own integration tests that test different features. When writing the integration tests for the Rust implementation I basically took the Java and the Go ones and made some variations: the error messages were not the same, the feature set was not exactly the same (when importing the Go one), and the test framework was not exactly the same (assert_eq, fails, ...).

It would be great to standardize those test files and have them in this repo so that changes to the language would be reflected here easily.

spec: unclear about type of parameter for reversed, enumerate, extend

The spec says reversed(x) "returns a new list containing the elements of the iterable sequence x in reverse order."
And here it says "Dictionaries are iterable sequences, so they may be used as ...".

Yet the Java implementation (like python) rejects dictionaries as arguments to reversed, while the other implementations (go, rust) accept them.

The same with enumerate and list.extend where they take an iterable sequence, yet Java rejects dictionaries, while Go and Rust accept them. (Python also accepts dictionaries passed to enumerate and extend, unlike reversed).
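
For the two functions where Python accepts a dictionary, the result is based on the keys in insertion order. The dictionary below is made up for the example.

```python
d = {"a": 1, "b": 2}

# enumerate iterates over the keys.
pairs = list(enumerate(d))

# list.extend appends the keys.
keys = []
keys.extend(d)

print(pairs, keys)  # [(0, 'a'), (1, 'b')] ['a', 'b']
```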

Drop Python 2 syntax compatibility

Starlark syntax was designed as a subset of both Python 2.7 and Python 3.

Why?
Historically, Google used Python 2 for evaluating Bazel files. Starlark was designed such that Google could migrate the files from Python 2 to Starlark. Starlark had to remain a subset of Python 2, so that other tools could continue handling the files.

Things have changed in the last few years. Most tools now use a proper Starlark parser. There are still Python-based tools, but most of the world is moving (or has moved) to Python 3... even inside Google.

Reasons to drop the Python 2 compatibility include:

I suggest we drop the Python 2 syntax compatibility.
Any objection?

String escapes

The specification doesn't describe string literals.
I tried this code. Each implementation of Starlark has a different result.

print("\a")
print("\b")
print("\c")
...
print("\z")

TODO:

  • Agree on the escapes we handle.
  • What shall we do with unknown escapes (e.g. \z)? All implementations and Python return two characters \ and z. This behavior will prevent us from supporting more escapes in the future (as it would break users).

The fact that the \ character is sometimes used as an escape, and sometimes as a literal character is confusing.

Example of bug found in user code:

s = s.replace("'", "\'").replace('"', '\"')
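
Why this is a bug: in Python, "\'" is just a one-character string containing a quote, so the replace call changes nothing. The intended escaping needs a doubled backslash. The sample string is made up.

```python
s = "it's"

# "\'" == "'", so this replaces a quote with the same quote.
unchanged = s.replace("'", "\'")

# The backslash must itself be escaped to appear in the output.
escaped = s.replace("'", "\\'")

print(unchanged == s, escaped)  # True it\'s
```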

spec: remove the restriction on if/for at toplevel

The Starlark-in-Go implementation performs a static check that a program doesn't contain if or for statements at top level (outside any function), and I added language to this effect to spec.md, from which the current Skylark spec doc is derived. It occurs to me that this check is unnecessary, and I propose we get rid of it. Removing it would not change any fundamental invariants of the interpreter.

Of course, Bazel can (and must) continue to implement this check on the syntax of BUILD and .bzl files prior to executing them; the necessary logic is simple and efficient. In effect, we would be moving the check from core Starlark to the Bazel build language dialect, similar to the way that the build language disallows 'def' at top-level in a BUILD file.

Not every application that embeds Starlark wants the restriction, and I have encountered several for which it is undesirable.
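
To show how little logic the check needs when moved into the embedding application, here is a sketch using Python's `ast` module as a stand-in for a Starlark parser. `toplevel_control_flow` is a hypothetical name, and the sample program is invented.

```python
import ast

def toplevel_control_flow(source):
    """Return (line, kind) for each if/for statement outside any function."""
    found = []
    for stmt in ast.parse(source).body:   # only direct children of the module
        if isinstance(stmt, ast.If):
            found.append((stmt.lineno, "if"))
        elif isinstance(stmt, ast.For):
            found.append((stmt.lineno, "for"))
    return found

program = "\n".join([
    "def f(n):",
    "    for i in range(n):",   # inside a function: allowed
    "        print(i)",
    "",
    "if True:",                 # at top level: reported
    "    f(3)",
])
violations = toplevel_control_flow(program)
print(violations)  # [(5, 'if')]
```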

ToolchainInfo Schema

This proposal came out of a discussion about best practices for ToolchainInfo objects. I don't think we have a need to prioritize implementing this at the moment, but we can at least approve the design if there aren't objections.

PR #11.

Add a built-in function similar to fail/assert

Related: google/starlark-go#208 (review)

The Go implementation currently lacks a function similar to fail() which is implemented in both Starlark Java and Rust. fail() is also used in both their test runners (java, rust) as they define assertion functions in every test and use it to throw errors when those assertions fail.

Go, on the other hand, uses a built-in assert module for test cases, which includes assert.eq(), assert.fails(), and similar.
I think that a function similar to fail, or a more general one such as assert function/keyword would be useful to remove these inconsistencies. A function similar to assert for example could be used in the test cases of all 3 implementations and do the job of fail().
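The relationship between the two styles can be sketched as follows, in Python rather than Starlark. The signature of fail and the helper names assert_eq and assert_fails are assumptions made for this illustration; they are not the API of any of the three implementations. The point is only that assertion helpers can be layered on a single fail-like primitive.

```python
def fail(*args, sep=" "):
    """Abort evaluation with an error message built from args."""
    raise RuntimeError(sep.join(str(a) for a in args))

def assert_eq(got, want):
    """Fail unless got equals want."""
    if got != want:
        fail("got %r, want %r" % (got, want))

def assert_fails(fn, substring):
    """Fail unless fn() fails with a message containing substring."""
    try:
        fn()
    except RuntimeError as e:
        if substring not in str(e):
            fail("error %r does not contain %r" % (str(e), substring))
        return
    fail("expected failure, but call succeeded")

assert_eq(1 + 1, 2)
assert_fails(lambda: assert_eq(1, 2), "got 1, want 2")
```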

comprehensions temporarily rebind global variables

Yet another scoping bug in the Java implementation: comprehensions save and restore the existing binding but in between they clobber it, and this clobbering is visible:

$ cat a.star
x = 1
def f():
    return x
print([f() for x in [2]])

$ python2 a.star
[2]
$ python3.8 a.star
[1]
$ starlark a.star
[1]
$ blaze-bin/third_party/bazel/src/main/java/com/google/devtools/starlark/Starlark a.star
[2]

The solution is static scope resolution.

A related comprehension scoping issue: #84

spec: convergence with Go

The Go implementation has a list of remaining differences from the Java implementation: https://github.com/google/starlark-go/blob/master/doc/spec.md#dialect-differences
I'd like us to finish the wording of a spec that we can all be happy with, even if that spec allows for some differences among implementations. I'll go through the list of differences point by point:

  • multiprecision integers: the spec should require that integer precision be sufficient to represent uint64 and int64 values without loss, as these are required for correct handling of protocol buffers, among other things. Obviously Bazel has no need for larger integers so it would be fine not to implement it for now, but it should be described as a limitation of the implementation.

  • floating point: for the same reason, lossless handling and arithmetic on float64 values must also be supported. (On this and the above point I think we were all agreed based on a meeting in NYC about 18 months ago.) Bazel has no need of floating-point at all, so again, we can state that this is a limitation of the Java implementation.

  • bitwise operators should be supported. They are fundamental operations on integers in every machine and programming language. Bazel may not need them, but many other uses do (anything that uses protocol buffers, for example.)

  • strings: we cannot realistically require a particular string encoding (UTF-8 or UTF-16) without imposing intolerable costs on implementations whose host language uses the opposite encoding. I propose we specify strings in terms of code units without specifying the encoding; UTF-8 and UTF-16 are only quantitatively different in that sense. However this does leave the Java implementation without a data type capable of representing binary data.

  • strings should have methods elem_ords, codepoint_ords, and codepoints. I think there was agreement on this point but the Java implementation was lagging.

  • A language needs some way to encode a Unicode code point as a string (and vice versa). One way to do this is the Go impl's chr and ord built-in functions. (Related: the "%c" formatting operator, which is like "%s" % chr(x).)

  • The Go impl permits 'x += y' rebindings at top level. I think it should probably match the Bazel implementation (which rejects them), but the whole no-global-reassign feature should be specified as a dialect option, since no client other than Bazel wants it.

  • The Go implementation treats assert as a valid identifier. Indeed, it uses it widely throughout its own tests. The cost of specifying this would be that tools (such as Bazel tests) that use the Python parser will not be able to parse Starlark files that use 'assert' as an identifier. Given that using Python in this way is a hack, and that files containing assert will be vanishingly rare in the Bazel test suite, that doesn't seem like a problem.

  • The Go impl's parser accepts unary + expressions for parity with every other ALGOL-like language. A + operator forces a check that its operand is numeric, and occasionally makes code more readable. I think the spec should include it.

  • In the Go impl, a method call x.f() may be separated into two steps: y = x.f; y(). I think work is underway to support this in the Java impl too. I recall we were at least agreed it was the right thing.

  • In the Go impl, dot expressions may appear on the left side of an assignment: x.f = 1. This is a parser issue---in Bazel, there are no mutable struct-like data types for which this operation would succeed, but other applications may need it (esp. if they use protocol buffers), so the grammar should support it nonetheless.

  • In the Go impl, the hash function accepts operands besides strings, as in Python. It should be an easy fix to the Java implementation to do so too.

  • The Go impl's sorted function accepts the additional parameters key and reverse. These make it easier to define an alternative order without the effort and unnecessary allocation of the decorate/sort/undecorate trick and a separate call to reverse.

  • The Go impl's type(x) returns "builtin_function_or_method" for built-in functions. This is the string Python uses. I don't have a strong feeling about the particular string, but the crucial thing is that builtin- and Starlark-defined functions must have distinct types because they support different operations. For example, in Bazel, the rule.outputs mechanism requires that its operand be a Starlark function so that its parameter names can be retrieved; this is impossible with a built-in function.
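A few of the points above can be illustrated with a short sketch. It is written in Python, on the assumption that the proposed Starlark behavior matches Python's for these particular operations; the variable names and sample values are illustrative only.

```python
words = ["pear", "fig", "banana"]

# Decorate/sort/undecorate, then a separate reversal.
decorated = sorted([(len(w), w) for w in words])
dsu_result = list(reversed([w for _, w in decorated]))

# The same ordering using the key and reverse parameters of sorted.
key_result = sorted(words, key=len, reverse=True)

# A method call separated into two steps: y = x.f; y().
split = "a,b".split
parts = split(",")

# Converting between code points and strings.
letter = chr(65)
code = ord("A")
formatted = "%c" % 65

# Bitwise operators on an integer at the top of the uint64 range.
max_uint64 = (1 << 64) - 1
low_byte = max_uint64 & 0xFF

print(dsu_result == key_result, parts, letter, code, formatted, low_byte)
```

The two sorting approaches agree here because the sample words all have distinct lengths; with ties, reversing a stable ascending sort and sorting with reverse order the tied elements differently.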
