OK, I've got a strawman proposal for how to handle modules that I'd like to get feedback on. Here are my high-level goals:
-
(0) The zeroth goal, as always, is to do something simple. Wren is tiny and minimal. Tiny tiny tiny.
-
(1) Avoid a monolithic global scope. If two unrelated modules both define the same name at the top level, I'd like that to not cause problems. "Include"-style module systems where importing a module basically means "run it in the top level scope" feel dirty to me. That being said, they do satisfy the zeroth goal.
-
(2) This is more a given than a goal, but the physical location of modules is up to the embedder. In Wren, you'd just specify a "logical" name for a module to import. How that gets mapped to the file system (or whatever) is up to the embedder. Of course, the command-line embedder that comes with Wren will implement a hopefully sane behavior for this.
-
(3) Handle shared imports. Main imports A and B which both import C. This should only cause C to be loaded and executed once. Since loading a module may have side-effects, I think this is a given.
-
(4) Handle circular imports. This is a bit more contentious in a scripting language. But statically typed languages almost invariably support this and it is handy. In Dart, I have cyclic imports all the time.
It's more important in a static language because you often need to import a module solely to be able to use its names in type annotations. Wren isn't statically typed, but I intend to write a static analysis tool for it. (Think like dialyzer for Erlang.) So I'd like to not rule out circular imports if possible.
-
(5) Still provide compile errors for undefined globals. Right now, an undefined name is detected and reported at compile time. I really like that. One easy way to handle circular imports is to just implicitly define globals. Then the order in which imports are loaded matters much less, as long as you don't use a name until after it's been handled.
But, if possible, I'd like to not do this. I like compile errors for my dumb typos.
-
(6) Let the importer control the name(s) bound from the imported module. The person defining the module doesn't know what other names may be in scope when their module is used. Only the consumer knows that, so ideally the consumer has some control here. This is why, for example, Python has "import as".
Strawman #1 - Lua-style module objects
Lua has a neat system. The require() function loads a script and returns the value returned by that script. You use it like:
local someModule = require('someModule')
So you control the local "prefix" used to access the module, and the module is really just an object. Super simple. In Lua, the object returned is usually a table. Wren isn't table-based, so instead this would naturally be an instance, or possibly a class object. Either works.
We'd probably want a little syntax sugar for imports, but as a first pass, we could start with that. Add a static method that loads a script, evaluates it, and gets the value it returns.
You bind the result of that to a local variable and you're good to go. For example, you could have a 2d vector module like:
class Vector {
  new(x, y) {
    _x = x
    _y = y
  }
  // ...
}
return Vector
I'm not sure what to hang the import method off of so, just for this doc, I'll hang it off IO. You could import the above module like:
var Vector = IO.import('vector')
var v = new Vector(1, 2)
For the case where a module exposes a single class, I think this works well. For modules that want to expose constants and other stuff, they'd basically have to hang them off the class as static getters.
Is that gross? Slow?
This trivially addresses goals 0, 1, 2 and has a nice solution for 6. It also addresses 5 with the caveat that you basically lose compile errors for imported "names". Since those become dynamically dispatched getters off the one imported instance, they aren't checked at compile time any more.
For example:
// vector.wren
class Vector {
  static origin { new Vector(0, 0) }
}
return Vector

// main.wren
var Vector = IO.import('vector')
Vector.origon
Here, we misspelled origin, but the compiler won't help. It does still catch name errors for the lexical names you define yourself.
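As a rough analogy only, here is the same typo in Python rather than Wren. Python never checks names at compile time, so this does not model Wren's compiler; it just shows the general shape of the problem, where a member looked up dynamically on an imported object only fails when the line actually runs. The `Vector` class and the `source` string are made up for illustration.

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def origin(cls):
        return cls(0, 0)


# The misspelled member access, kept as source text so that the compile
# step and the run step can be observed separately.
source = "Vector.origon"

# Compiling succeeds: nothing validates the attribute name at this stage.
code = compile(source, "main", "eval")

# The typo only surfaces when the expression is evaluated.
try:
    eval(code, {"Vector": Vector})
    error = None
except AttributeError as exc:
    error = exc
```

After this runs, `error` holds the `AttributeError` raised at evaluation time, which is the kind of late failure that goal 5 is trying to avoid.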
Goal 3, shared imports, is pretty simple. We just keep a table of objects returned by previously loaded modules and return the previous one if the same name is requested. That's what pretty much every language does.
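A minimal sketch of that table, written in Python rather than Wren so it can be run. Everything here is hypothetical: `ModuleTable`, `import_module`, and the `load_source` callback (standing in for the embedder's mapping from logical names to code, per goal 2) are illustration names, not part of any Wren API. Module bodies are modeled as plain functions that return the module's value.

```python
class ModuleTable:
    """Caches the value returned by each module the first time it runs."""

    def __init__(self, load_source):
        # load_source: embedder-supplied callable that maps a logical module
        # name to a function which executes the module and returns its value.
        self._load_source = load_source
        self._loaded = {}

    def import_module(self, name):
        # Reuse the previous result if this module has already been run.
        if name in self._loaded:
            return self._loaded[name]
        body = self._load_source(name)
        value = body(self)
        self._loaded[name] = value
        return value


run_log = []  # records every time a module body actually executes


def module_c(table):
    run_log.append("c")
    return {"name": "C"}


def module_a(table):
    run_log.append("a")
    return {"name": "A", "c": table.import_module("c")}


def module_b(table):
    run_log.append("b")
    return {"name": "B", "c": table.import_module("c")}


# Stand-in for the embedder: logical name -> module body.
SOURCES = {"a": module_a, "b": module_b, "c": module_c}

table = ModuleTable(SOURCES.__getitem__)
a = table.import_module("a")
b = table.import_module("b")
```

Here C's body runs once even though both A and B import it, and both end up holding the same object. Note that this version only records a module after its body finishes, so it does nothing to help with cycles.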
OK, what about circular imports?
Here's a pair of modules that should work:
// main.wren
var Other = IO.import('other')
class Main {
  static inMain {
    IO.print("inMain")
  }
}
Other.inOther
return Main
and:
// other.wren
var Main = IO.import('main')
class Other {
  static inOther {
    IO.print("inOther")
    Main.inMain
  }
}
return Other
What's the execution model here?
- Start executing main.wren.
- Get to the import and pause main.
- Start executing other.wren.
- Get to the import
At this point, Node can do something nice. It just returns an empty module object for main. When main is resumed, it will imperatively add stuff to that object instead of creating it. So it can just return the shell object. Then, by the time Other.inOther is called, main has added inMain to it and everything is fine.
We can't do that in Wren since classes are created monolithically. We'd have to do something tricky to break the cycle like have import('main') return a proxy, or have the VM track the global it gets assigned to and reassign that after main is done.
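To make the shell-object trick concrete, here is a small Python sketch of it. It models the behavior described above, not any real loader: `import_module`, `BODIES`, and the use of `SimpleNamespace` as the empty module object are all assumptions made for illustration. The key detail is that the shell is registered before the module body runs, and that members are looked up when called rather than when imported.

```python
from types import SimpleNamespace

loaded = {}   # module name -> shell object, possibly still being filled in
output = []   # stands in for printed output


def import_module(name):
    if name in loaded:
        # May be a partially populated shell if the module is mid-execution.
        return loaded[name]
    shell = SimpleNamespace()
    loaded[name] = shell      # register BEFORE running the body
    BODIES[name](shell)       # the body adds members to the shell
    return shell


def main_body(exports):
    other = import_module("other")
    exports.in_main = lambda: output.append("inMain")
    other.in_other()


def other_body(exports):
    main = import_module("main")   # receives main's still-empty shell

    def in_other():
        output.append("inOther")
        main.in_main()             # resolved at call time, so it exists by now

    exports.in_other = in_other


BODIES = {"main": main_body, "other": other_body}

import_module("main")
```

This works only because the module object is mutable and gets filled in piece by piece. A class that is created in one shot at the end of the module has nothing to hand out early, which is the problem the paragraph above describes.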
Strawman #2 - Module classes
We can't just jack Lua's solution into Wren without some kind of VM changes to handle cycles. Let's try adding a little bit of special sauce.
The basic idea is that each module implicitly gets a class created before the module is loaded. When you import the module, that class is the object that is returned by import().
The clever bit is that we let you add methods to it in the body of the module. I'm thinking something like:
// main.wren
var Other = IO.import('other')
module Main {
  static inMain {
    IO.print("inMain")
  }
}
Other.inOther
and:
// other.wren
var Main = IO.import('main')
module Other {
  static inOther {
    IO.print("inOther")
    Main.inMain
  }
}
The classes have been replaced with module, which would be a reserved word meaning "add these methods to the module's implicit class". We could even have the VM implicitly instantiate that class and return the instance so that you don't have to make everything static. Actually, no. Because often you would want your module to be a constructible class.
OK, so now the execution model is:
- Start executing main.wren.
- That creates a class for the main module and stores it in the module table.
- Get to the import('other') and pause main.
- Start executing other.wren.
- That creates a class for the other module and stores it in the module table.
- Get to the import('main'). Main is already in the module table, so just return it.
- Get to the module definition in other.
- Define the inOther method on other's module class.
- Finish other.
- Resume main.
- Get to the module definition in main.
- Define the inMain method on main's module class.
- Call Other.inOther. It's fine because it's defined now.
- That calls Main.inMain. Also fine.
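The steps above can be sketched in Python to check that the ordering really works out. This is a model under stated assumptions, not the VM design: `module_table`, `import_module`, `BODIES`, and the `trace` list are hypothetical names, an ordinary nested function call stands in for pausing one script while another runs, and `setattr` stands in for the `module` block adding methods to the implicit class.

```python
module_table = {}   # module name -> implicit module class
trace = []          # order in which things happen
output = []         # stands in for printed output


def import_module(name):
    if name not in module_table:
        # Create the implicit, still-empty class and store it BEFORE the
        # module body runs, so a circular import can find it.
        module_class = type(name.capitalize(), (), {})
        module_table[name] = module_class
        trace.append("create " + name)
        BODIES[name](module_class)
        trace.append("finish " + name)
    return module_table[name]


def main_body(module_class):
    Other = import_module("other")          # main is "paused" during this call

    def in_main():
        output.append("inMain")

    # Equivalent of the `module Main { ... }` block: add to the existing class.
    setattr(module_class, "in_main", staticmethod(in_main))
    trace.append("define in_main")
    Other.in_other()


def other_body(module_class):
    Main = import_module("main")            # already in the table, so returned as-is

    def in_other():
        output.append("inOther")
        Main.in_main()

    setattr(module_class, "in_other", staticmethod(in_other))
    trace.append("define in_other")


BODIES = {"main": main_body, "other": other_body}

Main = import_module("main")
```

The module classes keep their own names (`Main.__name__` is `"Main"`) no matter what variable the importer binds them to, which matches the point about stack traces and error messages.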
So... I think this works? The two pieces of VM infrastructure (aside from new stuff like the module table and import functionality) we need are:
- The ability to pause a script while we execute another.
- The ability to add new methods to an existing class.
Fibers already give us #1. And the VM already supports #2 internally. There's just no syntax to allow monkey-patching.
Another option is to actually add a syntax to monkey patch a class. Then, as long as you knew what the name of your module class was, you could just extend it:
// other.wren
var Main = IO.import('main')
extend class Other {
  static inOther {
    IO.print("inOther")
    Main.inMain
  }
}
Even though the importer controls the name of the variable that they bind the module class to, it's still important for the module class itself to have a name since it shows up in stack traces, error messages, etc.
Thoughts?