proposal-defer-import-eval's Introduction

Deferring Module Evaluation

previously known as "Lazy Module Initialization"

Status

Champion(s): Yulia Startsev and Nicolò Ribaudo

Author(s): Yulia Startsev, Nicolò Ribaudo and Guy Bedford

Stage: 2

Slides:

Background

JS applications can get very large, to the point that not only loading, but even executing their initialization scripts incurs a significant performance cost. Usually, this happens later in an application's life span - often requiring invasive changes to make it more performant.

Loading performance is a big and important area for improvement, and involves preloading techniques for avoiding waterfalls and dynamic import() for lazily loading modules.

But even with loading performance solved using these techniques, there is still overhead for execution performance - CPU bottlenecks during initialization due to the way that the code itself is written.

Motivation

Avoiding unnecessary execution is a well-known optimization in the Node.js CommonJS module system, where there is a smaller gap between load contention and execution contention. The common pattern in Node.js applications is to refactor code to dynamically require as needed:

const operation = require('operation');

exports.doSomething = function (target) {
  return operation(target);
}

being rewritten as a performance optimization into:

exports.doSomething = function (target) {
  const operation = require('operation');
  return operation(target);
}

The consumer still is provided with the same API, but with a more efficient use of FS & CPU during initialization time.

For ES modules, we have a solution for the lazy loading component of this problem via dynamic import().

For the same example we can write:

export async function doSomething (target) {
  const { operation } = await import('operations');
  return operation(target);
}

This avoids bottlenecking the network and CPU during application initialization, but there are still a number of problems with this technique:

  1. It doesn't actually solve the deferral of execution problem, since sending a network request in such a scenario would usually be a performance regression and not an improvement. A separate network preloading step would therefore still be desirable to achieve efficient deferred execution while avoiding triggering a waterfall of requests.

  2. It forces all functions and their callers into an asynchronous programming model, without necessarily reflecting the real intention of the program. This leads to all call sites having to be updated into a new model, and cannot be made without a breaking API change to existing API consumers.
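
To make the second point concrete, the sketch below contrasts a synchronous API with the same API rewritten around a promise. `loadOperation` is a hypothetical stand-in for dynamic `import()`, and the `shout*` callers are invented for illustration; none of these names come from the proposal.

```javascript
// Hypothetical stand-in for `await import('operations')`: resolves asynchronously.
function loadOperation() {
  return Promise.resolve({ operation: (target) => target.toUpperCase() });
}

// Original synchronous API.
function doSomethingSync(target) {
  return target.toUpperCase();
}

// Same API after deferring the dependency with a promise-based loader.
async function doSomethingAsync(target) {
  const { operation } = await loadOperation();
  return operation(target);
}

// An existing caller written against the synchronous API.
function shout(target) {
  return doSomethingSync(target) + "!";
}

// The same caller pointed at the async version without changes:
// it now concatenates a Promise instead of a string.
function shoutBroken(target) {
  return doSomethingAsync(target) + "!";
}

// The caller (and, transitively, its own callers) has to become async too.
async function shoutFixed(target) {
  return (await doSomethingAsync(target)) + "!";
}
```

Every caller of `shoutFixed` has to be updated in the same way, which is the breaking API change described above.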

Problem Statement

Deferring the synchronous evaluation of a module may be a desirable new primitive to avoid unnecessary CPU work during application initialization, without requiring any changes from a module API consumer perspective.

Dynamic import does not properly solve this problem, since it must often be coupled with a preload step, and enforces the unnecessary asyncification of all functions, without providing the ability to only defer the synchronous evaluation work.

Proposal

The proposal is to have a new syntactical import form which will only ever return a namespace exotic object. When used, the module and its dependencies would not be executed, but would be fully loaded to the point of being execution-ready before the module graph is considered loaded.

Only when a property of this namespace object is accessed would the execution be performed (if needed).

This way, the module namespace exotic object acts like a proxy to the evaluation of the module, effectively with [[Get]] behavior that triggers synchronous evaluation before returning the defined bindings.
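
The proxy-like behavior can be approximated in today's JavaScript with a `Proxy`. This is only a rough sketch under stated assumptions: `createDeferredNamespace` is a hypothetical helper, the `evaluate` callback stands in for running the module body, and skipping symbol keys is a choice made for this sketch rather than a statement about the specification.

```javascript
// Hypothetical helper: wraps an `evaluate` function (standing in for the
// module body) so that it runs at most once, on first property access.
function createDeferredNamespace(evaluate) {
  let exports = null; // populated on first access
  const ensureEvaluated = () => {
    if (exports === null) exports = evaluate();
    return exports;
  };
  return new Proxy(Object.create(null), {
    get(_target, key) {
      // Assumption of this sketch: symbol-keyed lookups do not trigger evaluation.
      if (typeof key === "symbol") return undefined;
      return ensureEvaluated()[key];
    },
  });
}

const log = [];
const ns = createDeferredNamespace(() => {
  log.push("evaluated");
  return { value: 2 };
});

log.push("before access");
const first = ns.value;  // triggers the synchronous evaluation
const second = ns.value; // reuses the cached exports
log.push("after access");
```

Creating `ns` runs nothing; the "module body" only runs between the two log entries, when `ns.value` is first read.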

The API will use the below syntax, following the phases model established by the source phase imports proposal:

import defer * as yNamespace from "y";

Semantics

The imports would still participate in deep graph loading so that they are fully populated into the module cache prior to execution; however, the imported module will not be evaluated yet.

When a property of the resulting module namespace object is accessed, if the execution has not already been performed, a new top-level execution would be initiated for that module.

In this way, a deferred module evaluation import acts as a new top-level execution node in the execution graph, just like a dynamic import does, except executing synchronously.

There are possible extensions under consideration, such as deferred re-exports, but they are not included in the current version of the proposal.

Top-level await

Property access on the namespace object of a deferred module must be synchronous, and it's thus impossible to defer evaluation of modules that use top-level await. When a module is imported using the import defer syntax, its asynchronous dependencies together with their own transitive dependencies are eagerly evaluated, and only the synchronous parts of the graph are deferred.

Consider the following example, where a is the top-level entry point:

// a
import "b";
import defer * as c from "c"

setTimeout(() => {
  c.value
}, 1000);
// b
// c
import "d"
import "f"
export let value = 2;
// d
import "e"
await 0;
// e
// f

Since d uses top-level await, d and its dependencies cannot be deferred:

  • The initial evaluation will execute b, e, d and a.
  • Later, the c.value access will trigger the execution of f and c.
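
The evaluation order above can be reproduced with a small simulation. This is a simplified model rather than the specification algorithm: the `graph` object encodes the example by hand, the function names are invented for this sketch, and cycles are not modeled faithfully.

```javascript
// Module graph from the example above. `defer` marks deferred imports and
// `tla` marks modules that use top-level await.
const graph = {
  a: { imports: [{ to: "b" }, { to: "c", defer: true }], tla: false },
  b: { imports: [], tla: false },
  c: { imports: [{ to: "d" }, { to: "f" }], tla: false },
  d: { imports: [{ to: "e" }], tla: true },
  e: { imports: [], tla: false },
  f: { imports: [], tla: false },
};

// Post-order evaluation of `name` and its dependencies. Deferred edges are
// not evaluated, except for the asynchronous parts of their subgraph.
function evaluate(name, done, order) {
  if (done.has(name)) return;
  done.add(name);
  for (const dep of graph[name].imports) {
    if (dep.defer) evaluateAsyncParts(dep.to, done, order, new Set());
    else evaluate(dep.to, done, order);
  }
  order.push(name);
}

// Walks a deferred subgraph and eagerly evaluates only the modules that use
// top-level await, together with their own dependencies.
function evaluateAsyncParts(name, done, order, seen) {
  if (seen.has(name) || done.has(name)) return;
  seen.add(name);
  if (graph[name].tla) {
    evaluate(name, done, order);
    return;
  }
  for (const dep of graph[name].imports) {
    evaluateAsyncParts(dep.to, done, order, seen);
  }
}

const done = new Set();
const initialOrder = [];
evaluate("a", done, initialOrder);

const onAccessOrder = [];
evaluate("c", done, onAccessOrder); // simulates the later `c.value` access
```

In this model, `initialOrder` ends up as `b, e, d, a` and `onAccessOrder` as `f, c`, matching the two bullets above.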

Rough sketch

If we split out the components of Module loading and initialization, we could roughly sketch out the intended semantics:

⚠️ The following example does not take cycles into account

// LazyModuleLoader.js
async function loadModuleAndDependencies(name) {
  const loadedModule = await import.load(`./${name}.js`); // load is async, and needs to be awaited
  const parsedModule = loadedModule.parse();
  await Promise.all(parsedModule.imports.map(loadModuleAndDependencies)); // load all dependencies
  return parsedModule;
}

async function executeAsyncSubgraphs(module) {
  if (module.hasTLA) return module.evaluate();
  return Promise.all(module.importedModules.map(executeAsyncSubgraphs));
}

export default async function lazyModule(object, name) {
  const module = await loadModuleAndDependencies(name);
  await executeAsyncSubgraphs(module);
  Object.defineProperty(object, name, {
    get: function() {
      delete object[name];
      const value = module.evaluateSync();
      Object.defineProperty(object, name, {
        value,
        writable: true,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
    configurable: true,
    enumerable: true,
  });

  return object;
}

// myModule.js
import foo from "./bar";

etc.

// module.js
import LazyModule from "./LazyModuleLoader";
await LazyModule(globalThis, "myModule");

function Foo() {
  myModule.doWork() // first use
}

Implementations

Q&A

What happened to the direct lazy bindings?

The initial version of this proposal included direct binding access for deferred evaluation via named exports:

import { feature } from './lib' with { lazyInit: true }

export function doSomething (param) {
  return feature(param);
}

where the deferred evaluation would only happen on access of the feature binding.

There are a number of complexities to this approach, as it introduces a novel type of execution point in the language, which would need to be worked through.

This approach may still be investigated in various ways within this proposal or an extension of it, but focusing on the module namespace exotic object approach first keeps the semantics simple and in line with standard JS techniques.

Is there really a benefit to optimizing execution, when surely loading is the bottleneck?

While it is true that loading time is the most dominant factor on the web, it is important to consider that many large applications can block the CPU for on the order of 100ms while initializing the main application graph.

Loading times on the order of multiple seconds often take the focus of performance optimization work, and this is certainly an important problem space. However, freeing up the main event loop during initialization remains a critical problem once the network problem is solved, and it currently has no easy solution for large applications.

Is there prior art for this in other languages?

The standard libraries of these programming languages include related functionality:

  • Ruby's autoload, in contrast with require which works in the same way as JS import
  • Clojure import
  • Most LISP environments

Our approach is pretty similar to the Emacs Lisp approach, and it's clear from a manual analysis of billions of Stack Overflow posts that this is the most straightforward to ordinary developers.

Why not support a synchronous evaluation API on ModuleInstance?

A synchronous evaluation API on the ModuleInstance object from the module expressions and compartments proposals could offer a way to evaluate modules synchronously, and it could be compatible with this approach to deferred evaluation. However, only a clear syntactic solution for this use case can be supported across dependency boundaries and in bundlers, bringing the full benefits of avoiding unnecessary initialization work to the wider JS ecosystem.

What can we do in current JS to approximate this behavior?

The closest we can get is the following:

// moduleWrapper.js
export default function ModuleWrapper(object, name, lambda) {
  Object.defineProperty(object, name, {
    get: function() {
      // Redefine this accessor property as a data property.
      // Delete it first, to rule out "too much recursion" in case object is
      // a proxy whose defineProperty handler might unwittingly trigger this
      // getter again.
      delete object[name];
      const value = lambda.apply(object);
      Object.defineProperty(object, name, {
        value,
        writable: true,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
    configurable: true,
    enumerable: true,
  });
  return object;
}

// module.js
import ModuleWrapper from "./moduleWrapper";
// any imports would need to be wrapped as well

function MyModule() {
 // ... all of the work of the module
}

export default ModuleWrapper({}, "MyModule", MyModule);

// parent.js
import wrappedModule from "./module";

function Foo() {
  wrappedModule.MyModule.bar() // first use
}

However, this solution doesn't cover deferring the loading of submodules of a lazy graph, and would not achieve the characteristics we are looking for.

proposal-defer-import-eval's People

Contributors

acutmore, bakkot, chicoxyzzy, codehag, guybedford, jack-works, littledan, ljharb, mikesamuel, nbp, nicolo-ribaudo, robpalme, sroucheray

proposal-defer-import-eval's Issues

Concern about making local variable access side effecting

We've generally tried to avoid having access to local variables be side effecting, a property which is useful both for engines and for humans reading the code. (Obviously you can break that property using with, and it does not hold for global variables.) If I'm reading this proposal correctly - if the module evaluation would happen when the bindings were first referenced - that would no longer be true for lazy-loaded module bindings. I think that's worrisome.

This isn't a blocker for stage 1, since this is a concern only about that specific strategy, and there are other possible strategies for lazy loading.

There should be an async version - `deferAsync`?

This proposal adds a cool new feature (fetching/linking a module without executing immediately) but denies that feature to all modules which use top-level await (TLA).

This clashes with the current ESM model where modules that use TLA are treated just like any other modules, anywhere: import statements work normally for them and import() is async already. ESM established a world where modules can safely use TLA without downsides.

This proposal breaks that model. It makes modules that use TLA worse than other modules because there is no way to fetch them while deferring their execution.

This seems unnecessary, because there could just be an async version of defer, which allows you to defer execution of an entire TLA module. Many consumers don't care about the described "problem" that exposing something becomes an async function; for them the async version of defer would be the better version, because it would defer a larger part of the execution.

I propose:

import deferAsync * as myModule from "mypath"

Let's examine possible behaviors:

  • myModule is a promise itself, and executed only when it is awaited
    • this seems a bit weird, because awaiting a promise can usually not trigger anything in JS
  • all properties of myModule are promises; the module is executed only when one of them is accessed.
    • this seems natural, and is very similar to defer, where myModule basically becomes a smart proxy for the module

To reiterate, deferAsync would be a strictly better version of defer for all consumers that are fine with getting promises.
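
For illustration, the second behavior (every property is a promise, and the first access triggers evaluation) could be sketched in userland as follows. `createAsyncDeferredNamespace` and `evaluateAsync` are hypothetical names used only for this sketch; `deferAsync` itself is a suggestion from this issue, not part of the proposal.

```javascript
// Hypothetical helper: `evaluateAsync` stands in for evaluating a module that
// uses top-level await; it returns a promise for the module's exports.
function createAsyncDeferredNamespace(evaluateAsync) {
  let pending = null; // promise for the exports, created on first access
  return new Proxy(Object.create(null), {
    get(_target, key) {
      // Assumption of this sketch: symbol keys never trigger evaluation.
      if (typeof key === "symbol") return undefined;
      if (pending === null) pending = evaluateAsync();
      return pending.then((exports) => exports[key]);
    },
  });
}

let evaluations = 0;
const myModule = createAsyncDeferredNamespace(async () => {
  evaluations += 1;
  await null; // stands in for top-level await inside the module
  return { answer: 42, greet: (name) => `hello ${name}` };
});

async function main() {
  const answer = await myModule.answer; // first access starts evaluation
  const greet = await myModule.greet;   // reuses the same evaluation
  return [answer, greet("world"), evaluations];
}
```

Nothing runs until `main()` reads a property, and the second read reuses the evaluation started by the first.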

`import *` potentially doesn't need to parse/link

Given that there is no linkage required for import *, it can likely avoid parsing or linkage if cycles are not present in some way. The only issue is with behaviors that can statically cyclically link:

// a
import 'b' with {lazyInit: true};
// b
export * from 'a';

This would let things be done in an even more controlled manner but would need some kind of guard against cycles. Either stating a different kind of linkage or disallowing cycles to modules with lazy behaviors.

It would be good to at least investigate this, as ahead-of-time fetch costs are among the prohibitive costs still possible with this proposal.

Consideration: explicit execution form

I was wondering if it's been considered for this proposal to make the execution an explicit functional execution to try to align it closer to current JS execution semantics.

That is - still having the lazy imports, but relying on an explicit function call to define the bindings.

Effectively replacing:

lazy import defaultName from "y";
export function lazyFn () {
  return defaultName;
}

with something like:

lazy import defaultName from "y";
export function lazyFn () {
  execute();
  return defaultName;
}

or maybe even turning the binding itself into a synchronous function that returns the binding:

lazy import defaultName from "y";
export function lazyFn () {
  return defaultName();
}

Would something like that get around some of the issues currently being discussed for this proposal? Would be great to continue these discussions further, whether or not this issue is on the right track!

Bloomberg Feedback

At Bloomberg, we have been using lazy module evaluation in production for a long time. We have proved multiple times that it's essential for better application startup performance. Hence we are very happy to see this proposal and will invest in its standardization.

This proposal, as it stands, is generally in line with our view. Please see our detailed feedback below.

Existing Use Cases

Our module loader speaks AMD. Historically, AMD has also been the authoring syntax for modules. This has changed in the last few years. Modules are now authored in ESM syntax by default and AMD is considered legacy as an authoring format. ES modules can interoperate with AMD and vice versa. In terms of interop, it would be fair to consider an AMD module as an ES module with a single default export.

Currently we have two primary features that give users lazy evaluation capabilities. These are designed to be used more in the producer modules when building module exports.

  • In AMD, a programmatic API lets users create objects with getter functions as a proxy to other modules, i.e. re-exports.
  • In ESM, a compiler-based feature that makes all re-exports of a module lazy. This feature is heavily restricted.

Both of these are used in critical parts of the infrastructure. They don't take special care for asynchronous modules, i.e. ESM using top-level await, or AMD using a special async formulation. Users are expected to create safe dependency graphs by knowing what children to preload ahead of time. This makes the feature very brittle, which we are actively working to address (see below).

1. Default-exported Object with Lazy Properties

In this pattern, the default-exported object contains getter functions that load different target modules on first access.

define(["make-lazy", "sync-require"], (makeLazy, require) => {
    const exports = {};

    makeLazy(exports, "Label", "./label", require);
    makeLazy(exports, "Button", "./button", require);

    return exports;
});

Using the current proposal, this can be implemented as follows:

import * as Label from "./label" with { lazyInit: true };
import * as Button from "./button" with { lazyInit: true };

export default {
    get Label() { return Label.default },
    get Button() { return Button.default },
};

This requires .default for each namespace which is a little inconvenient.

2. Lazy Namespace of (Lazy) Namespaces

In this pattern, a mega lazy namespace is constructed by re-exporting other namespaces. A special pragma tells the build tooling to transform eager static re-exports to lazy evaluation.

/* "use special pragma to make re-exports lazy at build time" */
export * as TableUtils from "./table-utils";
export * as WindowUtils from "./window-utils";

This is possible to implement using the current proposal as it's allowed to pass the DeferredModuleNamespace exotic object around without breaking its lazy nature.

import * as TableUtils from "./table-utils" with { lazyInit: true };
import * as WindowUtils from "./window-utils" with { lazyInit: true };

export { TableUtils, WindowUtils };

It cannot be expressed concisely only using re-export statements, though.

3. Lazy Namespace of Named Re-exports

The same pragma can be used to construct a namespace where only selected named bindings are re-exported lazily. Although this pattern is supported, we don't have many users of it today.

/* "use special pragma to make re-exports lazy at build time" */
export { Table } from "./table";
export { Window } from "./window";

This cannot be expressed in the current proposal as it only allows namespace imports and there is no way to export a function that works like a getter.

Implementation of Deferred Module Evaluation

We have implemented deferred module evaluation in our module system. It is now used by a few early adopters in production. As background, this module system is not known to the JS engine. It is almost entirely implemented in JS and is constructed during an initial bootstrapping phase that runs before the main application module is loaded.

Most of the complexity we found when implementing this feature arises from the handling of asynchronous modules in the graph. Supporting top-level await is important in our system, which is why we have invested in its standardization in the past, so this was a critical requirement for us. Handling of asynchronous modules is achieved by a combination of metadata collected at build time and preloading of asynchronous nodes at runtime. Build-time metadata is reliable because we enforce static analyzability of dependency graphs. This metadata makes the runtime preloading traversal more efficient: asynchronous modules can be discovered without parsing modules or exploring the whole static module graph.

We are aligning with the current state of the proposal as much as we can by

  • using a comment based syntax in import declarations with a goal of moving to Import Attributes and eventually to the final standard syntax
  • only allowing namespace imports - named imports are banned

[Question] Is this the solution for below case?

Is this the solution for the case below?

If so, I think it should also be added to the example use cases, as it is easier to understand at first glance.

import foobarDep from 'foobar-dep'
import browserDep from 'browser-dep' with { lazyInit: true }
import denoDep from 'deno-dep' with { lazyInit: true }
import nodeDep from 'node-dep' with { lazyInit: true }

function foobar() {
  if (IS_BROWSER) {
    return [foobarDep(), browserDep()]
  }

  if (IS_DENO) {
    return [foobarDep(), denoDep()]
  }

  if (IS_NODE) {
    return [foobarDep(), nodeDep()]
  }

  throw new Error()
}

export { foobar }

Nice Bonus Benefit: Module cycle problems can often be easily resolved

At present, cyclic modules, particularly those involving subclassing, can hit a nasty issue: because of the cycle, the superclass may not be defined yet when extends is evaluated. (While cycles are often anti-patterns, they are also often unavoidable in a code-base with sufficiently many classes, e.g. components that depend cyclically on each other).

For example, we might have the following:

// A.js
import B from "./B.js";

export default class A {
   // ... do something with B
}
// B.js
import A from "./A.js";

export default class B extends A {}

Whether these modules successfully evaluate is dependent on whether A.js or B.js is executed first. Current solutions tend to be complicated messes or annoying to maintain as cycle sizes grow.

However lazy eval happens to be able to fix these types of cycles just by specifying lazy init for any modules that aren't needed during module evaluation. i.e. To fix the above cyclic evaluation issue the fix is as simple as:

// A.js
// This import will not imply a cycle now, so it will never be evaluated before A.js
import B from "./B.js" with { lazyInit: true };

export default class A {}

This is particularly nice as it can scale to basically any cycle (that is fixable) simply by marking any modules that are not immediately required as lazy init.

How does this impact WASM modules?

It's important that the ESM ecosystem tries to be compatible with WASM modules loaded as leaf nodes in a module graph.

WebAssembly can already be compiled and instantiated in separate steps. It would be nice if WASM module instantiation at least, and ideally compilation as well, could be deferred.

There is a sync constructor to create an Instance from a compiled module that could have been previously async compiled, but it is documented as discouraged:

https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Instance/Instance#syntax

However, WASM modules are async modules, and the current proposal puts restrictions on the degree to which async modules can be deferred.

Detect that a module is currently in its lazy (loaded but not evaluated) state?

Is the intention of this proposal that the laziness status -- whether it's been loaded-but-not-evaluated or also-fully-evaluated -- be detectable by the userland code, or that it be opaque (impossible to detect)?

To wit: would the lazily loaded module's binding value have a typeof that indicates anything different between whether it had been evaluated or not? Or would any access expression (such as typeof theModule) force the evaluation of the module first, thereby thwarting detection?

My experience with dynamic resource loading (going back as far as 2009 with the popular-at-the-time LABjs loader) is that many use-cases around wanting to defer and dynamically load resources imply needing to inquire about the status of such loading rather than having it be entirely opaque. For example, if you know a resource is fully ready to use, you might render the asset/UI for it right away, but if you know there might be a noticeable delay (like loading it, or even just evaluating it), you might choose to display a spinner or placeholder and keep the main asset UI un-rendered until it's "complete".
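
As a point of comparison, userland approximations in the style of the ModuleWrapper pattern from the README do make this state detectable, because the lazy accessor replaces itself with a data property on first read. The sketch below shows that property only; `defineLazy` and `isEvaluated` are hypothetical names, and it says nothing about what the proposal itself would expose.

```javascript
// Minimal lazy property in the style of the ModuleWrapper approximation:
// an accessor that replaces itself with a data property on first read.
function defineLazy(object, name, init) {
  Object.defineProperty(object, name, {
    get() {
      const value = init();
      Object.defineProperty(object, name, {
        value,
        writable: true,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
    configurable: true,
    enumerable: true,
  });
  return object;
}

// Reports whether the lazy property has been evaluated, without triggering it.
function isEvaluated(object, name) {
  const descriptor = Object.getOwnPropertyDescriptor(object, name);
  return descriptor !== undefined && !("get" in descriptor);
}

const holder = defineLazy({}, "widget", () => ({ render: () => "<widget>" }));
const before = isEvaluated(holder, "widget"); // still an accessor
const html = holder.widget.render();          // first use evaluates
const after = isEvaluated(holder, "widget");  // now a data property
```

A caller could use a check like `isEvaluated` to decide between rendering immediately and showing a placeholder first.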

Handling of top-level await

I'm not a modules expert by any means, and I know this is a very complex space, so I'm guessing there are good reasons for the current handling of top-level await within the dependency graph of a deferred import's module.

However, as a JS user, I find the current semantics pretty unexpected and error prone, assuming I'm understanding them correctly. In particular, it seems like an import marked with defer can still immediately trigger side effects, via this backdoor for eagerly evaluating the async portion of the dependency graph. To me, that is deeply unintuitive because, well, I'd expect a deferred import's evaluation to be (completely) deferred.

Is there anywhere I could read about the rationale, or alternatives considered, for the current handling of top-level await? How does the current approach compare to, say, throwing an error upon encountering an async module within the dependency graph of a deferred import?

No named exports edge case

Was just wondering about the case for a module without any exports, say only imports, where that module represents some ordered execution that should take place.

Currently the deferral would work, but there would be no way to trigger the execution without a named export.

Alternative syntax: Just import declaration in block

As the title says, a previous proposal with similar goals simply had import declarations within blocks. This feels pretty natural to me as it makes it very clear as to where evaluation happens (at the start of the block containing the import).

e.g.:

export function baz() {
  import foo from "bar"; // Evaluation/errors are deferred until here
}

export function buzz() {
  // failures in baz() don't affect this function
}

It also has the nice property of making it clear that some APIs have specific boundaries. For example, in this API we can throw an error if the inspect code is called from a non-Node environment that doesn't support util, while still avoiding making the API unnecessarily async:

export default class Foo {
  [Symbol.for('nodejs.util.inspect.custom')]() {
    import util from "util";
    return `Foo { ${ util.inspect(this.data) } }`;
  }
}

Alternative import attribute name to express lack of guarantee

There is no guarantee that the imported module is evaluated lazily: it may already have been evaluated, or it may contain top-level await. Perhaps a name that more clearly expresses that this is a hint rather than an assertion would work better?

Bikeshed:

import {x} from "y" with { preferLazy: true };
import {x} from "y" with { lazyHint: true };

clarification example

I think it would be helpful to explain a top level example that immediately references the binding like:

import a from 'a' with {lazyInit: true};
a;

Relation to weak import proposal

@LeaVerou posted a related proposal in https://lea.verou.me/2020/11/the-case-for-weak-dependencies-in-js/.

It would likely be worthwhile to include a comparison to this proposal.

My own intuition would be that they are very much the same sort of thing - in that the imported bindings are bound to the unexecuted module but the module itself has just not executed yet. The tradeoff to consider between both proposals is then just when and how to do the actual execution given we have this import mode / attribute.

In the context of optional dependencies, the concept comes up that another module in the graph that happens to execute the optional dependency will define it for all the other importers. This then entirely moves the execution timing problem back away from the engine to the user API side which might offer more control - would be very useful to compare these execution models.

Stage 2.7 roadmap

Before Stage 3, we still need to investigate multiple aspects of the proposal. Each topic should be discussed in its own issue, but I'm writing all of them here to keep track of the progress.

  • Interop with WASM modules (#22)
    The current WASM-ESM integration proposal makes WASM modules async from JS's perspective. This means that their execution would never be deferred. Can this be changed?
  • More performance analysis
    We need to further investigate the performance benefits of this proposal and collect more data. Additionally, we need to explain why top-level await suddenly causing a module to not be deferred anymore is not a performance footgun.
  • Early errors (related to #9)
    How critical are early errors? Is it ok to defer them until execution, similarly to what happens with dynamic import? This would allow some systems to entirely defer loading modules.
  • Deferred re-exports (related to #18)
    It would be useful to support something like export defer { f } from "./foo", which causes the execution of ./foo only if f is actually imported from this module.
  • Dynamic import syntax (PR: #28)
    Given that import source is now stage 3, we should probably add dynamic import.defer() to this proposal similar to import.source().
