wireit's Issues

A way to shorten dependency lists: wildcards or auto-dependencies

I think a lot of monorepos will have scripts that depend on basically all subpackages. Take lit.dev's build script:

  "wireit": {
    "build": {
      "dependencies": [
        "./packages/lit-dev-api:build",
        "./packages/lit-dev-content:build",
        "./packages/lit-dev-server:build",
        "./packages/lit-dev-tests:build",
        "./packages/lit-dev-tools-cjs:build",
        "./packages/lit-dev-tools-esm:build"
      ]
    },

This list is prone to getting out of sync as packages are added. That might be caught quickly when the project doesn't build correctly, but the list is still cumbersome.

Lerna and npm solve this by allowing you to run a script by name in any package that has that script. I could see wireit supporting something similar in a few ways:

  1. Wildcards:
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/**/*:build"
      ]
    },
  2. Wildcard with an optional script name:
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/**/*" // Uses "build" automatically
      ]
    },
  3. Auto-deps based on the "workspaces" field
    This would read the "workspaces" field as a single source of truth for the list of all subpackages and run the build script in each:
  "wireit": {
    "build": {
      "dependencies": "auto"
    },

This version of the idea would really only be useful on the top-level package. To work in workspaces automatically would require some magic around looking at which dependencies are local to the monorepo and running scripts in them. Maybe that's doable.

Since packages would still need their scripts and dependencies specified, even if auto, it seems like this wouldn't require wireit to topo-sort the packages.
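If wildcards were supported, the expansion could happen during analysis by matching the pattern against the known workspace directories. A minimal sketch of that expansion (expandDependency and its glob-to-RegExp translation are hypothetical, not wireit's actual code):

```javascript
// Expand a "./packages/**/*:build"-style dependency against a known
// list of workspace directories. Hypothetical sketch, not wireit's API.
function expandDependency(dep, workspaceDirs, defaultScript = 'build') {
  const colon = dep.lastIndexOf(':');
  const hasScript = colon > dep.lastIndexOf('/');
  const pattern = hasScript ? dep.slice(0, colon) : dep;
  const script = hasScript ? dep.slice(colon + 1) : defaultScript;
  if (!pattern.includes('*')) return [dep]; // not a wildcard; pass through
  // Translate the glob into a RegExp: '**/*' matches any path segment(s),
  // a single '*' matches within one segment.
  const re = new RegExp(
    '^' +
      pattern
        .replace(/[.+^${}()|[\]\\]/g, '\\$&')
        .replace(/\*\*\/\*/g, '.+')
        .replace(/\*/g, '[^/]+') +
      '$'
  );
  return workspaceDirs.filter((d) => re.test(d)).map((d) => `${d}:${script}`);
}
```

The same helper would cover option 2 by falling back to a default script name when none is given.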

Replay stdout/stderr

If a script is fresh or cached, we should replay the cached stdout/stderr. This would be helpful for e.g. test output, since it may not be clear that the tests are in a passing state when nothing is logged.

Unexpected errors should be reported with more context

We usually know which script we're running when an unexpected error occurs, so at a minimum we could wrap unexpected errors with the script context and include it when we display them.

There might also be additional context we could add to help narrow down where in the code the error occurred. Especially given the fact that fs errors in Node don't include stack traces (nodejs/node#30944).
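A sketch of what the wrapping could look like (ScriptError and wrapUnexpected are illustrative names, not wireit's actual classes):

```javascript
// Wrap an unexpected error with the script that was running when it
// was thrown, so the failure is attributable in the log output.
class ScriptError extends Error {
  constructor(script, cause) {
    super(`[${script.packageDir}:${script.name}] ${cause.message}`);
    this.name = 'ScriptError';
    this.script = script;
    this.cause = cause;
  }
}

// Run `fn` and re-throw any unexpected error with the script context.
function wrapUnexpected(script, fn) {
  try {
    return fn();
  } catch (err) {
    throw new ScriptError(script, err);
  }
}
```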

Multiple iterations and flake detection

It could be useful to sometimes run a script multiple times. For example, to detect a flaky test, maybe something like this:

npm test -- --detect-failure-rate --iterations=10

This would run the script 10 times, and then report the % of times it failed at the end.
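The reporting side of this is straightforward; a sketch where runOnce stands in for actually executing the script and returns true on success:

```javascript
// Run a script `iterations` times and report the failure rate at the
// end. `runOnce` is a stand-in for actually executing the script.
function detectFailureRate(runOnce, iterations) {
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    if (!runOnce()) failures++;
  }
  return {iterations, failures, failureRate: failures / iterations};
}
```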

Watch mode optimizations

  • No need to re-analyze or re-create watchers unless a package.json file has changed (and watchers only need to be re-created if a files array changed, and even then only the affected ones do).

  • We can share the CachingPackageJsonReader instance across analysis. File change events tell us when a cached result is stale.

  • We can cache glob results in memory. File change events tell us when a cached result is stale.

  • We can cache file hashes in memory. File change events tell us when a cached result is stale.
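The three caching bullets above share the same shape: memoize, and invalidate on a watcher event. A sketch for the file-hash case (FileHashCache is a hypothetical name; hashFn stands in for the real file hasher):

```javascript
// In-memory hash cache that stays valid until a watcher event
// invalidates a specific path.
class FileHashCache {
  constructor(hashFn) {
    this.hashFn = hashFn;
    this.cache = new Map();
  }
  // Return the cached hash for `path`, computing it on first use.
  hash(path) {
    if (!this.cache.has(path)) this.cache.set(path, this.hashFn(path));
    return this.cache.get(path);
  }
  // Call from the file watcher when `path` changes on disk.
  onFileChanged(path) {
    this.cache.delete(path);
  }
}
```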

Dry run mode

Tells you what scripts would be executed in what order, which existing output files would be deleted, etc.

Binary lookup fails in cross-package dependencies

While hooking up wireit in the lit-analyzer repo, one of the sub-packages, vscode-plugin, depends on vsce, which exposes the vsce command, and one of its scripts, package-for-test, uses that command.

If I run npm run package-for-test from packages/vscode-plugin then when wireit runs the command, vsce is found.

If I run a script that depends on ./packages/vscode-plugin:package-for-test from the root of the repo, vsce is not found.

Log output:

$ npm run just-package

> [email protected] just-package
> wireit

✅ [packages/lit-analyzer:build] Already fresh
✅ [packages/ts-lit-plugin:build] Already fresh
✅ [packages/vscode-lit-plugin:build] Already fresh
🏃 [packages/vscode-lit-plugin:package-for-test] Running command "vsce package -o ./out/packaged.vsix && rm -rf ../../../packaged-extension/ && mkdir ../../../packaged-extension/ && unzip -qq ./out/packaged.vsix -d ../../../packaged-extension/"
/bin/sh: vsce: command not found
❌ [packages/vscode-lit-plugin:package-for-test] Failed with exit status 127

[vscode-extension] Extended validation

Additional diagnostics which aren't expressible as part of the JSON schema:

  • every wireit script needs to be present in the scripts section
  • every wireit script needs to just run "wireit" in the scripts section
  • every dependency needs to resolve to an npm script

Setting to turn off caching for a specific script

Probably "cache": false.

It seems like we might want to control it per-caching implementation too. You might not want local caching (e.g. because it's a really fast script with incremental build so you don't want it during development), but still want it when using GitHub Actions.
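Per-implementation control might look something like this; the nested object shape and the "github-actions" key are invented here for illustration, not a settled design:

```json
{
  "build": {
    "command": "tsc",
    "cache": {
      "local": false,
      "github-actions": true
    }
  }
}
```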

Detect overlapping output

Two scripts should probably not be able to set their output to patterns that could overlap, especially because when clean is enabled (the default), one script could clobber the output of another by mistake.
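Full glob-overlap detection is hard, but a conservative first pass could compare the literal prefixes of each pattern (everything before the first wildcard). A sketch; a real version would also need to account for negations and {} groups:

```javascript
// Conservative overlap check between two output globs: strip each
// pattern at its first wildcard and test whether either literal prefix
// is a path-prefix of the other. May report false positives, which is
// the safe direction for a warning.
function mayOverlap(a, b) {
  const lit = (p) => p.split(/[*?{]/, 1)[0];
  const [la, lb] = [lit(a), lit(b)];
  const prefix = (x, y) =>
    y === x || y.startsWith(x.endsWith('/') ? x : x + '/');
  return prefix(la, lb) || prefix(lb, la);
}
```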

JSON schema for our package.json fields

It looks like a VS Code plugin can contribute json schema files: https://code.visualstudio.com/api/references/contribution-points#contributes.jsonValidation

And those json schema files may be able to augment existing schemas?

https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/package.json#L632
https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/schemas/tsconfig.schema.json

I wonder if there's a way to do this without requiring the extension, by having the wireit command automatically add a $schema field to the wireit section.

Garbage collection for cache directory

The .wireit/<script>/cache directory currently can grow indefinitely. We should implement a garbage collection strategy to cap the size of this directory.

An LRU cache with a configurable maximum number of entries seems like what we want.

We will want to make sure we have an efficient way to maintain the cache hit rate data which scales well with the size of the cache. We will probably want some kind of on-disk index file that lets us read/write cache hit rates efficiently, to determine which cache entry needs to be deleted when the cap is hit. A doubly-linked-list implemented in the filesystem itself with symlinks (or just files containing SHAs) could also be an interesting way to do this.
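The in-memory half of such an LRU is small; the hard part is persisting it efficiently. A sketch of just the eviction logic, relying on Map's insertion-order iteration (LruIndex is a hypothetical name):

```javascript
// LRU eviction for cache entries. Re-inserting a key on access moves
// it to the back of the Map's iteration order, so the first key is
// always the least recently used.
class LruIndex {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }
  // Record a hit for `key`; returns keys evicted to stay under the cap.
  touch(key) {
    this.entries.delete(key);
    this.entries.set(key, Date.now());
    const evicted = [];
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
      evicted.push(oldest);
    }
    return evicted;
  }
}
```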

Detect output outside of package dir during analysis instead of execution

As a safety precaution, we refuse to clean output files if they are outside of the current script's package. However, it would be even better to catch this error earlier, in the Analyzer, by looking at the output glob patterns. We just need to be careful that we are parsing and analyzing the glob patterns correctly, accounting for negations, {} groups, and other special glob syntax.

Glob negations don't always work properly in output field

The negation in these glob patterns doesn't work:

{
  "output": [
    "output",
    "!output/excluded"
  ]
}

This is because fast-glob gives us the directory called output, and then excludes any file called output/excluded from the result. However, since it already gave us the output directory, and we do recursive deletes and copies, the exclusion ends up having no effect.

Here's a test case that currently fails, which can go in clean.test.ts:

test(
  'glob negations apply to directory match contents',
  timeout(async ({rig}) => {
    const cmdA = await rig.newCommand();
    await rig.write({
      'package.json': {
        scripts: {
          a: 'wireit',
        },
        wireit: {
          a: {
            command: cmdA.command,
            output: ['output', '!output/excluded'],
          },
        },
      },
      'output/included': '',
      'output/excluded': '',
    });

    const exec = rig.exec('npm run a');
    const inv = await cmdA.nextInvocation();

    assert.not(await rig.exists('output/included'));
    assert.ok(await rig.exists('output/excluded'));

    inv.exit(0);
    const res = await exec.exit;
    assert.equal(res.code, 0);
    assert.equal(cmdA.numInvocations, 1);
  })
);

One solution that doesn't work is to only include files in our glob results, and have the user rewrite the above as:

{
  "output": [
    "output/**",
    "!output/excluded"
  ]
}

but the problem with that is that we then would never be able to delete empty directories.

So I think we actually need to post-process our glob results, detect when a hit is a directory, explicitly recursively expand that directory, and then re-apply the negations?
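That post-processing step could look like this, sketched over an in-memory file listing (allFiles stands in for a filesystem walk; the real version would stat each match):

```javascript
// Post-process glob results: expand any matched directory into its
// recursive contents, then re-apply the negation patterns so they
// take effect on the expanded set.
function expandAndReExclude(matches, negations, allFiles) {
  const expanded = new Set();
  for (const m of matches) {
    expanded.add(m);
    for (const f of allFiles) {
      if (f.startsWith(m + '/')) expanded.add(f); // m matched a directory
    }
  }
  for (const neg of negations) {
    for (const f of [...expanded]) {
      if (f === neg || f.startsWith(neg + '/')) expanded.delete(f);
    }
  }
  return [...expanded].sort();
}
```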

Matching directory in files array does not include its contents

When using output, matching a directory implicitly includes all of the contents of that directory (though see #77 for a caveat with how that is currently broken with ! negations). That's because when we clean and cache, we use recursive operations like fs.rm and fs.cp.

However, with files, which is used for generating the cache key, we don't use recursive operations. We just read and hash the files that directly matched the glob.

files should be consistent with output: matching a directory should implicitly match all of its contents. This is also consistent with how the package.json files array and .gitignore files work.

Globbing doesn't support re-inclusion

Given the following:

foo/**
!foo/bar
foo/bar/baz

The file foo/bar/baz should be included, even though foo/bar was excluded. It looks like fast-glob doesn't care about the order in which ! negated patterns appear. This is different from how .gitignore and the files array in package.json work, so we should fix it.
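Order-sensitive matching means iterating the patterns top to bottom and letting the last match win. A sketch with a deliberately simplified matcher that only understands literal paths and a trailing /**:

```javascript
// .gitignore-style inclusion: patterns apply top to bottom and the
// last pattern that matches a path decides whether it is included.
function isIncluded(path, patterns) {
  // A pattern matches a path exactly or as an ancestor directory; a
  // '/**' suffix is treated the same as the bare directory name.
  const matches = (p, pat) => {
    if (pat.endsWith('/**')) pat = pat.slice(0, -3);
    return p === pat || p.startsWith(pat + '/');
  };
  let included = false;
  for (const raw of patterns) {
    const negated = raw.startsWith('!');
    if (matches(path, negated ? raw.slice(1) : raw)) included = !negated;
  }
  return included;
}
```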

Concise way to depend on scripts in child and dependency workspaces

[1] Run <script> in all of my workspaces (parent → child).

With Wireit, you can already do npm run build -ws to run a given script in all workspaces, which is the standard npm workspaces approach. However, it's not fully optimal, because npm doesn't parallelize. So we will also support a syntax like this:

{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "workspaces"
        }
      ]
    }
  },
  "workspaces": [
    "packages/foo",
    "packages/bar"
  ]
}

Which is equivalent to:

{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/foo:build",
        "./packages/bar:build"
      ]
    }
  }
}

[2] Run <script> in all of my dependencies (child → siblings).

Related: it is often useful to run some script in all of the current package's dependencies (where those dependencies are workspaces contained by the same workspace root).

{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "dependencies"
        }
      ]
    }
  },
  "dependencies": {
    "bar": "^1.0.0",
    "baz": "^1.0.0"
  }
}

Which is equivalent to:

{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "../bar:build",
        "../baz:build"
      ]
    }
  }
}

Pretend to be a TTY

Some tools produce different output when they detect that the terminal is not a TTY (i.e. not interactive). For example, TypeScript produces colorized output by default only in TTY mode; otherwise the --pretty flag must be specified to force colorized output.

When a process is run via Wireit, the process will think it is not attached to a TTY, because we use the default pipe setting to handle stdio from spawn (https://nodejs.org/api/child_process.html#optionsstdio), so it isn't attached to a TTY directly. We could instead use inherit, but that would not allow us to capture the output, which we need for storing stdio replays.

A downside of this is that it does not match the standard behavior of npm run, which uses inherit. Matching the behavior of npm is one of our goals.

Using a library like https://github.com/microsoft/node-pty may be the only way to trick processes into thinking they are attached to a TTY while also capturing the output for the replay files. We should do a little more research to confirm this, as there is a chance there is a simpler solution. This library is somewhat large, includes a native component, and has a different interface from spawn. Note we would only want to do this when we detect that Wireit itself is running in a TTY.

Parent/child delete errors on clean build

If we try to do a clean build, and output paths include a parent and a child, then an error can be thrown if we happen to delete the parent before the child.

Firstly, we should know that we don't need to delete the child directory in the first place, by using optimizeCopies (which we should rename). But we should also not throw if we try to delete a directory whose parent has already been deleted.
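The first half (not scheduling a child for deletion when an ancestor is already scheduled) is a small pre-filter over the delete list; a sketch (pruneChildPaths is a hypothetical name):

```javascript
// Before deleting, drop any path whose ancestor directory is also
// scheduled for deletion; the recursive delete of the ancestor
// already covers it.
function pruneChildPaths(paths) {
  const set = new Set(paths);
  return paths.filter((p) => {
    // Keep p only if no ancestor directory of p is also in the set.
    for (let i = p.indexOf('/'); i !== -1; i = p.indexOf('/', i + 1)) {
      if (set.has(p.slice(0, i))) return false;
    }
    return true;
  });
}
```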

Q: Do all wireit scripts need to also be npm scripts?

Let's say I organize one logical script, like build, into several sub-steps, like build:ts and build:graphql. Do I need to make the sub-steps into npm scripts if they're never intended to be called from npm run or as a dependency of another wireit script?

i.e., do I need build:ts and build:graphql in scripts?:

  "scripts": {
    "build": "wireit",
    "build:ts": "wireit",
    "build:graphql": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "build:ts"
      ]
    },
    "build:ts": {
      "dependencies": [
        "build:graphql"
      ]
    },
    "build:graphql": { ... },
  }

Improvements around @actions/cache

Problem

The public API provided by the @actions/cache package doesn't exactly meet our needs, because it automatically uses the file paths that are included in the tarball as part of the cache entry version (see https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/cacheHttpClient.ts#L70), and implements globbing differently.

We want complete control over our cache key, instead of having it be generated automatically based on file paths -- and we want to be sure we are using identical globbing logic to the rest of Wireit.

Current solution

For this reason, we are currently reaching into the internal/ directory of @actions/cache to get more control. This is bad because those modules could change at any time, which is why we currently have a strict ("=") version pin in our package.json.

It's also why we currently have "skipLibCheck": false in our tsconfig.json, and why we have the file types/action-cache-contracts.d.ts -- because the file lib/internal/contracts.d.ts is missing from the published @actions/cache package.

The @actions/cache package is also our largest dependency by far. It's 22MB, and adds 63 transitive dependencies.

Options

  1. We could file an issue or send a PR that provides a way to directly specify the cache key in @actions/cache. This would solve the version pinning problem, the "skipLibCheck": false problem, and would allow us to remove types/action-cache-contracts.d.ts -- but we'd still have the large dependency.

  2. We could potentially move all of this logic into a separate package which is installed only by google/wireit@setup-github-actions-caching/v1 -- instead of the main package. The action would then spin up its own HTTP server, which we would talk to instead with a more minimal API (note that the server we spin up would have direct filesystem access, so it could make the tarballs). This would shrink the main wireit package's dependencies and filesize back down again (though the dependencies would still be installed -- just only in CI, instead of also locally), and would have the added benefit of not requiring us to expose the ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN variables to all run steps (see actions/toolkit#1053).

  3. The logic we need from @actions/cache could be re-implemented from scratch in a minimal way, such that we could drop the dependency on @actions/cache all together. The main tricky part is the way it handles tarball generation across platforms (https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/tar.ts).

A way to specify options for all scripts in a package

Possibility:

{
  "*": {
    "dependencies": [
      "bootstrap"
    ],
    "packageLocks": [
      "yarn.lock"
    ]
  }
}

* is unlikely to collide with a real script name, but we should probably support an escaping scheme anyway, e.g. \* means the script literally called *.

Stdio replayer should preserve stdout/stderr sequence

Context: #68 (comment)

I think the ideal solution requires actually encoding the cross-stream sequence in some way. We could use a unified format that encodes which stream each chunk came from, or, another idea, a third file which encodes the sequence as stream/offset/length triples:

out 0 20
err 0 10
out 20 100

So the replayer would follow these sequences and do something like call fs.read(stdoutFileDescriptor, {offset: 0, length: 20}). I think that should perform better than parsing a big unified format. In the case where there are no mixed streams, we can just stream the file straight through as we do now. It's also nice that right now you get a file like .wireit/<script>/stdout that the user can do stuff with directly if they want.
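An in-memory sketch of the record/replay cycle for that index (makeRecorder is illustrative; the real version would persist the buffers and index to files and read chunks back by offset):

```javascript
// Record interleaved stdout/stderr chunks plus a (stream, offset,
// length) index, then replay them in the original cross-stream order.
function makeRecorder() {
  const buffers = {out: '', err: ''};
  const index = [];
  return {
    write(stream, chunk) {
      index.push({stream, offset: buffers[stream].length, length: chunk.length});
      buffers[stream] += chunk;
    },
    replay() {
      // Follow the index, slicing each chunk back out in arrival order.
      return index.map(({stream, offset, length}) => ({
        stream,
        chunk: buffers[stream].slice(offset, offset + length),
      }));
    },
  };
}
```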

Can't run commands in npm workspaces

I'm getting an error when trying to run a basic command in a npm workspace.

my-project/package.json:

{
  "workspaces": [
    "packages/package-a"
  ]
}

my-project/packages/package-a/package.json:

{
  "name": "package-a",
  "scripts": {
    "foo": "wireit"
  },
  "wireit": {
    "foo": {
      "command": "echo FOO!"
    }
  }
}
my-project> npm run foo -w package-a

> [email protected] foo
> wireit

❌ [foo] No script named "foo" was found in /path/to/my-project
npm ERR! Lifecycle script `foo` failed with error: 
npm ERR! Error: command failed 
npm ERR!   in workspace: [email protected] 
npm ERR!   at location: /path/to/my-project/packages/package-a

This also happens if I cd into packages/package-a and run npm run foo.

Option to clean output only when input file deleted

Problem

tsc, even in incremental mode, does not delete the output files corresponding to input files that have been deleted since its last build (see microsoft/TypeScript#30602 (comment)). For example:

  • Write foo.ts
  • Run tsc --build
    • Generates foo.js
  • Rename foo.ts to bar.ts
  • Run tsc --build
    • Generates bar.js, but foo.js still exists

Many build tools behave the same way.

Currently, if you have specified output, by default we delete all output files before running the script. This helps with the above problem, but makes it impossible to use efficient incremental modes, like tsc --build.

You can currently set clean: false to disable deleting before execution, however that means that stale outputs can still exist.

Proposal

Add a new "on-delete" option for clean which deletes output only if some input file that matched on the previous run no longer exists.

This seems like it would provide a good balance between the two options we currently have; giving you incremental build every time a file is modified, or a new file is added -- but doing a clean build if a file is removed.

So the 3 options would now be:

  • true: Always delete before execution
  • false: Never delete before execution (but do still delete when restoring from cache)
  • "on-delete": Delete only if an input file has been deleted since the last run

We could consider making "on-delete" the default, but true still seems like the safer default, because it's also very possible for a stale file to be left around due to a change in a config file (e.g. changing a rollup config to rename the bundle output file).

Example

{
  "build": {
    "command": "tsc --build",
    "files": [
      "src/**/*.ts"
    ],
    "output": [
      "lib/**",
      ".tsbuildinfo"
    ],
    "clean": "on-delete"
  }
}
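The decision itself is just a set comparison against the previous run's matched input files; a sketch (shouldCleanOnDelete is a hypothetical name):

```javascript
// Clean before running only when some file that was an input last
// time no longer matches now (i.e. it was deleted or renamed away).
// Added or modified files keep the incremental path.
function shouldCleanOnDelete(previousFiles, currentFiles) {
  const current = new Set(currentFiles);
  return previousFiles.some((f) => !current.has(f));
}
```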

Add ways to automatically set (or reuse) input files

For tsc we generally already have inputs specified in our tsconfig (though I'm not sure if this is complete: tsc may read files outside of the include glob without error if they're imported by path). It'd be nice to be able to reuse that. The same might be true of Rollup configs and other tools.

Could there be a way to read input files from tool-specific configs? Maybe this is best done with a worker protocol, or maybe there's a config plug-in system and/or a built-in set of integrations that know how to read inputs and outputs from common tools.

For tools whose files are specified on the command line, maybe there's a way to specify that once and place it into the command line with substitution.

Warn if we are cleaning files that are tracked by Git

If automatic output cleaning tries to delete a file that is tracked by Git, then that's a good indication that the output glob could be too broad.

This could be more expensive than we want, since it requires calling out to another process and would block every script's execution. Maybe something to put in a diagnose mode, rather than something we do on every build.
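The check itself could shell out to git ls-files; a sketch of the per-file version (a real implementation would batch paths into one invocation to reduce the process overhead mentioned above):

```shell
# Check whether a path is tracked by Git before deleting it; exits 0
# if tracked, non-zero otherwise.
is_tracked_by_git() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}
```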

Service mode

By default, Wireit assumes that scripts eventually exit by themselves. This works well for things like building or testing. But sometimes a script runs indefinitely, such as a server.

Setting "server": true will tell Wireit that a script runs indefinitely. This has the following effects:

  • It will always run. It will never be skipped or restored from cache.

  • If something depends on the server, Wireit won't wait for the server to exit before the dependent script starts running. Instead, it just waits for the server process to be spawned.

  • If a server script is run directly (e.g. npm run serve), then it will stay running until the user kills Wireit (e.g. Ctrl-C).

  • If a server script is run indirectly (e.g. npm run script-that-depends-on-server), then the server script will stay running until all scripts which transitively depend on it have finished.

  • In watch mode, Wireit will restart the server whenever a dependency changes. If this isn't required for a particular dependency (such as for static assets that the server does not cache), the dependency edge can be annotated with "restart": false.

{
  "scripts": {
    "serve": "wireit"
  },
  "wireit": {
    "serve": {
      "command": "node lib/server.js",
      "server": true,
      "dependencies": [
        "build:server",
        {
          "script": "build:assets",
          "restart": false
        }
      ],
      "files": [],
      "output": []
    }
  }
}
