google / wireit

Wireit upgrades your npm/pnpm/yarn scripts to make them smarter and more efficient.

License: Apache License 2.0
I think a lot of monorepos will have scripts that depend on basically all subpackages. Take lit.dev's `build` script:

```json
"wireit": {
  "build": {
    "dependencies": [
      "./packages/lit-dev-api:build",
      "./packages/lit-dev-content:build",
      "./packages/lit-dev-server:build",
      "./packages/lit-dev-tests:build",
      "./packages/lit-dev-tools-cjs:build",
      "./packages/lit-dev-tools-esm:build"
    ]
  }
}
```
This list is prone to getting out of sync as packages are added. That might be caught quickly when the project doesn't build correctly, but the list is still cumbersome.
Lerna and npm solve this by allowing you to run a script by name in any package that has that script. I could see wireit supporting something similar in a few ways:
"wireit": {
"build": {
"dependencies": [
"./packages/**/*:build",
]
},
"wireit": {
"build": {
"dependencies": [
"./packages/**/*", // Uses "build" automatically
]
},
"workspaces"
field"workspaces"
field as a single source of truth for the list of all subpackages and run the build
script in each: "wireit": {
"build": {
"dependencies": "auto"
},
This version of the idea would really only be useful on the top-level package. To work in workspaces automatically would require some magic around looking at which dependencies are local to the monorepo and running scripts in them. Maybe that's doable.
Since packages would still need their scripts and dependencies specified, even if set to `auto`, it seems like this wouldn't require wireit to topo-sort the packages.
If a script is fresh or cached, we should replay the cached stdout/stderr. This would help with e.g. test output, since it may not be clear that the tests are in a passing state when nothing is logged.
We usually know what script we're running when an unexpected error occurs, so at minimum we could wrap unexpected errors with the script context and include it when we display it.
There might also be additional context we could add to help narrow down where in the code the error occurred, especially given that `fs` errors in Node don't include stack traces (nodejs/node#30944).
It could be useful to sometimes run a script multiple times. For example, to detect a flaky test, maybe something like this:
```sh
npm test -- --detect-failure-rate --iterations=10
```
This would run the script 10 times, and then report the % of times it failed at the end.
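Sketched in plain JavaScript, the reporting logic could look like this (the `--detect-failure-rate` flag and the `runScript` callback are hypothetical illustrations, not an existing wireit API):

```javascript
// Run a script N times and report the percentage of runs that failed.
// `runScript` stands in for spawning the actual npm script; it returns
// true on success and false on failure.
function detectFailureRate(runScript, iterations) {
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    if (!runScript(i)) failures++;
  }
  return (failures / iterations) * 100;
}

// Deterministic stand-in for a flaky test: fails on runs 0 and 5.
const flaky = (i) => i % 5 !== 0;
console.log(`Failure rate: ${detectFailureRate(flaky, 10)}%`); // → Failure rate: 20%
```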
No need to re-analyze or re-create watchers unless a package.json file has changed (and watchers only need to be re-created if a `files` array changed, and even then only the affected ones do).
We can share the `CachingPackageJsonReader` instance across analysis. File change events tell us when a cached result is stale.
We can cache glob results in memory. File change events tell us when a cached result is stale.
We can cache file hashes in memory. File change events tell us when a cached result is stale.
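All three of these memoization ideas share the same shape, sketched below in plain JavaScript (the `InvalidatingCache` class is illustrative, not wireit code): keep results in memory, and drop entries when a file change event arrives for a path they depend on.

```javascript
// In-memory memoization keyed by a string that embeds the file paths
// a result depends on. A file change event invalidates matching entries.
class InvalidatingCache {
  constructor() {
    this.entries = new Map();
  }
  // Return the cached value for `key`, computing and storing it on a miss.
  getOrCompute(key, compute) {
    if (!this.entries.has(key)) {
      this.entries.set(key, compute());
    }
    return this.entries.get(key);
  }
  // Called from the file watcher: conservatively drop every entry
  // whose key mentions the changed path.
  onFileChanged(path) {
    for (const key of [...this.entries.keys()]) {
      if (key.includes(path)) this.entries.delete(key);
    }
  }
}
```

The same structure would work for analysis results, glob results, and file hashes; only the `compute` callback differs.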
This would tell you what scripts would be executed in what order, which existing output files would be deleted, etc.
In hooking up wireit in the lit-analyzer repo, one of the sub-repos – vscode-plugin – depends on vsce, which exposes the `vsce` command, and one of its scripts, `package-for-test`, uses that command.

If I run `npm run package-for-test` from packages/vscode-plugin, then when wireit runs the command, vsce is found.

If I run a script that depends on `./packages/vscode-plugin:package-for-test` from the root of the repo, vsce is not found.
Log output:

```
$ npm run just-package

> [email protected] just-package
> wireit

✅ [packages/lit-analyzer:build] Already fresh
✅ [packages/ts-lit-plugin:build] Already fresh
✅ [packages/vscode-lit-plugin:build] Already fresh
🏃 [packages/vscode-lit-plugin:package-for-test] Running command "vsce package -o ./out/packaged.vsix && rm -rf ../../../packaged-extension/ && mkdir ../../../packaged-extension/ && unzip -qq ./out/packaged.vsix -d ../../../packaged-extension/"
/bin/sh: vsce: command not found
❌ [packages/vscode-lit-plugin:package-for-test] Failed with exit status 127
```
Additional diagnostics which aren't expressible as part of the JSON schema: probably `"cache": false`.
It seems like we might want to control it per caching implementation too. You might not want local caching (e.g. because it's a really fast script with an incremental build, so you don't want it during development), but still want it when using GitHub Actions.
Two scripts should probably not be able to set their `output` to patterns that could overlap, especially because when `clean` is enabled (the default), one script could clobber the output of another by mistake.
It looks like a VS Code plugin can contribute json schema files: https://code.visualstudio.com/api/references/contribution-points#contributes.jsonValidation
And those json schema files may be able to augment existing schemas?
https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/package.json#L632
https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/schemas/tsconfig.schema.json
I wonder if there's a way to do this without requiring the extension, by having the wireit command automatically add a `$schema` field to the `wireit` section.
Make sure we are handling symlinks well in input and output files.
The `.wireit/<script>/cache` directory currently can grow indefinitely. We should implement a garbage collection strategy to cap the size of this directory.
An LRU cache with a configurable maximum number of entries seems like what we want.
We will want to make sure we have an efficient way to maintain the cache hit rate data which scales well with the size of the cache. We will probably want some kind of on-disk index file that lets us read/write cache hit rates efficiently, to determine which cache entry needs to be deleted when the cap is hit. A doubly-linked-list implemented in the filesystem itself with symlinks (or just files containing SHAs) could also be an interesting way to do this.
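The eviction policy itself is straightforward; the hard part is persisting the recency data efficiently. As an in-memory sketch in plain JavaScript (illustrative, not wireit code), exploiting the fact that a `Map` iterates in insertion order:

```javascript
// LRU cache with a configurable maximum number of entries. Re-inserting
// a key on access keeps the most-recently-used keys at the end of the
// Map's iteration order, so the eviction candidate is always first.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value); // mark as most recently used
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first in iteration order).
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

An on-disk version would need to update the recency index without rewriting it wholesale on every access, which is where the linked-list-of-files idea comes in.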
As a safety precaution, we refuse to clean output files if they are outside of the current script's package. However, it would be even better to catch this error earlier, in the Analyzer, by looking at the `output` glob patterns. We just need to be careful that we are parsing and analyzing the glob patterns correctly, accounting for negations, `{}` groups, and other special glob syntax.
In this section: https://github.com/google/wireit#input-and-output-files
It reads like I would need to specify all the inputs of a bundle script, including the files it reads. But the `bundle` script example only includes the config as an input. I guess the outputs of its dependencies are automatically used as inputs, but without seeing that example, this isn't how I would have tried to write a bundle script myself.
The negation in these glob patterns doesn't work:

```json
{
  "output": [
    "output",
    "!output/excluded"
  ]
}
```
Because fast-glob gives us the directory called `output`, and then excludes any file called `output/excluded` from the result. However, since it already gave us the `output` directory, and we do recursive deletes and copies, the exclusion ends up having no effect.
Here's a test case that currently fails, which can go in `clean.test.ts`:

```ts
test(
  'glob negations apply to directory match contents',
  timeout(async ({rig}) => {
    const cmdA = await rig.newCommand();
    await rig.write({
      'package.json': {
        scripts: {
          a: 'wireit',
        },
        wireit: {
          a: {
            command: cmdA.command,
            output: ['output', '!output/excluded'],
          },
        },
      },
      'output/included': '',
      'output/excluded': '',
    });
    const exec = rig.exec('npm run a');
    const inv = await cmdA.nextInvocation();
    assert.not(await rig.exists('output/included'));
    assert.ok(await rig.exists('output/excluded'));
    inv.exit(0);
    const res = await exec.exit;
    assert.equal(res.code, 0);
    assert.equal(cmdA.numInvocations, 1);
  })
);
```
One solution that doesn't work is to only include files in our glob results, and have the user rewrite the above as:
```json
{
  "output": [
    "output/**",
    "!output/excluded"
  ]
}
```
but the problem with that is that we then would never be able to delete empty directories.
So I think we actually need to post-process our glob results, detect when a hit is a directory, explicitly recursively expand that directory, and then re-apply the negations?
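A sketch of that post-processing in plain JavaScript (`expandAndFilter` is a hypothetical helper; real code would stat the hits and walk the filesystem instead of taking a flat file list):

```javascript
// Post-process glob hits: expand any hit that is a directory into its
// recursive contents, then re-apply the negated patterns so exclusions
// like "!output/excluded" take effect inside matched directories.
// `negations` are the negated patterns with the leading "!" stripped.
function expandAndFilter(hits, allFiles, negations) {
  const expanded = new Set();
  for (const hit of hits) {
    // Treat a hit as a directory if any known file lives under it.
    const children = allFiles.filter((f) => f.startsWith(hit + '/'));
    if (children.length > 0) {
      for (const child of children) expanded.add(child);
    } else {
      expanded.add(hit);
    }
  }
  return [...expanded].filter(
    (f) => !negations.some((n) => f === n || f.startsWith(n + '/'))
  );
}
```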
An analysis error encountered in watch mode doesn't need to terminate the wireit command. We can keep watching all known `package.json` files and keep iterating.
When using `output`, matching a directory implicitly includes all of the contents of that directory (though see #77 for a caveat about how that is currently broken with `!` negations). That's because when we clean and cache, we use recursive operations like `fs.rm` and `fs.cp`.
However, with `files`, which is used for generating the cache key, we don't use recursive operations. We just read and hash the files that directly matched the glob.
`files` should be consistent with `output`: matching a directory should implicitly match all of its contents. This is also consistent with how the package.json `files` array and `.gitignore` files work.
Given the following:

```
foo/**
!foo/bar
foo/bar/baz
```
The file `foo/bar/baz` should be included, even though `foo/bar` was excluded. It looks like `fast-glob` doesn't care about the order in which `!` negated patterns appear. This is different from how `.gitignore` and the `files` array in package.json work, so we should fix it.
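The gitignore-style semantics can be sketched as a last-match-wins loop (plain JavaScript; exact paths and a trailing `/**` are the only syntax supported in this sketch, and `lastMatchWins` is a hypothetical helper, not a fast-glob API):

```javascript
// Order-sensitive matching: patterns are evaluated top to bottom and
// the last matching pattern wins, so a positive pattern appearing
// after a negation can re-include a previously excluded file.
function lastMatchWins(patterns, path) {
  let included = false;
  for (const pattern of patterns) {
    const negated = pattern.startsWith('!');
    const body = negated ? pattern.slice(1) : pattern;
    const matches = body.endsWith('/**')
      ? path.startsWith(body.slice(0, -2)) // "foo/**" → prefix "foo/"
      : path === body || path.startsWith(body + '/');
    if (matches) included = !negated;
  }
  return included;
}
```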
Run `<script>` in all of my workspaces (parent → child):

With Wireit, you can already just do `npm run build -ws` to run a given script in all workspaces, which is the standard npm workspaces approach. However, it's not fully optimal, because npm doesn't parallelize. So we will also support a syntax like this:
```json
{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "workspaces"
        }
      ]
    }
  },
  "workspaces": [
    "packages/foo",
    "packages/bar"
  ]
}
```
Which is equivalent to:
```json
{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/foo:build",
        "./packages/bar:build"
      ]
    }
  }
}
```
Run `<script>` in all of my dependencies (child → siblings):

Relatedly, it is often useful to run some script in all of the current package's dependencies (where those dependencies are workspaces contained by the same workspace root).
```json
{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "dependencies"
        }
      ]
    }
  },
  "dependencies": {
    "bar": "*",
    "baz": "*"
  }
}
```
Which is equivalent to:
```json
{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "../bar:build",
        "../baz:build"
      ]
    }
  }
}
```
Some tools produce different output when they detect that the terminal is not a TTY (i.e. not interactive). For example, TypeScript produces colorized output by default only in TTY mode; otherwise the `--pretty` flag must be specified to force colorized output.
When a process is run via Wireit, the process will think it is not attached to a TTY, because we use the default `pipe` setting to handle stdio from `spawn` (https://nodejs.org/api/child_process.html#optionsstdio), so it isn't attached to a TTY directly. We could instead use `inherit`, but that would not allow us to capture the output, which we need for storing stdio replays.
A downside of this is that it does not match the standard behavior of `npm run`, which uses `inherit`. Matching the behavior of npm is one of our goals.
Using a library like https://github.com/microsoft/node-pty may be the only way to trick processes into thinking they are attached to a TTY while also being able to capture the output for the replay files. We should do a little more research to confirm this, as there is a chance there is a simpler solution. This library is somewhat large, includes a native component, and has a different interface from `spawn`. Note we would only want to do this when we detect we are running in a TTY.
It should instead report the error and continue watching, as it's really common to save an invalid package.json file (e.g. a trailing comma, or an array where an object should be).
If we try to do a clean build, and output paths include a parent and a child, then an error can be thrown if we happen to delete the parent before the child.
Firstly, we should know that we don't need to delete the child directory in the first place, by using `optimizeCopies` (which we should rename). But we should also not throw if we try to delete a directory whose parent has already been deleted.
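The first idea can be sketched as follows, assuming paths are plain `/`-separated strings (this `pruneChildPaths` helper is hypothetical, not wireit code):

```javascript
// Before deleting, drop any path whose ancestor directory is also in
// the delete set: the recursive delete of the ancestor covers it, and
// skipping it avoids errors when the parent happens to go first.
function pruneChildPaths(paths) {
  const set = new Set(paths);
  return paths.filter((path) => {
    // Walk up through each ancestor prefix of the path.
    for (
      let i = path.lastIndexOf('/');
      i > 0;
      i = path.lastIndexOf('/', i - 1)
    ) {
      if (set.has(path.slice(0, i))) return false;
    }
    return true;
  });
}
```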
Let's say I organize one logical script, like `build`, into several sub-steps, like `build:ts` and `build:graphql`. Do I need to make the sub-steps into npm scripts if they're never intended to be called from `npm run` or as a dependency of another wireit script? I.e., do I need `build:ts` and `build:graphql` in `scripts`?:
"scripts": {
"build": "wireit",
"build:ts": "wireit",
"build:graphql": "wireit"
},
"wireit": {
"build": {
"dependencies": [
"build:ts"
]
},
"build:ts" {
"dependencies": [
"build:graphql"
]
},
"build:graphql" { ... },
}
When the cursor is on an npm script, we can propose a change that converts that script to a wireit script (copy the command text to `"command"`, and replace the npm script text with just `"wireit"`).
The public API provided by the `@actions/cache` package doesn't exactly meet our needs, because it automatically uses the file paths that are included in the tarball as part of the cache entry version (see https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/cacheHttpClient.ts#L70), and implements globbing differently.
We want complete control over our cache key, instead of having it be generated automatically based on file paths -- and we want to be sure we are using identical globbing logic to the rest of Wireit.
For this reason, we are currently reaching into the `internal/` directory of `@actions/cache` to get more control. This is bad because those modules could change at any time, which is why we currently have a strict (`"="`) version pin in our package.json.
It's also why we currently have `"skipLibCheck": false` in our `tsconfig.json`, and why we have the file `types/action-cache-contracts.d.ts` -- because the file `lib/internal/contracts.d.ts` is missing from the published `@actions/cache` package.
The `@actions/cache` package is also our largest dependency by far. It's 22MB, and adds 63 transitive dependencies.
We could file an issue or send a PR that provides a way to directly specify the cache key in `@actions/cache`. This would solve the version pinning problem and the `"skipLibCheck": false` problem, and would allow us to remove `types/action-cache-contracts.d.ts` -- but we'd still have the large dependency.
We could potentially move all of this logic into a separate package which is installed only by `google/wireit@setup-github-actions-caching/v1` -- instead of the main package. The action would then spin up its own HTTP server, which we would talk to instead with a more minimal API (note that the server we spin up would have direct filesystem access, so it could make the tarballs). This would shrink the main `wireit` package's dependencies and filesize back down again (though the dependencies would still be installed -- just only in CI, instead of also locally), and would have the added benefit of not requiring us to expose the `ACTIONS_CACHE_URL` and `ACTIONS_RUNTIME_TOKEN` variables to all `run` steps (see actions/toolkit#1053).
The logic we need from `@actions/cache` could be re-implemented from scratch in a minimal way, such that we could drop the dependency on `@actions/cache` altogether. The main tricky part is the way it handles tarball generation across platforms (https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/tar.ts).
Possibility:
```json
{
  "*": {
    "dependencies": [
      "bootstrap"
    ],
    "packageLocks": [
      "yarn.lock"
    ]
  }
}
```
`*` is unlikely to collide with a real script name, but we should probably support an escaping scheme anyway, e.g. `\*` means the script literally called `*`.
Context: #68 (comment)
I think the ideal solution requires actually encoding the cross-stream sequence in some way. We could do it with a unified format that encodes the stream, or another idea I had was to have a 3rd file which encodes the sequences as offset/length pairs:

```
out 0 20
err 0 10
out 20 100
```

So the replayer would follow these sequences and do something like call `fs.read(stdoutFileDescriptor, {offset: 0, length: 20})`. I think that should perform better than parsing a big unified format. In the case where there are no mixed streams, we can just stream the file straight through as we do now. Also, it's kind of nice right now that you get a file like `.wireit/<script>/stdout` that the user can do stuff with directly if they want.
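The replay side can be sketched in plain JavaScript (Buffers stand in for the on-disk stdout/stderr files, and the index format is the hypothetical one proposed above, not an implemented wireit feature):

```javascript
// Replay interleaved stdout/stderr from an index of
// "<stream> <offset> <length>" lines, reading each slice from the
// corresponding stream's data.
function replay(indexLines, files, write) {
  for (const line of indexLines) {
    const [stream, offset, length] = line.split(' ');
    const slice = files[stream].subarray(+offset, +offset + +length);
    write(stream, slice.toString());
  }
}

// Hypothetical usage:
const files = {
  out: Buffer.from('hello world'),
  err: Buffer.from('oops!'),
};
replay(['out 0 5', 'err 0 5', 'out 5 6'], files, (stream, text) => {
  process.stdout.write(`[${stream}] ${text}\n`);
});
```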
I'm getting an error when trying to run a basic command in an npm workspace.

`my-project/package.json`:
```json
{
  "workspaces": [
    "packages/package-a"
  ]
}
```
`my-project/packages/package-a/package.json`:
```json
{
  "name": "package-a",
  "scripts": {
    "foo": "wireit"
  },
  "wireit": {
    "foo": {
      "command": "echo FOO!"
    }
  }
}
```
```
my-project> npm run foo -w package-a

> [email protected] foo
> wireit

❌ [foo] No script named "foo" was found in /path/to/my-project
npm ERR! Lifecycle script `foo` failed with error:
npm ERR! Error: command failed
npm ERR! in workspace: [email protected]
npm ERR! at location: /path/to/my-project/packages/package-a
```
This also happens if I cd into `packages/package-a` and run `npm run foo`.
`tsc`, even in incremental mode, does not delete the output files corresponding to input files that have been deleted since its last build (see microsoft/TypeScript#30602 (comment)). For example:

1. Create `foo.ts`
2. Run `tsc --build`; it emits `foo.js`
3. Rename `foo.ts` to `bar.ts`
4. Run `tsc --build`; it emits `bar.js`, but `foo.js` still exists

Many build tools behave the same way.
Currently, if you have specified `output`, by default we delete all output files before running the script. This helps with the above problem, but makes it impossible to use efficient incremental modes, like `tsc --build`.

You can currently set `"clean": false` to disable deleting before execution; however, that means that stale outputs can still exist.
Add a new `"on-delete"` option for `clean` which deletes output only if the unique set of matched input `files` for the script has changed since the last run.

This seems like it would provide a good balance between the two options we currently have: giving you an incremental build every time a file is modified or a new file is added -- but doing a clean build if a file is removed.
So the 3 options would now be:

- `true`: Always delete before execution
- `false`: Never delete before execution (but do still delete when restoring from cache)
- `"on-delete"`: Delete only if an input file was deleted since the last run

We could consider making `"on-delete"` the default, but `true` still seems like the safer default, because it's also very possible for a stale file to be left around due to a change in a config file (e.g. changing a rollup config to rename the bundle output file).
```json
{
  "build": {
    "command": "tsc --build",
    "files": [
      "src/**/*.ts"
    ],
    "output": [
      "lib/**",
      ".tsbuildinfo"
    ],
    "clean": "on-delete"
  }
}
```
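The `"on-delete"` decision itself could be as simple as this sketch (a hypothetical helper, assuming the list of matched input files from the previous run is persisted in the script's state):

```javascript
// Clean the output only when some input file that was matched on the
// previous run no longer matches on the current run (i.e. it was
// deleted or renamed). Modified and newly added files do not trigger
// a clean, preserving the incremental path.
function shouldClean(previousInputs, currentInputs) {
  const current = new Set(currentInputs);
  return previousInputs.some((file) => !current.has(file));
}
```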
Possibly implement the JSON version of Bazel's worker protocol: https://bazel.build/docs/persistent-workers
For tsc we generally already have inputs specified in our tsconfig (though I'm not sure if this is complete: tsc may read files outside of the `include` glob without error if they're imported by path). It'd be nice to be able to reuse that. The same might be true of Rollup configs and other tools.
Could there be a way to read input files from tool-specific configs? Maybe this is best left to a worker protocol, or maybe there's a config plug-in system and/or a built-in set of integrations that know how to read inputs and outputs from common tools.
For tools whose files are specified on the command line, maybe there's a way to specify that once and place it into the command line with substitution.
If automatic output cleaning tries to delete a file that is tracked by Git, then that's a good indication that the output glob could be too broad.
This could be more expensive than we want, since it requires calling out to another process and would block every script's execution. Maybe it's something to put in a `diagnose` mode, rather than something we do on every build.
By default, Wireit assumes that scripts eventually exit by themselves. This works well for things like building or testing. But sometimes a script runs indefinitely, such as a server.
Setting "server": true
will tell Wireit that a script runs indefinitely. This has the following effects:
It will always run. It will never be skipped or restored from cache.
If something depends on the server, Wireit won't wait for the server to exit before the dependent script starts running. Instead, it just waits for the server process to be spawned.
If a server script is run directly (e.g. `npm run serve`), then it will stay running until the user kills Wireit (e.g. Ctrl-C).
If a server script is run indirectly (e.g. `npm run script-that-depends-on-server`), then the server script will stay running until all scripts which transitively depend on it have finished.
In watch mode, Wireit will restart the server whenever a dependency changes. If this isn't required for a particular dependency (such as for static assets that the server does not cache), the dependency edge can be annotated with `"restart": false`.
```json
{
  "scripts": {
    "serve": "wireit"
  },
  "wireit": {
    "serve": {
      "command": "node lib/server.js",
      "server": true,
      "dependencies": [
        "build:server",
        {
          "script": "build:assets",
          "restart": false
        }
      ],
      "files": [],
      "output": []
    }
  }
}
```