wireit's Issues

A way to shorten dependency lists: wildcards or auto-dependencies

I think a lot of monorepos will have scripts that depend on basically all subpackages. Take lit.dev's build script:

  "wireit": {
    "build": {
      "dependencies": [
        "./packages/lit-dev-api:build",
        "./packages/lit-dev-content:build",
        "./packages/lit-dev-server:build",
        "./packages/lit-dev-tests:build",
        "./packages/lit-dev-tools-cjs:build",
        "./packages/lit-dev-tools-esm:build"
      ]
    },

This list is prone to getting out of sync as packages are added. That might be caught quickly when the project doesn't build correctly, but the list is still cumbersome.

Lerna and npm solve this by allowing you to run a script by name in any package that has that script. I could see wireit supporting something similar in a few ways:

  1. Wildcards:
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/**/*:build"
      ]
    },
  2. Wildcard with an optional script name:
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/**/*" // Uses "build" automatically
      ]
    },
  3. Auto-deps based on the "workspaces" field
    This would read the "workspaces" field as a single source of truth for the list of all subpackages and run the build script in each:
  "wireit": {
    "build": {
      "dependencies": "auto"
    },

This version of the idea would really only be useful on the top-level package. To work in workspaces automatically would require some magic around looking at which dependencies are local to the monorepo and running scripts in them. Maybe that's doable.

Since packages would still need their scripts and dependencies specified, even if auto, it seems like this wouldn't require wireit to topo-sort the packages.
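If wildcards were supported, the expansion could happen during analysis by matching the pattern against the known workspace directories. A minimal sketch of that expansion (expandDependency and its glob-to-RegExp translation are hypothetical, not wireit's actual code):

```javascript
// Expand a "./packages/**/*:build"-style dependency against a known
// list of workspace directories. Hypothetical sketch, not wireit's API.
function expandDependency(dep, workspaceDirs, defaultScript = 'build') {
  const colon = dep.lastIndexOf(':');
  const hasScript = colon > dep.lastIndexOf('/');
  const pattern = hasScript ? dep.slice(0, colon) : dep;
  const script = hasScript ? dep.slice(colon + 1) : defaultScript;
  if (!pattern.includes('*')) return [dep]; // not a wildcard; pass through
  // Translate the glob into a RegExp: '**/*' matches any path segment(s),
  // a single '*' matches within one segment.
  const re = new RegExp(
    '^' +
      pattern
        .replace(/[.+^${}()|[\]\\]/g, '\\$&')
        .replace(/\*\*\/\*/g, '.+')
        .replace(/\*/g, '[^/]+') +
      '$'
  );
  return workspaceDirs.filter((d) => re.test(d)).map((d) => `${d}:${script}`);
}
```

The same helper would cover option 2 by falling back to a default script name when none is given.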

Replay stdout/stderr

If a script is fresh or cached, we should replay the cached stdout/stderr. This would be helpful for e.g. test output, since it may not be clear that the tests are in a passing state when nothing is logged.

Unexpected errors should be reported with more context

We usually know which script we're running when an unexpected error occurs, so at a minimum we could wrap unexpected errors with the script context and include it when we display them.

There might also be additional context we could add to help narrow down where in the code the error occurred. Especially given the fact that fs errors in Node don't include stack traces (nodejs/node#30944).
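A sketch of what the wrapping could look like (ScriptError and wrapUnexpected are illustrative names, not wireit's actual classes):

```javascript
// Wrap an unexpected error with the script that was running when it
// was thrown, so the failure is attributable in the log output.
class ScriptError extends Error {
  constructor(script, cause) {
    super(`[${script.packageDir}:${script.name}] ${cause.message}`);
    this.name = 'ScriptError';
    this.script = script;
    this.cause = cause;
  }
}

// Run `fn` and re-throw any unexpected error with the script context.
function wrapUnexpected(script, fn) {
  try {
    return fn();
  } catch (err) {
    throw new ScriptError(script, err);
  }
}
```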

Multiple iterations and flake detection

It could be useful to sometimes run a script multiple times. For example, to detect a flaky test, maybe something like this:

npm test -- --detect-failure-rate --iterations=10

This would run the script 10 times, and then report the % of times it failed at the end.
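The reporting side of this is straightforward; a sketch where runOnce stands in for actually executing the script and returns true on success:

```javascript
// Run a script `iterations` times and report the failure rate at the
// end. `runOnce` is a stand-in for actually executing the script.
function detectFailureRate(runOnce, iterations) {
  let failures = 0;
  for (let i = 0; i < iterations; i++) {
    if (!runOnce()) failures++;
  }
  return {iterations, failures, failureRate: failures / iterations};
}
```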

Watch mode optimizations

  • No need to re-analyze or re-create watchers unless a package.json file has changed (and watchers only need to be re-created if a files array changed, and even then only the affected ones do).

  • We can share the CachingPackageJsonReader instance across analysis. File change events tell us when a cached result is stale.

  • We can cache glob results in memory. File change events tell us when a cached result is stale.

  • We can cache file hashes in memory. File change events tell us when a cached result is stale.
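The three caching bullets above share the same shape: memoize, and invalidate on a watcher event. A sketch for the file-hash case (FileHashCache is a hypothetical name; hashFn stands in for the real file hasher):

```javascript
// In-memory hash cache that stays valid until a watcher event
// invalidates a specific path.
class FileHashCache {
  constructor(hashFn) {
    this.hashFn = hashFn;
    this.cache = new Map();
  }
  // Return the cached hash for `path`, computing it on first use.
  hash(path) {
    if (!this.cache.has(path)) this.cache.set(path, this.hashFn(path));
    return this.cache.get(path);
  }
  // Call from the file watcher when `path` changes on disk.
  onFileChanged(path) {
    this.cache.delete(path);
  }
}
```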

Dry run mode

Tells you what scripts would be executed in what order, which existing output files would be deleted, etc.

Binary lookup fails in cross-package dependencies

While hooking up wireit in the lit-analyzer repo, one of the sub-packages, vscode-plugin, depends on vsce, which exposes the vsce command, and one of its scripts, package-for-test, uses that command.

If I run npm run package-for-test from packages/vscode-plugin then when wireit runs the command, vsce is found.

If I run a script that depends on ./packages/vscode-plugin:package-for-test from the root of the repo, vsce is not found.

Log output:

$ npm run just-package

> [email protected] just-package
> wireit

✅ [packages/lit-analyzer:build] Already fresh
✅ [packages/ts-lit-plugin:build] Already fresh
✅ [packages/vscode-lit-plugin:build] Already fresh
🏃 [packages/vscode-lit-plugin:package-for-test] Running command "vsce package -o ./out/packaged.vsix && rm -rf ../../../packaged-extension/ && mkdir ../../../packaged-extension/ && unzip -qq ./out/packaged.vsix -d ../../../packaged-extension/"
/bin/sh: vsce: command not found
❌ [packages/vscode-lit-plugin:package-for-test] Failed with exit status 127

[vscode-extension] Extended validation

Additional diagnostics which aren't expressible as part of the JSON schema:

  • every wireit script needs to be present in the scripts section
  • every wireit script needs to just run "wireit" in the scripts section
  • every dependency needs to resolve to an npm script

Setting to turn off caching for a specific script

Probably "cache": false.

It seems like we might want to control it per-caching implementation too. You might not want local caching (e.g. because it's a really fast script with incremental build so you don't want it during development), but still want it when using GitHub Actions.
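Per-implementation control might look something like this; the nested object shape and the "github-actions" key are invented here for illustration, not a settled design:

```json
{
  "build": {
    "command": "tsc",
    "cache": {
      "local": false,
      "github-actions": true
    }
  }
}
```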

Detect overlapping output

Two scripts should probably not be able to set their output to patterns that could overlap, especially because when clean is enabled (the default), one script could clobber the output of another by mistake.
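Full glob-overlap detection is hard, but a conservative first pass could compare the literal prefixes of each pattern (everything before the first wildcard). A sketch; a real version would also need to account for negations and {} groups:

```javascript
// Conservative overlap check between two output globs: strip each
// pattern at its first wildcard and test whether either literal prefix
// is a path-prefix of the other. May report false positives, which is
// the safe direction for a warning.
function mayOverlap(a, b) {
  const lit = (p) => p.split(/[*?{]/, 1)[0];
  const [la, lb] = [lit(a), lit(b)];
  const prefix = (x, y) =>
    y === x || y.startsWith(x.endsWith('/') ? x : x + '/');
  return prefix(la, lb) || prefix(lb, la);
}
```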

JSON schema for our package.json fields

It looks like a VS Code plugin can contribute json schema files: https://code.visualstudio.com/api/references/contribution-points#contributes.jsonValidation

And those json schema files may be able to augment existing schemas?

https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/package.json#L632
https://github.com/runem/lit-analyzer/blob/fc6f2d99b7a61368f21d174dc13bb54c36ca50d2/packages/vscode-lit-plugin/schemas/tsconfig.schema.json

I wonder if there's a way to do this without requiring the extension, by having the wireit command automatically add a $schema field to the wireit section.

Garbage collection for cache directory

The .wireit/<script>/cache directory currently can grow indefinitely. We should implement a garbage collection strategy to cap the size of this directory.

An LRU cache with a configurable maximum number of entries seems like what we want.

We will want to make sure we have an efficient way to maintain the cache hit rate data which scales well with the size of the cache. We will probably want some kind of on-disk index file that lets us read/write cache hit rates efficiently, to determine which cache entry needs to be deleted when the cap is hit. A doubly-linked-list implemented in the filesystem itself with symlinks (or just files containing SHAs) could also be an interesting way to do this.
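The in-memory half of such an LRU is small; the hard part is persisting it efficiently. A sketch of just the eviction logic, relying on Map's insertion-order iteration (LruIndex is a hypothetical name):

```javascript
// LRU eviction for cache entries. Re-inserting a key on access moves
// it to the back of the Map's iteration order, so the first key is
// always the least recently used.
class LruIndex {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }
  // Record a hit for `key`; returns keys evicted to stay under the cap.
  touch(key) {
    this.entries.delete(key);
    this.entries.set(key, Date.now());
    const evicted = [];
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
      evicted.push(oldest);
    }
    return evicted;
  }
}
```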

Detect output outside of package dir during analysis instead of execution

As a safety precaution, we refuse to clean output files if they are outside of the current script's package. However, it would be even better to catch this error earlier, in the Analyzer, by looking at the output glob patterns. We just need to be careful that we are parsing and analyzing the glob patterns correctly, accounting for negations, {} groups, and other special glob syntax.

Glob negations don't always work properly in output field

The negation in these glob patterns doesn't work:

{
  "output": [
    "output",
    "!output/excluded"
  ]
}

This is because fast-glob gives us the directory called output, and then excludes any file called output/excluded from the result. However, since it already gave us the output directory, and we do recursive deletes and copies, the exclusion ends up having no effect.

Here's a test case that currently fails, which can go in clean.test.ts:

test(
  'glob negations apply to directory match contents',
  timeout(async ({rig}) => {
    const cmdA = await rig.newCommand();
    await rig.write({
      'package.json': {
        scripts: {
          a: 'wireit',
        },
        wireit: {
          a: {
            command: cmdA.command,
            output: ['output', '!output/excluded'],
          },
        },
      },
      'output/included': '',
      'output/excluded': '',
    });

    const exec = rig.exec('npm run a');
    const inv = await cmdA.nextInvocation();

    assert.not(await rig.exists('output/included'));
    assert.ok(await rig.exists('output/excluded'));

    inv.exit(0);
    const res = await exec.exit;
    assert.equal(res.code, 0);
    assert.equal(cmdA.numInvocations, 1);
  })
);

One solution that doesn't work is to only include files in our glob results, and have the user rewrite the above as:

{
  "output": [
    "output/**",
    "!output/excluded"
  ]
}

but the problem with that is that we then would never be able to delete empty directories.

So I think we actually need to post-process our glob results, detect when a hit is a directory, explicitly recursively expand that directory, and then re-apply the negations?
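That post-processing step could look like this, sketched over an in-memory file listing (allFiles stands in for a filesystem walk; the real version would stat each match):

```javascript
// Post-process glob results: expand any matched directory into its
// recursive contents, then re-apply the negation patterns so they
// take effect on the expanded set.
function expandAndReExclude(matches, negations, allFiles) {
  const expanded = new Set();
  for (const m of matches) {
    expanded.add(m);
    for (const f of allFiles) {
      if (f.startsWith(m + '/')) expanded.add(f); // m matched a directory
    }
  }
  for (const neg of negations) {
    for (const f of [...expanded]) {
      if (f === neg || f.startsWith(neg + '/')) expanded.delete(f);
    }
  }
  return [...expanded].sort();
}
```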

Matching directory in files array does not include its contents

When using output, matching a directory implicitly includes all of the contents of that directory (though see #77 for a caveat with how that is currently broken with ! negations). That's because when we clean and cache, we use recursive operations like fs.rm and fs.cp.

However, with files, which is used for generating the cache key, we don't use recursive operations. We just read and hash the files that directly matched the glob.

files should be consistent with output: matching a directory should implicitly match all of its contents. This is also consistent with how the package.json files array and .gitignore files work.

Globbing doesn't support re-inclusion

Given the following:

foo/**
!foo/bar
foo/bar/baz

The file foo/bar/baz should be included, even though foo/bar was excluded. It looks like fast-glob doesn't care about the order in which ! negated patterns appear. This is different from how .gitignore and the files array in package.json work, so we should fix it.
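Order-sensitive matching means iterating the patterns top to bottom and letting the last match win. A sketch with a deliberately simplified matcher that only understands literal paths and a trailing /**:

```javascript
// .gitignore-style inclusion: patterns apply top to bottom and the
// last pattern that matches a path decides whether it is included.
function isIncluded(path, patterns) {
  // A pattern matches a path exactly or as an ancestor directory; a
  // '/**' suffix is treated the same as the bare directory name.
  const matches = (p, pat) => {
    if (pat.endsWith('/**')) pat = pat.slice(0, -3);
    return p === pat || p.startsWith(pat + '/');
  };
  let included = false;
  for (const raw of patterns) {
    const negated = raw.startsWith('!');
    if (matches(path, negated ? raw.slice(1) : raw)) included = !negated;
  }
  return included;
}
```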

Concise way to depend on scripts in child and dependency workspaces

[1] Run <script> in all of my workspaces (parent → child).

With Wireit, you can already do npm run build -ws to run a given script in all workspaces, which is the standard npm workspaces approach. However, it's not fully optimal, because npm doesn't parallelize. So we will also support a syntax like this:

{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "workspaces"
        }
      ]
    }
  },
  "workspaces": [
    "packages/foo",
    "packages/bar"
  ]
}

Which is equivalent to:

{
  "name": "root",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "./packages/foo:build",
        "./packages/bar:build"
      ]
    }
  }
}

[2] Run <script> in all of my dependencies (child → siblings).

Related: it is often useful to run some script in all of the current package's dependencies (where those dependencies are workspaces contained by the same workspace root).

{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        {
          "script": "build",
          "packages": "dependencies"
        }
      ]
    }
  },
  "dependencies": {
    "bar": "^1.0.0",
    "baz": "^1.0.0"
  }
}

Which is equivalent to:

{
  "name": "foo",
  "scripts": {
    "build": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "../bar:build",
        "../baz:build"
      ]
    }
  }
}

Pretend to be a TTY

Some tools produce different output when they detect that the terminal is not a TTY (i.e. not interactive). For example, TypeScript produces colorized output by default only in TTY mode; otherwise the --pretty flag must be specified to force colorized output.

When a process is run via Wireit, the process will think it is not attached to a TTY, because we use the default pipe setting to handle stdio from spawn (https://nodejs.org/api/child_process.html#optionsstdio), so it isn't attached to a TTY directly. We could instead use inherit, but that would not allow us to capture the output, which we need for storing stdio replays.

A downside of this is that it does not match the standard behavior of npm run, which uses inherit. Matching the behavior of npm is one of our goals.

Using a library like https://github.com/microsoft/node-pty may be the only way to trick processes into thinking they are attached to a TTY while also capturing the output for the replay files. We should do a little more research to confirm this, as there is a chance there is a simpler solution. This library is somewhat large, includes a native component, and has a different interface from spawn. Note we would only want to do this when we detect that Wireit itself is running in a TTY.

Parent/child delete errors on clean build

If we try to do a clean build, and output paths include a parent and a child, then an error can be thrown if we happen to delete the parent before the child.

Firstly, we should know that we don't need to delete the child directory in the first place, by using optimizeCopies (which we should rename). But we should also not throw if we try to delete a directory whose parent has already been deleted.
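The first half (not scheduling a child for deletion when an ancestor is already scheduled) is a small pre-filter over the delete list; a sketch (pruneChildPaths is a hypothetical name):

```javascript
// Before deleting, drop any path whose ancestor directory is also
// scheduled for deletion; the recursive delete of the ancestor
// already covers it.
function pruneChildPaths(paths) {
  const set = new Set(paths);
  return paths.filter((p) => {
    // Keep p only if no ancestor directory of p is also in the set.
    for (let i = p.indexOf('/'); i !== -1; i = p.indexOf('/', i + 1)) {
      if (set.has(p.slice(0, i))) return false;
    }
    return true;
  });
}
```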

Q: Do all wireit scripts need to also be npm scripts?

Let's say I organize one logical script, like build, into several sub-steps, like build:ts and build:graphql. Do I need to make the sub-steps into npm scripts if they're never intended to be called from npm run or as a dependency of another wireit script?

i.e., do I need build:ts and build:graphql in scripts?:

  "scripts": {
    "build": "wireit",
    "build:ts": "wireit",
    "build:graphql": "wireit"
  },
  "wireit": {
    "build": {
      "dependencies": [
        "build:ts"
      ]
    },
    "build:ts": {
      "dependencies": [
        "build:graphql"
      ]
    },
    "build:graphql": { ... },
  }

Improvements around @actions/cache

Problem

The public API provided by the @actions/cache package doesn't exactly meet our needs, because it automatically uses the file paths that are included in the tarball as part of the cache entry version (see https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/cacheHttpClient.ts#L70), and implements globbing differently.

We want complete control over our cache key, instead of having it be generated automatically based on file paths -- and we want to be sure we are using identical globbing logic to the rest of Wireit.

Current solution

For this reason, we are currently reaching into the internal/ directory of @actions/cache to get more control. This is bad because those modules could change at any time, which is why we currently have a strict ("=") version pin in our package.json.

It's also why we currently have "skipLibCheck": false in our tsconfig.json, and why we have the file types/action-cache-contracts.d.ts -- because the file lib/internal/contracts.d.ts is missing from the published @actions/cache package.

The @actions/cache package is also our largest dependency by far. It's 22MB, and adds 63 transitive dependencies.

Options

  1. We could file an issue or send a PR that provides a way to directly specify the cache key in @actions/cache. This would solve the version pinning problem, the "skipLibCheck": false problem, and would allow us to remove types/action-cache-contracts.d.ts -- but we'd still have the large dependency.

  2. We could potentially move all of this logic into a separate package which is installed only by google/wireit@setup-github-actions-caching/v1 -- instead of the main package. The action would then spin up its own HTTP server, which we would talk to instead with a more minimal API (note that the server we spin up would have direct filesystem access, so it could make the tarballs). This would shrink the main wireit package's dependencies and filesize back down again (though the dependencies would still be installed -- just only in CI, instead of also locally), and would have the added benefit of not requiring us to expose the ACTIONS_CACHE_URL and ACTIONS_RUNTIME_TOKEN variables to all run steps (see actions/toolkit#1053).

  3. The logic we need from @actions/cache could be re-implemented from scratch in a minimal way, such that we could drop the dependency on @actions/cache all together. The main tricky part is the way it handles tarball generation across platforms (https://github.com/actions/toolkit/blob/7654d97eb6c4a3d564f036a2d4a783ae9105ec07/packages/cache/src/internal/tar.ts).

A way to specify options for all scripts in a package

Possibility:

{
  "*": {
    "dependencies": [
      "bootstrap"
    ],
    "packageLocks": [
      "yarn.lock"
    ]
  }
}

* is unlikely to collide with a real script name, but we should probably support an escaping scheme anyway, e.g. \* means the script literally called *.

Stdio replayer should preserve stdout/stderr sequence

Context: #68 (comment)

I think the ideal solution requires actually encoding the cross-stream sequence in some way. We could use a unified format that encodes which stream each chunk came from, or, another idea, a third file which encodes the sequence as stream/offset/length triples:

out 0 20
err 0 10
out 20 100

So the replayer would follow these sequences and do something like call fs.read(stdoutFileDescriptor, {offset: 0, length: 20}). I think that should perform better than parsing a big unified format. In the case where there are no mixed streams, we can just stream the file straight through as we do now. It's also nice that right now you get a file like .wireit/<script>/stdout that the user can do stuff with directly if they want.
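An in-memory sketch of the record/replay cycle for that index (makeRecorder is illustrative; the real version would persist the buffers and index to files and read chunks back by offset):

```javascript
// Record interleaved stdout/stderr chunks plus a (stream, offset,
// length) index, then replay them in the original cross-stream order.
function makeRecorder() {
  const buffers = {out: '', err: ''};
  const index = [];
  return {
    write(stream, chunk) {
      index.push({stream, offset: buffers[stream].length, length: chunk.length});
      buffers[stream] += chunk;
    },
    replay() {
      // Follow the index, slicing each chunk back out in arrival order.
      return index.map(({stream, offset, length}) => ({
        stream,
        chunk: buffers[stream].slice(offset, offset + length),
      }));
    },
  };
}
```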

Can't run commands in npm workspaces

I'm getting an error when trying to run a basic command in a npm workspace.

my-project/package.json:

{
  "workspaces": [
    "packages/package-a"
  ]
}

my-project/packages/package-a/package.json:

{
  "name": "package-a",
  "scripts": {
    "foo": "wireit"
  },
  "wireit": {
    "foo": {
      "command": "echo FOO!"
    }
  }
}
my-project> npm run foo -w package-a

> [email protected] foo
> wireit

❌ [foo] No script named "foo" was found in /path/to/my-project
npm ERR! Lifecycle script `foo` failed with error: 
npm ERR! Error: command failed 
npm ERR!   in workspace: [email protected] 
npm ERR!   at location: /path/to/my-project/packages/package-a

This also happens if I cd into packages/package-a and run npm run foo.

Option to clean output only when input file deleted

Problem

tsc, even in incremental mode, does not delete the output files corresponding to input files that have been deleted since its last build (see microsoft/TypeScript#30602 (comment)). For example:

  • Write foo.ts
  • Run tsc --build
    • Generates foo.js
  • Rename foo.ts to bar.ts
  • Run tsc --build
    • Generates bar.js, but foo.js still exists

Many build tools behave the same way.

Currently, if you have specified output, by default we delete all output files before running the script. This helps with the above problem, but makes it impossible to use efficient incremental modes, like tsc --build.

You can currently set clean: false to disable deleting before execution, however that means that stale outputs can still exist.

Proposal

Add a new "on-delete" option for clean which deletes output only if some input file that matched on the previous run no longer exists.

This seems like it would provide a good balance between the two options we currently have; giving you incremental build every time a file is modified, or a new file is added -- but doing a clean build if a file is removed.

So the 3 options would now be:

  • true: Always delete before execution
  • false: Never delete before execution (but do still delete when restoring from cache)
  • "on-delete": Delete only if an input file has been deleted since the last run

We could consider making "on-delete" the default, but true still seems like the safer default, because it's also very possible for a stale file to be left around due to a change in a config file (e.g. changing a rollup config to rename the bundle output file).

Example

{
  "build": {
    "command": "tsc --build",
    "files": [
      "src/**/*.ts"
    ],
    "output": [
      "lib/**",
      ".tsbuildinfo"
    ],
    "clean": "on-delete"
  }
}
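The decision itself is just a set comparison against the previous run's matched input files; a sketch (shouldCleanOnDelete is a hypothetical name):

```javascript
// Clean before running only when some file that was an input last
// time no longer matches now (i.e. it was deleted or renamed away).
// Added or modified files keep the incremental path.
function shouldCleanOnDelete(previousFiles, currentFiles) {
  const current = new Set(currentFiles);
  return previousFiles.some((f) => !current.has(f));
}
```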

Add ways to automatically set (or reuse) input files

For tsc we generally already have inputs specified in our tsconfig (though I'm not sure if this is complete: tsc may read files outside of the include glob without error if they're imported by path). It'd be nice to be able to reuse that. The same might be true of Rollup configs and other tools.

Could there be a way to read input files from tool-specific configs? Maybe this is best done with a worker protocol, or maybe there's a config plug-in system and/or a built-in set of integrations that know how to read inputs and outputs from common tools.

For tools whose files are specified on the command line, maybe there's a way to specify that once and place it into the command line with substitution.

Warn if we are cleaning files that are tracked by Git

If automatic output cleaning tries to delete a file that is tracked by Git, then that's a good indication that the output glob could be too broad.

This could be more expensive than we want, since it requires calling out to another process and would block every script's execution. Maybe something to put in a diagnose mode, rather than something we do on every build.
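The check itself could shell out to git ls-files; a sketch of the per-file version (a real implementation would batch paths into one invocation to reduce the process overhead mentioned above):

```shell
# Check whether a path is tracked by Git before deleting it; exits 0
# if tracked, non-zero otherwise.
is_tracked_by_git() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}
```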

Service mode

By default, Wireit assumes that scripts eventually exit by themselves. This works well for things like building or testing. But sometimes a script runs indefinitely, such as a server.

Setting "server": true will tell Wireit that a script runs indefinitely. This has the following effects:

  • It will always run. It will never be skipped or restored from cache.

  • If something depends on the server, Wireit won't wait for the server to exit before the dependent script starts running. Instead, it just waits for the server process to be spawned.

  • If a server script is run directly (e.g. npm run serve), then it will stay running until the user kills Wireit (e.g. Ctrl-C).

  • If a server script is run indirectly (e.g. npm run script-that-depends-on-server), then the server script will stay running until all scripts which transitively depend on it have finished.

  • In watch mode, Wireit will restart the server whenever a dependency changes. If this isn't required for a particular dependency (such as for static assets that the server does not cache), the dependency edge can be annotated with "restart": false.

{
  "scripts": {
    "serve": "wireit"
  },
  "wireit": {
    "serve": {
      "command": "node lib/server.js",
      "server": true,
      "dependencies": [
        "build:server",
        {
          "script": "build:assets",
          "restart": false
        }
      ],
      "files": [],
      "output": []
    }
  }
}
