
pacote's Introduction

pacote

Fetches package manifests and tarballs from the npm registry.

USAGE

const pacote = require('pacote')

// get a package manifest
pacote.manifest('pacote@latest').then(manifest => console.log('got it', manifest))

// extract a package into a folder
pacote.extract('github:npm/cli', 'some/path', options)
  .then(({from, resolved, integrity}) => {
    console.log('extracted!', from, resolved, integrity)
  })

pacote.tarball('https://server.com/package.tgz').then(data => {
  console.log('got ' + data.length + ' bytes of tarball data')
})

pacote works with any kind of package specifier that npm can install. If you can pass it to the npm CLI, you can pass it to pacote. (In fact, that's exactly what the npm CLI does.)

Anything that you can do with one kind of package, you can do with another.

Data that isn't relevant (like a packument for a tarball) will be simulated.

prepare scripts will be run when generating tarballs from git and directory locations, to simulate what would be published to the registry, so that you get a working package instead of just raw source code that might need to be transpiled.

CLI

This module exports a command line interface that can do most of what is described below. Run pacote -h to learn more.

Pacote - The JavaScript Package Handler, v10.1.1

Usage:

  pacote resolve <spec>
    Resolve a specifier and output the fully resolved target
    Returns integrity and from if '--long' flag is set.

  pacote manifest <spec>
    Fetch a manifest and print to stdout

  pacote packument <spec>
    Fetch a full packument and print to stdout

  pacote tarball <spec> [<filename>]
    Fetch a package tarball and save to <filename>
    If <filename> is missing or '-', the tarball will be streamed to stdout.

  pacote extract <spec> <folder>
    Extract a package to the destination folder.

Configuration values all match the names of configs passed to npm, or
options passed to Pacote.  Additional flags for this executable:

  --long     Print an object from 'resolve', including integrity and spec.
  --json     Print result objects as JSON rather than node's default.
             (This is the default if stdout is not a TTY.)
  --help -h  Print this helpful text.

For example '--cache=/path/to/folder' will use that folder as the cache.

API

The spec refers to any kind of package specifier that npm can install. If you can pass it to the npm CLI, you can pass it to pacote. (In fact, that's exactly what the npm CLI does.)

See below for valid opts values.

  • pacote.resolve(spec, opts) Resolve a specifier like foo@latest or github:user/project all the way to a tarball url, tarball file, or git repo with commit hash.

  • pacote.extract(spec, dest, opts) Extract a package's tarball into a destination folder. Returns a promise that resolves to the {from,resolved,integrity} of the extracted package.

  • pacote.manifest(spec, opts) Fetch (or simulate) a package's manifest (basically, the package.json file, plus a bit of metadata). See below for more on manifests and packuments. Returns a Promise that resolves to the manifest object.

  • pacote.packument(spec, opts) Fetch (or simulate) a package's packument (basically, the top-level package document listing all the manifests that the registry returns). See below for more on manifests and packuments. Returns a Promise that resolves to the packument object.

  • pacote.tarball(spec, opts) Get package tarball data as a buffer in memory. Returns a Promise that resolves to the tarball data Buffer, with from, resolved, and integrity fields attached.

  • pacote.tarball.file(spec, dest, opts) Save package tarball data to a file on disk. Returns a Promise that resolves to {from,integrity,resolved} of the fetched tarball.

  • pacote.tarball.stream(spec, streamHandler, opts) Fetch a tarball and make the stream available to the streamHandler function.

    This is mostly an internal function, but it is exposed because it does provide some functionality that may be difficult to achieve otherwise.

    The streamHandler function MUST return a Promise that resolves when the stream (and all associated work) is ended, or rejects if the stream has an error.

    The streamHandler function MAY be called multiple times, as Pacote retries requests in some scenarios, such as cache corruption or retriable network failures.
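
    A minimal sketch of using tarball.stream; the spec and destination filename here are illustrative, and the handler's promise is how pacote learns that the stream has been fully consumed:

    const pacote = require('pacote')
    const fs = require('fs')

    // Stream a tarball to disk. The handler returns a promise that settles when
    // the write finishes, so pacote knows the stream (and associated work) is
    // done and can safely retry by calling the handler again if the fetch fails.
    pacote.tarball.stream('npm@latest', stream =>
      new Promise((resolve, reject) => {
        stream.on('error', reject)
        stream.pipe(fs.createWriteStream('npm-latest.tgz'))
          .on('finish', resolve)
          .on('error', reject)
      })
    ).then(() => console.log('tarball written'))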

Options

Options are passed to npm-registry-fetch and cacache, so in addition to these, anything for those modules can be given to pacote as well.

The options object is cloned and mutated along the way to add integrity, resolved, and other properties as they are determined.

  • cache Where to store cache entries and temp files. Passed to cacache. Defaults to the same cache directory that npm will use by default, based on platform and environment.
  • where Base folder for resolving relative file: dependencies.
  • resolved Shortcut for looking up resolved values. Should be specified if known.
  • integrity Expected integrity of fetched package tarball. If specified, tarballs with mismatched integrity values will raise an EINTEGRITY error.
  • umask Permission mode mask for extracted files and directories. Defaults to 0o22. See "Extracted File Modes" below.
  • fmode Minimum permission mode for extracted files. Defaults to 0o666. See "Extracted File Modes" below.
  • dmode Minimum permission mode for extracted directories. Defaults to 0o777. See "Extracted File Modes" below.
  • preferOnline Prefer to revalidate cache entries, even when it would not be strictly necessary. Default false.
  • before When picking a manifest from a packument, only consider packages published before the specified date. Default null.
  • defaultTag The default dist-tag to use when choosing a manifest from a packument. Defaults to latest.
  • registry The npm registry to use by default. Defaults to https://registry.npmjs.org/.
  • fullMetadata Fetch the full metadata from the registry for packuments, including information not strictly required for installation (author, description, etc.) Defaults to true when before is set, since the version publish time is part of the extended packument metadata.
  • fullReadJson Use the slower read-package-json package instead of read-package-json-fast in order to include extra fields like "readme" in the manifest. Defaults to false.
  • packumentCache For registry packuments only, you may provide a Map object which will be used to cache packument requests between pacote calls. This allows you to easily avoid hitting the registry multiple times (even just to validate the cache) for a given packument, since it is unlikely to change in the span of a single command.
  • verifySignatures A boolean that will make pacote verify the integrity signature of a manifest, if present. There must be a configured _keys entry in the config that is scoped to the registry the manifest is being fetched from.
  • verifyAttestations A boolean that will make pacote verify Sigstore attestations, if present. There must be a configured _keys entry in the config that is scoped to the registry the manifest is being fetched from.
  • tufCache Where to store metadata/target files when retrieving the package attestation key material via TUF. Defaults to the same cache directory that npm will use by default, based on platform and environment.
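
For example, a manifest fetch that combines several of the options above might look like this (paths, spec, and date are illustrative):

const pacote = require('pacote')

// A shared Map lets repeated pacote calls within one command reuse packument
// responses instead of re-fetching (or even re-validating) them.
const packumentCache = new Map()

pacote.manifest('react@^18.0.0', {
  registry: 'https://registry.npmjs.org/',
  cache: '/tmp/my-npm-cache',        // illustrative cache folder
  before: new Date('2023-01-01'),    // newest version published before this date
  packumentCache,
}).then(manifest => console.log(manifest.version))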

Advanced API

Each different type of fetcher is exposed for more advanced usage, such as using helper methods from these classes:

  • DirFetcher
  • FileFetcher
  • GitFetcher
  • RegistryFetcher
  • RemoteFetcher

Extracted File Modes

Files are extracted with a mode matching the following formula:

( (tarball entry mode value) | (minimum mode option) ) & ~(umask)

This is in order to prevent unreadable files or unlistable directories from cluttering a project's node_modules folder, even if the package tarball specifies that the file should be inaccessible.

It also prevents files from being group- or world-writable without explicit opt-in by the user, because all file and directory modes are masked against the umask value.

So, a file which is 0o771 in the tarball, using the default fmode of 0o666 and umask of 0o22, will result in a file mode of 0o755:

(0o771 | 0o666) => 0o777
(0o777 & ~0o22) => 0o755

In almost every case, the defaults are appropriate. To respect exactly what is in the package tarball (even if this makes an unusable system), set both dmode and fmode options to 0. Otherwise, the umask config should be used in most cases where file mode modifications are required, and this functions more or less the same as the umask value in most Unix systems.
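
For reference, the formula above can be written as a small sketch (not pacote's actual internals; defaults are as documented):

// Compute the mode a tarball entry is extracted with, per the formula above.
const extractedMode = (entryMode, minMode, umask = 0o22) =>
  (entryMode | minMode) & ~umask

// The worked example from above: a 0o771 file, fmode 0o666, umask 0o22.
console.log(extractedMode(0o771, 0o666).toString(8)) // => '755'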

Extracted File Ownership

When running as root on Unix systems, all extracted files and folders will have their owning uid and gid values set to match the ownership of the containing folder.

This prevents root-owned files showing up in a project's node_modules folder when a user runs sudo npm install.

Manifests

A manifest is similar to a package.json file. However, it has a few pieces of extra metadata, and sometimes lacks metadata that is inessential to package installation.

In addition to the common package.json fields, manifests include:

  • manifest._resolved The tarball url or file path where the package artifact can be found.

  • manifest._from A normalized form of the spec passed in as an argument.

  • manifest._integrity The integrity value for the package artifact.

  • manifest._id The canonical spec of this package version: name@version.

  • manifest.dist Registry manifests (those included in a packument) have a dist object. Only tarball is required, though at least one of shasum or integrity is almost always present.

    • tarball The url to the associated package artifact. (Copied by Pacote to manifest._resolved.)
    • integrity The integrity SRI string for the artifact. This may not be present for older packages on the npm registry. (Copied by Pacote to manifest._integrity.)
    • shasum Legacy integrity value. Hexadecimal-encoded sha1 hash. (Converted to an SRI string and copied by Pacote to manifest._integrity when dist.integrity is not present.)
    • fileCount Number of files in the tarball.
    • unpackedSize Size on disk of the package when unpacked.
    • signatures Signatures of the shasum. Includes the keyid that correlates to a key from the npm registry.
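
Putting those fields together, a registry manifest returned by pacote.manifest looks roughly like this (values are abbreviated and illustrative, not real registry data):

{
  name: 'foo',
  version: '1.2.3',
  dependencies: { /* ... */ },
  dist: {
    tarball: 'https://registry.npmjs.org/foo/-/foo-1.2.3.tgz',
    integrity: 'sha512-...',
    shasum: '...',
    fileCount: 12,        // may be absent on older packages
    unpackedSize: 45678   // bytes; may be absent on older packages
  },
  _resolved: 'https://registry.npmjs.org/foo/-/foo-1.2.3.tgz',
  _integrity: 'sha512-...',
  _from: 'foo@^1.2.0',
  _id: 'foo@1.2.3'
}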

Packuments

A packument is the top-level package document that lists the set of manifests for available versions for a package.

When a packument is fetched with accept: application/vnd.npm.install-v1+json in the HTTP headers, only the minimum metadata necessary for installation is returned. Additional metadata is returned when fetched with accept: application/json.

For Pacote's purposes, the following fields are relevant:

  • versions An object where each key is a version, and each value is the manifest for that version.
  • dist-tags An object mapping dist-tags to version numbers. This is how foo@latest gets turned into foo@1.2.3.
  • time In the full packument, an object mapping version numbers to publication times, for the opts.before functionality.

Pacote adds the following field, regardless of the accept header:

  • _contentLength The size of the packument.
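
A minimal packument therefore has roughly this shape (abbreviated, illustrative values; time only appears in the full packument):

{
  name: 'foo',
  'dist-tags': { latest: '1.2.3' },
  versions: {
    '1.2.2': { name: 'foo', version: '1.2.2', dist: { /* ... */ } },
    '1.2.3': { name: 'foo', version: '1.2.3', dist: { /* ... */ } }
  },
  time: {
    '1.2.2': '2020-01-01T00:00:00.000Z',
    '1.2.3': '2021-01-01T00:00:00.000Z'
  },
  _contentLength: 1234
}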

pacote's People

Contributors

agy, alexsey, andreeib, andreineculau, armandocanals, bdehamer, calebsander, colinrotherham, dependabot[bot], evocateur, feelepxyz, fraxken, fritzy, github-actions[bot], iarna, imsnif, isaacs, jablko, jozemlakar, jviotti, keithamus, larsgw, lukekarrys, nlf, raineorshine, ruyadorno, strugee, wraithgar, zarenner, zkat


pacote's Issues

[QUESTION] pacote and npm-registry-fetch/make-fetch-happen both cache tarballs.

What / Why

I'm debugging tar extract errors that occur when installing packages from a private registry. I'm not sure if the issue is in npm or the server but I've been reading npm code to find out. I noticed that both pacote and make-fetch-happen cache tarball response bodies.

pacote seems to know that because _cacheFetches returns false but _cacheFetches isn't used?

Is this intentional? If so, why? If not, would you accept a PR to skip make-fetch-happen's cache?


Just in case you are interested here's the error I'm debugging.

npm will sporadically error while extracting package tarballs from our private registry.

npm WARN tar zlib: invalid code lengths set
npm WARN tar TAR_ENTRY_INVALID checksum failure
npm WARN tar TAR_ENTRY_INVALID invalid base256 encoding
npm WARN tarball tarball data for [email protected] (sha512-VHZ8gX+EDfz+97jGcgyGCyRia/dPOd6Xh9yPv8Bl1+SoaIwD+a/vlrOmGRUyOYu7MwUhc7CxqeaDZU13S4+EpA==) seems to be corrupted. Trying again.

Files downloaded from the registry seem fine

curl -sS $(npm info [email protected] dist.tarball) | shasum # correct
curl -sS $(npm info [email protected] dist.tarball) | tar xzv # correct

I wrote pacote's tarball stream to a file and the file is corrupted.

# Added here https://github.com/npm/pacote/blob/main/lib/fetcher.js#L418
tarball.pipe(createWriteStream('/tmp/tarball'))

I've been unable to reproduce the issue if I use mitmproxy which makes me think it's a http protocol or timing issue.

npm config set proxy http://localhost:8080

Infer ownership of _all_ unpacked files (like cacache does for cache)

As originally reported in zkat/pacote#175, pacote will leave user-owned files in the global (typically root-owned) install space. This is arguably much less bad than leaving root-owned files in ~, but still not ideal.

Proposal:

  • abstract out the infer-owner.js util from cacache into its own module.
  • Infer ownership of all files that pacote unpacks.
  • Drop the uid or gid option entirely.

Thereafter, installing in a root-owned prefix will produce root-owned files. (Or fail with EACCES if the user doesn't have permission to write there.) Installing in a user-owned folder will produce user-owned files. No config juggling or baton-passing required. (Well, Pacote will have to infer the ownership and then pass the uid/gid configs to tar, but that seems pretty reasonable.)
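
A rough sketch of that flow, assuming the infer-owner package and node-tar's uid/gid extraction options (illustrative only, not pacote's implementation):

const fs = require('fs')
const inferOwner = require('infer-owner')
const tar = require('tar')

async function extractWithInferredOwner (tarballFile, dest) {
  // Find the uid/gid of the nearest existing parent of the destination, so
  // unpacked files end up owned like the folder they are being placed into.
  const { uid, gid } = await inferOwner(dest)
  await fs.promises.mkdir(dest, { recursive: true })
  // Only effective when running as root; otherwise tar keeps the process owner.
  return tar.x({ file: tarballFile, cwd: dest, uid, gid })
}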

[BUG] GitHub org/name accesses fail

Direct accesses to GitHub by short 'org/name' fail.

$ pacote -h
Pacote - The JavaScript Package Handler, v10.1.3
...
$ pacote manifest npm/cli
Error: failed '/usr/bin/git ls-remote git+ssh://git@github.com/npm/cli.git'
    at ChildProcess.<anonymous> (/Users/ilg/.nvm/versions/node/v12.13.0/lib/node_modules/pacote/lib/util/spawn.js:21:43)
    at ChildProcess.emit (events.js:210:5)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:430:11)
    at Socket.emit (events.js:210:5)
    at Pipe.<anonymous> (net.js:658:12) {
  cmd: '/usr/bin/git',
  args: [ 'ls-remote', 'git+ssh://git@github.com/npm/cli.git' ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: 'git@github.com: Permission denied (publickey).\r\n' +
    'fatal: Could not read from remote repository.\n' +
    '\n' +
    'Please make sure you have the correct access rights\n' +
    'and the repository exists.\n'
}
$ pacote manifest github:npm/cli
Error: failed '/usr/bin/git ls-remote git+ssh://git@github.com/npm/cli.git'
    at ChildProcess.<anonymous> (/Users/ilg/.nvm/versions/node/v12.13.0/lib/node_modules/pacote/lib/util/spawn.js:21:43)
    at ChildProcess.emit (events.js:210:5)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:430:11)
    at Socket.emit (events.js:210:5)
    at Pipe.<anonymous> (net.js:658:12) {
  cmd: '/usr/bin/git',
  args: [ 'ls-remote', 'git+ssh://git@github.com/npm/cli.git' ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: 'git@github.com: Permission denied (publickey).\r\n' +
    'fatal: Could not read from remote repository.\n' +
    '\n' +
    'Please make sure you have the correct access rights\n' +
    'and the repository exists.\n'
}
$ pacote resolve npm/cli
Error: failed '/usr/bin/git ls-remote git+ssh://git@github.com/npm/cli.git'
    at ChildProcess.<anonymous> (/Users/ilg/.nvm/versions/node/v12.13.0/lib/node_modules/pacote/lib/util/spawn.js:21:43)
    at ChildProcess.emit (events.js:210:5)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:430:11)
    at Socket.emit (events.js:210:5)
    at Pipe.<anonymous> (net.js:658:12) {
  cmd: '/usr/bin/git',
  args: [ 'ls-remote', 'git+ssh://git@github.com/npm/cli.git' ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: 'git@github.com: Permission denied (publickey).\r\n' +
    'fatal: Could not read from remote repository.\n' +
    '\n' +
    'Please make sure you have the correct access rights\n' +
    'and the repository exists.\n'
}
$ pacote extract npm/cli ccc
Error: failed '/usr/bin/git ls-remote git+ssh://git@github.com/npm/cli.git'
    at ChildProcess.<anonymous> (/Users/ilg/.nvm/versions/node/v12.13.0/lib/node_modules/pacote/lib/util/spawn.js:21:43)
    at ChildProcess.emit (events.js:210:5)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:430:11)
    at Socket.emit (events.js:210:5)
    at Pipe.<anonymous> (net.js:658:12) {
  cmd: '/usr/bin/git',
  args: [ 'ls-remote', 'git+ssh://git@github.com/npm/cli.git' ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: 'git@github.com: Permission denied (publickey).\r\n' +
    'fatal: Could not read from remote repository.\n' +
    '\n' +
    'Please make sure you have the correct access rights\n' +
    'and the repository exists.\n'
}

Access to the same repo with full URL seems ok for getting the manifest:

$ pacote manifest https://github.com/npm/cli
{
  version: '6.13.1',
  name: 'npm',
  description: 'a package manager for JavaScript',
  ...
  readmeFilename: 'README.md',
  gitHead: 'b829d62c98506325d2afb2d85d191a8ff1c49157',
  _id: 'npm@6.13.1',
  _integrity: null,
  _resolved: 'git+ssh://git@github.com/npm/cli.git#b829d62c98506325d2afb2d85d191a8ff1c49157',
  _from: 'github:npm/cli'
}

... but fails with the same error when trying to extract:

$ pacote extract https://github.com/npm/cli ccc
Error: failed '/usr/bin/git ls-remote git+ssh://git@github.com/npm/cli.git'
    at ChildProcess.<anonymous> (/Users/ilg/.nvm/versions/node/v12.13.0/lib/node_modules/pacote/lib/util/spawn.js:21:43)
    at ChildProcess.emit (events.js:210:5)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:430:11)
    at Socket.emit (events.js:210:5)
    at Pipe.<anonymous> (net.js:658:12) {
  cmd: '/usr/bin/git',
  args: [ 'ls-remote', 'git+ssh://git@github.com/npm/cli.git' ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: 'git@github.com: Permission denied (publickey).\r\n' +
    'fatal: Could not read from remote repository.\n' +
    '\n' +
    'Please make sure you have the correct access rights\n' +
    'and the repository exists.\n'
}

I wouldn't be surprised to find out that GitHub changed something in their configurations, but anyway it would be good to clarify the issue.

Thank you,

Liviu

`before` doesn't work without `fullMetadata`

The undocumented fullMetadata option is required to fetch times, and without it the before flag appears not to work.

Repro:

$ npm view safe-write-stream time 
{
  ...
  '1.0.4': '2016-02-07T00:21:03.628Z', <=== Expected when specifying before: 2017-01-01
  '1.0.5': '2017-02-09T03:26:22.308Z'
}

fullMetadata: undefined
node -e "require('pacote').manifest('safe-write-stream@^1.0.4', { before: new Date('2017-01-01') }).then(r => console.log(r.version));"
Result: 1.0.5 ❌

fullMetadata: true
node -e "require('pacote').manifest('safe-write-stream@^1.0.4', { fullMetadata: true, before: new Date('2017-01-01') }).then(r => console.log(r.version));"
Result: 1.0.4 ✔

[BUG] FetchError: network timeout

When I call pacote.packument multiple times in parallel, each with a different package name, resulting in multiple requests to the npm registry, a timeout error will eventually happen if the number of calls is big enough. This is expected, since issuing a lot of requests is problematic.

The problem is that we see this timeout error even if we issue the requests in batches as in this code example:

const pacote = require('pacote');
const fs = require('fs');
const pMap = require('p-map');
fs.readFile('/home/abd/أحمد/package.json', {encoding: 'utf-8'}, (err, data) => {
    const packageJson = JSON.parse(data);
    // you can replace devDependencies with dependencies, so long as the number
    // of packages is big
    pMap(Object.keys(packageJson.devDependencies), (packageName) => {
        return pacote.packument(packageName).then((result) => {
            console.log(result);
        });
    }, {concurrency: 8});
});

p-map is this package; the above limits the request concurrency to 8.

If you try the above code with the package.json inside this zip, it will throw a timeout error; you can see my analysis of this issue in this comment.

The problem in summary: pacote.packument seems not designed for concurrent calls.

related: raineorshine/npm-check-updates#634

[BUG] pacote.extract errors out for the wrong reason

What / Why

If you give pacote a URL that is not a tarball, it complains that the destination directory does not contain a package.json file.

This is not the error that happened.

When

  • always

Where

  • n/a

How

Current Behavior

throws:

{
  "errno": -2,
  "code": "ENOENT",
  "syscall": "open",
  "path": "/tmp/foo/bar/package.json"
}

Steps to Reproduce

try {
  let example = await pacote.extract("http://www.example.com", "/tmp/foo/bar", {});
} catch (err) {
  console.log(err);
}

Expected Behavior

How about "unable to extract http://www.example.com"?

Who

  • n/a

References

  • n/a

[BUG] subdependency move-file only supports node >= 10.17

What / Why

I can't upgrade a package that depends on this package, since I was using the same version of node (10.16) that the server uses, and it just throws an error that is hard to trace because the incompatibility is in a subdependency.

The dependency path is pacote > cacache > move-file

When

When I run install in server

Where

  • n/a

How

Current Behavior

yarn throws an error that move-file is not compatible

Steps to Reproduce

just run

nvm use 10.16
yarn install

Expected Behavior

if cacache is really needed, then update node engine constraint to >= 10.17

References

Related to raineorshine/npm-check-updates#651
Broken by sindresorhus/move-file#8

Version 15.0.1

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am getting the following error with pacote@15.0.1.

npm ERR! code ENOENT
npm ERR! syscall chmod
npm ERR! path /Users/xxxx/.npm/_npx/67793/lib/node_modules/pacote/node_modules/@npmcli/installed-package-contents/index.js
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, chmod '/Users/xxxx/.npm/_npx/67793/lib/node_modules/pacote/node_modules/@npmcli/installed-package-contents/index.js'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/xxxx/.npm/_logs/2022-10-18T04_25_34_732Z-debug.log
Install for [ 'pacote@latest' ] failed with code 254

Expected Behavior

[email protected] works fine.

Steps To Reproduce

  1. Run "npx pacote"

Environment

  • npm: 6.14.16
  • Node: 14.19.0
  • OS: macOS 12.4
  • platform: Macbook Pro

[QUESTION] Reading relative dependecies

What / Why

When tracking down the cause of npm/cli#1756, I found that pacote tries to read the package.json file of a relative dependency (aka local dependency).
Given a package main with a package.json as follows:

{
	"name": "main",
	"dependencies": {
		"child": "file:./packages/child"
	}
}

In this case, reading node_modules/main/packages/child/package.json when installing main is bound to fail given that main is not yet installed.
One workaround could be to install such dependencies in a "postinstall" script. This feels a bit hacky however.

Am I doing things wrong, or is it a genuine issue?

Where

Wherever

Who

npm install gmartigny-local-dep

References

[BREAKING] remove log property

We are moving towards all modules using proc-log without the ability to customize the logger.

  • Remove log from fetcher.js
  • Remove docs for log

[BUG] withTarballStream's retry can clobber a package as it is extracted.

What / Why

Using npm 6.14.14 I noticed npm WARN tar ENOENT: no such file or directory errors followed by ENOENT /package.json.

with-tarball-stream will call streamHandler twice when the tarball is not in the cache: once in tryDigest and once in trySpec. tryExtract, the streamHandler, may reject while a rimraf promise it started is still running.

trySpec will be racing tryDigest's rimraf, and it will extract incorrectly if that rimraf removes the directory after it is created by trySpec's tryExtract.

When

I'm able to reproduce this consistently with a private package. I'll work on a public reproduction, but this is a timing-based bug, so I'm not sure that I can.

npm cache clean --force && npm ci
npm -v
6.14.14

Normally we see this order of operations:

# tryDigest
tryExtract ./here start
0 rimraf ./here start
0 rimraf ./here done
tar x
tryExtract ./here err ENOENT: no such file or directory, lstat 'cache/content-v2/sha512/0e/a4/208a1690f5c52bb24268f642325bffa0d15c6c93c703b59e766b734687460d4dfa54480ad74a88725069ef6cf6b310f2eedfdf02d14c662923ab3ee994cc'
# trySpec
tryExtract ./here start
1 rimraf ./here start
1 rimraf ./here done
tar x
tryExtract ./here done

But it's possible to get this order:

# tryDigest
tryExtract ./here start
0 rimraf ./here start
tryExtract ./here err ENOENT: no such file or directory, lstat 'cache/content-v2/sha512/0e/a4/208a1690f5c52bb24268f642325bffa0d15c6c93c703b59e766b734687460d4dfa54480ad74a88725069ef6cf6b310f2eedfdf02d14c662923ab3ee994cc'

# trySpec
tryExtract ./here start
1 rimraf ./here start
1 rimraf ./here done # trySpec's rimraf
tar x

tryExtract ./here done

 # tryDigest's rimraf was running the whole time!!!
0 rimraf ./here done

[BUG] pacote ignores premature end of HTTP request.

What / Why

There's a failure mode I've seen both with npm install, and an internal tool that uses pacote directly.

If the request terminates prematurely, pacote tries to unpack the file anyway instead of aborting.

When

Intermittently, but especially when talking to Artifactory.

Where

npm private repository running in Artifactory
client running npm 6.9.0, but also reproduced with other versions

How

Current Behavior

In npm, this shows up as parse errors trying to read package.json files in the npm cache, and finding EOS. This is especially bad because now the npm cache is poisoned. Occasionally it shows up as hash failures, but that seems to happen less often of late.

In our tool, it showed up as a premature end of tarball, until I added integrity checking and some logging. The sha is wrong, and you can see that the number of bytes transferred is a random fraction of the actual payload for the same url on successful runs.

Steps to Reproduce

I'm not sure I have one, but it seems to come in clusters. I suspect that the Artifactory machine is oversubscribed at these times. Or proxy server shenanigans.

Expected Behavior

pacote should throw an error and npm should abort with that error.
pacote should call the extract() callback with an error about the http request terminating, rather than trying to extract the file anyway (which is likely the cause of the npm error)

Who

  • n/a

References

  • n/a

[BUG] Unexpected end of file.

What / Why

Truncated download is resulting in a zlib error

When

  • slow network connection results in an incomplete fetch

Where

  • using private artifactory server over bad VPN connection

How

Current Behavior

  • we are using pacote.extract with the integrity value passed into the options

Steps to Reproduce

  • unsure. perhaps disconnecting the network connection?

Expected Behavior

  • expect pacote to abort with failed integrity check error rather than attempting to unpack a truncated file.

Who

  • n/a

References

  • n/a

Please fix deprecated dependencies

Installing a package which depends on [email protected] triggers warnings:

ilg@wks ~ % npm install xpm@next
npm WARN deprecated har-validator@5.1.3: this library is no longer supported
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142

These two deprecated dependencies are also used by tap:

ilg@wks ~ % npx npm-remote-ls pacote | grep har-validator
npx: installed 112 in 6.476s
   │  │     ├─ har-validator@5.1.3
ilg@wks ~ % npx npm-remote-ls pacote | grep request
npx: installed 112 in 7.22s
   │  │  ├─ request@2.88.2
      │  ├─ request@2.88.2
ilg@wks ~ %

Could you fix those dependencies?

BTW, I don't think that it is very useful for npm to complain about deprecated indirect dependencies, since there is not much the author of the top package can do to fix them.

[BUG] pacote.extract resolves to the wrong value

What / Why

pacote.extract is documented as returning a Promise<Object>; instead it is returning a Promise<boolean>.

When

  • always

Where

  • private registry

How

Current Behavior

  • returns false

Steps to Reproduce

let reply = await pacote.extract(src, tmpDir.path, args.options);

console.log(reply);

output:
false

Expected Behavior

  • as documented in the readme, this should return an Object

Who

  • n/a

References

  • n/a

[BUG] Some git commands are executed under destination directory's owner account

What / Why

It might so happen that root installs packages into a directory owned by another user. Consider a case where you launch an app with docker-compose and bind mount the host directory with source code (.) into the container (/app) so that changes to the source code automatically propagate into the container. Under these circumstances we get a root user and /app owned by uid 1000. During the fetch-package-metadata phase it succeeds, since nothing says it has to impersonate:

[email protected] started to run commands under cwd's owner. But some files are to be created in ~/.npm/_cacache/tmp (e.g. cloning repositories when installing from github). When a process (npm) user doesn't match the cwd user, process user's cache is still used. So git clone is executed under cwd user to clone to the process user's tmp dir, which apparently fails. That can happen under docker when a non-root's directory is bind-mounted into a container. It's not owned by root in the container either, but the processes in the container are running under root (unless explicitly started as another user).

The issue doesn't reveal itself on the fetch-package-metadata phase:

https://github.com/npm/cli/blob/v6.12.1/lib/fetch-package-metadata.js#L59-L65
https://github.com/npm/pacote/blob/v9.5.8/manifest.js#L25
https://github.com/npm/pacote/blob/v9.5.8/lib/finalize-manifest.js#L49
https://github.com/npm/pacote/blob/v9.5.8/lib/finalize-manifest.js#L154
https://github.com/npm/pacote/blob/v9.5.8/lib/fetch.js#L33
https://github.com/npm/pacote/blob/v9.5.8/lib/fetchers/git.js#L71-L73
https://github.com/npm/pacote/blob/v9.5.8/lib/fetchers/git.js#L176

But on the extract phase it switches to non-root and fails:

https://github.com/npm/cli/blob/v6.12.1/lib/install/action/extract.js#L90
https://github.com/npm/pacote/blob/v9.5.8/extract.js#L42
https://github.com/npm/pacote/blob/v9.5.8/lib/with-tarball-stream.js#L96
https://github.com/npm/pacote/blob/v9.5.8/lib/fetch.js#L28
https://github.com/npm/pacote/blob/v9.5.8/lib/fetchers/git.js#L44
https://github.com/npm/pacote/blob/v9.5.8/lib/fetchers/git.js#L71-L73
https://github.com/npm/pacote/blob/v9.5.8/lib/fetchers/git.js#L176

When

  • When installing packages from GitHub.

Where

  • n/a

How

Current Behavior

  • An error occurs:
npm ERR! code 128
npm ERR! Command failed: git clone --mirror -q git://github.com/kevva/is-positive.git /root/.npm/_cacache/tmp/git-clone-d80b8730/.git
npm ERR! fatal: could not create leading directories of '/root/.npm/_cacache/tmp/git-clone-d80b8730/.git'

Steps to Reproduce

Under non-root user:

1.sh:

#!/bin/sh
set -eux
npm --version
apk add git
npm i kevva/is-positive || cat /root/.npm/_logs/*.log
$ docker run --rm -itv $PWD:/app -w /app node:10.17.0-alpine3.10 ./1.sh
+ npm --version
6.11.3
...
+ npm i kevva/is-positive
...
npm ERR! Command failed: git clone --mirror -q git://github.com/kevva/is-positive.git /root/.npm/_cacache/tmp/git-clone-87fa5e82/.git
npm ERR! fatal: could not create leading directories of '/root/.npm/_cacache/tmp/git-clone-87fa5e82/.git'
...

Or under root in a non-root dir:

$ npm i kevva/is-positive

Expected Behavior

  • It either succeeds in both cases (extract, fetch-package-metadata), or fails. Preferably the former.

Who

  • n/a

References

  • Related to #2

More info

Introduced in npm-6.11.0, pacote-9.5.5. Fixed in npm-6.13.6, pacote-9.5.12.
Affects node@^10.17.0 (npm-6.11.3), node >= 12.11.0 (npm-6.11.3), node < 13.7.0 (npm-6.13.6). More on it here.

Old steps to reproduce

Create files based on the following gist and do docker-compose up:

docker-compose.yml:

version: '3'

services:
  app:
    build: .
    # entrypoint: sleep 10000000
    # ports:
      # - 9229:9229
    volumes:
      - ./:/app

Dockerfile:

FROM node:12.13.0-alpine

RUN apk add git vim
WORKDIR /app

ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh:

#!/bin/sh
set -eu
npm i || true
cat /root/.npm/_logs/*.log
$ chmod u+x entrypoint.sh
$ docker-compose up

As you might guess the issue was discovered in a docker container. Alternatively,

# useradd -m u1
# cd /home/u1
# echo '{"dependencies": {"is-positive": "kevva/is-positive"}}' > package.json
# npm i

[BUG] npmInstallCommand is incorrect

The default npmInstallCommand on https://github.com/npm/pacote/blob/latest/lib/fetcher.js#L98-L105 was copied from pacote 9, and may have made sense for npm v6. However, it's incorrect for npm v7 (and arguably was incorrect in npm v6 as well).

    this.npmInstallCmd = opts.npmInstallCmd || [
      'install',
      '--only=dev', // only install dev, not actually supported on v7, which is a bug
      '--prod', // only install prod, this overrides the --only=dev anyway, so no devs get installed!
      '--ignore-prepublish', // not a config we support in v7, but probably should? Also, maybe not a good idea here!
      '--no-progress',
      '--no-save',
      // should probably add --no-audit, why do the extra work?
    ]

Re: npm/cli#1865

Integrity check error for remote tarballs

What / Why

I have an application REST endpoint that generates an npm module tgz file upon request. When I add the remote url (http://192.168.65.2:8888/package.tgz) to package.json, npm install fails with an integrity mismatch error. If I download the same tgz file and then do npm install package.tgz, everything works fine. I can confirm that pacote returns different sha512 values for the remote and local tarball!

npx pacote manifest package.tgz
{
  _integrity: 'sha512-kDkAIo0omC3odUie2nW9RLt8zOzr902rSB5zikea+715OI5a+QLuv4sguvI8+k5O0cIvxUX19/d5ypaPm3MRng==',
}

npx pacote manifest http://192.168.65.2:8888/package.tgz
{
  _integrity: 'sha512-1D+DlFn2yuHBT2hV7upaoH4MLLUotASmT0CItavWpUphRT+llBQ3cpjlW/lI9koyFNINbuirC24ofsQruEfRDg==',
}

This only happens with npm 6.x. npm 7.x seems fine.

Do I need to include any headers to the rest response?

pacote version: 9.5.12

[BUG] out of memory on npm install: fork bomb preparing from git repos if they have scripts

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

npm install a package with dependencies from git.

If the package manifest from a git dependency has a relevant script for prepare (pre/post/install, prepack, prepare, or build), then npm install will be forked to install the dependency to a directory.

Because these dependencies often themselves contain similar dependencies from git, a "fork bomb" might occur where the number of concurrent npm install processes rapidly grows, usually hanging up on dependencies that may simply just take a while to download (not necessarily from git, e.g. the @angular/core library), but are dependencies used by many of the packages stored in git repos.

Anytime enough npm install processes are forked and an out of memory condition is reached, the computer generally takes much longer to finish the initial npm install as the computer is too busy swapping data between memory and swap space. CTRL+C in the command line terminal does nothing, and generally a killall to stop all npm installs is required to gain back control of the computer.

Expected Behavior

Avoid the "fork bomb", perhaps by not launching all the npm installs from git repos in parallel, so that large dependencies already extracted into directories do not get npm installed multiple times concurrently.

The propagated environment variable _PACOTE_NO_PREPARE_ in https://github.com/npm/pacote/blob/main/lib/git.js#L181 prevents circular dependencies from causing infinite "fork bombs", but practically infinite lasting "fork bombs" can still be achieved with just a handful of dependencies from git that have a reasonable number of dependencies on larger libraries.

A common scenario where this bug is encountered is having a bunch of packages for Angular components that depend on each other, all stored in and retrieved from git repos. Not only do they have devDependencies with each other if one component inherits from another, they will all share a similar set of core devDependencies such as @angular/core. All of them will likely have "build" scripts in their package.json.

Possible workaround of the problem at the moment:

Since https://github.com/npm/pacote/blob/main/lib/git.js#L164 explicitly allows us to bypass the prepare step and avoid forking npm install for packages from git repos, I have gone through all my libraries stored in and retrieved from git repos and ensured their package.json manifests do not have any build, postinstall, etc. scripts. I essentially renamed build to make, and refactored away a lot of the pre/post install scripts to avoid relying on those in the first place.

This workaround has dramatically changed the npm install times for my larger projects from hours (not an exaggeration) to a minute or two. Ideally, it would be nice to retain the build and postinstall scripts and have them participate in the normal npm lifecycles.

Steps To Reproduce

  1. In package.json, create dependencies/devDependencies to packages from git (git+https://github.com/npm/pacote.git, e.g.).
  2. To fully experience the "fork bomb", the git dependencies from Step 1 ideally should have dependencies on many large libraries (e.g. @angular/core, @material-icons) and perhaps dependencies other packages from git that also fit the description of this step.
  3. npm install the initial package.json from Step 1.
  4. View active processes using top or similar system monitoring software. Note the explosion of npm install processes and the amount of CPU and memory they hog.

Environment

  • npm: 8.5.2
  • Node: 16.13.1
  • OS: Ubuntu 21.10

[FEATURE] npm authentication

What / Why

We use pacote to check the latest tag version on NPM to let people know if they are running an old version of our packages (https://www.grouparoo.com/docs/support/upgrading-grouparoo#determining-if-there-are-updates). Everything works fine for public NPM packages, but we cannot check on private packages. It would be great if there was a way to use local or user-level NPM authentication tokens from .npmrc files with pacote to check on these private packages.

When

Every time the manifest for a private package is checked

Where

Both programmatically and on the CLI:

# public package
pacote manifest @grouparoo/core | jq .version
"0.2.12"
# Private package (it's ok, we announce this package exists)
pacote manifest @grouparoo/ui-enterprise | jq .version
HttpErrorGeneral: 404 Not Found - GET https://registry.npmjs.org/@grouparoo%2fui-enterprise - Not found

How

...

Current Behavior

404'd

Expected Behavior

Maybe something like this:

import pacote from "pacote";

  const manifest: { name: string; version: string } = await pacote.manifest(
    `${plugin.name}@${tag}`, 
     { _authToken: 'abc123' }
  );
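
For what it's worth, pacote forwards its options to npm-registry-fetch, so npm-style auth can usually be supplied directly. A minimal sketch, assuming npm-registry-fetch's token option and .npmrc-style registry-scoped auth keys, with the secret coming from an environment variable:

const pacote = require('pacote')

pacote.manifest('@grouparoo/ui-enterprise@latest', {
  registry: 'https://registry.npmjs.org/',
  // plain bearer token (npm-registry-fetch's `token` option)...
  token: process.env.NPM_TOKEN,
  // ...or a registry-scoped key in the same form .npmrc uses
  '//registry.npmjs.org/:_authToken': process.env.NPM_TOKEN,
}).then(manifest => console.log(manifest.version))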

Who

Everyone!

References

nope.

Have [_tarballFromResolved] return a Promise<Stream> rather than a Stream

Right now the [_tarballFromResolved] methods on all the Fetcher subclasses return a Minipass stream. Then, all of them do some kind of async action, and pipe into that Minipass stream once they have data.

One impact of this is that pacote.tarball.file('file:.', './dir.tgz') will include an empty dir.tgz file in the tarball, because it starts piping to the target location before the npm-packlist command runs and creates the list of files to add in the tar.c() stream.

The change to lib/fetcher.js is small, but of course then all the other methods that implement [_tarballFromResolved] will have to be updated to return a Promise instead of a stream directly.

Here's a quick and dirty untested patch to make the fetchers all switch over to returning a Promise rather than a stream. Doesn't look too bad, and while it makes the code calling _tarballFromResolved slightly more complicated, it really reduces the complexity of the implementations.

https://gist.github.com/isaacs/2650b74c43637a5892d945cfc551d2d0
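
A minimal illustration of the shape of the change (placeholder class and method names, not the actual patch in the gist):

const { PassThrough } = require('stream')

class ExampleFetcher {
  // Current style: return a stream synchronously and fill it in later.
  tarballStreamSync () {
    const stream = new PassThrough()
    this.getTarballData().then(data => stream.end(data), er => stream.emit('error', er))
    return stream
  }

  // Proposed style: resolve to a stream only after the async work has run,
  // so nothing downstream starts piping before the data source exists.
  async tarballStreamAsync () {
    const data = await this.getTarballData()
    const stream = new PassThrough()
    stream.end(data)
    return stream
  }

  async getTarballData () {
    // stand-in for npm-packlist + tar.c(), a registry fetch, a git clone, etc.
    return Buffer.from('example tarball bytes')
  }
}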

cc: @claudiahdz

[FEATURE] bin: don't JSON.stringify by default when return value isn't an object

What / Why

It's annoying to have to pass --no-json if you are doing something like:

curl $(pacote resolve pkgname) | tar tv

There's no need to automatically JSON.stringify on non-tty output unless it's an object. The more expected pattern is to just pass strings on through as-is.

Instead of defaulting --json to !process.stdout.isTTY, leave it as undefined, and only console.log json if it's set explicitly to true, or set to undefined and stdout is not a TTY and the result is an object.
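
A sketch of that decision, assuming a parsed json flag that is true, false, or undefined (hypothetical helper, not the current bin code):

const shouldPrintJson = (json, result) =>
  json === true ||
  (json === undefined && !process.stdout.isTTY &&
    typeof result === 'object' && result !== null)

const print = (result, { json } = {}) => {
  if (shouldPrintJson(json, result)) {
    console.log(JSON.stringify(result, null, 2))
  } else {
    // strings (like a resolved tarball URL) pass through as-is, so they can
    // be piped straight to curl, tar, etc.
    console.log(result)
  }
}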

[BUG] Install via Git fails when running with sudo (no access to SSH_AUTH_SOCK)

What / Why

(Sorry for any bad assumptions i'm making here, i know very little about NPM or pacote)

When pacote detects that it's running as root, but the directory it's trying to clone into is owned by another user, it runs git with that user's UID. This seems to work as far as making the permissions on the clone consistent, but, when using Git+SSH, the privilege-dropped OpenSSH process is then prevented from accessing the agent auth socket (because it's still owned by root). This causes git to fail.

The obvious work-around (besides not running NPM as root in the first place, which is of course my long-term goal) is to pass -H or -i to sudo, which should avoid the privilege drop in most cases. And, if reconciling the ownerships of the clone directory and auth socket is too irritating, maybe that should be the 'official' solution; I think pip has a similar requirement.

But one big difference between pip and pacote is that pip actually tells you what's wrong and how to fix it; without spending a whole bunch of time troubleshooting, the pacote issue simply looks like the Git clone is failing for no reason at all.

When

Always, in this configuration/scenario:

  • Ubuntu 20.04 Focal
  • npm 6.14.4
  • pacote 9.5.12
  • npm running as root
  • Clone directory not owned by root (e.g. using sudo without -i or -H)
  • Installing package via Git+SSH
  • SSH agent needed for auth (i.e. OpenSSH can't just fall back to a default key)

Where

Using a private repository in this case, but i assume this can occur any time pacote deals with Git via SSH

How

Current Behavior

git commands fail in the above scenario, with no good explanation as to why

Steps to Reproduce

% mkdir /tmp/pacote-bug && cd /tmp/pacote-bug
% sudo sh -c '
  eval "$(ssh-agent)";
  ssh-add -q /path/to/necessary/key;
  ls -ld -- "$SSH_AUTH_SOCK";
  GIT_SSH_COMMAND="id >&2; ssh -v" npm install "git+ssh://[email protected]/foo/bar.git"
'
Agent pid 1456457
srw------- 1 root root 0 Jun 17 01:50 /tmp/ssh-W9IypxzdTXFs/agent.1456456
npm ERR! code 128
npm ERR! Command failed: git clone --mirror -q ssh://[email protected]/foo/bar.git /home/dana/.npm/_cacache/tmp/git-clone-0b3c15e8/.git
npm ERR! warning: templates not found in /tmp/pacote-git-template-tmp/git-clone-576ad45a
npm ERR! uid=1001(dana) gid=1001(dana) groups=1001(dana)
...
npm ERR! debug1: pubkey_prepare: ssh_get_authentication_socket: Permission denied
...
npm ERR! [email protected]: Permission denied (publickey).
npm ERR! fatal: Could not read from remote repository.
npm ERR! 
npm ERR! Please make sure you have the correct access rights
npm ERR! and the repository exists.
...

(The clone succeeds if i change cwdOwner() and mkOpts() in git.js so that they don't try to de-escalate)

Expected Behavior

imo, pacote should either:

  • figure out how to perform the clone without breaking the SSH agent, or
  • alert the user that the way they're running it may cause problems, and ideally explain what to do instead

Who

Me!

References

Extract: rimraf dir contents, not dir itself

Right now, Fetcher.extract calls Fetcher[_mkdir] which removes the dir before creating it.

This code was copied from pacote 9, and ostensibly ensures that the directory is empty before dumping package contents into it.

However, this is a problem if a consumer wants to extract a new version of a package over an existing node in the tree.

Here's a little script that'll tell pacote what to rimraf, if run with depth: 1. Roll this into a new module and use it. (@npmcli/arborist may need this for package reification as well.) https://gist.github.com/isaacs/262f31fd9f37b25f27e4f74e20e2c491
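
A rough sketch of the idea (not the gist above): remove what is inside the destination rather than the destination itself, assuming rimraf's callback API and Node's fs promises:

const fs = require('fs')
const path = require('path')
const { promisify } = require('util')
const rimraf = promisify(require('rimraf'))

// Empty a directory without removing and re-creating the directory node
// itself, so an existing node in the tree keeps its place while its
// contents are replaced.
async function emptyDir (dir) {
  let entries
  try {
    entries = await fs.promises.readdir(dir)
  } catch (er) {
    if (er.code === 'ENOENT') {
      return fs.promises.mkdir(dir, { recursive: true })
    }
    throw er
  }
  await Promise.all(entries.map(entry => rimraf(path.join(dir, entry))))
}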

[BUG] pacote does not add to cache

Basically just a rehash of npm/cli#2160: npm is expecting pacote to add to the cache, and pacote has stopped adding to the cache since the big v10 rewrite.

What / Why

The pacote fetcher does not add to the cache anymore! npm is expecting this. Is npm wrong? Or is pacote wrong?

When

$ npm cache add yn-3.1.1.tgz --cache=./npm-cache
$ ls npm-cache
ls: cannot access 'npm-cache': No such file or directory

Where

Locally, on my machine (no registry, just tarballs)

I appreciate you taking a look at this. If the answer is just that the npm cache command should add to the cache (which does make sense), then great; I would just like to verify that that is not under the purview of pacote.

Raise an error early if pacote.extract doesn't get a folder to extract to

This is not so pretty

AssertionError [ERR_ASSERTION]: rimraf: missing path
    at rimraf (/Users/isaacs/dev/npm/pacote/node_modules/rimraf/rimraf.js:50:3)
    at tryCatcher (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/util.js:16:23)
    at ret (eval at makeNodePromisifiedEval (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/promisify.js:184:12), <anonymous>:14:23)
    at BB.catch.err.message (/Users/isaacs/dev/npm/pacote/extract.js:53:5)
    at Promise._execute (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/debuggability.js:313:9)
    at Promise._resolveFromExecutor (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/promise.js:483:18)
    at new Promise (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/promise.js:79:10)
    at tryExtract (/Users/isaacs/dev/npm/pacote/extract.js:50:10)
    at /Users/isaacs/dev/npm/pacote/extract.js:33:12
    at /Users/isaacs/dev/npm/pacote/lib/with-tarball-stream.js:108:31
    at tryCatcher (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/util.js:16:23)
    at Function.Promise.attempt.Promise.try (/Users/isaacs/dev/npm/pacote/node_modules/bluebird/js/release/method.js:39:29)
    at BB.resolve.retry.retries (/Users/isaacs/dev/npm/pacote/lib/with-tarball-stream.js:108:24)
    at /Users/isaacs/dev/npm/pacote/node_modules/promise-retry/index.js:29:24 {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: undefined,
  expected: true,
  operator: '=='
}

[BUG] Inconsistent integrity checksums for Github repositories when switching between machines

What / Why

I am running into problems installing packages from Github when those packages contain other Github packages in their package-lock.json files and when those repositories were uploaded from "different" computers.

My understanding of npm install is that it uses pacote to fetch the package and create an integrity checksum that is stored in package-lock.json. When a project commits its package-lock.json file, someone else checks out that project and runs "npm install", and a package is downloaded from Github, it should compute the same checksum when downloading the same revision. But that's not happening in my case, and as such "npm install" fails with "ERR! code EINTEGRITY ... integrity checksum failed ..."

How

Current Behavior

Running "npx pacote tarball" returns different checksums for the same Github repository on different machines.

Steps to Reproduce

git clone git@github.com:skress/test-pacote
cd test-pacote
npm i
npx pacote tarball "github:skress/test-pacote-data" xx.tar.gz

Running this on two Intel-based Macs (Node v15.6.0, npm 7.4.0, pacote 11.2.3) returns the same result.
Running this on an Apple Silicon-based Mac (same versions) returns a different result.
Running this in an Ubuntu Docker container (Node v10.19.0, npm 6.14.4, pacote 11.2.3) returns the same results as on the Apple Silicon machine.

When disabling gzip in dir.js all machines return the same checksum.

Expected Behavior

npm install / npx pacote tarball should create the same checksum regardless of the OS/architecture of the computer being used.

[BUG] SELF_SIGNED_CERT_IN_CHAIN in 11.0+

I'm coming from an npm-check-updates issue
Npm-check-updates uses pacote.packument and after they updated pacote to 11.1 I and others started getting
[===============-----] 27/36 75%C:\Repos\npm-check-updates-master\lib\npm-check-updates.js:427
throw err
^
FetchError: request to https://npm.fontawesome.com/@fortawesome%2ffontawesome-pro failed, reason: self signed certificate in certificate chain
at ClientRequest. (C:\Repos\npm-check-updates-master\node_modules\minipass-fetch\lib\index.js:97:14)
at ClientRequest.emit (events.js:310:20)
at TLSSocket.socketErrorListener (_http_client.js:426:9)
at TLSSocket.emit (events.js:322:22)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
code: 'SELF_SIGNED_CERT_IN_CHAIN',
errno: 'SELF_SIGNED_CERT_IN_CHAIN',
type: 'system'
}
when trying to use it. The error seems to only occur for NPM packages that require auth tokens (FontAwesome Pro in my case).

[BUG] config field gets overwritten in package.json

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When extracting to a folder where the same (but older version) package exists, it does not respect the npm package constraint that config should not be touched by the package manager.

Expected Behavior

When extracting to a folder where the same (but older version) package exists, it should respect the npm package constraint that config should not be touched by the package manager.

Steps To Reproduce

  1. In windows
  2. Have an existing package of your choice and change its package.json version to be lower than the published version; if there is no config field in package.json, add one, e.g. "config": { "dontTouchMe": true }
  3. Run pacote.extract in the same directory
  4. "config" will be wiped out

Environment

  • npm: 7.20.3
  • Node: 16.6.2
  • OS: Windows 10
  • platform: MSI Laptop

[BUG] Proxy settings appear to be ignored.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Proxy settings are ignored. Requests assume that a proxy is not required.

Expected Behavior

Proxy settings should be honoured.

Steps To Reproduce

  1. Run pacote.packument("any-name", {proxy: "http://localhost:8080"})
  2. Requests made should fail (unless you happen to have a proxy set up at that address)

(Note, bug originally raised here: raineorshine/npm-check-updates#1069)

Environment

  • npm: 8.5.4
  • Node: 17.7.1
  • OS: Windows 11
  • platform: Windows

[BUG] .gitignore renamed to .npmignore

What / Why

When calling pacote.extract() with a package tarball that contains a .gitignore, it is renamed to .npmignore. This is problematic if the package contains something such as a project template where the .gitignore does not apply to npm.

Where

The .gitignore file is being renamed in fetcher.js in the filter() callback defined in _tarxOptions.

One would believe simply adding a .npmignore to a package would prevent the .gitignore from being renamed. However, on my machine, .gitignore is ordered and extracted before .npmignore which means sawIgnores will always be empty and thus the .gitignore is renamed.

Since filter() has no way of knowing the order of the files being extracted, the code will have no choice but to track potential renames, then untrack them when a .npmignore is found:

// PROPOSED FIX
if (/File$/.test(entry.type)) {
	const base = basename(entry.path)
	if (base === '.npmignore') {
		sawIgnores.delete(entry.path.replace(/\.npmignore$/, '.gitignore'))
	} else if (base === '.gitignore') {
		sawIgnores.add(entry.path)
	}
	return true
}

Next, sawIgnores needs to somehow be magically returned so that extractor.on('end', ...) can loop over it and rename the .gitignore files.

for (const ignoreFile of sawIgnores) {
	const from = path.join(dest, ignoreFile);
	const to = path.join(dest, ignoreFile.replace(/\.gitignore$/, '.npmignore'));
	fs.renameSync(from, to);
}

As a bonus, wrap this entire thing in a flag so it can be disabled because publishing a .npmignore makes no sense.

Test Case

I've created a handy dandy package template-kit-test with version v1.0.3 containing a .gitignore and v1.0.4 containing both a .gitignore and a .npmignore.

[BUG] gzip archives (from git) have inconsistent checksums across Linux distributions

What / Why

This is very similar to #62 and possibly npm/cli#2846 but since it's happening reliably on the same machine (ie no architectural differences) in separate docker containers and comments are more likely to be overlooked, I preferred to open a new issue.

I can reliably reproduce this bug using ubuntu and archlinux Docker images using these two Dockerfiles (just for the sake of reproducing it easily) - simply run builds using docker build -f Dockerfile.XXX . and check the different lock files:

How

FROM ubuntu:focal
RUN set -ex && \
        apt update && \
        apt install -y curl git && \
        curl -fsSL https://deb.nodesource.com/setup_15.x | bash - && \
        apt install -y nodejs
RUN npm i -g [email protected]
RUN mkdir /test
WORKDIR /test
RUN npm i --verbose 'github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af'
RUN cat package-lock.json
{
  "name": "test",
  "lockfileVersion": 2,
  "requires": true,
  "packages": {
    "": {
      "dependencies": {
        "jqplot": "github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af"
      }
    },
    "node_modules/jqplot": {
      "version": "1.0.9",
      "resolved": "git+ssh://[email protected]/jqplot/jqplot.git#d96a669fbb729f4f51e2214688e54320411219af",
      "integrity": "sha512-X/WC4DGdoiLof0cK/nTywyNzBwTNsEwH7Ky6ndwn5SUgsNmZDNnfugNhpfMX1y3Jh+GG6O9UxSMaFH/3pcffHQ==",
      "license": "(MIT AND GPL-2.0)"
    }
  },
  "dependencies": {
    "jqplot": {
      "version": "git+ssh://[email protected]/jqplot/jqplot.git#d96a669fbb729f4f51e2214688e54320411219af",
      "integrity": "sha512-X/WC4DGdoiLof0cK/nTywyNzBwTNsEwH7Ky6ndwn5SUgsNmZDNnfugNhpfMX1y3Jh+GG6O9UxSMaFH/3pcffHQ==",
      "from": "jqplot@github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af"
    }
  }
}

FROM archlinux/base:latest
RUN set -ex && \
        pacman -Sy --noconfirm && \
        pacman -S --noconfirm nodejs npm git && \
        pacman -Syu --noconfirm
RUN npm i -g [email protected]
RUN mkdir /test
WORKDIR /test
RUN npm i --verbose 'github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af'
RUN cat package-lock.json
{
  "name": "test",
  "lockfileVersion": 2,
  "requires": true,
  "packages": {
    "": {
      "dependencies": {
        "jqplot": "github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af"
      }
    },
    "node_modules/jqplot": {
      "version": "1.0.9",
      "resolved": "git+ssh://[email protected]/jqplot/jqplot.git#d96a669fbb729f4f51e2214688e54320411219af",
      "integrity": "sha512-hMjKgDiIZ2RWZOe0wOUk9V1kWwyuvpNoqIoDT1hJ/1RmzKnYIfKM1BUPdJAo4gXr/LgmEF6GxGPZ1uXn7cfVBw==",
      "license": "(MIT AND GPL-2.0)"
    }
  },
  "dependencies": {
    "jqplot": {
      "version": "git+ssh://[email protected]/jqplot/jqplot.git#d96a669fbb729f4f51e2214688e54320411219af",
      "integrity": "sha512-hMjKgDiIZ2RWZOe0wOUk9V1kWwyuvpNoqIoDT1hJ/1RmzKnYIfKM1BUPdJAo4gXr/LgmEF6GxGPZ1uXn7cfVBw==",
      "from": "jqplot@github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af"
    }
  }
}

When running npm pack in node_modules/jqplot manually I also get different integrity hashes, but when I gunzip those files the .tar files have the same checksum, so it's clearly related to gzip producing different output.

On my Gentoo system I get the same hash as on archlinux; I just used arch in the dockerfile because it's faster than compiling nodejs manually on a Gentoo image ;)
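A quicker cross-machine comparison than a full npm install might be to ask pacote directly for the tarball and print the integrity it computes; a sketch using the same committish as above:

const pacote = require('pacote')

// Run this on each distribution and compare the reported integrity values;
// per the observation above, the .tar contents match but the gzip layer differs.
pacote.tarball('github:jqplot/jqplot#d96a669fbb729f4f51e2214688e54320411219af')
  .then(data => console.log(data.integrity, data.length, 'bytes'))
  .catch(console.error)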

How to use extract with a +git resolver ?

Hi,

I'm working on a Node.js security project where I download and extract npm packages with pacote (it works very well). But sometimes users provide custom resolvers (like git+), and right now I can't manage to extract these.

I'm working on this given package.json: https://github.com/Purexo/html-to-rss/blob/master/package.json

With this as a dependency:

"u-http-server": "git+https://github.com/tpoisseau/uNodeHttpServer.git#1.0.1"

I've tried a lot of syntaxes but everything fails. Maybe it's not supported, or maybe I'm not doing the right thing, so I'm coming here to get your feedback.

(I've just written a little piece of code to work on the subject; like I said, I tried many different URLs.)

"use strict";
const pacote = require("pacote");

async function main() {
    const ret = await pacote.extract("https://github.com/tpoisseau/uNodeHttpServer.git#1.0.1", "./tmp");
    console.log(ret);
}
main().catch(console.error);

The README says it supports all package specifier syntax that npm install and its ilk support, so I suppose I've missed something. :\
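For reference, the spec string exactly as it appears in package.json (keeping the git+ prefix and the #1.0.1 committish) should be passable straight to pacote.extract; a minimal sketch, not verified against this particular repo and tag:

"use strict";
const pacote = require("pacote");

async function tryGitSpec() {
    // same string as the dependency above, git+ prefix included
    const ret = await pacote.extract("git+https://github.com/tpoisseau/uNodeHttpServer.git#1.0.1", "./tmp-git");
    console.log(ret);
}
tryGitSpec().catch(console.error);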

Thanks for your time and help

Best Regards,
Thomas

[FEATURE] export a cacache reference

What / Why

For consistency reasons, I think that it would be useful to re-export a reference to the current cacache module.

My application uses pacote for fetching packages, and cacache for fetching archives with binary tools referred by the packages.

Currently I have dependencies on both pacote and cacache, but to keep them consistent I have to manually check, with each version bump, which version of cacache pacote uses and configure my own dependency accordingly.

Knowing your code you might not agree, but generally I think that using two different versions of a cache manager on the same folder may be unsafe, or, if the code is fully able to cure/recover the content, at least inefficient, since a newer version may write content in a format that the older version does not expect, and each access may then end up running a full content-cache verification.

The changes to accommodate this feature are minimal, for example a new line in index.js like:

module.exports = {
  resolve: (spec, opts) => get(spec, opts).resolve(),
  extract: (spec, dest, opts) => get(spec, opts).extract(dest),
  manifest: (spec, opts) => get(spec, opts).manifest(),
  tarball: (spec, opts) => get(spec, opts).tarball(),
  packument: (spec, opts) => get(spec, opts).packument(),
  cacache: require('cacache'), // <---
}

With such a definition, my application would no longer need its own dependency on cacache and could get it from pacote.
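If pacote exposed that property, consumer code could look something like this (hypothetical; the cacache export below does not exist today):

const pacote = require('pacote')

const cachePath = '/path/to/shared/cache'  // the same folder passed to pacote as opts.cache

// hypothetical: pacote.cacache would be the exact cacache version pacote itself uses
pacote.cacache.ls(cachePath).then(entries => {
  console.log(Object.keys(entries).length, 'entries in the shared cache')
})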

[FEATURE] Add support to list & remove the cache content

What / Why

I am using pacote to download and cache, in addition to reasonably sized source libraries, large binaries (like toolchain distributions, hundreds of MB), and I would appreciate a method to enumerate the content of the cache so that I can later selectively remove some cached files. For now the only method I have found is to remove the cache completely, which is far from optimal.

In other words, for the CLI app, I suggest two new commands, like list and remove, obviously backed by matching API calls.
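Until such commands exist, roughly the same thing can be done by pointing cacache directly at the folder pacote was configured to cache into; a sketch (the cache path and size threshold are just examples):

const cacache = require('cacache')

const cachePath = '/path/to/pacote/cache'  // the same folder passed to pacote as opts.cache

// list everything in the cache, then drop entries larger than a threshold
async function removeLargeEntries (maxBytes) {
  const entries = await cacache.ls(cachePath)
  for (const [key, entry] of Object.entries(entries)) {
    if (entry.size > maxBytes) {
      await cacache.rm.entry(cachePath, key)                 // remove the index entry
      await cacache.rm.content(cachePath, entry.integrity)   // remove the content it points to
    }
  }
}

removeLargeEntries(100 * 1024 * 1024).catch(console.error)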

Thank you,

Liviu

[BUG] v10: hosted git tarball urls and private repos

What / Why

Pacote returns the resolved value for hosted git repos (github, etc.) as the tarball url, e.g.:

$ pacote extract npm/wubwub xyz --cache=./cache
{
  resolved: 'https://codeload.github.com/npm/wubwub/tar.gz/470f18568287be7be38b7e12f4e949c39edb0fee',
  integrity: 'sha512-71+Eh9ob6yFoabIpqy/lNTbvjsbbTZ7EMP90oW8cCWZIqc+cOM/sqjo3Ef+5/+1jVX6miiU2i4t49nJfYimT/A=='
}

However, these files are not guaranteed to be consistent, so the integrity value will change. If we store that integrity value in a lock file, it'll break on the next installation.

Also, if we store that resolved value in a lock file, then it'll be fetched as a remote type dep (ie, just a url to a tarball) and so private repos will break.

When

  • fetch a hosted git dep

How

With pacote.

Current Behavior

  • resolved is a tarball url
  • integrity is the integrity of that tarball url (which often won't match the integrity of the tarball we create from the git download)

Steps to Reproduce

  • pacote.resolve('npm/cli').then(console.log)

Expected Behavior

  • Set this.resolved to spec.hosted.ssh({ noCommittish: false }) instead of spec.hosted.tarball() (see the sketch after this list).
  • We can still fetch the contents with a RemoteFetcher going for the spec.hosted.tarball() url rather than cloning the repo (since it's almost always faster), but should report a resolved value which is useful in the future.
  • If that RemoteFetcher gets a 404, then fall back to doing a non-hosted-style clone. (See lib/git.js in the [_clone] method, that's where most of the changes would go.)
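As a quick illustration of the two URL forms in the first bullet: spec.hosted is (as far as I know) a hosted-git-info object, so the difference can be seen directly (a sketch):

const hgi = require('hosted-git-info')

const hosted = hgi.fromUrl('github:npm/wubwub#470f18568287be7be38b7e12f4e949c39edb0fee')

// current resolved value: a codeload tarball url whose gzipped bytes may change over time
console.log(hosted.tarball())

// proposed resolved value: the ssh address pinned to the committish, which keeps
// working for private repos and lets the dep be re-fetched as a git dep later
console.log(hosted.ssh({ noCommittish: false }))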

Who

References

  • n/a

Consistent useful return values for extract and tarball.toFile

Right now, extract's promise always resolves to the boolean return value from opt.log.silly, and toFile's promise resolves to undefined.

Both of these should always return the resolved and integrity values of the tarball that they fetched or extracted.

With this, extract and extractStream don't have to add _fields to package.json files. npm v7 won't need them, but it will need the resolved and integrity metadata when it extracts remote deps.

When a tarball stream is returned, it should either already have integrity and resolved properties, or emit events when those props are available.

Integrity should always be an sri string, not an ssri hash object.

[FEATURE] play nicer with caching

What / Why

Currently the cleanupCached() method in lib/fetcher.js (https://github.com/npm/pacote/blob/latest/lib/fetcher.js#L323) deletes content directly from this.cache, which is the cache directory configured by npm and is shared across other modules; this is especially relevant for make-fetch-happen. When we remove content that could have been written by someone else, we put the other module in a position where it can have an up-to-date, correct index entry that points to content that isn't there. Granted, make-fetch-happen should be more resilient to this, but I think the correct fix here is to be a little smarter about how pacote handles the cache.

Using the example of make-fetch-happen, data is already in cacache and will already be read from there. Right now pacote will read from the cache, create a new integrity stream to verify the data again (cacache already did this when it read the original content), and write back to the same cache while also returning the data to the user. This is an extra write and hash that we absolutely do not need to do, and it occurs for every single file that is retrieved through the RemoteFetcher and RegistryFetcher classes.

This module should be modified such that it only uses cacache for subclasses that are not already cached by their own means, which today are file, dir, and git. Arguably we shouldn't bother caching any of these things at the pacote level, since it's reasonably likely git state will change without us knowing, and copying a file (or packing a directory and copying the result) to cacache isn't really of much benefit.

This is related to npm/arborist#297

[BUG] No possibility to embed pacote in single js file

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I'm using pacote in my code and trying to pack it into a single js file for GitHub Actions using @vercel/ncc or rollup.js. Both methods fail because of node-gyp in transitive dependencies.

ncc: Version 0.34.0
ncc: Compiling file index.js into CJS
ncc: Using [email protected] (local user-provided)
Emitting /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/node-gyp/bin/node-gyp.js for static use in module /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@npmcli/run-script/lib/make-spawn-args.js
Skipping asset emission of /Users/jacek/Projects/github-workflows-tslibs/actions/*/package.json for /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@npmcli/run-script/lib/make-spawn-args.js as it is outside the package base /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@npmcli/run-script
Error: Module parse failed: Unexpected token (29:23)
File was processed with these loaders:
 * ./node_modules/@vercel/ncc/dist/ncc/loaders/empty-loader.js
 * ./node_modules/@vercel/ncc/dist/ncc/loaders/relocate-loader.js
 * ./node_modules/@vercel/ncc/dist/ncc/loaders/shebang-loader.js
You may need an additional loader to handle the result of these loaders.
|     npm_lifecycle_event: event,
|     npm_lifecycle_script: cmd,
>     __webpack_require__.ab + "node-gyp.js",
|   })
| 
    at /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@vercel/ncc/dist/ncc/index.js.cache.js:37:1770552
    at /Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@vercel/ncc/dist/ncc/index.js.cache.js:37:374702
    at _done (eval at create (/Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@vercel/ncc/dist/ncc/index.js.cache.js:20:75523), <anonymous>:9:1)
    at eval (eval at create (/Users/jacek/Projects/github-workflows-tslibs/actions/node_modules/@vercel/ncc/dist/ncc/index.js.cache.js:20:75523), <anonymous>:34:22)

Expected Behavior

It would be nice to be able to compile code that uses pacote into a single js file. The likely solution would be to drop the dependency on node-gyp.

Steps To Reproduce

  1. Use pacote in your code.
  2. Try to compile it into a single js file with ncc: npx ncc -d build src/assert-prod-version-action.ts
  3. See above error.

Environment

  • npm: 8.11.0
  • Node: 16.16.0
  • OS: Mac OS 12.6
  • platform: x86_64

[BUG] When using git+ssh protocol, ~/.ssh/config isn't read

What / Why

I use GitHub deploy keys for authentication when deploying private npm packages (using releases).

$ cat ~/.ssh/config
Host react-hash-link
HostName github.com
IdentityFile ~/.ssh/react-hash-link

Running npm install git+ssh://git@react-hash-link:sunknudsen/react-hash-link.git works as expected, but not the following.

$ pacote packument git+ssh://git@react-hash-link:sunknudsen/react-hash-link.git
TypeError [ERR_INVALID_URL]: Invalid URL: git+ssh://git@react-hash-link:sunknudsen/react-hash-link.git
    at onParseError (internal/url.js:257:9)
    at new URL (internal/url.js:333:5)
    at new URL (internal/url.js:330:22)
    at GitFetcher.[_addGitSha] (/usr/local/lib/node_modules/pacote/lib/git.js:128:28)
    at /usr/local/lib/node_modules/pacote/lib/git.js:228:27 {
  input: 'git+ssh://git@react-hash-link:sunknudsen/react-hash-link.git',
  code: 'ERR_INVALID_URL'
}

This causes a downstream issue with npm-check-updates.

Perhaps I am missing something.

Thanks for your help!

[BUG] _cached field is incorrect

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The _cached field is set in the presence of an x-local-cache header in the response, which make-fetch-happen seems to set on all requests that it will cache.

The header is not correct, and I don't know that we actually use it. Probably best to just remove it.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

[BUG] resolving git URL with branch/tag

What / Why

An error is thrown when resolving a git URL with a branch or tag.

How

Current Behavior

  • Using pacote.resolve() with a git shortcode will return a string without resolving to a specific tag or branch
  • Using pacote.resolve() with a git shortcode will throw an Error when resolving to a specific tag or branch

Steps to Reproduce

> pacote.resolve('github:npm/libnpm').then(console.info, console.error)
"git+ssh://[email protected]/npm/libnpm.git#a10f50f71ad05bf51f478aafd8bf92e770c569ab"

> pacote.resolve('github:npm/libnpm#master').then(console.info, console.error)
Error: command failed
    at ChildProcess.<anonymous> (.../npm-check-git/node_modules/@npmcli/promise-spawn/index.js:63:27)
    at ChildProcess.emit (events.js:315:20)
    at ChildProcess.EventEmitter.emit (domain.js:506:15)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:443:11)
    at Socket.emit (events.js:315:20)
    at Socket.EventEmitter.emit (domain.js:506:15)
    at Pipe.<anonymous> (net.js:674:12)
    at Pipe.callbackTrampoline (internal/async_hooks.js:120:14) {
  cmd: '/usr/bin/git',
  args: [
    'clone',
    '--mirror',
    '-q',
    'ssh://[email protected]/npm/libnpm.git',
    '~/.npm/tmp/git-clone-2ffe14e8/.git'
  ],
  code: 128,
  signal: null,
  stdout: '',
  stderr: "fatal: destination path ~/.npm/tmp/git-clone-2ffe14e8/.git' already exists and is not an empty directory.\n"
}

Expected Behavior

  • Using pacote.resolve() with a git shortcode should resolve a git+ssh URL whether using a tag/branch reference or not

Stop using process.umask()

Refs: nodejs/node#32321

Summary: process.umask() (no args) will be deprecated and removed.

I couldn't quite divine what lib/config/defaults.js uses process.umask() for, but in most cases you don't need to deal with the umask directly: the operating system applies it automatically.

Example:

// manually applying the umask to the requested mode
const mode = 0o777 & ~process.umask();
fs.mkdirSync(dir, mode);

Computing the file mode that way is superfluous; it can be replaced with just this:

// the operating system applies the umask automatically
fs.mkdirSync(dir, 0o777);

nodejs/node#32321 (comment)

From npm/bin-links#18:

Since this is only done after the file is created (and it is created without the knowledge that it will eventually need to be an executable script), we can't just rely on default file creation masking, since chmod isn't limited by that.

If we don't read process.umask, we risk making all executable files world-writable (or even just group-writable), which is a security risk.

As far as I can see, the only way to avoid this would be to have pacote take note of executable file targets at unpack time, create them with a 0o777 mode regardless of what the archive entry says, and then also tell tar not to chmod them to 0o777 after creation.

Probably this will require a way to provide chmod:false to tar.Unpack anyway, so that pacote can just set the creation modes to 0o666/0o777 and ignore the specific mode found in the archive.
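A rough sketch of what the option-driven approach could look like (the umask option name and default here are assumptions for illustration, not pacote's actual API):

// Take the umask from explicitly passed options instead of calling process.umask().
function modesFromOpts (opts = {}) {
  const umask = opts.umask != null ? opts.umask : 0o022  // assumed option and default
  return {
    dir: 0o777 & ~umask,   // directories
    file: 0o666 & ~umask,  // regular files
    exec: 0o777 & ~umask,  // files noted as executable at unpack time
  }
}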

cc @bnoordhuis
