Git Product home page Git Product logo

Comments (18)

bradzacher avatar bradzacher commented on September 28, 2024 7

After making the above change (adding .emit_rerun_if_changed(false) and adding a manual emit) things are looking MUCH better:

image
trace-1720761384226608.json

$ time cargo check --workspace --manifest-path ./Cargo.toml
1.66s user 0.33s system 74% cpu 2.666 total

🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉

That's more like it

from cargo.

epage avatar epage commented on September 28, 2024 1

@bradzacher Thanks for that context! It looks like only one specific prepare_target call is taking up all of that time. If you re-run with cargo check && CARGO_LOG_PROFILE=true CARGO_LOG_PROFILE_CAPTURE_ARGS=true cargo check then the traces will include the Package ID of the package that is causing this. From that we can try to understand what is different about this package that it is hitting this case in cargo. In particular, if there are is a build script and what directives it emits, symlinks inside its directory, an odd location on disk, etc.

@Xuanwo I don't even know what file walk this is to evaluate an answer like that. One theory is a build.rs that gives a bad path for for re-run-if-changed or we have some bug with how we are walking the directory structure for one of the rebuild-detection checks we do.

from cargo.

weihanglo avatar weihanglo commented on September 28, 2024 1

How about making cargo respect .gitignore, or alternatively, introducing .cargoignore when managing workspaces?

@Xuanwo it already does. The presence of exclude and include fields would affect whether .gitignore is respected.

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024 1

And it only just clicked that OFC the fact that it's dumping cargo:rerun-if-changed=../../ is what's causing cargo to watch the entire repo and crawl everything.

from cargo.

epage avatar epage commented on September 28, 2024 1

I wonder if we should track how long rerun-if-changed took and produce a warning if its above a certain threshold. Unsure how often something like this would come up if it'd be worth it.

EDIT: Something cheaper than a warning is to just log the file walk we do for rerun-if-changed and then someone using the log profiling can find it.

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024 1

So that rerun-if-changed came from tonic.
TL;DR cos we build with both cargo and bazel - we need to make the two work in sync.
Bazel runs everything from the repo root.
So when we generate code from proto files using tonic we need to tell it to resolve the proto paths relative to the repo root (or else tonic would generate different code when run via bazel or via cargo).

So the tonic config includes this config:

        .compile(
            /* protos:   */ &["../../tools/crate3/some/file.foo"],
            /* includes: */ &["../../"],
        )?

Tonic automatically emits cargo:rerun-if-changed directives - hence it emitted cargo:rerun-if-changed=../../.

Thankfully tonic's builder includes emit_rerun_if_changed(bool) - so we can disable its default and manually emit a sane one instead (cargo:rerun-if-changed=tools/crate3/some/file.foo)

from cargo.

epage avatar epage commented on September 28, 2024

Huh, not too sure what this would be. cargo check--manifest-path Cargo.toml vs cargo check shouldn't be too different. When looking for Cargo.toml, we only walk up the directories, not back down again.

Could you give us an idea of

  • Your general directory hierarchy (where node_modules, workspace Cargo.toml, package Cargo.toml, bazel build directory, etc live relative to each other)
  • What your workspace.members looks like

Even better if you could also run cargo check && CARGO_LOG_PROFILE=true cargo check and open the traces in chrome://tracing and let us know what the hot section is (docs). If possible, this would be best done with 1.79 as we tweak the traces for better information with each release.

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

Your general directory hierarchy

it's all over the place, really - the repo is a mess that's grown organically over time.
here's a rough idea of the mess

- /
  - Cargo.toml (workspace)
  - (bazel folders symlinked in the repo root)
    - bazel_out/
    - bazel_bin/
    - bazel_canva/
  - web/ (the frontend codebase
    - node_modules/
    - crate1/ [[rust]] (some WASM rust)
      - Cargo.toml
  - tools/ (a lot of internal tooling)
    - crate2/ [[rust]]
      - Cargo.toml
    - crate3/ [[rust]]
      - Cargo.toml
    - ...
    - js1/
      - node_modules/
    - js2/
      - node_modules/
    - ...
  - backend_service1/ (a backend microservice)
  - backend_service_rust1/ [[rust]] (a backend microservice, written in rust)
    - Cargo.toml

What your workspace.members looks like

it's just a list of folders (i.e. example based on the above)

members = [
  "web/crate1",
  "tools/crate2",
  "tools/crate3",
  "backend_service_rust1",
  ...
  "path/to/crate12", # there are only 12 crates in our monorepo thus far
]

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

Trace (with 1.78):
trace-1720750556589970.json

image

I'm not sure how to interpret the trace so here it is in its raw form.
The bulk of the time is spent in "Name prepare_target, Category cargo::core::compiler::fingerprint"

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

Trace (with 1.79):
trace-1720751045123202.json

image

Looks to be much the same

from cargo.

Xuanwo avatar Xuanwo commented on September 28, 2024

How about making cargo respect .gitignore, or alternatively, introducing .cargoignore when managing workspaces?

from cargo.

weihanglo avatar weihanglo commented on September 28, 2024

Some more questions:

  • Is there any Cargo.toml containing odd package.include that refer to relative parent directory like ..?
  • Is there any symlink under a member package pointing to directories outside the package root, for example to the workspace root?
  • Any builds script rerun-if-changed has relative paths?

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

Is there any Cargo.toml containing odd package.include that refer to relative parent directory like ..?

No. There is no package.include defined in any Cargo.toml

Is there any symlink under a member package pointing to directories outside the package root, for example to the workspace root?

Two crates contain a node_modules which have a symlinks (as pnpm does symlink installs)
The symlink is from the node_modules back to the root node_modules.
I.e. from tools/crate2/node_modules/typescript to ../../../node_modules/.pnpm/[email protected]/node_modules/typescript (note ../../.. resolves to the repo root)

Other than that - no links at all

Any builds script rerun-if-changed has relative paths?

There are two:
one outputs cargo:rerun-if-changed=tools/crate2/folder
the other outputs this (where ../../ resolves to the repo root)

cargo:rerun-if-changed=../../tools/crate3/some/file.foo
cargo:rerun-if-changed=../../

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

If you re-run with cargo check && CARGO_LOG_PROFILE=true CARGO_LOG_PROFILE_CAPTURE_ARGS=true cargo check then the traces will include the Package ID of the package that is causing this

I reran it and the profile says that tools/crate3 (the one with the relative rerun-if-changed path) is the one that's responsible for that massive block.

from cargo.

weihanglo avatar weihanglo commented on September 28, 2024

Nice!

The follow-up question then will be: Should paths from rerun-if-changed be normalized?

I believe that could also resolve this issue because .gitignore is already respected. Note that it is discouraged for a rerun-if-changed to refer to paths outside package root, as it breaks the self-contained property of a Cargo package.

A normalization would happen here if we decide to go that route.

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024

Following on from that - it would be good to have an error when you break encapsulation like this.
I would suspect that in the vast majority of cases breaking encapsulation is not the intended result.

Forcing someone to emit like cargo::rerun-if-changed-allow-outside-package-folder=true or something to silence the error would help people catch accidental cases like this and provide huge perf wins!

Alternately - being able to lint for this (eg via clippy) would be good - but likely hard-to-impossible unless clippy can scan the target/debug/foo/output file for problems.

from cargo.

epage avatar epage commented on September 28, 2024

We can't error because of compatibility. We are hesitant about doing things related to build scripts on Edition boundaries because of cases like tonic where you delegate your build script written in one edition to another package which could be written in a different edition.

As for a warning, we avoid those right now unless there is a use actionable way to disable it without changing their behavior. We'll soon have lint control which will would be a way to do so.

from cargo.

bradzacher avatar bradzacher commented on September 28, 2024
$ cargo check --workspace --manifest-path Cargo.toml
$ time cargo check --workspace --manifest-path Cargo.toml

before:

 3.89s user 42.38s system 58% cpu 1:18.62 total

after:

 0.15s user 0.12s system 84% cpu 0.319 total

from cargo.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.