Wow, I'm really surprised about those results. It's completely the opposite of what I'd expect.
I'll need some time to dive into this. Thanks again for reporting it and collecting all this data. 🙇
> are these workflow file URLs returning the page on GitHub for the file

👍 They appear with syntax highlighting in my browser (Brave).
from backport-action.
I haven't tested it yet, but I think all that's missing is specifying a fetch depth for the fetches executed by the action. I suspect it currently fetches all ancestor commits of each fetched ref, which is unnecessary. This would explain why it takes so long 😌
See --depth in https://linux.die.net/man/1/git-fetch
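To make the effect of `--depth` concrete, here is a self-contained sketch in a throwaway repo (illustrative only, not the action's code): no matter how many commits the upstream has, a depth cap bounds how many are transferred.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
git -C "$tmp/origin" config user.email dev@example.com
git -C "$tmp/origin" config user.name dev
# Build 50 commits upstream, then clone with a depth cap of 10.
for i in $(seq 1 50); do
  git -C "$tmp/origin" commit -qm "commit $i" --allow-empty
done
# file:// (not a plain path) is required for --depth to take effect locally.
git clone -q --depth=10 "file://$tmp/origin" "$tmp/capped"
# Only 10 commits were transferred, regardless of upstream history size.
git -C "$tmp/capped" rev-list --count HEAD
```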
@winterqt Thanks for reporting this.
I've previously traced the problem in nixpkgs to the change of the default ref that's checked out. Please try again without the non-default ref.
However, I don't understand how it could lead to such a speed regression. I'd actually expect that it wouldn't work at all. Could you share the logs of such a run? Perhaps they shed some light on what's going on. I might also be able to use the data of an actual run to reproduce the issue locally.
Here are a few logs:
- v0.0.5, which corresponds to this workflow file
- v0.0.8, which corresponds to this workflow file. I assume this is what you meant by it expectedly failing?
- v0.0.8 w/ default ref, which corresponds to this workflow file. I noticed that we were using a non-default ref before opening this issue, so I took a stab at it to see if it would fix anything. This took nearly 20 minutes to run 🤯
(Side note, are these workflow file URLs returning the page on GitHub for the file (with syntax highlighting etc.) or is it returning the raw content of the file? GitHub can't make up its mind seemingly :/ might be a browser thing.)
EDIT: 🤔 I'm not sure how to determine the number of commits to fetch on the base branch (PR target), perhaps we should simply fetch 1000 commits to start. That's already a factor 400 improvement for nixpkgs. We can further improve it if a need arises.
Copied for visibility from #268 (review)
This week I've read a lot about shallow repos and grafted commits. I've played around with some fetching strategies that could work for the backport-action, and I've thought about optimizing the action for performance on larger repos.
TL;DR
We can optimize the action for fetch speed, but until then, please consider cloning with the entire history (untested, but may work):

- uses: actions/checkout@v3
  with:
    fetch-depth: 0
The current problems
At the moment, the action uses the git history to determine the relevant commits via git merge-base. This means we need to know the entire history between the pull request's base and head refs. Currently, we simply fetch the entire history of each ref. This makes sure that all commits we're interested in are available to the action, and it works in all cases (on pull_request and pull_request_target, as well as in merge, rebase and squash scenarios). But it's a change from before, and it's the reason you're now seeing this speed regression due to fetching.
In the logs you shared we see that fetching the base and head refs both take about 10 minutes while fetching the target only takes ~50s.
2022-07-25T03:39:51.7562163Z Detected labels on PR: backport backporttttttttt
2022-07-25T03:48:38.1331105Z From https://github.com/winterqt/nixpkgs
2022-07-25T03:48:38.1344220Z  * branch b0e7423bf3a9a1b1b2202b56c6a0975598ef969f -> FETCH_HEAD
2022-07-25T03:59:10.9378798Z From https://github.com/winterqt/nixpkgs
2022-07-25T03:59:10.9462819Z  * branch refs/pull/1/head -> FETCH_HEAD
2022-07-25T03:59:11.0342065Z Working on label backport backporttttttttt
2022-07-25T03:59:11.0351785Z Found target in label: backporttttttttt
2022-07-25T04:00:00.7954800Z From https://github.com/winterqt/nixpkgs
2022-07-25T04:00:00.8010867Z  * branch backporttttttttt -> FETCH_HEAD
2022-07-25T04:00:00.8011544Z  * [new branch] backporttttttttt -> origin/backporttttttttt
If we don't want to fetch the entire history but use a shallow repo, then we need to fetch an arbitrary number of commits from the base branch. This number would depend on how quickly the repo gains commits. For large repos like Nixpkgs, the base is growing very rapidly (I think it's at ~900 commits per week for Nixpkgs now), so this may not be the best approach anymore.
A different approach
Instead of determining the commits to cherry-pick from the repo's history (i.e. git merge-base), we can ask GitHub to tell us which commits are on the PR. Both the REST API and the GraphQL API provide methods for this.
Note that these are both paged and limited to a maximum of 250 commits (a limit I'm okay with).
When we know the exact commit shas, we can shallowly fetch only the relevant parts. This should provide optimal performance for repos with large histories, and doesn't negatively impact young repos.
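For illustration, the REST endpoint for this is GET /repos/{owner}/{repo}/pulls/{pull_number}/commits. A live call would be something like `curl -s https://api.github.com/repos/OWNER/REPO/pulls/123/commits` (OWNER/REPO/123 are placeholders). The sketch below parses a trimmed sample response instead, so it runs offline; note that a real response nests more "sha" fields (parents, tree), so the action would use a proper JSON parser rather than grep.

```shell
# Trimmed, hand-written sample of the paged response body.
cat > /tmp/pr_commits_sample.json <<'EOF'
[
  {"sha": "1111111111111111111111111111111111111111", "commit": {"message": "feat 1"}},
  {"sha": "2222222222222222222222222222222222222222", "commit": {"message": "feat 2"}}
]
EOF
# Extract the commit shas; good enough for this flat sample only.
grep -o '"sha": "[0-9a-f]\{40\}"' /tmp/pr_commits_sample.json | cut -d'"' -f4
```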
Cloning the repo shallowly
A current design choice is that the action can backport a PR to multiple targets. This means it needs to check out each target's latest commit and then cherry-pick the relevant commits from the PR on top of it, so we cannot simply clone the target branch shallowly. Instead, we'll need to start from the PR head.
Sadly, it is not possible with git clone to directly check out refs/pull/<#>/head as a branch. However, it seems possible with the checkout action, so we can look into that further optimization later on. For now, we'll need to clone the repo's default branch shallowly to get started:
git clone <URL> --depth=1
Fetch only the relevant commits to cherry-pick
Now we can fetch the relevant commits. To retrieve them, we can fetch refs/pull/<#>/head. However, if we fetched these shallowly, the deepest commit would be a grafted commit that contains all changes up to that point. This would effectively make it impossible to cherry-pick, because it contains many more changes than the original commit on origin.
To work around this, we simply fetch 1 commit deeper:
git fetch origin refs/pull/<#>/head --depth=n+1
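The graft behaviour is easy to reproduce in a throwaway repo (illustrative sketch, not the action's code): the deepest commit of a shallow fetch has no recorded parents, which is exactly why we fetch one commit deeper than the n commits we actually want to pick.

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin"
git -C "$tmp/origin" config user.email dev@example.com
git -C "$tmp/origin" config user.name dev
for i in 1 2 3 4; do
  git -C "$tmp/origin" commit -qm "commit $i" --allow-empty
done
git clone -q --depth=2 "file://$tmp/origin" "$tmp/shallow"
# %p prints the parent hashes. "commit 4" lists its parent, but the
# deepest fetched commit ("commit 3") lists none: it is grafted, even
# though it does have a parent upstream.
git -C "$tmp/shallow" log --format='%s parents=[%p]'
```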
(for each target:) Fetch the target and cherry-pick the commits
The target can again be shallowly fetched as a grafted commit because we don't really care about that latest commit. We just want to cherry-pick on top of it.
git fetch origin <target>:<target> --depth=1
git switch -c backport-<#>-to-<target> <target>
We can verify whether the commits that we want to cherry-pick are available to us:
git cat-file -t sha1
git cat-file -t sha2
etc
And now we can cherry-pick them.
git cherry-pick -x sha1 sha2 etc
In my experiments, I was unable to cherry-pick a range of commits this way. Perhaps this is because of the grafted commits 🤷 But for now it's fine to list them individually in one command. If that command becomes too long, we can even cherry-pick them one at a time:
git cherry-pick -x sha1
git cherry-pick -x sha2
etc
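Putting the pieces together, here is an end-to-end sketch of the proposed flow in a throwaway repo (the branch names, the "feature" branch standing in for refs/pull/<#>/head, and the depth numbers are all illustrative):

```shell
set -e
tmp=$(mktemp -d)
g() { git -C "$tmp/origin" "$@"; }
git init -q -b main "$tmp/origin"   # -b needs git >= 2.28
g config user.email dev@example.com
g config user.name dev
echo base > "$tmp/origin/f"; g add f; g commit -qm base
g branch release-1                  # the backport target
g switch -qc feature                # stands in for the PR head ref
echo one > "$tmp/origin/g"; g add g; g commit -qm "feat 1"
echo two >> "$tmp/origin/g"; g add g; g commit -qm "feat 2"
sha1=$(g rev-parse feature~1); sha2=$(g rev-parse feature)
g switch -q main

# 1. true shallow clone of the default branch
git clone -q --depth=1 "file://$tmp/origin" "$tmp/work"
w() { git -C "$tmp/work" "$@"; }
w config user.email dev@example.com
w config user.name dev
# 2. fetch the n=2 commits to pick, one deeper so no pick is grafted
w fetch -q origin feature --depth=3
# 3. fetch the target shallowly and branch off it
w fetch -q origin release-1:release-1 --depth=1
w switch -qc backport-to-release-1 release-1
# 4. cherry-pick the commits individually
for sha in "$sha1" "$sha2"; do
  w cherry-pick -x "$sha" >/dev/null
done
w log --format=%s
```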
What's blocking us from doing this?
The tests are too coupled to the implementation and don't really allow such a refactoring. Recently, I've been working on https://github.com/korthout/backport-action-test as a way to test only the behavior. I don't want to make the old tests work with this new way. Instead, I'd prefer to replace them with more cases in backport-action-test. Sadly these tests are still a bit brittle and slow. I hope to improve that soon.
But for now, there aren't really many test cases in place that support these changes. So we'd have to test these changes more or less in production. That may be okay, but:
Is there anything I can try in the meantime?
I think we can try checking out the repo differently and potentially see performance similar to v0.0.5.
If we clone with fetch-depth 0 (entire history), I think that the following fetches should complete pretty much instantly.
- uses: actions/checkout@v3
  with:
    fetch-depth: 0
@winterqt Could you please give this a try?
Firstly, thank you for all the research and work you've put into this, I greatly appreciate it.
Secondly, sure -- I assume you want me to bump the action to v0.0.8 and try keeping our fetch-depth at 0, right?
> I assume you want me to bump the action to v0.0.8 and try keeping our fetch-depth at 0, right?

👍 Yes, that's correct. Curious about your results 🙇

> I greatly appreciate it.

❤️ Thanks, that's great to hear.
It's taken me some time, but I'm happy to report that I was able to run a lot of successful tests with a new version of the backport-action that implements the approach I described above. 🎉
That means:
- a true shallow clone (1 commit deep) using:

  - uses: actions/checkout@v3
    with:
      fetch-depth: 1

- the action fetches only the necessary commits containing the changes (using git fetch --depth=n+1)
- and fetches only a single commit for each target (using git fetch --depth=1)
I was able to test this:
- as a PR created locally in the repo, backported on pull_request[closed]
- as a PR created in a fork of the repo, backported on pull_request_target[closed]
@winterqt I'd love to receive some feedback on this from you. Does it really speed up the checkout and the backporting in a large project like Nixpkgs?
Please try it out by switching the checkout action's fetch-depth to 1 and using a special version of the backport action:

- uses: actions/checkout@v3
  with:
    fetch-depth: 1
- name: Backport
  uses: zeebe-io/backport-action@korthout-fetch-depth
If these changes solve the problem as expected, this will form the basis of v1.0 of this action.
@winterqt My team has switched to v1-rc1 to further battle-test this version before releasing v1 (see #289). We've already seen a dramatic performance boost on our repo with just 17k commits: backporting a pull request to 2 branches went from ~1m on v0.0.9 to just 11s on v1-rc1 in the camunda/zeebe repo 🌱
Of course, I'm curious about the effect this release will have on Nixpkgs (430k commits), but I don't recommend switching the repository to this release candidate yet, as it hasn't been fully battle-tested. I just wanted to keep you in the loop.