rapidsai / pre-commit-hooks
License: Apache License 2.0
Currently all of the non-dependency components of our package files are managed manually. This can lead to problems when some fields need to be specified in certain ways. It should be fairly trivial to implement a pre-commit hook that checks things like the license fields to ensure that they are appropriate. We could add as much complexity as we really need to handle various cases, although I'd probably err on the side of starting with linting only simple cases, like common license typos, and adding more complex checks later if necessary.
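As a rough illustration of the "simple cases" idea, here is a minimal sketch of a license-field check. It assumes the expected value is the SPDX identifier `Apache-2.0`; the function name `check_license_field` and the list of near-miss spellings are hypothetical and not part of any existing hook.

```python
from typing import Optional

# Assumption: packages are expected to declare exactly this SPDX identifier.
EXPECTED_LICENSE = "Apache-2.0"

# Hypothetical list of common near-miss spellings (lowercase, single-spaced).
KNOWN_VARIANTS = {
    "apache-2.0",
    "apache 2.0",
    "apache2.0",
    "apache-2",
    "apache 2",
    "apache license 2.0",
}


def check_license_field(value: str) -> Optional[str]:
    """Return an error message for a bad license value, or None if it is fine."""
    if value == EXPECTED_LICENSE:
        return None
    # Normalize case and whitespace so that typos map onto the variants table.
    normalized = " ".join(value.lower().split())
    if normalized in KNOWN_VARIANTS:
        return (
            f"license {value!r} looks like a variant of "
            f"{EXPECTED_LICENSE!r}; use the exact identifier"
        )
    return f"license {value!r} is not the expected {EXPECTED_LICENSE!r}"
```

A real hook would still need to extract the license value from each package file format before calling a check like this; that parsing step is deliberately left out here.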
Is your feature request related to a problem? Please describe.
I recently set up a GitHub Action in the Thrust repository to validate links in Markdown docs to ensure links don't get stale or broken.
It would be nice to do the same thing for libcudf (and other RAPIDS libraries).
The first hook that we would like to support is for checking copyright headers to ensure that they are up to date. The copyright script that we would like to adapt is already present in a few repos such as cudf. The purpose of this issue is to capture discussions that we have had recently regarding issues with this script, discuss potential workarounds, and determine a path forward.
The current script tends to work well for our style checks in CI, but it occasionally runs into issues on local runs where it incorrectly identifies modified files, resulting in updated copyrights for unmodified files. Here are some of the solutions suggested so far:
--all-files
result in errors on foo.hpp/merge
GH command to account for this discrepancy by performing a squash and copyright update prior to doing the squash merge on GH, but we rejected this solution as requiring too much engineering effort.
Given the above considerations, our current best path forward may be simply using the existing script and trying to identify its weak points. @ajschmidt8 and I considered the possibility of including further git logic into the copyright script to verify whether the relevant git target branches are sufficiently up-to-date. If we went this route, my inclination would be to throw errors rather than do any implicit git actions in the hook, but we may at least be able to provide more robust error modes in this way and avoid incorrect modifications.
When the verify-copyright hook is not provided a target branch via command-line flags or environment variables, and when it finds a branch-{YY}.{MM} one locally, it currently applies the following procedure to find the commit to compare to:
branch-{YY}.{MM} is tracking
It should instead do this:
branch-{YY}.{MM} is tracking
pre-commit disagreeing as a result of your fork being out of date with the upstream repo
Modify the implementation of get_target_branch_upstream_commit() to accomplish this:
The verify-copyright hook only modifies copyright headers for files that it considers to have "changed".
"Changed" in that hook's definition means "the file content is different between HEAD wherever the hook is being run and the latest commit on the target branch".
"target branch" can be passed to the hook via environment variables or command-line flags, as described here:
When none of those methods are used, the hook tries to determine the target branch and its latest commit by inspecting the git repo it's being run from and looking for branches that follow RAPIDS conventions, like branch-{YY}.{MM}.
This logic says "take the latest branch-{YY}.{MM} locally, and get the commit ID of the latest commit on the upstream branch it's tracking".
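The "take the latest branch-{YY}.{MM} locally" step can be sketched as a small pure function over branch names. This is only an illustration under stated assumptions: `latest_rapids_branch` is a hypothetical helper, not the code in copyright.py, and it assumes "latest" means the highest (YY, MM) pair compared numerically.

```python
import re
from typing import Iterable, Optional

# Matches RAPIDS-style release branches such as "branch-24.06".
BRANCH_RE = re.compile(r"^branch-(\d{2})\.(\d{2})$")


def latest_rapids_branch(names: Iterable[str]) -> Optional[str]:
    """Return the branch-{YY}.{MM} name with the highest (YY, MM), or None."""
    best_name = None
    best_key = None
    for name in names:
        match = BRANCH_RE.match(name)
        if match is None:
            continue  # skip branches that do not follow the convention
        key = (int(match.group(1)), int(match.group(2)))
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name
```

Comparing (YY, MM) as integers avoids depending on the order in which git happens to list branches. Resolving the commit on the upstream branch that the selected branch tracks is a separate step and is not shown here.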
pre-commit-hooks/src/rapids_pre_commit_hooks/copyright.py
Lines 191 to 194 in b467a5e
Created based on discussions about this: rapidsai/rmm#1564 (comment)
In rapidsai/cudf#14917 we pinned this repo to a git SHA. I'd like to adopt some kind of version/tag strategy for this repo, so we don't have to pin to hashes.