biglocalnews / warn-github-flow
GitHub Action workflow for automating a WARN Act notice ETL pipeline
Home Page: https://biglocalnews.org/content/tools/layoff-watch.html
License: Apache License 2.0
Alert
The v0 series of google-github-actions/auth is no longer maintained. It will not receive updates, improvements, or security patches. Please upgrade to the latest supported versions:
https://github.com/google-github-actions/auth
Not sure it matters, but figured I'd mention it here.
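Scheduled ETL runs should not be set to start at the top of the hour. GitHub notes that scheduled workflows can be delayed during periods of high load, which include the start of every hour: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule
This might also be a good time to check whether the runs are scheduled at the right times and frequency.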
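A quick way to audit this is to check the minute field of each `cron:` string in the workflows. A minimal sketch, assuming the standard five-field syntax GitHub Actions uses (minute first) and handling only `*`, comma lists, ranges and `/` steps; the function names are hypothetical, and it does not parse the workflow YAML itself.

```python
def expand_minute_field(field: str) -> set[int]:
    """Expand a cron minute field (e.g. "0", "*/15", "5,35", "10-50/20") into minutes 0-59."""
    minutes: set[int] = set()
    for part in field.split(","):
        has_step = "/" in part
        step = 1
        if has_step:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = 0, 59
        elif "-" in part:
            low, high = part.split("-", 1)
            start, end = int(low), int(high)
        else:
            start = int(part)
            # "5/15" is read here as "5-59/15"; an assumption about the cron dialect
            end = 59 if has_step else start
        minutes.update(range(start, end + 1, step))
    return minutes


def runs_at_top_of_hour(cron: str) -> bool:
    """True if a five-field cron expression (minute first) fires at minute 0 of any hour."""
    return 0 in expand_minute_field(cron.split()[0])
```

For example, `runs_at_top_of_hour("0 */6 * * *")` returns `True`, while `runs_at_top_of_hour("17 */6 * * *")` returns `False`. A step such as `*/15` also counts, since it includes minute 0.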
When GitHub shifts to a new version of Node, we start getting deprecation warnings. When updated actions exist, the fixes are relatively simple but can be hard to track down: you need to check each action's GitHub project to find the current version, and whether there actually is a Node 24 (or whatever version) release yet. Then you need to update the version references in a bunch of places.
Action references to look for in each repo:
- actions/checkout
- actions/setup-python
- actions/download-artifact
- actions/upload-artifact
- biglocalnews/upload-files
- slackapi/slack-github-action
- stefanzweifel/git-auto-commit-action
Patches needed in both the main folder and .github:
- biglocalnews/upload-files
Releasing a new version of biglocalnews/upload-files can be a little weird: I've been bumping the major version number (e.g., v3.0.0 to v4.0.0) and also releasing under a new major version tag (e.g., "v4"). I'm not sure that's the most direct way of doing it, but it seems to be working.
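Keeping a moving major tag alongside the full version tag is a common convention for GitHub Actions: a workflow that pins `biglocalnews/upload-files@v4` follows whatever commit the `v4` tag points at. A minimal sketch that only builds the git commands for one release, assuming tags look like `v4.0.0` and the remote is named `origin`; the helper name is hypothetical and nothing is executed.

```python
def release_commands(version: str, remote: str = "origin") -> list[str]:
    """Build git commands that tag the current commit and move its major tag.

    version is a full tag such as "v4.0.0"; the major tag ("v4") is pointed at
    the same commit so workflows pinned to "@v4" pick up the new release.
    """
    if not version.startswith("v") or version.count(".") != 2:
        raise ValueError(f"expected a tag like 'v4.0.0', got {version!r}")
    major = version.split(".")[0]  # "v4.0.0" -> "v4"
    return [
        f"git tag {version}",
        f"git tag -f {major}",            # -f moves the major tag if it already exists
        f"git push {remote} {version}",
        f"git push -f {remote} {major}",  # force is needed when the remote tag already exists
    ]
```

Run from the commit being released, `release_commands("v4.0.0")` yields the four commands in order; for a minor release such as `v4.1.0`, the same steps move `v4` forward.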
Patches needed inside the .github folder of these repos:
- biglocalnews/bln-python-client
- biglocalnews/warn-transformer
- biglocalnews/warn-scraper
And in a bunch of different spots in warn-github-flow; be sure to look in .github/actions for biglocalnews/upload-files.
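Because the same references live in several repos and folders, a small script can list them in one pass. A minimal sketch, assuming workflows and composite actions are plain YAML under `.github/` (plus a root-level `action.yml`, as in biglocalnews/upload-files) and that each reference is written on one line as `uses: owner/repo@ref`; it only reports the pinned versions and does not look up newer releases or edit anything.

```python
import re
from pathlib import Path

# The actions listed above; extend as needed.
WATCHED_ACTIONS = {
    "actions/checkout",
    "actions/setup-python",
    "actions/download-artifact",
    "actions/upload-artifact",
    "biglocalnews/upload-files",
    "slackapi/slack-github-action",
    "stefanzweifel/git-auto-commit-action",
}

# Matches "- uses: owner/repo@ref" or "uses: 'owner/repo/subdir@ref'"; local "./" actions never match
USES_RE = re.compile(r"""^\s*(?:-\s*)?uses:\s*["']?([\w.-]+/[\w.-]+)(?:/[^@\s"']*)?@([^\s"'#]+)""")


def find_action_refs(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, action, ref) for each watched action in workflow text."""
    refs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match and match.group(1) in WATCHED_ACTIONS:
            refs.append((lineno, match.group(1), match.group(2)))
    return refs


def scan_repo(root: str = ".") -> None:
    """Print every watched reference in .github/**/*.yml|yaml and a root action.yml|yaml."""
    base = Path(root)
    paths = sorted((base / ".github").rglob("*.y*ml")) + sorted(base.glob("action.y*ml"))
    for path in paths:
        for lineno, action, ref in find_action_refs(path.read_text()):
            print(f"{path}:{lineno}: {action}@{ref}")
```

Running `scan_repo()` from the root of each checkout prints lines like `.github/workflows/etl.yml:12: actions/checkout@v3`, which can then be compared against each action's latest release.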
Python 3.9 has less than two years of its five-year life remaining, with security fixes ending in September 2025. Testing with 3.12 would add another three years.
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
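With warnings spread across many runs, it can help to pull the action names out of the log text mechanically. A minimal sketch that parses one warning in the Node.js 16 wording quoted above; the older node12 message uses different wording and would need its own pattern, and the function name is hypothetical.

```python
import re

# Captures the target Node version and the comma-separated action list from one warning
WARNING_RE = re.compile(
    r"update the following actions to use Node\.js (\d+): (.+?)\. For more information"
)


def parse_node_warning(message: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (target Node version, [(action, ref), ...]) from one deprecation warning."""
    match = WARNING_RE.search(message)
    if match is None:
        raise ValueError("not a Node.js deprecation warning in the expected wording")
    node_version, action_list = match.groups()
    actions = []
    for item in action_list.split(", "):
        name, _, ref = item.partition("@")
        actions.append((name, ref))
    return node_version, actions
```

Feeding it the warning above gives `("20", [("actions/checkout", "v3"), ("actions/setup-python", "v4")])`.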
GitHub Actions may have run out of runners the other day: some of the scraping jobs never ran, and the job failed after 50 minutes. Then there was a 47-minute job in which every component took far, far less time than that.
Worth keeping an eye on, and perhaps moving the schedule to a less crowded time if needed.
warn-bot updates are handled near the end of .github/workflows/etl.yml and extract.yml:
```yaml
- name: Install private dependencies
  run: pip install warn-bot==0.1.21 -vv
```
warn-bot is failing on kqedsf because it can't find a 'county' key. I disabled the bot with warn-bot==0.1.13; it will need to be fixed and turned back on.
First time I noticed this one:
https://github.com/biglocalnews/warn-github-flow/actions/runs/5639374606/job/15274506568
This may be hitting some API rate limits (HTTP 429 means "Too Many Requests"). I seem to recall the warn-github-flow repo is very, very large; I don't know if it makes sense to try to prune some of it.
```
Fetching the repository
/usr/bin/git -c protocol.version=2 fetch --prune --progress --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
Error: fatal: unable to access 'https://github.com/biglocalnews/warn-github-flow/': The requested URL returned error: 429
The process '/usr/bin/git' failed with exit code 128
Waiting 16 seconds before trying again
/usr/bin/git -c protocol.version=2 fetch --prune --progress --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
Error: fatal: unable to access 'https://github.com/biglocalnews/warn-github-flow/': The requested URL returned error: 429
The process '/usr/bin/git' failed with exit code 128
Waiting 16 seconds before trying again
/usr/bin/git -c protocol.version=2 fetch --prune --progress --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
Error: fatal: unable to access 'https://github.com/biglocalnews/warn-github-flow/': The requested URL returned error: 429
Error: The process '/usr/bin/git' failed with exit code 128
```
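actions/checkout already retries the fetch, but here the three attempts were only about 16 seconds apart and all hit the same 429 response. If a fetch or push is ever moved into a script step, backing off exponentially gives the rate limit more time to reset. A minimal sketch; the helper name, attempt count and delays are assumptions, not settings taken from actions/checkout.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_backoff(
    cmd: Sequence[str],
    attempts: int = 5,
    base_delay: float = 15.0,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(list(c)).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits 0, doubling the wait after each failure.

    Returns the last exit code. `run` and `sleep` can be swapped out for testing.
    """
    delay = base_delay
    code = 1
    for attempt in range(1, attempts + 1):
        code = run(cmd)
        if code == 0:
            return 0
        if attempt < attempts:
            print(f"attempt {attempt} failed with exit code {code}; waiting {delay:.0f}s")
            sleep(delay)
            delay *= 2  # 15, 30, 60, ... instead of retrying at a fixed short interval
    return code
```

For example, `run_with_backoff(["git", "fetch", "origin"])` waits 15, 30, 60 and 120 seconds between its five attempts before giving up.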
I've found several Node 12 dependency problems that keep producing warnings in the runner logs, but I haven't found them all. There seems to be at least one more as a dependency within warn-transformer.
This also needs to be added to the docs for #24.
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: slackapi/slack-github-action. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: stefanzweifel/git-auto-commit-action@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
Scrapers for newly added states need to be manually added to the etl.yml and extract.yml files in .github/workflows.
Actions are running slower, and some routine steps occasionally fail, e.g.:
```
Fetching the repository
/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
Error: error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
Error: error: 2831 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
Error: fatal: early EOF
Error: fatal: fetch-pack: invalid index-pack output
The process '/usr/bin/git' failed with exit code 128
Waiting 13 seconds before trying again
```
`git-sizer` reports that the repo has gotten quite large again:
```
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Blobs                      |           |                                |
|   * Total size               |  32.9 GiB | ***                            |
| Biggest objects              |           |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [1] |   2.15 k  | **                             |
| * Blobs                      |           |                                |
|   * Maximum size         [2] |  13.8 MiB | *                              |
```
The notes on shrinking the repo in #7 were never actually turned into useful documentation.
The following actions uses node12 which is deprecated and will be forced to run on node16: actions/setup-python@v2. For more info: https://github.blog/changelog/2023-06-13-github-actions-all-actions-will-run-on-node16-instead-of-node12-by-default/
I ditched the history of three branches (il, wa and transformer) using something like this:
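```shell
# Clone only the wa branch into its own directory
git clone -b wa --single-branch git@github.com:biglocalnews/warn-github-flow.git flow-wa
cd flow-wa
git-sizer  # size before
# Start a history-free branch from the current snapshot and commit it
git checkout --orphan thin-wa
git add data
git commit -m "Thin WA"
# Replace the old wa branch with the thin one and force-push it
git branch -D wa
git branch -m wa
git push -u -f origin wa
git-sizer  # size after
cd ..
```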
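The same steps were repeated for il, wa and transformer. A minimal sketch that prints the command sequence for each branch so the script can be reviewed before anything is force-pushed; it mirrors the commands above and runs nothing. The helper name is hypothetical. Note that `git checkout --orphan` keeps the current files staged, so the new root commit contains everything that was tracked on the branch, not only `data`.

```python
REPO = "git@github.com:biglocalnews/warn-github-flow.git"


def thin_branch_commands(branch: str, repo: str = REPO) -> list[str]:
    """Return the history-thinning commands above for one branch (nothing is executed)."""
    workdir = f"flow-{branch}"
    temp_branch = f"thin-{branch}"
    return [
        f"git clone -b {branch} --single-branch {repo} {workdir}",
        f"cd {workdir}",
        "git-sizer",                                # size before
        f"git checkout --orphan {temp_branch}",    # new branch with no parent commits
        "git add data",
        f'git commit -m "Thin {branch.upper()}"',  # the only commit on the new branch
        f"git branch -D {branch}",                  # drop the old branch and its history
        f"git branch -m {branch}",                  # rename the thin branch into its place
        f"git push -u -f origin {branch}",          # force-push rewrites the remote branch
        "git-sizer",                                # size after
        "cd ..",
    ]


if __name__ == "__main__":
    for name in ("il", "wa", "transformer"):
        print("\n".join(thin_branch_commands(name)))
        print()
```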
This shrank the repo considerably (yay!).
But now the GitHub Actions break when they try to run.
I'm getting errors like this:
```
Download action repository 'stefanzweifel/git-auto-commit-action@master' (SHA:47a8ad5f38721f4b62f84ddd01aba6b281956891)
Download action repository 'biglocalnews/upload-files@v2' (SHA:178935a04a6a7f0856d52260cc227ce2fd96f3df)
Getting action download info
Download action repository 'actions/setup-python@v2' (SHA:e9aba2c848f5ebd159c070c61ea2c4e2b122355e)
Run ./.github/actions/extract
Run git config --global user.email "[email protected]"
Switched to a new branch 'il'
branch 'il' set up to track 'origin/il'.
From https://github.com/biglocalnews/warn-github-flow
 * branch            main       -> FETCH_HEAD
fatal: refusing to merge unrelated histories
Error: Process completed with exit code 128.
```
That seems to be coming from .github/actions/extract/action.yml, at least for the scrapers; I imagine it's a similar problem for the transformer branch.
I'm not sure what the way forward is here, partly because I'm not sure why it's erroring out in the first place.
This line in the configuration looks promising: `git config pull.rebase false`
I could also try forcing Actions to do a merge (with `--allow-unrelated-histories`) before the push.
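A likely explanation, offered as an assumption rather than a confirmed diagnosis: `git pull origin main` is a fetch plus a merge, and git refuses by default to merge two tips that share no common ancestor. The old `il` branch presumably shared history with `main`, but the orphaned branch starts from a brand-new root commit, so there is no merge base. `git config pull.rebase false` only tells `git pull` to merge rather than rebase, so on its own it should not change that; `--allow-unrelated-histories` is the option that overrides the check. The toy model below, with made-up commit names, shows the condition git is checking.

```python
# Toy commit graph (hypothetical names): each commit maps to its parent commits.
# main:            m1 <- m2 <- m3
# old il branch:   m1 <- w1 <- w2   (branched from main, so it shares m1)
# orphaned il:     t1               (a new root with no parents)
PARENTS = {"m2": ["m1"], "m3": ["m2"], "w1": ["m1"], "w2": ["w1"], "t1": []}


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Return the commit plus every commit reachable through its parents."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def have_common_history(a: str, b: str, parents: dict[str, list[str]]) -> bool:
    """A merge base exists only if the two tips share at least one ancestor."""
    return bool(ancestors(a, parents) & ancestors(b, parents))
```

Here `have_common_history("w2", "m3", PARENTS)` is `True`, while `have_common_history("t1", "m3", PARENTS)` is `False`, which is the situation in which git reports "refusing to merge unrelated histories".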
I can try to run some of this manually.
I do have a pre-change archive of the repo, but getting git to recognize all the remote branch data has been ... not ... great.
Ideas?
I don't know what's missing here. Is it as basic as needing a branch?
Any documentation of a fix needs to go into #10.
```
Prepare all required actions
Getting action download info
Download action repository 'stefanzweifel/git-auto-commit-action@master' (SHA:3d1b5e078a85df99db0cb2441cd4309b09d86253)
Download action repository 'biglocalnews/upload-files@v3' (SHA:b8becd325dd4535bcd60895d3d01ca37b6dc3bce)
Getting action download info
Run ./.github/actions/extract
Run git config --global user.email "[email protected]"
  git config --global user.email "[email protected]"
  git config --global user.name "GitHub Actions"
  git config pull.rebase false
  git checkout hi
  git pull origin main
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.9.18/x64
    PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.9.18/x64/lib/pkgconfig
    Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.18/x64
    Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.18/x64
    Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.9.18/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.18/x64/lib
error: pathspec 'hi' did not match any file(s) known to git
Error: Process completed with exit code 1.
```
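The `pathspec 'hi' did not match` error suggests there is no `hi` branch, locally or on `origin`, for `git checkout hi` to switch to; if so, the answer is probably yes, the new state needs its branch created first. A minimal sketch that checks which state branches exist on the remote before the extract step runs, assuming branches are named after the state codes (as `il` and `hi` are in the logs above); the helper names are hypothetical.

```python
def remote_branches(ls_remote_output: str) -> set[str]:
    """Parse `git ls-remote --heads <remote>` output (<sha><TAB>refs/heads/<name> per line)."""
    branches: set[str] = set()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _sha, ref = line.split("\t", 1)
        branches.add(ref.strip().removeprefix("refs/heads/"))  # needs Python 3.9+
    return branches


def missing_branches(wanted: list[str], ls_remote_output: str) -> list[str]:
    """Return the wanted branch names that do not exist on the remote, in input order."""
    existing = remote_branches(ls_remote_output)
    return [name for name in wanted if name not in existing]
```

In a workflow step, the input would come from the output of `git ls-remote --heads origin`; any state code returned by `missing_branches` needs its branch created before the extract action tries to check it out.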