Comments (13)
We are now often seeing the job exceed the 6-hour job runtime limit. For example: https://github.com/adafruit/circuitpython-org/actions/runs/6809744073/job/18516615845.
There is some GitHub API rate limiting going on too (see in that job), but it would be really nice to make this run faster.
from adabot.
Thinking about your local test in #361, are we doing the queries without credentials? If we did them with credentials, would we avoid the rate limit?
from adabot.
I'm not sure about authorization. I filed #348 thinking that this process was using Classic PATs (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic) which are deprecated or discouraged; I forget the exact context that led to me filing that issue but it seems to be related to this note in the docs:
Note: Organization owners can restrict the access of personal access token (classic) to their organization. If you try to use a personal access token (classic) to access resources in an organization that has disabled personal access token (classic) access, your request will fail with a 403 response. Instead, you must use a GitHub App, OAuth app, or fine-grained personal access token.
Using PATs of some kind seems to be the Proper Way (TM) to do this, but "fine-grained" PATs are the preferred way now, if we don't want to deal with having a GitHub App: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#granting-additional-permissions
I am pretty sure SOME kind of authentication/token is at play here; otherwise we'd be subject to the unauthenticated limit, which is only 60 API calls an hour, far too few.
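One way to settle whether a token is really in effect is to ask the API itself. Below is a minimal sketch using GitHub's documented `GET /rate_limit` endpoint and only the standard library. The helper names (`auth_headers`, `fetch_rate_limit`, `describe_core_limit`) are made up for illustration and are not part of adabot, and the "limit above 60 means authenticated" check is a heuristic based on GitHub's documented unauthenticated limit of 60 requests per hour.

```python
import json
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def auth_headers(token=None):
    """Build request headers; the Authorization header is added only if a token is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def fetch_rate_limit(token=None):
    """Query the rate-limit endpoint and return the decoded JSON payload."""
    request = urllib.request.Request(RATE_LIMIT_URL, headers=auth_headers(token))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def describe_core_limit(payload):
    """Summarize the 'core' REST quota from a /rate_limit payload."""
    core = payload["resources"]["core"]
    # Assumption: anything above the documented unauthenticated limit (60/hour)
    # means the request was made with credentials.
    kind = "authenticated" if core["limit"] > 60 else "unauthenticated"
    return f"{kind}: {core['remaining']}/{core['limit']} calls left"
```

Calling `describe_core_limit(fetch_rate_limit(token))` at the start of a run, with the token read from the environment, would show in the job log which limit applies without revealing the secret.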
from adabot.
My local test was without authorization, but that was kinda-nice since it let me easily hit the timeout case. For the run in circuitpython-org that's troubling us, we appear to be using a token via ADABOT_GITHUB_USER and _ACCESS_TOKEN:
Thu, 09 Nov 2023 09:20:13 GMT ADABOT_GITHUB_USER: ***
Thu, 09 Nov 2023 09:20:13 GMT ADABOT_GITHUB_ACCESS_TOKEN: ***
Since the values of repository secrets can't be inspected, I don't know which user and token are in play.
from adabot.
I think these tokens are OK, because I changed them recently, carefully, and other Actions jobs depend on them.
from adabot.
A successful run looks like this:
Sat, 25 Nov 2023 09:19:00 GMT Run Date: 25 November 2023, 09:19AM
Sat, 25 Nov 2023 09:19:00 GMT - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Sat, 25 Nov 2023 10:02:45 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:02:45 GMT Rate Limit will reset at: 2023-11-25 10:19:13
Sat, 25 Nov 2023 10:19:13 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:20:59 GMT {
Sat, 25 Nov 2023 10:20:59 GMT "updated_at": "2023-11-25T09:19:00Z",
[rest of json snipped]
so we have:
- runs for ~40 minutes before hitting rate limit
- sleeps for ~20 minutes after printing "rate limit reached" message once
- encounters rate limit message again, but apparently doesn't sleep this time
- finishes about 2 minutes later
A failed run:
Fri, 24 Nov 2023 09:20:27 GMT - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Fri, 24 Nov 2023 10:06:30 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 10:20:39 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 15:20:09 GMT Error: The operation was canceled.
here,
- runs for about 45 minutes
- sleeps for about 15 minutes
- apparently goes to sleep again for a long time, or never hits rate limit again
- gets timed out 5 hours later
Besides adding debugging, another idea is to prepend the command with timeout so that we can hopefully get a Python traceback from where the process is stuck, something like: timeout -s INT 18000 [adabot command]. This would be a change in the circuitpython-org repo, not here, I think.
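As an in-process alternative to wrapping the command in timeout, Python's standard `faulthandler` module can dump tracebacks after a deadline. This is a different technique from sending SIGINT, sketched here only to show the idea. The helper names `arm_watchdog` and `disarm_watchdog` are hypothetical, and where such a call would go in adabot is not something this thread establishes.

```python
import faulthandler
import sys


def arm_watchdog(seconds, stream=sys.stderr, exit_after_dump=True):
    """If the process is still running after `seconds`, dump every thread's
    traceback to `stream`; with exit_after_dump=True the process then exits."""
    faulthandler.dump_traceback_later(seconds, file=stream, exit=exit_after_dump)


def disarm_watchdog():
    """Cancel a pending dump, e.g. once the work finished in time."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_watchdog(18000)` near the start of the run would mirror the 18000-second figure above: the traceback lands in the job log about an hour before the 6-hour limit cancels the job.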
from adabot.
Also, we could sleep +60 seconds after we think the rate limit will reset, not +1 second; since we hit the rate limit after 40-45 minutes and it resets after 1 hour, this doesn't really lower our throughput much.
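The padded sleep could be computed along these lines. This is a sketch, not adabot's actual code: the function name `seconds_until_reset` and its parameters are assumptions, and `reset_epoch` stands for the reset time as a Unix timestamp, which is how GitHub reports it in the `X-RateLimit-Reset` header.

```python
import time


def seconds_until_reset(reset_epoch, now=None, margin=60):
    """Return how long to sleep: the time left until the reset plus a safety margin.

    If the reset time has already passed, only the margin is slept, so a
    slightly stale reset value never produces a zero-length wait.
    """
    if now is None:
        now = time.time()
    return max(0.0, reset_epoch - now) + margin
```

With a reset 15 to 20 minutes away, the extra 59 seconds adds only a few percent to the wait.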
from adabot.
Can we insert delays between our requests? Would that throttle it enough to avoid the rate limit?
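For a sense of scale: GitHub documents 5,000 requests per hour for authenticated users, so spacing calls evenly means at least 3600 / 5000 = 0.72 seconds between requests. Here is a minimal sketch of such a throttle. The class name `Throttle` is hypothetical, and whether the 5,000/hour figure is the limit this job actually runs under is an assumption.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, calls_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / calls_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each API request would keep the job under the limit, but it spreads the same number of calls over the same hour, so it avoids the pause rather than shortening the total runtime.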
from adabot.
I think I found the real problem 🎉
from adabot.
Fixed by #362.
from adabot.
I saw another failed 6-hour run, despite circuitpython-org being updated to include #362. 😕
https://github.com/adafruit/circuitpython-org/actions/runs/7030445965
Reopening
@jepler
from adabot.
oh drat
from adabot.
I have thought about moving part of the CI into libraries themselves somehow. The basic idea is to have repos trigger when updated (or scheduled) to push a record of the information gathered either to a central repo (like a folder in this repository, for example) or possibly an S3 bucket. This has the advantage of spreading out or eliminating API calls, as well as distributing the burden over multiple CI runs across the libraries. This would allow this CI to just collect them all.
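The "collect them all" step could then be as small as merging one JSON record per library. This is purely a sketch of the idea: the folder layout (one `<library>.json` file per repo) and the function name `collect_records` are hypothetical, and the thread does not define a record format.

```python
import json
from pathlib import Path


def collect_records(folder):
    """Load every *.json file in `folder` into a dict keyed by file stem."""
    records = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.stem] = json.load(handle)
    return records
```

No GitHub API calls happen at collection time in this layout; each library's own CI run would have paid that cost when it wrote its record.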
from adabot.