dhalbert commented on July 20, 2024

We are now often seeing the job exceed the 6-hour job runtime limit. For example: https://github.com/adafruit/circuitpython-org/actions/runs/6809744073/job/18516615845.

There is some GitHub API rate limiting going on too (see the log for that job), but it would be really nice to make this run faster.

dhalbert commented on July 20, 2024

Thinking about your local test in #361, are we doing the queries without credentials? If we did them with credentials, would we avoid the rate limit?

jepler commented on July 20, 2024

I'm not sure about authorization. I filed #348 thinking that this process was using classic PATs (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic), which are discouraged; I forget the exact context that led me to file that issue, but it seems to be related to this note in the docs:

Note: Organization owners can restrict the access of personal access token (classic) to their organization. If you try to use a personal access token (classic) to access resources in an organization that has disabled personal access token (classic) access, your request will fail with a 403 response. Instead, you must use a GitHub App, OAuth app, or fine-grained personal access token.

Using PATs of some kind seems to be the Proper Way (TM) to do this, but "fine-grained" PATs are the preferred way now, if we don't want to deal with having a GitHub App: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#granting-additional-permissions

I am pretty sure SOME kind of authentication/token is in use here; otherwise we would be subject to the unauthenticated limit of 60 API requests an hour, far too few.

jepler commented on July 20, 2024

My local test was without authorization, but that was kind of nice since it let me easily hit the timeout case. For the run in circuitpython-org that's troubling us, we appear to be using a token via ADABOT_GITHUB_USER and ADABOT_GITHUB_ACCESS_TOKEN:

Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_USER: ***
Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_ACCESS_TOKEN: ***

Since the values of repository secrets can't be inspected, I don't know which user and token are in use.
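
One way to check which limit the job actually gets is GitHub's GET /rate_limit endpoint, which reports the caller's current limits and does not itself count against them. A minimal sketch, assuming the token is the one passed in ADABOT_GITHUB_ACCESS_TOKEN; the helper names are hypothetical, not part of adabot. A personal access token should report a core limit of 5000 requests per hour, versus 60 with no token.

```python
import json
import urllib.request
from typing import Optional, Tuple

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_rate_limit_request(token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for GitHub's /rate_limit endpoint (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Without an Authorization header GitHub applies the unauthenticated limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(RATE_LIMIT_URL, headers=headers)


def core_rate_limit(token: Optional[str] = None) -> Tuple[int, int, int]:
    """Return (limit, remaining, reset_epoch) for the core REST API bucket."""
    with urllib.request.urlopen(build_rate_limit_request(token)) as resp:
        data = json.load(resp)
    core = data["resources"]["core"]
    return core["limit"], core["remaining"], core["reset"]
```

Running core_rate_limit(os.environ.get("ADABOT_GITHUB_ACCESS_TOKEN")) inside the workflow would show whether the secret is actually being picked up.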

dhalbert commented on July 20, 2024

I think these tokens are OK, because I changed them recently, carefully, and other Actions jobs depend on them.

jepler commented on July 20, 2024

A successful run looks like this:

Sat, 25 Nov 2023 09:19:00 GMT Run Date: 25 November 2023, 09:19AM
Sat, 25 Nov 2023 09:19:00 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Sat, 25 Nov 2023 10:02:45 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:02:45 GMT Rate Limit will reset at: 2023-11-25 10:19:13
Sat, 25 Nov 2023 10:19:13 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:20:59 GMT {
Sat, 25 Nov 2023 10:20:59 GMT   "updated_at": "2023-11-25T09:19:00Z",
[rest of json snipped]

So we have:

  • runs for ~44 minutes before hitting the rate limit
  • sleeps for ~16 minutes after printing the "rate limit reached" message once
  • prints the rate limit message again, but apparently doesn't sleep this time
  • finishes about 2 minutes later

A failed run:

Fri, 24 Nov 2023 09:20:27 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Fri, 24 Nov 2023 10:06:30 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 10:20:39 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 15:20:09 GMT Error: The operation was canceled.

Here:

  • runs for about 45 minutes
  • sleeps for about 15 minutes
  • apparently either goes back to sleep for a long time or hangs without ever hitting the rate limit again
  • gets timed out 5 hours later

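Besides adding debugging output, another idea is to prefix the command with timeout so that we can hopefully get a Python traceback from wherever the process is stuck, something like timeout -s INT 18000 [adabot command]. That sends SIGINT after 18000 seconds (5 hours), before the 6-hour job limit kills the run, and Python turns SIGINT into a KeyboardInterrupt whose traceback shows where it was. This would be a change in the circuitpython-org repo, not here, I think.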
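
A complementary option inside adabot itself (a swapped-in technique, not something proposed in this thread) is the standard library's faulthandler module: faulthandler.dump_traceback_later() writes the stack of every thread to a file after a timeout, from a separate watchdog thread, so it does not depend on the stuck code returning. A minimal sketch; run_with_stack_dump is a hypothetical wrapper name.

```python
import faulthandler
import sys
from typing import Callable, TextIO, TypeVar

T = TypeVar("T")


def run_with_stack_dump(func: Callable[[], T], timeout_s: float,
                        out: TextIO = sys.stderr) -> T:
    """Call func(); if it is still running after timeout_s seconds, write the
    stack of every thread to `out` (which must have a real file descriptor)."""
    # Arm a one-shot dump; the process keeps running afterwards (exit=False).
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=out, exit=False)
    try:
        return func()
    finally:
        # Cancel the pending dump if func() finished in time.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping whatever function produces the report, e.g. with timeout_s=5 * 3600, would leave a stack dump in the job log without killing the run early.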

jepler commented on July 20, 2024

Also, we could sleep until 60 seconds after we think the rate limit will reset, rather than 1 second after. Since we hit the rate limit after 40-45 minutes and it resets after an hour, this barely lowers our throughput.
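
A minimal sketch of the +60 s idea, assuming the reset time comes from GitHub's X-RateLimit-Reset response header (seconds since the Unix epoch, UTC); seconds_to_sleep is a hypothetical helper, not adabot's actual code. Comparing against time.time(), which is also epoch-based, avoids any local-timezone conversion, and clamping at zero matters because time.sleep() raises ValueError on a negative duration.

```python
import time
from typing import Optional


def seconds_to_sleep(reset_epoch: float, now: Optional[float] = None,
                     margin: float = 60.0) -> float:
    """Seconds to wait until `margin` seconds after the rate-limit reset.

    reset_epoch is assumed to be the X-RateLimit-Reset header value.
    """
    if now is None:
        now = time.time()
    # If the reset time has already passed, only the safety margin remains.
    return max(0.0, reset_epoch - now) + margin
```

A caller would then do time.sleep(seconds_to_sleep(int(headers["X-RateLimit-Reset"]))), where headers is the response's header mapping.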

dhalbert commented on July 20, 2024

Can we insert delays between our requests? Would that throttle it enough to avoid the rate limit?
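
A minimal sketch of inserting delays, assuming the 5,000 requests-per-hour limit that GitHub applies to personal access tokens: a hypothetical Throttle class whose wait() is called before each API request and sleeps just long enough to keep requests at least 3600 / 5000 = 0.72 s apart. The clock and sleep functions are injectable so the pacing can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Keep successive calls at least 3600 / max_per_hour seconds apart."""

    def __init__(self, max_per_hour: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Throttling cannot make the run faster than the limit allows: a run needing more than 5,000 requests still takes over an hour. It only replaces the burst-then-sleep pattern seen in the logs with steady pacing, so the run no longer depends on the reset-time calculation being right.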

jepler commented on July 20, 2024

I think I found the real problem 🎉

dhalbert commented on July 20, 2024

Fixed by #362.

dhalbert commented on July 20, 2024

I saw another failed 6-hour run, despite circuitpython-org being updated to include #362. 😕
https://github.com/adafruit/circuitpython-org/actions/runs/7030445965
Reopening
@jepler

jepler commented on July 20, 2024

oh drat

tekktrik commented on July 20, 2024

I have thought about moving part of the CI into the libraries themselves somehow. The basic idea is to have each library repo, when updated (or on a schedule), push a record of the information it gathers either to a central repo (like a folder in this repository, for example) or possibly to an S3 bucket. This has the advantage of spreading out or eliminating API calls, as well as distributing the load across CI runs in the individual libraries. This CI would then only need to collect the records.
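
A minimal sketch of both sides of this idea, assuming a hypothetical layout in which each library's CI writes one JSON file named after its repository into a shared records/ directory (in a central repo, or synced down from S3). The upload step is left out, and write_record and collect_records are illustrative names, not existing adabot functions.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_record(out_dir: Path, repo_name: str, record: Dict[str, Any]) -> Path:
    """Per-library side: save this repo's gathered info as <repo_name>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{repo_name}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def collect_records(records_dir: Path) -> Dict[str, Dict[str, Any]]:
    """Central side: merge every per-library record, keyed by repo name."""
    combined: Dict[str, Dict[str, Any]] = {}
    for path in sorted(records_dir.glob("*.json")):
        combined[path.stem] = json.loads(path.read_text())
    return combined
```

The central job then makes no per-library API calls at all; its cost is reading files, and a library whose CI hasn't run recently simply contributes a stale record.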
