
Comments (4)

dbjorge commented on August 17, 2024

This definitely seems like a Puppeteer issue. Here's a self-contained repro: it contains both an Express server that reproduces the required test page redirect behavior and a Puppeteer/Playwright test script that demonstrates the issue (see the comment at the top for instructions).

The issue repros in both Puppeteer 18.1.0 (what the service currently uses) and 19.8.3 (latest as of writing). It does not repro in Playwright, but note that this is in part because Playwright's behavior for Page.goto's return value differs (both as documented and in practice) in this case. Playwright documents:

In case of multiple redirects, [Page.goto] will resolve with the first non-redirect response.

That is, in the same type of code as we see in the repro, Playwright (intentionally) resolves with the first 200 response. However, even if the repro is modified to prime the cache and then check the result of a direct Page.goto to the second page in line (which responds with a 302 directly), Playwright still doesn't repro the issue.

This is in comparison to Puppeteer, which documents:

In case of multiple redirects, [Page.goto] will resolve with the response of the last redirect.

This wording is a little ambiguous to me about whether the intended behavior is to return the 302 or the final 200, but whichever one is intended, it's probably a Puppeteer bug that you get a different answer depending on whether the page is cached.
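To make the ambiguity concrete, here is a minimal sketch that models a redirect chain as plain data and applies each reading of the two documentation quotes to it. Everything in it (the `ChainResponse` type, the helper names, the localhost URLs) is hypothetical and made up for illustration; it calls neither Puppeteer nor Playwright and says nothing about how either library is implemented.

```typescript
// Hypothetical model of the responses seen during one navigation.
// This is NOT a Puppeteer or Playwright type.
interface ChainResponse {
  url: string;
  status: number;
}

// HTTP status codes that redirect to another URL (304 is deliberately excluded).
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

const isRedirect = (r: ChainResponse): boolean => REDIRECT_STATUSES.has(r.status);

// Playwright's documented wording: "the first non-redirect response".
function firstNonRedirect(chain: ChainResponse[]): ChainResponse | undefined {
  return chain.find((r) => !isRedirect(r));
}

// Literal reading of Puppeteer's wording: the last response that is itself a redirect.
function lastRedirectResponse(chain: ChainResponse[]): ChainResponse | undefined {
  return [...chain].reverse().find(isRedirect);
}

// Alternative reading of Puppeteer's wording: the response the chain finally lands on.
function finalResponse(chain: ChainResponse[]): ChainResponse | undefined {
  return chain[chain.length - 1];
}

// Example chain with made-up URLs: two redirects followed by the final page.
const exampleChain: ChainResponse[] = [
  { url: "http://localhost:3000/start", status: 302 },
  { url: "http://localhost:3000/middle", status: 302 },
  { url: "http://localhost:3000/final", status: 200 },
];

console.log(firstNonRedirect(exampleChain)?.status);
console.log(lastRedirectResponse(exampleChain)?.status);
console.log(finalResponse(exampleChain)?.status);
```

Under the first and third helpers a caller sees the 200; under the second it sees a 302. The point made above is that the answer should not flip between those depending on cache state, whichever reading is intended.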

from accessibility-insights-service.

dbjorge commented on August 17, 2024

Filed an upstream issue with Puppeteer at puppeteer/puppeteer#9965. However, we'll likely need to consider a workaround even if they fix it, since Apify locks us into an older version of Puppeteer.


dbjorge commented on August 17, 2024

This turns out to actually be a bug further upstream in Chromium (https://crbug.com/1340398), but Puppeteer has implemented a workaround on their end and released it in 19.8.4. Playwright also already contains a similar workaround as of this commit, which I think maps to release 1.23.1, though there isn't a patch note for it.

So other options for workarounds on our end now include:

  • Update our override of Puppeteer to use 19.8.4 instead of the current 18.1.0 (both versions are well above what our current version of Apify claims to support).
  • Use Playwright instead of Puppeteer.
  • Update from Apify v2 to Crawlee (the renamed Apify v3), whose Puppeteer engine currently depends on "puppeteer": "<= 19.x" (which would allow 19.8.4 without a resolution).
  • Add a Puppeteer patch that implements the NetworkManager.ts update contained in this PR.
    • Note that this path would require an update to the ADO task to enable the use of patches in its runtime dependency installation step.
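Whichever option is picked, the deciding factor is whether the Puppeteer version that actually gets installed is at or above 19.8.4. A minimal sketch of such a check follows; the helper names are hypothetical, it handles only plain "major.minor.patch" strings (no pre-release tags or ranges), and how the installed version is obtained is left to the caller.

```typescript
// Hypothetical helper: parse a plain "major.minor.patch" string into numbers.
// Pre-release tags and version ranges are intentionally not supported.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (!match) {
    throw new Error(`Unsupported version string: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Returns a negative number if a < b, zero if equal, positive if a > b.
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseVersion(a);
  const [bMajor, bMinor, bPatch] = parseVersion(b);
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}

// First Puppeteer release containing the workaround, per the comment above.
const FIRST_PUPPETEER_WITH_WORKAROUND = "19.8.4";

// True if the given Puppeteer version should include the workaround.
function hasRedirectWorkaround(installedVersion: string): boolean {
  return compareVersions(installedVersion, FIRST_PUPPETEER_WITH_WORKAROUND) >= 0;
}

console.log(hasRedirectWorkaround("18.1.0")); // the version the service currently uses
console.log(hasRedirectWorkaround("19.8.4")); // the release with the workaround
```

A check like this does not cover the patch option, where the installed version stays below 19.8.4 but carries the fix.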


dbjorge commented on August 17, 2024

Verified that bumping the override of Puppeteer to 19.8.4 results in the repro case scanning correctly using the CLI.

