Comments (6)
One thing to try is to update the autofetch script to cause an additional wait as soon as any autofetching is started, similar to videos. Need to test whether that would be sufficient.
I don't know if networkidle0 is always the best option; I think I've seen some pages work better when waiting for the load event.
from browsertrix-crawler.
It turns out it's possible to wait for both the networkidle0 and load events simply by passing an array to waitUntil.
The command-line option defaults to load,networkidle0 and I think that should do it for this use case.
Tested with the example linked and the srcsets seem to be getting captured.
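As a rough illustration of the array form, here is a minimal TypeScript sketch. The helper parseWaitUntil is hypothetical and is not taken from browsertrix-crawler; it only shows how a comma-separated value such as load,networkidle0 could be split into the array that Puppeteer's page.goto() accepts for waitUntil. The list of event names is the set Puppeteer documents for that option.

```typescript
// Lifecycle events that Puppeteer's page.goto() accepts for `waitUntil`.
const LIFECYCLE_EVENTS = ["load", "domcontentloaded", "networkidle0", "networkidle2"] as const;
type LifecycleEvent = (typeof LIFECYCLE_EVENTS)[number];

// Hypothetical helper (an assumption, not the crawler's actual code):
// turn a comma-separated option value into an array for `waitUntil`.
function parseWaitUntil(value: string): LifecycleEvent[] {
  const events = value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  for (const e of events) {
    // Reject anything that is not a known lifecycle event name.
    if ((LIFECYCLE_EVENTS as readonly string[]).indexOf(e) === -1) {
      throw new Error("Unknown waitUntil event: " + e);
    }
  }
  return events as LifecycleEvent[];
}

const waitUntil = parseWaitUntil("load,networkidle0");
console.log(waitUntil);

// With Puppeteer, the array would then be passed along these lines:
//   await page.goto(url, { waitUntil });
```

When an array is given, navigation is considered finished only after all of the listed events have fired, which is why combining load with networkidle0 covers both cases discussed here.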
from browsertrix-crawler.
Thanks for testing this in the browser! That does seem very strange. Perhaps there is a timing issue and the crawler moves to the next page too quickly, before they can be fetched. Nothing currently waits to ensure the autofetcher is fully finished.
Could you try running with --waitUntil networkidle0 to see if that makes a difference? If not, we may need additional waiting options.
from browsertrix-crawler.
Ah, that's it! Worked with networkidle0.
from browsertrix-crawler.
What should be done here? Should this working configuration be the default one?
from browsertrix-crawler.
@ikreymer Thank you. Would it be possible to make this available in a release soon?
from browsertrix-crawler.