Comments (4)
It's hard to say exactly, as it's not easy to repro, but I suspect it's an out-of-memory issue (probably RAM). At least that's what searching for 'error waiting for container: EOF' seems to suggest.
You could try running with fewer workers to see if it happens again? And then try docker inspect on the container to see if there is more info on why it was SIGTERMed.
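For anyone who wants to script that check, here is a rough sketch in Python. It assumes the `docker` CLI is on the PATH and that the container still exists (i.e. it was not started with `--rm`). The helper names `summarize_state` and `inspect_container` are made up for illustration; the `State` fields read here (`OOMKilled`, `ExitCode`, `Error`, `FinishedAt`) are the ones `docker inspect` normally reports.

```python
import json
import subprocess


def summarize_state(inspect_json: str) -> dict:
    """Pull the exit-related fields out of `docker inspect <container>` output."""
    containers = json.loads(inspect_json)  # docker inspect prints a JSON array
    state = containers[0].get("State", {})
    return {
        "oom_killed": state.get("OOMKilled"),    # True if the kernel OOM killer hit it
        "exit_code": state.get("ExitCode"),      # 137 = SIGKILL, 143 = SIGTERM
        "error": state.get("Error"),
        "finished_at": state.get("FinishedAt"),
    }


def inspect_container(name: str) -> dict:
    """Run `docker inspect` on a container name or ID (hypothetical helper)."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,  # raise if docker reports an error, e.g. no such container
    )
    return summarize_state(result.stdout)
```

Calling `inspect_container("<container name or id>")` after a crash should show whether `oom_killed` is set. If it is not, the exit code is still a hint about which signal ended the process.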
The next release will make it a bit easier to restart the crawl. With 0.5.0, the container will automatically try to save the crawl config and the current state of the crawl to the crawls directory so that it can be restarted. You could try running that version, currently on the save-state branch: https://github.com/webrecorder/browsertrix-crawler/tree/save-state
from browsertrix-crawler.
Thanks. I am running jobs using 8 workers. Is that considered to be a lot / too many, or is it within the acceptable range for browsertrix (and therefore, should not crash)?
I can report that I've seen a strange SIGTERM as well, on an Ubuntu snap-based Docker deployment. Of course, when I added extra logging and restarted the daemon, it didn't happen! So I guess one possibility is to try restarting the Docker daemon if it happens again? Will update if I find more info.
Haven't been able to repro since, but I suspect it was OOM-killed. We now have periodic state saving, which can help if this happens in the future. Closing, since the exact repro is uncertain and I'm not sure it's fully solvable if it is an OOM.
Comment if there is a specific repro.