Comments (3)
In case it helps, this could be done using an additional Processor placed near the end of the fetch chain. It could check the fetch status and/or the response itself, and reset the status to S_DEFERRED so that Heritrix re-enqueues the URI.
It should also be possible to override the re-fetch time so the crawler backs off a bit before retrying, but I'm not sure how that is done.
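To make the idea concrete, here is a rough, self-contained sketch of the status check and reset. It does not use the real Heritrix classes: `CrawlItem` and `DeferOnStatusProcessor` are hypothetical stand-ins for `CrawlURI` and a `Processor` subclass, the retry statuses and the `maxDeferrals` cap are my own assumptions, and the value of `S_DEFERRED` is what I believe `FetchStatusCodes` defines, so please check it against the source. The back-off part is left out, since I don't know how that is done.

```java
import java.util.Set;

/**
 * Sketch only. CrawlItem and DeferOnStatusProcessor are hypothetical
 * stand-ins, not Heritrix classes. A real implementation would extend
 * Heritrix's Processor and operate on a CrawlURI instead.
 */
final class DeferOnStatusSketch {

    /** Assumed to match FetchStatusCodes.S_DEFERRED in Heritrix 3; verify before use. */
    static final int S_DEFERRED = -50;

    /** Hypothetical stand-in for a crawl URI: holds a status and a deferral count. */
    static final class CrawlItem {
        final String uri;
        int fetchStatus;
        int deferrals = 0;

        CrawlItem(String uri, int fetchStatus) {
            this.uri = uri;
            this.fetchStatus = fetchStatus;
        }
    }

    /** Resets selected fetch statuses to S_DEFERRED, up to a fixed number of times. */
    static final class DeferOnStatusProcessor {
        private final Set<Integer> retryStatuses;
        private final int maxDeferrals;

        DeferOnStatusProcessor(Set<Integer> retryStatuses, int maxDeferrals) {
            this.retryStatuses = retryStatuses;
            this.maxDeferrals = maxDeferrals;
        }

        /** True if the status is one we retry and the cap has not been reached. */
        boolean shouldProcess(CrawlItem item) {
            return retryStatuses.contains(item.fetchStatus)
                    && item.deferrals < maxDeferrals;
        }

        /** Marks the item as deferred so the frontier would re-enqueue it. */
        void process(CrawlItem item) {
            if (!shouldProcess(item)) {
                return; // leave the status untouched
            }
            item.deferrals++;
            item.fetchStatus = S_DEFERRED;
        }
    }

    public static void main(String[] args) {
        DeferOnStatusProcessor processor =
                new DeferOnStatusProcessor(Set.of(429, 503), 2);

        CrawlItem busy = new CrawlItem("http://example.com/busy", 503);
        CrawlItem fine = new CrawlItem("http://example.com/ok", 200);

        processor.process(busy);
        processor.process(fine);

        System.out.println(busy.uri + " -> " + busy.fetchStatus);
        System.out.println(fine.uri + " -> " + fine.fetchStatus);
    }
}
```

The cap is there because, without some limit, a URI that always returns a retryable status would be deferred forever. In a real Processor the counter would have to live somewhere that survives re-enqueueing, which this sketch does not address.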
from heritrix3.
I'm interested in exploring this. I would like to work on this as part of the ongoing Hacktoberfest.
Would you be able to add the "hacktoberfest" label to it?
https://hacktoberfest.com/
Seems like a reasonable request, so I've tagged this issue. But be aware this project is not very well resourced so I can't guarantee how quickly we'll review things.