Comments (5)
Glad someone else is coming across this problem. I was able to fix this problem on my local machine by using proper model files provided by our data scientist Christine. Unfortunately, this did not work in the case of our deployed servers, which are still suffering from this problem, despite using the same model files that I used on my local machine.
I'll talk with Christine today to see where our ACHE debugging stands, and I may be able to tackle this problem afresh by creating new model files on our deployment server.
I'll let you know how it goes, since we are in the same boat :).
from ache.
I'm also trying to reproduce this error and understand what is going on. I'll let you know if I find something.
from ache.
So we managed to fix this on our deployment box by redeploying it on a new machine. We never managed to figure out why it was failing, and were never able to reproduce the bug. Wish I could help you guys out.
from ache.
What version are you running? Is it the version from the master branch of this repository?
Here's what I've found so far about this problem. The ACHE crawler has three main services: TargetStorage, LinkStorage and CrawlerManager. This exception happens when the TargetStorage tries to communicate with the LinkStorage, but the LinkStorage is not running.
When running the crawler using the command ache startCrawl, all services are started at the same time in different threads of the same process. Sometimes the TargetStorage tries to connect to LinkStorage, but the LinkStorage has not finished initializing yet, so this exception happens. As the TargetStorage tries to reconnect automatically, everything works fine once the LinkStorage is ready. In this case, this is not a real problem, but the logs are misleading.
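To illustrate the startup race described above, here is a minimal sketch of a client that keeps retrying a TCP connection until the server side is ready. This is not ACHE's actual code: the class name RetryConnect, the method connectWithRetry, its parameters and the log wording are all hypothetical, and it uses only plain java.net sockets.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical illustration only; not taken from the ACHE code base.
public class RetryConnect {

    /**
     * Tries to open a TCP connection, waiting delayMillis between attempts.
     * Returns the connected socket, or rethrows the last IOException once
     * maxAttempts attempts have failed.
     */
    public static Socket connectWithRetry(String host, int port,
                                          int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                // 1-second connect timeout per attempt (an arbitrary choice).
                socket.connect(new InetSocketAddress(host, port), 1000);
                return socket;
            } catch (IOException e) {
                socket.close();
                last = e;
                // Report a failed attempt as "not ready yet" rather than as a
                // fatal error, so the log does not look like a crash.
                System.err.printf("Server %s:%d not ready (attempt %d of %d): %s%n",
                        host, port, attempt, maxAttempts, e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

With a pattern like this, a failed first attempt while the other service is still initializing only produces a "not ready" message, and the exception surfaces only if the server never comes up within the allowed attempts.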
When each service is started in a different process, the same problem will happen if the LinkStorage dies. You need to restart the LinkStorage process in this case.
I've been working on improving the logs and detecting any problems that may cause the LinkStorage to stop working. These changes will be pushed to this repository soon.
from ache.
Already fixed in commit 0c066bc
from ache.