Comments (13)
Awesome, thanks for providing a test case!
Tonight I'll investigate why this is happening. :)
from simplecrawler.
I traced it to node v0.9.12, which is the first version to behave like this.
Running node v0.9.11
6 May 15:34:42 - Are test an instance of Crawler: false
6 May 15:34:42 - Are test an instance of Crawler: true
6 May 15:34:42 - Running: true
6 May 15:34:42 - Stopping crawler
6 May 15:34:42 - Running: false
6 May 15:34:42 - Listners:
[ [Function: datahandler] ]
6 May 15:34:42 - Removing listeners
6 May 15:34:42 - Listners:
[]
6 May 15:34:44 - http://deewr.gov.au/
6 May 15:34:46 - http://deewr.gov.au/rss/deewr-videos.xml
Running node v0.9.12
6 May 15:45:20 - Are test an instance of Crawler: false
6 May 15:45:21 - Are test an instance of Crawler: true
6 May 15:45:21 - Running: true
6 May 15:45:21 - Stopping crawler
6 May 15:45:21 - Running: false
6 May 15:45:21 - Listners:
[ [Function: datahandler] ]
6 May 15:45:21 - Removing listeners
6 May 15:45:21 - Listners:
[]
6 May 15:45:23 - http://deewr.gov.au/
6 May 15:45:23 - http://deewr.gov.au/
6 May 15:45:25 - http://deewr.gov.au/rss/deewr-videos.xml
6 May 15:45:25 - http://deewr.gov.au/rss/deewr-videos.xml
It seems there are many changes to events between those releases: nodejs/node-v0.x-archive@v0.9.11-release...v0.9.12-release.
For example: nodejs/node-v0.x-archive@8ab346c
from simplecrawler.
After actually sitting down and looking at this, it really seems like a node problem, or perhaps something to do with how I'm attaching the crawler prototype to EventEmitter. I'm doing an investigation of the node/lib/events.js source now to see whether there's anything I can do (or whether the problem is just caused by me being an idiot, which is highly likely.)
Thanks again for the detailed test case! :)
from simplecrawler.
https://github.com/cgiffard/node-simplecrawler/blob/master/lib/crawler.js#L140
The "new way" of doing this is util.inherits(Crawler, EventEmitter), if I remember correctly.
Not sure if it will help, though :)
I can give it a shot.
from simplecrawler.
Weird, I could have sworn I changed it to that ages ago. This one must have escaped my attention.
from simplecrawler.
OK, it works, and I don't know why. I'll push the change but do some more investigating! :) Thanks for your help.
from simplecrawler.
I've pushed 0.2.4 to npm. Can you verify whether the fix worked or not?
from simplecrawler.
I think you should add an EventEmitter.call(this);
here: https://github.com/cgiffard/node-simplecrawler/blob/master/lib/crawler.js#L37
See: http://nodejs.org/docs/latest/api/util.html#util_util_inherits_constructor_superconstructor
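Roughly what I mean, as a minimal sketch rather than the actual crawler.js code. The constructor argument and the "fetchcomplete" listener are only there for illustration. Calling the EventEmitter constructor from inside Crawler lets each instance set up its own listener state, so a handler attached to one crawler should not show up on another:

```javascript
var EventEmitter = require("events").EventEmitter;
var util = require("util");

// Hypothetical stand-in for the real Crawler constructor
function Crawler(host) {
    // Run the parent constructor so this instance gets its own emitter state
    EventEmitter.call(this);
    this.host = host;
}

// Set up the prototype chain: Crawler.prototype inherits from EventEmitter.prototype
util.inherits(Crawler, EventEmitter);

var first = new Crawler("a.example");
var second = new Crawler("b.example");

// Attach a listener to the first crawler only
first.on("fetchcomplete", function datahandler() {});

console.log(first.listeners("fetchcomplete").length);  // 1
console.log(second.listeners("fetchcomplete").length); // 0
```

If the second count were anything other than 0, the two instances would be sharing listener state, which would match the duplicated output in the v0.9.12 log above.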
from simplecrawler.
OK. (Although I must say it's rather annoying that I have to run the EventEmitter constructor too :P)
from simplecrawler.
Done! Can you check it? :)
(Pushed to git, but not npm... yet.)
from simplecrawler.
OK, it's on npm as 0.2.5.
from simplecrawler.
Seems fine now :)
Running node v0.10.5
7 May 15:39:31 - Are test an instance of Crawler: false
7 May 15:39:31 - Are test an instance of Crawler: true
7 May 15:39:31 - Running: true
7 May 15:39:31 - Stopping crawler
7 May 15:39:31 - Running: false
7 May 15:39:31 - Listners:
[ [Function: datahandler] ]
7 May 15:39:31 - Removing listeners
7 May 15:39:31 - Listners:
[]
7 May 15:39:33 - http://deewr.gov.au/
7 May 15:39:35 - http://deewr.gov.au/rss/deewr-videos.xml
from simplecrawler.
Cool. Thanks again! :)
from simplecrawler.