
secrawler's People

Contributors: xtt129

secrawler's Issues

Help w/ Errors

Hi there:

I'm hoping this post is still being monitored even if it's a bit old. :) I'm trying to use seCrawler and hitting some issues that I suspect are related to using Python 3.x instead of 2.x, but I'm not sure. The project looks elegant and powerful, so I'm hoping for help. :) I'm also wondering whether I need an approach that takes into account Google fighting automated searches like this.

E.g., the print statement in searchEngines.py throws an error that I think requires adding parentheses, so it becomes print("total page:{0}".format(self.totalPage)). I also changed the import in searchResultPages.py to "from seCrawler.common.searchEngines import SearchEngines".
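For reference, here's a minimal, self-contained sketch of the print fix I applied; the class and method names below are made up for illustration, not the actual ones in searchEngines.py:

    # Hypothetical reduction of the Python 2 -> 3 print fix.
    # Python 2 allowed:   print "total page:{0}".format(self.totalPage)
    # Python 3 requires print() to be called as a function:
    class SearchEngineDemo:
        def __init__(self, total_page):
            self.totalPage = total_page

        def report(self):
            print("total page:{0}".format(self.totalPage))

    SearchEngineDemo(50).report()  # prints: total page:50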

Starting the crawler with the command from the readme, scrapy crawl keywordSpider -a keyword=Spider-Man -a se=google -a pages=50 (with or without the quotes around the arguments; same result either way), throws an error:

File "/spiders/keywordSpider.py", line 21, in init
for url in pageUrls:
TypeError: iter() returned non-iterator of type 'searResultPages'

I'm seeing some posts saying that with Python 3 you need def __next__(self): instead of def next(self):, but I don't believe there is a next() method anywhere in the code. The __init__.py file in the spiders folder is essentially commented out/blank.

My guess is this has something to do with turning the number of pages into an integer and then iterating through whatever that value is (50 in this case).
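Putting those hints together, here's my guess as a runnable sketch of what could cause this exact error: a class that implements only the old Python 2 iterator protocol. If searResultPages (or whatever its __iter__ returns) defines next() rather than __next__, Python 3 raises exactly this TypeError. The constructor arguments and yielded values below are made up for illustration, not the real seCrawler code:

    # Hypothetical reduction of the searResultPages iterator and a Python 3 fix.
    class searResultPages:
        def __init__(self, keyword, se, pages):
            self.pages = int(pages)  # -a pages=50 arrives as the string "50"
            self.page = 0

        def __iter__(self):
            return self

        def next(self):  # Python 2 iterator protocol; invisible to Python 3's for loop
            if self.page >= self.pages:
                raise StopIteration
            self.page += 1
            return "result page {0}".format(self.page)

        __next__ = next  # aliasing it as __next__ satisfies Python 3 and fixes the TypeError

    for url in searResultPages("Spider-Man", "google", "50"):
        print(url)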

Q: if anyone notices this, any suggestions on how to tweak the code to make it work? Or am I behind the times, and do all search engines' robots.txt files now forbid what I'm trying to do?

FYI, I don't think this is related to which search engine is being used, as the crawl command throws the same error whether using google, bing, or baidu.

Thx
