
spider's People

Contributors

gsh199449, hokis, ordinaryyzh


spider's Issues

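Could a batch publish feature for crawl tasks be added?

Thank you very much for this product; I tried it out yesterday. One suggestion: please add a checkbox to each row of the spider list so that crawl tasks can be published in batches, instead of having to open each template in the editor to publish it. Thanks.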

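[Crawled count] in the crawl task list does not match [Article count] in the website list

After a task stops because reachMax or exceedRatio is triggered, the [valid page count] printed by the log in the CommonSpider onSuccess method, the [Crawled count] in the crawl task list, and the [Article count] in the website list all disagree with one another.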

log

Spider ID 5e21c6bd-4878-413c-b0fa-a46a5c3376ac has processed 31 pages, 6 valid pages, maximum pages to crawl 10, reachMax=false, exceedRatio=true, exiting.

Crawled count

Task name      Crawled count   Crawl status
www.163.com    9               STOP

Article count

Website domain   Article count
www.163.com      7

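Error when crawling with a template in a project built with Eclipse

Hello. When I put your prebuilt WAR package under Tomcat and crawl with the template you built, everything works.
When I build the project with Eclipse, the project starts successfully and the other features work.
However, crawling with the same prebuilt template fails with this error: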

23:36:48[WARN com.gs.spider.gather.commons.ContentLengthLimitHttpClientDownloader] download page http://news.qq.com/a/20160418/023093.htm error
javax.net.ssl.SSLKeyException: RSA premaster secret error
	at sun.security.ssl.RSAClientKeyExchange.<init>(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.ClientHandshaker.serverHelloDone(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.ClientHandshaker.processMessage(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.Handshaker.processLoop(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.Handshaker.process_record(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source) ~[?:1.8.0_161]
	at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source) ~[?:1.8.0_161]
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.5.2.jar:4.5.2]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) ~[httpclient-4.5.2.jar:4.5.2]
	at com.gs.spider.gather.commons.ContentLengthLimitHttpClientDownloader.download(ContentLengthLimitHttpClientDownloader.java:112) [classes/:?]
	at us.codecraft.webmagic.Spider.processRequest(Spider.java:404) [webmagic-core-0.6.0.jar:?]
	at us.codecraft.webmagic.Spider$1.run(Spider.java:321) [webmagic-core-0.6.0.jar:?]
	at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74) [webmagic-core-0.6.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_161]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_161]
Caused by: java.security.NoSuchAlgorithmException: SunTls12RsaPremasterSecret KeyGenerator not available
	at javax.crypto.KeyGenerator.<init>(KeyGenerator.java:169) ~[?:1.8.0_171]
	at javax.crypto.KeyGenerator.getInstance(KeyGenerator.java:223) ~[?:1.8.0_171]
	at sun.security.ssl.JsseJce.getKeyGenerator(Unknown Source) ~[?:1.8.0_161]
	... 28 more

What might the problem be? JDK 1.8, Tomcat 8.5.
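One detail in the trace above may be worth checking: the sun.security.ssl frames are tagged 1.8.0_161 while the javax.crypto frames are tagged 1.8.0_171, which could mean that classes from two different JRE installations are being mixed when the project is launched from Eclipse. That is only a guess from the trace, not a confirmed cause. The sketch below is a small diagnostic, not part of the project; the class and method names are hypothetical. It prints which runtime is actually executing and whether the key generator named in the "Caused by" line can be resolved.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;

/** Diagnostic sketch (hypothetical helper, not part of the spider project). */
final class TlsRuntimeCheck {

    /** Returns true if an installed security provider can supply the named KeyGenerator. */
    static boolean keyGeneratorAvailable(String algorithm) {
        try {
            KeyGenerator.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    /** Summarises the runtime the code is actually executing on. */
    static String describeRuntime() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        // Algorithm name copied from the "Caused by" line of the stack trace.
        System.out.println("SunTls12RsaPremasterSecret available: "
                + keyGeneratorAvailable("SunTls12RsaPremasterSecret"));
    }
}
```

Running it once from the Eclipse launch configuration and once under the standalone Tomcat would show whether the two environments use the same java.home.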

The defaultCategory field does not work

"categoryReg": "",
"categoryXPath": "",
"defaultCategory": "体育",
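// I had assumed that leaving the first two items unspecified would make the category default to "体育" (sports). I tried it and that is not the case: with the rules specified as above, the resulting category is empty.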
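The behaviour the reporter expected can be sketched as a simple fallback: when neither categoryXPath nor categoryReg yields a value, use defaultCategory. The class and method below are hypothetical and only illustrate that expectation; they are not taken from the project's code.

```java
/** Sketch of the expected fallback; all names here are hypothetical. */
final class CategoryFallback {

    /**
     * Returns the extracted category with surrounding whitespace removed,
     * or defaultCategory when extraction produced nothing.
     */
    static String resolveCategory(String extracted, String defaultCategory) {
        if (extracted == null || extracted.trim().isEmpty()) {
            return defaultCategory;
        }
        return extracted.trim();
    }

    public static void main(String[] args) {
        // With both categoryReg and categoryXPath empty, nothing is extracted.
        System.out.println(resolveCategory("", "sports"));
    }
}
```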

Exception when running the project

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'commonSpider' defined in URL [file:/C:/Java/IdeaProjects/GatherPlatform/spider-master/target/spider/WEB-INF/classes/mvc-dispatcher-servlet.xml]: Cannot resolve reference to bean 'commonWebpageDAO' while setting bean property 'commonWebpageDAO'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'commonWebpageDAO' defined in file [C:\Java\IdeaProjects\GatherPlatform\spider-master\target\spider\WEB-INF\classes\com\gs\spider\dao\CommonWebpageDAO.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.gs.spider.dao.CommonWebpageDAO]: Constructor threw exception; nested exception is NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{y3YLXLvBRXGPiHARqVc2qQ}{127.0.0.1}{127.0.0.1:9300}]]
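The innermost exception reports that none of the configured nodes are available and lists 127.0.0.1:9300, so a reasonable first check is whether anything is accepting TCP connections at that address. The sketch below is a generic reachability probe with hypothetical names; it does not use the project's or Elasticsearch's API and says nothing about cluster name or version mismatches, which can produce the same exception.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Generic TCP reachability probe (hypothetical helper). */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port copied from the exception message.
        System.out.println("127.0.0.1:9300 reachable: " + isReachable("127.0.0.1", 9300, 2000));
    }
}
```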

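Every update to an existing template creates a new template

Every time an existing template is updated, a new template ID is created. Is this intentional?
It is not very friendly for editing, especially during testing, when a template may be saved many times.
On another note, I have it running now and can crawl a site within minutes. Very nice!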
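For illustration, the update-in-place behaviour requested here can be sketched as a save that reuses the ID when one is supplied and generates a new one only for a first save. The in-memory store below is hypothetical and much simpler than the project's storage; it only shows the intended semantics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** In-memory sketch of save-or-update semantics; all names are hypothetical. */
final class TemplateStore {
    private final Map<String, String> templatesById = new HashMap<>();

    /**
     * Stores the template under the given id, overwriting any previous version.
     * A new id is generated only when none is supplied. Returns the id used.
     */
    String save(String id, String templateJson) {
        String effectiveId = (id == null || id.isEmpty()) ? UUID.randomUUID().toString() : id;
        templatesById.put(effectiveId, templateJson);
        return effectiveId;
    }

    /** Returns the stored template, or null if the id is unknown. */
    String get(String id) {
        return templatesById.get(id);
    }

    /** Number of distinct templates currently stored. */
    int size() {
        return templatesById.size();
    }
}
```

With this shape, saving the same template repeatedly during testing leaves a single entry rather than one per save.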

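Saving a template fails

I imported the sample template from the code repository, and clicking to store the template still fails with the message:
Please retry java.lang.NullPointerException:null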

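Page access error after deploying in minimal mode

Thanks for providing this tool. I downloaded the prebuilt spider.war from Baidu, put it directly under Tomcat, and started it. Accessing the page reports the error below. Is some additional configuration required?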

java.lang.IllegalArgumentException: Non-positive period.
	java.util.Timer.schedule(Unknown Source)
	com.gs.spider.gather.commons.CommonSpider.<init>(CommonSpider.java:364)
	sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	java.lang.reflect.Constructor.newInstance(Unknown Source)
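The message "Non-positive period." is what java.util.Timer.schedule(task, delay, period) throws when the period argument is zero or negative. What value CommonSpider passes at line 364 is not visible from this report; one possibility, stated here purely as an assumption, is that the period comes from a configuration entry that is missing or zero in the prebuilt WAR. The sketch below reproduces the exception and shows a hypothetical guard that falls back to a default.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Sketch of guarding a configured timer period; names are hypothetical. */
final class PeriodGuard {

    /** Returns configuredMillis when it is positive, otherwise defaultMillis. */
    static long positivePeriodOrDefault(long configuredMillis, long defaultMillis) {
        if (defaultMillis <= 0) {
            throw new IllegalArgumentException("defaultMillis must be positive");
        }
        return configuredMillis > 0 ? configuredMillis : defaultMillis;
    }

    public static void main(String[] args) {
        Timer timer = new Timer(true); // daemon thread, so the JVM can exit
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                // no-op for the demonstration
            }
        };
        try {
            timer.schedule(task, 0L, 0L); // a period of 0 is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // The same task can still be scheduled once the period is valid.
        timer.schedule(task, 0L, positivePeriodOrDefault(0L, 60_000L));
        timer.cancel();
    }
}
```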

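NPE when storing a template

After editing a template on the edit-template page, clicking Save always ends up at /panel/commons/editSpiderInfo with the error below. It is reproducible every time.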

java.lang.NullPointerException
	at com.gs.spider.dao.SpiderInfoDAO.getByDomain(SpiderInfoDAO.java:126)
	at com.gs.spider.dao.SpiderInfoDAO.index(SpiderInfoDAO.java:51)
	at com.gs.spider.service.commons.spiderinfo.SpiderInfoService.lambda$index$2(SpiderInfoService.java:58)
	at com.gs.spider.model.utils.ResultBundleBuilder.bundle(ResultBundleBuilder.java:25)
	at com.gs.spider.service.commons.spiderinfo.SpiderInfoService.index(SpiderInfoService.java:58)
	at com.gs.spider.controller.commons.spiderinfo.SpiderInfoController.save(SpiderInfoController.java:87)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
