shengqiangzhang / examples-of-web-crawlers
Some interesting examples of Python crawlers that are friendly to beginners. They mainly crawl Taobao, Tmall, WeChat, WeChat Read, Douban, QQ and similar sites.
License: MIT License
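"Oops, something went wrong, click refresh to try again (error:e3Euv)". Has anyone run into something similar? Any help would be appreciated.
Following readme.md, I uninstalled and reinstalled the dependencies. After running the script and scanning the QR code, I get this error:
Fetching WeChat friend data, please wait……
WeChat friend data fetched
Analyzing your group chats, please wait……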
Traceback (most recent call last):
File "d:/code/8/generate_wx_data.py", line 563, in <module>
group_common_in()
File "d:/code/8/generate_wx_data.py", line 526, in group_common_in
bar = Bar('共同所在群聊分析')
File "C:\Users\youyim\AppData\Local\Programs\Python\Python36\lib\site-packages\pyecharts\charts\chart.py", line 148, in __init__
super().__init__(init_opts=init_opts)
File "C:\Users\youyim\AppData\Local\Programs\Python\Python36\lib\site-packages\pyecharts\charts\chart.py", line 14, in __init__
super().__init__(init_opts=init_opts)
File "C:\Users\youyim\AppData\Local\Programs\Python\Python36\lib\site-packages\pyecharts\charts\base.py", line 28, in __init__
self.width = _opts.get("width", "900px")
AttributeError: 'str' object has no attribute 'get'
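The last frames of this traceback show the chart title string arriving as init_opts, which is consistent with the pyecharts 1.x constructor, where the first argument is an options object rather than a title. A minimal sketch of the 1.x call pattern follows; the helper names uses_v1_api and make_group_bar, the series label and the chart size are illustrative assumptions, not code from the repository.

```python
def uses_v1_api(version: str) -> bool:
    """Return True when a pyecharts version string is 1.0 or newer."""
    major = int(version.split(".")[0])
    return major >= 1


def make_group_bar(title, names, counts):
    """Build a bar chart using the pyecharts 1.x call pattern.

    Hypothetical helper: the title goes into set_global_opts instead of
    being passed as the first constructor argument.
    """
    # Imported lazily so this module still loads when pyecharts is absent.
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    return (
        Bar(init_opts=opts.InitOpts(width="900px", height="500px"))
        .add_xaxis(list(names))
        .add_yaxis("groups in common", list(counts))
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
    )
```

With pyecharts 1.x installed, make_group_bar('共同所在群聊分析', names, counts).render("bar.html") would write the chart to a file. Installing a 0.5.x release of pyecharts instead, so that the script's original Bar('...') call keeps working, is the other option.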
The Taobao login script still contains Weibo CSS style tags; it looks like some things were not fully updated.
python taobao_login.py
Traceback (most recent call last):
File "taobao_login.py", line 72, in <module>
a.login()  # log in
File "taobao_login.py", line 32, in login
self.browser.find_element_by_xpath('//*[@class="forget-pwd J_Quick2Static"]').click()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 80, in click
self._execute(Command.CLICK_ELEMENT)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webelement.py", line 633, in _execute
return self._parent.execute(command, params)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not interactable
(Session info: chrome=72.0.3626.119)
(Driver info: chromedriver=72.0.3626.69 (3c16f8a135abc0d4da2dff33804db79b849a7c38),platform=Linux 4.15.0-46-generic x86_64)
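"element not interactable" generally means the element was found but could not be clicked yet, for example because it was still hidden. One possible mitigation is to poll until the element is visible and enabled before clicking. The sketch below is a hand-rolled, standard-library-only loop written for Python 3; click_when_ready and its parameters are hypothetical names. Selenium's own WebDriverWait combined with expected_conditions.element_to_be_clickable serves the same purpose.

```python
import time


def click_when_ready(find, timeout=10.0, poll=0.5):
    """Click the element returned by find() once it is visible and enabled.

    find is a zero-argument callable that looks the element up again on
    every attempt. Returns True after a click, False if the timeout passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            element = find()
        except Exception:
            # The element is not in the DOM yet; try again on the next round.
            element = None
        if element is not None and element.is_displayed() and element.is_enabled():
            element.click()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Inside the login method this could be called as click_when_ready(lambda: self.browser.find_element_by_xpath(xpath)), where xpath is the selector from the traceback. If the element never becomes visible (for instance because the page layout changed), the function returns False instead of raising.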
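Hello, whenever the slider CAPTCHA pops up I can never get past it. The page shows "Oops, something went wrong, click refresh to try again (error:NgaRgk)", and then the program reports 'selenium.common.exceptions.TimeoutException: Message:'. How can I get around this?
The log is as follows: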
包邮威德博威No.5羽毛球五号场馆训练比赛12只装耐打稳定室内高手 月成交68评价732旺旺在线 55.00 //detail.tmall.com/item.htm?id=13074420768&skuId=22276410079&areaId=310100&user_id=748152180&cat_id=50043727&is_b=1&rn=aa934dd095dab511c3ed3f2cde0da7b6
get button failed: Message: move target out of bounds
(Session info: chrome=76.0.3809.87)
Traceback (most recent call last):
File "C:/Users/Daoling/Downloads/Tmall.py", line 216, in <module>
a.crawl_good_data()  # crawl Tmall product data
File "C:/Users/Daoling/Downloads/Tmall.py", line 150, in crawl_good_data
EC.presence_of_element_located((By.CSS_SELECTOR, '#J_ItemList > div.product > div.product-iWrap')))
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Error screenshot:
Fetching WeChat friend data, please wait……
WeChat friend data fetched
Analyzing your group chats, please wait……
Traceback (most recent call last):
File "getUrl.py", line 532, in <module>
group_common_in()
File "getUrl.py", line 498, in group_common_in
bar = Bar('共同所在群聊分析')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyecharts-1.2.1-py3.7.egg/pyecharts/charts/chart.py", line 143, in __init__
super().__init__(init_opts=init_opts)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyecharts-1.2.1-py3.7.egg/pyecharts/charts/chart.py", line 15, in __init__
super().__init__(init_opts=init_opts)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyecharts-1.2.1-py3.7.egg/pyecharts/charts/base.py", line 29, in __init__
self.width = _opts.get("width")
AttributeError: 'str' object has no attribute 'get'
Line 94, map.add("", provice. Should it be province?
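Could the user manual be more detailed? How do I run this on Linux?
For Tmall's slider verification, calling move_by_offset to drag straight to the end in one move seems to get detected as non-human.
Dragging in a loop along a track is very slow and also gets flagged as abnormal. What can be done about this?
Hello, whenever the slider CAPTCHA pops up I can never get past it. The page shows "Oops, something went wrong, click refresh to try again (error:NgaRgk)", and then the program reports 'selenium.common.exceptions.TimeoutException: Message:'. How can I get around this?
It currently works fine in my tests. If it fails for you, log in on the web page manually and complete the slider once; after that it usually will not prompt again.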
Originally posted by @shengqiangzhang in #4 (comment)
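OP, why is it that whenever the page is opened through selenium, even a manual login fails for me? This happens even with developer mode enabled. This problem has been bothering me for a long time.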
Traceback (most recent call last):
File "generate_wx_data.py", line 617, in <module>
File "generate_wx_data.py", line 317, in merge_head_image
File "site-packages\PIL\Image.py", line 1817, in resize
File "site-packages\PIL\ImageFile.py", line 239, in load
OSError: image file is truncated (0 bytes not processed)
[8144] Failed to execute script generate_wx_data
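"image file is truncated" is raised by Pillow when an image file ends before its data is complete, which here suggests that one of the downloaded avatar files is empty or was only partially written. One way to avoid the crash is to filter out incomplete files before merging them. The sketch below assumes the avatars are JPEG files (an assumption, not confirmed by the report) and checks only the start and end markers; looks_like_complete_jpeg and usable_images are hypothetical helper names.

```python
JPEG_START = b"\xff\xd8"  # SOI marker: first two bytes of every JPEG
JPEG_END = b"\xff\xd9"    # EOI marker: last two bytes of a complete JPEG


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Cheap completeness check based on the JPEG start and end markers."""
    return len(data) > 4 and data.startswith(JPEG_START) and data.endswith(JPEG_END)


def usable_images(paths):
    """Return the paths whose contents pass the completeness check."""
    good = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            # Missing or unreadable file: skip it instead of crashing.
            continue
        if looks_like_complete_jpeg(data):
            good.append(path)
    return good
```

The check is deliberately strict: a valid file with extra bytes after the end marker would be rejected. Pillow also offers ImageFile.LOAD_TRUNCATED_IMAGES = True, which makes it load partial files instead of raising, though a zero-byte download still has to be fetched again.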
Traceback (most recent call last):
File "configparser.py", line 1138, in _unify_values
KeyError: 'configuration'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "say_to_lady.py", line 175, in <module>
File "configparser.py", line 781, in get
File "configparser.py", line 1141, in _unify_values
configparser.NoSectionError: No section: 'configuration'
[1812] Failed to execute script say_to_lady
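configparser raises NoSectionError both when the section is missing and when the file itself was never found, because ConfigParser.read silently skips files it cannot open. A likely cause here is that the ini file is not next to the executable, or that the program was started from another working directory. The sketch below resolves the file relative to the program and fails with a clearer message; app_dir, load_section and the file name config.ini are illustrative assumptions.

```python
import configparser
import os
import sys


def app_dir():
    """Directory of the running script, or of the executable when frozen."""
    if getattr(sys, "frozen", False):  # attribute set by bundlers such as PyInstaller
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def load_section(path, section="configuration"):
    """Read one section of an ini file into a dict, with explicit errors."""
    parser = configparser.ConfigParser()
    loaded = parser.read(path, encoding="utf-8")
    if not loaded:
        # read() returns the list of files it parsed; empty means none.
        raise FileNotFoundError(f"config file not found or unreadable: {path}")
    if not parser.has_section(section):
        raise KeyError(f"{path} has no [{section}] section")
    return dict(parser[section])
```

A call such as load_section(os.path.join(app_dir(), "config.ini")) then reports whether the file or the section is missing, independent of the directory the program was launched from.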
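I would also like to modify the code and build one myself, but printing the Xinshu (心书) page straight to PDF through the system print dialog leaves every image blank (and it is not lazy loading). Looking at the source, the images all seem to be loaded through CSS, and I do not know how to handle that.
Generating the WeChat report fails with: ImportError: cannot import name 'Pie'
Trying to install a Pie module fails with: ERROR: Could not find a version that satisfies the requirement pie (from versions: none)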
ERROR: No matching distribution found for pie
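Pie is a chart class inside pyecharts rather than a separate package, so pip cannot install it. The ImportError is what a top-level import of Pie from pyecharts produces on pyecharts 1.x, where chart classes live in pyecharts.charts. A minimal sketch of an import that tolerates both layouts follows; import_first and PIE_LOCATIONS are hypothetical names.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Locations of the Pie class in the two pyecharts API generations.
PIE_LOCATIONS = [
    ("pyecharts.charts", "Pie"),  # pyecharts 1.x
    ("pyecharts", "Pie"),         # pyecharts 0.5.x
]
```

With pyecharts installed, Pie = import_first(PIE_LOCATIONS) picks whichever location exists. The import alone is not enough on 1.x, since the chart-building calls also changed between the two generations.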
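I tried your method and also looked up other approaches online. As long as the browser is launched by selenium, it gets detected, whether I modify the configuration file or switch to developer mode. Logging in with an account hits a slider that cannot be passed, even when dragged by hand. Logging in via Weibo, the CAPTCHA is not visible, yet it still has to be entered.
For the script that reminds your girlfriend via WeChat at different times each day, what is the command to run it? Surely you cannot just create the ini config file in any directory and run it?
Does it not work even with xvfb? I do not know how you all deploy this to a server.
The module has already been imported, but it still does not work.
When I use the Tmall product data crawler, it only crawls the first page of data, then the command line shows NOT IMPLEMENTED, and after that it cannot continue crawling.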
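When a search returns less than one full page of products, the crawler fails. Add the following code before crawling starts.
# Place this import at the top of the file.
from selenium.common.exceptions import NoSuchElementException

# "喵~没找到" is the start of the page's "nothing found" banner.
err1 = self.browser.find_element_by_xpath("//*[@id='content']/div/div[2]").text
if err1[:5] == "喵~没找到":
    print("No matching products found")
    return
try:
    err2 = self.browser.find_element_by_xpath("//*[@id='J_ComboRec']/div[1]").text
except NoSuchElementException:
    err2 = ""  # no recommendation block, so the result page is normal
# "我们还为您" is the start of the recommendation block shown for sparse results.
if err2[:5] == "我们还为您":
    print("Fewer than one page of results")
    return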
Things break as soon as the wxpy package is added.
Getting uuid of QR code.
INFO:itchat:Getting uuid of QR code.
Downloading QR code.
INFO:itchat:Downloading QR code.
Traceback (most recent call last):
File "generate_wx_data.py", line 542, in <module>
bot = Bot(cache_path=True)
File "E:\Program Files\python37\lib\site-packages\wxpy\api\bot.py", line 86, in __init__
loginCallback=login_callback, exitCallback=logout_callback
File "E:\Program Files\python37\lib\site-packages\itchat\components\register.py", line 30, in auto_login
loginCallback=loginCallback, exitCallback=exitCallback)
File "E:\Program Files\python37\lib\site-packages\itchat\components\login.py", line 44, in login
picDir=picDir, qrCallback=qrCallback)
File "E:\Program Files\python37\lib\site-packages\itchat\components\login.py", line 117, in get_QR
utils.print_qr(picDir)
File "E:\Program Files\python37\lib\site-packages\itchat\utils.py", line 85, in print_qr
os.startfile(fileDir)
OSError: [WinError -2147221003] 找不到应用程序: 'QR.png'
#content > div > div.ui-page > div > b.ui-page-skip > form > input[type="hidden"]:nth-child(7)
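It is different from what you wrote back then. Is there any way to get past it now?
WeChat got stuck on the first step of face recognition and stayed there for a long time.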
Exception in thread generate_data:
Traceback (most recent call last):
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "main.py", line 152, in generate_data
TypeError: 'NoneType' object is not subscriptable
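Could you publish a copy on Gitee (码云)? The repository is too large and the download always times out!
As the title says.
Theory courses are giving me a headache right now, so I would like a change of pace.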
C:\Users\Administrator>C:\Users\Administrator\Desktop\say_to_lady\say_to_lady.exe
Getting uuid of QR code.
Downloading QR code.
Traceback (most recent call last):
File "say_to_lady.py", line 154, in <module>
File "site-packages\wxpy\api\bot.py", line 86, in __init__
File "site-packages\itchat\components\register.py", line 35, in auto_login
File "site-packages\itchat\components\login.py", line 44, in login
File "site-packages\itchat\components\login.py", line 117, in get_QR
File "site-packages\itchat\utils.py", line 85, in print_qr
OSError: [WinError -2147221003] 找不到应用程序: 'QR.png'
[3992] Failed to execute script say_to_lady
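The executable crashes immediately on launch.
TX (Tencent) has disabled web-client login for many WeChat accounts, so wxpy may be unusable for now.
When I run your Taobao login code and log in via Weibo, it shows a network connection timeout, possibly because selenium was detected. Logging in manually with a normal browser works. I do not know how to fix this.
A small suggestion: at the end, open the folder containing the file instead of opening the PDF.
Bro, this does not work. Taobao still detects it; even with this developer mode added, Taobao can still tell.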