kangvcar / infospider

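INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together. It aims to help users get their own data back safely and quickly; the code is open source and the workflow is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.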

Home Page: https://infospider.vercel.app

License: GNU General Public License v3.0

Python 66.67% Shell 0.06% Jupyter Notebook 2.73% HTML 14.20% CSS 0.22% JavaScript 16.12%
automation chrome crawl csdn hotmail outlook python3 selenium spider tkinter wxpython

infospider's Issues

ERROR: Command errored out with exit status 1

pip install -r requirements.txt
Output:

ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-pf5_kd92/wxpython/setup.py'"'"'; __file__='"'"'/tmp/pip-install-pf5_kd92/wxpython/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-gjw9u541/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/tz/.local/include/python3.8/wxPython Check the logs for full command output.

[Feature suggestion] Could Renren be supported?

Bug Report

Description: [Description of the issue]

Expected behavior: [What should happen]

Current behavior: [What happens instead of the expected behavior]

Steps to Reproduce:

  1. [First Step]
  2. [Second Step]
  3. [and so on …]

Reproduce how often: [What percentage of the time does it reproduce?]

Possible solution: [Not obligatory, but suggest a fix/reason for the bug]

Context (Environment):[The code version, python version, operating system or other software/libs you use]

Additional Information

[Any other useful information about the problem].

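Pretty good, but why use tk for the GUI? I find that odd.

I just saw your open-source project in a WeChat official-account post. It really is good: comprehensive and nice-looking. As a developer I don't find the tool all that useful, but given your headline, collecting your own personal information, it makes perfect sense. Also, there is really no need to build the GUI with tkinter; tk sometimes crashes and is not great to work with. My main suggestion would be to build a web front end, which would be great. Good!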

GITHUB

Could you look into whether there is a way to route the mail server through a proxy? What I mainly need is a way to send all traffic on Linux through a proxy and then switch IPs from the command line, or perhaps some other approach to switching the mail server's IP. A reward is offered for a solution. Developers who have one, please contact QQ 3374835496, email 3374835496@qq.com, Skype live:.cid.b409052f6258136f

Are there plans to support Pinduoduo?

kobicoin.com

The Taobao crawler does not seem to work. Does taobao_cookies.json need to be replaced?

Isn't this illegal?

I got this error while using the tool

Traceback (most recent call last):
  File "D:\Working\Codes\InfoSpider\tools\main.py", line 34, in <module>
    from alipay.main import ASpider
ModuleNotFoundError: No module named 'alipay.main'

Taobao and Alipay are poorly supported and throw an exception

Bug Report

Description:
Traceback (most recent call last):
  File "main.py", line 520, in OnClick
    t = TaobaoSpider(cookie_list)
  File "E:\my_work_spaces\pycharm\Self_learn_projs\Crawler_projs\InfoSpider-master./Spiders\taobao\spider.py", line 65, in __init__
    self.path = askdirectory(title='选择信息保存文件夹')
  File "G:\py37\lib\tkinter\filedialog.py", line 428, in askdirectory
    return Directory(**options).show()
  File "G:\py37\lib\tkinter\commondialog.py", line 39, in show
    w = Frame(self.master)
  File "G:\py37\lib\tkinter\__init__.py", line 2744, in __init__
    Widget.__init__(self, master, 'frame', cnf, {}, extra)
  File "G:\py37\lib\tkinter\__init__.py", line 2299, in __init__
    (widgetName, self._w) + extra + self._options(cnf))
RuntimeError: main thread is not in main loop
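
The final `RuntimeError: main thread is not in main loop` is typically what tkinter raises when a widget is created from a thread other than the one that owns the Tk interpreter. Here `askdirectory` runs inside the spider's constructor, so one possible workaround is to ask for the folder once on the GUI thread and pass the resulting path into the worker. The sketch below only illustrates that ordering; `start_spider`, `choose_dir`, and `run_spider` are hypothetical names, not part of InfoSpider.

```python
import threading


def start_spider(choose_dir, run_spider):
    """Pick the output folder on the calling thread, then crawl in a worker.

    choose_dir: zero-argument callable returning a path, for example
                tkinter.filedialog.askdirectory called from the GUI thread.
    run_spider: callable taking the chosen path (hypothetical crawler entry point).
    """
    path = choose_dir()  # the dialog runs here, never inside the worker thread
    if not path:
        return None  # the user cancelled the dialog
    worker = threading.Thread(target=run_spider, args=(path,), daemon=True)
    worker.start()
    return worker
```

With this shape the spider receives `path` as an argument and never needs to touch tkinter itself.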

Installing dependencies

Error:

Using legacy 'setup.py install' for lxml, since package 'wheel' is not installed.
Installing collected packages: lxml, pyquery, certifi, chardet, idna, requests, Pillow, wxPython, pytz, pandas, future, pypng, pyqrcode, itchat, wxpy, soupsieve, beautifulsoup4
Running setup.py install for lxml ... error
ERROR: Command errored out with exit status 1:
command: 'd:\soft\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"'; __file__='"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Administrator\AppData\Local\Temp\pip-record-ohs8ihq1\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\soft\python\python38\Include\lxml'
cwd: C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml
Complete output (77 lines):
Building lxml version 4.3.3.
Building without Cython.
ERROR: b"'xslt-config' \xb2\xbb\xca\xc7\xc4\xda\xb2\xbf\xbb\xf2\xcd\xe2\xb2\xbf\xc3\xfc\xc1\xee\xa3\xac\xd2\xb2\xb2\xbb\xca\xc7\xbf\xc9\xd4\xcb\xd0\xd0\xb5\xc4\xb3\xcc\xd0\xf2\r\n\xbb\xf2\xc5\xfa\xb4\xa6\xc0\xed\xce\xc4\xbc\xfe\xa1\xa3\r\n"
** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.8
creating build\lib.win-amd64-3.8\lxml
copying src\lxml\builder.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\cssselect.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\sax.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.8\lxml
copying src\lxml\__init__.py -> build\lib.win-amd64-3.8\lxml
creating build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.8\lxml\includes
creating build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\builder.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\clean.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\defs.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\diff.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.8\lxml\html
copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.8\lxml\html
creating build\lib.win-amd64-3.8\lxml\isoschematron
copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.8\lxml\isoschematron
copying src\lxml\etree.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\etree_api.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.8\lxml
copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\__init__.pxd -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.8\lxml\includes
copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.8\lxml\includes
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\rng
copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\rng
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl
creating build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract_expand.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_include.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_message.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for_xslt1.xsl -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -> build\lib.win-amd64-3.8\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
----------------------------------------

ERROR: Command errored out with exit status 1: 'd:\soft\python\python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"'; __file__='"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-grl1th1g\lxml\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\Administrator\AppData\Local\Temp\pip-record-ohs8ihq1\install-record.txt' --single-version-externally-managed --compile --install-headers 'd:\soft\python\python38\Include\lxml' Check the logs for full command output.
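
The escaped bytes in the `ERROR: b"'xslt-config' ..."` line are hard to read as pasted. Assuming the console used the GBK code page (a guess based on the Chinese-locale Windows paths in the log), they can be decoded as sketched below; `decode_console_bytes` is a hypothetical helper. The decoded text is the standard Windows notice that `xslt-config` is not recognized as an internal or external command, so the build falls back to compiling from source and then stops at the Visual C++ requirement shown near the end of the log.

```python
# Bytes copied from the ERROR line of the log above.
raw = (
    b"'xslt-config' \xb2\xbb\xca\xc7\xc4\xda\xb2\xbf\xbb\xf2\xcd\xe2\xb2\xbf"
    b"\xc3\xfc\xc1\xee\xa3\xac\xd2\xb2\xb2\xbb\xca\xc7\xbf\xc9\xd4\xcb\xd0\xd0"
    b"\xb5\xc4\xb3\xcc\xd0\xf2\r\n\xbb\xf2\xc5\xfa\xb4\xa6\xc0\xed\xce\xc4\xbc\xfe"
    b"\xa1\xa3\r\n"
)


def decode_console_bytes(data, encoding="gbk"):
    """Decode captured console output; the GBK default is an assumption."""
    # errors="replace" keeps the call from raising if the guess is wrong.
    return data.decode(encoding, errors="replace").strip()


message = decode_console_bytes(raw)
```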

Zhihu responds with "please upgrade the client and retry"

Bug Report

Description: [Description of the issue]

{"id":"c9b28ce4b50bf0444d17d010224cb06f","url_token":"houziliaorenwu","name":"猴子","use_default_avatar":false,"avatar_url":"https://pic1.zhimg.com/v2-12ef91a3f1e91e70bd3480d755e058b1_l.jpg?source=32738c0c","avatar_url_template":"https://picx.zhimg.com/v2-12ef91a3f1e91e70bd3480d755e058b1.jpg?source=32738c0c","is_org":false,"type":"people","url":"https://www.zhihu.com/api/v4/people/houziliaorenwu","user_type":"people","headline":"公中号(猴子数据分析)著有畅销书《数据分析思维》 科普**专家","headline_render":"公中号(猴子数据分析)著有畅销书《数据分析思维》科普**专家","gender":1,"is_advertiser":false,"ip_info":"IP 属地北京","vip_info":{"is_vip":true,"vip_type":1,"rename_days":"60","widget":{"id":"13017","url":"https://pic1.zhimg.com/v2-06ff79935442c7b0b2de8bde3529de2a.jpg?source=88ceefae","night_mode_url":"https://pic1.zhimg.com/v2-7cb817a30db30272a00bc17450a2ea79.jpg?source=88ceefae"},"entrance_v2":null,"rename_frequency":3,"rename_await_days":0},"available_medals_count":0,"is_realname":true,"has_applying_column":false}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

{
    "error": {
        "code": 10002,
        "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"
    }
}

<html><title>404: Not Found</title><body>404: Not Found</body></html>
{"error":{"message":"请求参数异常,请升级客户端后重试","code":10003}}

{"data": []}
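
For readers who do not want to decode the `\u` escapes by hand, the sketch below parses one of the error bodies pasted above with the standard `json` module. The message reads roughly "request parameters abnormal, please upgrade the client and retry". `parse_api_error` is a hypothetical helper, and treating any body with an `error` key as a failure is an assumption about the response shape, based only on the samples in this issue.

```python
import json

# One of the error bodies from the issue, kept as a raw string so the
# \u escapes reach json.loads unchanged.
body = r'''{"error": {"code": 10002, "message": "10002:\u8bf7\u6c42\u53c2\u6570\u5f02\u5e38\uff0c\u8bf7\u5347\u7ea7\u5ba2\u6237\u7aef\u540e\u91cd\u8bd5"}}'''


def parse_api_error(text):
    """Return (code, message) for an error body, or None otherwise."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None  # for example the HTML 404 page shown above
    error = payload.get("error") if isinstance(payload, dict) else None
    if not error:
        return None
    return error.get("code"), error.get("message")


code, message = parse_api_error(body)
```

A crawler could call such a helper on every response and stop early with a readable message instead of writing the error body to its output files.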

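Error when installing dependencies, with a temporary workaround

Installing the first dependency fails with: UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 2621: illegal multibyte sequence
Workaround: change the pinned version
from matplotlib==3.2.0 to matplotlib==3.6.0

Installing numpy then seems to ask for C++ 14 or later to be installed.
Workaround: install conda first, then run
conda install libpython m2w64-toolchain -c msys2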
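
The version swap described above can also be scripted. The following is a minimal sketch that rewrites an exact `==` pin in requirements text; `repin` is a hypothetical helper, and the `numpy` line in the sample is made up for illustration rather than taken from the project's requirements.txt.

```python
import re


def repin(requirements_text, package, new_version):
    """Replace an exact `package==x.y.z` pin with `package==new_version`."""
    pattern = re.compile(
        r"^(\s*" + re.escape(package) + r"\s*==\s*)\S+",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new_version, requirements_text)


# Sample input; only the matplotlib pin comes from the issue.
sample = "matplotlib==3.2.0\nnumpy==1.18.1\n"
updated = repin(sample, "matplotlib", "3.6.0")
```

Other lines are left untouched, so the change stays limited to the one pin the workaround names.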

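Looking forward to a macOS version

This is a very good idea. Could it be extended with keyword-based information search and summarization?
I hope macOS is supported soon; there are probably quite a few people waiting for it, like me.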

GITHUB

Hello, I have other projects I would like to consult you about. Please add me on QQ 3374835496 or Skype live:.cid.b409052f6258136f

Error when installing dependencies

lib-3.2.0-cp38-cp38-win_amd64.whl
Downloading matplotlib-3.2.0-cp38-cp38-win_amd64.whl (9.2 MB)
|██████████▌ | 3.0 MB 4.7 kB/s eta 0:22:02
ERROR: Exception:
Traceback (most recent call last):
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 437, in _error_catcher
    yield
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\cachecontrol\filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\http\client.py", line 454, in read
    n = self.readinto(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\http\client.py", line 498, in readinto
    n = self.fp.readinto(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\base_command.py", line 228, in _main
    status = self.run(options, args)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\commands\install.py", line 323, in run
    requirement_set = resolver.resolve(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 183, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 388, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\resolution\legacy\resolver.py", line 340, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 467, in prepare_linked_requirement
    local_file = unpack_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 255, in unpack_url
    file = get_http_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 129, in get_http_url
    from_path, content_type = _download_http_url(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\operations\prepare.py", line 282, in _download_http_url
    for chunk in download.chunks:
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\cli\progress_bars.py", line 168, in __iter__
    for x in it:
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_internal\network\utils.py", line 64, in response_chunks
    for chunk in response.raw.stream(
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "c:\users\administrator\appdata\local\programs\python\python38\lib\site-packages\pip\_vendor\urllib3\response.py", line 442, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
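
The log ends in a read timeout while downloading from files.pythonhosted.org at a few kB/s, which points to a slow connection rather than a problem in the project. A common mitigation is to rerun pip with a longer socket timeout, more retries, and optionally a closer package index. The sketch below only builds such a command line using pip's `--timeout`, `--retries`, and `--index-url` options; `pip_install_command` and its default values are illustrative assumptions.

```python
import sys


def pip_install_command(requirements="requirements.txt", timeout=120,
                        retries=10, index_url=None):
    """Build a pip command that tolerates slow downloads."""
    cmd = [sys.executable, "-m", "pip", "install",
           "--timeout", str(timeout),   # socket timeout in seconds
           "--retries", str(retries)]   # attempts per connection
    if index_url:
        cmd += ["--index-url", index_url]  # optional mirror
    cmd += ["-r", requirements]
    return cmd


cmd = pip_install_command()
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; whether a mirror helps depends on the user's network.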

Is macOS not supported?

2020-08-27 11:04:51.534 Python[1657:25291] -[wxNSApplication _setup:]: unrecognized selector sent to instance 0x7fae66c372a0

No module named 'Spiders'

Running main.py fails with:
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    from Spiders.A12306 import main12306
ModuleNotFoundError: No module named 'Spiders'

Adding the line sys.path.append(BASE_PATH) right after print(BASE_PATH) makes it run.
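
A slightly more defensive version of that fix is sketched below. How `BASE_PATH` is computed in main.py is not shown in the issue, so the current working directory stands in for the project root here; `ensure_on_path` is a hypothetical helper.

```python
import os
import sys


def ensure_on_path(base_path):
    """Append base_path to sys.path once so `import Spiders...` can resolve."""
    base_path = os.path.abspath(base_path)
    if base_path not in sys.path:
        sys.path.append(base_path)
    return base_path


# Assumption: the directory that contains the Spiders package.
BASE_PATH = ensure_on_path(os.getcwd())
```

The membership check keeps repeated calls from adding duplicate entries.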

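About the Jianshu crawler

If the author added a feature that fetches data from a specific article, it might improve efficiency.

I looked at the current crawler code, which fetches from the user's profile page. Fetching from an article seems harder, though: I cannot find the corresponding network request in the browser's developer tools.

The fields to crawl are mainly these:

  • Jianshu Diamonds (简书钻)
  • Views
  • Publish time
  • Likes
  • Comments

The last two are already solved. The first three appear in the rendered HTML, but a plain GET does not return them and no matching network request shows up, so they are probably filled in by a request issued from JS. I have no JS development experience, so I cannot work through the code.

My initial finding is that the request likely comes from the _app.js file. I don't know exactly how it is issued; surprisingly, it manages to hide the network request.

Finally, I maintain my own Jianshu crawler library, JianshuResearchTools, listed on my profile page. It also uses Requests and BeautifulSoup4, so feel free to use it as a reference; a few PRs would be even better.

Thanks to the developer.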
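
One thing that may be worth checking, offered as an unverified guess: `_app.js` is a file name used by Next.js applications, and Next.js pages often embed their initial data as JSON in a `<script id="__NEXT_DATA__">` tag instead of fetching it in a separate request. If Jianshu article pages work this way, the fields could be read from the page source as sketched below. The sample HTML and the `views_count` field are invented for illustration; the real structure is not known here.

```python
import json
import re

# Matches the JSON payload that Next.js pages commonly embed in the HTML.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def extract_next_data(html):
    """Return the embedded JSON as a dict, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical page fragment, not real Jianshu output.
sample_html = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"article": {"views_count": 123}}}'
    '</script></body></html>'
)
data = extract_next_data(sample_html)
```

If the tag is missing from the real page, the data is loaded some other way and this approach does not apply.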

I get this error when installing with pip

value:InfoSpider:% pip install -r requirements.txt                     <master>
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement matplotlib==3.2.0 (from -r requirements.txt (line 1)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 3.0.0rc2, 3.0.0, 3.0.1, 3.0.2, 3.0.3)
ERROR: No matching distribution found for matplotlib==3.2.0 (from -r requirements.txt (line 1))
value:InfoSpider:%                                                       <master>
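
The candidate list stops at 3.0.3, which usually means pip is running under an interpreter too old for the pinned release; matplotlib 3.2.0 needs Python 3.6 or later. That diagnosis is an inference from the log, not something the reporter confirmed. A quick check, with `interpreter_supported` as a hypothetical helper:

```python
import sys

# Minimum Python for matplotlib 3.2.0; the other pins in requirements.txt
# may raise this further.
MIN_PYTHON = (3, 6)


def interpreter_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum
```

Running `python -V` and `pip -V` side by side shows whether pip is bound to the interpreter the user expects.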

Is there no Weibo data source?

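Request for a business promotion partnership

Hello author, we are a professional IP proxy provider, 极速HTTP. Registering and verifying an account comes with 10,000 free IPs (your learners are welcome to take advantage of the trial :) ). We would like to discuss whether a commercial promotion partnership is possible. If you are interested, you can reach me on WeChat: 13982004324. Thank you. (If not, sorry for the disturbance.)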

Is there no Weibo data source?

how to get all user facebook id

Crawl failed

systeminfo:
[screenshot]

C:\Users\stsg0>python -V
Python 3.7.9

C:\Users\stsg0>pip -V
pip 20.2.3 from c:\users\stsg0\appdata\local\programs\python\python37\lib\site-packages\pip (python 3.7)

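1. Clicking QQ Mail shows no input dialog; a "crawl failed" notice appears immediately in the lower-right corner.
[screenshot]
2. Clicking NetEase Mail produces an error in the console.
[screenshot]

chromeDriver is already running.
[screenshot]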
