itstyren / cnki-download

:frog: Crawler for downloading CNKI (知网) papers and generating quick paper overviews

License: MIT License

Python 100.00%

cnki-download's Issues

Error raised after a CAPTCHA appears

Config contents:

[crawl]
; Crawl and download switches: 0 = off, 1 = on
isDownloadFile = 0
isCrackCode=1
isDetailPage=1
isDownLoadLink=0
stepWaitTime=3

Error output:

正在下载: 高中政治教学中渗透科学精神核心素养路径初探.caj
正在下载: 试论初中语文教学中学生表达能力的培养策略.caj
ERROR:root:出现验证码
Traceback (most recent call last):
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 144, in parse_page
    tr_table.tr.extract()
AttributeError: 'NoneType' object has no attribute 'tr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 259, in <module>
    if __name__ == '__main__':
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 253, in main
    search = SearchTools()
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 98, in search_reference
    self.parse_page(
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 149, in parse_page
    crack.get_image(self.get_result_url, self.session,
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/CrackVerifyCode.py", line 34, in get_image
    self.current_url = re.search(r'(.*?)#', current_url).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Relevant code I located

The failing statement is the last line:

class CrackCode(object):
    def get_image(self, current_url, session, page_source):
        '''
        Fetch the CAPTCHA image
        '''
        self.header = HEADER
        self.session = session
        # Extract the CAPTCHA image URL
        imgurl_pattern_compile = re.compile(r'.*?<img src="(.*?)".*?')
        img_url = re.search(imgurl_pattern_compile, page_source).group(1)
        # re.search returns None when current_url contains no '#'
        self.current_url = re.search(r'(.*?)#', current_url).group(1)
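
A minimal sketch of one way to avoid the `'NoneType' object has no attribute 'group'` failure on that last line, assuming the intent is simply to drop everything from the first `#` onward. `strip_fragment` is a hypothetical helper name, not something in the project; `urldefrag` comes from the standard library.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return url without its '#fragment'; a URL with no '#' is returned unchanged."""
    return urldefrag(url).url


# Works whether or not the URL carries a fragment.
print(strip_fragment("https://kns.cnki.net/kns/brief/brief.aspx?t=1#J_ORDER"))
print(strip_fragment("https://kns.cnki.net/kns/brief/brief.aspx?t=1"))
```

Unlike `re.search(r'(.*?)#', ...)`, this never returns `None`, so there is nothing to call `.group(1)` on and no extra check is needed.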

Even after changing every link to https, the error below is still raised. How can this be fixed?

D:\chromedownloads\CNKI-download-master\CNKI-download-master>python main.py
－－－－－－－－－－－－－－－－－－－－－－－－－－
|　　　　　　　　　　　　　　　　　　　　　　　　　|
|　请选择检索条件：（可多选）　　　　　　　　　　　|
|（ａ）主题　　　（ｂ）关键词　　　（ｃ）篇名　　　|
|（ｄ）摘要　　　（ｅ）全文　　　　（ｆ）被引文献　|
|（ｇ）中图分类号　　　　　　　　　　　　　　　　　|
|　　　　　　　　　　　　　　　　　　　　　　　　　|
－－－－－－－－－－－－－－－－－－－－－－－－－－
请选择（以空格分割，如a c）：a
－－－－－－－－－－－－－－－－－－－－－－－－－－
您选择的是：
主题 |
－－－－－－－－－－－－－－－－－－－－－－－－－－
请输入【主题】：asdf
－－－－－－－－－－－－－－－－－－－－－－－－－－
是否需要规定文献来源（y/n）？y
输入文献来源期刊名称：
正在检索中.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
检索到4条结果，全部下载大约需要00小时00分钟20秒。
是否要全部下载（y/n）?y
正在下载: 中信：决战澳矿.caj
Traceback (most recent call last):
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
conn.connect()
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connection.py", line 364, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connection.py", line 374, in _match_hostname
match_hostname(cert, asserted_hostname)
File "D:\Users\18301\anaconda3\lib\ssl.py", line 334, in match_hostname
% (hostname, ', '.join(map(repr, dnsnames))))
ssl.SSLCertVerificationError: ("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Users\18301\anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DJJDK202009024&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E6%259B%25BE%25E6%2599%25A8%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=199bceef-d913-a550-9ff0-b5614a82b64&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 188, in parse_page
self.download_url)
File "D:\chromedownloads\CNKI-download-master\CNKI-download-master\GetPageDetail.py", line 73, in get_detail_page
params=params)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DJJDK202009024&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E6%259B%25BE%25E6%2599%25A8%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=199bceef-d913-a550-9ff0-b5614a82b64&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

D:\chromedownloads\CNKI-download-master\CNKI-download-master>
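
One possible reading of the `SSLCertVerificationError` above: a wildcard certificate name covers only a single left-most label, so a name such as `*.cnki.net` would match `kns.cnki.net` but not the two-level `i.shufang.cnki.net`. The sketch below illustrates that rule under stated assumptions. `wildcard_matches` is a hypothetical helper (not part of `ssl` or urllib3), it ignores partial wildcards, and it assumes the `'.cnki.net'` entry in the traceback was originally `'*.cnki.net'` with the asterisk lost when the page was rendered.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate-name check: '*' stands for exactly one left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    # A wildcard never spans several labels, so the label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


for name in ("kns.cnki.net", "i.shufang.cnki.net"):
    print(name, wildcard_matches("*.cnki.net", name))
```

If this reading is right, switching the scheme to https cannot help for this host, because none of the names listed in the certificate covers it.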

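How do I download when an account login is required?

I am off campus without a public IP address, so I have to log in with a username and account credentials before I can download. How can this be done?

CAPTCHA problem

After crawling a little over a hundred papers, a CAPTCHA problem occurred with the following message: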
ERROR:root:出现验证码
Traceback (most recent call last):
File "main.py", line 144, in parse_page
tr_table.tr.extract()
AttributeError: 'NoneType' object has no attribute 'tr'

The error occurs whether or not CAPTCHA recognition is enabled. Could you help find the cause? Thanks.
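
The traceback in the first report shows `parse_page` and `get_another_page` calling each other over and over once the CAPTCHA page comes back. A hedged sketch of a bounded alternative follows. `fetch_until_clear`, `fetch`, and `is_captcha` are hypothetical names introduced for illustration, and how the project would actually re-request a page is not shown in this thread.

```python
import time
from typing import Callable, Optional


def fetch_until_clear(fetch: Callable[[], str],
                      is_captcha: Callable[[str], bool],
                      max_attempts: int = 3,
                      wait_seconds: float = 0.0) -> Optional[str]:
    """Call fetch() up to max_attempts times; return the first non-CAPTCHA page, else None."""
    for attempt in range(max_attempts):
        page = fetch()
        if not is_captcha(page):
            return page
        # Wait a little longer after each blocked attempt before retrying.
        time.sleep(wait_seconds * (attempt + 1))
    return None


# Simulated responses: two CAPTCHA pages, then a normal page.
pages = iter(["captcha", "captcha", "results"])
print(fetch_until_clear(lambda: next(pages), lambda p: p == "captcha"))
```

Returning `None` after a fixed number of attempts lets the caller stop cleanly instead of recursing until a later parsing step fails.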

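How should download links be resolved? Link format: https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018

I got CNKI login via the school IP working, but downloading a paper requires a CAPTCHA (for every single paper). Requests from a real browser download the paper directly (a Selenium-driven browser also gets a CAPTCHA for every paper). Is some parameter missing, or is it something else?
I looked at the download part of CNKI-download: it is just a plain GET request with headers added, and it returns a 404.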

import requests
headers = {
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'zh-CN,zh;q=0.9,en-GB;q=0.8,en;q=0.7',
        # 'Cookie': 'SID=020197; Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"DX0434","ShowName":"%e6%b5%99%e6%b1%9f%e7%90%86%e5%b7%a5%e5%a4%a7%e5%ad%a6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"0rHTHE"}; c_m_LinID=LinID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=06/19/2020 13:54:08; LID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!; c_m_expire=2020-06-19 13:54:08; Ecp_session=1; ASP.NET_SessionId=vughxubnlqvnxrf0vtd0brwz; Ecp_ClientId=5200619133401915832'
    }
session = requests.Session()
session.headers.update(headers)
# IP-based login
r = session.get(
    'https://login.cnki.net/TopLogin/api/loginapi/IpLoginFlush')
r.encoding = r.apparent_encoding
# print(r.text)
res = session.get('https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018')
res.encoding = res.apparent_encoding
# print(res.headers)
print(res.text)

output

</head>
<body>
    <div class="c_verify-box">
        <form method="post" onsubmit="return validate();">
            <h3 class="title">安全验证</h3>
            <p class="c_verify-desc">您当前的IP为：183.134.192.27，您的操作过于频繁，为保障帐
户的正常使用，请输入验证码：</p>
            <dl class="c_verify-code">
                <dt><img id="vImg" src="/kdoc/request/ValidateCode.ashx?t=1577242936454" alt="验证码" title="点击切换验证码"></dt>
                <dd>
                    <p class="tips" id="tips"></p>
                    <input type="password" id="vcode" name="vcode" maxlength="4"><button class="c_btn" type="submit">提交</button>
                </dd>
            </dl>
        </form>
    </div>

</body>
</html>
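
A small sketch of how a script could recognise this verification page before trying to parse it as a result list. The two markers are copied from the output above; `looks_like_captcha_page` is a hypothetical helper, and CNKI may well change this markup, so treat the marker list as an assumption.

```python
# Substrings seen in the verification page shown above (assumed to stay stable).
CAPTCHA_MARKERS = ("c_verify-box", "ValidateCode.ashx")


def looks_like_captcha_page(html: str) -> bool:
    """Return True if the HTML contains any known verification-page marker."""
    return any(marker in html for marker in CAPTCHA_MARKERS)


sample = '<div class="c_verify-box"><img id="vImg" src="/kdoc/request/ValidateCode.ashx?t=1"></div>'
print(looks_like_captcha_page(sample))
print(looks_like_captcha_page("<table><tr><td>ok</td></tr></table>"))
```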

Failed to get the CAPTCHA. How can I get around this?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAIdIRdFWpKA4BYUdQL6KgtZbgvJz/EQj5rQumcYDL2xlhbi/alS8mbfX8EEY1efJIm0syhzU+O3mZ5ahqVI454K" />

        $(document).ready(function () {

            qkInfoCall();
            setAuShow();
            // GetHeat();
           
            window.parent.HideWaitDiv();
            SetFrameHeight();
            isHasAddFav();

            try{
                parent.window.adsContainer.loadAds(parent.document.getElementById("txt_1_value1").value);
            }
            catch(e){}
        });

    </script>
    <table border=0  ><tr><td>记录集失效</td></tr></table>

    <input name="tpagemode" type="hidden" id="tpagemode" value="L" />
</form>
<script type="text/javascript">
try{}catch(e){}
    ChangeDownloadImg();
    RevertUserSelect();
    briefTableListJSEvent();
    BindOnlick_ShowWait();
    BindTurnPage_TitleTip();
    parent.$("#zyzklist").hide();
    //外文推荐
    // recWWJDAddToTable();20170921,增加中英文混合检索，不需要再加载外文推荐
    var analysisURL = "/KVisual/ArticleAnalysis/index";
    modifySql();
    function modifySql() {
        var param = "";
        if (param == null || param == "") {
            var obj = parent.document.getElementById("sql");
            if (obj) {
                obj.value = "2827E4B6502D8710744CC7825A00F3F82BAB6FF9F49C28A8C06DBD3C5D73A36E7A6B95F18DA4019E021F3F1691F6B0A03B99C056E48A0254F8D0AFE1AAB57A9BFDCBBFFEEAAFA080E188818637CF6AADB3910F9CB0D5384C288BBBD10EE5B756BFAE86E762F5587544067EFCB6335F1551B1752FED7007F848A2F65F6361E4969CA97A467AE7DFF1D65FA2333691AE914B807EA865F98F2B4DA7F0E5B53CEDD31A34E99814BE79036EC7A23B28568767B543605EEB42085FF85A2AC02FD02AE188F025E7ADBB5D5456124701C643F785C0E8F466CEE182F0A51495CB44F3F039D6E5D62B005E08337F47C8371201A0DFFCD7B64073A1CD0D600811A47AC221B26485DE690B81866288498CD8DECB643D5A64546FA6FF6D41267ACBE6078EC4D35DF08B166A076AEAA5A7E0C875747A661813A88146D8A0137BBB953F17A54818672367305E80A265A56051CB57C24AD39C2D00E0684CDFA37DD96554F37EE38FD19E0CD5CE82D88DA5FAE4A2031AE3E919BB498FF0449A5F52A7D842DC60B2BD843E9B9509F4BC42505450294895655B83E5650C9144C860DA8E88EE4C6B08E27624BDE654E1FC7AF299653113BC029D0992FBF45C30DBB551D112D5C03A389CD1052A01C8786C738A9F5DF0C441D49E11AFA9584FF3A277196FFB1CA6BDC1A25E6772206DF8EFC2D5447DFAB86DBAA1613C34E184FC2B7B55377B7884B29AECB4936D0C467D89B9E4E9369F64918AEBD8384D5D249B77A9B49004D8D15D3A7ED0C89DFFE7113205E7BB1299D4FC6B0DA8ACB80F7FAC8108D4A4E64B60670662A952D1BE0AE397082DA211E56C8C828AA8E92C268A3FBC1B6198341E104077130909FA61E6683103C1254083F67147DEBD755F6092E3F90395E0CA27CAB3B84317BB47FA03DB85EEBC5B615F588F9DD26A526A277A46AD88D604D532A35D63E94F900E98D9D0C37B0A7BEC09EDDB1D89099BCBF1F2A3A8E1653D4EDD15965D90A79F1A31D6B9BF54835DF333410FFD5BA72C9A8D7B57E62F44302072FFE974835BDE3FE5299B779AF41A80BD39D540926EDC484B56409B2C66FFC44338DD0F61DF4706323FF89C933DADC03DC5BE11F75426D85B473DCFAE42917F52A585ADD81ED18A1EA75F13D4F70F5E8EA50D223A9342048E7986AECE95607D7476F386A9%";
        }
    }
}
//绑定分析
$("#analysisBox").hoverDelay({
    hoverDuring: 200,
    outDuring: 0,
    hoverEvent: function () {
        var $this = $(this);
        //显示数字
        var fileNameS = new FileNameS();
        var pcnt = fileNameS.Count();
        var rcnt = 643822;
        var ptext = pcnt > -10 ? "<span>(" + pcnt + ")</span>" : "";//始终为真
        var rtext = rcnt > -10 ? "<span>(" + rcnt + ")</span>" : "";
        $this.find("a").eq(0).html("已选文献分析" + ptext);
        //$this.find("a").eq(1).html("检索结果");
        $this.find(".imiSelDp").show();
    },
    outEvent: function () {
        $(this).find(".imiSelDp").hide();
    }
});
    //排序方式缓存  add by LH 2017-7-26
$("#J_ORDER .Btn5 a").click(function() {
    SetSortTypeCookie(this);
});
function SetSortTypeCookie(elm) {
    var sorttype = GetQueryStringByName($(elm).attr("href"), "sorttype");
    var Days = 7;
    var exp = new Date();
    exp.setTime(exp.getTime() + Days * 24 * 60 * 60 * 1000);
    var dbcode = GetQueryStringByName(window.location.href, "dbPrefix");
    document.cookie = "KNS_SortType" + "=" + escape(dbcode+"!"+sorttype) + ";expires=" + exp.toGMTString() + ";path=/";
}
window.document.onclick = parent.OnclickForHideMoredo;

</script>
<style type="text/css">
    .flyingAdd
    { left: -100px;
        top: 0px;
        position: absolute;
        width: 50px;
        text-align: center;
        height: 50px;
        font-size: 50px;
        color: #999;
        z-index: 50000;
    }
    /*等待*/
    .loading {
        position: absolute;
        width: 232px;
        height: 32px;
        z-index: 300;
        background: url(../images/gb/loading.gif) no-repeat scroll center center transparent;
    }
</style>
<div style="left: -1000px; top: -100px; opacity: 1; font-size: 50px;" class="flyingAdd">
    <img src="../images/gb/checkboxbook.png" alt="" />
</div>
<script type="text/javascript">
    LoadScript('/KRS/scripts/Recommend.js');
    LoadScript('//piccache.cnki.net/kdn/nvsmkns/script/piwikCommon70.js');
</script>

Download code

main.py, line 218:
refence_file = requests.get(self.download_url, headers=HEADER)
Should this be changed to:
refence_file = self.session.get(self.download_url) ?

What should I do when I run into this problem? Many thanks

Traceback (most recent call last):
File "F:\Python\CNKI-爬虫download\main.py", line 27, in <module>
from GetPageDetail import page_detail
File "F:\Python\CNKI-爬虫download\GetPageDetail.py", line 203, in <module>
page_detail = PageDetail()
File "F:\Python\CNKI-爬虫download\GetPageDetail.py", line 39, in __init__
if config.crawl_isDownLoadLink == '1':
File "F:\Python\CNKI-爬虫download\GetConfig.py", line 30, in __get__
value = self.func(instance)
File "F:\Python\CNKI-爬虫download\GetConfig.py", line 75, in crawl_isDownLoadLink
return int(self.conf.get('crawl', 'isDownLoadLink'))
File "F:\Anaconda\envs\tensorflow-gpu\lib\configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "F:\Anaconda\envs\tensorflow-gpu\lib\configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'crawl'
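
`ConfigParser.read()` silently skips files it cannot open and returns the list of files it actually parsed, so a wrong working directory or config path only surfaces later as `NoSectionError`. Whether that is the cause here is an assumption. The sketch below fails early instead; `load_config` and the file name `Config.ini` are hypothetical.

```python
import configparser
from pathlib import Path


def load_config(path: Path) -> configparser.ConfigParser:
    """Load an INI file and fail early if it is missing or lacks a [crawl] section."""
    conf = configparser.ConfigParser()
    # read() returns the files it parsed; a missing file is skipped without error.
    loaded = conf.read(path, encoding="utf-8")
    if not loaded:
        raise FileNotFoundError(f"config file not found: {path}")
    if not conf.has_section("crawl"):
        raise KeyError(f"no [crawl] section in {path}")
    return conf


# Hypothetical usage: resolve the file next to the script rather than relying
# on the current working directory ("Config.ini" is an assumed file name).
# conf = load_config(Path(__file__).resolve().parent / "Config.ini")
```

Passing an explicit encoding also avoids depending on the platform default when the file contains Chinese comments.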

What is causing these errors? Many thanks

Traceback (most recent call last):
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 259, in <module>
main()
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 106, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'

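Cannot download papers by a specified title

Probably nothing was found. Single-condition searches (subject, keyword, and so on) all raise this error.
Traceback (most recent call last):
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 246, in <module>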
main()
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 240, in main
search.search_reference(get_uesr_inpt())
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 87, in search_reference
second_get_res.text).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Sorry to bother you, but this problem has troubled me for a long time. I look forward to your answer

请输入【主题】：python
请输入【篇名】：网络
请输入【篇名】条件类型:（ａ）并且　（ｂ）或者　（ｃ）不含 c
－－－－－－－－－－－－－－－－－－－－－－－－－－
是否需要规定文献来源（y/n）？y
输入文献来源期刊名称：电子技术与软件工程
正在检索中.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
检索到85条结果，全部下载大约需要00小时07分钟05秒。
是否要全部下载（y/n）?n
请输入需要下载的数量：1
开始下载前1页所有文件，预计用时00小时01分钟40秒
－－－－－－－－－－－－－－－－－－－－－－－－－－
正在下载: Python在商品销售数据分析中的使用.caj
Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\util\connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1037, in _send_output
self.send(msg)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 975, in send
self.connect()
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 205, in connect
conn = self._new_conn()
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='i.shufang.cnki.net', port=80): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DDZRU202210049&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3Dpython%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=726a6f53-1896-b19c-b08a-c7edde6fcf0&action=file&userName=&td=1544605318654 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 249, in <module>
main()
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 243, in main
search.search_reference(get_uesr_inpt())
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 88, in search_reference
self.parse_page(
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 176, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "D:\idea\Data_mining\数据挖掘\zhiwang\GetPageDetail.py", line 70, in get_detail_page
self.session.get(
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\adapters.py", line 565, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='i.shufang.cnki.net', port=80): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DDZRU202210049&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3Dpython%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=726a6f53-1896-b19c-b08a-c7edde6fcf0&action=file&userName=&td=1544605318654 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Problem when searching for papers

检索到69条结果，全部下载大约需要00小时05分钟45秒。
是否要全部下载（y/n）?y
正在下载: 基于文字识别技术的作业自动批改系统.caj
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 188, in parse_page
self.download_url)
File "D:\paper_search_program\CNKI-download-master\GetPageDetail.py", line 80, in get_detail_page
self.pars_page(get_res.text)
File "D:\paper_search_program\CNKI-download-master\GetPageDetail.py", line 89, in pars_page
orgn_list = soup.find(name='div', class_='orgn').find_all('a')
AttributeError: 'NoneType' object has no attribute 'find_all'

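How can this be fixed? I hope the author can help. Thanks!

Error after "正在检索中" ("Searching") appears

Why are the files I download only 1 KB? The same download link works fine when pasted into a browser.

Cannot search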

python main.py
－－－－－－－－－－－－－－－－－－－－－－－－－－
|　　　　　　　　　　　　　　　　　　　　　　　　　|
|　请选择检索条件：（可多选）　　　　　　　　　　　|
|（ａ）主题　　　（ｂ）关键词　　　（ｃ）篇名　　　|
|（ｄ）摘要　　　（ｅ）全文　　　　（ｆ）被引文献　|
|（ｇ）中图分类号　　　　　　　　　　　　　　　　　|
|　　　　　　　　　　　　　　　　　　　　　　　　　|
－－－－－－－－－－－－－－－－－－－－－－－－－－
请选择（以空格分割，如a c）：c
－－－－－－－－－－－－－－－－－－－－－－－－－－
您选择的是：
篇名 |
－－－－－－－－－－－－－－－－－－－－－－－－－－
请输入【篇名】：汉服
－－－－－－－－－－－－－－－－－－－－－－－－－－
是否需要规定文献来源（y/n）？n
正在检索中.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 107, in pre_parse_page
page_source).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

CAPTCHA is requested repeatedly after about 1,000 papers

I hit the CAPTCHA after about 200.

How do I fix AttributeError: 'NoneType' object has no attribute 'find_all'?

Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 259, in <module>
main()
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 98, in search_reference
self.parse_page(
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 186, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "C:\Users\Administrator\Desktop\CNKI-download\GetPageDetail.py", line 80, in get_detail_page
self.pars_page(get_res.text)
File "C:\Users\Administrator\Desktop\CNKI-download\GetPageDetail.py", line 89, in pars_page
orgn_list = soup.find(name='div', class_='orgn').find_all('a')
AttributeError: 'NoneType' object has no attribute 'find_all'

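Cannot download files

After downloading the project from GitHub and configuring it, the .caj files I get contain web page source code rather than CAJ documents.

The downloaded files are all only 2 KB. How can this be fixed? Thanks!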
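
Files of 1 to 2 KB that open as page source suggest the server answered with an HTML page (for example a login or verification page) instead of the document. A minimal sketch of checking the payload before saving it follows; `looks_like_html` is a hypothetical helper, and the check is a heuristic on the first bytes only.

```python
def looks_like_html(payload: bytes) -> bool:
    """Heuristic: does the payload start like an HTML page rather than a binary file?"""
    # Drop a UTF-8 byte order mark and leading whitespace before inspecting.
    head = payload[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith((b"<!doctype html", b"<html")) or b"<html" in head


print(looks_like_html(b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">'))
print(looks_like_html(b"%PDF-1.4 binary data"))
```

A download loop could refuse to write a `.caj` file when this returns `True` and report the response instead, which makes the failure visible right away.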

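How to search for non-journal papers (such as theses and dissertations)

In practice, I found that once a journal source is specified, the program only retrieves journal content and cannot retrieve non-journal literature. For example, with the source set to "xx大学" (xx University), the results come from "xx大学学报" (Journal of xx University) and the database is "期刊" (journals). Is there a way to retrieve master's and doctoral theses?
After reading the code, I found that this search condition is passed as the parameter 'magazine_value1'. I want to change this parameter and tried several approaches, but could not work out what should be passed instead. My knowledge of crawlers and networking is quite limited, so I would like to know how this should be modified. Thanks.

Cannot search for papers normally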

－－－－－－－－－－－－－－－－－－－－－－－－－－
|　　　　　　　　　　　　　　　　　　　　　　　　　|
|　请选择检索条件：（可多选）　　　　　　　　　　　|
|（ａ）主题　　　（ｂ）关键词　　　（ｃ）篇名　　　|
|（ｄ）摘要　　　（ｅ）全文　　　　（ｆ）被引文献　|
|（ｇ）中图分类号　　　　　　　　　　　　　　　　　|
|　　　　　　　　　　　　　　　　　　　　　　　　　|
－－－－－－－－－－－－－－－－－－－－－－－－－－
请选择（以空格分割，如a c）：c
－－－－－－－－－－－－－－－－－－－－－－－－－－
您选择的是：
篇名 |
－－－－－－－－－－－－－－－－－－－－－－－－－－
请输入【篇名】：贫化铀
－－－－－－－－－－－－－－－－－－－－－－－－－－
是否需要规定文献来源（y/n）？n
正在检索中.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
Traceback (most recent call last):
File "main.py", line 259, in <module>
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 107, in pre_parse_page
page_source).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

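Error when retrieving only paper metadata; abstracts and keywords cannot be crawled

Error message:
While retrieving paper information, a "NoneType...find_all('a')" error is raised.
Workaround:
I added an if check that skips the record when the required information (the author's affiliation) cannot be found, but then the generated Excel file contains no abstracts or keywords at all.
Suspected cause:
I printed the crawled soup and found that the fetched HTML contains no abstract, although the same article does show an abstract on the website. Has CNKI's interface changed again? My understanding of crawlers is shallow, so I sincerely hope the author can find time to answer. Thanks!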

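CNKI anti-crawling

CNKI changed its page source: the element that holds the content after a search

has been hidden, so the crawled page source contains no search results. Error: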
Traceback (most recent call last):
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 263, in <module>
main()
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 257, in main
search.search_reference(get_uesr_inpt())
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 100, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 110, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'
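The regular expression finds no match, so re.search returns None and the group() call fails.
CNKI changed

Has anyone managed to read the correct tags starting from this code?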
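
The `'NoneType' object has no attribute 'group'` error reported here can at least be turned into an explicit "no match" result, so the caller can tell a changed page layout from a real result. A minimal sketch follows. The pattern and the sample markup are hypothetical, since the real `reference_num_pattern_compile` in main.py is not shown in this thread:

```python
import re

# Hypothetical pattern; the real one lives in main.py and is not shown here.
RESULT_COUNT_PATTERN = re.compile(r"找到\s*([\d,]+)\s*条结果")


def extract_result_count(page_source):
    """Return the number of results, or None if the page carries no count."""
    match = RESULT_COUNT_PATTERN.search(page_source)
    if match is None:
        # Layout change, captcha page, or empty response: let the caller decide.
        return None
    return int(match.group(1).replace(",", ""))
```

A caller can then stop with a clear message, or dump the page source for inspection, when the function returns `None`.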

Error while downloading

The traceback is as follows:
Traceback (most recent call last):
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
self._prepare_proxy(conn)
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 930, in _prepare_proxy
conn.connect()
File "D:\Python-3.83\lib\site-packages\urllib3\connection.py", line 396, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "D:\Python-3.83\lib\site-packages\urllib3\connection.py", line 406, in _match_hostname
match_hostname(cert, asserted_hostname)
File "D:\Python-3.83\lib\ssl.py", line 416, in match_hostname
raise CertificateError("hostname %r "
ssl.SSLCertVerificationError: ("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Python-3.83\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 724, in urlopen
retries = retries.increment(
File "D:\Python-3.83\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DTYGY20210331002&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E8%25AF%25AD%25E9%259F%25B3%25E8%25AF%2586%25E5%2588%25AB%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=d522e520-357b-a254-9bd3-9e95fdce484&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 259, in <module>
main()
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 98, in search_reference
self.parse_page(
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 186, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "C:\Users\ASUS\PycharmProjects\1111111\知网\CNKI-download-master\GetPageDetail.py", line 70, in get_detail_page
self.session.get(
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 677, in send
history = [resp for resp in gen]
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 677, in <listcomp>
history = [resp for resp in gen]
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 237, in resolve_redirects
resp = self.send(
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "D:\Python-3.83\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DTYGY20210331002&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E8%25AF%25AD%25E9%259F%25B3%25E8%25AF%2586%25E5%2588%25AB%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=d522e520-357b-a254-9bd3-9e95fdce484&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))
I would appreciate your help.
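
The certificate named in the traceback lists hosts such as 'www.cnki.net' and what look like wildcard entries, but nothing that covers 'i.shufang.cnki.net'. Under the usual matching rule a wildcard stands for exactly one leftmost label, so even '*.cnki.net' would not match a host with two extra labels. The sketch below is a simplified illustration of that rule, not the `ssl` module's actual implementation. `wildcard_matches` is a hypothetical helper, and reading '.cnki.net' in the log as '*.cnki.net' is an assumption:

```python
def wildcard_matches(hostname, pattern):
    """Simplified check: '*' may replace only the entire leftmost label."""
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # Compare everything after the wildcard label.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


print(wildcard_matches("www.cnki.net", "*.cnki.net"))        # True
print(wildcard_matches("i.shufang.cnki.net", "*.cnki.net"))  # False
```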

Error

AttributeError: 'NoneType' object has no attribute 'group'

OCR recognition problem

Problem description

The code as forked does not work out of the box,
so I modified it a little:

    def depoint(self, img):
        """Denoise a binarized image by whitening pixels surrounded by white."""
        pixdata = img.load()
        w, h = img.size
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                count = 0
                if pixdata[x, y - 1] > 245:  # above
                    count = count + 1
                if pixdata[x, y + 1] > 245:  # below
                    count = count + 1
                if pixdata[x - 1, y] > 245:  # left
                    count = count + 1
                if pixdata[x + 1, y] > 245:  # right
                    count = count + 1
                if pixdata[x - 1, y - 1] > 245:  # upper left
                    count = count + 1
                if pixdata[x - 1, y + 1] > 245:  # lower left
                    count = count + 1
                if pixdata[x + 1, y - 1] > 245:  # upper right
                    count = count + 1
                if pixdata[x + 1, y + 1] > 245:  # lower right
                    count = count + 1
                # More than 4 of the 8 neighbours are white: treat as noise
                if count > 4:
                    pixdata[x, y] = 255
        return img

    def imge2string(self, image, threshold):
        """
        Convert an image to a string.
        Binarize at the given threshold, then denoise.
        """

        # Convert to grayscale
        image = image.convert('L')
        # Binarize
        image = image.point(lambda x: 255 if x > threshold else 0)
        # Denoise further
        image = self.depoint(image)
        # Recognize. Still a problem here: tesserocr returns an empty result
        result = tesserocr.image_to_text(image)
        print(str(threshold) + " recognized captcha: " + str(result))
        return result

    def crack_code(self):
        '''
        Recognize the captcha automatically.
        '''
        image = Image.open('./data/crack_code.jpeg')
        # Grayscale conversion happens inside imge2string

        # Binarization threshold
        threshold = 127
        s1 = self.imge2string(image, threshold)
        s2 = self.imge2string(image, threshold+20)
        s3 = self.imge2string(image, threshold-20)
        # Submit a result only if at least two thresholds agree
        if s1 == s2 or s1 == s3:
            return self.send_code(str(s1))
        elif s2 == s3:
            return self.send_code(str(s2))

The problem occurs at result = tesserocr.image_to_text(image).
No matter how I run the recognition or preprocess the image, tesserocr always returns an empty result.
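
One side effect of empty OCR output in the code above: three empty strings compare equal, so `crack_code` would submit an empty captcha, and when no two results agree it returns `None` without saying so. A minimal sketch of the voting step that ignores empty results and makes the no-agreement case explicit follows. `majority_vote` is a hypothetical helper and does not address why tesserocr returns nothing:

```python
from collections import Counter


def majority_vote(candidates):
    """Return the value at least two candidates agree on, else None."""
    # Normalise whitespace and drop empty OCR results before counting.
    cleaned = [c.strip() for c in candidates if c and c.strip()]
    if not cleaned:
        return None
    value, count = Counter(cleaned).most_common(1)[0]
    return value if count >= 2 else None
```

With this helper, `crack_code` could call `majority_vote([s1, s2, s3])` and submit the captcha only when the result is not `None`.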

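I am very interested in this CNKI crawler project and hope to get in touch with you to collaborate and exchange ideas.

Has multithreading been implemented?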

A similar case where searching fails

Restrict the literature source (y/n)? n
Searching.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
Traceback (most recent call last):
File "/Users/Desktop/CNKI-download-master/main.py", line 259, in <module>
main()
File "/Users/Desktop/CNKI-download-master/main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "/Users/Desktop/CNKI-download-master/main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "/Users/Desktop/CNKI-download-master/main.py", line 106, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'

Asking for help

An error occurs when there are fewer than 20 search results (one page). I hope this can be fixed.

Searching.....
－－－－－－－－－－－－－－－－－－－－－－－－－－
Traceback (most recent call last):
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 254, in <module>
    main()
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 248, in main
    search.search_reference(get_uesr_inpt())
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 92, in search_reference
    second_get_res.text).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
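
The traceback above points at a regex whose match is `None` when all results fit on one page, presumably because such a page has no pager markup to parse (an assumption; the pattern itself is not shown). One way around this is to derive the page count from the result count instead. A minimal sketch follows. The page size of 20 comes from the issue title, and `page_count` is a hypothetical helper:

```python
import math


def page_count(total_results, per_page=20):
    """Number of result pages; any non-empty result list gives at least 1."""
    if total_results <= 0:
        return 0
    return math.ceil(total_results / per_page)
```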

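All downloaded files are 2 KB. How can I fix this?

The only change I made was switching http to https; nothing else was touched. Every file still downloads as 2 KB and is reported as corrupted when opened.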
