
cnki-download's People

Contributors

irimsky

cnki-download's Issues

Error after a CAPTCHA appears

Config contents:

[crawl]
; Crawl and download switches: 0 = off, 1 = on
isDownloadFile = 0
isCrackCode=1
isDetailPage=1
isDownLoadLink=0
stepWaitTime=3

Error output

正在下载: 高中政治教学中渗透科学精神核心素养路径初探.caj
正在下载: 试论初中语文教学中学生表达能力的培养策略.caj
ERROR:root:出现验证码
Traceback (most recent call last):
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 144, in parse_page
    tr_table.tr.extract()
AttributeError: 'NoneType' object has no attribute 'tr'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 259, in <module>
    if __name__ == '__main__':
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 253, in main
    search = SearchTools()
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 98, in search_reference
    self.parse_page(
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 195, in parse_page
    self.get_another_page(download_page_left)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 209, in get_another_page
    self.parse_page(download_page_left, get_res.text)
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/main.py", line 149, in parse_page
    crack.get_image(self.get_result_url, self.session,
  File "/Users/caizhicheng/Desktop/CNKI_download/CNKI-download/CrackVerifyCode.py", line 34, in get_image
    self.current_url = re.search(r'(.*?)#', current_url).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Relevant code I located

The last line:

class CrackCode(object):
    def get_image(self, current_url, session, page_source):
        '''
        Fetch the CAPTCHA image
        '''
        self.header = HEADER
        self.session = session
        # Get the CAPTCHA image URL
        imgurl_pattern_compile = re.compile(r'.*?<img src="(.*?)".*?')
        img_url = re.search(imgurl_pattern_compile, page_source).group(1)
        self.current_url = re.search(r'(.*?)#', current_url).group(1)
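
The final traceback frame fails because re.search returns None when current_url contains no '#', and calling .group(1) on None raises the AttributeError. A minimal sketch of a tolerant alternative, assuming the intent of that line is only to drop the URL fragment; the helper name strip_fragment is hypothetical and not part of the project:

```python
def strip_fragment(url: str) -> str:
    """Return the part of url before the first '#', or url unchanged if there is none."""
    # str.partition never fails: without a '#', the whole string lands in the first slot.
    head, _sep, _fragment = url.partition('#')
    return head


print(strip_fragment('https://kns.cnki.net/kns/brief/brief.aspx?sorttype=#J_ORDER'))
print(strip_fragment('https://kns.cnki.net/kns/brief/brief.aspx'))
```

This only removes the crash on that line. Whether the CAPTCHA flow then succeeds depends on what the server actually returned.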

Even after changing all links to https, the error below is still raised. How can this be solved?

D:\chromedownloads\CNKI-download-master\CNKI-download-master>python main.py
--------------------------
|                         |
| 请选择检索条件:(可多选)           |
|(a)主题   (b)关键词   (c)篇名   |
|(d)摘要   (e)全文    (f)被引文献 |
|(g)中图分类号                 |
|                         |
--------------------------
请选择(以空格分割,如a c):a
--------------------------
您选择的是:
主题 |
--------------------------
请输入【主题】:asdf
--------------------------
是否需要规定文献来源(y/n)?y
输入文献来源期刊名称:
正在检索中.....
--------------------------
检索到4条结果,全部下载大约需要00小时00分钟20秒。
是否要全部下载(y/n)?y
正在下载: 中信:决战澳矿.caj
Traceback (most recent call last):
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
conn.connect()
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connection.py", line 364, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connection.py", line 374, in _match_hostname
match_hostname(cert, asserted_hostname)
File "D:\Users\18301\anaconda3\lib\ssl.py", line 334, in match_hostname
% (hostname, ', '.join(map(repr, dnsnames))))
ssl.SSLCertVerificationError: ("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Users\18301\anaconda3\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "D:\Users\18301\anaconda3\lib\site-packages\urllib3\util\retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DJJDK202009024&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E6%259B%25BE%25E6%2599%25A8%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=199bceef-d913-a550-9ff0-b5614a82b64&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 259, in
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 188, in parse_page
self.download_url)
File "D:\chromedownloads\CNKI-download-master\CNKI-download-master\GetPageDetail.py", line 73, in get_detail_page
params=params)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "D:\Users\18301\anaconda3\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DJJDK202009024&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E6%259B%25BE%25E6%2599%25A8%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=199bceef-d913-a550-9ff0-b5614a82b64&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

D:\chromedownloads\CNKI-download-master\CNKI-download-master>
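
The certificate error above lists the names the server certificate covers, and i.shufang.cnki.net is not among them. One plausible reading, assuming the entries printed as '.cnki.net' were originally wildcards such as '*.cnki.net', is that a certificate wildcard covers exactly one DNS label, so it matches kns.cnki.net but not the two extra labels in i.shufang.cnki.net. The sketch below illustrates that one-label rule; wildcard_matches is a hypothetical helper for illustration, not the check the ssl module actually runs:

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate-name check: '*' may stand for the leftmost label only."""
    p_labels = pattern.lower().split('.')
    h_labels = hostname.lower().split('.')
    # A wildcard never spans dots, so the label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == '*':
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


for host in ('kns.cnki.net', 'i.shufang.cnki.net'):
    print(host, wildcard_matches('*.cnki.net', host))
```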

CAPTCHA problem

After collecting a little over a hundred papers, the CAPTCHA step failed with:
ERROR:root:出现验证码
Traceback (most recent call last):
File "main.py", line 144, in parse_page
tr_table.tr.extract()
AttributeError: 'NoneType' object has no attribute 'tr'

The error occurs whether or not CAPTCHA recognition is enabled. Could you help me find the cause? Thanks.

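Question: how to resolve download links of the form https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018

I got CNKI login via my school's IP working, but downloading a paper requires a CAPTCHA (for every single paper). A request from a real browser downloads the paper directly (a Selenium-driven browser also gets a CAPTCHA for every paper). Is some parameter missing, or is it something else?
I looked at the download part of CNKI-download: it is just a plain GET request with headers, and it returns a 404.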

import requests
headers = {
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'zh-CN,zh;q=0.9,en-GB;q=0.8,en;q=0.7',
        # 'Cookie': 'SID=020197; Ecp_LoginStuts={"IsAutoLogin":false,"UserName":"DX0434","ShowName":"%e6%b5%99%e6%b1%9f%e7%90%86%e5%b7%a5%e5%a4%a7%e5%ad%a6","UserType":"bk","BUserName":"","BShowName":"","BUserType":"","r":"0rHTHE"}; c_m_LinID=LinID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!&ot=06/19/2020 13:54:08; LID=WEEvREcwSlJHSldRa1FhcEFLUmVicE1SUFRzQTZEZW5Va0VWYitsa2NPMD0=$9A4hF_YAuvQ5obgVAqNKPCYcEjKensW4IQMovwHtwkF4VYPoHbKxJw!!; c_m_expire=2020-06-19 13:54:08; Ecp_session=1; ASP.NET_SessionId=vughxubnlqvnxrf0vtd0brwz; Ecp_ClientId=5200619133401915832'
}
session = requests.Session()
session.headers.update(headers)
# IP login
r = session.get(
    'https://login.cnki.net/TopLogin/api/loginapi/IpLoginFlush')
r.encoding = r.apparent_encoding
# print(r.text)
res = session.get('https://kns.cnki.net/kns/download.aspx?filename=WRGMhx2KSxkQxNUQD50cSZXZUlHTv8ma3I2RKlnbwpFMrJXcEpHc5dzUPF3Z1BneZFHNGhEdCdFUnJkRzh3ayU1dE9WSiZ2KQxUbGdETQl1KSp1dw40b1JWcpV3cxAzYqFGaydmNQlmSDlXNsRkcQZEZrZVTul2N&tablename=CJFDLAST2018')
res.encoding = res.apparent_encoding
# print(res.headers)
print(res.text)

output

</head>
<body>
    <div class="c_verify-box">
        <form method="post" onsubmit="return validate();">
            <h3 class="title">安全验证</h3>
            <p class="c_verify-desc">您当前的IP为:183.134.192.27,您的操作过于频繁,为保障帐
户的正常使用,请输入验证码:</p>
            <dl class="c_verify-code">
                <dt><img id="vImg" src="/kdoc/request/ValidateCode.ashx?t=1577242936454" alt="验证码" title="点击切换验证码"></dt>
                <dd>
                    <p class="tips" id="tips"></p>
                    <input type="password" id="vcode" name="vcode" maxlength="4"><button class="c_btn" type="submit">提交</button>
                </dd>
            </dl>
        </form>
    </div>

</body>
</html>

Failed to fetch the CAPTCHA. How do I fix this?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<title>概览页</title> <script type="text/javascript" src="//piccache.cnki.net/kdn/nvsmkns/script/jquery-1.4.2.min.js"></script> <script type="text/javascript" src="//piccache.cnki.net/kdn/nvsmkns/script/min/gb.BriefPage.min.js?v=D59787997F3B8FCE" ></script> <script type="text/javascript" src="//piccache.cnki.net/kdn/nvsmkns/script/WideScreen.js"></script>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdAAIdIRdFWpKA4BYUdQL6KgtZbgvJz/EQj5rQumcYDL2xlhbi/alS8mbfX8EEY1efJIm0syhzU+O3mZ5ahqVI454K" />
<script type="text/javascript"> function ShowGroup(p1, p2, p3) { return parent.ShowGroup(p1, p2, p3); }
        $(document).ready(function () {

            qkInfoCall();
            setAuShow();
            // GetHeat();
           
            window.parent.HideWaitDiv();
            SetFrameHeight();
            isHasAddFav();

            try{
                parent.window.adsContainer.loadAds(parent.document.getElementById("txt_1_value1").value);
            }
            catch(e){}
        });

    </script>
    <table border=0  ><tr><td>记录集失效</td></tr></table>

    <input name="tpagemode" type="hidden" id="tpagemode" value="L" />
</form>
<script type="text/javascript">
try{}catch(e){}
    ChangeDownloadImg();
    RevertUserSelect();
    briefTableListJSEvent();
    BindOnlick_ShowWait();
    BindTurnPage_TitleTip();
    parent.$("#zyzklist").hide();
    //外文推荐
    // recWWJDAddToTable();20170921,增加中英文混合检索,不需要再加载外文推荐
    var analysisURL = "/KVisual/ArticleAnalysis/index";
    modifySql();
    function modifySql() {
        var param = "";
        if (param == null || param == "") {
            var obj = parent.document.getElementById("sql");
            if (obj) {
                obj.value = "2827E4B6502D8710744CC7825A00F3F82BAB6FF9F49C28A8C06DBD3C5D73A36E7A6B95F18DA4019E021F3F1691F6B0A03B99C056E48A0254F8D0AFE1AAB57A9BFDCBBFFEEAAFA080E188818637CF6AADB3910F9CB0D5384C288BBBD10EE5B756BFAE86E762F5587544067EFCB6335F1551B1752FED7007F848A2F65F6361E4969CA97A467AE7DFF1D65FA2333691AE914B807EA865F98F2B4DA7F0E5B53CEDD31A34E99814BE79036EC7A23B28568767B543605EEB42085FF85A2AC02FD02AE188F025E7ADBB5D5456124701C643F785C0E8F466CEE182F0A51495CB44F3F039D6E5D62B005E08337F47C8371201A0DFFCD7B64073A1CD0D600811A47AC221B26485DE690B81866288498CD8DECB643D5A64546FA6FF6D41267ACBE6078EC4D35DF08B166A076AEAA5A7E0C875747A661813A88146D8A0137BBB953F17A54818672367305E80A265A56051CB57C24AD39C2D00E0684CDFA37DD96554F37EE38FD19E0CD5CE82D88DA5FAE4A2031AE3E919BB498FF0449A5F52A7D842DC60B2BD843E9B9509F4BC42505450294895655B83E5650C9144C860DA8E88EE4C6B08E27624BDE654E1FC7AF299653113BC029D0992FBF45C30DBB551D112D5C03A389CD1052A01C8786C738A9F5DF0C441D49E11AFA9584FF3A277196FFB1CA6BDC1A25E6772206DF8EFC2D5447DFAB86DBAA1613C34E184FC2B7B55377B7884B29AECB4936D0C467D89B9E4E9369F64918AEBD8384D5D249B77A9B49004D8D15D3A7ED0C89DFFE7113205E7BB1299D4FC6B0DA8ACB80F7FAC8108D4A4E64B60670662A952D1BE0AE397082DA211E56C8C828AA8E92C268A3FBC1B6198341E104077130909FA61E6683103C1254083F67147DEBD755F6092E3F90395E0CA27CAB3B84317BB47FA03DB85EEBC5B615F588F9DD26A526A277A46AD88D604D532A35D63E94F900E98D9D0C37B0A7BEC09EDDB1D89099BCBF1F2A3A8E1653D4EDD15965D90A79F1A31D6B9BF54835DF333410FFD5BA72C9A8D7B57E62F44302072FFE974835BDE3FE5299B779AF41A80BD39D540926EDC484B56409B2C66FFC44338DD0F61DF4706323FF89C933DADC03DC5BE11F75426D85B473DCFAE42917F52A585ADD81ED18A1EA75F13D4F70F5E8EA50D223A9342048E7986AECE95607D7476F386A9%";
        }
    }
}
//绑定分析
$("#analysisBox").hoverDelay({
    hoverDuring: 200,
    outDuring: 0,
    hoverEvent: function () {
        var $this = $(this);
        //显示数字
        var fileNameS = new FileNameS();
        var pcnt = fileNameS.Count();
        var rcnt = 643822;
        var ptext = pcnt > -10 ? "<span>(" + pcnt + ")</span>" : "";//始终为真
        var rtext = rcnt > -10 ? "<span>(" + rcnt + ")</span>" : "";
        $this.find("a").eq(0).html("已选文献分析" + ptext);
        //$this.find("a").eq(1).html("检索结果");
        $this.find(".imiSelDp").show();
    },
    outEvent: function () {
        $(this).find(".imiSelDp").hide();
    }
});
    //排序方式缓存  add by LH 2017-7-26
$("#J_ORDER .Btn5 a").click(function() {
    SetSortTypeCookie(this);
});
function SetSortTypeCookie(elm) {
    var sorttype = GetQueryStringByName($(elm).attr("href"), "sorttype");
    var Days = 7;
    var exp = new Date();
    exp.setTime(exp.getTime() + Days * 24 * 60 * 60 * 1000);
    var dbcode = GetQueryStringByName(window.location.href, "dbPrefix");
    document.cookie = "KNS_SortType" + "=" + escape(dbcode+"!"+sorttype) + ";expires=" + exp.toGMTString() + ";path=/";
}
window.document.onclick = parent.OnclickForHideMoredo;

</script>
<style type="text/css">
    .flyingAdd
    { left: -100px;
        top: 0px;
        position: absolute;
        width: 50px;
        text-align: center;
        height: 50px;
        font-size: 50px;
        color: #999;
        z-index: 50000;
    }
    /*等待*/
    .loading {
        position: absolute;
        width: 232px;
        height: 32px;
        z-index: 300;
        background: url(../images/gb/loading.gif) no-repeat scroll center center transparent;
    }
</style>
<div style="left: -1000px; top: -100px; opacity: 1; font-size: 50px;" class="flyingAdd">
    <img src="../images/gb/checkboxbook.png" alt="" />
</div>
<script type="text/javascript">
    LoadScript('/KRS/scripts/Recommend.js');
    LoadScript('//piccache.cnki.net/kdn/nvsmkns/script/piwikCommon70.js');
</script>

Download code

main.py, line 218:
refence_file = requests.get(self.download_url, headers=HEADER)
Should this be changed to:
refence_file = self.session.get(self.download_url) ?

What should I do about this problem? Many thanks.

Traceback (most recent call last):
File "F:\Python\CNKI-爬虫download\main.py", line 27, in
from GetPageDetail import page_detail
File "F:\Python\CNKI-爬虫download\GetPageDetail.py", line 203, in
page_detail = PageDetail()
File "F:\Python\CNKI-爬虫download\GetPageDetail.py", line 39, in init
if config.crawl_isDownLoadLink == '1':
File "F:\Python\CNKI-爬虫download\GetConfig.py", line 30, in get
value = self.func(instance)
File "F:\Python\CNKI-爬虫download\GetConfig.py", line 75, in crawl_isDownLoadLink
return int(self.conf.get('crawl', 'isDownLoadLink'))
File "F:\Anaconda\envs\tensorflow-gpu\lib\configparser.py", line 781, in get
d = self._unify_values(section, vars)
File "F:\Anaconda\envs\tensorflow-gpu\lib\configparser.py", line 1141, in _unify_values
raise NoSectionError(section)
configparser.NoSectionError: No section: 'crawl'
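
configparser.ConfigParser.read silently skips files it cannot open, so a missing or misplaced config file only surfaces later as NoSectionError. One possible cause here, stated as an assumption, is that the config file is looked up relative to the working directory and was not found. A minimal sketch that fails early with a clearer message; the helper name load_config and the file name Config.ini are hypothetical:

```python
import configparser
import tempfile
from pathlib import Path


def load_config(path: Path, section: str = 'crawl') -> configparser.ConfigParser:
    """Read an INI file and fail early if the file or the section is missing."""
    conf = configparser.ConfigParser()
    # read() returns the list of files it actually parsed; unreadable files are skipped.
    parsed = conf.read(path, encoding='utf-8')
    if not parsed:
        raise FileNotFoundError(f'config file not found: {path}')
    if not conf.has_section(section):
        raise KeyError(f'section [{section}] missing in {path}')
    return conf


# Demonstration with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / 'Config.ini'
    good.write_text('[crawl]\nisDownLoadLink = 0\n', encoding='utf-8')
    print(load_config(good).getint('crawl', 'isDownLoadLink'))
    try:
        load_config(Path(tmp) / 'missing.ini')
    except FileNotFoundError as exc:
        print('error:', exc)
```

In a real script, building the path from the script's own directory instead of the current working directory avoids the silent miss when the program is launched from elsewhere.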

What is causing these errors? Many thanks.

Traceback (most recent call last):
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 259, in
main()
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "C:\Users\Yuchl\Downloads\CNKI-download-master\main.py", line 106, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'

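Cannot download papers by specified title

Probably nothing was found. Single-field searches by subject, keyword, and so on all raise this error.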
Traceback (most recent call last):
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 246, in
main()
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 240, in main
search.search_reference(get_uesr_inpt())
File "C:/Users/Lenovo/Downloads/CNKI-download-master/CNKI-download-master/main.py", line 87, in search_reference
second_get_res.text).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Sorry to bother you, but this problem has troubled me for a long time. I look forward to your answer.

请输入【主题】:python
请输入【篇名】:网络
请输入【篇名】条件类型:(a)并且 (b)或者 (c)不含 c
--------------------------
是否需要规定文献来源(y/n)?y
输入文献来源期刊名称:电子技术与软件工程
正在检索中.....
--------------------------
检索到85条结果,全部下载大约需要00小时07分钟05秒。
是否要全部下载(y/n)?n
请输入需要下载的数量:1
开始下载前1页所有文件,预计用时00小时01分钟40秒
--------------------------
正在下载: Python在商品销售数据分析中的使用.caj
Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\util\connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1037, in _send_output
self.send(msg)
File "C:\Users\11815\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 975, in send
self.connect()
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 205, in connect
conn = self._new_conn()
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\venv\lib\site-packages\requests\adapters.py", line 489, in send
resp = conn.urlopen(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "D:\idea\Data_mining\venv\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='i.shufang.cnki.net', port=80): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DDZRU202210049&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3Dpython%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=726a6f53-1896-b19c-b08a-c7edde6fcf0&action=file&userName=&td=1544605318654 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 249, in
main()
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 243, in main
search.search_reference(get_uesr_inpt())
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 88, in search_reference
self.parse_page(
File "D:\idea\Data_mining\数据挖掘\zhiwang\main.py", line 176, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "D:\idea\Data_mining\数据挖掘\zhiwang\GetPageDetail.py", line 70, in get_detail_page
self.session.get(
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 600, in get
return self.request("GET", url, **kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "D:\idea\Data_mining\venv\lib\site-packages\requests\adapters.py", line 565, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='i.shufang.cnki.net', port=80): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DDZRU202210049&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3Dpython%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=726a6f53-1896-b19c-b08a-c7edde6fcf0&action=file&userName=&td=1544605318654 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000002B698451D20>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

Problem when searching for papers

检索到69条结果,全部下载大约需要00小时05分钟45秒。
是否要全部下载(y/n)?y
正在下载: 基于文字识别技术的作业自动批改系统.caj
Traceback (most recent call last):
File "main.py", line 259, in
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 188, in parse_page
self.download_url)
File "D:\paper_search_program\CNKI-download-master\GetPageDetail.py", line 80, in get_detail_page
self.pars_page(get_res.text)
File "D:\paper_search_program\CNKI-download-master\GetPageDetail.py", line 89, in pars_page
orgn_list = soup.find(name='div', class_='orgn').find_all('a')
AttributeError: 'NoneType' object has no attribute 'find_all'

How can this be solved? Author, I hope you can answer this. Thanks!

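Suggestion: download PDF

As I recall, CNKI has an API parameter that selects whether the file is PDF or CAJ. CAJ is a nuisance, and not all software can open that format.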

/kns/download?filename=5UjSyB3SXd0N18mWrImTGNVYTxETNF0QZhXMWl3R2RVTHRnYIVjRuBzT6dmarVEa5gHVGJEeCplQHJETrZ2Q40UQMVmeTNTZTFEM4cnerglV0hDOoVGVI5WRR5mWod2VilUZ2V2QFN1dqJ2ZKtSMZR0LrFWW1t0U&tablename=CAPJLAST&dflag=pdfdown

dflag=pdfdown gives the PDF download link
dflag=cajdown gives the CAJ download link
Apart from that, the other parameters are the same.
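
A minimal sketch of switching a download link between the two values by rewriting only the dflag query parameter. It assumes the server honours dflag as described above; the helper name with_dflag and the short filename value in the demo are placeholders, not real CNKI data:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dflag(url: str, dflag: str) -> str:
    """Return url with its dflag query parameter set to the given value."""
    parts = urlsplit(url)
    # Drop any existing dflag and keep every other parameter in its original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'dflag']
    query.append(('dflag', dflag))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_dflag('/kns/download?filename=abc&tablename=CAPJLAST&dflag=cajdown', 'pdfdown'))
```

Note that urlencode re-encodes the parameter values, so characters such as '/' in a value come back percent-encoded. That is equivalent for a conforming server but not byte-identical to the input.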

Cannot search

python main.py
--------------------------
|                         |
| 请选择检索条件:(可多选)           |
|(a)主题   (b)关键词   (c)篇名   |
|(d)摘要   (e)全文    (f)被引文献 |
|(g)中图分类号                 |
|                         |
--------------------------
请选择(以空格分割,如a c):c
--------------------------
您选择的是:
篇名 |
--------------------------
请输入【篇名】:汉服
--------------------------
是否需要规定文献来源(y/n)?n
正在检索中.....
--------------------------
Traceback (most recent call last):
File "main.py", line 259, in
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 107, in pre_parse_page
page_source).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

How do I fix AttributeError: 'NoneType' object has no attribute 'find_all'?

Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 259, in
main()
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 98, in search_reference
self.parse_page(
File "C:\Users\Administrator\Desktop\CNKI-download\main.py", line 186, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "C:\Users\Administrator\Desktop\CNKI-download\GetPageDetail.py", line 80, in get_detail_page
self.pars_page(get_res.text)
File "C:\Users\Administrator\Desktop\CNKI-download\GetPageDetail.py", line 89, in pars_page
orgn_list = soup.find(name='div', class_='orgn').find_all('a')
AttributeError: 'NoneType' object has no attribute 'find_all'

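Cannot download files

After downloading the project from GitHub and finishing the configuration, the .caj files I get contain web page source when opened, not CAJ documents.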
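
A .caj file that opens as page source suggests the server answered with an HTML page (for example a verification or error page) and the response body was saved as-is. A minimal sketch of a check that could run before writing the file; looks_like_html is a hypothetical helper and the test is only a heuristic on the first bytes:

```python
def looks_like_html(data: bytes) -> bool:
    """Heuristic: does the payload start like an HTML page rather than a binary file?"""
    # Skip a UTF-8 byte order mark and leading whitespace before comparing.
    head = data[:512].lstrip(b'\xef\xbb\xbf \t\r\n').lower()
    return head.startswith((b'<!doctype html', b'<html', b'<head', b'<body'))


print(looks_like_html(b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">'))
print(looks_like_html(b'\x00\x01binary payload'))
```

When the check is true, logging the response text instead of saving it makes the real cause visible.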

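How to search for non-journal papers (e.g. theses and dissertations)

In practice, I found that once a journal source is specified, the program only retrieves journal content and cannot retrieve non-journal literature. For example, with the source set to "xx University", the results come from "Journal of xx University" and the database is "Journals". Is there a way to retrieve master's and doctoral theses?
After reading the code, I found that this search condition is passed as the parameter "magazine_value1". I wanted to change this parameter and tried several approaches, but could not work out what should be passed. My knowledge of scraping and networking is quite shallow. How should this be modified? Thanks.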

Cannot search for papers properly

--------------------------
|                         |
| 请选择检索条件:(可多选)           |
|(a)主题   (b)关键词   (c)篇名   |
|(d)摘要   (e)全文    (f)被引文献 |
|(g)中图分类号                 |
|                         |
--------------------------
请选择(以空格分割,如a c):c
--------------------------
您选择的是:
篇名 |
--------------------------
请输入【篇名】:贫化铀
--------------------------
是否需要规定文献来源(y/n)?n
正在检索中.....
--------------------------
Traceback (most recent call last):
File "main.py", line 259, in
main()
File "main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "main.py", line 107, in pre_parse_page
page_source).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

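Error when retrieving only literature metadata; abstracts and keywords cannot be scraped

Error message:
When retrieving literature information, an error like "NoneType...find_all('a')" appears.
Workaround:
I added an if check that skips the record when the required information (author affiliation) cannot be found, but then the generated Excel file contains no abstracts or keywords at all.
Suspected cause:
I printed the scraped soup and found that the fetched HTML contains no abstract, even though the same article shows an abstract on the website. Has CNKI's interface changed again? My understanding of scraping is shallow, so I would sincerely appreciate it if the author could find time to answer. Thank you!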

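CNKI anti-scraping

CNKI changed its page source and hid the element that holds the content after a search, so the scraped page source contains no search results. The error: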
Traceback (most recent call last):
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 263, in
main()
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 257, in main
search.search_reference(get_uesr_inpt())
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 100, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "D:\code\CNKI-download-master (1)\CNKI-download-master\main.py", line 110, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'
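The regular expression finds no match, so re.search returns None and the group() call raises the error.
CNKI has changed

Has anyone managed to read the correct tags on the basis of this code?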
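
As the report says, `re.search` returns `None` when nothing matches, and calling `.group(1)` on it raises the `AttributeError`. A minimal sketch of a guarded lookup follows. The pattern below is a made-up placeholder, since the project's real `reference_num_pattern_compile` is not shown here, and `extract_result_count` is a hypothetical name. This does not make the hidden results reappear; it only turns the crash into a value the caller can test.

```python
import re
from typing import Optional

# Placeholder pattern (an assumption): the real pattern in main.py is not shown.
RESULT_COUNT_PATTERN = re.compile(r"results:\s*([\d,]+)")


def extract_result_count(page_source: str) -> Optional[int]:
    """Return the result count found in the page, or None if absent."""
    match = RESULT_COUNT_PATTERN.search(page_source)
    if match is None:
        # Page layout changed, or a captcha / empty page came back.
        return None
    return int(match.group(1).replace(",", ""))
```

A caller can then print a readable message and stop when the function returns `None`, instead of failing with a traceback.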

Error during download

As follows:
Traceback (most recent call last):
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 667, in urlopen
self._prepare_proxy(conn)
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 930, in _prepare_proxy
conn.connect()
File "D:\Python-3.83\lib\site-packages\urllib3\connection.py", line 396, in connect
_match_hostname(cert, self.assert_hostname or server_hostname)
File "D:\Python-3.83\lib\site-packages\urllib3\connection.py", line 406, in _match_hostname
match_hostname(cert, asserted_hostname)
File "D:\Python-3.83\lib\ssl.py", line 416, in match_hostname
raise CertificateError("hostname %r "
ssl.SSLCertVerificationError: ("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Python-3.83\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "D:\Python-3.83\lib\site-packages\urllib3\connectionpool.py", line 724, in urlopen
retries = retries.increment(
File "D:\Python-3.83\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DTYGY20210331002&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E8%25AF%25AD%25E9%259F%25B3%25E8%25AF%2586%25E5%2588%25AB%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=d522e520-357b-a254-9bd3-9e95fdce484&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 259, in
main()
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 98, in search_reference
self.parse_page(
File "C:/Users/ASUS/PycharmProjects/1111111/知网/CNKI-download-master/main.py", line 186, in parse_page
page_detail.get_detail_page(self.session, self.get_result_url,
File "C:\Users\ASUS\PycharmProjects\1111111\知网\CNKI-download-master\GetPageDetail.py", line 70, in get_detail_page
self.session.get(
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 677, in send
history = [resp for resp in gen]
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 677, in
history = [resp for resp in gen]
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 237, in resolve_redirects
resp = self.send(
File "D:\Python-3.83\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "D:\Python-3.83\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='i.shufang.cnki.net', port=443): Max retries exceeded with url: /KRS/KRSWriteHandler.ashx?curUrl=detail.aspx%3FdbCode%3DCJFQ%26fileName%3DTYGY20210331002&referUrl=https%3A%2F%2Fkns.cnki.net%2Fkns%2Fbrief%2Fbrief.aspx%3Fpagename%3DASP.brief_default_result_aspx%26isinEn%3D1%26dbPrefix%3DSCDB%26dbCatalog%3D%25e4%25b8%25ad%25e5%259b%25bd%25e5%25ad%25a6%25e6%259c%25af%25e6%259c%259f%25e5%2588%258a%25e7%25bd%2591%25e7%25bb%259c%25e5%2587%25ba%25e7%2589%2588%25e6%2580%25bb%25e5%25ba%2593%26ConfigFile%3DCJFQ.xml%26research%3Doff%26t%3D1544249384932%26keyValue%3D%25E8%25AF%25AD%25E9%259F%25B3%25E8%25AF%2586%25E5%2588%25AB%26S%3D1%26sorttype%3D%23J_ORDER%26&cnkiUserKey=d522e520-357b-a254-9bd3-9e95fdce484&action=file&userName=&td=1544605318654 (Caused by SSLError(SSLCertVerificationError("hostname 'i.shufang.cnki.net' doesn't match either of '.cnki.net', 'www.cnki.net', '.global.cnki.net', '*.oversea.cnki.net', 'big5.book.oversea.cnki.net', 'caj.d.cnki.net', 'caj.oversea.d.cnki.net', 'en.cend.cnki.net', 'eng.tcm.cnki.net', 'gb.book.oversea.cnki.net', 'gb.cend.cnki.net', 'gb.cnbar.cnki.net', 'gb.obaor.cnki.net', 'gb.sczlmz.cnki.net', 'gb.sczlzj.cnki.net', 'gb.tcm.cnki.net', 'kb.tcm.cnki.net', 'oversea.d.cnki.net', 'pdf.d.cnki.net', 'pdf.oversea.d.cnki.net', 'tra.tcm.cnki.net', 'cnki.net'")))
Any help would be appreciated.
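
The traceback shows a redirect to `i.shufang.cnki.net`, which the certificate does not cover. Assuming the entries printed as `'.cnki.net'` originally read `'*.cnki.net'` (the asterisk may have been lost in formatting), the mismatch follows from the rule that a certificate wildcard stands in for exactly one left-most label. The simplified sketch below illustrates that rule only; `wildcard_matches` is a hypothetical helper and not the check that Python's `ssl` module actually runs.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate-name match.

    A leading '*' label matches exactly one hostname label; all other
    labels must be equal (case-insensitive).
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[1:] != h_labels[1:]:
        return False
    return p_labels[0] == "*" or p_labels[0] == h_labels[0]


# '*.cnki.net' has three labels, 'i.shufang.cnki.net' has four:
print(wildcard_matches("*.cnki.net", "i.shufang.cnki.net"))
print(wildcard_matches("*.cnki.net", "kns.cnki.net"))
```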

Error

AttributeError: 'NoneType' object has no attribute 'group'

OCR recognition problem

Problem description

The code as forked does not work out of the box,
so I modified it a little:

    def depoint(self, img):
        """Denoise a binarized image."""
        pixdata = img.load()
        w, h = img.size
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                count = 0
                if pixdata[x, y - 1] > 245:  # above
                    count = count + 1
                if pixdata[x, y + 1] > 245:  # below
                    count = count + 1
                if pixdata[x - 1, y] > 245:  # left
                    count = count + 1
                if pixdata[x + 1, y] > 245:  # right
                    count = count + 1
                if pixdata[x - 1, y - 1] > 245:  # upper left
                    count = count + 1
                if pixdata[x - 1, y + 1] > 245:  # lower left
                    count = count + 1
                if pixdata[x + 1, y - 1] > 245:  # upper right
                    count = count + 1
                if pixdata[x + 1, y + 1] > 245:  # lower right
                    count = count + 1
                if count > 4:
                    pixdata[x, y] = 255
        return img

    def imge2string(self, image, threshold):
        """
        Convert an image to a string.
        Binarize with `threshold`, then denoise.
        """

        image = image.convert('L')
        # Binarize
        image = image.point(lambda x: 255 if x > threshold else 0)
        # Denoise further
        image = self.depoint(image)
        # Recognize. This step still has a problem: tesserocr returns empty text
        result = tesserocr.image_to_text(image)
        print(str(threshold) + " recognized captcha: " + str(result))
        return result

    def crack_code(self):
        '''
        Recognize the captcha automatically
        '''
        image = Image.open('./data/crack_code.jpeg')
        # (grayscale conversion is done inside imge2string)

        # Set the binarization threshold
        threshold = 127
        s1 = self.imge2string(image, threshold)
        s2 = self.imge2string(image, threshold+20)
        s3 = self.imge2string(image, threshold-20)
        if s1 == s2 == s3 or s1 == s2 or s1 == s3:
            return self.send_code(str(s1))
        elif s2 == s3:
            return self.send_code(str(s2))

The problem is at result = tesserocr.image_to_text(image).
However the image is recognized or preprocessed, tesserocr always returns an empty result.
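
One side effect of the code above: when all three OCR passes return empty text, `s1 == s2 == s3` is true and an empty string is submitted, and when no two passes agree `crack_code` returns `None`. A minimal sketch of a stricter vote follows. `pick_consensus` is a hypothetical helper. It does not explain why tesserocr returns nothing; it only keeps empty results from being sent.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_consensus(candidates: Iterable[str]) -> Optional[str]:
    """Return the text that at least two OCR passes agree on, else None.

    Empty or whitespace-only results are discarded before voting.
    """
    cleaned = [c.strip() for c in candidates if c and c.strip()]
    if not cleaned:
        return None
    text, votes = Counter(cleaned).most_common(1)[0]
    return text if votes >= 2 else None


# Intended use inside crack_code (s1, s2, s3 as in the code above):
#     code = pick_consensus([s1, s2, s3])
#     if code is not None:
#         return self.send_code(code)
```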

A similar case where the search fails

Specify a literature source (y/n)? n
Searching.....
--------------------------
Traceback (most recent call last):
File "/Users/Desktop/CNKI-download-master/main.py", line 259, in
main()
File "/Users/Desktop/CNKI-download-master/main.py", line 253, in main
search.search_reference(get_uesr_inpt())
File "/Users/Desktop/CNKI-download-master/main.py", line 99, in search_reference
self.pre_parse_page(second_get_res.text), second_get_res.text)
File "/Users/Desktop/CNKI-download-master/main.py", line 106, in pre_parse_page
reference_num = re.search(reference_num_pattern_compile,
AttributeError: 'NoneType' object has no attribute 'group'

Any help from the experts would be appreciated.

An error occurs when the search returns fewer than 20 results (one page); I hope this can be fixed

Searching.....
--------------------------
Traceback (most recent call last):
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 254, in <module>
    main()
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 248, in main
    search.search_reference(get_uesr_inpt())
  File "/Users/irimsky/Downloads/CNKI-download-master/main.py", line 92, in search_reference
    second_get_res.text).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
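
The failing `.group(1)` at line 92 suggests that a pattern the code expects is missing when all results fit on one page, possibly because the page then carries no pagination markup (this cause is an assumption; the pattern itself is not shown). One way to avoid depending on that markup is to derive the page count from the total number of results. A minimal sketch, with the hypothetical helper `page_count` and the page size of 20 taken from the report:

```python
import math


def page_count(total_results: int, page_size: int = 20) -> int:
    """Number of result pages needed for `total_results` items."""
    if total_results <= 0:
        return 0
    return math.ceil(total_results / page_size)


# Fewer than one full page still yields exactly one page.
print(page_count(7))
```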
