antispider's Issues

About XPath and CSS selectors

Chapter 4, Information Verification and Anti-Crawling: Postman example

Cookie-based anti-crawling

`import requests
from lxml import etree  # html.cssselect() also needs the cssselect package installed

url = 'http://www.porters.vip/verify/cookie/content.html'
resp = requests.get(url)
if resp.status_code == 200:
    html = etree.HTML(resp.text)
    res = html.cssselect('.page-header h1')  # ①
    print(res)
else:
    print('This request is fail.')`

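The line marked ① uses a CSS selector. A cookie is supposed to be required before any content is returned, and I did not send one. Yet after switching to XPath (html.xpath('//h1/text()')) the heading was scraped. Why? Do XPath and CSS selectors differ in how they handle redirects?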
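Selectors run on HTML that has already been downloaded, so choosing XPath or CSS cannot affect cookies or redirects (requests follows redirects by default before resp.text is parsed). A more likely explanation, not verified against the live site, is that the two expressions are not equivalent: `//h1` matches any h1 in the document, while `.page-header h1` matches only an h1 inside an element with class page-header. The sketch below illustrates this with made-up markup. It uses the standard library's ElementTree instead of lxml so it runs without extra packages, and the path `.//*[@class='page-header']//h1` is only a rough stand-in for the CSS selector (it compares the whole class attribute). Both HTML strings and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT copied from porters.vip.
CONTENT_PAGE = (
    "<html><body><div class='page-header'>"
    "<h1>Article title</h1></div></body></html>"
)
FALLBACK_PAGE = "<html><body><h1>Please log in</h1></body></html>"


def scoped_h1(document: str) -> list:
    """Rough equivalent of the CSS selector '.page-header h1'."""
    root = ET.fromstring(document)
    return [h.text for h in root.findall(".//*[@class='page-header']//h1")]


def any_h1(document: str) -> list:
    """Equivalent of the XPath '//h1/text()': every h1, wherever it is."""
    root = ET.fromstring(document)
    return [h.text for h in root.findall(".//h1")]


if __name__ == "__main__":
    for name, doc in (("content", CONTENT_PAGE), ("fallback", FALLBACK_PAGE)):
        print(name, "scoped:", scoped_h1(doc), "any:", any_h1(doc))
```

If the page served without a cookie still contains some h1 outside .page-header, the broad XPath will return it while the scoped CSS selector returns nothing. Printing resp.url and resp.history in the original script is one way to check which page was actually parsed.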

Is there complete HTML code for section 9.3.2, implementing the slider CAPTCHA?

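I typed out the code below by following your book. I wanted to upload it as a txt file, but the system reported an error. The HTML code from earlier in Chapter 9 did not work either, and I had to modify it myself to get it working. This one is too complex for me to fix.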

<title>Implementing a slider CAPTCHA</title>
<style>
.tracks{ /* track style */
    width:390px;
    height:40px;
    background: #d0c4fe;
    overflow: hidden;
    border: 1px solid #c5c5c5;
    border-radius: 4px;
    text-align: center;
}
.hover{ /* slider block style */
    left: 0px;
    position: absolute;
    margin-left: 16px;
    width: 50px;
    height: 38px;
    background: #ad99ff;
    text-align: center;
    line-height: 38px;
}
.hover:hover{
    background: #fff;
}
.slidertips{ /* hint message style */
    height: 38px;
    line-height: 38px;
    color:#fff;
    visibility: hidden;
}
</style>
<script>
$(function(){
    var tracks=document.getElementById('tracks'),
        sliderblock=document.getElementById('sliderblock'),
        slidertips=document.getElementById('slidertips');
})
// slider block width
var sliderblockWidth=$('#sliderblock').width();
// track length
var tracksWidth=$('#tracks').width();
var mousemove=false; // mousedown state
sliderblock.addEventListener('mousedown',function(e){
    // listen for mousedown and record where the slide started
    mousemove=true;
    startCoordinateX=e.clientX // slider start position
})
var distanceCoordianteX=0; // slider start position
tracks.addEventListener('mousemove',function(e){
    // listen for mouse movement
    if(mousemove){ // follow the mouse only after the slider has been pressed
        distanceCoordianteX=e.clientX-startCoordinateX; // current slider position
        if(distanceCoordianteX>tracksWidth-sliderblockWidth){
            // limit the displacement so the slider cannot leave the track on the right
            distanceCoordianteX=tracksWidth-sliderblockWidth;
        }else if(distanceCoordianteX<0){
            // limit the displacement so the slider cannot leave the track on the left
            distanceCoordianteX=0;
        }
        // position the slider according to the distance moved
        sliderblock.style.left=distanceCoordianteX+'px';
    }
})
sliderblock.addEventListener('mouseup',function(e){
    // releasing the mouse ends the slide: record the current position and call the verification function
    var endCoordinateX=e.clientX;
    verifySliderRetuls(endCoordinateX);
})
function verifySliderRetuls(endCoordinateX){ // verify the slide result
    mousemove=false; // the mouse has been released, so stop the slider from following it
    // allow a tolerance of 3 pixels
    if(Math.abs(endCoordinateX-startCoordinateX-tracksWidth)
>>
Verification passed!
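The clamping step in the mousemove handler above is the part that keeps the slider inside the track, and it can be checked in isolation. Below is a minimal Python sketch of that step only; the function name clamp_slider_offset is hypothetical, and the example widths 390 and 50 are taken from the CSS in the post. The verification step is left out because the posted code is cut off at that point.

```python
def clamp_slider_offset(offset: float, tracks_width: float,
                        sliderblock_width: float) -> float:
    """Limit the slider displacement to the range [0, track - slider].

    Mirrors the if / else-if pair in the mousemove handler: values past the
    right edge are pulled back, negative values are reset to 0.
    """
    max_offset = tracks_width - sliderblock_width
    return max(0, min(offset, max_offset))


if __name__ == "__main__":
    # Widths assumed from the posted CSS: .tracks 390px, .hover 50px.
    for raw in (-20, 120, 500):
        print(raw, "->", clamp_slider_offset(raw, 390, 50))
```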

Splash stays stuck at "Initializing..."

After starting Splash with Docker and opening http://localhost:8050, I entered https://www.baidu.com in the render box and clicked the button. The page then stays at "Initializing..." indefinitely.

2020-11-05 12:11:24+0000 [-] Log opened.
2020-11-05 12:11:24.204062 [-] Splash version: 3.3.1
2020-11-05 12:11:24.204543 [-] Qt 5.9.1, PyQt 5.9.2, WebKit 602.1, sip 4.19.4, Twisted 18.9.0, Lua 5.2
2020-11-05 12:11:24.204648 [-] Python 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
2020-11-05 12:11:24.204834 [-] Open files limit: 1048576
2020-11-05 12:11:24.204923 [-] Can't bump open files limit
2020-11-05 12:11:24.308447 [-] Xvfb is started: ['Xvfb', ':639602014', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2020-11-05 12:11:24.392224 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2020-11-05 12:11:24.392458 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2020-11-05 12:11:24.525890 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2020-11-05 12:11:24.527004 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2020-11-05 12:11:24.527598 [-] Site starting on 8050
2020-11-05 12:11:24.527729 [-] Starting factory <twisted.web.server.Site object at 0x7f53dff16cc0>
2020-11-05 12:11:24.528053 [-] Server listening on http://0.0.0.0:8050
2020-11-05 12:11:46.112275 [-] "172.17.0.1" - - [05/Nov/2020:12:11:45 +0000] "GET / HTTP/1.1" 200 7679 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:46.152861 [-] "172.17.0.1" - - [05/Nov/2020:12:11:45 +0000] "GET /_ui/style.css HTTP/1.1" 200 2591 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:51.300247 [-] "172.17.0.1" - - [05/Nov/2020:12:11:50 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=90.0&url=http%3A%2F%2Fgoogle.com&lua_source= HTTP/1.1" 200 5320 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:54.908156 [-] "172.17.0.1" - - [05/Nov/2020:12:11:54 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:12:05.672961 [-] "172.17.0.1" - - [05/Nov/2020:12:12:05 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=90.0&url=http%3A%2F%2Fwww.baidu.com&lua_source= HTTP/1.1" 200 5329 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:13:05.673362 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=51986)

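I tried both versions 3.3.1 and 3.5 and hit the same problem. I also tried CentOS 7.8 and Ubuntu 20.04/18, with the same result. I am out of ideas; any guidance would be appreciated.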
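One way to narrow this down is to bypass the Web UI and call Splash's HTTP API directly: if render.html returns the page, the rendering engine works and the problem is confined to the browser-side UI. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes Splash is listening on localhost:8050 as in the log, reuses the wait and timeout values from the logged request, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

SPLASH_BASE = "http://localhost:8050"  # address taken from the report above


def build_render_url(target_url: str, wait: float = 0.5,
                     timeout: float = 90.0, base: str = SPLASH_BASE) -> str:
    """Build a URL for Splash's render.html endpoint."""
    query = urllib.parse.urlencode(
        {"url": target_url, "wait": wait, "timeout": timeout}
    )
    return f"{base}/render.html?{query}"


def fetch_rendered_html(target_url: str) -> str:
    """Ask Splash to render the page; needs a running Splash container."""
    request_url = build_render_url(target_url)
    # Client-side timeout slightly above Splash's own timeout (assumption).
    with urllib.request.urlopen(request_url, timeout=100) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call fetch_rendered_html() once Splash is running.
    print(build_render_url("https://www.baidu.com"))
```

If fetch_rendered_html returns HTML while the Web UI still hangs, the browser's developer tools (network tab) on http://localhost:8050 are the next place to look.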

Erratum: page 99

On page 99 of the book, the code reads print('This request is fial'). The English word is "fail"; the author misspelled it.

pyteer.py on page 16 of the book raises an error when run

Running the code on page 16 of the book produces the following error:
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
[0310/181827.222752:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
System: Ubuntu 18.04.4 LTS

Fix: change the line
browser = await launch()
to
browser = await launch({'args': ['--no-sandbox']})
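Since --no-sandbox turns off Chromium's sandbox, one option is to pass it only when the script really runs as root, which is the situation the error message describes. The following is a minimal sketch under that assumption; build_launch_options, is_root and open_page are hypothetical helper names, and pyppeteer is imported lazily so the option-building part can be checked without a browser.

```python
import os


def build_launch_options(running_as_root: bool) -> dict:
    """Return pyppeteer launch options, adding --no-sandbox only for root."""
    args = ["--no-sandbox"] if running_as_root else []
    return {"args": args}


def is_root() -> bool:
    """True when the current process has effective UID 0 (Unix only)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


async def open_page(url: str) -> str:
    """Launch the browser, load the URL and return the page title."""
    from pyppeteer import launch  # third-party; imported only when needed

    browser = await launch(build_launch_options(is_root()))
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.title()
    finally:
        await browser.close()
```

To try it, run asyncio.run(open_page(url)) with any reachable URL after importing asyncio.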
