
antispider's Introduction

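For questions about the book, join the discussion on the Chuanjiabing (穿甲兵) technical community: https://www.chuanjiabing.com

antispider is the companion code for the book 《Python3 反爬虫原理与绕过实战》 (Python 3 Anti-Crawler Principles and Bypass in Practice).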

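👀 [Open course] WeChat Mini Program reverse engineering: a hands-on tutorial for beginners, with real-world cases

2K ultra-HD playback link -> https://www.chuanjiabing.com/thread/52

Overview of the beginner Mini Program reverse-engineering course:

1. Understand how Mini Program reverse engineering differs from JS reverse engineering on the PC

2. The basic Mini Program reverse-engineering workflow

3. Devices and environment required for Mini Program reverse engineering

4. Learn to use the Mini Program unpacking tools

5. Mini Program directory structure

6. Practice: static analysis of a Mini Program

7. Practice: dynamic debugging of a Mini Program

Detailed table of contents and cover preview

View on the Juejin community

View on the WeChat official account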

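The book has 10 chapters. Except for Chapter 1 (environment installation and configuration) and Chapter 3 (crawlers and anti-crawlers), the Python code from every chapter is kept in antispider, namely:

  • Chapter 2: Structure of websites and page rendering
  • Chapter 4: Information-verification anti-crawlers
  • Chapter 5: Dynamic-rendering anti-crawlers
  • Chapter 6: Text-obfuscation anti-crawlers
  • Chapter 7: Feature-recognition anti-crawlers
  • Chapter 8: App anti-crawlers
  • Chapter 9: CAPTCHAs
  • Chapter 10: Comprehensive knowledge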

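Numbering

Chapters and their code are identified by number: Chapter 1 corresponds to the 01 directory, and section 9.1 corresponds to the 09/9-1 directory.

File names

Code files within a section are named after the section number. Because a section may contain several code snippets, the English words one, two, three are appended. For example, the first Python snippet in section 9.1.1 is named 9-1-1-one.py and the second is named 9-1-1-two.py.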
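
As a rough illustration of the numbering and file-name rules above, the sketch below builds the expected location of a snippet. The helper name `snippet_path` is hypothetical, the README only names the words one, two, three (the longer word list is an assumption), and placing the file inside the two-level section directory (09/9-1) is also an assumption, since the README does not say exactly which directory holds a 9.1.1 file.

```python
from pathlib import PurePosixPath

# The README only mentions "one two three"; the remaining words are an assumption.
ORDINALS = ["one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]


def snippet_path(section: str, index: int) -> PurePosixPath:
    """Return the assumed repository path of the index-th snippet of a section.

    section: dotted section number such as "9.1.1"
    index:   1-based position of the snippet within that section
    """
    parts = section.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a dotted section number, got {section!r}")
    if not 1 <= index <= len(ORDINALS):
        raise ValueError(f"index must be between 1 and {len(ORDINALS)}")

    chapter_dir = f"{int(parts[0]):02d}"   # chapter 9 -> "09"
    section_dir = "-".join(parts[:2])      # section 9.1 -> "9-1"
    filename = "-".join(parts) + f"-{ORDINALS[index - 1]}.py"
    # Assumption: the file lives inside the two-level section directory.
    return PurePosixPath(chapter_dir) / section_dir / filename


print(snippet_path("9.1.1", 1))
print(snippet_path("9.1.1", 2))
```

Files with a fixed name given in the book (such as custom64.py) do not follow this pattern and would need to be looked up by name.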

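Special notes

1. Some code in the book has a fixed file name, such as Custom64; in that case the file is named custom64.py.

2. The HTML/CSS/JS code that implements the CAPTCHAs is stored in directory 09, in a directory named captcha.

3. Some examples include image or key files; these files are stored in the corresponding directories.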

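Running the code

All code in the antispider project has been run and verified, and matches what the book describes. To use it, simply run it as described in the book.

Copyright

The code in the antispider project is companion code for the book and is provided solely for readers' personal study and research. No individual or organization may excerpt, repost, or publish the project code in any form.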

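README changelog

2020-05-29: Many readers reported that the download link for the sample images used to train the CAPTCHA models in Chapter 9 had stopped working. On checking, the files were still there, but Weiyun had inexplicably replaced the link. The new link is https://share.weiyun.com/5ptKIUg

Character CAPTCHA materials on Baidu Netdisk: https://pan.baidu.com/s/1LoQTK51RHbdXSrJ0o8uxqA password: tl5i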

antispider's People

Contributors

asyncins


antispider's Issues

About XPath and CSS selectors

Chapter 4, information verification and anti-crawlers, Postman example

Cookie anti-crawler

`import requests
from lxml import etree

url = 'http://www.porters.vip/verify/cookie/content.html'
resp = requests.get(url)
if resp.status_code == 200:
    html = etree.HTML(resp.text)
    # cssselect() needs the separate cssselect package to be installed
    res = html.cssselect('.page-header h1')  #①
    print(res)
else:
    print('This request is fail.')`

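At ①, a CSS selector is used, and a cookie is supposed to be required before any content is returned. I did not add a cookie, yet after switching to XPath (changing the line to html.xpath('//h1/text()')) I was able to scrape the title. Why? Is there some fundamental difference between XPath and CSS selectors when it comes to redirects?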
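
One possible explanation, not confirmed by the author, is that the two expressions simply match different sets of elements in the same downloaded HTML: `.page-header h1` only matches an h1 inside a `.page-header` container, while `//h1/text()` matches any h1 anywhere on the page. Selectors run after the response has been received, so they cannot influence redirects. The sketch below illustrates the scope difference with the standard library's `xml.etree.ElementTree` on hypothetical, well-formed markup; both sample documents are invented for illustration and are not the real responses from the site.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (assumption): what a page without the expected
# container and a page with it might look like.
WITHOUT_CONTAINER = "<html><body><h1>Some heading</h1></body></html>"
WITH_CONTAINER = (
    "<html><body><div class='page-header'>"
    "<h1>Article title</h1></div></body></html>"
)


def any_h1(markup: str) -> list:
    """Rough equivalent of the XPath //h1/text(): every h1 in the document."""
    root = ET.fromstring(markup)
    return [h1.text for h1 in root.findall(".//h1")]


def page_header_h1(markup: str) -> list:
    """Rough equivalent of the CSS selector '.page-header h1'.

    Note: [@class='page-header'] is an exact attribute match, which is
    stricter than a real CSS class selector.
    """
    root = ET.fromstring(markup)
    return [h1.text for h1 in root.findall(".//div[@class='page-header']//h1")]


for name, markup in [("without container", WITHOUT_CONTAINER),
                     ("with container", WITH_CONTAINER)]:
    print(name, "| any h1:", any_h1(markup),
          "| .page-header h1:", page_header_h1(markup))
```

Printing `resp.text` (or `resp.url` and `resp.history`) for the cookie-less request would show which page was actually returned and which h1 the broader expression matched.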

Erratum: page 99

On page 99 of the book, print('This request is fial'): the English word for failure is "fail", so the author misspelled it!

Is there complete HTML code for 9.3.2, implementing the slider CAPTCHA?

The code I typed in by following your book is below. I wanted to upload it as a txt file, but the system reported an error. The HTML code from earlier in Chapter 9 did not work either; I had to modify it myself to get the right result. This one is too complex for me to fix.

<title>Implementing a slider CAPTCHA</title>
<style>
.tracks{ /* track style */
  width:390px; height:40px; background: #d0c4fe; overflow: hidden;
  border: 1px solid #c5c5c5; border-radius: 4px; text-align: center;
}
.hover{ /* slider style */
  left: 0px; position: absolute; margin-left: 16px; width: 50px; height: 38px;
  background: #ad99ff; text-align: center; line-height: 38px;
}
.hover:hover{ background: #fff; }
.slidertips{ /* tip message style */
  height: 38px; line-height: 38px; color:#fff; visibility: hidden;
}
</style>
<script>
$(function(){
  var tracks=document.getElementById('tracks'),
      sliderblock=document.getElementById('sliderblock'),
      slidertips=document.getElementById('slidertips');
})
// slider width
var sliderblockWidth=$('#sliderblock').width();
// track length
var tracksWidth=$('#tracks').width();
var mousemove=false; // mousedown state
sliderblock.addEventListener('mousedown',function(e){
  // listen for mousedown and record the slider's start position
  mousemove=true;
  startCoordinateX=e.clientX // slider start position
})
var distanceCoordianteX=0; // slider start position
tracks.addEventListener('mousemove',function(e){
  // listen for mouse movement
  if(mousemove){ // follow the mouse only after the slider has been pressed
    distanceCoordianteX=e.clientX-startCoordinateX; // current slider position
    if(distanceCoordianteX>tracksWidth-sliderblockWidth){
      // limit the displacement so the slider cannot leave the track on the right
      distanceCoordianteX=tracksWidth-sliderblockWidth;
    }else if(distanceCoordianteX<0){
      // limit the displacement so the slider cannot leave the track on the left
      distanceCoordianteX=0;
    }
    // position the slider according to the distance moved
    sliderblock.style.left=distanceCoordianteX+'px';
  }
})
sliderblock.addEventListener('mouseup',function(e){
  // releasing the mouse counts as finishing the slide; record the current position and call the verification method
  var endCoordinateX=e.clientX;
  verifySliderRetuls(endCoordinateX);
})
function verifySliderRetuls(endCoordinateX){ // verify the slide result
  mousemove=false; // the mouse has been released, so stop the slider from following it
  // allow an error of 3 pixels
  if(Math.abs(endCoordinateX-startCoordinateX-tracksWidth)
>>
Verification passed!
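
The displacement limit described in the comments above can be restated in Python to make the arithmetic easier to check. This is only a sketch of the clamping step: the function names are hypothetical, and because the comparison in the posted verification function is cut off, the `target` parameter and the `<=` comparison in `within_tolerance` are assumptions rather than the book's logic.

```python
def clamp_displacement(current_x: float, start_x: float,
                       track_width: float, slider_width: float) -> float:
    """Keep the slider inside the track, mirroring the mousemove handler.

    The displacement is current_x - start_x, limited to the range
    [0, track_width - slider_width].
    """
    distance = current_x - start_x
    max_distance = track_width - slider_width
    return max(0, min(distance, max_distance))


def within_tolerance(distance: float, target: float, tolerance: float = 3) -> bool:
    """Assumed check: accept the slide if it ends within `tolerance` pixels of `target`."""
    return abs(distance - target) <= tolerance


# Track 390px wide and slider 50px wide, as in the CSS above.
print(clamp_displacement(120, 20, 390, 50))   # ordinary move
print(clamp_displacement(500, 20, 390, 50))   # dragged past the right edge
print(clamp_displacement(10, 20, 390, 50))    # dragged past the left edge
```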

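Splash stays stuck on Initializing...

After starting Splash with Docker, I opened http://localhost:8050, entered the URL https://www.baidu.com in the render box, and clicked the button. The page then stays on "Initializing..." indefinitely.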

2020-11-05 12:11:24+0000 [-] Log opened.
2020-11-05 12:11:24.204062 [-] Splash version: 3.3.1
2020-11-05 12:11:24.204543 [-] Qt 5.9.1, PyQt 5.9.2, WebKit 602.1, sip 4.19.4, Twisted 18.9.0, Lua 5.2
2020-11-05 12:11:24.204648 [-] Python 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
2020-11-05 12:11:24.204834 [-] Open files limit: 1048576
2020-11-05 12:11:24.204923 [-] Can't bump open files limit
2020-11-05 12:11:24.308447 [-] Xvfb is started: ['Xvfb', ':639602014', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
2020-11-05 12:11:24.392224 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2020-11-05 12:11:24.392458 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2020-11-05 12:11:24.525890 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2020-11-05 12:11:24.527004 [-] Web UI: enabled, Lua: enabled (sandbox: enabled)
2020-11-05 12:11:24.527598 [-] Site starting on 8050
2020-11-05 12:11:24.527729 [-] Starting factory <twisted.web.server.Site object at 0x7f53dff16cc0>
2020-11-05 12:11:24.528053 [-] Server listening on http://0.0.0.0:8050
2020-11-05 12:11:46.112275 [-] "172.17.0.1" - - [05/Nov/2020:12:11:45 +0000] "GET / HTTP/1.1" 200 7679 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:46.152861 [-] "172.17.0.1" - - [05/Nov/2020:12:11:45 +0000] "GET /_ui/style.css HTTP/1.1" 200 2591 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:51.300247 [-] "172.17.0.1" - - [05/Nov/2020:12:11:50 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=90.0&url=http%3A%2F%2Fgoogle.com&lua_source= HTTP/1.1" 200 5320 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:11:54.908156 [-] "172.17.0.1" - - [05/Nov/2020:12:11:54 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:12:05.672961 [-] "172.17.0.1" - - [05/Nov/2020:12:12:05 +0000] "GET /info?wait=0.5&images=1&expand=1&timeout=90.0&url=http%3A%2F%2Fwww.baidu.com&lua_source= HTTP/1.1" 200 5329 "http://localhost:8050/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0"
2020-11-05 12:13:05.673362 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=51986)

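I tried both version 3.3.1 and version 3.5 and hit the same problem. I also tried different operating systems (CentOS 7.8 and Ubuntu 20.04/18) with the same result. I am at my wits' end; could someone please advise?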
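
One way to narrow this down is to call Splash's HTTP API directly instead of going through the Web UI. The sketch below requests the `render.html` endpoint with the same `wait` and `timeout` values that appear in the log; the helper names are hypothetical, and it assumes Splash is reachable at http://localhost:8050 as in the report. If this returns HTML, the rendering service itself works and the problem is more likely confined to the browser-side UI.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_render_url(target: str, base: str = "http://localhost:8050",
                     wait: float = 0.5, timeout: float = 90.0) -> str:
    """Build a Splash render.html URL for the target page."""
    query = urlencode({"url": target, "wait": wait, "timeout": timeout})
    return f"{base}/render.html?{query}"


def render(target: str, **kwargs) -> str:
    """Fetch the rendered HTML from Splash (requires a running Splash instance)."""
    with urlopen(build_render_url(target, **kwargs), timeout=120) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(build_render_url("https://www.baidu.com"))
# With Splash running, the page could then be fetched with:
# html = render("https://www.baidu.com")
```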

pyteer.py on page 16 of the book fails to run

The code on page 16 of the book raises the following error when run:
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
[0310/181827.222752:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
System: Ubuntu 18.04.4 LTS

Fix: change the relevant line from
browser = await launch()
to
browser = await launch({'args': ['--no-sandbox']})
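
For context, here is a minimal sketch of how the fix might fit into a complete script. It is not the book's pyteer.py: the function names and the idea of adding `--no-sandbox` only when the process runs as root are assumptions made for illustration. Disabling the sandbox reduces Chromium's isolation, so running as a non-root user is the alternative where that is possible.

```python
import asyncio
import os


def launch_options() -> dict:
    """Return pyppeteer launch options, adding --no-sandbox only for root.

    Chromium refuses to start as root unless the sandbox is disabled,
    which is the error shown in the report above.
    """
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    return {"args": ["--no-sandbox"]} if is_root else {}


async def fetch_title(url: str) -> str:
    """Open the URL in headless Chromium and return the page title."""
    # Imported here so launch_options() can be used without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(launch_options())
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.title()
    finally:
        await browser.close()


# Example call (needs pyppeteer and a Chromium download):
# print(asyncio.run(fetch_title("https://www.example.com")))
```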
