nladuo / captcha-break
captcha break based on opencv2, tesseract-ocr and some machine learning algorithm.
License: MIT License
Traceback (most recent call last):
File "E:/WJ/machinelearning/captcha-break-master/weibo.cn/python/spliter/main.py", line 25, in
split_dataset()
File "E:/WJ/machinelearning/captcha-break-master/weibo.cn/python/spliter/main.py", line 22, in split_dataset
ispliter.split_and_save(im_path)
File "E:\WJ\machinelearning\captcha-break-master\weibo.cn\python\spliter\spliter.py", line 55, in split_and_save
letters = self.split_letters(filename)
File "E:\WJ\machinelearning\captcha-break-master\weibo.cn\python\spliter\spliter.py", line 42, in split_letters
image = self.clear_noise(image)
File "E:\WJ\machinelearning\captcha-break-master\weibo.cn\python\spliter\spliter.py", line 62, in clear_noise
clear_horizontal_noise_line(image)
File "E:\WJ\machinelearning\captcha-break-master\weibo.cn\python\spliter\spliter.py", line 165, in clear_horizontal_noise_line
if is_black(i, now_height+1, image):
File "E:\WJ\machinelearning\captcha-break-master\weibo.cn\python\spliter\spliter.py", line 99, in is_black
b = image[j, i][0]
I tried modifying two places in spliter.py, as follows:
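def is_black(i, j, image):
    shape = image.shape  # assumed; the patch uses shape without showing its definition
    try:
        # Clamp the row index: clear_horizontal_noise_line calls is_black(i, now_height+1, image),
        # which can point one row past the bottom of the image.
        j = min(shape[0] - 1, j)
        # Convert to int first so the subtractions below cannot wrap around as uint8.
        b, g, r = (int(c) for c in image[j, i][:3])
        average = (r + g + b) / 3.0
        if r < 244 and abs(average - b) < 4 and abs(average - g) < 4 and abs(average - r) < 4:
            return True
        return False
    except Exception as e:
        print(e)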
now_width = min(image.shape[1] - 1, now_width)  # clamp now_width to the last valid column
while end_width < image.shape[1] \
        and image[now_height][end_width][0] < 12 \
        and image[now_height][end_width][1] < 12 \
        and image[now_height][end_width][2] < 12:
    # print(image[now_height][end_width][0],
    #       image[now_height][end_width][1],
    #       image[now_height][end_width][2])
    end_width += 1
return end_width - now_width
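A minimal sketch of the same two checks with explicit bounds handling, assuming the image is indexed as image[row][col] with (b, g, r) channels, which holds both for the NumPy array OpenCV returns and for the plain nested lists used here. The names is_black_safe and dark_run_length are hypothetical, not functions from the repository. Instead of clamping, is_black_safe treats a pixel outside the image as not black:

```python
def is_black_safe(i, j, image, max_red=244, tol=4):
    """Colour test from is_black for column i, row j; any (i, j) outside the
    image returns False instead of being clamped onto the last row."""
    if not (0 <= j < len(image) and 0 <= i < len(image[0])):
        return False
    b, g, r = (int(c) for c in image[j][i][:3])
    average = (r + g + b) / 3
    # As in is_black: every channel within tol of the mean, and r below max_red.
    return r < max_red and all(abs(average - c) < tol for c in (b, g, r))


def dark_run_length(image, row, start, threshold=12):
    """Count consecutive pixels in `row`, moving right from column `start`,
    whose three channels are all below `threshold` (same idea as the while loop above)."""
    width = len(image[0])
    start = min(width - 1, start)  # same clamp as the second patch
    end = start
    while end < width and all(int(c) < threshold for c in image[row][end][:3]):
        end += 1
    return end - start


# Usage on a tiny image, 4 pixels wide and 2 rows tall, of (b, g, r) tuples.
img = [[(0, 0, 0), (5, 5, 5), (20, 0, 0), (0, 0, 0)],
       [(250, 250, 250), (10, 10, 10), (0, 0, 200), (30, 30, 30)]]
print(is_black_safe(1, 1, img), is_black_safe(0, 2, img), dark_run_length(img, 0, 0))  # prints: True False 2
```

Clamping j, as in the patch, makes the check at the bottom edge look at the last row a second time, which may change which pixels are treated as part of a noise line; returning False for out-of-range coordinates avoids that, at the cost of behaving slightly differently from the patched code.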
```bash
python3 train.py
Traceback (most recent call last):
File "/home/kali/captcha-break/jikexueyuan/python/trainer/train.py", line 4, in
from gen.gen_captcha import gen_dataset, load_templates
File "/home/kali/captcha-break/jikexueyuan/python/trainer/gen/gen_captcha.py", line 3, in
from img_process import rotate_and_cut
ModuleNotFoundError: No module named 'img_process'
```
Thanks a lot
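The ModuleNotFoundError most likely comes from a Python 2 habit: inside a package, Python 2 let from img_process import ... resolve against the package's own folder (an implicit relative import), while Python 3 only searches sys.path, which here contains trainer/ but not trainer/gen/. Two common fixes are the explicit relative import from .img_process import rotate_and_cut (which then breaks running gen_captcha.py directly as a script) or adding the module's own folder to sys.path before the import. The sketch below demonstrates the second option in a throwaway directory; the file contents, including the body of rotate_and_cut, are hypothetical stand-ins for the real files.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for trainer/gen/gen_captcha.py: it puts its own folder on
# sys.path before the bare import, so img_process.py is found under Python 3.
GEN_CAPTCHA_SRC = (
    "import os\n"
    "import sys\n"
    "_HERE = os.path.dirname(os.path.abspath(__file__))\n"
    "if _HERE not in sys.path:\n"
    "    sys.path.insert(0, _HERE)\n"
    "from img_process import rotate_and_cut\n"
)

# Hypothetical stand-in for trainer/gen/img_process.py (the real function differs).
IMG_PROCESS_SRC = "def rotate_and_cut(x):\n    return 'rotated:' + x\n"


def build_and_import(root):
    """Write root/gen/{__init__,img_process,gen_captcha}.py, then import
    gen.gen_captcha the way train.py does with `from gen.gen_captcha import ...`."""
    gen_dir = os.path.join(root, "gen")
    os.makedirs(gen_dir)
    for name, src in (("__init__.py", ""),
                      ("img_process.py", IMG_PROCESS_SRC),
                      ("gen_captcha.py", GEN_CAPTCHA_SRC)):
        with open(os.path.join(gen_dir, name), "w") as f:
            f.write(src)
    sys.path.insert(0, root)        # plays the role of the trainer/ folder
    importlib.invalidate_caches()   # the files were created after interpreter startup
    return importlib.import_module("gen.gen_captcha")


with tempfile.TemporaryDirectory() as tmp:
    gen_captcha = build_and_import(tmp)
    print(gen_captcha.rotate_and_cut("x"))  # prints: rotated:x
```

In the real repository only the five sys.path lines at the top of GEN_CAPTCHA_SRC would go into gen_captcha.py, above its existing from img_process import rotate_and_cut.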
cd ./recognizer && cmake . && make
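Running cmake . fails with an error saying some libraries cannot be found, even though I have installed all the dependencies you mentioned, and nothing I found online explains it clearly.
I also don't know where to find the directories mentioned below. Any help is appreciated.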
CMake Error at /usr/share/cmake-3.5/Modules/FindBoost.cmake:1677 (message):
Unable to find the requested Boost libraries.
Boost version: 1.58.0
Boost include path: /usr/include
Could not find the following static Boost libraries:
boost_filesystem
boost_system
No Boost libraries were found. You may need to set BOOST_LIBRARYDIR to the
directory containing Boost libraries or BOOST_ROOT to the location of
Boost.
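(PS: If the value of K was already chosen through tuning, please ignore this issue.)
Hello, the KNN in the recognition script should use a grid search (GridSearchCV) so that the model picks the value of K; 10-fold cross-validation would be fine for the CV.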
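For reference, scikit-learn expresses this suggestion as GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=10), followed by .fit(X, y) and reading best_params_. To show what that search does without extra dependencies, here is a minimal standard-library sketch: it scores each candidate K by its mean accuracy over 10 contiguous folds of pre-shuffled data (GridSearchCV would use stratified folds for a classifier) and keeps the best one. The toy two-cluster data set and the candidate list are assumptions for illustration, not the repository's features.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training samples closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kfold_indices(n, folds):
    """Split range(n) into `folds` contiguous blocks whose sizes differ by at most one."""
    blocks, start = [], 0
    for f in range(folds):
        size = n // folds + (1 if f < n % folds else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def cv_accuracy(data, k, folds=10):
    """Mean held-out accuracy of k-NN over `folds`-fold cross-validation (needs len(data) >= folds)."""
    scores = []
    for test_idx in kfold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(data) if i not in held_out]
        hits = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def grid_search_k(data, candidates, folds=10):
    """Try every K in `candidates`; return (best K, its mean CV accuracy)."""
    results = {k: cv_accuracy(data, k, folds) for k in candidates}
    best_k = max(candidates, key=results.get)
    return best_k, results[best_k]


# Toy data (hypothetical): two well-separated 2-D clusters, shuffled so each
# contiguous fold mixes both labels.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), "b") for _ in range(50)]
rng.shuffle(data)

best_k, score = grid_search_k(data, [1, 3, 5, 7, 9])
print("best K:", best_k, "10-fold accuracy:", round(score, 3))
```

Odd candidates avoid vote ties between two classes; with more classes, ties fall back to whichever label Counter saw first.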
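As the title says, this error occurs: <class 'Exception'> : [WinError 32] The process cannot access the file because it is being used by another process: './captchas/3d1e1683-1455-4d46-810f-062e7a2905ba.gif'
I tried it, and for some reason, as long as downloader.py is running, none of its code can delete the gif files. I suggest splitting it into two files: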
downloader.py
# coding:utf-8
import requests
import uuid
from PIL import Image
import os
from bs4 import BeautifulSoup
url = "http://login.weibo.cn/login/"
for i in range(2000):
    try:
        resp = requests.get(url)
        bsObj = BeautifulSoup(resp.content, "html.parser")
        image_url = str(bsObj.img['src'])
        #print(image_url)
        resp = requests.get(image_url)
        filename = str(uuid.uuid4()) + ".gif"
        with open("./captchas/" + filename, 'wb') as f:
            f.write(resp.content)
        try:
            with Image.open("./captchas/" + filename) as im:
                im.save("./captchas/" + filename.split('.gif')[0] + ".png")
        except Exception as ex:
            print(Exception, ":", ex)
        #os.remove("./captchas/" + filename)
        print(filename)
    except Exception as ex:
        print(Exception, ":", ex)
clean.py
import os
for fn in os.listdir("./captchas/"):
    if os.path.splitext(fn)[1] == '.gif':
        os.remove("./captchas/" + fn)
Also, a requirements.txt could be added:
Pillow
requests
bs4
lxml isn't needed: it has to be compiled during installation, and the built-in html.parser works fine in its place.
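One possible alternative to splitting the script, assuming the lock comes from the .gif file still being open when os.remove runs: decode the downloaded bytes in memory with io.BytesIO so that no .gif file is ever written. In downloader.py this would replace the two file-writing steps with something like save_captcha_png(resp.content, "./captchas/" + str(uuid.uuid4()) + ".png"). The helper name is hypothetical, the sketch needs Pillow, and the generated blank GIF in the usage lines merely stands in for a real weibo.cn captcha.

```python
import io
import os
import tempfile

from PIL import Image  # Pillow


def save_captcha_png(gif_bytes, png_path):
    """Decode GIF bytes (for example resp.content) in memory and save them as PNG.

    Because no .gif file is written, there is no file handle left for a later
    os.remove() to collide with, and no clean-up script is needed.
    """
    with Image.open(io.BytesIO(gif_bytes)) as im:
        im.save(png_path, "PNG")


# Usage sketch: a generated blank 12x8 GIF stands in for a downloaded captcha.
buf = io.BytesIO()
Image.new("L", (12, 8)).save(buf, "GIF")
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "captcha.png")
    save_captcha_png(buf.getvalue(), out_path)
    with Image.open(out_path) as check:
        png_format, png_size = check.format, check.size
print(png_format, png_size)  # prints: PNG (12, 8)
```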
As the title says: after the downloader fetches the captchas, running the spliter.exe built with VS14 (Visual Studio 2015) gives:
../downloader/captchas/00179352-2c1b-42c9-83c2-c88402639381.png
OpenCV Error: Assertion failed (dims <= 2 && data && (unsigned)i0 < (unsigned)size.p[0] && (unsigned)(i1*DataType<_Tp>::channels) < (unsigned)(size.p[1]*channels()) && ((((sizeof(size_t)<<28)|0x8442211) >> ((DataType<_Tp>::depth) & ((1 << 3) - 1))*4) & 15) == elemSize1()) in cv::Mat::at, file D:\Coding\C-C++\opencv\opencv2_4_13\build\include\opencv2/core/mat.hpp, line 548
generating: ./dataset/啊
Traceback (most recent call last):
File "/home/hya/PycharmProjects/resource/captcha-break/weibo.cn2/generate_dataset/main.py", line 45, in
os.mkdir(save_path)
OSError: [Errno 2] No such file or directory: './dataset/\xe5\x95\x8a'
for i, ch in enumerate(chars):
    if i > 20:
        break
    save_path = "./dataset/" + str(ch)
    print "generating:", save_path
    if not os.path.exists(save_path):
        os.mkdir(save_path)  # the error is raised here
    # generate 360 samples for every character
    for i in range(-30, 30, 1):
        for j in range(6):
            generate_data(ch, i, save_path)
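The [Errno 2] No such file or directory from os.mkdir most likely means a parent directory, here ./dataset relative to the current working directory, does not exist, because os.mkdir only creates the last path component. The Chinese character itself is probably not the problem: '\xe5\x95\x8a' is just Python 2's repr of the UTF-8 bytes for 啊. A minimal Python 3 sketch with a hypothetical helper ensure_char_dir; the posted script is Python 2, where os.makedirs exists but has no exist_ok argument, so the os.path.exists check would stay there.

```python
import os
import tempfile


def ensure_char_dir(root, ch):
    """Return root/ch, creating it and any missing parents (root included).

    os.mkdir only creates the last path component and raises
    FileNotFoundError (errno 2) when a parent is missing; os.makedirs creates
    the whole chain, and exist_ok=True makes re-running the script harmless.
    """
    save_path = os.path.join(root, ch)
    os.makedirs(save_path, exist_ok=True)
    return save_path


# Usage: inside a fresh temporary folder, "dataset" does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_char_dir(os.path.join(tmp, "dataset"), "啊")
    second = ensure_char_dir(os.path.join(tmp, "dataset"), "啊")  # no error on rerun
    created = os.path.isdir(first)
print(created, first == second)  # prints: True True
```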