
2019-ccf-bdci-ocr-mczj-fake_data_generator's Introduction

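Overview

Competition background: https://www.datafountain.cn/competitions/346

Our team name was 鹏脱单攻略队, later changed to "天晨破晓".

Team result: Best Innovative Exploration Award at the 2019 CCF-BDCI competition, and first place in the single track "基于OCR的身份证要素提取" (OCR-based ID card element extraction).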

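File overview

chusai_fuyinwuxiao: how the fake training data carrying the "复印无效" ("copy invalid") watermark was generated, with reproduction instructions

rematch_jinzhifuyin: how the fake training data carrying the "禁止复印" ("copying prohibited") watermark was generated, with reproduction instructions

word_recognize_train_data: how the data for the text recognition model was produced, covering both the large-scale fake watermark-removed data and the small-scale data (the training set with watermarks removed), with reproduction steps

Train_DataSet_final: the processed training sets from the preliminary round and the rematch, used mainly as backgrounds for the fake watermark data

word_recognize_train_data: how the training set needed for text recognition was built, with reproduction instructions

For what each of these does, see the readme inside it.

To keep the project small, only a few sample images from the source data have been uploaded.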

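Generating the data can take a long time. If your environment allows, you can switch to a multi-process implementation: locally we always ran on 30 cores at once, but the competition server had few cores and we ran into problems with multiprocessing there, so everything was changed to single-process.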
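For readers who want to go back to multiple processes, the following is a minimal sketch and not the code we ran. `generate_one` is a hypothetical stand-in for whatever per-image function a script calls (for example `m_run` or `gen_run`), and `generate_all` is likewise an illustrative name. It assumes each image can be processed independently of the others.

```python
import os
from multiprocessing import Pool


def generate_one(img_name):
    """Hypothetical stand-in for a per-image generation function.

    Real work (locating the watermark, drawing a new one, saving the result)
    would go here. This placeholder only derives an output file name.
    """
    stem, ext = os.path.splitext(img_name)
    return stem + "_fake" + ext


def generate_all(img_names, workers=None):
    """Apply generate_one to every image, in parallel when workers > 1."""
    workers = workers or os.cpu_count() or 1
    if workers == 1:
        # Single-process fallback, useful on machines with few cores.
        return [generate_one(name) for name in img_names]
    # Pool.map preserves input order and blocks until all images are done.
    with Pool(processes=workers) as pool:
        return pool.map(generate_one, img_names)
```

On Windows, the call to `generate_all` should sit under an `if __name__ == "__main__":` guard, because worker processes re-import the script there. Passing `workers=1` keeps the single-process behaviour.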

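The code changed many times during the competition and we did not take care to keep every version, so parts of this reconstruction rely entirely on memory. The cleanup workload was heavy and we did not have time to reproduce and verify each step, so the reproduced process may differ somewhat from the description. If you run into problems, please contact us. Thank you.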

2019-ccf-bdci-ocr-mczj-fake_data_generator's People

Contributors

mingtzge


2019-ccf-bdci-ocr-mczj-fake_data_generator's Issues

Error when running move_watermask_location.py

cb@ceo-pc:~/code/ccf_ocr/2/fake_data_generator/chusai_fuyinwuxiao/first_train$ python move_watermask_location.py
0a5f692293f34acfb7fa006b910c2598_1.jpg
Traceback (most recent call last):
File "move_watermask_location.py", line 149, in <module>
m_run(img)
File "move_watermask_location.py", line 101, in m_run
father_cor, img_father = match_img(img_path, roi_img_path, tp_threshold)
File "move_watermask_location.py", line 21, in match_img
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.1.1) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
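
The assertion `!_src.empty()` means `cvtColor` received an empty image. This typically happens when `cv2.imread` returned `None` because the path it was given does not exist or the file could not be decoded; `cv2.imread` does not raise in that case. The sketch below is one way to surface the problem earlier. `read_image_checked` and its `reader` parameter are hypothetical names and are not part of the repository.

```python
import os


def read_image_checked(img_path, reader=None):
    """Load an image and fail early with a readable message if loading fails."""
    if not os.path.isfile(img_path):
        raise FileNotFoundError("image not found: " + img_path)
    if reader is None:
        # Imported lazily so the path check above works even without OpenCV.
        import cv2
        reader = cv2.imread
    img = reader(img_path)
    if img is None:
        # cv2.imread signals failure by returning None rather than raising.
        raise ValueError("could not decode image: " + img_path)
    return img
```

Calling this helper in place of a bare `cv2.imread` for both the source image and the template image should show which path is wrong before `cvtColor` is reached.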

TypeError: rectangle() got an unexpected keyword argument 'width'

(tensorflow) F:\2019-CCF-BDCI-OCR-MCZJ-fake_data_generator-master\chusai_fuyinwuxiao\first_train>python add_rematch_watermask_self.py
00a0d1ba365f44f280a2adc22edf8c5e_0.jpg
00a0d1ba365f44f280a2adc22edf8c5e_1.jpg
../../Train_DataSet_final\00a0d1ba365f44f280a2adc22edf8c5e_1.jpg template failed!!
00a1eec24f304c20ab477e2acf6a73bf_0.jpg
../../Train_DataSet_final\00a1eec24f304c20ab477e2acf6a73bf_0.jpg template failed!!
00a1eec24f304c20ab477e2acf6a73bf_1.jpg
../../Train_DataSet_final\00a1eec24f304c20ab477e2acf6a73bf_1.jpg template failed!!
0a0a3bd703994168b7764b8cbd98d6ef_0.jpg
../../Train_DataSet_final\0a0a3bd703994168b7764b8cbd98d6ef_0.jpg template failed!!
0a0a3bd703994168b7764b8cbd98d6ef_1.jpg
Traceback (most recent call last):
File "add_rematch_watermask_self.py", line 133, in <module>
gen_run(img)
File "add_rematch_watermask_self.py", line 99, in gen_run
im_after = add_text_to_image(origin_img, u'复印无效', bright_thr, new_pt)
File "add_rematch_watermask_self.py", line 40, in add_text_to_image
image_draw.rectangle((p_new, (p_new[0] + 177, p_new[1] + 50)), outline=(0, 0, 0, bright_random), width=4)
TypeError: rectangle() got an unexpected keyword argument 'width'
How can I resolve this problem? Thanks.
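
The `width` keyword of `ImageDraw.rectangle` was added in Pillow 5.3.0, so this error most likely points to an older Pillow installation, and upgrading Pillow is probably the simplest fix. If upgrading is not possible, a thick outline can be approximated by drawing several one-pixel outlines, each inset by one more pixel. The sketch below assumes `draw` behaves like a `PIL.ImageDraw.ImageDraw` object; `draw_thick_rectangle` is a hypothetical helper, not a function from the repository.

```python
def draw_thick_rectangle(draw, box, outline, width=1):
    """Draw a rectangle outline `width` pixels thick using 1-pixel outlines.

    `draw` is assumed to expose rectangle(xy, outline=...) like ImageDraw.
    `box` is ((x0, y0), (x1, y1)), the same shape used in the traceback above.
    """
    (x0, y0), (x1, y1) = box
    for i in range(width):
        # Each pass draws one outline, inset by i pixels on every side.
        draw.rectangle(((x0 + i, y0 + i), (x1 - i, y1 - i)), outline=outline)
```

In `add_text_to_image`, the failing call could then become `draw_thick_rectangle(image_draw, (p_new, (p_new[0] + 177, p_new[1] + 50)), (0, 0, 0, bright_random), width=4)`. The result may differ by a pixel from what newer Pillow versions draw with `width=4`.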

Difference between the data generated by first_train and second_finetune

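Hello! I looked at the two ways of generating "复印无效" data in the preliminary round. first_train takes training images whose watermark lies in a blank area, uses template matching to find the watermark position, and generates watermarks at other positions. second_finetune generates the watermarks directly from an ID card template. My question: the second approach (second_finetune) seems to give clearly better results (for the watermark), so is it still necessary to generate data with first_train? Is it done just to add more training data, or does the data from first_train have some advantage over the data from second_finetune? Thanks!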

Question about generating the "复印无效" data

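Hello, and thank you very much for sharing your solution and code. I have a question about generating dataset 1 in the first stage of the "复印无效" data. choosed_imgs_10_2 holds the images whose "复印无效" watermark falls on a blank area (only some samples are included; running python move_watermask_location.py on them yields only a few hundred images). How many samples did choosed_imgs_10_2 contain during the competition? If I want to use more samples, do I need to pick out the training-set images whose watermark falls on a blank area myself? Thanks.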
