
casia_hw_data_processing's Introduction

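CASIA Handwritten Chinese Data Processing

Processing scripts for Chinese handwriting datasets: CASIA-HWDB1-2, CASIA-OLHWDB1-2, and the ICDAR2013 online and offline sets.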

Single-character generation

① CASIA-OLHWDB1


In transformCASIA-OLHWDB1.py:

save_dir: set this to the directory where the generated images should be saved
version_dir: set this to the path of the original binary files (*.POT)


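Other options include:
  • size_mode: one of [basic | resize]. basic keeps the original size; resize scales the point set.

  • resize_hw: the size the image is scaled to, 96x96 by default. It has no effect when size_mode='basic'.

  • background_color: 255 gives black characters on a white background; 0 gives white characters on a black background.

  • border: the number of blank pixels left on each of the four sides.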


Then run:

python transformCASIA-OLHWDB1.py
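
To make the options above concrete, here is a minimal sketch of how size_mode, resize_hw, border, and background_color could act on a character's stroke points before they are drawn. It is not the code in transformCASIA-OLHWDB1.py: the helper names layout_strokes and paper_and_ink, the aspect-preserving scale, and the centering are all assumptions made for illustration.

```python
def layout_strokes(strokes, size_mode="resize", resize_hw=(96, 96), border=4):
    """Shift (and optionally scale) stroke points so they fit on a canvas.

    strokes: list of strokes, each a list of integer (x, y) points.
    Returns (new_strokes, (canvas_height, canvas_width)).
    Hypothetical helper, not taken from the original script.
    """
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    width = max(xs) - min_x
    height = max(ys) - min_y

    if size_mode == "basic":
        # Keep the original scale; the canvas grows to hold the points plus border.
        scale = 1.0
        canvas_h = height + 2 * border + 1
        canvas_w = width + 2 * border + 1
        off_x = off_y = border
    elif size_mode == "resize":
        # Scale the point set into a fixed canvas, preserving the aspect ratio
        # (an assumption) and centering the result inside the border.
        canvas_h, canvas_w = resize_hw
        avail_h = canvas_h - 2 * border - 1
        avail_w = canvas_w - 2 * border - 1
        if avail_h <= 0 or avail_w <= 0:
            raise ValueError("border is too large for resize_hw")
        scale = min(avail_w / max(width, 1), avail_h / max(height, 1))
        off_x = border + (avail_w - width * scale) / 2
        off_y = border + (avail_h - height * scale) / 2
    else:
        raise ValueError(f"unknown size_mode: {size_mode!r}")

    new_strokes = [
        [(round((x - min_x) * scale + off_x), round((y - min_y) * scale + off_y))
         for x, y in stroke]
        for stroke in strokes
    ]
    return new_strokes, (canvas_h, canvas_w)


def paper_and_ink(background_color):
    """Return (paper, ink) gray levels: 255 -> white paper, 0 -> black paper."""
    if background_color not in (0, 255):
        raise ValueError("background_color must be 0 or 255")
    return background_color, 255 - background_color
```

With size_mode="basic" the resize_hw argument is ignored, which matches the note above that it has no effect in that mode.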

Text-line generation

① CASIA-OLHWDB2 (the script targets Linux by default; for Windows, see line 55 of the script and adjust it accordingly)


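In transformCASIA-OLHWDB2.py:
Set line 22 to the path of the original PTTS binary files
Set line 23 to the directory where the generated images should be stored

--rectify_flag defaults to True, which means each text line is rectified after a Huber linear fit.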

Then run:

python transformCASIA-OLHWDB2.py
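
The rectification controlled by --rectify_flag can be pictured with the following sketch. It fits a line y = a*x + b to the points of a text line using the Huber loss, then rotates the points so the fitted line becomes horizontal. This is an illustration under assumptions, not the script's implementation: the function names, the iteratively reweighted least-squares solver, the delta threshold, and the choice of a rotation (rather than, say, a shear) are all hypothetical.

```python
import math


def huber_line_fit(points, delta=1.0, iters=50):
    """Fit y = a*x + b with the Huber loss via iteratively reweighted least squares.

    points: list of (x, y) pairs. Returns (a, b).
    Hypothetical helper; delta and iters are illustrative defaults.
    """
    weights = [1.0] * len(points)
    a = b = 0.0
    for _ in range(iters):
        total = sum(weights)
        mean_x = sum(w * x for w, (x, _) in zip(weights, points)) / total
        mean_y = sum(w * y for w, (_, y) in zip(weights, points)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, points))
        sxy = sum(w * (x - mean_x) * (y - mean_y)
                  for w, (x, y) in zip(weights, points))
        a = sxy / sxx if sxx > 0 else 0.0
        b = mean_y - a * mean_x
        # Huber weights: full weight inside delta, down-weighted outside.
        residuals = [abs(y - (a * x + b)) for x, y in points]
        weights = [1.0 if r <= delta else delta / r for r in residuals]
    return a, b


def rectify_points(points, a, b):
    """Rotate points about (0, b) so that the line y = a*x + b becomes y = b."""
    theta = -math.atan(a)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * (y - b), s * x + c * (y - b) + b) for x, y in points]
```

The Huber loss is useful here because a few stray points, such as long descenders or punctuation, pull an ordinary least-squares baseline much more than they pull this one.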

Folder structure:
 CASIA-OLHWDB2       (52220 text lines in total: 41710 training, 10510 test)
 ├----CASIA-OLHWDB2.0     (20573 text lines)   (001-P16.ptts => 420-P20.ptts)
 │    ├--Train_Ptts                                                       (336 writers)
 │    └--Test_Ptts                                                        (84  writers)
 ├----CASIA-OLHWDB2.1     (17282 text lines)
 │    ├--Train_Ptts                       (1001-P16.ptts => 1240-P20.ptts, 240 writers)
 │    └--Test_Ptts                        (1241-P16.ptts => 1300-P20.ptts, 60  writers)
 └----CASIA-OLHWDB2.2     (14365 text lines)
      ├--Train_Ptts                       (501-P14.ptts  => 740-P18.ptts,  239 writers, writer 671 is missing)
      └--Test_Ptts                        (741-P14.ptts  => 800-P18.ptts,  60  writers)
          └--XXX.ptts
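
As a rough aid for checking a local copy against the layout above, the sketch below walks the tree and reports which subset, split, and writer each .ptts file belongs to. The helper names are hypothetical, and reading a file name such as 1001-P16.ptts as "writer id, then page number" is an assumption based only on the name ranges shown above.

```python
import re
from pathlib import Path

# Assumed naming scheme: "<writer id>-P<page number>.ptts", e.g. "1001-P16.ptts".
PTTS_NAME = re.compile(r"^(\d+)-P(\d+)\.ptts$", re.IGNORECASE)


def parse_ptts_name(name):
    """Return (writer_id, page) parsed from a PTTS file name."""
    match = PTTS_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected PTTS file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def list_ptts(root):
    """Yield (subset, split, writer_id, path) for every .ptts file under root.

    root is the CASIA-OLHWDB2 directory that contains CASIA-OLHWDB2.0,
    CASIA-OLHWDB2.1 and CASIA-OLHWDB2.2.
    """
    root = Path(root)
    for path in sorted(root.glob("CASIA-OLHWDB2.*/*_Ptts/*.ptts")):
        split = "train" if path.parent.name == "Train_Ptts" else "test"
        writer_id, _page = parse_ptts_name(path.name)
        yield path.parent.parent.name, split, writer_id, path
```

Collecting the distinct writer ids per subset and split from list_ptts gives a quick way to compare a download against the writer counts listed in the tree.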

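There is a lot of material here, so please point out any mistakes you find. The core-region estimate for the images is for reference only and can probably be improved further.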


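Common character classes

  • HWDB2_2703.py (all character classes in the training set of the CASIA-HWDB2 text lines)
  • HWDB2_7373.py
  • OLHWDB12_7356.py (all character classes in the training set of the CASIA-OLHWDB1 single characters)
  • OLHWDB12_7366.py (all character classes in the training sets of the CASIA-OLHWDB1 single characters and the CASIA-OLHWDB2 text lines)
  • OLHWDB2_2650.py (all character classes in the training set of the CASIA-OLHWDB2 text lines)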

TODO

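Single-character generation:

  • CASIA-HWDB1
  • CASIA-HWDB2
  • CASIA-OLHWDB1
  • CASIA-OLHWDB2

Text-line generation:

  • CASIA-HWDB2
  • CASIA-OLHWDB2, including rectification after fitting, core-region estimation (core estimate), and path signature
  • ICDAR2013-offline, including rectification after fitting
  • ICDAR2013-online, including rectification after fitting and core estimate

  • Speed optimization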

Data source: CASIA Online and Offline Chinese Handwriting Databases
