
cail2018-toy's Introduction

2018 “法研杯” (Fa Yan Bei) Legal AI Challenge, CAIL2018

1. Official Website

2018 “法研杯” Legal AI Challenge

2. Timeline

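  • Phase 1 (2018.05.15-2018.07.14):
    • ~ June 5: deadline for submitting models trained on the Small data. The Large data is released to teams whose evaluation results beat the baseline algorithm
    • ~ June 12: earlier models are re-evaluated on the Large-test data and the leaderboard is refreshed
    • ~ July 14: deadline for submitting the final model.
  • Phase 2 (2018.07.14-2018.08.14):
    • The organizers run a closed evaluation of the final models on one month of newly collected data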

3. Notice

3.1. Necessary adjustments

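After cloning or downloading this project to run it locally, make the following small changes:

  • Create a model/ directory under ./predictor (GitHub cannot store an empty folder)
  • In ./utils/util.py, change DATA_DIR on line 9 to the local directory that holds the data files
  • Before running ./test.py, change line 11 to the directory that holds the test files and line 12 to the directory where test output should be written
  • Before running ./score.py, change line 187 to the test file directory above and line 188 to the test output directory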
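
The first adjustment above can also be scripted instead of done by hand. The following is a minimal sketch, assuming it is called with the repository root as its argument; the helper name `ensure_model_dir` is hypothetical and is not part of the project.

```python
import os


def ensure_model_dir(repo_root="."):
    """Create <repo_root>/predictor/model if it is missing and return its path.

    Hypothetical helper for illustration; the project itself does not ship it.
    """
    model_dir = os.path.join(repo_root, "predictor", "model")
    # exist_ok=True makes the call safe to repeat on every run.
    os.makedirs(model_dir, exist_ok=True)
    return model_dir
```

Calling `ensure_model_dir(".")` from the repository root before training should leave an empty ./predictor/model/ directory in place. The path edits in util.py, test.py and score.py still have to be made by hand.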

3.2. Requirements

  • Language Environment

    • Python 3.5
  • Packages

    • jieba
    • pandas
    • scikit-learn (imported as sklearn)

3.3. Unfinished Parts

  • ./preprocess/*

4. Updates

2018-05-18 [feng]

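  • The data files are too large, so the data folder was removed from the project
  • The default data directory is ../data/CAIL2018-small-data; see the DATA_DIR constant in util.py
  • Use thulac-python, Tsinghua University's Chinese word segmentation tool
  • thulac is too slow, so jieba is used for now; C++ segmentation tools could be considered later
  • Notice: in law article prediction, some cases correspond to more than one article
  • Add util.py
  • Add preprocess.py, which runs Chinese word segmentation on the data and integrates the json2csv function
  • Add stopwords.txt, sourced from GitHub · stopwords-iso/stopwords-zh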
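
To illustrate the kind of conversion that json2csv performs, and how a case with several articles can be kept in one row, here is a minimal sketch. It is not the project's implementation. It assumes that each input line is a JSON object with a `fact` string and a `meta` object holding `relevant_articles` and `accusation` lists; those field names and the `;` separator are assumptions made for this example.

```python
import csv
import json


def json2csv(lines, out, label_sep=";"):
    """Write one CSV row per JSON-lines record and return the number of rows.

    Field names ("fact", "meta", "relevant_articles", "accusation") are
    assumptions about the data layout, not taken from the project code.
    """
    writer = csv.writer(out)
    writer.writerow(["fact", "relevant_articles", "accusation"])
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        meta = record.get("meta", {})
        # A case may cite several articles, so all of them go into one cell.
        articles = label_sep.join(str(a) for a in meta.get("relevant_articles", []))
        accusations = label_sep.join(meta.get("accusation", []))
        writer.writerow([record.get("fact", ""), articles, accusations])
        count += 1
    return count
```

Passing an open text file as `lines` and a file opened with `newline=""` as `out` converts a data file in a single pass. The joined label cell can later be split on the separator to recover the multi-label targets.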

2018-05-26 [feng]

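  • Use jieba's parallel segmentation mode
  • Import a legal dictionary downloaded from the Sogou lexicon library
  • Delete CODE_OF_CONDUCT.md
  • Add the dictionary/ folder, which contains the user dictionary and the code that decodes .scel files (Sogou's user dictionary format)
  • Fix a bug on line 24 of util.py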
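
The segmentation setup described in this update could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this repository: the function name `segment` and its parameters are hypothetical. It uses the jieba calls `jieba.load_userdict`, `jieba.enable_parallel` and `jieba.cut`, and imports jieba lazily so that another tokenizer can be passed in instead.

```python
def segment(texts, cut=None, userdict_path=None, processes=None):
    """Split each text into a list of non-blank tokens.

    cut: callable mapping a string to an iterable of tokens. When omitted,
    jieba is imported and configured here (hypothetical wrapper).
    """
    if cut is None:
        import jieba  # third-party; only needed for the default tokenizer

        if userdict_path:
            # Load extra vocabulary, e.g. a legal user dictionary.
            jieba.load_userdict(userdict_path)
        if processes:
            # Parallel mode relies on multiprocessing and is not supported on Windows.
            jieba.enable_parallel(processes)
        # Looked up after enable_parallel so that the parallel cut is used.
        cut = jieba.cut
    return [[tok for tok in cut(text) if tok.strip()] for text in texts]
```

For example, `segment(docs, userdict_path="utils/userdict.txt", processes=4)` would segment `docs` with the user dictionary loaded; the dictionary path here is only a guess based on the file names mentioned in this README.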

2018-05-28 [feng]

  • Reorganize the code structure to follow the official svm_baseline code
  • Delete preprocess.py
  • Add train.py, the ./predictor/ directory, and related files

2018-06-01 [feng]

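  • Reorganize the code structure:
    • util.py, law.txt, accu.txt, userdict.txt and similar files are all placed under ./utils/
    • Once the model has been trained, the current ./predictor/ directory can be packaged and submitted directly
    • Add files for local testing and scoring: ./test.py and ./score.py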

5. TODOs

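  • Consider moving stop-word handling inside the TF-IDF model
  • Manually correct the segmentation results where appropriate
  • Run a preliminary analysis of the data, i.e. the contents of the ./preprocess/ directory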
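
For the first TODO item, one possible approach is to pass the stop-word list to scikit-learn's `TfidfVectorizer` so that filtering happens inside the vectorizer. The sketch below assumes the documents are already segmented and joined with spaces; the helper name `build_tfidf` is hypothetical, and this is not how the project currently handles stop words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def build_tfidf(stopwords):
    """Return a TF-IDF vectorizer that drops the given stop words itself.

    Hypothetical helper; expects documents that are already segmented,
    with tokens separated by whitespace.
    """
    return TfidfVectorizer(
        tokenizer=str.split,         # split pre-segmented text on whitespace
        token_pattern=None,          # not used because a tokenizer is supplied
        stop_words=list(stopwords),  # filtered out after tokenization
    )
```

With this in place, the words read from stopwords.txt can be handed to `build_tfidf` directly, and the separate filtering step before vectorization is no longer needed.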

6. Scores

0 SVM baseline on small-data

task-1  task-2  task-3  total-score
71.83   68.79   47.83   188.45

1st upload using linearSVC

succeeded after 8 stupid attempts by @FengBlil

date: 05-31

task-1  task-2  task-3  total-score
72.92   69.43   52.56   194.92

2nd upload using RandomForestClassifier

date: 06-01

task-1  task-2  task-3  total-score
62.20   59.99   48.73   170.92

7. Members

Team Members:

cail2018-toy's People

Contributors

fengbli, mcorange1997
