meddict

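meddict builds a Chinese medical dictionary for word segmentation. It starts from existing Chinese medical vocabularies and a machine-translated copy of UMLS, then applies rule-based processing to both.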
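To illustrate how a dictionary like this might be used for segmentation, here is a minimal sketch of forward maximum matching. This is not taken from the repository: the function name `segment` and the tiny vocabulary are hypothetical, and a real segmenter may well use a different algorithm.

```python
def segment(text, vocab, max_len=None):
    """Greedy forward maximum matching against a set of known terms.

    Assumption: `vocab` is a set of strings, e.g. loaded from one of the
    word lists with one term per line.
    """
    if max_len is None:
        max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    demo_vocab = {"糖尿病", "高血压"}  # hypothetical entries
    print(segment("患者有糖尿病和高血压", demo_vocab))
```

Characters that match no dictionary entry come out as single-character tokens, so coverage of the word list directly affects segmentation quality.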

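File description

Vocabularies from human translation

In the result/segment/thesaurus folder:

Vocabularies from UMLS translation (mainly machine translation)

In the result/segment/umls folder (sorted by quality, from high to low):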

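  • umls_iciba.txt: translations from iCIBA (Kingsoft PowerWord) that carry a tag such as [医] (medical)
  • umls_bgequal_baike.txt: entries whose Baidu Translate and Google Translate results are equal regardless of order and contain Chinese, and which are also listed in Baidu Baike or Wikipedia
  • umls_bgequal.txt: entries whose Baidu Translate and Google Translate results are equal regardless of order and contain Chinese
  • umls_baike.txt: translated UMLS entries that are also listed in Baidu Baike or Wikipedia
  • umls.txt: translated UMLS entries, pending further mining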
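The filter behind umls_bgequal.txt can be sketched as below. The README does not define "equal regardless of order", so this sketch assumes it means the two translations contain the same characters with the same counts. All function names are hypothetical, and the Chinese check only covers the basic CJK Unified Ideographs range.

```python
import re

# Assumption: "contains Chinese" means at least one character in U+4E00..U+9FFF.
CJK = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(text):
    """Return True if the text has at least one CJK unified ideograph."""
    return CJK.search(text) is not None


def unordered_equal(a, b):
    """Compare two strings as multisets of characters, ignoring whitespace."""
    return sorted("".join(a.split())) == sorted("".join(b.split()))


def keep_entry(baidu, google):
    """Hypothetical filter: both translations agree up to order and contain Chinese."""
    return contains_chinese(baidu) and unordered_equal(baidu, google)


if __name__ == "__main__":
    print(keep_entry("心脏病", "病心脏"))
    print(keep_entry("心脏病", "心脏疾病"))
```

Requiring agreement between two independent translation engines trades recall for precision, which fits the quality ordering of the files listed above.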

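Combined vocabularies

In the result/segment/combine folder:

  • meddict_human.txt: combines all of the human-translated vocabularies above
  • meddict_human_machine.txt: combines the following vocabularies
    • all of the human-translated vocabularies above
    • umls_iciba.txt
    • umls_bgequal.txt
    • umls_baike.txt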
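Combining word lists like this amounts to a de-duplicating merge. The sketch below assumes each file holds one term per line in UTF-8, which the README does not state. The helper names `read_terms` and `merge_terms` are hypothetical and do not come from the repository.

```python
from pathlib import Path


def read_terms(path):
    """Yield stripped, non-empty lines from a word list (one term per line assumed)."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term:
            yield term


def merge_terms(*wordlists):
    """Merge iterables of terms, dropping duplicates and keeping first-seen order."""
    merged = {}  # dict preserves insertion order
    for terms in wordlists:
        for term in terms:
            merged.setdefault(term, None)
    return list(merged)


if __name__ == "__main__":
    human = ["糖尿病", "高血压"]      # hypothetical human-translated terms
    machine = ["高血压", "心肌梗死"]  # hypothetical machine-translated terms
    print(merge_terms(human, machine))
```

With real files, the call would look like `merge_terms(read_terms("a.txt"), read_terms("b.txt"))`, where the file names are placeholders.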

Entry counts

Vocabulary                  Entries
hpo.txt                     11216
icd10_gov.txt               29080
mesh.txt                    20638
snomed.txt                  10519
snomedct.txt                116086
umls_iciba.txt              112755
umls_bgequal_baike.txt      43763
umls_bgequal.txt            269181
umls_baike.txt              163680
umls.txt                    3560886
meddict_human.txt           166613
meddict_human_machine.txt   554867
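Counts like these can be recomputed with a few lines of Python. The sketch below assumes one entry per non-empty line. Whether the published figures also de-duplicate entries is not stated, so the function reports both numbers. The name `count_entries` is hypothetical.

```python
def count_entries(lines):
    """Return (non-empty line count, distinct entry count) for an iterable of lines."""
    entries = [line.strip() for line in lines if line.strip()]
    return len(entries), len(set(entries))


if __name__ == "__main__":
    sample = ["糖尿病\n", "\n", "高血压\n", "糖尿病\n"]  # hypothetical file content
    print(count_entries(sample))
```

To count a real file, pass an open file object, for example `count_entries(open(path, encoding="utf-8"))` with `path` pointing at one of the word lists.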

Vocabulary sources

Existing Chinese medical vocabularies

UMLS machine translation

Usage example

Run the code that generates hpo.txt:

cd code
export PYTHONPATH=`pwd`:$PYTHONPATH
python segment/hpo.py

Python packages you may need to install

pip install tqdm
pip install seaborn
pip install xlrd
pip install opencc-python-reimplemented
pip install zhon
