Git Product home page Git Product logo

nnctn's Introduction

Introduction

nnctn is a nn-based cancer type Chinese specification normalization and trace-up program.

It takes variable cancer diagnostic descriptions in Chinese, normalize them to the stardard specifications in Chinese.

It has a neural network for improve the performance based on the users feedback.

It provides a api for other evaluation scrore functions.

Usage

nnctn has three sub-commands, each for build the index database, search the query string and train the nn model.

$ python nnctn.py -h
usage: PROG [-h] {build,search,train} ...

nn-based cancer type specification normalization and trace

positional arguments:
  {build,search,train}  sub-command help
    build               build index database for standardard cancer type
                        specifications and nn network
    search              search the query string in the index database and
                        return hits
    train               train the nn network using user\`s query string and
                        expected output

optional arguments:
  -h, --help            show this help message and exit

sub-command build takes in a excel file which have a stardard Chinese cancer specifications with the column name "中文". Default is the "type_of_cancer-含中文名.xlsx" in this directory.

The program brings a pre-build "cancertypeindex.db" and a pre-build "nn.db". Delete this two files before you execute the build sub-command.

$ python nnctn.py build -h
usage: PROG build [-h] [-i INPUT]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        standardard cancer type specifications list

sub-command search takes in a query string which represents the dignostic information in Chinese.

$ python nnctn.py search -h
usage: PROG search [-h] -q QUERY

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        cancer diagnostic descriptions as query strings

sub-command train takes in a query string and a expected output, output must be in the query string`s search result.

There are several essential training items in the "training.set".

python nnctn.py train -h
usage: PROG train [-h] -q QUERY -e EXPECTED

optional arguments:
  -h, --help            show this help message and exit
  -q QUERY, --query QUERY
                        cancer diagnostic descriptions as query strings
  -e EXPECTED, --expected EXPECTED
                        expected output

Exameles

build

$ python nnctn.py build
building index database
Indexing 侵袭性血管黏液瘤
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 1.287 seconds.
Prefix dict has been built succesfully.
Indexing 间变性星形细胞瘤
Indexing 激活B细胞型
Indexing 急性嗜碱性白血病
Indexing 肾上腺皮质腺瘤
Indexing 腺样囊性乳腺癌
...

search

10 best hits will be printed in default.

$ python nnctn.py search -q "甲状腺乳头状癌,手术后"
甲状腺乳头状癌,手术后
        score   target
        1.968152	乳头状甲状腺癌
        1.750000	乳头状胃腺癌
        1.550000	子宫浆膜癌/子宫乳头状浆膜癌
        1.550000	尿路上皮乳头状瘤
        1.550000	乳腺实性乳头状癌
        1.550000	乳头状肾细胞癌
        1.550000	乳头状脑膜瘤
        1.550000	黏液乳头状室管膜瘤
        1.550000	尿道上皮乳头状瘤
        1.550000	脉络丛乳头状瘤瘤

train

$ python nnctn.py train -q "卵巢癌" -e "卵巢癌/输卵管癌"
update successful

nnctn's People

Stargazers

 avatar

Watchers

 avatar  avatar

Forkers

qistark

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.