
deid_clinical_data's Introduction

This repository applies several Named Entity Recognition (NER) approaches to multiple datasets and extends them toward the main goal of this work: the de-identification of clinical data in electronic health records.
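To make the link between NER and de-identification concrete, here is a minimal sketch of how predicted BIO tags could be turned into redacted text. The `redact` helper, the tag names (`NAME`, `DATE`) and the example sentence are illustrative assumptions, not taken from this repository or from the i2b2 annotation scheme.

```python
from typing import List


def redact(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged entity span with a single [TYPE] placeholder.

    Hypothetical helper: assumes BIO tags such as "B-NAME", "I-NAME", "O".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out = []
    prev_type = None  # entity type of the previous token, None outside an entity
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        # Emit a placeholder on B-, or on an I- that does not continue the previous type
        if prefix == "B" or ent_type != prev_type:
            out.append(f"[{ent_type}]")
        prev_type = ent_type
    return " ".join(out)


# Made-up example sentence and tags
tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "March", "3", "."]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE", "O"]
redacted = redact(tokens, tags)
print(redacted)  # Mr. [NAME] was admitted on [DATE] .
```

Collapsing a whole span into one placeholder avoids leaking how many tokens the original identifier had; a real pipeline might prefer surrogate values instead.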

Code layout

References & interesting links to consider :

Stanford NLP-NER tutorial: https://github.com/cs230-stanford/cs230-code-examples/tree/master/pytorch/nlp

Log-Linear Models, MEMMs, and CRFs: http://www.cs.columbia.edu/~mcollins/crf.pdf

BiLSTM-CRF: https://jovian.ai/abdulmajee/bilstm-crf#C29 and https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html

Transfer learning with BERT: https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_ner.ipynb

Interesting tutorial, simplistic implementation: https://www.depends-on-the-definition.com/sequence-tagging-lstm-crf/

Interesting repo, advanced implementation (CoNLL): https://github.com/allanj/pytorch_neural_crf/blob/master/src/model/module/linear_crf_inferencer.py

Ongoing plan

  • Paper reading related to the topic (deidentification)
    • Ch. 8 (sequence tagging) of Jurafsky's NLP book
    • other papers and blog posts (see the section "References & interesting links to consider")
  • i2b2 data request
  • explore i2b2 dataset
  • build code on easier/standard datasets (e.g. CoNLL-2003 or the Kaggle NER dataset)
  • LSTM, BiLSTM, then BiLSTM-CRF (consider the pytorch-crf implementation and a CRF "from scratch")
  • apply to i2b2
  • compare to transfer learning (BERT, SciBERT, ELECTRA)
  • FASTER CRF
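For the CRF "from scratch" item in the plan above, the decoding step of a linear-chain CRF can be sketched in plain Python. This is a simplified illustration and not the repository's implementation: it assumes the emission scores come from some upstream model (for instance a BiLSTM), it omits start and stop transitions, and the function name `viterbi` and the toy scores are made up.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (assumed given)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t-1][j] = best previous tag for tag j at step t
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptr):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


# Toy scores for 3 tokens and 2 tags
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
switch_penalty = [[0.0, -5.0], [-5.0, 0.0]]

free_path = viterbi(emissions, no_penalty)        # follows the emissions: [0, 1, 0]
smooth_path = viterbi(emissions, switch_penalty)  # switching is costly: [0, 0, 0]
```

The two calls show what the transition scores add over a plain BiLSTM tagger: with a penalty on switching tags, the decoder prefers a consistent sequence even when one position's emission score points the other way.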

TODO

Explore character embeddings: https://gist.github.com/DuaneNielsen/4e45408948b0aca9b66b7a55ddec8950 and https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
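As a starting point for the character-embedding idea, the input side can be sketched without any deep learning library: each word becomes a fixed-length list of character indices that an embedding layer could later consume. The helper names, the special tokens and the `max_len` value are assumptions made for illustration.

```python
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_char_vocab(words: List[str]) -> Dict[str, int]:
    """Map every character seen in `words` to an integer index."""
    vocab = {PAD: 0, UNK: 1}
    for word in words:
        for ch in word:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode_words(words: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Encode each word as exactly `max_len` character indices (truncate or pad)."""
    encoded = []
    for word in words:
        ids = [vocab.get(ch, vocab[UNK]) for ch in word[:max_len]]
        ids += [vocab[PAD]] * (max_len - len(ids))
        encoded.append(ids)
    return encoded


vocab = build_char_vocab(["MRN", "Dr."])
batch = encode_words(["Dr.", "MD", "x"], vocab, max_len=4)
print(batch)  # [[5, 6, 7, 0], [2, 5, 0, 0], [1, 0, 0, 0]]
```

Character-level inputs keep surface cues such as capitalization, digits and punctuation, which tend to matter for identifiers like names, dates and record numbers.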

Some notes:

Train:

Activate the environment: env-crf (TODO: generate the install requirements)

python train.py --data_dir data/small --model_dir experiments/base_model
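The contents of train.py are not shown here, so the following is only a hypothetical sketch of how the two flags in the command above might be parsed with `argparse`. The flag names come from the command itself; the defaults, help texts and the `build_parser` function are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train an NER model")
    parser.add_argument("--data_dir", default="data/small",
                        help="directory containing the dataset (assumed default)")
    parser.add_argument("--model_dir", default="experiments/base_model",
                        help="directory for the experiment's outputs (assumed default)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--data_dir", "data/small", "--model_dir", "experiments/base_model"]
)
print(args.data_dir, args.model_dir)
```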

more links

AllenNLP (not sure whether the code works): https://github.com/marumalo/bilstm-crf

NER, multiple papers with code http://nlpprogress.com/english/named_entity_recognition.html

An interesting paper with pseudocode: https://arxiv.org/pdf/1508.01991.pdf

Reference that builds upon the PyTorch tutorial: https://github.com/jidasheng/bi-lstm-crf

More code: https://github.com/marumalo/bilstm-crf (uses AllenNLP)

https://github.com/jidasheng/bi-lstm-crf/tree/master/bi_lstm_crf

Contributors

b-sayah
