
medacy's Introduction

spaCy

medaCy

🏥 Medical Text Mining and Information Extraction with spaCy 🏥

MedaCy is a text processing and learning framework built over spaCy to support the lightning fast prototyping, training, and application of highly predictive medical NLP models. It is designed to streamline researcher workflow by providing utilities for model training, prediction, and organization while ensuring the replicability of systems.


🌟 Features

  • Highly predictive, shared-task dominating out-of-the-box trained models for medical named entity recognition.
  • Customizable pipelines with detailed development instructions and documentation.
  • Supports the design of replicable NLP systems, so that results can be reproduced and models distributed while still allowing for privacy.
  • Active community development spearheaded and maintained by NLP@VCU.
  • Detailed API.

💭 Where to ask questions

MedaCy is actively maintained by a team of researchers at Virginia Commonwealth University. The best way to receive immediate responses to any questions is to raise an issue. Make sure to first consult the API. See how to formulate a good issue or feature request in the Contribution Guide.

💻 Installation Instructions

MedaCy can be installed for general use or for pipeline development / research purposes.

Application | Run
Prediction and Model Training (stable) | pip install git+https://github.com/NLPatVCU/medaCy.git
Prediction and Model Training (latest) | pip install git+https://github.com/NLPatVCU/medaCy.git@development
Pipeline Development and Contribution | See Contribution Instructions

📚 Power of medaCy

After installing medaCy and medaCy's clinical model, simply run:

from medacy.model.model import Model

model = Model.load_external('medacy_model_clinical_notes')
annotation = model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
print(annotation)

and receive instant predictions:

[
    ('Drug', 40, 45, 'Advil'),
    ('Dosage', 27, 28, '1'), 
    ('Form', 29, 36, 'capsule'),
    ('Duration', 46, 56, 'for 5 days')
]

MedaCy can also be used through its command line interface, documented here.

To explore medaCy's other models or train your own, visit the examples section.

Reference

@ARTICLE {
    author  = "Andriy Mulyar, Natassja Lewinski and Bridget McInnes",
    title   = "TAC SRIE 2018: Extracting Systematic Review Information with MedaCy",
    journal = "National Institute of Standards and Technology (NIST) 2018 Systematic Review Information Extraction (SRIE) > Text Analysis Conference",
    year    = "2018",
    month   = "nov"
}

License

This package is licensed under the GNU General Public License.

Authors

Current contributors: Steele Farnsworth, Anna Conte, Gabby Gurdin, Aidan Kierans, Aidan Myers, and Bridget T. McInnes

Former contributors: Andriy Mulyar, Jorge Vargas, Corey Sutphin, and Bobby Best

Acknowledgments

medacy's People

Contributors

aidanmyers, andriymulyar, bmcinnes, coreysutphin, daikikatsuragawa, dendendelen, ggurdin, jvargas2, paulsonnt, r-best, sammahen, sema4-ericschles, swfarnsworth


medacy's Issues

MetaMap mapped_terms_to_spacy_ann Bug

Description

On the development branch, I attempted to run MetaMap on a file and generate a spaCy annotation file. I got the error listed below.

Traceback (most recent call last):
  File "medtest.py", line 5, in <module>
    me_ann = medy.mapped_terms_to_spacy_ann(me_dict)
  File "/home/jeff/.local/lib/python3.6/site-packages/medacy/pipeline_components/metamap/metamap.py", line 196, in mapped_terms_to_spacy_ann
    for span in self.get_span_by_term(term): #if a single entity corresonds to a disjunct span
  File "/home/jeff/.local/lib/python3.6/site-packages/medacy/pipeline_components/metamap/metamap.py", line 255, in get_span_by_term
    if int(term['ConceptPIs']['@Count']) == 1:
TypeError: string indices must be integers

Steps/Code to Reproduce

python3 medtest.py

Expected Results

Annotation output.

Actual Results

See error above

Versions

Linux-4.15.0-33-generic-x86_64-with-LinuxMint-19-tara
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0]
NumPy 1.15.4
SciPy 1.2.0
medacy 0.0.6

[FEATURE REQUEST] Expand the scope of medaCy past NER

What problem does your feature solve?
medaCy is capable of supporting much more than NER. The current codebase would not need much refactoring to set medaCy up to run pipelines for other tasks such as relationship extraction.

Describe the solution you'd like
Refactor medaCy to make room for other types of medical text processing systems. Pipeline components, pipelines, and tools could be left where they are; NER, relationship extraction, etc. would become root directories, each containing a model sub-directory.

Additional context
Looking big picture here.

Include a better interface for prediction.

In the Model class:
The .predict() method should ideally default to accepting a string of text and output structured predictions from that text (a spacy compatible annotation style would be useful). Currently, the .predict() method does not allow for this - it does bulk predictions.

Moving the current bulk behavior into a separate bulk_predict() method would keep bulk prediction available.
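A minimal sketch of the proposed split. The class name `SketchModel` and the caller-supplied `tagger` function are hypothetical and are not medaCy's actual API; the point is only that `predict()` accepts a string and returns (label, start, end, text) tuples, while `bulk_predict()` applies it to every `.txt` file in a directory.

```python
import os
import re
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, int, int, str]  # (label, start, end, text)


class SketchModel:
    """Hypothetical stand-in for the Model class discussed in this issue."""

    def __init__(self, tagger: Callable[[str], List[Entity]]):
        self.tagger = tagger

    def predict(self, text: str) -> List[Entity]:
        """Predict on a single string and return structured entities."""
        if not isinstance(text, str):
            raise TypeError("predict() expects a string; use bulk_predict() for a directory")
        return self.tagger(text)

    def bulk_predict(self, directory: str) -> Dict[str, List[Entity]]:
        """Predict on every .txt file in a directory, keyed by file name."""
        results = {}
        for name in sorted(os.listdir(directory)):
            if name.endswith(".txt"):
                with open(os.path.join(directory, name), encoding="utf-8") as handle:
                    results[name] = self.predict(handle.read())
        return results


def toy_tagger(text: str) -> List[Entity]:
    """Toy rule for demonstration only: label every integer as a Dosage."""
    return [("Dosage", m.start(), m.end(), m.group()) for m in re.finditer(r"\d+", text)]
```

With this shape, single-string prediction becomes the default entry point and the directory-based behavior stays one call away.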

_restore_from_ascii() method in metamap.py throws "TypeError: list indices must be integers not strings" when dealing with a converted document.

Description

If a non-ASCII character is actually converted, then when the program goes to restore it in the _restore_from_ascii method in metamap.py, it throws a TypeError. The offending line is for mapping in metamap_dict['metamap']['MMOs']['MMO']['Utterances']['Utterance']['Phrases']['Phrase']['Mappings']['Mapping']:.
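One plausible cause, stated here as an assumption rather than a confirmed diagnosis: when XML is converted to nested dictionaries, an element that occurs once typically becomes a dict while an element that repeats becomes a list, so a chain of key lookups fails as soon as any level repeats. The sketch below uses a hypothetical `as_list` helper and a traversal that tolerates both shapes. The toy dictionary is invented for illustration and only mimics the key names in the offending line.

```python
from typing import Any, Iterator, List


def as_list(node: Any) -> List[Any]:
    """Wrap a single element in a list; leave lists alone; map None to []."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def iter_mappings(metamap_dict: dict) -> Iterator[dict]:
    """Yield every Mapping, whether each level holds one element or several."""
    mmos = metamap_dict["metamap"]["MMOs"]
    for mmo in as_list(mmos.get("MMO")):
        for utterance in as_list(mmo["Utterances"].get("Utterance")):
            for phrase in as_list(utterance["Phrases"].get("Phrase")):
                mappings = phrase.get("Mappings") or {}
                for mapping in as_list(mappings.get("Mapping")):
                    yield mapping


# Toy input (not real MetaMap output): "Phrase" repeats, so it is a list.
toy_phrases = [
    {"Mappings": {"Mapping": {"id": 1}}},
    {"Mappings": {"Mapping": [{"id": 2}, {"id": 3}]}},
    {"Mappings": None},
]
toy = {"metamap": {"MMOs": {"MMO": {
    "Utterances": {"Utterance": {"Phrases": {"Phrase": toy_phrases}}}
}}}}
```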

Steps/Code to Reproduce

Include a character such as л in your document and attempt to metamap. The error will be thrown when it attempts to restore the document back.

Expected Results

Text has non-ascii characters again and the character spans in the metamap dictionary are updated to reflect the restoration.

Actual Results

Error thrown.

Versions

Linux-4.15.0-43-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
medacy 0.0.8

[FEATURE REQUEST] Option for label ranking in outputs

Description
A CRF produces label probability outputs. Currently, we are simply using the highest probability label as the predicted entity label. It would be useful to allow for an option to output a label ranking for a given token.

Uses
Multi-label prediction, hierarchical label prediction. Capable of handling nested entities.

Where to get started
A contributor would simply have to interface the correct attributes set by the sklearn-crfsuite wrapper into the Annotation class. Some discussion would be needed to ensure compatibility with other parts of the package.
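As a rough illustration of what a ranking could look like, the sketch below assumes per-token marginals in the shape that sklearn-crfsuite's `CRF.predict_marginals` returns (one `{label: probability}` dict per token). The helper names `rank_labels` and `rank_sequence` are hypothetical, and the probabilities are invented.

```python
from typing import Dict, List, Tuple


def rank_labels(marginals: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (label, probability) pairs, most probable first."""
    ranked = sorted(marginals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def rank_sequence(sequence_marginals: List[Dict[str, float]],
                  top_k: int = 3) -> List[List[Tuple[str, float]]]:
    """Rank the labels of every token in one sequence."""
    return [rank_labels(token, top_k) for token in sequence_marginals]


# Invented probabilities for a two-token sequence.
example = [
    {"Drug": 0.7, "O": 0.2, "Form": 0.1},
    {"O": 0.9, "Drug": 0.05, "Form": 0.05},
]
```

Taking the first entry of each ranking reproduces the current highest-probability behavior, so a ranking option could be added without changing default output.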

Make token merging optional during token annotation in each PipelineComponent.

Currently, tokens are merged by default in components such as the MetaMap annotator or the various UnitAnnotators, so that annotated groups of tokens are seen as a single block by the end classifier. This behavior is often wanted and should remain the default, but the developer of a pipeline should have the option to turn merging off. This should become standard for any new components; the MetaMap and individual unit annotation components will need refactoring.

[FEATURE REQUEST] Functionality for analyzing the differences between two Annotation objects.

What problem does your feature solve?
A method to do analysis of annotations (namely for the application of looking at differences between gold and predicted annotations).

Describe the solution you'd like
The Annotation class should be given static methods such as Annotation.diff(ann_object_1, ann_object_2), which would output the difference between two annotation objects, perhaps with a leniency parameter to handle fuzzy annotation matching.

Interface sklearn to compute various evaluation metrics between two annotation files (assuming one is gold and one is predicted).

Additional context
This would be very useful for result analysis and guiding the building of pipelines.
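A minimal sketch of such a diff, assuming annotations can be reduced to (label, start, end, text) tuples. The `Entity` type, the `diff` function, and the overlap-based notion of leniency are all hypothetical choices made for illustration, not the Annotation class's actual interface.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    label: str
    start: int
    end: int
    text: str


def overlaps(a: Entity, b: Entity) -> bool:
    """Lenient match: same label and character spans that share at least one offset."""
    return a.label == b.label and a.start < b.end and b.start < a.end


def diff(gold: List[Entity], predicted: List[Entity],
         lenient: bool = False) -> Tuple[List[Entity], List[Entity]]:
    """Return (missed, spurious): gold entities with no match, and predictions with no match."""
    match = overlaps if lenient else (lambda a, b: a == b)
    missed = [g for g in gold if not any(match(g, p) for p in predicted)]
    spurious = [p for p in predicted if not any(match(g, p) for g in gold)]
    return missed, spurious
```

The lengths of the two returned lists, together with the number of matched entities, are enough to compute precision and recall by hand or to feed into a metrics library.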

Separate models into new python packages.

What problem does your feature solve?
Models should not be provided with medaCy, rather they should be available for installation and compatible with medaCy.
Describe the solution you'd like
This will work very similarly to #59.

AttributeError: 'NoneType' object has no attribute 'netloc'

Description

Hi, I tried to install medacy and the medacy_model_clinical_notes model in a Google Colab Jupyter notebook.

Steps/Code to Reproduce

  1. Install medacy - successful
    !pip install git+https://github.com/NLPatVCU/medaCy.git
  2. Install medacy_model_clinical_notes - unsuccessful
    !pip install git+https://github.com/NLPatVCU/medaCy_model_clinical_notes.git

Collecting git+https://github.com/NLPatVCU/medaCy_model_clinical_notes.git
Cloning https://github.com/NLPatVCU/medaCy_model_clinical_notes.git to /tmp/pip-req-build-iyx8hwuq
Requirement already satisfied: medacy>=0.0.3 in /usr/local/lib/python3.6/dist-packages (from medacy-model-clinical-notes==1.0.1) (0.0.9)
Exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/cli/base_command.py", line 179, in main
status = self.run(options, args)
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/commands/install.py", line 315, in run
resolver.resolve(requirement_set)
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/resolve.py", line 131, in resolve
self._resolve_one(requirement_set, req)
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/resolve.py", line 357, in _resolve_one
add_req(subreq, extras_requested=available_requested)
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/resolve.py", line 314, in add_req
use_pep517=self.use_pep517
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/req/constructors.py", line 328, in install_req_from_req_string
if req.url and comes_from.link.netloc in domains_not_allowed:
AttributeError: 'NoneType' object has no attribute 'netloc'

Versions

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import medacy; print("medacy", medacy.__version__)
Linux-4.14.79+-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0]
NumPy 1.14.6
SciPy 1.1.0
medacy 0.0.9

Could anyone help out with this issue please, thank you!

Logging number of files loaded for training incorrectly stays the same despite different number of files being passed in

Description

When I fit a model initially, it prints out:
DEBUG:root:Loaded 4 files for training
DEBUG:root:Loaded 121 files for training
DEBUG:root:Loaded 138 files for training

where 138 is the correct number. When I rerun the script and point it at a different directory containing five files, I get the exact same log output, although it trains only on those five files.

Steps/Code to Reproduce

model.fit(train_loader) #Contains ~138 samples to train on

Then after it finishes...

model.fit(sample_loader) #Contains 5 samples

Expected Results

model.fit(sample_loader) will log that there are five files to train on

Actual Results

DEBUG:root:Loaded 4 files for training
DEBUG:root:Loaded 121 files for training
DEBUG:root:Loaded 138 files for training

Versions

Problem with installing medaCy

Description

Hi. I am trying to install medaCy on my system using the instructions given in the README, but I get an error caused by an unavailable spaCy model.

Steps/Code to Reproduce

Run:
pip install git+https://github.com/NLPatVCU/medaCy.git

Output



Collecting git+https://github.com/NLPatVCU/medaCy.git
       Cloning https://github.com/NLPatVCU/medaCy.git to /tmp/pip-k5kMwQ-build
Collecting en_core_web_sm@ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 (from medacy==0.1.0)
        Could not find a version that satisfies the requirement en_core_web_sm@ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 (from medacy==0.1.0) (from versions: )
No matching distribution found for en_core_web_sm@ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 (from medacy==0.1.0)


Versions

  • Linux-4.4.0-17134-Microsoft-x86_64-with-Ubuntu-18.04-bionic
  • NumPy 1.16.3
  • SciPy 1.3.0
  • medaCy not installed, the issue is about that only

Refactor to implement scikit-learn-like functionality

It may be a good idea to give the model building process a scikit-learn-like feel, that is, a class with 'fit' and 'predict' methods. This would sit alongside the existing code for learning and predicting.
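To make the convention concrete, here is a toy estimator that follows the scikit-learn style: `fit()` learns from data and returns `self`, and `predict()` returns labels. `LookupTagger` is a hypothetical example that memorizes the most frequent label per token; it is not a proposal for medaCy's learning algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class LookupTagger:
    """Toy estimator: fit() returns self, predict() returns one label per token."""

    def __init__(self, default_label: str = "O"):
        self.default_label = default_label

    def fit(self, X: List[List[str]], y: List[List[str]]) -> "LookupTagger":
        counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, labels in zip(X, y):
            for token, label in zip(tokens, labels):
                counts[token.lower()][label] += 1
        # A trailing underscore marks a learned attribute, as in scikit-learn.
        self.lookup_ = {token: c.most_common(1)[0][0] for token, c in counts.items()}
        return self

    def predict(self, X: List[List[str]]) -> List[List[str]]:
        return [
            [self.lookup_.get(token.lower(), self.default_label) for token in tokens]
            for tokens in X
        ]
```

Because `fit()` returns the estimator, training and prediction chain naturally, for example `LookupTagger().fit(X, y).predict(X_new)`.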

ImportError: Package not installed: medacy_model_clinical_notes

      3 model = Model.load_external('medacy_model_clinical_notes')
      4 annotation = model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
      5 print(annotation)

/usr/local/lib/python3.6/dist-packages/medacy/ner/model/model.py in load_external(package_name)
    458         """
    459         if importlib.util.find_spec(package_name) is None:
--> 460             raise ImportError("Package not installed: %s" % package_name)
    461         return importlib.import_module(package_name).load()
    462

ImportError: Package not installed: medacy_model_clinical_notes

Half of the document is considered for a specific span

Description

When generating the MedaCy ground truth files during 5-fold cross validation, some files contain a very large span for certain Dose instances. This behavior is not observed when the file is the only one trained on. The following are examples from two such files in the TAC (2008) data set.
File : PMC1257590.ann
Dose 929 932 400

File : PMC4847079.ann
Dose 1394 1398 25.6
Dose 1409 1413 30.7

Steps/Code to Reproduce

Run 5-fold cross validation using the systematic_review_pipeline on each of the following files individually, then again with more than two files, and compare the MedaCy ground truth files for the Dose instances mentioned above.

  1. PMC1257590.ann
  2. PMC4847079.ann

Expected Results

File : PMC1257590.ann
Ground truth:
Dose 929 932 400
MedaCy truth:
Dose 929 932 400
Predictions:
Dose 929 932 400

File : PMC4847079.ann
Ground truth:
Dose 1394 1398 25.6
Dose 1409 1413 30.7
MedaCy truth:
Dose 1394 1398 25.6
Dose 1409 1413 30.7
Predictions:
Dose 1394 1398 25.6
Dose 1409 1413 30.7

Actual Results

File : PMC1257590.ann
Ground truth:
Dose 929 932 400
MedaCy truth:(when run individually)
Dose 929 932 400
MedaCy truth:(when run with more than 2 files)
Dose 929 10478 400 μg/kg) of BPA (> 99% purity; Sigma-Aldrich.......................
Predictions:
Dose 929 932 400

File : PMC4847079.ann
Ground truth:
Dose 1394 1398 25.6
Dose 1409 1413 30.7
MedaCy truth:(when run individually)
Dose 1394 1397 25.
Dose 1397 1398 6
Dose 1409 1412 30.
Dose 1412 1413 7
MedaCy truth:(when run with more than 2 files)
Dose 1394 1397 25.
Dose 1397 3460 6 mg/m3 and 30.7 mg/m3, respectively. Concentration measurements were taken using a portable DataRAM......
Dose 1409 3461 30.7 mg/m3, respectively. Concentration measurements were taken using a portable DataRAM.....................
Predictions:
None
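A small diagnostic sketch for spotting the runaway spans shown above. It assumes annotations reduced to (label, start, end) triples, and the 50-character threshold is an arbitrary assumption chosen for illustration; `runaway_spans` and `not_in_gold` are hypothetical helper names. It flags suspicious spans but does not explain why they occur.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start, end)


def runaway_spans(annotations: List[Span], max_chars: int = 50) -> List[Span]:
    """Return annotations whose character length exceeds max_chars."""
    return [ann for ann in annotations if ann[2] - ann[1] > max_chars]


def not_in_gold(gold: List[Span], derived: List[Span]) -> List[Span]:
    """Return derived annotations that have no exact counterpart in the gold set."""
    gold_set = set(gold)
    return [ann for ann in derived if ann not in gold_set]


# Offsets copied from the PMC1257590.ann example in this issue.
gold = [("Dose", 929, 932)]
derived_multi_file = [("Dose", 929, 10478)]
```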

Versions

Linux-3.10.0-693.11.6.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Python 3.4.5
NumPy 1.15.4
SciPy 1.2.0

[FEATURE REQUEST] Make Model class pickle-able

What problem does your feature solve?
Define a way to serialize a Model object such that the pipeline and inner machine learning model are connected in a single binary file.
Model objects can then be serialized and compressed for remote installation.

Describe the solution you'd like
Serialized models could be hosted anywhere and installed from anywhere.
model = medacy.load(serialized_model) # this would load an instance of the Model class
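A minimal sketch of serializing to a single compressed blob with the standard library. The names `dumps_model` and `loads_model` are hypothetical, and `medacy.load` above remains a proposal; this only shows the pickle-plus-gzip round trip that such a function could build on, and it assumes every object inside the model is picklable.

```python
import gzip
import pickle
from typing import Any


def dumps_model(model: Any) -> bytes:
    """Pickle the model and gzip the result into one binary blob."""
    return gzip.compress(pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL))


def loads_model(blob: bytes) -> Any:
    """Inverse of dumps_model.

    Only unpickle data from sources you trust: unpickling can execute arbitrary code.
    """
    return pickle.loads(gzip.decompress(blob))
```

The resulting bytes can be written to a file and hosted anywhere; the trust caveat in the docstring matters for models installed from remote locations.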

ImportError: Package not installed: medacy_model_clinical_notes

---------------------------------------------------------------------------

ImportError Traceback (most recent call last)
in
1 from medacy.ner.model import Model
2
----> 3 model = Model.load_external('medacy_model_clinical_notes')
4 annotation = model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
5 print(annotation)

~/anaconda3/lib/python3.7/site-packages/medacy/ner/model/model.py in load_external(package_name)
458 """
459 if importlib.util.find_spec(package_name) is None:
--> 460 raise ImportError("Package not installed: %s" % package_name)
461 return importlib.import_module(package_name).load()
462

ImportError: Package not installed: medacy_model_clinical_notes
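The traceback shows that `load_external` relies on `importlib.util.find_spec`, which only sees packages installed for the interpreter that is currently running. The sketch below uses hypothetical helpers, `is_installed` and `diagnose`, to check this from within the same session. One possible cause, offered as an assumption, is that the model package was installed with a different Python environment than the one running the notebook.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """True if the running interpreter can locate a top-level package by name."""
    return importlib.util.find_spec(package_name) is not None


def diagnose(package_name: str) -> str:
    """Describe whether the package is visible to this interpreter."""
    if is_installed(package_name):
        return "%s is importable from %s" % (package_name, sys.executable)
    return "%s is not installed for %s; install it with that interpreter's pip" % (
        package_name, sys.executable)
```

Calling `diagnose('medacy_model_clinical_notes')` in the failing session reports which interpreter is in use, which can then be compared with the one pip installed into.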

[FEATURE REQUEST] Use medaCy with spaCy pipeline

This is more of a question/clarification about existing functionality. I would like to use medaCy the way one would typically use spaCy, in terms of pipeline components: create a doc and use the doc attributes (ents, annotations, etc.). Is there a way to load something like a clinical note and use it as you would in a spaCy pipeline? Is there a way to extend the spaCy pipeline with medaCy models so that annotations can be visualized with displaCy, or some approximation of these things? I read the docs and looked at the code base, but it wasn't clear to me whether this is currently possible. Any help or clarification would be appreciated. I'm currently trying out medaCy to extract drug dosage information from clinical notes. The NER is doing a great job of extraction, and being able to use this model the way one would use models in spaCy would be very helpful for our proof of concept stage.
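One approximation that needs no changes to medaCy is displaCy's manual mode, which renders entities from a plain dictionary. The sketch below assumes predictions come back as (label, start, end, text) tuples like those in the README example; `to_displacy` is a hypothetical helper, not part of either library.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, int, int, str]  # (label, start, end, text)


def to_displacy(text: str, entities: List[Entity]) -> Dict:
    """Convert entity tuples into the dictionary format used by displaCy's manual mode."""
    ordered = sorted(entities, key=lambda entity: entity[1])  # sort by start offset
    ents = [{"start": start, "end": end, "label": label} for label, start, end, _ in ordered]
    return {"text": text, "ents": ents, "title": None}


# With spaCy installed, the result can be rendered with:
#   from spacy import displacy
#   displacy.render(to_displacy(text, entities), style="ent", manual=True)
```

This only covers visualization; it does not make the predictions available as `doc.ents` on a spaCy Doc.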

Predictions directory not created automatically when running cross validation

Description

When running the cross validation model, the prediction directory is used to write predictions during bulk prediction. If the prediction directory is not created manually inside the data set folder it throws the following error:
'FileNotFoundError: [Errno 2] No such file or directory: '/home/mahendrand/VE/SMM4H/data_smmh4h/task2/training/dataset_1/predictions/326575463835250689.ann''

Steps/Code to Reproduce

If you do not create a predictions directory manually inside the data set folder and run the following:
model.cross_validate(num_folds = 5, dataset = training_dataset, write_predictions=True)
the error is thrown when it attempts the bulk prediction.

Expected Results

When the path to the dataset is passed as the parameter, the prediction directory is expected to be created automatically inside the data set folder

Actual Results

The error above is thrown.
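A minimal sketch of the expected behavior: create the directory before writing, and do nothing if it already exists. The function name `write_prediction` and the `predictions` sub-directory name are assumptions based on the path in the error message, not medaCy's actual code.

```python
import os


def write_prediction(dataset_dir: str, file_name: str, content: str) -> str:
    """Write one prediction file, creating the predictions directory if needed."""
    predictions_dir = os.path.join(dataset_dir, "predictions")
    # exist_ok=True makes this safe to call on every write.
    os.makedirs(predictions_dir, exist_ok=True)
    path = os.path.join(predictions_dir, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)
    return path
```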

Versions

Linux-3.10.0-693.11.6.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Python 3.4.5
NumPy 1.15.4
SciPy 1.2.0
medacy 0.0.9

[FEATURE REQUEST] Add an option to use word embeddings during training

What problem does your feature solve?
Includes more features to use during training.

Describe the solution you'd like
The VCU NLP lab has a collection of word embeddings (specifically GloVe vectors) trained over various medical corpora. Incorporate these into medaCy's FeatureExtractor.

Additional context
Please get in contact with me if you would like to pursue this feature - I have the embeddings packaged up for use with spaCy.

Separate data into new python packages.

What problem does your feature solve?
Datasets should be separated into their own Python packages. This will allow users to install, build, and work with data efficiently while allowing complete control over who has access to that data.

  • medaCy can be broken into three parts: loading data, loading models, and manipulating models and data (that is, training, prediction, meta-information extraction, etc.). medaCy currently does all three in one package, but storing data and models takes a lot of space. The architecture should move towards leaving only the code that orchestrates the interaction between data and models in this repository, while actual data and models are interfaced from separate compatible Python packages. A nomenclature like medacy_model_***** and medacy_dataset_**** for external package naming makes sense, although outside developers could name their packages freely.

  • Datasets that are included for benchmarking/testing purposes can be made into open repositories, while anyone could also create a private version of a dataset by following directions on how to make their private package interface with medaCy.

Describe the solution you'd like

  • DataLoader should be refactored to something like Dataset (this nomenclature makes more sense if one considers what the current DataLoader class does).
  • Each Python package corresponding to a dataset can be installed/removed freely and has medaCy as an installation requirement.
  • Each medaCy-compatible dataset package should return an instantiated, ready-to-use Dataset object. This class should include all functionality currently present in DataLoader and should also be able to bypass the init method that loads from a machine directory, so that it can load data shipped inside a medaCy-compatible data package.
  • Detailed instructions should be provided for writing a custom interfacing data package (maybe provide a boilerplate template repository).
  • Ideally, the internals of Dataset will store a generator containing raw text files and annotation files alongside meta-data about the dataset (entities and types of entity relationships). These can then be looped through at will by the current code for processing documents. Steps should be taken to ensure this process is as memory efficient as possible, as big corpora could be used.
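The generator idea in the last bullet can be sketched as follows. This `Dataset` class is a hypothetical illustration, assuming a directory where each `.txt` file has a matching `.ann` file; it yields file paths lazily and reads contents one pair at a time, so only one document is held in memory during iteration.

```python
import os
from typing import Iterator, Tuple


class Dataset:
    """Hypothetical sketch: lazily pairs each .txt file with its .ann file."""

    def __init__(self, data_directory: str):
        self.data_directory = data_directory

    def __iter__(self) -> Iterator[Tuple[str, str]]:
        """Yield (text_path, annotation_path) for every complete pair."""
        for name in sorted(os.listdir(self.data_directory)):
            stem, extension = os.path.splitext(name)
            if extension != ".txt":
                continue
            ann_path = os.path.join(self.data_directory, stem + ".ann")
            if os.path.isfile(ann_path):
                yield os.path.join(self.data_directory, name), ann_path

    def read(self) -> Iterator[Tuple[str, str]]:
        """Yield (raw_text, raw_annotations), reading one pair at a time."""
        for txt_path, ann_path in self:
            with open(txt_path, encoding="utf-8") as txt, open(ann_path, encoding="utf-8") as ann:
                yield txt.read(), ann.read()
```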

installation issues on windows

Description

Issues when trying to install medaCy on Windows.

Steps/Code to Reproduce

In cmd, I am running:
pip install git+https://github.com/NLPatVCU/medaCy.git

Expected Results

medacy installed

Actual Results

Building wheel for ujson (setup.py) ... error
ERROR: Complete output from command 'c:\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\om\AppData\Local\Temp\pip-install-5k7yz4al\ujson\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\om\AppData\Local\Temp\pip-wheel-572h8xdt' --python-tag cp37:
ERROR: Warning: 'classifiers' should be a list, got type 'filter'
running bdist_wheel
running build
running build_ext
building 'ujson' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/

ERROR: Failed building wheel for ujson

Versions

Windows-10-10.0.17763-SP0
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)]
NumPy 1.16.2
ModuleNotFoundError: No module named 'scipy'
ModuleNotFoundError: No module named 'medacy'

Bad input array shape

Description

I'm getting the error "could not broadcast input array from shape (96) into shape (128)".

Steps/Code to Reproduce

import medacy_model_clinical_notes
from medacy.model import Model
model = Model.load_external('medacy_model_clinical_notes')
annotations = model.predict("The patient took 5 mg of aspirin.")

Expected Results

{
    'entities': {
        'T3': ('Drug', 40, 45, 'Advil'),
        'T1': ('Dosage', 27, 28, '1'),
        'T2': ('Form', 29, 36, 'capsule'),
        'T4': ('Duration', 46, 56, 'for 5 days')
    },
    'relations': []
}

Actual Results


ValueError Traceback (most recent call last)
in
6 #model = Model(pipeline)
7 f = open('/usr/local/lib/python3.6/dist-packages/medacy_model_clinical_notes/model/n2c2_2018_no_metamap_2018_12_22_16.49.17.pkl','rb')
----> 8 model1 = medacy_model_clinical_notes.load()
9 #model = Model.load_external('medacy_model_clinical_notes')
10 f.close()

/usr/local/lib/python3.6/dist-packages/medacy_model_clinical_notes/medacy_model_clinical_notes.py in load()
6 def load():
7 entities = ['Drug', 'Form', 'Route', 'ADE', 'Reason', 'Frequency', 'Duration', 'Dosage', 'Strength']
----> 8 pipeline = ClinicalPipeline(entities=entities)
9 model = Model(pipeline, n_jobs=1)
10 model_directory = resource_filename('medacy_model_clinical_notes', 'model')

/usr/local/lib/python3.6/dist-packages/medacy/pipelines/clinical_pipeline.py in __init__(self, metamap, entities)
22 description="""Pipeline tuned for the extraction of ADE related entities from the 2018 N2C2 Shared Task"""
23 super().__init__("clinical_pipeline",
---> 24 spacy_pipeline=spacy.load("en_core_web_sm"),
25 description=description,
26 creators="Andriy Mulyar (andriymulyar.com)", #append if multiple creators

/usr/local/lib/python3.6/dist-packages/spacy/__init__.py in load(name, **overrides)
16 if depr_path not in (True, False, None):
17 deprecation_warning(Warnings.W001.format(path=depr_path))
---> 18 return util.load_model(name, **overrides)
19
20

/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model(name, **overrides)
112 return load_model_from_link(name, **overrides)
113 if is_package(name): # installed as package
--> 114 return load_model_from_package(name, **overrides)
115 if Path(name).exists(): # path to model data directory
116 return load_model_from_path(Path(name), **overrides)

/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model_from_package(name, **overrides)
133 """Load a model from an installed package."""
134 cls = importlib.import_module(name)
--> 135 return cls.load(**overrides)
136
137

/usr/local/lib/python3.6/dist-packages/en_core_web_sm/__init__.py in load(**overrides)
10
11 def load(**overrides):
---> 12 return load_model_from_init_py(__file__, **overrides)

/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model_from_init_py(init_file, **overrides)
171 if not model_path.exists():
172 raise IOError(Errors.E052.format(path=path2str(data_path)))
--> 173 return load_model_from_path(data_path, meta, **overrides)
174
175

/usr/local/lib/python3.6/dist-packages/spacy/util.py in load_model_from_path(model_path, meta, **overrides)
154 component = nlp.create_pipe(name, config=config)
155 nlp.add_pipe(component, name=name)
--> 156 return nlp.from_disk(model_path)
157
158

/usr/local/lib/python3.6/dist-packages/spacy/language.py in from_disk(self, path, disable)
645 if not (path / 'vocab').exists():
646 exclude['vocab'] = True
--> 647 util.from_disk(path, deserializers, exclude)
648 self._path = path
649 return self

/usr/local/lib/python3.6/dist-packages/spacy/util.py in from_disk(path, readers, exclude)
509 for key, reader in readers.items():
510 if key not in exclude:
--> 511 reader(path / key)
512 return path
513

/usr/local/lib/python3.6/dist-packages/spacy/language.py in <lambda>(p, proc)
641 if not hasattr(proc, 'to_disk'):
642 continue
--> 643 deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
644 exclude = {p: False for p in disable}
645 if not (path / 'vocab').exists():

pipeline.pyx in spacy.pipeline.Tagger.from_disk()

/usr/local/lib/python3.6/dist-packages/spacy/util.py in from_disk(path, readers, exclude)
509 for key, reader in readers.items():
510 if key not in exclude:
--> 511 reader(path / key)
512 return path
513

pipeline.pyx in spacy.pipeline.Tagger.from_disk.load_model()

pipeline.pyx in spacy.pipeline.Tagger.from_disk.load_model()

/usr/local/lib/python3.6/dist-packages/thinc/neural/_classes/model.py in from_bytes(self, bytes_data)
350 name = name.decode('utf8')
351 dest = getattr(layer, name)
--> 352 copy_array(dest, param[b'value'])
353 i += 1
354 if hasattr(layer, '_layers'):

/usr/local/lib/python3.6/dist-packages/thinc/neural/util.py in copy_array(dst, src, casting, where)
46 def copy_array(dst, src, casting='same_kind', where=None):
47 if isinstance(dst, numpy.ndarray) and isinstance(src, numpy.ndarray):
---> 48 dst[:] = src
49 elif isinstance(dst, cupy.ndarray):
50 src = cupy.array(src, copy=False)

ValueError: could not broadcast input array from shape (96) into shape (128)

Versions

NotImplementedError: object proxy must define __reduce_ex__()

Description

Installed medaCy and the model (medacy_model_clinical_notes) on a Mac using the GitHub instructions. When running the GitHub example ("The Power of medaCy") using Anaconda, I get the following error:

NotImplementedError: object proxy must define __reduce_ex__()

This is thrown in pickle.py. See the attached console trace for details:

medaCy.console.trace.txt

Steps/Code to Reproduce

from medacy.ner.model import Model

model = Model.load_external('medacy_model_clinical_notes')
annotation = model.predict("The patient was prescribed 1 capsule of Advil for 5 days.")
print(annotation)

Expected Results

{
    'entities': {
        'T3': ('Drug', 40, 45, 'Advil'),
        'T1': ('Dosage', 27, 28, '1'),
        'T2': ('Form', 29, 36, 'capsule'),
        'T4': ('Duration', 46, 56, 'for 5 days')
    },
    'relations': []
}

Actual Results

The above mentioned error.

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.2 (default, Dec 29 2018, 00:00:04)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.15.4
SciPy 1.1.0
medacy 0.1.1

Code coverage

We should add a coverage tool for our unit tests. It shouldn't take much work: we just need to find the best one and install it, and these tools are usually plug and play with the major testing frameworks.
