metamaplite's Issues

FileNotFoundException when building Indices

I built a custom UMLS and then attempted to index the output using bin/create_indexes.bat, but ran into an issue:

Exception in thread "main" java.io.FileNotFoundException: C:\path\to\indices\meshtcrelaxed\postings (The system cannot find the path specified)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:124)
        at irutils.MappedMultiKeyIndexDiskBasedGeneration.writeFinalIndex(MappedMultiKeyIndexDiskBasedGeneration.java:265)
        at gov.nih.nlm.nls.metamap.dfbuilder.BuildIndex.createIndex(BuildIndex.java:120)
        at gov.nih.nlm.nls.metamap.dfbuilder.BuildIndex.main(BuildIndex.java:155)

This seems to be caused by the containing directory (in my case, 'meshtcrelaxed') not existing when a new RandomAccessFile is instantiated here: https://github.com/lhncbc/metamaplite/blob/8aae39319a4a4b40a013180bf6cde09b172c78a8/src/main/java/irutils/MappedMultiKeyIndexDiskBasedGeneration.java#L265

If I manually create the containing directory C:\path\to\indices\meshtcrelaxed and then re-run, I don't encounter this error.
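
One possible way to avoid the manual step is to create the missing parent directory before opening the file. The following is a minimal sketch, not the project's actual code: the class name IndexFileOpener and the method openCreatingParents are hypothetical, and it assumes the caller has permission to create the directory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of MetaMapLite.
final class IndexFileOpener {
    /**
     * Opens a file for read/write access, creating any missing parent
     * directories first so that RandomAccessFile does not fail with
     * FileNotFoundException because of an absent directory.
     */
    static RandomAccessFile openCreatingParents(Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            // Does nothing if the directory already exists.
            Files.createDirectories(parent);
        }
        return new RandomAccessFile(file.toFile(), "rw");
    }
}
```

With a helper like this, opening C:\path\to\indices\meshtcrelaxed\postings would create the meshtcrelaxed directory on demand instead of failing.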

--chemdnersldi option not available

./metamaplite.sh --chemdnersldi input_file.txt

This returns

unknown option: --chemdnersldi

Content of input_file.txt

1|'Heart Attack'
2|'John had a huge heart attack'

Update:
After going through the source code (https://github.com/lhncbc/metamaplite/blob/master/src/main/java/gov/nih/nlm/nls/ner/MetaMapLite.java), I found that I should be using the command:

./metamaplite.sh --inputformat=sldiwi input_file.txt

I was wrong to assume that I should be using chemdnersldi.
SingleLineDelimitedInputWithID (sldiwi) is the document input format I needed.

EntityLookup4 not checking for phrase type as well as PoS

While investigating a difference in the behavior of EntityLookup4 vs EntityLookup5, I ultimately traced it back to the part of findLongestMatch() that checks to see whether the part of speech of the first token of tokenSubList is in allowedPartOfSpeechSet: https://github.com/lhncbc/metamaplite/blob/d3171e5d1deb2ceeeeeca9b757a85b8617e5c01b/src/main/java/gov/nih/nlm/nls/metamap/lite/EntityLookup5.java#LL393C12-L393C12

In EntityLookup5, the check will allow tokens that are not of an allowed PoS if the phrase under consideration is of a type listed in allowedPhraseTypeSet. The corresponding place in EntityLookup4 doesn't do this check, so certain things are getting bounced out from EntityLookup4 that EntityLookup5 allows: https://github.com/lhncbc/metamaplite/blob/d3171e5d1deb2ceeeeeca9b757a85b8617e5c01b/src/main/java/gov/nih/nlm/nls/metamap/lite/EntityLookup4.java#L352

Is there a reason for this? If not, I will add in the corresponding check to EntityLookup4 and do a PR.
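
The difference described above can be illustrated with a small sketch. This is not the code from EntityLookup4 or EntityLookup5: the class TokenGate and its method are hypothetical, and only the set names allowedPartOfSpeechSet and allowedPhraseTypeSet are taken from the issue text.

```java
import java.util.Set;

// Hypothetical illustration of the two checks described in the issue.
final class TokenGate {
    /** Check as described for EntityLookup4: part of speech only. */
    static boolean allowedByPosOnly(String partOfSpeech,
                                    Set<String> allowedPartOfSpeechSet) {
        return allowedPartOfSpeechSet.contains(partOfSpeech);
    }

    /**
     * Check as described for EntityLookup5: a token with a disallowed part
     * of speech still passes if its phrase type is in the allowed set.
     */
    static boolean allowedByPosOrPhraseType(String partOfSpeech,
                                            String phraseType,
                                            Set<String> allowedPartOfSpeechSet,
                                            Set<String> allowedPhraseTypeSet) {
        return allowedPartOfSpeechSet.contains(partOfSpeech)
            || allowedPhraseTypeSet.contains(phraseType);
    }
}
```

Under this reading, any token whose part of speech is not allowed but whose phrase type is allowed would be accepted by the second check and rejected by the first, which matches the behavior difference reported.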

Class Mrconso missing from package dfbuilder

Class Mrconso is missing from package gov.nih.nlm.nls.metamap.dfbuilder, and so the class ExtractTreecodes does not compile. I had to remove the class so I could use the latest version of the code.

FileNotFoundException when opening default gv.words.temp.filename on Windows

The default bin/create_indexes.bat runs into a FileNotFoundException when running GenerateVariants (see https://github.com/lhncbc/metamaplite/blob/827a5c1f7a0174247ea499117d82745af827c628/bin/create_indexes.bat#L58-L60) as the default value for wordsFilename is /tmp/words.txt.tmp:
https://github.com/lhncbc/metamaplite/blob/8aae39319a4a4b40a013180bf6cde09b172c78a8/src/main/java/gov/nih/nlm/nls/metamap/dfbuilder/GenerateVariants.java#L226-L227. It appears Java on Windows is unable to resolve the /tmp directory.

I was able to work around the issue by setting the system property gv.words.temp.filename on the command line to just words.txt.tmp:

java -Xmx4g "-Dgv.words.temp.filename=words.txt.tmp" -cp %projectdir%\target\metamaplite-%MML_VERSION%-standalone.jar ^
     gov.nih.nlm.nls.metamap.dfbuilder.GenerateVariants ^
     %MRCONSO% %IVFDIR%\tables\vars.txt

Can this be added to the create_indexes.bat script? Or could /tmp be replaced by something like System.getProperty("java.io.tmpdir")?
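
A platform-independent default along the lines suggested could look like the sketch below. It is an assumption about how the default might be computed, not the code in GenerateVariants: the class TempWordsFile and the method resolve are hypothetical, while the property name gv.words.temp.filename and the file name words.txt.tmp come from the issue.

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of MetaMapLite.
final class TempWordsFile {
    static final String PROPERTY = "gv.words.temp.filename";

    /**
     * Returns the value of gv.words.temp.filename if it is set; otherwise
     * falls back to words.txt.tmp inside the JVM's temporary directory,
     * which exists on Windows as well as on Unix-like systems.
     */
    static String resolve() {
        String defaultPath =
            Paths.get(System.getProperty("java.io.tmpdir"), "words.txt.tmp").toString();
        return System.getProperty(PROPERTY, defaultPath);
    }
}
```

An explicit -Dgv.words.temp.filename=... on the command line would still take precedence over the computed default.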

Not getting response in near real time

Your documentation says that the primary goal of MetaMapLite is to provide a near real-time named-entity recognizer which is not as rigorous as MetaMap but much faster, while allowing users to customize and augment its behavior for specific purposes.

But when I run metamaplite.sh as shown below, I do not get a response in real time. On average it takes 5-6 seconds.

./metamaplite.sh example.txt

According to the screenshot below, there should be a significant improvement in processing time between MetaMap and MetaMapLite. But when I run MetaMapLite, it does not respond in near real time.

Please let me know if I need to change any configuration.

image

MetaMapLite in another language

First, let me thank you for this great work.

I'm working on NER in Spanish text as part of a research project, and I want to use MetaMap.
What do you think the pipeline should look like to achieve this goal?

MetaMapLite cannot be used outside of the 'public_mm_lite' directory?

image

When I run the code outside of the 'public_mm_lite' directory, it always fails with an index configuration error saying "data/ivf/2020AA/USAbase does not exist, aborting", although I have set indexdir to an absolute path, like "./metamaplite.sh --indexdir=ABSOLUTE_PATH/data/ivf/2020AA/USAbase".

Is this expected behavior?

Improper concept index in MMI output

For the input file: 00000086.txt

In MetaMap, the output looked like this:

'00000086-0'|MMI|17.80|Mediastinum|C0025066|[blor]|["MEDIASTINUM"-tx-2-"mediastinum"-noun-0]|TX|50/11|A01.923.761.800.500
'00000086-73'|MMI|8.34|Lung|C0024109|[bpoc]|["LUNGS"-tx-1-"lungs"-noun-0]|TX|1/5|A04.411

But the output in MetaMapLite looks like this:

00000000.tx|MMI|2.37|Mediastinum|C0025066|[blor]|"Mediastinum"-text-0-"mediastinum"-NN-0|63/11|A01.923.761.800.500
00000000.tx|MMI|0.98|Lung|C0024109|[bpoc]|"Lungs"-text-0-"lungs"-NNS-0|103/5|A04.411

The first field in the MetaMap output contains the filename and the character start position of the sentence. In MetaMapLite that information is missing, which causes programs that rely on it to fail.

I tested with the release MetaMapLite 3.6.1p1.

`getBasename` selects higher-level directory if it contains dots/periods

When processing files without extensions, getBasename will split on a directory name if that name contains periods:

https://github.com/lhncbc/metamaplite/blob/a7d10264a023afda497356f50faa4385ab7e3908/src/main/java/gov/nih/nlm/nls/ner/MetaMapLite.java#L1003-L1011

E.g., if I supply the path /mapr/r.ds/mml/file_0, it will attempt to write the output json file to /mapr/r.json rather than /mapr/r.ds/mml/file_0.json:

[main] INFO gov.nih.nlm.nls.ner.MetaMapLite - Loading and processing /mapr/r.ds/mml/file_0
[main] INFO gov.nih.nlm.nls.ner.MetaMapLite - outputing results to /mapr/r.json
Exception in thread "main" java.io.FileNotFoundException: /mapr/r.json (Operation not permitted)

Workaround
I've used the workaround of requiring all input files to have a .txt extension (e.g., /mapr/r.ds/mml/file_0.txt in the above example). This will put the output in the correct directory (e.g., /mapr/r.ds/mml/file_0.json).
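
A basename computation that avoids this problem would look for the extension only in the final path component. The sketch below shows that idea; it is not the project's getBasename, and the class Basenames and method stripExtension are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of MetaMapLite.
final class Basenames {
    /**
     * Removes the extension from the last path component only, so that
     * periods in parent directory names are never treated as the start
     * of an extension.
     */
    static String stripExtension(String path) {
        Path p = Paths.get(path);
        Path fileNamePath = p.getFileName();
        if (fileNamePath == null) {
            return path; // e.g. a root path with no file name
        }
        String fileName = fileNamePath.toString();
        int dot = fileName.lastIndexOf('.');
        // dot > 0 leaves names such as ".hidden" untouched.
        String stem = (dot > 0) ? fileName.substring(0, dot) : fileName;
        Path parent = p.getParent();
        return (parent == null) ? stem : parent.resolve(stem).toString();
    }
}
```

With this approach, both /mapr/r.ds/mml/file_0 and /mapr/r.ds/mml/file_0.txt yield the stem /mapr/r.ds/mml/file_0, so appending .json gives the expected output path in either case.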

How to change logging level?

How can I alter the logging level when running Metamaplite? I'd like to set it to at least WARN in order to reduce the amount of console output when running the application, but I can't seem to change the defaults.

I tried (on 3.6.2rc8):

  • Running metamaplite.bat but changing 'debug' to 'warn' in config/log4j2.xml
  • Specifying the changed log4j2.xml file with both the parameters log4j.configurationFile and metamaplite.log4jconfig in metamaplite.bat
  • Passing in the --log4jconfig option to point to config/log4j2.xml (this results in an error: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1 at gov.nih.nlm.nls.ner.MetaMapLite.main(MetaMapLite.java:1323)).

JVM Version Compatibility

Hello! I see from the README that MetaMapLite is tested/designed with Java 1.8; a quick experiment with Java 18 showed that it would compile just fine with a more modern dialect of the language, except for gov.nih.nlm.nls.tools.MedlineDomReader, which relies on a deprecated set of J2EE classes, specifically javax.xml.bind.annotation.W3CDomHandler. Has there been any thought to re-working this part of the program to use a different (and non-deprecated) method?

Negation Missing in JSON Output

The json output in version MetaMapLite 3.6.2rc6 (and previous versions) does not contain negation output from the NegEx or Context algorithms.

E.g., running echo "no covid-19" | ./metamaplite.sh --pipe --outputformat=json

[
  {
    "matchedtext": "covid-19",
    "evlist": [
      {
        "score": 0,
        "matchedtext": "covid-19",
        "start": 3,
        "length": 8,
        "id": "ev0",
        "conceptinfo": {
          "conceptstring": "COVID-19",
          "sources": [
            "MTH",
            "SNOMEDCT_US"
          ],
          "cui": "C5203670",
          "preferredname": "COVID-19",
          "semantictypes": [
            "dsyn"
          ]
        }
      }
    ],
    "docid": "00000000.tx",
    "start": 3,
    "length": 8,
    "id": "en0",
    "fieldid": "text"
  }
]

Whereas mmi output contains the information:

00000000.tx|MMI|0.46|COVID-19|C5203670|[dsyn]|"COVID-19"-text-0-"covid-19"-NN-1|text|3/8||

Can this be added?

public_mm_lite directory not found

I am trying to use the package as a standalone command-line tool, so I may be missing some Java project magic.
How is the public_mm_lite directory supposed to be created?

Indexing of UMLS subset

Hi,
I am following the Generating Tables section in Design.md to create the index files for a custom subset of the UMLS. The CreateIndexes program runs without any errors (just the warnings about missing meshtcrelaxed.txt and vars.txt).

However, the irutils.MappedMultiKeyIndexLookup lookup command does not return any results when applied on the generated indexes. When applying the same command on the indices distributed with MetaMapLite, results are returned. When browsing the subset for which the indices are generated with Metamorphosys, the searched term is also found.

If there is any debug information I can provide, I'm more than happy to provide it. Thanks for your help!

Semantic Type/Source filtering happens after longest match, excluding matches

If the string 'Metastatic prostate cancer' is processed with the semantic type of bpoc (Body Part, Organ or Organ Component) then no concepts are found.

This is because 'Metastatic prostate cancer' matches concept C1282496 and C0936223 but these are both neop (Neoplastic Process) concepts and are discarded when the semantic type filtering is applied.

I would have expected the longest match to be applied after the semantic type and source filtering.

i.e. the concept C0033572: prostate should have been returned.
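
The ordering requested here, filtering first and then choosing the longest match, can be sketched as follows. This is an illustration only and does not reflect MetaMapLite's internal data structures: the Candidate class and the longestAllowed method are hypothetical, and the concept identifiers and semantic types are the ones quoted in the issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration, not MetaMapLite code.
final class FilterThenLongestMatch {
    /** A matched span with its concept identifier and semantic types. */
    static final class Candidate {
        final String matchedText;
        final String cui;
        final Set<String> semanticTypes;

        Candidate(String matchedText, String cui, Set<String> semanticTypes) {
            this.matchedText = matchedText;
            this.cui = cui;
            this.semanticTypes = semanticTypes;
        }
    }

    /** Drops candidates with no allowed semantic type, then keeps the longest. */
    static Optional<Candidate> longestAllowed(List<Candidate> candidates,
                                              Set<String> allowedTypes) {
        return candidates.stream()
            .filter(c -> !Collections.disjoint(c.semanticTypes, allowedTypes))
            .max(Comparator.comparingInt((Candidate c) -> c.matchedText.length()));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
            new Candidate("Metastatic prostate cancer", "C1282496", Collections.singleton("neop")),
            new Candidate("Metastatic prostate cancer", "C0936223", Collections.singleton("neop")),
            new Candidate("prostate", "C0033572", Collections.singleton("bpoc")));
        longestAllowed(candidates, Collections.singleton("bpoc"))
            .ifPresent(c -> System.out.println(c.cui + ": " + c.matchedText));
    }
}
```

Because the neop candidates are removed before the length comparison, the shorter bpoc match survives instead of being shadowed by the longer span.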

Files not found lib/bioc-1.0.1.jar lib/context-2012.jar

Hello, I can build with Java 1.8 and complete all the Maven steps except:

$ mvn install:install-file \
    -Dfile=lib/context-2012.jar \
    -DgroupId=context \
    -DartifactId=context \
    -Dversion=2012 \
    -Dpackaging=jar

$ mvn install:install-file \
    -Dfile=lib/bioc-1.0.1.jar \
    -DgroupId=bioc \
    -DartifactId=bioc \
    -Dversion=1.0.1 \
    -Dpackaging=jar

Should I try to recreate context-2012.jar from https://github.com/chapmanbe/negex/tree/master/GeneralNegEx.Java.v.1.2.05092009 and get BioC from somewhere else? Thank you for your help and for maintaining this repo.

w/system: a resource failed to call end(close).

Although this may not happen on a desktop or laptop machine, using metamaplite directly on Android produces "w/system: a resource failed to call end(close)". This error is caused by an unreleased FileInputStream or FileOutputStream. On Android, a newly opened FileInputStream or FileOutputStream needs to be closed explicitly. In particular, this error has two sources when using metamaplite on Android:

  1. from metamaplite:
    For example, in Example.java and Example2.java, the newly created FileReader is not closed after usage, so they need to be explicitly closed as shown below:
            FileReader fr = new FileReader("./config/metamaplite.properties");
            myProperties.load(fr);
            ...
            fr.close();
  2. from 3rd party library:
    For example, in org.apache.opennlp, opennlp/tools/util/model/BaseModel.java(loadModel) and opennlp/tools/util/model/BaseModel.java(finishLoadingArtifacts), ZipInputStream zip is not explicitly closed. This can be achieved by the following:
  • In loadModel method:
            final ZipInputStream zip = new ZipInputStream(in);
            ...
            zip.close();
  • In finishLoadingArtifacts
            ZipInputStream zip = new ZipInputStream((InputStream)in);
            ...
            zip.close();

Note that I replaced final ZipInputStream zip = new ZipInputStream(in); with ZipInputStream zip = new ZipInputStream((InputStream)in);. This was the only way I could recompile the opennlp source code into a jar file with Maven. If there is an alternative, please let me know.
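
As an alternative to calling close() by hand, a try-with-resources statement closes the reader even when loading throws an exception. The sketch below applies it to the properties-loading case from Example.java; the class PropertiesLoader and the method load are hypothetical names, not part of metamaplite.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical helper, not part of MetaMapLite.
final class PropertiesLoader {
    /**
     * Loads a properties file and guarantees that the underlying reader
     * is closed, whether or not load() succeeds.
     */
    static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new FileReader(path)) {
            props.load(reader);
        } // reader.close() is called automatically here
        return props;
    }
}
```

A call such as PropertiesLoader.load("./config/metamaplite.properties") would replace the explicit FileReader handling shown in the first item above. The same pattern applies to any AutoCloseable resource, including ZipInputStream.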
