graphaware / neo4j-nlp

NLP Capabilities in Neo4j

Home Page: https://hume.graphaware.com/

Java 100.00%

neo4j nlp graph-database machine-learning algorithms stanford-corenlp opennlp

GraphAware Natural Language Processing Has Been Retired

As of May 2021, this repository has been retired.

GraphAware Natural Language Processing

This Neo4j plugin offers Graph Based Natural Language Processing capabilities.

This main module provides a common interface for the underlying text processors, as well as a Domain Specific Language built atop stored procedures and functions, making your Natural Language Processing workflow developer-friendly.

It comes in two versions, Community (open source) and Enterprise, with the following NLP features:

Feature Matrix

Feature	Community Edition	Enterprise Edition
Text information Extraction	✔	✔
Multi-languages in the same database		✔
Custom NamedEntityRecognition model builder		✔
ConceptNet5 Enricher	✔	✔
Microsoft Concept Enricher	✔	✔
Keyword Extraction	✔	✔
TextRank Summarization	✔	✔
Topics Extraction		✔
Word Embeddings (Word2Vec)	✔	✔
Similarity Computation	✔	✔
PDF Parsing	✔	✔
Apache Spark Binding for Distributed Algorithms		✔
Doc2Vec implementation		✔
User Interface		✔
ML Prediction capabilities		✔
Entity Merging		✔

Two NLP processor implementations are available: Stanford NLP and OpenNLP. OpenNLP receives less frequent updates, so Stanford NLP is recommended.

Installation

From version 3.5.1.53.15 onwards, you need to download the language models separately (see below).

From the GraphAware plugins directory, download the following JAR files:

neo4j-framework (the JAR for this is labeled "graphaware-server-enterprise-all")
neo4j-nlp
neo4j-nlp-stanfordnlp

Then download the language model from https://stanfordnlp.github.io/CoreNLP/#download and copy all of these files into the plugins directory of Neo4j.

Make sure that the version numbers of the framework and plugin JARs match the version of Neo4j you are using; a mismatch is a common setup problem. For example, if you are using Neo4j 3.4.x, all of the JARs you download should contain 3.4 in their version number.

plugins/ directory example :

-rw-r--r--  1 ikwattro  staff    58M Oct 11 11:15 graphaware-nlp-3.5.1.53.14.jar
-rw-r--r--@ 1 ikwattro  staff    13M Aug 22 15:22 graphaware-server-community-all-3.5.1.53.jar
-rw-r--r--  1 ikwattro  staff    16M Oct 11 11:28 nlp-stanfordnlp-3.5.1.53.14.jar
-rw-r--r--@ 1 ikwattro  staff   991M Oct 11 11:45 stanford-english-corenlp-2018-10-05-models.jar
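The version-matching rule above can be checked mechanically before starting the database. The following is a minimal sketch, not part of the plugin: the function name `mismatched_jars` and the regular expression are assumptions based only on the file names shown in the listing above, and JARs without a GraphAware-style version (such as the Stanford models file) are skipped.

```python
import re

# Matches GraphAware-style jar names such as "graphaware-nlp-3.5.1.53.14.jar"
# and captures the leading "major.minor" part of the version (assumed naming scheme).
VERSIONED_JAR = re.compile(r"-(\d+\.\d+)\.\d+\.\d+(?:\.\d+)?\.jar$")


def mismatched_jars(jar_names, neo4j_version):
    """Return the jar names whose version prefix differs from Neo4j's major.minor."""
    expected = ".".join(neo4j_version.split(".")[:2])
    mismatched = []
    for name in jar_names:
        match = VERSIONED_JAR.search(name)
        if match is None:
            # e.g. the Stanford models jar, which is dated rather than versioned
            continue
        if match.group(1) != expected:
            mismatched.append(name)
    return mismatched


jars = [
    "graphaware-nlp-3.5.1.53.14.jar",
    "graphaware-server-community-all-3.5.1.53.jar",
    "nlp-stanfordnlp-3.5.1.53.14.jar",
    "stanford-english-corenlp-2018-10-05-models.jar",
]
print(mismatched_jars(jars, "3.5.1"))
```

An empty list means every versioned JAR carries the same major.minor prefix as the Neo4j version passed in.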

Append the following configuration to the neo4j.conf file in the conf/ directory:

  dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
  com.graphaware.runtime.enabled=true
  com.graphaware.module.NLP.1=com.graphaware.nlp.module.NLPBootstrapper
  dbms.security.procedures.whitelist=ga.nlp.*

Start or restart your Neo4j database.

Note: both concrete text processors are quite memory-hungry, so you will need to dedicate sufficient memory to the Neo4j heap space.

Additionally, the following indexes and constraints are suggested to improve performance:

CREATE CONSTRAINT ON (n:AnnotatedText) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Tag) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Sentence) ASSERT n.id IS UNIQUE;
CREATE INDEX ON :Tag(value);

Or use the dedicated procedure :

CALL ga.nlp.createSchema()

Define which language you will use in this database :

CALL ga.nlp.config.setDefaultLanguage('en')

Quick Documentation in Neo4j Browser

Once the extension is loaded, you can see basic documentation on all available procedures by running this Cypher query:

CALL dbms.procedures() YIELD name, signature, description
WHERE name =~ 'ga.nlp.*'
RETURN name, signature, description ORDER BY name asc;

Getting Started

Text extraction

Pipelines and components

The text extraction phase is done with a Natural Language Processing pipeline. Each pipeline has a list of enabled components.

For example, the basic tokenizer pipeline has the following components :

Sentence Segmentation
Tokenization
StopWords Removal
Stemming
Part Of Speech Tagging
Named Entity Recognition

It is mandatory to create your pipeline first :

CALL ga.nlp.processor.addPipeline({textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor', name: 'customStopWords', processingSteps: {tokenize: true, ner: true, dependency: false}, stopWords: '+,result, all, during', 
threadNumber: 20})

The available optional parameters (default values are in brackets):

name: desired name of a new pipeline
textProcessor: to which text processor should the new pipeline be added
processingSteps: pipeline configuration (available in both Stanford and OpenNLP unless stated otherwise)
- tokenize (default: true): perform tokenization
- ner (default: true): Named Entity Recognition
- sentiment (default: false): run sentiment analysis on sentences
- coref (default: false): Coreference Resolution (identify multiple mentions of the same entity, such as "Barack Obama" and "he")
- relations (default: false): run relations identification between two tokens
- dependency (default: false, StanfordNLP only): extract typed dependencies (ex.: amod - adjective modifier, conj - conjunct, ...)
- cleanxml (default: false, StanfordNLP only): remove XML tags
- truecase (default: false, StanfordNLP only): recognizes the "true" case of tokens (how they would be capitalized in well-edited text)
- customNER: list of custom NER model identifiers (as a string, with model identifiers separated by ",")
stopWords: words to be ignored (if the list starts with +, the following words are appended to the default stopwords list; otherwise the default list is overwritten)
threadNumber (default: 4): number of threads used for multi-threading
excludedNER (default: none): list of named entity types, written in upper case, that should not be recognized; for example, to exclude NER_Money and NER_O on the Tag nodes, use ['O', 'MONEY']
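The "+" convention for stopWords can be illustrated with a small sketch. This is an assumption-laden illustration of the rule as described above, not the plugin's code: `resolve_stopwords` and `DEFAULT_STOPWORDS` are hypothetical names, and the default list here is invented for the example.

```python
# Hypothetical default list, for illustration only; the plugin ships its own.
DEFAULT_STOPWORDS = {"the", "a", "and", "of"}


def resolve_stopwords(spec, defaults=DEFAULT_STOPWORDS):
    """Interpret a comma-separated stopWords string.

    A leading '+' item means "extend the defaults"; otherwise the given
    words replace the defaults entirely.
    """
    items = [item.strip() for item in spec.split(",")]
    items = [item for item in items if item]
    if items and items[0] == "+":
        return set(defaults) | set(items[1:])
    return set(items)


print(sorted(resolve_stopwords("+,result, all, during")))
print(sorted(resolve_stopwords("result, all")))
```

The first call extends the defaults with the three extra words, while the second call replaces the defaults with only the two listed words.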

To set a pipeline as a default pipeline:

CALL ga.nlp.processor.pipeline.default(<your-pipeline-name>)

To delete a pipeline, use this command:

CALL ga.nlp.processor.removePipeline(<pipeline-name>, <text-processor>)

To see details of all existing pipelines:

CALL ga.nlp.processor.getPipelines()

Example

Let's take the following text as an example:

Scores of people were already lying dead or injured inside a crowded Orlando nightclub,
and the police had spent hours trying to connect with the gunman and end the situation without further violence.
But when Omar Mateen threatened to set off explosives, the police decided to act, and pushed their way through a
wall to end the bloody standoff.

Simulate your original corpus

Create a node with the text. This node will represent your original corpus or knowledge graph:

CREATE (n:News)
SET n.text = "Scores of people were already lying dead or injured inside a crowded Orlando nightclub,
and the police had spent hours trying to connect with the gunman and end the situation without further violence.
But when Omar Mateen threatened to set off explosives, the police decided to act, and pushed their way through a
wall to end the bloody standoff.";

Perform the text information extraction

The extraction is done via the annotate procedure, which is the entry point to text information extraction:

MATCH (n:News)
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result

Available parameters of annotate procedure:

text: text to annotate represented as a string
id: specify ID that will be used as id property of the new AnnotatedText node
textProcessor (default: "Stanford"; if not available, then the first entry in the list of available text processors)
pipeline (default: tokenizer)
checkLanguage (default: true): run language detection on provided text and check whether it's supported

This procedure will link your original :News node to an :AnnotatedText node which is the entry point for the graph based NLP of this particular News. The original text is broken down into words, parts of speech, and functions. This analysis of the text acts as a starting point for the later steps.

Running a batch of annotations

If you have a large set of data to annotate, we recommend using APOC:

CALL apoc.periodic.iterate(
"MATCH (n:News) RETURN n",
"CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)", {batchSize:1, iterateList:true})

It is important to keep the batchSize and iterateList options as shown in the example, because running the annotation procedure in parallel will create deadlocks.

Enrich your original knowledge

We integrate external knowledge bases in order to enrich the knowledge of your current data.

As of now, two implementations are available :

ConceptNet5
Microsoft Concept Graph

This enricher will extend the meaning of tokens (Tag nodes) in the graph.

MATCH (n:Tag)
CALL ga.nlp.enrich.concept({enricher: 'conceptnet5', tag: n, depth:1, admittedRelationships:["IsA","PartOf"]})
YIELD result
RETURN result

The available parameters (default values are in brackets):

tag: tag to be enriched
enricher ("conceptnet5"): choose microsoft or conceptnet5
depth (2): how deep to go in concept hierarchy
admittedRelationships: choose desired concept relationships types, please refer to the ConceptNet Documentation for details
pipeline: choose pipeline name to be used for cleansing of concepts before storing them to your DB; your system default pipeline is used otherwise
filterByLanguage (true): allow only concepts of languages specified in outputLanguages; if no languages are specified, the same language as tag is required
outputLanguages ([]): return only concepts with specified languages
relDirection ("out"): desired direction of relationships in concept hierarchy ("in", "out", "both")
minWeight (0.0): minimal admitted concept relationship weight
limit (10): maximal number of concepts per tag
splitTag (false): if true, tag is first tokenised and then individual tokens enriched

Tags now have IS_RELATED_TO relationships to other enriched concepts.

List of available procedures

Keyword Extraction

MATCH (a:AnnotatedText)
CALL ga.nlp.ml.textRank({annotatedText: a, stopwords: '+,other,email', useDependencies: true})
YIELD result RETURN result

annotatedText is a mandatory parameter which refers to the annotated document that is required to be analyzed.

Available optional parameters (default values are in brackets):

keywordLabel (Keyword): label name of the keyword nodes
useDependencies (true): use universal dependencies to enrich extracted keywords and key phrases by tags related through COMPOUND and AMOD relationships
dependenciesGraph (false): use universal dependencies for creating tag co-occurrence graph (default is false, which means that a natural word flow is used for building co-occurrences)
cleanKeywords (true): run cleaning procedure
topXTags (1/3): set a fraction of highest-rated tags that will be used as keywords / key phrases
respectSentences (false): respect or not sentence boundaries for co-occurrence graph building
respectDirections (false): respect or not directions in co-occurrence graph (how the words follow each other)
iterations (30): number of PageRank iterations
damp (0.85): PageRank damping factor
threshold (0.0001): PageRank convergence threshold
removeStopwords (true): use a stopwords list for co-occurrence graph building and final cleaning of keywords
stopwords: customize stopwords list (if the list starts with +, the following words are appended to the default stopwords list, otherwise the default list is overwritten)
admittedPOSs: specify which POS labels are considered as keyword candidates; needed when using different language than English
forbiddenPOSs: specify list of POS labels to be ignored when constructing co-occurrence graph; needed when using different language than English
forbiddenNEs: specify list of NEs to be ignored

For a detailed TextRank algorithm description, please refer to our blog post about Unsupervised Keyword Extraction.
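To make the roles of iterations, damp, threshold and topXTags more concrete, here is a minimal sketch of TextRank-style keyword ranking. It is a generic textbook PageRank over a word co-occurrence graph, not the plugin's implementation: all function names are hypothetical, the co-occurrence window of adjacent words is an assumption, and dependency enrichment, stopword removal and POS filtering are left out.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence graph: tokens within `window` positions are linked."""
    graph = defaultdict(set)
    for i, token in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != token:
                graph[token].add(other)
                graph[other].add(token)
    return graph


def pagerank(graph, iterations=30, damp=0.85, threshold=0.0001):
    """Plain PageRank on an undirected graph given as {node: set(neighbours)}."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damp) + damp * incoming
        converged = max(abs(new_scores[n] - scores[n]) for n in graph) < threshold
        scores = new_scores
        if converged:
            break
    return scores


def top_keywords(tokens, top_fraction=1 / 3):
    """Keep the highest-ranked fraction of tags (at least one when any exist)."""
    scores = pagerank(cooccurrence_graph(tokens))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]


tokens = ["space", "shuttle", "logistics", "program", "space", "shuttle", "crew"]
print(top_keywords(tokens))
```

The loop stops early once the largest score change falls below the threshold, which is why a small threshold and a generous iteration cap work together.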

Using universal dependencies for keyword enrichment (the useDependencies option) can result in keywords with an unnecessary level of detail, for example the keyword space shuttle logistics program. In many use cases we might also want to know that the given document speaks more generally about space shuttle (or logistic program). To do that, run post-processing with one of these options:

direct - each key phrase of n tags is checked against all key phrases from all documents with m tags, where 1 < m < n; if the former contains the latter key phrase, then a DESCRIBES relationship is created from the m-keyphrase to all annotated texts of the n-keyphrase
subgroups - the same procedure as for direct, but instead of connecting higher level keywords directly to AnnotatedTexts, they are connected to the lower level keywords with HAS_SUBGROUP relationships
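The two post-processing options can be sketched as follows. This is a rough illustration under stated assumptions, not the plugin's logic: "contains" is assumed to mean that the shorter key phrase appears as a contiguous run of words inside the longer one, the HAS_SUBGROUP direction (shorter phrase to longer phrase) is inferred from the description above, and `contains_phrase` and `postprocess` are hypothetical names.

```python
def contains_phrase(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    n, m = len(longer), len(shorter)
    return any(longer[i : i + m] == shorter for i in range(n - m + 1))


def postprocess(keyphrases, method="subgroups"):
    """keyphrases maps a key phrase (tuple of words) to the set of document ids it describes.

    Returns a sorted list of (source, relationship, target) triples.
    """
    edges = set()
    for long_kp, long_docs in keyphrases.items():
        for short_kp in keyphrases:
            # Only phrases with 1 < m < n words are candidates.
            if not (1 < len(short_kp) < len(long_kp)):
                continue
            if not contains_phrase(long_kp, short_kp):
                continue
            if method == "direct":
                for doc in long_docs:
                    edges.add((short_kp, "DESCRIBES", doc))
            elif method == "subgroups":
                edges.add((short_kp, "HAS_SUBGROUP", long_kp))
            else:
                raise ValueError(f"unknown method: {method}")
    return sorted(edges)


keyphrases = {
    ("space", "shuttle", "logistics", "program"): {"doc1"},
    ("space", "shuttle"): {"doc2"},
    ("logistics", "program"): {"doc3"},
}
print(postprocess(keyphrases, method="direct"))
print(postprocess(keyphrases, method="subgroups"))
```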

// Important note: create the following indexes to optimise the performance of the post-process method
CREATE INDEX ON :Keyword(numTerms);
CREATE INDEX ON :Keyword(value);

CALL ga.nlp.ml.textRank.postprocess({keywordLabel: "Keyword", method: "subgroups"})
YIELD result
RETURN result

keywordLabel is an optional argument set by default to "Keyword".

By default, the postprocess operation processes all keywords, which can be very heavy on large graphs. You can restrict it to a specific annotated text with the annotatedText argument:

MATCH (n:AnnotatedText) WITH n LIMIT 100
CALL ga.nlp.ml.textRank.postprocess({annotatedText: n, method:'subgroups'}) YIELD result RETURN count(n)

Example for running it efficiently on the full set of Keywords with APOC :

CALL apoc.periodic.iterate(
'MATCH (n:AnnotatedText) RETURN n',
'CALL ga.nlp.ml.textRank.postprocess({annotatedText: n, method:"subgroups"}) YIELD result RETURN count(n)',
{batchSize: 1, iterateList:false}
)

TextRank Summarization

A similar approach to the keyword extraction can be employed to implement simple summarization. A densely connected graph of sentences is created, with Sentence-Sentence relationships representing their similarity based on shared words (the number of shared words normalized by the sum of the logarithms of the number of words in each sentence). PageRank is then used as a centrality measure to rank the relative importance of sentences in the document.

To run this algorithm:

MATCH (a:AnnotatedText)
CALL ga.nlp.ml.textRank.summarize({annotatedText: a}) YIELD result
RETURN result

Available parameters:

annotatedText
iterations (30): number of PageRank iterations
damp (0.85): PageRank damping factor
threshold (0.0001): PageRank convergence threshold
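The sentence similarity used to build the sentence graph can be sketched as below. This follows the description above (shared words over the sum of log sentence lengths); whether the plugin counts all words or only distinct tags is not stated, so counting all tokens for the lengths is an assumption, and `sentence_similarity` is a hypothetical name.

```python
import math


def sentence_similarity(words_a, words_b):
    """Shared-word count normalised by log(len(a)) + log(len(b))."""
    if not words_a or not words_b:
        return 0.0
    shared = len(set(words_a) & set(words_b))
    norm = math.log(len(words_a)) + math.log(len(words_b))
    if norm <= 0:
        # Both sentences have a single word (log 1 = 0): no meaningful score.
        return 0.0
    return shared / norm


first = ["police", "end", "standoff"]
second = ["police", "end", "situation", "violence"]
print(sentence_similarity(first, second))
```

The logarithmic normalization keeps long sentences from dominating simply because they contain more words.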

The summarisation procedure saves new properties to Sentence nodes: summaryRelevance (PageRank value of given sentence) and summaryRank (ranking; 1 = highest ranked sentence). Example query for retrieving summary:

match (n:Kapitel)-[:HAS_ANNOTATED_TEXT]->(a:AnnotatedText)
where id(n) = 233
match (a)-[:CONTAINS_SENTENCE]->(s:Sentence)
with a, count(*) as nSentences
match (a)-[:CONTAINS_SENTENCE]->(s:Sentence)-[:HAS_TAG]->(t:Tag)
with a, s, count(distinct t) as nTags, (CASE WHEN nSentences*0.1 > 10 THEN 10 ELSE toInteger(nSentences*0.1) END) as nLimit
where nTags > 4
with a, s, nLimit
order by s.summaryRank
with a, collect({text: s.text, pos: s.sentenceNumber})[..nLimit] as summary
unwind summary as sent
return sent.text
order by sent.pos

Sentiment Detection

You can also determine whether the text presented is positive, negative, or neutral. This procedure requires an AnnotatedText node, which is produced by ga.nlp.annotate above.

MATCH (t:MyNode)-[]-(a:AnnotatedText) 
CALL ga.nlp.sentiment(a) YIELD result 
RETURN result;

This procedure will simply return "SUCCESS" when it is successful, but it will apply the :POSITIVE, :NEUTRAL or :NEGATIVE label to each Sentence. As a result, when sentiment detection is complete, you can query for the sentiment of sentences as such:

MATCH (s:Sentence)
RETURN s.text, labels(s)

Language Detection

CALL ga.nlp.detectLanguage("What language is this in?") 
YIELD result return result

NLP based filter

CALL ga.nlp.filter({text:'On 8 May 2013,
    one week before the Pakistani election, the third author,
    in his keynote address at the Sentiment Analysis Symposium, 
    forecast the winner of the Pakistani election. The chart
    in Figure 1 shows varying sentiment on the candidates for 
    prime minister of Pakistan in that election. The next day, 
    the BBC’s Owen Bennett Jones, reporting from Islamabad, wrote 
    an article titled Pakistan Elections: Five Reasons Why the 
    Vote is Unpredictable, in which he claimed that the election 
    was too close to call. It was not, and despite his being in Pakistan, 
    the outcome of the election was exactly as we predicted.', filter: 'Owen Bennett Jones/PERSON, BBC, Pakistan/LOCATION'}) YIELD result 
return result

Cosine similarity computation

Once tags are extracted from all the news or other nodes containing some text, it is possible to compute similarities between them using content based similarity. During this process, each annotated text is described using the TF-IDF encoding format. TF-IDF is an established technique from the field of information retrieval and stands for Term Frequency-Inverse Document Frequency. Text documents can be TF-IDF encoded as vectors in a multidimensional Euclidean space. The space dimensions correspond to the tags, previously extracted from the documents. The coordinates of a given document in each dimension (i.e., for each tag) are calculated as a product of two sub-measures: term frequency and inverse document frequency.

MATCH (a:AnnotatedText)
//WHERE ...
WITH collect(a) AS nodes
// input takes the list of annotated texts; optional keys such as query: <tfidf_query> or relationshipType: "CUSTOM_SIMILARITY" go in the same map
CALL ga.nlp.ml.similarity.cosine({input: nodes}) YIELD result
RETURN result

Available parameters (default values are in brackets):

input: list of input nodes - AnnotatedTexts
relationshipType (SIMILARITY_COSINE): type of similarity relationship, use it along with query
query: specify your own query for extracting tf and idf in form ... RETURN id(Tag), tf, idf
propertyName (value): name of an existing node property (array of numerical values) which contains already prepared document vector
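The TF-IDF encoding and the cosine measure described in this section can be sketched in a few lines. This is a generic illustration, assuming the common tf = count / document length and idf = log(N / document frequency) definitions; the plugin's exact weighting is not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """documents: {doc_id: list of tags}. Returns {doc_id: {tag: tf * idf}}."""
    n_docs = len(documents)
    # Document frequency: in how many documents each tag appears.
    doc_freq = Counter(tag for tags in documents.values() for tag in set(tags))
    vectors = {}
    for doc_id, tags in documents.items():
        counts = Counter(tags)
        vectors[doc_id] = {
            tag: (count / len(tags)) * math.log(n_docs / doc_freq[tag])
            for tag, count in counts.items()
        }
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as {tag: weight}."""
    dot = sum(weight * vec_b.get(tag, 0.0) for tag, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = {
    "a": ["police", "gunman", "nightclub"],
    "b": ["police", "gunman", "election"],
    "c": ["election", "sentiment"],
}
vectors = tfidf_vectors(documents)
print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

Documents that share no tags get a similarity of zero, and tags that occur in every document contribute nothing because their idf is zero.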

Word2vec

Word2vec is a shallow two-layer neural network model used to produce word embeddings (words represented as multidimensional semantic vectors) and it is one of the models used in ConceptNet Numberbatch.

To add a source model (vectors) into a Lucene index:

CALL ga.nlp.ml.word2vec.addModel(<path_to_source_dir>, <path_to_index>, <identifier>)

<path_to_source_dir> is a full path to the directory with source vectors to be indexed
<path_to_index> is a full path where the index will be stored
<identifier> is a custom string that uniquely identifies the model

To list available models:

CALL ga.nlp.ml.word2vec.listModels

The model can now be used to compute cosine similarities between words:

WITH ga.nlp.ml.word2vec.wordVector('äpple', 'swedish-numberbatch') AS appleVector,
ga.nlp.ml.word2vec.wordVector('frukt', 'swedish-numberbatch') AS fruitVector
RETURN ga.nlp.ml.similarity.cosine(appleVector, fruitVector) AS similarity

1st parameter: word
2nd parameter: model identifier

Or you can ask directly for a word2vec of a node which has a word stored in property value:

MATCH (n1:Tag), (n2:Tag)
WHERE ...
WITH ga.nlp.ml.word2vec.vector(n1, <model_name>) AS vector1,
ga.nlp.ml.word2vec.vector(n2, <model_name>) AS vector2
RETURN ga.nlp.ml.similarity.cosine(vector1, vector2) AS similarity

We can also permanently store the word2vec vectors to Tag nodes:

CALL ga.nlp.ml.word2vec.attach({query:'MATCH (t:Tag) RETURN t', modelName:'swedish-numberbatch'})

query: query which returns tags to which embedding vectors should be attached
modelName: model to use

You can also get the nearest neighbors with the following procedure :

CALL ga.nlp.ml.word2vec.nn('analyzed', 10, 'fasttext') YIELD word, distance RETURN word, distance

For large models, for example the full fastText model for English with approximately 2 million words, computing the nearest neighbors on the fly is inefficient.

You can load the model into memory for faster nearest-neighbor lookups (for fastText 1M word vectors, a lookup generally takes about 27 seconds when reading from disk and ~300ms in memory).

Make sure to dedicate sufficient heap memory to Neo4j:

dbms.memory.heap.initial_size=3000m
dbms.memory.heap.max_size=5000m

Load the model into memory :

CALL ga.nlp.ml.word2vec.load(<maxNeighbors>, <modelName>)

And retrieve the nearest neighbors with:

CALL ga.nlp.ml.word2vec.nn(<word>,<maxNeighbors>,<modelName>)

Using other models

You can use any word embedding model as long as the following is true :

Every line contains the word followed by its vector
The file has a .txt extension

For example, you can download the models from fastText and just rename the file from .vec to .txt: https://fasttext.cc/docs/en/english-vectors.html
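The expected file layout (one word followed by its vector per line) can be illustrated with a small parser and a brute-force nearest-neighbor search. This is a sketch under assumptions, not the plugin's loader: values are assumed to be whitespace-separated, lines with fewer than three fields are skipped (which also skips the "count dimension" header found at the top of fastText .vec files), and the function names are hypothetical. It ranks by cosine similarity, whereas the procedure above yields a distance column.

```python
import math


def parse_vectors(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            # Skips blank lines and a fastText-style "<count> <dim>" header.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, k):
    """Return the k most similar words to `word` as (word, similarity) pairs."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]


lines = ["3 2", "apple 1.0 0.0", "fruit 0.9 0.1", "car 0.0 1.0"]
vectors = parse_vectors(lines)
print(nearest_neighbors("apple", vectors, 1))
```

Scanning every vector per query is what makes on-the-fly lookups slow for models with millions of words, which is the motivation for keeping the model in memory.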

Parsing PDF Documents

CALL ga.nlp.parser.pdf("file:///Users/ikwattro/_graphs/nlp/import/myfile.pdf") YIELD number, paragraphs

The procedure returns rows with two columns: number, the page number, and paragraphs, a List<String> of paragraph texts.

You can also pass an http or https url to the procedure for loading a file from a remote location.

Exclude content from the pdf

In some cases, PDF documents contain recurring useless content such as page footers. You can exclude it from the parsing by passing a list of regexes defining the parts to exclude:

CALL ga.nlp.parser.pdf("myfile.pdf", ["^[0-9]$","^Licensed to"])
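To see what the two example patterns would exclude, here is a small sketch of regex-based filtering. How the plugin applies the patterns internally is not documented here, so this assumes each pattern is searched against each paragraph separately; `exclude_paragraphs` is a hypothetical name, and the sample paragraphs are invented for the example.

```python
import re


def exclude_paragraphs(paragraphs, patterns):
    """Drop every paragraph that matches at least one of the given regexes."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [p for p in paragraphs if not any(rx.search(p) for rx in compiled)]


paragraphs = ["Chapter 1", "7", "Licensed to Jane Doe", "Graphs are everywhere."]
print(exclude_paragraphs(paragraphs, ["^[0-9]$", "^Licensed to"]))
```

Note that "^[0-9]$" only matches a paragraph consisting of a single digit, so multi-digit page numbers would need a pattern such as "^[0-9]+$".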

Use a different user Agent than TIKA

TIKA can be recognized as a crawler and be denied access to some sites hosting PDFs. You can override this by passing a UserAgent option:

CALL ga.nlp.parser.pdf($url, [], {UserAgent: 'Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.2) Gecko/20040803'})

Extras

Parsing raw content from a file

RETURN ga.nlp.parse.raw(<path-to-file>) AS content

Storing only certain Tag/Tokens

In certain situations, it is useful to store only certain values instead of the full graph. Note, though, that this might reduce the ability to extract insights (with textRank, for example):

CALL ga.nlp.processor.addPipeline({
name:"whitelist",
whitelist:"hello,john,ibm",
textProcessor:"com.graphaware.nlp.enterprise.processor.EnterpriseStanfordTextProcessor",
processingSteps:{tokenize:true, ner:true}})

CALL ga.nlp.annotate({text:"Hello, my name is John and I worked at IBM.", id:"test-123", pipeline:"whitelist", checkLanguage:false})
YIELD result
RETURN result

Parsing WebVTT

WebVTT is the Web Video Text Tracks format, used for example for YouTube video transcripts: https://fr.wikipedia.org/wiki/WebVTT

CALL ga.nlp.parser.webvtt("url-to-transcript.vtt") YIELD startTime, endTime, text

Listing files from directory(ies)

CALL ga.nlp.utils.listFiles(<path-to-directory>, <extensionFilter>)

// eg:

CALL ga.nlp.utils.listFiles("/Users/ikwattro/dev/papers", ".pdf") YIELD filePath RETURN filePath

The above procedure lists the files of the given directory only. If you need to walk the child directories as well, use walkdir:

CALL ga.nlp.utils.walkdir("/Users/ikwattro/dev/papers", ".pdf") YIELD filePath RETURN filePath

Additional Procedures

ga.nlp.config.model.list()

List stored models and their paths

ga.nlp.refreshPipeline()

Removes and re-creates a pipeline with the same configuration (useful, for example, when static NER files in use have been changed)

License

GraphAware is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

neo4j-nlp's Issues

Neo4j fails to start with GraphAware NLP plugin, because of java version error

I am on ubuntu (16.04) and running Neo4j (3.4.0, from tarball), and using Java 8 (HotSpot 1.8.0_171).

I have the following plugins:

graphaware-nlp-3.4.0.52.8.jar
nlp-stanfordnlp-3.4.0.52.8.jar
graphaware-server-enterprise-all-3.4.0.52.jar

I get the following error on startup:

Directories in use:
  home:         /home/ad/neo4j-community-3.4.0
  config:       /home/ad/neo4j-community-3.4.0/conf
  logs:         /home/ad/neo4j-community-3.4.0/logs
  plugins:      /home/ad/neo4j-community-3.4.0/plugins
  import:       /home/ad/neo4j-community-3.4.0/import
  data:         /home/ad/neo4j-community-3.4.0/data
  certificates: /home/ad/neo4j-community-3.4.0/certificates
  run:          /home/ad/neo4j-community-3.4.0/run
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
2018-06-01 10:27:19.089+0000 INFO  ======== Neo4j 3.4.0 ========
2018-06-01 10:27:19.115+0000 INFO  Starting...
2018-06-01 10:27:20.737+0000 INFO  Bolt enabled on 127.0.0.1:7687.
2018-06-01 10:27:20.739+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2018-06-01 10:27:20.755+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 1, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2018-06-01 10:27:20.807+0000 INFO  Registering module NLP with GraphAware Runtime.
2018-06-01 10:27:20.811+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2018-06-01 10:27:24.105+0000 INFO  Shutting down GraphAware Runtime... 
2018-06-01 10:27:24.105+0000 INFO  Shutting down module NLP
2018-06-01 10:27:24.105+0000 INFO  Terminating task scheduler...
2018-06-01 10:27:24.106+0000 INFO  Task scheduler terminated successfully.
2018-06-01 10:27:24.106+0000 INFO  GraphAware Runtime shut down.
2018-06-01 10:27:24.109+0000 ERROR [c.g.r.b.RuntimeKernelExtension] Could not start GraphAware Runtime because the database didn't get to a usable state within 5 minutes.
2018-06-01 10:27:24.115+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@55fc6344' was successfully initialized, but failed to start. Please see the attached cause exception "module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@55fc6344' was successfully initialized, but failed to start. Please see the attached cause exception "module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@55fc6344' was successfully initialized, but failed to start. Please see the attached cause exception "module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0".
	at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
	at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@55fc6344' was successfully initialized, but failed to start. Please see the attached cause exception "module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
	... 3 more
....
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.proc.Procedures@149162a4' was successfully initialized, but failed to start. Please see the attached cause exception "module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:208)
	... 9 more
Caused by: java.lang.UnsupportedClassVersionError: module-info has been compiled by a more recent version of the Java Runtime (class file version 53.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:137)
	at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:114)
	at org.neo4j.collection.PrefetchingRawIterator.peek(PrefetchingRawIterator.java:50)
	at org.neo4j.collection.PrefetchingRawIterator.hasNext(PrefetchingRawIterator.java:36)
	at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProcedures(ProcedureJarLoader.java:87)
	at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProceduresFromDir(ProcedureJarLoader.java:78)
	at org.neo4j.kernel.impl.proc.Procedures.start(Procedures.java:323)
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
	... 11 more
2018-06-01 10:27:24.116+0000 INFO  Neo4j Server shutdown initiated by request

This appears to be a class version error: a class in the library (module-info) was compiled for Java 9 (class file version 53.0), while the Java 8 runtime in use only accepts class file versions up to 52.0. However, Oracle Java 9 is no longer supported, and Neo4j does not work with OpenJDK Java 9, nor with later versions (Oracle Java 10).

Please advise.
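
For reference, the class file versions quoted in the error map to Java releases as "major version minus 44", so 52 is Java 8 and 53 is Java 9. The following is a minimal sketch (the class and method names are made up for illustration, and it is not part of the plugin) that reads the major version from the first eight bytes of a class file:

```java
import java.nio.ByteBuffer;

class ClassVersion {
    /** Reads the major version from a class file header: magic (4 bytes), minor (2), major (2). */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        buf.getShort(); // minor version, not needed here
        return buf.getShort() & 0xFFFF;
    }

    /** Maps a class file major version to a Java release (52 -> 8, 53 -> 9). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        // Hand-built header of a class compiled for Java 9 (major version 53).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 53};
        int major = majorVersion(header);
        System.out.println("major " + major + " -> Java " + javaRelease(major));
    }
}
```

Running the same check against the module-info.class entry of each JAR in the plugins directory would show which JAR carries the Java 9 class.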

ga.nlp.ml.word2vec.attach gives java.lang.RuntimeException: Error

When using the following command: match (n:Tag) call ga.nlp.ml.word2vec.attach(n) YIELD result return result

It gives
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure ga.nlp.ml.word2vec.attach: Caused by: java.lang.RuntimeException: Error

Installed base

Neo 3.3.1
graphaware-nlp-3.3.1.51.2
nlp-opennlp-3.3.1.51.2
nlp-stanfordnlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51

Document how to choose the conceptnet relationships we want

What if any indexes should be created?

I'm running this, on about 36k tweets:

/* Annotate with language */
MATCH (t:Tweet)
CALL ga.nlp.detectLanguage(t.text)
YIELD result
SET t.language = result
RETURN count(t);

/* Only supports english */
MATCH (t:Tweet { language: "en" })
CALL ga.nlp.annotate({text: t.text, id: id(t)})
YIELD result
MERGE (t)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN count(result);

Language detection worked great. Annotation, however, slows down drastically as the server processes more entries. Typically this would be a sign that some index is missing, but I don't know which one. Any suggestions?

Server log entries look like this:

[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.TextProcessorsManager - Using text processor: com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NN] ne: [O] lemma: discovery
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NN] ne: [O] lemma: success
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NNP] ne: [O] lemma: discovery
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NNS, VBZ] ne: [O] lemma: breed
[neo4j.Pooled-3] WARN com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - extractPhrases(): phrases index empty, aborting extraction
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NNS] ne: [O] lemma: thought
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NNS] ne: [O] lemma: question
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NN] ne: [O] lemma: quest
[neo4j.Pooled-3] WARN com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - extractPhrases(): phrases index empty, aborting extraction
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [VBP] ne: [O] lemma: think
2017-11-12 16:17:32.212+0000 INFO  Start storing annotatedText 53528

[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [IN] ne: [O] lemma: like
[neo4j.Pooled-3] INFO com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - POS: [NNP] ne: [O] lemma: champion
[neo4j.Pooled-3] WARN com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor - extractPhrases(): phrases index empty, aborting extraction
2017-11-12 16:17:32.550+0000 INFO  end storing annotatedText 53528

At the beginning of the process, for low "annotatedText" numbers, this was running at maybe 10/sec. Late in the load, it's more like 1/sec. A rate like that quickly becomes a problem with a large corpus.

Text processor doesn't seem to care about stopwords

Step 1:
CALL ga.nlp.processor.addPipeline ({textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor', name: 'customStopwords', stopWords: '+,själv, dig, från, vilkas, dem, ett, varit, varför, att, era, som, dess, skulle, våra, på, sådana, har, blivit, det, vad, eller, sin, efter, i, varje, sådan, de, ditt, han, dessa, vi, med, då, den, mig, denna, ingen, under, henne, sådant, du, hade, vilken, till, över, vår, är, jag, nu, sedan, hans, vid, vara, hur, min, här, sitta, än, ju, blev, ut, bli, sina, hennes, detta, oss, alla, någon, allt, utan, blir, några, åt, vårt, där, samma, inte, inom, hon, något, upp, honom, var, sig, vilket, vart, er, och, kunde, ej, vars, mot, men, ni, ha, din, ert, för, mina, vilka, så, kan, vem, man, en, icke, mitt, när, mycket, deras, mellan, om, dina, av', threadNumber: 20})

Step 2:
MATCH (n:Problem)
CALL ga.nlp.annotate({text: n.text, id: id(n)})
YIELD result
MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
RETURN result

The annotation above still processes the stopwords listed above.
For example, there are Tag nodes that contain the value "dig".

No Procedure with ga.nlp.search registered for this database instance.

If I call the following procedure:
CALL ga.nlp.search("Chatbot") YIELD result, score
MATCH (result)<-[]-(news:News)
RETURN DISTINCT news, score
ORDER BY score desc;
I get the error "There is no procedure with the name ga.nlp.search registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed." Am I missing anything?

I have installed the following plugins for Neo4j:
graphaware-nlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51
nlp-stanfordnlp-3.3.1.51.2
graphaware-neo4j-to-elasticsearch-3.3.1.51.7

Issue with (en) while ingesting from concept net

ConceptNet 5 sometimes returns a phrase/word with (en) at the end. This causes an error while navigating the hierarchy.
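
One possible workaround is to strip the marker before the value is used to navigate further. This is only a sketch under an assumption: the exact shape of the returned value is not shown in this report, so the pattern below (a trailing two-letter code in parentheses, optionally preceded by whitespace) and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ConceptLabelCleaner {
    // Hypothetical pattern: a trailing two-letter language marker such as "(en)".
    private static final Pattern TRAILING_LANGUAGE =
            Pattern.compile("\\s*\\([a-z]{2}\\)\\s*$");

    /** Removes a trailing language marker, leaving other labels untouched. */
    static String stripLanguageMarker(String label) {
        return TRAILING_LANGUAGE.matcher(label).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripLanguageMarker("college (en)")); // marker removed
        System.out.println(stripLanguageMarker("college"));      // unchanged
    }
}
```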

[ga.nlp.concept ()] Illegal character in path

Hi again,

This query on my graph (which is already annotated):

MATCH (a:AnnotatedText)
CALL ga.nlp.concept({node:a, depth: 2}) YIELD result
RETURN result

returns:

java.lang.IllegalArgumentException: Illegal character in path at index 47: http://conceptnet5.media.mit.edu/data/5.4/c/en/�
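
The base URL in the message is 47 characters long, so index 47 points at the first character of the concept appended to it. The sketch below shows one way to percent-encode a concept before building the URI. It assumes the URL is built by plain string concatenation, which this report does not confirm, and the class and method names are invented for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class ConceptUrlBuilder {
    // Base URL copied from the error message above.
    static final String BASE = "http://conceptnet5.media.mit.edu/data/5.4/c/en/";

    /** Percent-encodes a single path segment as UTF-8. */
    static String encodeSegment(String segment) {
        try {
            // URLEncoder produces form encoding, where a space becomes "+";
            // in a path it has to be "%20" instead. A literal "+" in the input
            // is already encoded as "%2B", so the replacement only hits spaces.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static URI conceptUri(String concept) {
        return URI.create(BASE + encodeSegment(concept));
    }

    public static void main(String[] args) {
        // A space in the path would otherwise trigger "Illegal character in path".
        System.out.println(conceptUri("ice cream"));
    }
}
```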

custom pipelines unique name check

Duplicate tags with "n/a" as language

MATCH (n:Tag) WHERE n.value = "Edit" RETURN n

╒══════════════════════════════╕
│"n"                           │
╞══════════════════════════════╡
│{"pos":["NNP"],"ne":["O"],"lan│
│guage":"en","id":"Edit_en","va│
│lue":"Edit"}                  │
├──────────────────────────────┤
│{"pos":["NNP"],"ne":["O"],"lan│
│guage":"n/a","id":"Edit_n/a","│
│value":"Edit"}                │
└──────────────────────────────┘

Unable to start neo4j 3.3.4 with graphaware nlp plugins

My error seems to be slightly different from the one reported in #81, hence I've created another issue.

My configuration:

OSX 10.13.4
neo4j 3.3.4
java jdk jdk-8.0_172.jdk

graphaware components installed in plugin folder:

graphaware-nlp-3.4.0.52.8.jar
graphaware-server-enterprise-all-3.4.0.52.jar
nlp-stanfordnlp-3.4.0.52.8.jar

Changes to the conf file:

# dbms.directories.import=import
dbms.security.auth_enabled=false
dbms.memory.heap.initial_size=2512m
dbms.memory.heap.max_size=4512m  # should be sufficient to start?

dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
com.graphaware.runtime.enabled=true
com.graphaware.module.NLP.1=com.graphaware.nlp.module.NLPBootstrapper
dbms.security.procedures.whitelist=ga.nlp.*

Before starting Neo4j with the GraphAware plugins, I delete the database to start with a clean slate.

This is the error:

2018-05-24 13:04:36.466+0000 INFO  ======== Neo4j 3.3.4 ========
2018-05-24 13:04:36.664+0000 INFO  Starting...
2018-05-24 13:04:38.616+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2018-05-24 13:04:40.533+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2018-05-24 13:04:40.569+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@14e2e1c3' was successfully initialized, but failed to start. Please see the attached cause exception "org.neo4j.kernel.impl.core.EmbeddedProxySPI". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@14e2e1c3' was successfully initialized, but failed to start. Please see the attached cause exception "org.neo4j.kernel.impl.core.EmbeddedProxySPI".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@14e2e1c3' was successfully initialized, but failed to start. Please see the attached cause exception "org.neo4j.kernel.impl.core.EmbeddedProxySPI".
	at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
	at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@14e2e1c3' was successfully initialized, but failed to start. Please see the attached cause exception "org.neo4j.kernel.impl.core.EmbeddedProxySPI".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
	... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /Users/e/engines/neo4j/3.3.4/libexec/data/databases/graph.db
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:126)
	at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
	at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
	... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'com.graphaware.runtime.bootstrap.RuntimeKernelExtension@12f8b1d8' was successfully initialized, but failed to start. Please see the attached cause exception "org.neo4j.kernel.impl.core.EmbeddedProxySPI".
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
	at org.neo4j.kernel.extension.KernelExtensions.start(KernelExtensions.java:84)
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
	... 9 more
Caused by: java.lang.NoClassDefFoundError: org/neo4j/kernel/impl/core/EmbeddedProxySPI
	at com.graphaware.common.kv.GraphKeyValueStore.<init>(GraphKeyValueStore.java:31)
	at com.graphaware.runtime.metadata.GraphPropertiesMetadataRepository.<init>(GraphPropertiesMetadataRepository.java:57)
	at com.graphaware.runtime.GraphAwareRuntimeFactory.createRuntime(GraphAwareRuntimeFactory.java:63)
	at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.start(RuntimeKernelExtension.java:111)
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
	... 14 more
Caused by: java.lang.ClassNotFoundException: org.neo4j.kernel.impl.core.EmbeddedProxySPI
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 19 more
2018-05-24 13:04:40.570+0000 INFO  Neo4j Server shutdown initiated by request
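
One thing worth ruling out here: the installation notes ask that the plugin JAR versions match the Neo4j version, while the JARs listed above carry 3.4.0 and the server is 3.3.4. A small sketch of such a check follows. The helper names are hypothetical, and the assumption is that the first "-major.minor.patch" group in a JAR file name is the Neo4j version it targets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PluginVersionCheck {
    // First "-<major>.<minor>.<patch>" group in the JAR file name.
    private static final Pattern JAR_VERSION = Pattern.compile("-(\\d+\\.\\d+)\\.\\d+");

    /** Returns "3.3" for "3.3.4". */
    static String majorMinor(String version) {
        String[] parts = version.split("\\.");
        return parts[0] + "." + parts[1];
    }

    /** True when the JAR name targets the same major.minor as the server. */
    static boolean matchesServer(String jarName, String neo4jVersion) {
        Matcher m = JAR_VERSION.matcher(jarName);
        return m.find() && m.group(1).equals(majorMinor(neo4jVersion));
    }

    public static void main(String[] args) {
        String[] jars = {
            "graphaware-nlp-3.4.0.52.8.jar",
            "graphaware-server-enterprise-all-3.4.0.52.jar",
            "nlp-stanfordnlp-3.4.0.52.8.jar"
        };
        for (String jar : jars) {
            System.out.println(jar + " matches Neo4j 3.3.4: " + matchesServer(jar, "3.3.4"));
        }
    }
}
```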

Case sensitivity for Tags

MATCH (n:Memory)-[:HAS_CONTENT]->(mrc)-[:HAS_ANNOTATED_TEXT]->(at)-[:CONTAINS_SENTENCE]->(s)-[:HAS_TAG]->(tag)
WHERE id(n) = 3430
WITH tag, count(*) AS c ORDER BY c DESC
LIMIT 50
MATCH (tag)-[:IS_RELATED_TO]->(other)
RETURN tag.value, c, collect(other.value)

tag.value    c    collect(other.value)
open    4    [expose, governance, Governance, programming, interval, mathematics, chess, yield, secret, introduction, opportunity]

Governance + governance
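
To illustrate the behaviour being asked for, here is a minimal sketch of case-insensitive de-duplication that keeps the first spelling seen. It is not the plugin's code, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TagCaseFolding {
    /** Keeps the first occurrence of each value, comparing case-insensitively. */
    static List<String> distinctIgnoringCase(List<String> values) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (seen.add(value)) { // add() returns false for a case-insensitive duplicate
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> related = Arrays.asList("expose", "governance", "Governance", "programming");
        System.out.println(distinctIgnoringCase(related));
    }
}
```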

Customized Node/Rel Labels

It would be very useful to be able to customize the node labels, to avoid collisions with existing models, or to aid mnemonic and visual separation. For my needs, it would be sufficient to set a label prefix used for all generated labels. While I don't see the need to mess with the relationship names, perhaps some would want that.

ga.nlp.ml.similarity.cosine

This is similar to an old issue.

What I want to accomplish:
Create a similarity relationship between all annotated texts.
When using the following command:
MATCH (a:AnnotatedText) with collect(a) as list
CALL ga.nlp.ml.similarity.cosine({input:list})
YIELD result RETURN result

It only returns "2", but no relationships are created.

In a previous version I used:
MATCH (a:AnnotatedText) with collect(a) as list CALL ga.nlp.ml.similarity.cosine(list, null, "SIMULARITY") YIELD result return result
Which worked fine!

How should the procedure be called in order to have relationships created between annotated nodes?
The readme states CALL ga.nlp.ml.cosine.compute({}) YIELD result, but that does not work.

Tested on the following plug-ins:
graphaware-nlp-3.3.2.52.6.jar
graphaware-server-enterprise-all-3.3.3.52.jar
nlp-opennlp-3.3.2.52.6.jar
nlp-stanfordnlp-3.3.2.52.6.jar

@ikwattro

ConceptNet5 takes forever

I'm not sure if this is an error or the expected behavior, but when I run the command below
CALL ga.nlp.enrich.concept({enricher: 'conceptnet5', tag: n, depth:2, admittedRelationships:["IsA","PartOf"]})

it takes forever to complete: more than 5 hours (I have never waited longer than this). There are quite a lot of tags (the data comes from a book), but it is not a huge amount of words (from my perspective).

regards
andreas

Skipping words with å ä ö

Error description

When using ga.nlp.annotate with checkLanguage: false on a text containing words with å, ä or ö, annotate skips those words entirely.
Example of text in a node:
"Det är idag kallt. Året är 2018"

The words "är" and "året" seem to be skipped by the annotate function: no "Tag" or "TagOccurrence" nodes are created for them.

Installed base

Packages installed on Neo4j 3.3.1:
graphaware-nlp-3.3.1.51.2
nlp-opennlp-3.3.1.51.2
nlp-stanfordnlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51

Command used:

MATCH (n:Dokument) CALL ga.nlp.annotate({text: n.text, id: id(n), checkLanguage: false}) YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result) RETURN result

enrich.concept Long->Integer ClassCastException

MATCH (n:Tag {value:"college"})
WITH n
CALL ga.nlp.enrich.concept({enricher:"conceptnet5", tag:n, depth:2, admittedRelationships:["RelatedTo","UsedFor","PartOf"], limit:50})
YIELD result
RETURN true

Failed to invoke procedure `ga.nlp.enrich.concept`: Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
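
Integer values in a Cypher map reach Java procedure code as java.lang.Long, so a direct cast of such a value to Integer fails in exactly this way. Which parameter triggers it here (depth or limit) is not visible from the message. The sketch below shows a defensive way to read such a value. The class and method names are hypothetical and this is not the plugin's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class ParameterReader {
    /** Reads an int from a procedure parameter map, accepting Long, Integer or any other Number. */
    static int intValue(Map<String, Object> params, String key, int defaultValue) {
        Object value = params.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (value instanceof Number) {
            // toIntExact throws ArithmeticException instead of silently truncating.
            return Math.toIntExact(((Number) value).longValue());
        }
        throw new IllegalArgumentException(key + " must be a number, got " + value.getClass().getName());
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("depth", 2L);  // what Cypher passes for depth:2
        params.put("limit", 50L); // what Cypher passes for limit:50
        System.out.println(intValue(params, "depth", 1) + " " + intValue(params, "limit", 10));
    }
}
```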

Do not store NER_O label

Or make it configurable

Don't store NER_O label

When a tag is not identified as any named entity, don't store any additional node label (currently it's NER_O).

[NERDemo] Compilation error, bad package name

Hi!

I just cloned the neo4j-nlp GraphAware project and I get a compilation error in the class NERDemo.

The class NERDemo is located in the package:
com.graphaware.nlp.processor.stanford

But its package declaration is:
package com.graphaware.nlp.processor;

I don't need to use it; this is simply for your information.

Best regards.

[Enricher] Deprecate Microsoft due to service closing

Unable to start neo4j server. Getting the following error. Kindly help me resolve this.

2018-05-21 16:14:13.467+0000 INFO Starting...
2018-05-21 16:14:19.615+0000 INFO Bolt enabled on 127.0.0.1:7687.
2018-05-21 16:14:25.887+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2018-05-21 16:14:25.915+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 1, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2018-05-21 16:14:26.003+0000 INFO Registering module NLP with GraphAware Runtime.
2018-05-21 16:14:26.005+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2018-05-21 16:14:49.456+0000 INFO Shutting down GraphAware Runtime...
2018-05-21 16:14:49.456+0000 INFO Shutting down module NLP
2018-05-21 16:14:49.456+0000 INFO Terminating task scheduler...
2018-05-21 16:14:49.457+0000 INFO Task scheduler terminated successfully.
2018-05-21 16:14:49.457+0000 INFO GraphAware Runtime shut down.
2018-05-21 16:14:49.462+0000 ERROR [c.g.r.b.RuntimeKernelExtension] Could not start GraphAware Runtime because the database didn't get to a usable state within 5 minutes.
2018-05-21 16:14:49.464+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@59d31e9e' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@59d31e9e' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@59d31e9e' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@59d31e9e' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /usr/local/Cellar/neo4j/3.3.4/libexec/data/databases/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:126)
at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.proc.Procedures@6b0e2e81' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 9 more
Caused by: java.lang.VerifyError: Cannot inherit from final class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:141)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:118)
at org.neo4j.collection.PrefetchingRawIterator.peek(PrefetchingRawIterator.java:50)
at org.neo4j.collection.PrefetchingRawIterator.hasNext(PrefetchingRawIterator.java:36)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProcedures(ProcedureJarLoader.java:91)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProceduresFromDir(ProcedureJarLoader.java:82)
at org.neo4j.kernel.impl.proc.Procedures.start(Procedures.java:276)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 11 more
2018-05-21 16:14:49.465+0000 INFO Neo4j Server shutdown initiated by request

getPipelines procedure should be more friendly

Add more info, such as the number of threads, options, stopwords, etc. Remove the need to provide a name.

Application thread is blocked

Hi,

I am not sure this is neo4j-nlp related, but I have the following problem.

I need to run sentiment analysis on a large number of texts (2.5 million). I am testing how to approach it; so far I've come up with:

// NLP iterate sentiment
call apoc.periodic.iterate(
  "MATCH (a:AnnotatedText) WHERE size(labels(a)) = 1 RETURN a LIMIT 10000",
  "CALL ga.nlp.sentiment(a) YIELD result RETURN result;",
  {batchSize: 1000, iterateList: false})

This approach works for a while, but when I check the logs I get warning messages like:
2018-01-22 12:53:00.371+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 14301ms.

What could I do to improve performance and get rid of these messages? I am using Neo4j 3.3 Enterprise and the latest version of the plugins (for APOC and the NLP framework).

Thanks.

Procedure not found....

Hi, kindly help me with this.

I am getting an error when running the nlp.annotate procedure in Neo4j Community Edition.

Specify DB schema for NLP

We need to specify in the docs which constraints and indexes are needed.

ERROR Unsupported language : n/a -- Java 1.8.0_161

2018-03-07 20:47:09.391+0000 ERROR Unsupported language : n/a
java.lang.RuntimeException: Unsupported language : n/a
at com.graphaware.nlp.NLPManager.checkTextLanguage(NLPManager.java:253)
at com.graphaware.nlp.NLPManager.annotateTextAndPersist(NLPManager.java:135)
at com.graphaware.nlp.NLPManager.annotateTextAndPersist(NLPManager.java:130)
at com.graphaware.nlp.dsl.procedure.AnnotateProcedure.annotate(AnnotateProcedure.java:39)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
at org.neo4j.kernel.impl.proc.ReflectiveProcedureCompiler$ReflectiveProcedure.apply(ReflectiveProcedureCompiler.java:598)
at org.neo4j.kernel.impl.proc.ProcedureRegistry.callProcedure(ProcedureRegistry.java:201)
at org.neo4j.kernel.impl.proc.Procedures.callProcedure(Procedures.java:256)
at org.neo4j.kernel.impl.api.OperationsFacade.callProcedure(OperationsFacade.java:1440)
at org.neo4j.kernel.impl.api.OperationsFacade.procedureCallWrite(OperationsFacade.java:1394)
at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext$$anonfun$20.apply(TransactionBoundQueryContext.scala:706)
at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext$$anonfun$20.apply(TransactionBoundQueryContext.scala:706)
at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext.callProcedure(TransactionBoundQueryContext.scala:726)
at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext.callReadWriteProcedure(TransactionBoundQueryContext.scala:707)
at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext$$anonfun$callReadWriteProcedure$1.apply(ExceptionTranslatingQueryContext.scala:153)
at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext$$anonfun$callReadWriteProcedure$1.apply(ExceptionTranslatingQueryContext.scala:153)
at org.neo4j.cypher.internal.spi.v3_3.ExceptionTranslationSupport$class.translateException(ExceptionTranslationSupport.scala:32)
at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.translateException(ExceptionTranslatingQueryContext.scala:39)
at org.neo4j.cypher.internal.spi.v3_3.ExceptionTranslationSupport$class.translateIterator(ExceptionTranslationSupport.scala:45)
at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.translateIterator(ExceptionTranslatingQueryContext.scala:39)
at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.callReadWriteProcedure(ExceptionTranslatingQueryContext.scala:153)
at org.neo4j.cypher.internal.spi.v3_3.DelegatingQueryContext.callReadWriteProcedure(DelegatingQueryContext.scala:217)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.EagerReadWriteCallMode.callProcedure(ProcedureCallMode.scala:55)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.ProcedureCallPipe$$anon$$$$ad93371f26992c1f243949f3f9c28e5$$$$ateResultsByAppending$1.apply(ProcedureCallPipe.scala:68)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.ProcedureCallPipe$$anon$$$$ad93371f26992c1f243949f3f9c28e5$$$$ateResultsByAppending$1.apply(ProcedureCallPipe.scala:66)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator$$anonfun$hasNext$1.apply$mcZ$sp(ResultIterator.scala:59)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator$$anonfun$hasNext$1.apply(ResultIterator.scala:57)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator$$anonfun$hasNext$1.apply(ResultIterator.scala:57)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator$$anonfun$failIfThrows$1.apply(ResultIterator.scala:84)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.decoratedCypherException(ResultIterator.scala:93)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.failIfThrows(ResultIterator.scala:82)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.hasNext(ResultIterator.scala:56)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.foreach(ResultIterator.scala:48)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.to(ResultIterator.scala:48)
at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:294)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.toList(ResultIterator.scala:48)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.EagerResultIterator.<init>(ResultIterator.scala:34)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.ClosingIterator.toEager(ResultIterator.scala:52)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.buildResultIterator(DefaultExecutionResultBuilderFactory.scala:120)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.createResults(DefaultExecutionResultBuilderFactory.scala:103)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.build(DefaultExecutionResultBuilderFactory.scala:74)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:100)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:83)
at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$InterpretedExecutionPlan.run(BuildInterpretedExecutionPlan.scala:116)
at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:183)
at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:179)
at org.neo4j.cypher.internal.compatibility.v3_3.exceptionHandler$runSafely$.apply(exceptionHandler.scala:90)
at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper.run(Compatibility.scala:179)
at org.neo4j.cypher.internal.PreparedPlanExecution.execute(PreparedPlanExecution.scala:29)
at org.neo4j.cypher.internal.ExecutionEngine.execute(ExecutionEngine.scala:120)
at org.neo4j.cypher.internal.javacompat.ExecutionEngine.executeQuery(ExecutionEngine.java:62)
at org.neo4j.bolt.v1.runtime.TransactionStateMachineSPI$1.start(TransactionStateMachineSPI.java:146)
at org.neo4j.bolt.v1.runtime.TransactionStateMachine$State$1.run(TransactionStateMachine.java:247)
at org.neo4j.bolt.v1.runtime.TransactionStateMachine.run(TransactionStateMachine.java:82)
at org.neo4j.bolt.v1.runtime.BoltStateMachine$State$2.run(BoltStateMachine.java:408)
at org.neo4j.bolt.v1.runtime.BoltStateMachine.run(BoltStateMachine.java:200)
at org.neo4j.bolt.v1.messaging.BoltMessageRouter.lambda$onRun$3(BoltMessageRouter.java:93)
at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.execute(RunnableBoltWorker.java:152)
at org.neo4j.bolt.v1.runtime.concurrent.RunnableBoltWorker.run(RunnableBoltWorker.java:104)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109)

Error on start

Hi,

I am trying to run the NLP framework on a clean Neo4j 3.3 installation.

My plugin dir is

graphaware-server-community-all-3.3.0.51.jar
graphaware-nlp-3.3.0.51.1.jar
nlp-opennlp-3.3.0.51.1.jar
nlp-stanfordnlp-3.3.0.51.1.jar

I am not sure what I am doing wrong but neo4j will not start, see error below. Thanks for your help.

Bob

======

2017-11-08 21:10:51.647+0000 INFO ======== Neo4j 3.3.0 ========
2017-11-08 21:10:51.683+0000 INFO Starting...
2017-11-08 21:11:02.636+0000 INFO Bolt enabled on 127.0.0.1:7687.
2017-11-08 21:11:08.563+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2017-11-08 21:11:08.581+0000 WARN Topology change adapter is not available on community edition
2017-11-08 21:11:08.582+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 1, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2017-11-08 21:11:08.640+0000 INFO Registering module NLP with GraphAware Runtime.
2017-11-08 21:11:08.641+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2017-11-08 21:11:12.630+0000 INFO Shutting down GraphAware Runtime...
2017-11-08 21:11:12.632+0000 INFO Shutting down module NLP
2017-11-08 21:11:12.632+0000 INFO Terminating task scheduler...
2017-11-08 21:11:12.633+0000 INFO Task scheduler terminated successfully.
2017-11-08 21:11:12.633+0000 INFO GraphAware Runtime shut down.
2017-11-08 21:11:12.634+0000 ERROR [c.g.r.b.RuntimeKernelExtension] Could not start GraphAware Runtime because the database didn't get to a usable state within 5 minutes.
2017-11-08 21:11:12.639+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1e86b2d1' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1e86b2d1' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1e86b2d1' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:218)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@1e86b2d1' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:210)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /Users/Bob/Desktop/neo4j-aws-3.3.0/data/databases/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:126)
at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.proc.Procedures@6a1b4854' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 9 more
Caused by: java.lang.VerifyError: Cannot inherit from final class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:141)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:118)
at org.neo4j.collection.PrefetchingRawIterator.peek(PrefetchingRawIterator.java:50)
at org.neo4j.collection.PrefetchingRawIterator.hasNext(PrefetchingRawIterator.java:36)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProcedures(ProcedureJarLoader.java:91)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProceduresFromDir(ProcedureJarLoader.java:82)
at org.neo4j.kernel.impl.proc.Procedures.start(Procedures.java:275)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 11 more
2017-11-08 21:11:12.640+0000 INFO Neo4j Server shutdown initiated by request

Document annotate(): add complete list of parameters

Index out of bounds exception during annotation leading to failed commits

Config:
Neo4J Desktop 1.1.1
Neo4J 3.3.5 Enterprise
plugins:
apoc-3.3.0.3
graphaware-nlp-3.3.3.52.7
graphaware-server-enterprise-all-3.3.3.52
nlp-stanfordnlp-3.3.3.52.7

I am running an annotation using Stanford CoreNLP (with ner, sentiment, relations, and dependency set to true) and I am getting a lot of errors, notably (see also the logs below):

java.lang.IllegalArgumentException: Got a close tag p which does not match any open tag
java.lang.IndexOutOfBoundsException

Any suggestions?

From the logs:

2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] Error during iterate.commit:
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 1396 times: org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] Error during iterate.execute:
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 12 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IllegalArgumentException: Got a close tag p which does not match any open tag
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 206 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.NullPointerException
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 4 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 1 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 7, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 11 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 4, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 1032 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 23 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 4 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 6, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 12, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 86 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 0
2018-05-19 16:22:17.721+0000 WARN [o.n.k.i.p.Procedures] 11 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 0
2018-05-19 16:26:39.826+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Starting check pointing...
2018-05-19 16:26:39.826+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Starting store flush...
2018-05-19 16:26:40.061+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 5098 to [C:\Users\Fotis.Zapantis\.Neo4jDesktop\neo4jDatabases\database-80a77733-2c74-4a9b-9a81-697a3d63f8ca\installation-3.3.5\data\databases\graph.db\neostore.counts.db.b], from [C:\Users\Fotis.Zapantis\.Neo4jDesktop\neo4jDatabases\database-80a77733-2c74-4a9b-9a81-697a3d63f8ca\installation-3.3.5\data\databases\graph.db\neostore.counts.db.a].
2018-05-19 16:26:40.092+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 5098 to [C:\Users\Fotis.Zapantis\.Neo4jDesktop\neo4jDatabases\database-80a77733-2c74-4a9b-9a81-697a3d63f8ca\installation-3.3.5\data\databases\graph.db\neostore.counts.db.b], from [C:\Users\Fotis.Zapantis\.Neo4jDesktop\neo4jDatabases\database-80a77733-2c74-4a9b-9a81-697a3d63f8ca\installation-3.3.5\data\databases\graph.db\neostore.counts.db.a].
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Store flush completed
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Starting appending check point entry into the tx log...
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Appending check point entry into the tx log completed
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [5098]:  Check pointing completed
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]:  Starting log pruning.
2018-05-19 16:26:45.266+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]:  Log pruning complete.
2018-05-19 17:45:39.866+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache after 17599 seconds: MATCH (a)-[r]->(b) WHERE id(a) IN $existingNodeIds AND id(b) IN $newNodeIds RETURN r;
2018-05-19 17:45:39.881+0000 INFO [o.n.c.i.EnterpriseCompatibilityFactory] Discarded stale query from the query cache after 17599 seconds: MATCH (a)-[r]->(b) WHERE id(a) IN $existingNodeIds AND id(b) IN $newNodeIds RETURN r;
2018-05-19 17:46:41.965+0000 INFO [o.n.k.i.p.Procedures] starting batching from `MATCH (n:Record) WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->() AND EXISTS (n.Text) AND length(n.Text) > 30 AND length(n.Text) < 2000 RETURN n` operation using iteration `CALL ga.nlp.annotate({text:n.Text, id:id(n), checkLanguage: false}) YIELD result MERGE(n)-[:HAS_ANNOTATED_TEXT]->(result)` in separate thread
2018-05-19 17:46:41.968+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : customStopWords2
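The per-exception counts in the `iterate.execute` warnings above can be added up to see which failure dominates and how the counts relate to the 1396 failed commits. A minimal sketch, assuming the summary lines keep the `N times: ... Caused by: <class>` shape shown in the log (the `tally` helper and the regular expression are illustrative and not part of APOC or GraphAware; the timestamp prefix is dropped from each sample line):

```python
import re
from collections import Counter

# Matches "<N> times: ... Caused by: <exception class>" summary lines.
LINE = re.compile(r"(\d+) times: .*Caused by: ([\w.]+)")

# The iterate.execute summary lines from the log above, without timestamps.
log_lines = """\
12 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IllegalArgumentException: Got a close tag p which does not match any open tag
206 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.NullPointerException
4 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 0
1 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 10, Size: 0
2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 7, Size: 0
11 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 4, Size: 0
1032 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
23 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 0
4 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 6, Size: 0
2 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 12, Size: 0
86 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 0
11 times: Failed to invoke procedure `ga.nlp.annotate`: Caused by: java.lang.IndexOutOfBoundsException: Index: 5, Size: 0
""".splitlines()


def tally(lines):
    """Sum the 'N times' counts per exception class."""
    totals = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


totals = tally(log_lines)
for exception, count in totals.most_common():
    print(count, exception)
print("total", sum(totals.values()))
```

For this log the counts add up to 1396, the same number as the rolled-back commits, and java.lang.IndexOutOfBoundsException accounts for 1382 of them.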

AND

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:653)
	at java.util.ArrayList.get(ArrayList.java:429)
	at com.graphaware.nlp.processor.stanford.StanfordTextProcessor.extractRelationship(StanfordTextProcessor.java:319)
	at com.graphaware.nlp.processor.stanford.StanfordTextProcessor.lambda$annotateText$0(StanfordTextProcessor.java:128)
	at java.util.ArrayList.forEach(ArrayList.java:1249)
	at com.graphaware.nlp.processor.stanford.StanfordTextProcessor.annotateText(StanfordTextProcessor.java:107)
	at com.graphaware.nlp.NLPManager.annotateTextAndPersist(NLPManager.java:143)
	at com.graphaware.nlp.NLPManager.annotateTextAndPersist(NLPManager.java:131)
	at com.graphaware.nlp.dsl.procedure.AnnotateProcedure.annotate(AnnotateProcedure.java:39)
	at sun.reflect.GeneratedMethodAccessor119.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.neo4j.kernel.impl.proc.ReflectiveProcedureCompiler$ReflectiveProcedure.apply(ReflectiveProcedureCompiler.java:596)
	at org.neo4j.kernel.impl.proc.ProcedureRegistry.callProcedure(ProcedureRegistry.java:202)
	at org.neo4j.kernel.impl.proc.Procedures.callProcedure(Procedures.java:257)
	at org.neo4j.kernel.impl.api.OperationsFacade.callProcedure(OperationsFacade.java:1440)
	at org.neo4j.kernel.impl.api.OperationsFacade.procedureCallWrite(OperationsFacade.java:1394)
	at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext$$anonfun$20.apply(TransactionBoundQueryContext.scala:706)
	at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext$$anonfun$20.apply(TransactionBoundQueryContext.scala:706)
	at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext.callProcedure(TransactionBoundQueryContext.scala:729)
	at org.neo4j.cypher.internal.spi.v3_3.TransactionBoundQueryContext.callReadWriteProcedure(TransactionBoundQueryContext.scala:707)
	at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext$$anonfun$callReadWriteProcedure$1.apply(ExceptionTranslatingQueryContext.scala:153)
	at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext$$anonfun$callReadWriteProcedure$1.apply(ExceptionTranslatingQueryContext.scala:153)
	at org.neo4j.cypher.internal.spi.v3_3.ExceptionTranslationSupport$class.translateException(ExceptionTranslationSupport.scala:32)
	at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.translateException(ExceptionTranslatingQueryContext.scala:39)
	at org.neo4j.cypher.internal.spi.v3_3.ExceptionTranslationSupport$class.translateIterator(ExceptionTranslationSupport.scala:46)
	at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.translateIterator(ExceptionTranslatingQueryContext.scala:39)
	at org.neo4j.cypher.internal.compatibility.v3_3.ExceptionTranslatingQueryContext.callReadWriteProcedure(ExceptionTranslatingQueryContext.scala:153)
	at org.neo4j.cypher.internal.spi.v3_3.DelegatingQueryContext.callReadWriteProcedure(DelegatingQueryContext.scala:217)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.EagerReadWriteCallMode.callProcedure(ProcedureCallMode.scala:55)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.ProcedureCallPipe$$anon$$$$ad93371f26992c1f243949f3f9c28e5$$$$ateResultsByAppending$1.apply(ProcedureCallPipe.scala:68)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.ProcedureCallPipe$$anon$$$$ad93371f26992c1f243949f3f9c28e5$$$$ateResultsByAppending$1.apply(ProcedureCallPipe.scala:66)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.EmptyResultPipe.internalCreateResults(EmptyResultPipe.scala:28)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.PipeWithSource.createResults(Pipe.scala:62)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.pipes.PipeWithSource.createResults(Pipe.scala:59)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.createResults(DefaultExecutionResultBuilderFactory.scala:102)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.executionplan.DefaultExecutionResultBuilderFactory$ExecutionWorkflowBuilder.build(DefaultExecutionResultBuilderFactory.scala:74)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:100)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$$anonfun$getExecutionPlanFunction$1.apply(BuildInterpretedExecutionPlan.scala:83)
	at org.neo4j.cypher.internal.compatibility.v3_3.runtime.BuildInterpretedExecutionPlan$InterpretedExecutionPlan.run(BuildInterpretedExecutionPlan.scala:116)
	at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:183)
	at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper$$anonfun$run$1.apply(Compatibility.scala:179)
	at org.neo4j.cypher.internal.compatibility.v3_3.exceptionHandler$runSafely$.apply(exceptionHandler.scala:90)
	at org.neo4j.cypher.internal.compatibility.v3_3.Compatibility$ExecutionPlanWrapper.run(Compatibility.scala:179)
	at org.neo4j.cypher.internal.PreparedPlanExecution.execute(PreparedPlanExecution.scala:29)
	at org.neo4j.cypher.internal.ExecutionEngine.execute(ExecutionEngine.scala:120)
	at org.neo4j.cypher.internal.javacompat.ExecutionEngine.executeQuery(ExecutionEngine.java:62)
	at org.neo4j.kernel.impl.proc.ProcedureGDBFacadeSPI.executeQuery(ProcedureGDBFacadeSPI.java:143)
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.execute(GraphDatabaseFacade.java:451)
	at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.execute(GraphDatabaseFacade.java:434)
	at apoc.periodic.Periodic.lambda$iterate$8(Periodic.java:266)
	at apoc.periodic.Periodic.retry(Periodic.java:272)
	at apoc.periodic.Periodic.lambda$iterateAndExecuteBatchedInSeparateThread$12(Periodic.java:339)
	at apoc.util.Util.lambda$inTxFuture$4(Util.java:172)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)

Cosine documentation

Received via email a while back :

Hi Chris,

I just spent some time on NLP cosine similarity, and while doing so I found inconsistencies in https://github.com/graphaware/neo4j-nlp/blob/master/README.md.

So if I'm right maybe you want to change them.

First, CALL ga.nlp.ml.cosine.compute({}) YIELD result should be ga.nlp.ml.similarity.cosine. Also, guided by the errors, I realized that the call should be something like:

MATCH (a:AnnotatedText) WITH collect(a) AS list
CALL ga.nlp.ml.similarity.cosine({input:list})
YIELD result RETURN result

This works for me.

When I list the registered procedures, I can see

"ga.nlp.ml.similarity.cosine(similarityRequest :: MAP?) :: (result :: ANY?)"

but I couldn't find an example of a similarityRequest.

Error on Starting Neo4j

Hello,
I have version 3.3.0-alpha07 Community Edition, and I copied

graphaware-nlp-3.3.0.51.1.jar
nlp-opennlp-3.3.0.51.1.jar
nlp-stanfordnlp-3.3.0.51.1.jar

these files to the plugins folder and added the lines below to "AppData\Roaming\Neo4j Community Edition\neo4j.conf":

dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
com.graphaware.runtime.enabled=true
com.graphaware.module.NLP.2=com.graphaware.nlp.module.NLPBootstrapper

When I start Neo4j, I get the error message below:

Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7f31cc1c' was successfully initialized, but failed to start. Please see the attached cause exception "class org.apache.commons.lang3.time.FastDateParser$5 cannot access its superclass org.apache.commons.lang3.time.FastDateParser$NumberStrategy".

Promote UUID usage, strongly

Since node ids are reused, when running tests that delete only the original nodes (and not their annotated text), a newly created node is often assigned a reused id.

Calling the annotate procedure on that node then has no effect, because an AnnotatedText node already exists for the given id.
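The effect can be reproduced with a toy model. The sketch below is not Neo4j or GraphAware code: `ToyStore` is a hypothetical store that recycles freed ids, and `annotate` is a stand-in that skips work when an annotation already exists for its key, as described above. It contrasts keying annotations by the internal id with keying them by a UUID stored on the node.

```python
import uuid


class ToyStore:
    """Toy model of a store that recycles freed internal ids (not Neo4j's real allocator)."""

    def __init__(self):
        self.free_ids = []
        self.next_id = 0
        self.nodes = {}

    def create(self, text):
        # Prefer a recycled id; otherwise hand out a fresh one.
        if self.free_ids:
            node_id = self.free_ids.pop()
        else:
            node_id = self.next_id
            self.next_id += 1
        self.nodes[node_id] = {"text": text, "uuid": str(uuid.uuid4())}
        return node_id

    def delete(self, node_id):
        del self.nodes[node_id]
        self.free_ids.append(node_id)


def annotate(annotated, key, text):
    """Stand-in for annotation: does nothing if the key is already annotated."""
    if key in annotated:
        return False
    annotated[key] = text.split()
    return True


store = ToyStore()
by_id, by_uuid = {}, {}

first = store.create("old text")
annotate(by_id, first, store.nodes[first]["text"])
annotate(by_uuid, store.nodes[first]["uuid"], store.nodes[first]["text"])

store.delete(first)                 # the annotation is left behind
second = store.create("new text")   # receives the recycled id

id_reused = second == first
annotated_by_id = annotate(by_id, second, store.nodes[second]["text"])
annotated_by_uuid = annotate(by_uuid, store.nodes[second]["uuid"], store.nodes[second]["text"])
print(id_reused, annotated_by_id, annotated_by_uuid)
```

In this model the new node gets the old id, so the id-keyed call is skipped while the UUID-keyed call annotates the new text.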

Providing AnnotateText level handy user defined functions

Writing queries to navigate the NLP structure is sometimes painful because they get long.

We could envisage functions that, given an annotated text node, make it easy to ask for, e.g.:

getTags
getSentences
getSentencesInOrder
getTopics
getRelatedTags (above specific score)

etc ...

TextRank - original Tag ids should be an array of Tag objects

ga.nlp.ml.similarity.cosine

Hi,
I think there might be a bug, but I am not certain.
Error description:
After calling ga.nlp.ml.similarity.cosine, no relationships are created. It only returns "6".

Installed packages on Neo4j 3.3.1:
graphaware-nlp-3.3.1.51.2
nlp-opennlp-3.3.1.51.2
nlp-stanfordnlp-3.3.1.51.2
graphaware-server-community-all-3.3.1.51

What I do:
Step one: Create three News nodes
CREATE (n:News) SET n.text = ".";

Step two: annotate the text
MATCH (n:News) CALL ga.nlp.annotate({text: n.text, id: id(n)}) YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result) RETURN result

Step Three: run ga.nlp.ml.similarity.cosine
MATCH (a:AnnotatedText) with collect(a) as list CALL ga.nlp.ml.similarity.cosine(list, 0, null, "SIMULARITY") YIELD result return result
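One thing that may be worth ruling out in the report above is the test data: the text of each News node is just ".", which leaves little to compare. The following is a minimal Python sketch of cosine similarity over token counts, not GraphAware's implementation (the `tag_vector` and `cosine` helpers are hypothetical, and the plugin's actual weighting is not shown in this thread). It only illustrates that the measure is undefined when a text yields no tokens.

```python
import math
from collections import Counter


def tag_vector(text):
    """Crude stand-in for annotation: counts of lower-cased alphabetic tokens."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; None when either is empty."""
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return None  # no direction to compare, so no meaningful score
    dot = sum(a[k] * b[k] for k in a if k in b)
    return dot / (norm_a * norm_b)


print(cosine(tag_vector("."), tag_vector(".")))
print(cosine(tag_vector("graph databases"), tag_vector("graph algorithms")))
```

Whether the plugin skips such pairs or handles them differently is an assumption to verify against its source; repeating the test with real sentences would separate a data problem from a bug.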

Master doesn't test clean due to failure of loaded module

Full dump of a certain test run that fails:

https://gist.github.com/moxious/326ec7cba29bdc76975120998cbc1f33

As far as I can tell on tracing root causes, it's this:

org.neo4j.graphdb.QueryExecutionException: Failed to invoke procedure `ga.nlp.ml.textRank`: Caused by: java.lang.RuntimeException: java.lang.NullPointerException

TextRankProcedureTest extends NLPIntegrationTest which in turn uses this setup:

        GraphAwareRuntime runtime = GraphAwareRuntimeFactory.createRuntime(getDatabase());
        runtime.registerModule(new NLPModule("NLP", NLPConfiguration.defaultConfiguration(), getDatabase()));
        runtime.start();
        runtime.waitUntilStarted();
        keyValueStore = new GraphKeyValueStore(getDatabase());

I'm guessing this is the mechanism used to start a neo4j instance with the module jacked in, but can't see further than this what the issue may be. I traced it to here: https://graphaware.com/site/framework/latest/apidocs/com/graphaware/runtime/GraphAwareRuntime.html#registerModule-com.graphaware.runtime.module.RuntimeModule- but beyond that I'm uncertain.

Cannot inherit from final class

After this issue #79
I get another error trying to use GraphAware Natural Language Processing with the following environment :
OS : Windows 10 Family 64bits
Neo4J : neo4j-community-3.3.5

And the only plugins installed are:
nlp-stanford-nlp-3.3.2.52.7.jar
graphaware-nlp-3.3.3.52.7.jar
graphaware-server-community-all-3.3.3.52.jar

I'm getting this error after running bin\neo4j console:

C:\Program Files\neo4j-community-3.3.5>bin\neo4j console
2018-05-28 12:38:46.981+0000 INFO ======== Neo4j 3.3.5 ========
2018-05-28 12:38:47.144+0000 INFO Starting...
2018-05-28 12:38:51.561+0000 INFO Bolt enabled on 127.0.0.1:7687.
2018-05-28 12:38:56.313+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2018-05-28 12:38:56.433+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 1, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2018-05-28 12:38:56.780+0000 INFO Registering module NLP with GraphAware Runtime.
2018-05-28 12:38:56.792+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2018-05-28 12:40:24.445+0000 INFO Shutting down GraphAware Runtime...
2018-05-28 12:40:24.449+0000 ERROR [c.g.r.b.RuntimeKernelExtension] Could not start GraphAware Runtime because the database didn't get to a usable state within 5 minutes.
2018-05-28 12:40:24.449+0000 INFO Shutting down module NLP
2018-05-28 12:40:24.456+0000 INFO Terminating task scheduler...
2018-05-28 12:40:24.463+0000 INFO Task scheduler terminated successfully.
2018-05-28 12:40:24.469+0000 INFO GraphAware Runtime shut down.
2018-05-28 12:40:24.496+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@3123fb56' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@3123fb56' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@3123fb56' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@3123fb56' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, C:\Program Files\neo4j-community-3.3.5\data\databases\graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:126)
at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.proc.Procedures@2175efe4' was successfully initialized, but failed to start. Please see the attached cause exception "Cannot inherit from final class".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 9 more
Caused by: java.lang.VerifyError: Cannot inherit from final class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.access$100(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:141)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader$1.fetchNextOrNull(ProcedureJarLoader.java:118)
at org.neo4j.collection.PrefetchingRawIterator.peek(PrefetchingRawIterator.java:50)
at org.neo4j.collection.PrefetchingRawIterator.hasNext(PrefetchingRawIterator.java:36)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProcedures(ProcedureJarLoader.java:91)
at org.neo4j.kernel.impl.proc.ProcedureJarLoader.loadProceduresFromDir(ProcedureJarLoader.java:82)
at org.neo4j.kernel.impl.proc.Procedures.start(Procedures.java:276)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 11 more
2018-05-28 12:40:24.516+0000 INFO Neo4j Server shutdown initiated by request

I do not get this error with open-nlp (but then I get #79 instead).

No procedure registered for this database

Are any separate plugins required for cosine and the other procedures?

I have installed below plugins:
graphaware-nlp-3.1.3.1.jar
graphaware-server-community-all-3.2.5.51.jar
nlp-opennlp-3.1.3.1.jar

version-3.2.7

I am able to run the annotation procedures, but not the procedures below:

ga.nlp.language:
There is no procedure with the name ga.nlp.language registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.
ga.nlp.ml.cosine.compute:
There is no procedure with the name ga.nlp.ml.cosine.compute registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

Please let me know regarding the same.

pipelines deleted manually are not removed from module pipelines

Deleting the db doesn't make the modules aware of it.

Idea: add a transactionEventHandler that detects the deletion of Pipeline nodes, informs the *Processor classes, and removes the deleted pipelines from the available pipelines.

Installation guide is misleading

The installation guide says to download neo4j-nlp and neo4j-nlp-stanfordnlp (or neo4j-nlp-opennlp) from https://products.graphaware.com/, but they are not available there, which is confusing.

Also, neo4j-framework probably refers to framework-server-community, but I'm not sure.

Can the installation instructions be updated to explain how to install the GraphAware plugin?

Regards,
Constantin

Exception in thread "GraphAware Starter" java.lang.NoSuchFieldError: pipelineInfos

When trying to use GraphAware Natural Language Processing with the following environment :
OS : Windows 10 Family 64bits
Neo4J : neo4j-community-3.3.5

And the plugins :

nlp-opennlp-3.3.2.52.6.jar
graphaware-nlp-3.3.3.52.7.jar
graphaware-server-community-all-3.3.3.52.jar
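The jars listed above do not all carry the same version prefix (3.3.2 versus 3.3.3), which is one thing worth checking before digging into the stack trace. A minimal sketch of such a check, assuming jar names end in a dotted numeric version and that the first three components are meant to agree (the helper names are made up for illustration and are not part of any GraphAware tooling):

```python
import re

# Splits "name-1.2.3.4.jar" into a name and a dotted numeric version.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)+)\.jar$")


def parse_jar(filename):
    """Return (name, version tuple) for a jar file name."""
    match = JAR_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised jar name: {filename}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


def mismatched_jars(filenames, prefix_len=3):
    """Return jars whose leading version components differ from the most common prefix."""
    parsed = [(name, parse_jar(name)[1][:prefix_len]) for name in filenames]
    counts = {}
    for _, prefix in parsed:
        counts[prefix] = counts.get(prefix, 0) + 1
    expected = max(counts, key=counts.get)
    return [name for name, prefix in parsed if prefix != expected]


plugins = [
    "nlp-opennlp-3.3.2.52.6.jar",
    "graphaware-nlp-3.3.3.52.7.jar",
    "graphaware-server-community-all-3.3.3.52.jar",
]
print(mismatched_jars(plugins))
```

For the list above the check singles out nlp-opennlp-3.3.2.52.6.jar. Whether that mismatch explains the error in this report is not confirmed here.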

I'm getting this error after running bin\neo4j console:

C:\Program Files\neo4j-community-3.3.5>bin\neo4j console
2018-05-18 13:18:34.316+0000 INFO ======== Neo4j 3.3.5 ========
2018-05-18 13:18:34.476+0000 INFO Starting...
2018-05-18 13:18:40.353+0000 INFO Bolt enabled on 127.0.0.1:7687.
2018-05-18 13:18:46.218+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2018-05-18 13:18:46.349+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 1, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2018-05-18 13:18:46.706+0000 INFO Registering module NLP with GraphAware Runtime.
2018-05-18 13:18:46.715+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2018-05-18 13:20:21.594+0000 INFO Starting GraphAware...
2018-05-18 13:20:21.602+0000 INFO Loading module metadata...
2018-05-18 13:20:21.607+0000 INFO Loading metadata for module NLP
2018-05-18 13:20:21.860+0000 INFO Module NLP seems to have been registered for the first time.
2018-05-18 13:20:21.863+0000 INFO Module NLP seems to have been registered for the first time, will try to initialize...
2018-05-18 13:20:21.866+0000 INFO InitializeUntil set to 9223372036854775807 and it is 1526649621866. Will initialize.
2018-05-18 13:20:21.869+0000 INFO Initializing Community NLP Module
[GraphAware Starter] INFO org.reflections.Reflections - Reflections took 1271 ms to scan 3 urls, producing 238 keys and 519 values
2018-05-18 13:20:23.222+0000 INFO Loading text processor: com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor with class: com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor
2018-05-18 13:20:43.329+0000 INFO Started.
2018-05-18 13:20:44.087+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] started
2018-05-18 13:20:44.090+0000 INFO Mounted unmanaged extension [com.graphaware.server] at [/graphaware]
2018-05-18 13:20:45.167+0000 INFO Google Analytics enabled
Exception in thread "GraphAware Starter" java.lang.NoSuchFieldError: pipelineInfos
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.createFullPipeline(OpenNLPTextProcessor.java:80)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.init(OpenNLPTextProcessor.java:51)
at com.graphaware.nlp.processor.TextProcessorsManager.lambda$initiateTextProcessors$1(TextProcessorsManager.java:61)
at java.util.HashMap$Values.forEach(Unknown Source)
at com.graphaware.nlp.processor.TextProcessorsManager.initiateTextProcessors(TextProcessorsManager.java:60)
at com.graphaware.nlp.processor.TextProcessorsManager.<init>(TextProcessorsManager.java:37)
at com.graphaware.nlp.NLPManager.init(NLPManager.java:105)
at com.graphaware.nlp.module.NLPModule.initialize(NLPModule.java:62)
at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.initialize(ProductionTxDrivenModuleManager.java:57)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.initializeIfAllowed(BaseTxDrivenModuleManager.java:128)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:72)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:39)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:143)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:125)
at com.graphaware.runtime.TxDrivenRuntime.loadMetadata(TxDrivenRuntime.java:130)
at com.graphaware.runtime.ProductionRuntime.loadMetadata(ProductionRuntime.java:80)
at com.graphaware.runtime.BaseGraphAwareRuntime.startModules(BaseGraphAwareRuntime.java:154)
at com.graphaware.runtime.TxDrivenRuntime.startModules(TxDrivenRuntime.java:146)
at com.graphaware.runtime.ProductionRuntime.startModules(ProductionRuntime.java:70)
at com.graphaware.runtime.BaseGraphAwareRuntime.start(BaseGraphAwareRuntime.java:134)
at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.lambda$start$9(RuntimeKernelExtension.java:117)
at java.lang.Thread.run(Unknown Source)

I've tried another version of nlp-opennlp but I still get exactly the same error.

TextRank improvements

### TF-IDF

On a large corpus, computing the tf-idf value of Tag nodes can be expensive, considerably slowing down the TextRank process.

Evaluate the necessity of tf-idf.
If removing tf-idf shows some regression in the quality of the extracted keywords, then solve it as follows: if the Tag has a degree higher than 750 for the HAS_TAG relationship, use the degree; otherwise, traverse up to the AnnotatedText to get the distinct count.
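
The fallback described above could be sketched roughly as follows. This is only an illustration, not the plugin's code: the function name and the `exact_count` callback are hypothetical, and only the threshold of 750 comes from the proposal.

```python
from typing import Callable

# Threshold taken from the proposal above; everything else is illustrative.
DEGREE_THRESHOLD = 750


def document_frequency(has_tag_degree: int, exact_count: Callable[[], int]) -> int:
    """Return the document frequency to use in the idf term for one Tag.

    has_tag_degree: number of HAS_TAG relationships on the Tag (cheap to read).
    exact_count: hypothetical callback that traverses up to AnnotatedText and
        returns the distinct count (expensive, so only called for low degrees).
    """
    if has_tag_degree > DEGREE_THRESHOLD:
        # High-degree Tag: accept the degree itself as an approximation.
        return has_tag_degree
    # Low-degree Tag: the traversal is affordable, so use the exact count.
    return exact_count()
```

The expensive traversal is passed in as a callback so that it is never executed for high-degree Tags, which is where the cost is concentrated.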

### Default blacklist extension

Improve the blacklist in order to avoid some useless keywords that are common to all corpora:

MATCH (n:Keyword) WHERE size(n.keywordsList) = 1
WITH n WHERE true
MATCH (t:Tag {id: n.keywordsList[0] + "en"})
USING INDEX t:Tag(id)
WHERE ANY(x IN t.pos WHERE x STARTS WITH "VB" OR x IN ["JJ","JJS", "JJR","RB","RBR","RBS","RP","WR","IN","WP"])
DETACH DELETE n

This is a post-processing query, but the rule needs to be implemented in the code. It is only valid when a potential keyword is a single Tag.
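
As a rough sketch of what the in-code version of that rule might look like (the function name and the dictionary-based inputs are assumptions made for illustration; the part-of-speech list is copied from the query above):

```python
# POS tags copied from the post-processing query; a candidate is rejected when
# any POS value of its Tag starts with "VB" or is one of these tags.
BLACKLISTED_POS = {"JJ", "JJS", "JJR", "RB", "RBR", "RBS", "RP", "WR", "IN", "WP"}


def is_blacklisted_keyword(keyword_tags, pos_by_tag):
    """Decide whether a candidate keyword should be dropped.

    keyword_tags: list of Tag values that make up the candidate keyword.
    pos_by_tag: dict mapping a Tag value to its list of POS values
        (a hypothetical stand-in for reading the `pos` property of the Tag).
    """
    if len(keyword_tags) != 1:
        # The rule only applies when the keyword is a single Tag.
        return False
    pos_values = pos_by_tag.get(keyword_tags[0], [])
    return any(p.startswith("VB") or p in BLACKLISTED_POS for p in pos_values)
```

Applying the check while candidates are built would avoid creating Keyword nodes that the query then has to delete.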

Tag ids and values should be trimmed

{
  "language": "en",
  "id": "          Babcost_en",
  "pos": [],
  "value": "          Babcost",
  "lastTxId": "1510259447215",
  "ne": [
    "ORGANIZATION"
  ]
}
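
A minimal sketch of the requested trimming, assuming the id is built as `<value>_<language>` (inferred from the example above, not confirmed against the plugin's code); the function name is hypothetical:

```python
def normalize_tag(value: str, language: str) -> dict:
    """Trim surrounding whitespace before deriving the Tag id."""
    clean = value.strip()
    # Assumed id format "<value>_<language>", inferred from the example above.
    return {"value": clean, "id": f"{clean}_{language}", "language": language}
```

Trimming before the id is derived means that padded and unpadded occurrences of the same token resolve to the same Tag id.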

Failed to invoke procedure `ga.nlp.ml.similarity.cosine`: Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Missing key 'input'

Howdy!

I am using graphaware-nlp-3.32

Error:

Failed to invoke procedure ga.nlp.ml.similarity.cosine: Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Missing key 'input'

Code:
MATCH (at:AnnotatedText) CALL ga.nlp.ml.similarity.cosine({node:at}) YIELD result RETURN result

any and all help is much appreciated!

Move to normal way of procedures

The procedures should be rewritten in the standard Neo4j way, as we are doing for all the other GA modules.

I know @alenegro81 :)

ga.nlp.ml.textRank.summarize - stackoverflow

When running the command
MATCH (a:AnnotatedText) CALL ga.nlp.ml.textRank.summarize({annotatedText: a}) YIELD result RETURN result

it generates "Failed to invoke procedure ga.nlp.ml.textRank.summarize: Caused by: java.lang.StackOverflowError"

Nothing in Neo4j's log files.

Provide procedures for writing steps

Writing to the graph could be done by means other than the official plugins. For example, I might want to use a Python library to do NER and still be able to write the results back to the graph in the structure GA NLP defines.

Having procedures where you can pass for example Map objects would be useful and could be reused among plugins as well.

No GraphAware Runtime is registered with the given database

I downloaded the GraphAware NLP, OpenNLP and framework plugins and copied the JAR files to the plugins directory.

Following the installation steps, I added the following lines to the neo4j.conf file:
dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
com.graphaware.runtime.enabled=true
com.graphaware.module.NLP.2=com.graphaware.nlp.module.NLPBootstrapper

After adding these lines, localhost:7474 does not start.

When I comment these lines out, localhost starts and works properly, but the plugins are not loaded.

Version : enterprise 3.1.3
Link for reference of old issue in(git)

Error on localhost after commenting out those lines:
Failed to invoke procedure ga.nlp.annotate: Caused by: java.lang.RuntimeException:
java.lang.IllegalStateException: No GraphAware Runtime is registered with the given database

Error in log file:

2017-11-07 10:41:03.839+0000 INFO ======== Neo4j 3.1.3 ========
2017-11-07 10:41:04.120+0000 INFO Starting...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/neo4j/lib/slf4j-nop-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/neo4j/plugins/nlp-opennlp-3.1.3.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.helpers.NOPLoggerFactory]
2017-11-07 10:41:04.985+0000 INFO Bolt enabled on localhost:7687.
2017-11-07 10:41:05.010+0000 INFO Initiating metrics...
2017-11-07 10:41:07.374+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime enabled, bootstrapping...
2017-11-07 10:41:07.444+0000 INFO [c.g.r.b.RuntimeKernelExtension] Bootstrapping module with order 2, ID NLP, using com.graphaware.nlp.module.NLPBootstrapper
2017-11-07 10:41:07.523+0000 INFO Registering module NLP with GraphAware Runtime.
2017-11-07 10:41:07.523+0000 INFO [c.g.r.b.RuntimeKernelExtension] GraphAware Runtime bootstrapped, starting the Runtime...
2017-11-07 10:41:21.893+0000 INFO Starting GraphAware...
2017-11-07 10:41:21.894+0000 INFO Loading module metadata...
2017-11-07 10:41:21.894+0000 INFO Loading metadata for module NLP
2017-11-07 10:41:21.946+0000 INFO Module NLP seems to have been registered for the first time.
2017-11-07 10:41:21.947+0000 INFO Module NLP seems to have been registered for the first time, will try to initialize...
2017-11-07 10:41:21.947+0000 INFO InitializeUntil set to 9223372036854775807 and it is 1510051281947. Will initialize.
2017-11-07 10:41:24.709+0000 INFO Started.
2017-11-07 10:41:24.811+0000 INFO Mounted REST API at: /db/manage
2017-11-07 10:41:24.823+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] started
2017-11-07 10:41:24.825+0000 INFO Mounted unmanaged extension [com.graphaware.server] at [/graphaware]
Exception in thread "GraphAware Starter" java.lang.RuntimeException: Error while initializing model of class: class opennlp.tools.namefind.TokenNameFinderModel
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:503)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.lambda$loadNamedEntitiesFinders$2(OpenNLPPipeline.java:161)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadNamedEntitiesFinders(OpenNLPPipeline.java:158)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.init(OpenNLPPipeline.java:118)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.<init>(OpenNLPPipeline.java:108)
at com.graphaware.nlp.processor.opennlp.PipelineBuilder.build(PipelineBuilder.java:79)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.createPhrasePipeline(OpenNLPTextProcessor.java:106)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.init(OpenNLPTextProcessor.java:56)
at com.graphaware.nlp.processor.TextProcessorsManager.lambda$initiateTextProcessors$0(TextProcessorsManager.java:61)
at java.util.HashMap$Values.forEach(HashMap.java:980)
at com.graphaware.nlp.processor.TextProcessorsManager.initiateTextProcessors(TextProcessorsManager.java:60)
at com.graphaware.nlp.processor.TextProcessorsManager.<init>(TextProcessorsManager.java:37)
at com.graphaware.nlp.NLPManager.init(NLPManager.java:95)
at com.graphaware.nlp.module.NLPModule.initialize(NLPModule.java:52)
at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.initialize(ProductionTxDrivenModuleManager.java:57)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.initializeIfAllowed(BaseTxDrivenModuleManager.java:128)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:72)
at com.graphaware.runtime.manager.BaseTxDrivenModuleManager.handleNoMetadata(BaseTxDrivenModuleManager.java:39)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:143)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:125)
at com.graphaware.runtime.TxDrivenRuntime.loadMetadata(TxDrivenRuntime.java:130)
at com.graphaware.runtime.ProductionRuntime.loadMetadata(ProductionRuntime.java:80)
at com.graphaware.runtime.BaseGraphAwareRuntime.startModules(BaseGraphAwareRuntime.java:154)
at com.graphaware.runtime.TxDrivenRuntime.startModules(TxDrivenRuntime.java:146)
at com.graphaware.runtime.ProductionRuntime.startModules(ProductionRuntime.java:70)
at com.graphaware.runtime.BaseGraphAwareRuntime.start(BaseGraphAwareRuntime.java:134)
at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.lambda$start$8(RuntimeKernelExtension.java:117)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:499)
... 29 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at opennlp.tools.ml.model.AbstractModelReader.getParameters(AbstractModelReader.java:140)
at opennlp.tools.ml.maxent.io.GISModelReader.constructModel(GISModelReader.java:78)
at opennlp.tools.ml.model.GenericModelReader.constructModel(GenericModelReader.java:62)
at opennlp.tools.ml.model.AbstractModelReader.getModel(AbstractModelReader.java:85)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:32)
at opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:29)
at opennlp.tools.util.model.BaseModel.finishLoadingArtifacts(BaseModel.java:309)
at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:239)
at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:173)
at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:103)
at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadModel(OpenNLPPipeline.java:499)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.lambda$loadNamedEntitiesFinders$2(OpenNLPPipeline.java:161)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline$$Lambda$239/1188677545.accept(Unknown Source)
at java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1691)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.loadNamedEntitiesFinders(OpenNLPPipeline.java:158)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.init(OpenNLPPipeline.java:118)
at com.graphaware.nlp.processor.opennlp.OpenNLPPipeline.<init>(OpenNLPPipeline.java:108)
at com.graphaware.nlp.processor.opennlp.PipelineBuilder.build(PipelineBuilder.java:79)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.createPhrasePipeline(OpenNLPTextProcessor.java:106)
at com.graphaware.nlp.processor.opennlp.OpenNLPTextProcessor.init(OpenNLPTextProcessor.java:56)
at com.graphaware.nlp.processor.TextProcessorsManager.lambda$initiateTextProcessors$0(TextProcessorsManager.java:61)
at com.graphaware.nlp.processor.TextProcessorsManager$$Lambda$234/2094381213.accept(Unknown Source)
at java.util.HashMap$Values.forEach(HashMap.java:980)
at com.graphaware.nlp.processor.TextProcessorsManager.initiateTextProcessors(TextProcessorsManager.java:60)
at com.graphaware.nlp.processor.TextProcessorsManager.<init>(TextProcessorsManager.java:37)
at com.graphaware.nlp.NLPManager.init(NLPManager.java:95)
at com.graphaware.nlp.module.NLPModule.initialize(NLPModule.java:52)
at com.graphaware.runtime.manager.ProductionTxDrivenModuleManager.initialize(ProductionTxDrivenModuleManager.java:57)
Please help me out.

Stanford NLP gem constantly running out of memory

I am trying to annotate user complaints using the Stanford tokenizerAndDependency pipeline:

 MATCH (n:Complaint) WHERE NOT (n)-[]->()
 WITH n LIMIT 10
 CALL ga.nlp.annotate({text:n.content, id: id(n), pipeline:'tokenizerAndDependency'}) YIELD result
 MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)
 RETURN n, result;

But it is glacially slow and, despite a 10 GB heap, it usually runs out of heap space after processing one or two documents at most. Documents are typically no more than 400 words long (frequently less), but they are often quite colloquial, with less-than-perfect grammar, especially capitalisation. This happens with both the Stanford NLP tokenizer and tokenizerAndDependency pipelines. If I use the nlp-opennlp pipeline, it all works fine and pretty quickly.

I can't really share a typical document in a public forum, as we have restrictions on sharing information; however, I could send you one or two examples if needed.

It's so slow I'm sure I must be doing something very wrong.

I'm running Neo4j 3.3.2 on an AWS m5.xlarge (2 cores, 16 GB RAM) with the following configuration:

 com.graphaware.runtime.stats.disabled=true
 com.graphaware.server.stats.disabled=true
 dbms.unmanaged_extension_classes=com.graphaware.server=/graphaware
 com.graphaware.runtime.enabled=true
 com.graphaware.module.NLP.1=com.graphaware.nlp.module.NLPBootstrapper
 dbms.security.procedures.whitelist=ga.nlp.*
 dbms.memory.heap.initial_size=8G
 dbms.memory.heap.max_size=10G

The only gems in my plugins dir are:

graphaware-nlp-3.3.2.52.6.jar  
graphaware-server-enterprise-all-3.3.2.52.jar
nlp-stanfordnlp-3.3.2.52.6.jar

These are the indexes:

INDEX ON :Tag(value)
INDEX ON :AnnotatedText(id)
INDEX ON :Complaint(id)
INDEX ON :Sentence(id)
INDEX ON :Tag(id)
