dbpedia-spotlight / dbpedia-spotlight-model

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text. This repository follows the "Improving Efficiency and Accuracy in Multilingual Entity Extraction" approach.

Home Page: http://www.dbpedia-spotlight.org

License: Apache License 2.0

Java 42.12% Scala 56.00% Shell 0.43% PigLatin 0.21% Python 1.24%
dbpedia-spotlight annotations

dbpedia-spotlight-model's Introduction

IMPORTANT: No longer in active development. Please use https://github.com/dbpedia-spotlight/dbpedia-spotlight-model instead.


DBpedia Spotlight

Links

website - http://www.dbpedia-spotlight.org

status service - http://status.dbpedia-spotlight.org

download service - http://download.dbpedia-spotlight.org

demo service - http://demo.dbpedia-spotlight.org

CI - http://jenkins.dbpedia-spotlight.org

General Notes

Since v1.0, DBpedia Spotlight has been split into two versions under the same API, as follows:

We will keep this repository only for historical reference. Every issue opened here should be closed and reopened in its respective repository.

This move was our way of delivering faster fixes and new releases, with solutions tailored to each annotation approach.

Our first achievement relates to licensing. DBpedia Spotlight Model is now fully compliant with Apache 2.0, which means you can use it without any commercial restrictions.

We are so excited because there's even more great news to come.

If you require any further information, feel free to contact us via [email protected]. We are looking forward to spending time with you at future community meetings and to publishing new DBpedia releases.

Keep annotating,

All the best

Shedding Light on the Web of Documents

DBpedia Spotlight looks for ~3.5M things of unknown or ~320 known types in text and tries to link them to their global unique identifiers in DBpedia.

Demonstration

Go to our Demonstration page, copy+paste some text and play with the parameters to see how it works.

Call our web service

You can use our demonstration Web Service directly from your application.

curl http://model.dbpedia-spotlight.org/en/annotate  \
  --data-urlencode "text=President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing
  that the policy provides more generous assistance." \
  --data "confidence=0.35"

or for JSON:

curl http://model.dbpedia-spotlight.org/en/annotate  \
  --data-urlencode "text=President Obama called Wednesday on Congress to extend a tax break
  for students included in last year's economic stimulus package, arguing
  that the policy provides more generous assistance." \
  --data "confidence=0.35" \
  -H "Accept: application/json"
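
The same call can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_annotate_request is hypothetical, and the endpoint and parameters are copied from the curl examples above. It only builds the request, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl examples above.
ANNOTATE_URL = "http://model.dbpedia-spotlight.org/en/annotate"


def build_annotate_request(text, confidence=0.35, url=ANNOTATE_URL):
    """Build (but do not send) a POST request equivalent to the curl call.

    Hypothetical helper for illustration; not part of DBpedia Spotlight.
    """
    # Form-encode the parameters, as --data-urlencode and --data do.
    body = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Accept": "application/json"},
    )
```

To send it, call urllib.request.urlopen(build_annotate_request("some text")) and decode the response body as JSON. This assumes the demonstration service is reachable.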

Run your own server

If you need service reliability and lower response times, you can run DBpedia Spotlight on your own in-house server. Just download a model and Spotlight from here to get started.

wget http://downloads.dbpedia-spotlight.org/spotlight/dbpedia-spotlight-0.7.1.jar
wget http://downloads.dbpedia-spotlight.org/2016-04/en/model/en.tar.gz
tar xzf en.tar.gz
java -jar dbpedia-spotlight-0.7.1.jar en http://localhost:2222/rest

Models and data

Models and raw data for most languages are available here.

Citation

If you use DBpedia Spotlight in your research, please cite the following paper:

@inproceedings{isem2013daiber,
  title = {Improving Efficiency and Accuracy in Multilingual Entity Extraction},
  author = {Joachim Daiber and Max Jakob and Chris Hokamp and Pablo N. Mendes},
  year = {2013},
  booktitle = {Proceedings of the 9th International Conference on Semantic Systems (I-Semantics)}
}

Licenses

All the original code produced for DBpedia Spotlight is licensed under Apache License, 2.0. Some modules have dependencies on LingPipe under the Royalty Free License. Some of our original code (currently) depends on GPL-licensed or LGPL-licensed code and is therefore also GPL or LGPL, respectively. We are currently cleaning up the dependencies to release two builds, one purely GPL and one purely Apache License, 2.0.

The documentation on this website is shared under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

More information on citation and how to cite the deprecated Lucene version can be found here.

Documentation

More documentation is available from the DBpedia Spotlight wiki.

FAQ

Check the FAQ here

dbpedia-spotlight-model's People

Contributors

augusto-herrmann, julio-noe, kfitzgerald, m1ci, manonthegithub, ragnarok85, sandroacoelho, skunnyk

dbpedia-spotlight-model's Issues

Opensource Guide Compliance

Please check whether the project complies with the Open Source Guide.

Documentation

  • Project has a LICENSE file with an open source license
  • Project has basic documentation
    • README,
    • CONTRIBUTING,
    • CODE_OF_CONDUCT
  • The name is easy to remember, gives some idea of what the project does, and does not conflict with an existing project or infringe on trademarks
  • The issue queue is up-to-date, with issues clearly organized and labeled

Code

  • Project uses consistent code conventions and clear function/method/variable names
  • The code is clearly commented, documenting intentions and edge cases
  • There are no sensitive materials in the revision history, issues, or pull requests (for example, passwords or other non-public information)

People

If you’re an individual:

  • You've talked to the legal department and/or understand the IP and open source policies of your company (if you're an employee somewhere)

If you’re a company or organization:

  • You've talked to your legal department
  • You have a marketing plan for announcing and promoting the project
  • Someone is committed to managing community interactions (responding to issues, reviewing and merging pull requests)
  • At least two people have administrative access to the project

How to load the spotters

Hey,

In the source code there are a bunch of spotters available. I cannot figure out how to load them. Can someone give me a short tutorial on setting up different spotters? It would be great if you could also link the required models.

I wanted to try out the different spotters on the demo server http://model.dbpedia-spotlight.org/en/spot .

That's the code snippet I used:

        text = ticket["ticket_short_description_translation"] + \
               "\n\n" + ticket["ticket_description_translation"]
        payload = {"text": text, "spotterName": spotter}
        headers = {'Accept': 'application/json'}
        result = requests.get('http://model.dbpedia-spotlight.org/en/spot', params=payload, headers=headers)
        results.append(result.json())

With the following spotters:

    spotters = ["LingPipeSpotter", "AtLeastOneNounSelector", "CoOccurrenceBasedSelector",
                "NESpotter", "KeyphraseSpotter", "OpenNLPChunkerSpotter", "WikiMarkupSpotter",
                "SpotXmlParser", "AhoCorasickSpotter", "Default"]

Unfortunately, I get exactly the same results every time, so it doesn't look like the spotter changed. The API even accepted a "bliblablubspotter", which suggests that only the default spotter runs.

What am I getting wrong?
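
One way to confirm that the spotterName parameter has no effect is to group the spotter names by the exact response each one produced. The following is a minimal Python sketch; group_by_response is a hypothetical helper, and it makes no assumption about the shape of the /spot payload beyond it being decoded JSON.

```python
import json


def group_by_response(responses):
    """Group spotter names whose JSON responses are identical.

    `responses` maps a spotter name to the decoded JSON it produced.
    Hypothetical helper for illustration only.
    """
    groups = {}
    for spotter, payload in responses.items():
        # Serialise with sorted keys so equal payloads give equal strings.
        key = json.dumps(payload, sort_keys=True)
        groups.setdefault(key, []).append(spotter)
    return list(groups.values())
```

If every spotter name, including a made-up one, ends up in a single group, the service is returning the same output regardless of the parameter.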

How to identify the detected resource in the input text from json response

Hi, I am wondering how to relate detected resources with the input text. For example, for the input:
"The Battle of Gettysburg was fought July 1–3, 1863."
I used the following curl command to query a local instance of spotlight:

curl -X POST http://localhost:2222/rest/annotate -H 'accept: application/json' -H 'content-type: application/x-www-form-urlencoded' --data-urlencode "text=The Battle of Gettysburg was fought July 1–3, 1863." --data-urlencode "confidence=0.35"

Which returns the following:

{
   "@text":"The Battle of Gettysburg was fought July 1–3, 1863.",
   "@confidence":"0.35",
   "@support":"0",
   "@types":"",
   "@sparql":"",
   "@policy":"whitelist",
   "Resources":[
      {
         "@URI":"http://dbpedia.org/resource/Battle_of_Gettysburg",
         "@support":"2871",
         "@types":"Wikidata:Q1656682,DUL:Event,Schema:Event,DBpedia:SocietalEvent,DBpedia:MilitaryConflict,DBpedia:Event",
         "@surfaceForm":"Gettysburg",
         "@offset":"14",
         "@similarityScore":"0.9856069793346309",
         "@percentageOfSecondRank":"0.014354886850971043"
      }
   ]
}

From this response, how can I know that the URI http://dbpedia.org/resource/Battle_of_Gettysburg comes from the text "The Battle of Gettysburg"? I was using the surface form, but this field only indicates "Gettysburg". Also, the offset field indicates the start position of "Gettysburg". Could this be a bug?

I've tried to use the demo webpage (https://www.dbpedia-spotlight.org/demo/) to check whether only "Gettysburg" is highlighted, but it seems to be down at the moment.

Thanks in advance.
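
For reference, the span that a resource was matched against can be recovered from the "@offset" and "@surfaceForm" fields of the JSON response shown above. The following is a minimal Python sketch; resource_spans is a hypothetical helper, and it assumes that offsets count characters from the start of "@text". It shows what the service reports and does not explain why only "Gettysburg" was spotted.

```python
def resource_spans(response):
    """Yield (uri, start, end, matched_text) for each annotated resource.

    Hypothetical helper; field names are taken from the JSON response above.
    """
    text = response["@text"]
    for resource in response.get("Resources", []):
        # Offsets arrive as strings; assumed here to count characters.
        start = int(resource["@offset"])
        end = start + len(resource["@surfaceForm"])
        yield resource["@URI"], start, end, text[start:end]
```

Comparing the slice text[start:end] with "@surfaceForm" is a quick check that the offset convention assumed here matches the service.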

REST Interface is not filtering types

Our demo application provides some filters, such as types and SPARQL queries, that are not working.

Test cases:

  • When a user submits a text to be annotated and selects one or more types, the Spotlight engine must restrict the annotations to these types;

  • When a user submits a text to be annotated with a SPARQL query, the Spotlight engine must combine the result with the SPARQL query to filter annotations;
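
Until the server-side filter works, the first test case can be approximated on the client. The following is a minimal Python sketch, not the Spotlight engine's own filter: filter_by_types is a hypothetical helper, and it assumes the "@types" field is a comma-separated string as in the JSON responses quoted elsewhere on this page.

```python
def filter_by_types(resources, wanted_types):
    """Keep resources that carry at least one of `wanted_types`.

    Hypothetical client-side workaround; matching is case-insensitive.
    """
    wanted = {t.lower() for t in wanted_types}
    kept = []
    for resource in resources:
        # "@types" is a comma-separated string and may be empty.
        raw = resource.get("@types", "")
        types = {t.strip().lower() for t in raw.split(",") if t.strip()}
        if types & wanted:
            kept.append(resource)
    return kept
```

Resources with an empty "@types" value are always dropped by this sketch, which may or may not match the behaviour expected from the server.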

Expired certificate

Hi, the certificate expired last week and I can't use the API or the demo (since it uses the API).
Will it be available again?
Thanks

CPU Cores Number max=5?

I started the server with these parameters:

-Dthreads.max=10 -Dthreads.core=10

My CPU has 12 cores, but CPU usage only goes up to 500%. Where is the problem?

Website is down

Hello everyone,

Thank you for hosting the website; it has saved me a lot of time.
Unfortunately, I noticed today that the website is down. Could you please restart it?

Thank you in advance
regards,
hashpad

ERROR 404: Not Found.

Hi, I am trying to run my own server by following your instructions. Here is what I get:

wget http://downloads.dbpedia-spotlight.org/spotlight/dbpedia-spotlight-1.0.0.jar
--2020-02-25 04:52:22--  http://downloads.dbpedia-spotlight.org/spotlight/dbpedia-spotlight-1.0.0.jar
Resolving downloads.dbpedia-spotlight.org (downloads.dbpedia-spotlight.org)... 200.18.160.23
Connecting to downloads.dbpedia-spotlight.org (downloads.dbpedia-spotlight.org)|200.18.160.23|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://sourceforge.net/projects/dbpedia-spotlight/filesspotlight/dbpedia-spotlight-1.0.0.jar [following]
--2020-02-25 04:52:23--  https://sourceforge.net/projects/dbpedia-spotlight/filesspotlight/dbpedia-spotlight-1.0.0.jar
Resolving sourceforge.net (sourceforge.net)... 216.105.38.13
Connecting to sourceforge.net (sourceforge.net)|216.105.38.13|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-02-25 04:52:23 ERROR 404: Not Found.

Annotation fails with String index out of range: 4

Hi, I'm using Spotlight to annotate ~40k texts.

In around 3.5k instances, the annotation does not work as expected and Spotlight produces String index out of range: 4 instead of the annotation XML.

I can't find the reason why this happens. From what I can tell, the texts where Spotlight fails are of similar length and structure as those that work flawlessly.

I've tried removing all non-alphanumeric characters from sample texts that failed, but the error still persists.

This is the last shell output I'm getting on the REST server before the CURL command returns the error.

...]
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black(DBpedia:Colour)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]                                
492713 [Grizzly-2222(5)] INFO org.dbpedia.spotlight.filter.annotations.ConfidenceFilter - (c=0.45) filtered out by similarity score threshold (0.000<0.450): SurfaceForm[Black] -0.000-> DBpediaResource[Black_Canadians(Wikidata:Q41710,DBpedia:EthnicGroup)] - at position *7371* in - Text[... rres management told them that if they played the Black Angels Death Song again theyd be fired the V ...]

Does anybody have an idea why this could happen? I can provide a text file containing the texts in question for reference.

I'm using Java 1.8.0, the dbpedia-spotlight-1.0.0 jar file, and the latest en core data release.

Thanks for your help!

No Types Being Returned using monthly build of models

Running the server pointed at the latest models from http://downloads.dbpedia-spotlight.org/monthly_build/models/en/ results in no types being returned by the example text:

curl http://localhost:2222/rest/annotate   -H "Accept: text/xml"   --data-urlencode "text=Brazilian state-run giant oil company Petrobras signed a three-year technology and research cooperation agreement with oil service provider Halliburton."   --data "confidence=0"   --data "support=0"
<?xml version="1.0" encoding="utf-8"?>
<Annotation text="Brazilian state-run giant oil company Petrobras signed a three-year technology and research cooperation agreement with oil service provider Halliburton." confidence="0.0" support="0" types="" sparql="" policy="whitelist">
<Resources>
<Resource URI="http://dbpedia.org/resource/Brazil" support="95617" types="" surfaceForm="Brazilian" offset="0" similarityScore="0.9476330743637952" percentageOfSecondRank="0.0546964397251261"/>
<Resource URI="http://dbpedia.org/resource/Giant" support="1630" types="" surfaceForm="giant" offset="20" similarityScore="0.8491284931535805" percentageOfSecondRank="0.15984183965968377"/>
<Resource URI="http://dbpedia.org/resource/Petroleum" support="14548" types="" surfaceForm="oil company" offset="26" similarityScore="0.6478659224548402" percentageOfSecondRank="0.532568572617261"/>
<Resource URI="http://dbpedia.org/resource/Petrobras" support="746" types="" surfaceForm="Petrobras" offset="38" similarityScore="1.0" percentageOfSecondRank="0.0"/>
<Resource URI="http://dbpedia.org/resource/Sign_language" support="1848" types="" surfaceForm="signed" offset="48" similarityScore="0.7572051510363451" percentageOfSecondRank="0.15818448512547723"/>
<Resource URI="http://dbpedia.org/resource/Technology" support="12406" types="" surfaceForm="technology" offset="68" similarityScore="0.9935341438769555" percentageOfSecondRank="0.006385575253761328"/>
<Resource URI="http://dbpedia.org/resource/Cooperation" support="668" types="" surfaceForm="cooperation" offset="92" similarityScore="0.7265890021764683" percentageOfSecondRank="0.2855282935938623"/>
<Resource URI="http://dbpedia.org/resource/Petroleum" support="14548" types="" surfaceForm="oil" offset="119" similarityScore="0.9481608096582461" percentageOfSecondRank="0.05447434846074574"/>
<Resource URI="http://dbpedia.org/resource/Service_provider" support="318" types="" surfaceForm="service provider" offset="123" similarityScore="0.9397239980786181" percentageOfSecondRank="0.055727768829113095"/>
<Resource URI="http://dbpedia.org/resource/Halliburton" support="702" types="" surfaceForm="Halliburton" offset="140" similarityScore="0.9999999998930207" percentageOfSecondRank="0.0"/>
</Resources>
</Annotation>

The 2016-04 models do return types successfully: http://downloads.dbpedia-spotlight.org/2016-04/
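
To compare model builds, it can help to list which resources come back without types. The following is a minimal Python sketch using the standard library; untyped_resources is a hypothetical helper, and the element and attribute names are taken from the XML response above.

```python
import xml.etree.ElementTree as ET


def untyped_resources(annotation_xml):
    """Return the URIs of <Resource> elements whose types attribute is empty.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(annotation_xml)
    return [
        resource.get("URI")
        for resource in root.iter("Resource")
        # A missing attribute and an empty string are treated the same way.
        if not resource.get("types")
    ]
```

Running it against the output of both model builds for the same text shows whether the types are missing everywhere or only for some resources.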

Spotlight Performances / Possible tuning ?

Hi,
I'm working on an "in-house" dbpedia-spotlight setup.
I use dbpedia-spotlight-1.0.0.jar, the en.tar.gz model from http://downloads.dbpedia-spotlight.org/2016-04/en/model/en.tar.gz, and start with :

java -Xms10G -Xmx10G -jar dbpedia-spotlight-1.0.0.jar en_2+2/ http://x.x.x.x:2222/en/

  • Server bare metal
  • 16 core CPUs ( CPU E5-2620 )
  • 16GB ram

From my first benchmarks with random English articles and confidence=0.35, I can't go beyond 30 req/s. With confidence=0.5, I get around 60 req/s. The number of parallel clients does not seem to matter: I'm still stuck at 30 req/s.
I also tried other hardware (AWS etc.) and got the same result.
Are there any known limitations?
Spotlight seems to use at most 8 cores. Is any tuning possible? I can't find any documentation on this.

Thank you :)
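
A benchmark of this kind can be sketched in Python as follows. This is not the benchmark used above: measure_throughput is a hypothetical helper, and `send` stands for whatever function posts one text to the server. Varying `clients` while keeping the payloads fixed shows whether throughput scales with the number of parallel clients.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(send, payloads, clients):
    """Call `send(payload)` for every payload using `clients` threads.

    Returns the number of completed requests per second.
    Hypothetical helper; `send` is supplied by the caller.
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        # list() forces all calls to finish before the clock is stopped.
        completed = len(list(pool.map(send, payloads)))
    elapsed = time.perf_counter() - started
    return completed / elapsed if elapsed > 0 else float("inf")
```

Threads are sufficient here because the client spends its time waiting on the network; the measured rate then reflects the server rather than the Python process.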

offset duplicated

Hi everybody:

I have tried the following text in the DBpedia Spotlight demo.

"Artaxerxes III was succeeded by Artaxerxes IV Arses"

I get the following strange result.

"Artaxerxes III was succeeded by Artaxerxes IV ArsesArtaxerxes IV ArsesArtaxerxes IV Arses, who before he could act was also poisoned"

If I make an API call, I get two different entities identified in the same offset 32:

@URI-> http://dbpedia.org/resource/Arses_of_Persia
@surfaceForm-> Artaxerxes IV Arses
@offset-> 32

and

@URI-> http://dbpedia.org/resource/Artaxerxes_III
@surfaceForm-> Artaxerxes
@offset-> 32

Maybe this is a bug? Or is there an explanation I'm missing?
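
Whatever the cause, a client can collapse annotations that share an offset. The following is a minimal Python sketch that keeps the longest surface form per offset; keep_longest_per_offset is a hypothetical helper, and preferring the longest match is an assumption about what the caller wants, not documented Spotlight behaviour.

```python
def keep_longest_per_offset(resources):
    """For annotations sharing an offset, keep the longest surface form.

    Hypothetical client-side policy; ties keep the first annotation seen.
    """
    best = {}
    for resource in resources:
        offset = int(resource["@offset"])
        current = best.get(offset)
        if current is None or len(resource["@surfaceForm"]) > len(current["@surfaceForm"]):
            best[offset] = resource
    # Return the survivors in text order.
    return [best[offset] for offset in sorted(best)]
```

For the two entities above, this keeps http://dbpedia.org/resource/Arses_of_Persia, because "Artaxerxes IV Arses" is longer than "Artaxerxes".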

No such element exception

I'm attempting to use the dbpedia spotlight annotators without the REST API client as per: https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Run-from-Java-or-Scala.

The only difference is that I am loading the model.

Here is my simple test program.

import java.io.File;
import java.util.List;

import org.dbpedia.spotlight.annotate.DefaultParagraphAnnotator;
import org.dbpedia.spotlight.db.SpotlightModel;
import org.dbpedia.spotlight.disambiguate.ParagraphDisambiguatorJ;
import org.dbpedia.spotlight.exceptions.ConfigurationException;
import org.dbpedia.spotlight.exceptions.InputException;
import org.dbpedia.spotlight.exceptions.SpottingException;
import org.dbpedia.spotlight.model.DBpediaResourceOccurrence;
import org.dbpedia.spotlight.model.SpotlightConfiguration.DisambiguationPolicy;
import org.dbpedia.spotlight.model.SpotterConfiguration.SpotterPolicy;
import org.dbpedia.spotlight.model.Text;
import org.dbpedia.spotlight.spot.Spotter;

class Test {

	public static void main(String[] args) throws ConfigurationException, InputException, SpottingException {

		String text = new String("President Obama gave a speech.");

		File modelFolder = null;

		try {
			modelFolder = new File("en_2+2");
		} catch (Exception e) {
			e.printStackTrace();
			System.err.println("\n Error");
			System.exit(1);
		}
		
		SpotlightModel db = SpotlightModel.fromFolder(modelFolder);
		ParagraphDisambiguatorJ disambiguator = db.disambiguators().get(DisambiguationPolicy.Default);
		Spotter spotter = db.spotters().get(SpotterPolicy.Default);
		DefaultParagraphAnnotator annotator = new DefaultParagraphAnnotator(spotter, disambiguator);
		List<DBpediaResourceOccurrence> occurences = annotator.annotate(text);

	}
}

For some reason, this gives me the following stacktrace when calling the annotate method.

106013 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading FSADictionary...
106349 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (335 ms)
106546 [main] INFO org.dbpedia.spotlight.annotate.DefaultParagraphAnnotator - Spotting... (FSA dictionary spotter)
Exception in thread "main" java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:313)
	at scala.None$.get(Option.scala:311)
	at org.dbpedia.spotlight.db.DBSpotter.extract(DBSpotter.scala:48)
	at org.dbpedia.spotlight.annotate.DefaultParagraphAnnotator.annotate(DefaultAnnotator.scala:63)
	at Test.main(Test.java:35)

How to run in localhost server compiling from the newest source code

First I cloned the source code from https://github.com/dbpedia-spotlight/dbpedia-spotlight
After building the project with Maven I got a 0.70.jar. It runs successfully with the command:

java -jar **********

Then I found this repo, cloned it to my computer, and used Maven to package a jar:
spotlight-1.0-jar-with-dependencies.jar

But when I run the jar, I get an error:

Error: Could not find or load main class org.dbpedia.spotlight.web.rest.Server

Did I miss a step?
Thanks

BTW, according to my test, version 0.7 is considerably faster than 1.0. Annotating the same text file took 110 seconds with 0.7 and 260 seconds with 1.0.

download links are broken

None of the download links for the Spotlight server or models in the documentation seem to be valid.

Empty type in local server

Hi,

I run the jar file on my local server, but I get many empty entity types. Is there any way to fix this?

Thanks&BR,
kivi

@ Symbol

Is there any particular reason for the @ symbol in the JSON tree of results? It makes iterating through the tree with multiple entities a pain.
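
If the prefix gets in the way, it can be removed after parsing. The following is a minimal Python sketch; strip_at_prefix is a hypothetical helper that walks the decoded JSON and renames keys such as "@URI" to "URI". It says nothing about why the service emits the prefix.

```python
def strip_at_prefix(node):
    """Return a copy of `node` with one leading "@" removed from dict keys.

    Hypothetical helper; nested dicts and lists are processed recursively.
    """
    if isinstance(node, dict):
        return {
            (key[1:] if key.startswith("@") else key): strip_at_prefix(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_at_prefix(item) for item in node]
    return node
```

Keys without the prefix, such as "Resources", are left unchanged, and values are never modified.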

Inconsistency in results for German Text.

If I annotate the following text using DBpedia Spotlight Model, the results I get are very different, depending on the length of the text. If the text length is small the recognition is correct. For longer texts, the predictions are not correct.

For example, for the following sample the predictions are correct.

Qualifikationen ||| Branchenkenntnisse ||| Öffentlicher Sektor, Banken, Dienstleistung allgemein, Handel, Luft- und Raumfahrtindustrie , Telekommunikation, Kassenärzt liche Vereinigung, Energieversorger, Erstversicherung (GKV, SHU, K, L), Rückversicherung ||| Sprachkenntnisse ||| Deutsch , ||| Englisch ||| Technische Kenntnisse ||| Anwendungssoftware: ||| MS-Office, MS-Visio, MS-Access, MS-Project ||| Betriebssysteme : ||| Windows, Unix, Linux ||| Datenbanken: ||| Sybase, DB/2, MS-SQLServer, mySQL, Oracle, Cassandra, SAP DB, Informix, MS-Access ||| Programmiersprachen: ||| Java, Javascript, Groovy, Assembler, C, C++, C#, COBOL, Delphi, HTML, PHP, Visual Basic, PL/SQL

However, if the same sample is part of a 3-4 page document, the results are not consistent. For the word Deutsch in such a case I get something like:

<Resource URI="http://de.dbpedia.org/resource/Deutschland" support="172968" types="Wikidata:Q6256,Schema:Place,Schema:Country,DBpedia:PopulatedPlace,DBpedia:Place,DBpedia:Location,DBpedia:Country" surfaceForm="Deutsch" offset="2113" similarityScore="0.9999999998672138" percentageOfSecondRank="1.328211046569277E-10"/>

Why is there a difference in the results? Also, the resource URI doesn't respond when accessed separately. Any reason why?

Upload dbpedia-spotlight-services-1.0.0.jar to maven central repository

Hi, is it possible to upload dbpedia-spotlight-services-1.0.0.jar to Maven Central as described here:

This can be done using the POM file: https://github.com/dbpedia-spotlight/dbpedia-spotlight-model/blob/master/pom.xml

Other DBPedia libraries are also listed in maven and searchable via: https://search.maven.org/search?q=org.dbpedia

This will allow people to customize the DBPedia spotlight service by importing it in their java/scala code.

CC: @sandroacoelho @ragnarok85

Dutch Spotlight down

For some reason the Dutch Spotlight service seems to be down (http://api.dbpedia-spotlight.org/nl/annotate).

Service Unavailable.
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Is this planned and will it be back online in the future?

Fatal transport error: Connection refused (Connection refused)

I do the following:

1- I download the .jar from: https://downloads.dbpedia-spotlight.org/spotlight/dbpedia-spotlight-1.0.0.jar
2- I download the model from https://downloads.dbpedia-spotlight.org/2016-10/en/model/en.tar.gz
3- I run nohup java -jar dbpedia-spotlight-1.0.jar en http://localhost:2222/rest &

In nohup.out I have the following log:

538 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemoryQuantizedCountStore...
689 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (149 ms)
689 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemoryTokenTypeStore...
1429 [main] INFO org.dbpedia.spotlight.db.memory.MemoryTokenTypeStore - Creating reverse-lookup for Tokens.
2062 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (1372 ms)
2063 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemorySurfaceFormStore...
42082 [main] INFO org.dbpedia.spotlight.db.memory.MemorySurfaceFormStore - Summing total SF counts.
44690 [main] INFO org.dbpedia.spotlight.db.memory.MemorySurfaceFormStore - Creating reverse-lookup for surface forms, adding normalized surface forms.
45666 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (43603 ms)
45667 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemoryResourceStore...
48382 [main] INFO org.dbpedia.spotlight.db.memory.MemoryResourceStore - Creating reverse-lookup for DBpedia resources.
48996 [main] INFO org.dbpedia.spotlight.db.memory.MemoryResourceStore - Counting total support...
49172 [main] INFO org.dbpedia.spotlight.db.memory.MemoryResourceStore - Done.
49173 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (3505 ms)
49174 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemoryCandidateMapStore...
115755 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (66580 ms)
115756 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading MemoryContextStore...
132272 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (16515 ms)
264874 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Loading FSADictionary...
266088 [main] INFO org.dbpedia.spotlight.db.memory.MemoryStore$ - Done (1214 ms)
266887 [main] INFO org.dbpedia.spotlight.web.rest.Server - Initiated 1 disambiguators.
266888 [main] INFO org.dbpedia.spotlight.web.rest.Server - Initiated 2 spotters.
May 25, 2018 5:45:30 PM com.sun.grizzly.Controller logVersion
INFO: GRIZZLY0001: Starting Grizzly Framework 1.9.48 - 25/05/18 5:45 PM
Server started in /project/6008168/tamouze listening on http://localhost:2222/rest

When I execute my code I get the following exception:

ERROR 2018-05-25 23:40:46,583 org.spotlight.Main.main() [DbSpotlightClient] - Fatal transport error: Connection refused (Connection refused)
ERROR 2018-05-25 23:40:46,583 org.spotlight.Main.main() [DbSpotlightClient] - POST
org.dbpedia.spotlight.exceptions.AnnotationException: Transport error executing HTTP request.
        at org.spotlight.DbAnnotationClient.request(DbAnnotationClient.java:58)
        at org.spotlight.DbSpotlightClient.extract(DbSpotlightClient.java:48)
        at org.spotlight.Main.main(Main.java:14)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:121)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
        at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:326)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445)
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
        at org.spotlight.DbAnnotationClient.request(DbAnnotationClient.java:42)

My code is:

public class DbSpotlightClient extends DbAnnotationClient {

    // private String API_URL = "http://api.dbpedia-spotlight.org/en/annotate";
    private String API_URL = "http://localhost:2222/rest";
    List<ResourceItem> resources;

    public List<ResourceItem> extract(String text) throws AnnotationException {
        String spotlightResponse = null;
        resources = null;
        try {
            // client = HttpClientBuilder.create().build();
            client = new DefaultHttpClient();
            HttpPost post = new HttpPost(API_URL);
            List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
            urlParameters.add(new BasicNameValuePair("text", text));
            urlParameters.add(new BasicNameValuePair("confidence", "0.25"));
            post.setEntity(new UrlEncodedFormEntity(urlParameters));
            post.setHeader("Accept", "application/json");
            spotlightResponse = request(post);
            Announce.message("spotlightResponse= ", spotlightResponse);
        } catch (UnsupportedEncodingException e) {
            throw new AnnotationException("Could not encode text.", e);
        }
        assert spotlightResponse != null;

        try {
            AnnotationUnit annotationUnit = new Gson().fromJson(spotlightResponse.toString(), AnnotationUnit.class);
            resources = annotationUnit.getResources();
        } catch (Exception e) {
            throw new AnnotationException("Received invalid response from DBpedia Spotlight API.");
        }
        return resources;
    }

    public static void main(String[] args) throws Exception {
        Announce.doing("Starting DBpediaSpotlightClient main ");
        DbSpotlightClient c = new DbSpotlightClient();
        File input = new File("./abstract/test1.txt");
        File output = new File("./datasets/out/marsAnnotated.txt");
        c.evaluate(input, output);
        Announce.done();
    }
}

Note that if I execute the following:

curl http://localhost:2222/rest/annotate \
  -H "Accept: text/xml" \
  --data-urlencode "text=Brazilian state-run giant oil company Petrobras signed a three-year technology and research cooperation agreement with oil service provider Halliburton." \
  --data "confidence=0" \
  --data "support=0"

I get the following:


<?xml version="1.0" encoding="utf-8"?>
<Annotation text="Brazilian state-run giant oil company Petrobras signed a three-year technology and research cooperation agreement with oil service provider Halliburton." confidence="0.0" support="0" types="" sparql="" policy="whitelist">
<Resources>
<Resource URI="http://dbpedia.org/resource/Brazil" support="96461" types="Wikidata:Q6256,Schema:Place,Schema:Country,DBpedia:PopulatedPlace,DBpedia:Place,DBpedia:Location,DBpedia:Country" surfaceForm="Brazilian" offset="0" similarityScore="0.965930684523081" percentageOfSecondRank="0.034833824868363776"/>
<Resource URI="http://dbpedia.org/resource/Giant_star" support="1230" types="" surfaceForm="giant" offset="20" similarityScore="0.8199720139395985" percentageOfSecondRank="0.19329921512879014"/>
<Resource URI="http://dbpedia.org/resource/Petroleum" support="14717" types="" surfaceForm="oil company" offset="26" similarityScore="0.5771024950036353" percentageOfSecondRank="0.7326844635626644"/>
<Resource URI="http://dbpedia.org/resource/Petrobras" support="771" types="Wikidata:Q43229,Wikidata:Q24229398,DUL:SocialPerson,DUL:Agent,Schema:Organization,DBpedia:Organisation,DBpedia:Company,DBpedia:Agent" surfaceForm="Petrobras" offset="38" similarityScore="1.0" percentageOfSecondRank="0.0"/>
<Resource URI="http://dbpedia.org/resource/Sign_language" support="1927" types="" surfaceForm="signed" offset="48" similarityScore="0.7276535296615254" percentageOfSecondRank="0.17575346113982937"/>
<Resource URI="http://dbpedia.org/resource/Technology" support="12462" types="" surfaceForm="technology" offset="68" similarityScore="0.9914358988874088" percentageOfSecondRank="0.00851189308814701"/>
<Resource URI="http://dbpedia.org/resource/Cooperation" support="714" types="" surfaceForm="cooperation" offset="92" similarityScore="0.718889126469838" percentageOfSecondRank="0.3076037508336602"/>
<Resource URI="http://dbpedia.org/resource/Petroleum" support="14717" types="" surfaceForm="oil" offset="119" similarityScore="0.9495596915391201" percentageOfSecondRank="0.05284578455873624"/>
<Resource URI="http://dbpedia.org/resource/Service_provider" support="304" types="" surfaceForm="service provider" offset="123" similarityScore="0.9387022772639211" percentageOfSecondRank="0.05629789285158675"/>
<Resource URI="http://dbpedia.org/resource/Halliburton" support="707" types="Wikidata:Q43229,Wikidata:Q24229398,DUL:SocialPerson,DUL:Agent,Schema:Organization,DBpedia:Organisation,DBpedia:Company,DBpedia:Agent" surfaceForm="Halliburton" offset="140" similarityScore="0.9999999998892122" percentageOfSecondRank="0.0"/>
</Resources>
</Annotation>

Could anyone please help identify the problem?

Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile

mvn install -rf :core -U -e
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ core ---
[WARNING] Expected all dependencies to require Scala version: 2.13.5
[WARNING] org.dbpedia.extraction:core:3.9 requires scala version: 2.13.5
[WARNING] net.liftweb:lift-json_2.10:2.5 requires scala version: 2.13.5
[WARNING] net.liftweb:lift-json_2.10:2.5 requires scala version: 2.10.0
[WARNING] Multiple versions of scala libraries detected!
[INFO] /Users/shivanginigugalia/dbpedia-spotlight-model/core/src/main/java:-1: info: compiling
[INFO] /Users/shivanginigugalia/dbpedia-spotlight-model/core/src/main/scala:-1: info: compiling
[INFO] Compiling 194 source files to /Users/shivanginigugalia/dbpedia-spotlight-model/core/target/classes at 1615878079059
[ERROR] /Users/shivanginigugalia/dbpedia-spotlight-model/core/src/main/scala/com/officedepot/cdap2/collection/CompactHashMap.scala:47: error: not found: type ClassManifest
[ERROR] class CompactHashMap[K: ClassManifest, V: ClassManifest] () extends scala.collection.mutable.Map[K,V] with Serializable {
[ERROR] ^
[ERROR] /Users/shivanginigugalia/dbpedia-spotlight-model/core/src/main/scala/com/officedepot/cdap2/collection/CompactHashMap.scala:47: error: not found: type ClassManifest
[ERROR] class CompactHashMap[K: ClassManifest, V: ClassManifest] () extends scala.collection.mutable.Map[K,V] with Serializable {
[ERROR] ^
...
...
[WARNING] 5 warnings
[ERROR] 107 errors
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for DBpedia Spotlight Core 1.0:
[INFO]
[INFO] DBpedia Spotlight Core ............................. FAILURE [ 49.191 s]
[INFO] DBpedia Spotlight RESTful API ...................... SKIPPED
[INFO] DBpedia Spotlight Indexing ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 49.297 s
[INFO] Finished at: 2021-03-16T12:31:25+05:30
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project core: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project core: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)

If I remove scala-compile-first, I get this error instead:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project core: Fatal error compiling: java.lang.ExceptionInInitializerError: com.sun.tools.javac.code.TypeTag

Abbreviations and some others

Hi Sandro!

  1. Can you please confirm that, apart from confidence and support, there are no other parameters we can change when using the web service?

  2. What is the valid range for support?

  3. Can you please confirm that there are no problems or bugs that might affect our results (apart from type filters, which we are not using)?

  4. It seems to me that the most problematic issue is abbreviations. Does Spotlight have any algorithm that deals specifically with abbreviation detection? I would expect that, at least when there is an expansion of the abbreviation, Spotlight should be able to find the correct link for it, but this sometimes fails. Is there any way to avoid this issue other than increasing confidence?

Not working in Java 9 or 10

Hi,

We are forced to upgrade to Java 10, but after the upgrade DBpedia Spotlight no longer works. We are getting this error: "java.lang.TypeNotPresentException: Type javax.xml.bind.JAXBContext not present".

Do you have any suggestions to resolve this?

Thanks,
Jayson

Response 414 - "REQUEST URI TOO LARGE"

I'm getting an error saying that the document I'm passing in is too large for Spotlight.

I'm using the Python requests module. The document that I'm trying to pass in has 2533 words. I've cleaned it of stop words and sanitized it pretty well. When I cut the document down to 1211 words, it works without any issues.

EDIT: I totally forgot that GET has a URL length limit. Is there any way to use a POST?

querystring = {"text": text}
headers = {
    'accept': "application/json",
    'Cache-Control': "no-cache"
}
response = requests.request("GET", url, headers=headers, params=querystring)

Are there any workarounds?
examplelarge.txt
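
One possible workaround, sketched here with only the Python standard library, is to send the text in the request body as form data instead of in the query string. The curl examples elsewhere on this page post url-encoded form data to a local /rest/annotate endpoint, so this sketch assumes the target server accepts the same shape. The endpoint URL, the build_annotate_request and annotate names, and the default confidence are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: a local Spotlight instance, as in the curl examples on this page.
ANNOTATE_URL = "http://localhost:2222/rest/annotate"


def build_annotate_request(text, confidence=0.35, url=ANNOTATE_URL):
    """Build a POST request that carries the text in the body, not the URL."""
    body = urllib.parse.urlencode(
        {"text": text, "confidence": confidence}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def annotate(text, **kwargs):
    """Send the request and decode the JSON reply (needs a reachable server)."""
    with urllib.request.urlopen(build_annotate_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the requests module, the equivalent change is to call requests.post(url, data=querystring, headers=headers): passing data= instead of params= moves the text out of the URL and into the body.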

Not work for Russian language

I tried to use Spotlight for the Russian language. I set the host to both "http://api.dbpedia-spotlight.org/ru/annotate" and "http://model.dbpedia-spotlight.org/ru/annotate", but neither of them works! I received the error below:

http://api.dbpedia-spotlight.org/ru/annotate
Traceback (most recent call last):
  File "/.../testing.py", line 47, in <module>
    annot = get_annotations(text)
  File "/.../testing.py", line 12, in get_annotations
    confidence=0.4, support=20)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spotlight/__init__.py", line 192, in annotate
    pydict = _post_request(address, payload, filters, headers)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spotlight/__init__.py", line 51, in _post_request
    response.raise_for_status()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/models.py", line 939, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Proxy Error for url: http://api.dbpedia-spotlight.org/ru/annotate

I would appreciate any help figuring out how to solve this.

How to link resource to the corresponding text fragment in the original input text from json response

Hi, I am wondering how to relate detected resources with the input text. For example, for the input:
"The Battle of Gettysburg was fought July 1–3, 1863."
I used the following curl command to query a local instance of spotlight:

curl -X POST http://localhost:2222/rest/annotate -H 'accept: application/json' -H 'content-type: application/x-www-form-urlencoded' --data-urlencode "text=The Battle of Gettysburg was fought July 1–3, 1863." --data-urlencode "confidence=0.35"

Which returns the following:

{
   "@text":"The Battle of Gettysburg was fought July 1–3, 1863.",
   "@confidence":"0.35",
   "@support":"0",
   "@types":"",
   "@sparql":"",
   "@policy":"whitelist",
   "Resources":[
      {
         "@URI":"http://dbpedia.org/resource/Battle_of_Gettysburg",
         "@support":"2871",
         "@types":"Wikidata:Q1656682,DUL:Event,Schema:Event,DBpedia:SocietalEvent,DBpedia:MilitaryConflict,DBpedia:Event",
         "@surfaceForm":"Gettysburg",
         "@offset":"14",
         "@similarityScore":"0.9856069793346309",
         "@percentageOfSecondRank":"0.014354886850971043"
      }
   ]
}

From this response, how can I know that the URI http://dbpedia.org/resource/Battle_of_Gettysburg comes from the text "The Battle of Gettysburg"? I was using the surface form, but this field only indicates "Gettysburg". Also, the offset field indicates the start position of "Gettysburg". Could this be a bug?

I tried to use the demo webpage (https://www.dbpedia-spotlight.org/demo/) to check whether only "Gettysburg" is highlighted, but it seems to be down at the moment.

Thanks beforehand.
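
A sketch of how the fields in the response above relate to the input text, assuming that @offset is a zero-based character index into @text and that the span length is the length of @surfaceForm. The resource_spans helper is a hypothetical name. It can only recover the span the service reported ("Gettysburg"); it does not widen it to "The Battle of Gettysburg".

```python
import json

# Trimmed copy of the JSON response shown above.
RESPONSE = json.loads("""
{
  "@text": "The Battle of Gettysburg was fought July 1–3, 1863.",
  "Resources": [
    {
      "@URI": "http://dbpedia.org/resource/Battle_of_Gettysburg",
      "@surfaceForm": "Gettysburg",
      "@offset": "14"
    }
  ]
}
""")


def resource_spans(response):
    """Yield (start, end, uri) for each resource, with end exclusive."""
    text = response["@text"]
    for res in response.get("Resources", []):
        start = int(res["@offset"])  # offsets arrive as strings
        end = start + len(res["@surfaceForm"])
        # Sanity check: the reported surface form should sit at that offset.
        if text[start:end] != res["@surfaceForm"]:
            raise ValueError("offset does not match surface form: %r" % res)
        yield start, end, res["@URI"]


for start, end, uri in resource_spans(RESPONSE):
    print(start, end, repr(RESPONSE["@text"][start:end]), uri)
```

The sanity check makes a mismatch between offset and surface form fail loudly, which is useful when the text sent and the text indexed differ in encoding or whitespace.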

DBPedia Spotlight is not Working in Java 11

Our project uses DBpedia Spotlight in a critical part of our application, but it runs on an older version of Java. We have identified two risks: 1) security vulnerabilities and 2) the end of support for older Java versions. We therefore need to upgrade to the latest Java version (Java 11), but we have learned that DBpedia Spotlight does not work on Java 11.

Will a version of DBpedia Spotlight compatible with Java 11 be released soon? If not, is there any commercial support you can provide to help us resolve this as soon as possible? An on-call support contact would also be very much appreciated. We hope to receive feedback from you soon.

New installation refusing remote connection

Installed the jar and the en model on an Ubuntu box with 32 GB of memory. The server comes right up; running the curl against localhost:2222 from a separate console on that box works fine. Running the same curl against 192.168.0.12:2222 from another box gets Connection Refused. On the Ubuntu box, ufw is inactive. Using "-v" with curl simply confirms that a connection was attempted and failed.

Thanks in advance for any ideas, perhaps something I am missing.
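
One thing worth ruling out, offered as a guess rather than a confirmed diagnosis: a server bound only to the loopback address answers on localhost but refuses connections addressed to the machine's LAN IP. The sketch below is a small probe (the can_connect helper is a hypothetical name) that can be run on the Ubuntu box itself against both addresses. If localhost succeeds and 192.168.0.12 fails even locally, the listener is probably loopback-only.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Addresses taken from the report above; run this on the server itself.
    for host in ("127.0.0.1", "192.168.0.12"):
        print(host, can_connect(host, 2222))
```

If the server was started with a URL that names localhost, starting it with the machine's own address or 0.0.0.0 instead may help; that is an assumption and has not been verified against this version.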

Error creating XML using in house REST server

I am running the in-house RESTful server.

Using DBpedia Spotlight 1.0.0 [1] and model 2016-10 [2].

The server can do annotation most of the time, but sometimes throws:

org.dbpedia.spotlight.exceptions.OutputException: Error creating XML output.Error creating XML output.org.dbpedia.spotlight.web.rest.OutputManager.makeXML(OutputManager.java:108)

A minimal test case to reproduce the error is:

curl http://localhost:2222/en/rest/annotate --data-urlencode "text=Crimean peninsula Black Sea" -H "Accept: application/json"

Interestingly, querying the public DBpedia Spotlight endpoint works fine:

curl http://api.dbpedia-spotlight.org/en/annotate --data-urlencode "text=Crimean peninsula Black Sea" -H "Accept: application/json"

[1] http://downloads.dbpedia-spotlight.org/spotlight/dbpedia-spotlight-1.0.0.jar
[2] http://downloads.dbpedia-spotlight.org/2016-10/en/model/en.tar.gz

Issue with the Installation

Hi,

I have found that most of the links are not working.

I wanted to run the service from the JAR file, but the download link is not working. Can you please provide me with the JAR file?

Run DBPedia Spotlight with local Sparql-Endpoint

Hi,

I have spent some time reading through all the different wiki pages and tutorials, but I am a bit confused about which options for running DBpedia Spotlight still exist.

I am running my own DBpedia instance on a server using Virtuoso, which provides a local SPARQL endpoint to access it. I would like to run DBpedia Spotlight and set the SPARQL endpoint to my local one.

What is the best way to achieve this with the available options for running DBpedia Spotlight?

Update current dependencies

We know that the current stack is old; see https://docs.google.com/presentation/d/1_3Ky3AY-HTWlCN0WGcZHq5rlTM6nnmD1VryU2lT2x4g/edit#slide=id.g260f1373c3_0_36 and https://docs.google.com/document/d/1EYZPN4KmyAhlGPfyRBjiAhBVgCSVzhG0jR-9kQd7v0s/edit by @sandroacoelho for some explanations.

As a complete rewrite is expected (is there a roadmap? It will take months), we first need to take care of 1.0.x for those of us using it in production :)

Running mvn versions:display-dependency-updates shows that many dependencies are outdated. I think we can update all minor versions and see what happens.

I already run a Spotlight instance in production updated to Scala compiler 2.10.6, Jersey 1.19.4 and Grizzly 1.9.65.

We can also run mvn dependency:analyze to find unused dependencies.

Bad Cases: Some short numbers can be annotated

Hi,
I found that some tokens such as "02" and "-5" get annotated with entities, which is very strange. The cause may be the model I used, 'en_2.2/'.

For example:

02: http://dbpedia.org/resource/Digimon_Adventure_02
-5: http://dbpedia.org/resource/Straight-five_engine
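
Until the model itself is addressed, one client-side workaround is to drop annotations whose surface form is purely numeric. The sketch below assumes the JSON response shape with @surfaceForm seen elsewhere on this page; is_numeric_surface_form, drop_numeric_annotations, and the regular expression are illustrative choices, not part of Spotlight.

```python
import re

# Matches surface forms made only of digits, with an optional leading sign
# and optional "." or "," separated digit groups, e.g. "02", "-5", "3.14".
# This is an assumption about what counts as a "short number".
NUMERIC_RE = re.compile(r"[+-]?\d+(?:[.,]\d+)*")


def is_numeric_surface_form(surface_form):
    """Return True if the surface form consists only of a number."""
    return NUMERIC_RE.fullmatch(surface_form.strip()) is not None


def drop_numeric_annotations(resources):
    """Return the resources whose surface form is not purely numeric."""
    return [r for r in resources if not is_numeric_surface_form(r["@surfaceForm"])]


resources = [
    {"@surfaceForm": "02", "@URI": "http://dbpedia.org/resource/Digimon_Adventure_02"},
    {"@surfaceForm": "-5", "@URI": "http://dbpedia.org/resource/Straight-five_engine"},
    {"@surfaceForm": "Petrobras", "@URI": "http://dbpedia.org/resource/Petrobras"},
]
kept = drop_numeric_annotations(resources)
print([r["@surfaceForm"] for r in kept])
```

Filtering after the fact leaves the service untouched, at the cost of also discarding numeric mentions that were linked correctly.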
