dat / pyner
Python interface to the Stanford Named Entity Recognizer
License: Other
Thank you for publishing your library. One suggestion: it would be helpful to cut releases and publish this package to PyPI (the Python "cheeseshop").
I am willing to help out with this. Since cutting a release is usually the maintainer's prerogative, I wanted to open an issue here first and gather your thoughts.
Specifically, what prompted me to raise this is that I'm trying to optimize my Docker build, and your library is the only entry in my requirements.txt that has no artifact published on PyPI. Therefore, to install pyner, one needs to have git installed in the container.
Thanks!
I have installed pyner successfully. However, when I run the example, an empty set of entities is returned (shown below):
$ python
Python 2.7.3 (default, Sep 26 2012, 21:53:58)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ner
>>> tagger = ner.HttpNER(host='localhost', port=1234)
>>> tagger.get_entities("University of California is located in California, United States")
{}
The command I use to run Stanford NER is:
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 1234
The socket shutdown in ner/utils causes an "[Errno 57] Socket is not connected" error on multiple calls (at least on OS X).
You can "fix" this by removing the shutdown code in ner/utils, but perhaps it should be addressed more methodically.
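One more methodical option might be to keep the shutdown call but tolerate the specific "not connected" error, which can occur when the server has already closed its end. The following is only a sketch under that assumption, written for Python 3; it reuses the helper name tcpip4_socket from the traceback, but the body and the timeout parameter are illustrative and not the library's actual code.

```python
import errno
import socket
from contextlib import contextmanager


@contextmanager
def tcpip4_socket(host, port, timeout=None):
    """Open an IPv4 TCP connection and always close it afterwards."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        yield s
    finally:
        try:
            s.shutdown(socket.SHUT_RDWR)
        except OSError as exc:
            # If the peer already dropped the connection, shutdown() can fail
            # with ENOTCONN (errno 57 on macOS). Ignore only that case.
            if exc.errno != errno.ENOTCONN:
                raise
        finally:
            s.close()
```

Any other socket error is still re-raised, so genuine failures are not hidden.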
tagger = ner.HttpNER(host='localhost', port=8080)
throws back an AttributeError: 'module' object has no attribute 'HttpNER'
The stanford-ner server finds some strings unparsable. I have relatively dirty data with stray characters like BOM and NULL, not to mention non-US characters like ç, so pyner hangs. Could we have a timeout, perhaps in socket? I'm using ner.SocketNER and I'm surprised and delighted how fast it is - thank you!
Category wishlist
Cheers,
Dave
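Until something like this exists in the library, a client-side workaround could look roughly like the sketch below. It assumes Python 3, assumes the server reads one newline-terminated line and closes the connection after replying, and uses hypothetical helper names (clean_text, tag_with_timeout) and a hypothetical default port; none of these are part of pyner.

```python
import socket


def clean_text(text):
    """Drop the stray characters mentioned above (BOM and NUL)."""
    return text.replace("\ufeff", "").replace("\x00", "")


def tag_with_timeout(text, host="localhost", port=9191, timeout=5.0):
    """Send one line of text and read the reply, giving up after `timeout` seconds."""
    payload = (clean_text(text).strip() + "\n").encode("utf-8")
    # The timeout applies to connect, send, and every recv call.
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(payload)
        chunks = []
        while True:
            chunk = s.recv(4096)
            if not chunk:  # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

If the server never answers, the call raises socket.timeout instead of hanging, so the caller can log the offending string and move on.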
Hi, is it possible to create our own training data so that we can use a custom tagger?
When I try to pass the text of an email as a string, I get the error shown further down.
I can verify that my setup works, since this returns two PERSON entities:
import ner
tagger = ner.SocketNER(port=9191, output_format='slashTags')
t = "My daughter Sophia goes to the university of California. James also goes there"
print(type(t))
test = tagger.get_entities(t)
person_ents = test['PERSON']
for i in person_ents:
    print(i)
This outputs as expected
Sophia
James
The only difference here is that I pass email text instead. I can verify that it's a string:
print(type(firstEmail))
test = tagger.get_entities(firstEmail)
person_ents = test['PERSON']
print (type(person_ents))
for i in person_ents:
    print(i)
This returns the following error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-ff847452c8df> in <module>()
3
4
----> 5 test = tagger.get_entities(firstEmail)
6 person_ents = test['PERSON']
7 print (type(person_ents))
~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in get_entities(self, text)
90 else: #inlineXML
91 entities = self.__inlineXML_parse_entities(tagged_text)
---> 92 return self.__collapse_to_dict(entities)
93
94 def json_entities(self, text):
~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in __collapse_to_dict(self, pairs)
71 """
72 return dict((first, list(map(itemgetter(1), second))) for (first, second)
---> 73 in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)))
74
75 def get_entities(self, text):
TypeError: '<' not supported between instances of 'NoneType' and 'str'
Any idea what's wrong?
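The traceback shows the failure inside sorted(pairs, key=itemgetter(0)), which suggests that at least one (label, text) pair has None as its label; why that happens for this particular email is not clear from the post. As a stopgap, a standalone version of the collapsing step could skip unlabelled pairs before sorting. This is a hypothetical sketch, not the library's code, and it hides the unlabelled tokens rather than explaining them.

```python
from itertools import groupby
from operator import itemgetter


def collapse_to_dict(pairs):
    """Group (label, text) pairs by label, skipping pairs whose label is None."""
    labelled = [(label, text) for label, text in pairs if label is not None]
    # groupby only merges adjacent items, so sort by label first.
    labelled.sort(key=itemgetter(0))
    return {
        label: [text for _, text in group]
        for label, group in groupby(labelled, key=itemgetter(0))
    }
```

For example, collapse_to_dict([("PERSON", "Sophia"), (None, "goes"), ("PERSON", "James")]) gives {"PERSON": ["Sophia", "James"]}.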
Consistently, when the Stanford NER server is running on my local machine and I use pyner to make 20 consecutive queries (no matter how far apart in time), I receive the following error message and trace:
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/client.pyc in get_entities(self, text)
74 :returns: a dict of entity type to list of entities of that type
75 """
---> 76 tagged_text = self.tag_text(text)
77 if self.oformat == 'slashTags':
78 entities = self.__slashTags_parse_entities(tagged_text)
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/client.pyc in tag_text(self, text)
117 with tcpip4_socket(self.host, self.port) as s:
118 s.sendall(text)
--> 119 tagged_text = s.recv(10*len(text))
120 return tagged_text
121
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
22 if type is None:
23 try:
---> 24 self.gen.next()
25 except StopIteration:
26 return
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/utils.pyc in tcpip4_socket(host, port)
15 yield s
16 finally:
---> 17 s.shutdown(socket.SHUT_RDWR)
18 s.close()
19
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.pyc in meth(name, self, *args)
222
223 def meth(name,self,*args):
--> 224 return getattr(self._sock,name)(*args)
225
226 for _m in _socketmethods:
error: [Errno 57] Socket is not connected
I can use other ner client libraries to make consecutive requests, so I'm doubtful that the issue lies with the server code.
Hi,
When I try using the get_entities function, I receive the following error:
socket.error: [Errno 10061] No connection could be made because the target machine actively refused it
After googling it, I concluded that a firewall is blocking the connection. Any idea how I can fix this?
Regards,
Yashwanth
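Errno 10061 is the Windows form of "connection refused", which is also what you get when nothing is listening on the given host and port, so it may be worth checking that before looking at the firewall. A minimal sketch, assuming Python 3; can_connect is a hypothetical helper, not part of pyner:

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling can_connect('localhost', 8080) with whatever port the NER server was started on should return True before get_entities is tried. A True result only means that something accepted the connection, not that it is the NER server.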
Hey, I ran the following code:
tagger = ner.HttpNER(host='localhost', port=8080)
a = "Kate Walsh, on the cover of More magazine's April issue, appears on The Ellen DeGeneres Show. More magazine Kate Walsh on The Ellen DeGeneres Show 4/14/11"
tagger.get_entities(a)
but I am getting a connection refused error.
Traceback (most recent call last):
File "/Volumes/Privet Drive/Copy/University of Cincinnati/intern/FEM R&D/parse.py", line 16, in <module>
tagger.get_entities(a)
File "build/bdist.macosx-10.9-intel/egg/ner/client.py", line 81, in get_entities
tagged_text = self.tag_text(text)
File "build/bdist.macosx-10.9-intel/egg/ner/client.py", line 165, in tag_text
c.request('POST', self.location, params, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 772, in connect
self.timeout, self.source_address)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 61] Connection refused
Can somebody please point out where I might be going wrong?
On setting the connection with
tagger = ner.HttpNER(host='127.0.0.1', port=631)
and calling tagger.get_entities("...") with any sentence,
this is the result returned:
{'TITLE': ['Not Found - CUPS v1.7rc1']}
Hogwarts:pyner-master Akrita$ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing ner.egg-info/PKG-INFO
writing top-level names to ner.egg-info/top_level.txt
writing dependency_links to ner.egg-info/dependency_links.txt
reading manifest file 'ner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'ner.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-intel/egg
running install_lib
running build_py
creating build/bdist.macosx-10.9-intel/egg
copying build/lib/.DS_Store -> build/bdist.macosx-10.9-intel/egg
creating build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/.DS_Store -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/__init__.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/client.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/exceptions.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/utils.py -> build/bdist.macosx-10.9-intel/egg/ner
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/client.py to client.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/exceptions.py to exceptions.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/utils.py to utils.pyc
creating build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/PKG-INFO -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/top_level.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/ner-0.1-py2.7.egg' and adding 'build/bdist.macosx-10.9-intel/egg' to it
removing 'build/bdist.macosx-10.9-intel/egg' (and everything under it)
Processing ner-0.1-py2.7.egg
Copying ner-0.1-py2.7.egg to /Library/Python/2.7/site-packages
Adding ner 0.1 to easy-install.pth file
Installed /Library/Python/2.7/site-packages/ner-0.1-py2.7.egg
Processing dependencies for ner==0.1
Finished processing dependencies for ner==0.1
Hogwarts:pyner-master Akrita$
Then, when I run "import ner", it says there is no such package.
I am a beginner at Python. Can you tell me where I might be going wrong?
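One thing that might be worth ruling out is that the interpreter running "import ner" is not the one the egg was installed into (the log shows /Library/Python/2.7/site-packages). The sketch below reports which interpreter is running and whether it can find a given module. It assumes Python 3.4 or later for importlib.util.find_spec, and describe_import is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_import(name):
    """Report which interpreter is running and where it would load `name` from."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(describe_import("ner"))
    # The directories this interpreter searches for packages:
    for path in sys.path:
        print(path)
```

If "found" is False and the site-packages directory from the install log is missing from the printed paths, the install and the import are probably using different Python installations.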
I tried to run this query:
tagger.get_entities('University of California is located in California, United States')
and got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-0a5190e4b836> in <module>()
----> 1 tagger.get_entities('University of California is located in California, United States')
~\AppData\Local\Continuum\anaconda3\lib\site-packages\ner-0.1-py3.6.egg\ner\client.py in get_entities(self, text)
89 groupby(entities, key=itemgetter(0)))
90 else: #inlineXML
---> 91 entities = self.__inlineXML_parse_entities(tagged_text)
92 return self.__collapse_to_dict(entities)
93
~\AppData\Local\Continuum\anaconda3\lib\site-packages\ner-0.1-py3.6.egg\ner\client.py in __inlineXML_parse_entities(self, tagged_text)
62 """
63 return (match.groups() for match in
---> 64 INLINEXML_EPATTERN.finditer(tagged_text))
65
66 def __collapse_to_dict(self, pairs):
What causes this? I did not change anything in the code.
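The traceback above is cut off before the exception message, so the cause cannot be confirmed from the post. One common way for finditer to raise a TypeError under Python 3 is applying a str pattern to a bytes object, which would happen if the server reply reached the parser as raw bytes. Assuming that is the case here, decoding before parsing would avoid it. In this sketch ENTITY_PATTERN and parse_inline_xml are hypothetical stand-ins, not the library's own INLINEXML_EPATTERN or method.

```python
import re

# Hypothetical pattern for inline-XML output such as <LOCATION>California</LOCATION>.
ENTITY_PATTERN = re.compile(r"<([A-Z]+?)>(.+?)</\1>")


def parse_inline_xml(tagged_text, encoding="utf-8"):
    """Return (label, text) tuples, accepting either str or bytes input."""
    if isinstance(tagged_text, bytes):
        # A str pattern cannot be matched against bytes, so decode first.
        tagged_text = tagged_text.decode(encoding)
    return [match.groups() for match in ENTITY_PATTERN.finditer(tagged_text)]
```

The isinstance check keeps the function usable with str input as well, so it behaves the same whichever type the transport layer hands over.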