pyner's People

Contributors

chyikwei, dat, gabriel4649

pyner's Issues

devise release strategy and deploy to pypi

Thank you for publishing your library. Just a suggestion: it would be helpful to cut releases and publish this package to PyPI (the Python 'cheeseshop').

I am willing to help out with this. Since cutting a release is usually the maintainer's honor, I wanted to open an issue here first and gather your thoughts.

Specifically, what prompted me to raise this is that I'm trying to optimize my Docker build, and your library is the only entry in my requirements.txt that doesn't have an artifact published on PyPI. As a result, installing pyner requires git to be installed in the container.

Thanks!

Empty set of entities

I have installed pyner successfully. However, when I run the example, an empty set of entities is returned (shown below):

$ python
Python 2.7.3 (default, Sep 26 2012, 21:53:58)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import ner
>>> tagger = ner.HttpNER(host='localhost', port=1234)
>>> tagger.get_entities("University of California is located in California, United States")
{}

The command I use to run Stanford NER is:
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 1234

premature socket shutdown

The premature socket shutdown causes an
[Errno 57] Socket is not connected
error on multiple calls (at least on OS X).

You can "fix" this by removing the shutdown code in ner/utils, but perhaps it should be addressed more methodically.
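One more methodical option, sketched here under assumptions rather than taken from ner/utils, is to keep the shutdown call but tolerate the "not connected" error that appears when the server has already closed its end. The name tolerant_tcp_socket and its timeout parameter are hypothetical.

```python
import contextlib
import errno
import socket


@contextlib.contextmanager
def tolerant_tcp_socket(host, port, timeout=None):
    """Yield a connected TCP socket and close it quietly afterwards.

    Hypothetical helper: not the code that ships in ner/utils.
    """
    s = socket.create_connection((host, port), timeout=timeout)
    try:
        yield s
    finally:
        try:
            s.shutdown(socket.SHUT_RDWR)
        except OSError as exc:
            # If the peer already closed the connection, shutdown() can fail
            # with ENOTCONN (errno 57 on OS X). Ignore only that case.
            if exc.errno != errno.ENOTCONN:
                raise
        finally:
            s.close()
```

Any other socket error is still raised, so real connection problems are not hidden.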

Attribute Error

tagger = ner.HttpNER(host='localhost', port=8080)
raises AttributeError: 'module' object has no attribute 'HttpNER'

Timeout

The stanford-ner server finds some strings unparsable. I have relatively dirty data with stray characters like BOM and NULL, not to mention non-US characters like ç, so pyner hangs. Could we have a timeout, perhaps in socket? I'm using ner.SocketNER and I'm surprised and delighted how fast it is - thank you!
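Until something like this exists in pyner, a timeout can be sketched outside the library with a plain socket. The example below is an assumption-laden sketch, not pyner's implementation: the function names, the default port, and the idea that the server accepts one newline-terminated line and then closes the connection are all assumptions. It also drops the BOM and NUL characters mentioned above before sending.

```python
import socket


def strip_stray_chars(text):
    """Remove BOM and NUL characters (hypothetical helper)."""
    return text.replace("\ufeff", "").replace("\x00", "")


def tag_text_with_timeout(text, host="localhost", port=9191, timeout=5.0):
    """Send one line to the server and read the reply until it closes.

    Raises socket.timeout if connecting, sending, or any single recv
    takes longer than `timeout` seconds. Assumes a one-line-in,
    reply-then-close protocol.
    """
    line = strip_stray_chars(text).replace("\n", " ") + "\n"
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(line.encode("utf-8"))
        while True:
            chunk = s.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")
```

Note that the timeout applies to each socket operation separately, not to the call as a whole, so a server that trickles data slowly could still take longer than `timeout` in total.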

Category wishlist

Cheers,
Dave

Custom Tagger

Hi, is it possible to create our own training data so that we can use a custom tagger?

TypeError: '<' not supported between instances of 'NoneType' and 'str'

When I try to pass an email string of text, I get this error.

I can verify that my setup works: the following returns two PERSON entities.

import ner
tagger = ner.SocketNER(port=9191, output_format='slashTags')
t = "My daughter Sophia goes to the university of California. James also goes there"
print(type(t))
test = tagger.get_entities(t)
person_ents = test['PERSON']
for i in person_ents:
    print(i)

This outputs as expected

Sophia
James

The only difference here is that I pass email text instead. I can verify that it's a string:

print(type(firstEmail))

test = tagger.get_entities(firstEmail)
person_ents = test['PERSON']
print (type(person_ents))
for i in person_ents:
    print(i)

This returns the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-ff847452c8df> in <module>()
      3 
      4 
----> 5 test = tagger.get_entities(firstEmail)
      6 person_ents = test['PERSON']
      7 print (type(person_ents))

~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in get_entities(self, text)
     90         else: #inlineXML
     91             entities = self.__inlineXML_parse_entities(tagged_text)
---> 92         return self.__collapse_to_dict(entities)
     93 
     94     def json_entities(self, text):

~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in __collapse_to_dict(self, pairs)
     71         """
     72         return dict((first, list(map(itemgetter(1), second))) for (first, second)
---> 73             in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)))
     74 
     75     def get_entities(self, text):

TypeError: '<' not supported between instances of 'NoneType' and 'str'

Any idea what's wrong?
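The traceback suggests that at least one (tag, entity) pair reaching the sort has None as its tag, and Python 3 refuses to order None against a str. Why the parser produces such a pair for this email is not visible from the trace. As a sketch of one possible workaround, assuming it is acceptable to skip untagged pairs, the grouping can be done without sorting at all. The function name collapse_to_dict is illustrative and is not the library's private method.

```python
from collections import defaultdict


def collapse_to_dict(pairs):
    """Group (tag, entity) pairs into {tag: [entities]}.

    Pairs whose tag is None are skipped, so no None-vs-str comparison
    can occur. Illustrative stand-in, not pyner's own code.
    """
    grouped = defaultdict(list)
    for tag, entity in pairs:
        if tag is None:
            continue
        grouped[tag].append(entity)
    return dict(grouped)
```

For example, `collapse_to_dict([("PERSON", "Sophia"), (None, "x"), ("PERSON", "James")])` keeps both PERSON entries in their original order and drops the untagged pair.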

Error 57 after 20 queries using SocketNER

Consistently, if I have the stanford NER server running on my local machine, and I use pyner to make 20 consecutive queries - it doesn't matter how far apart in time they are - I receive the following error message and trace:

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/client.pyc in get_entities(self, text)
     74         :returns: a dict of entity type to list of entities of that type
     75         """
---> 76         tagged_text = self.tag_text(text)
     77         if self.oformat == 'slashTags':
     78             entities = self.__slashTags_parse_entities(tagged_text)

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/client.pyc in tag_text(self, text)
    117         with tcpip4_socket(self.host, self.port) as s:
    118             s.sendall(text)
--> 119             tagged_text = s.recv(10*len(text))
    120         return tagged_text
    121 

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.pyc in __exit__(self, type, value, traceback)
     22         if type is None:
     23             try:
---> 24                 self.gen.next()
     25             except StopIteration:
     26                 return

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/ner-0.1-py2.7.egg/ner/utils.pyc in tcpip4_socket(host, port)
     15         yield s
     16     finally:
---> 17         s.shutdown(socket.SHUT_RDWR)
     18         s.close()
     19 

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.pyc in meth(name, self, *args)
    222 
    223 def meth(name,self,*args):
--> 224     return getattr(self._sock,name)(*args)
    225 
    226 for _m in _socketmethods:

error: [Errno 57] Socket is not connected

I can use other ner client libraries to make consecutive requests, so I'm doubtful that the issue lies with the server code.

socket.error: [Errno 10061]

Hi,

When I try using the get_entities functions, I receive the following error.

socket.error: [Errno 10061] No connection could be made because the target machine actively refused it

After Googling it, I realized that a firewall is blocking it. Any idea how I can fix this?

Regards,
Yashwanth

error: [Errno 61] Connection refused

Hey, I ran the following code:

tagger = ner.HttpNER(host='localhost', port=8080)
a = "Kate Walsh, on the cover of More magazine's April issue, appears on The Ellen DeGeneres Show. More magazine Kate Walsh on The Ellen DeGeneres Show 4/14/11"

tagger.get_entities(a)

but I am getting a connection refused error.

Traceback (most recent call last):
  File "/Volumes/Privet Drive/Copy/University of Cincinnati/intern/FEM R&D/parse.py", line 16, in <module>
    tagger.get_entities(a)
  File "build/bdist.macosx-10.9-intel/egg/ner/client.py", line 81, in get_entities
    tagged_text = self.tag_text(text)
  File "build/bdist.macosx-10.9-intel/egg/ner/client.py", line 165, in tag_text
    c.request('POST', self.location, params, headers)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 61] Connection refused

Can somebody please point out where I might be going wrong?
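A "connection refused" error generally means nothing accepted the connection on that host and port, so a first check is whether a server is listening there at all. The helper below is a small sketch for that check; the name is_listening is hypothetical and it says nothing about whether the listener is actually an NER server.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `is_listening('localhost', 8080)` returns False, the client code is probably fine and the server is either not running or bound to a different port.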

"No module named Ner"

Hogwarts:pyner-master Akrita$ sudo python setup.py install
running install
running bdist_egg
running egg_info
writing ner.egg-info/PKG-INFO
writing top-level names to ner.egg-info/top_level.txt
writing dependency_links to ner.egg-info/dependency_links.txt
reading manifest file 'ner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'ner.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.9-intel/egg
running install_lib
running build_py
creating build/bdist.macosx-10.9-intel/egg
copying build/lib/.DS_Store -> build/bdist.macosx-10.9-intel/egg
creating build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/.DS_Store -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/__init__.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/client.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/exceptions.py -> build/bdist.macosx-10.9-intel/egg/ner
copying build/lib/ner/utils.py -> build/bdist.macosx-10.9-intel/egg/ner
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/client.py to client.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/exceptions.py to exceptions.pyc
byte-compiling build/bdist.macosx-10.9-intel/egg/ner/utils.py to utils.pyc
creating build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/PKG-INFO -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/SOURCES.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/dependency_links.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
copying ner.egg-info/top_level.txt -> build/bdist.macosx-10.9-intel/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/ner-0.1-py2.7.egg' and adding 'build/bdist.macosx-10.9-intel/egg' to it
removing 'build/bdist.macosx-10.9-intel/egg' (and everything under it)
Processing ner-0.1-py2.7.egg
Copying ner-0.1-py2.7.egg to /Library/Python/2.7/site-packages
Adding ner 0.1 to easy-install.pth file

Installed /Library/Python/2.7/site-packages/ner-0.1-py2.7.egg
Processing dependencies for ner==0.1
Finished processing dependencies for ner==0.1
Hogwarts:pyner-master Akrita$

Then, when I run "import ner", it says there is no such package.

I am a beginner at Python. Can you tell me where I might be going wrong?
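Two things may be worth checking, offered as guesses rather than a diagnosis. Module names are case-sensitive, so "import Ner" fails even when "import ner" would work. Also, the python used with sudo may not be the same interpreter that runs the import. The sketch below, with the hypothetical helper name locate_module, reports where the current interpreter finds a module, if it finds it at all.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows which interpreter is running and whether it can see 'ner'.
    print(sys.executable)
    print(locate_module("ner"))
```

If this prints None for "ner", comparing sys.executable with the interpreter that performed the install is a reasonable next step.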

cannot use a string pattern on a bytes-like object

I tried to run this query:
tagger.get_entities('University of California is located in California, United States')

and got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-76-0a5190e4b836> in <module>()
----> 1 tagger.get_entities('University of California is located in California, United States')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ner-0.1-py3.6.egg\ner\client.py in get_entities(self, text)
     89                 groupby(entities, key=itemgetter(0)))
     90         else: #inlineXML
---> 91             entities = self.__inlineXML_parse_entities(tagged_text)
     92         return self.__collapse_to_dict(entities)
     93 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\ner-0.1-py3.6.egg\ner\client.py in __inlineXML_parse_entities(self, tagged_text)
     62         """
     63         return (match.groups() for match in
---> 64             INLINEXML_EPATTERN.finditer(tagged_text))
     65 
     66     def __collapse_to_dict(self, pairs):

What causes this? I did not change any of the code.
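This error is what Python 3 raises when a regular expression compiled from a str is applied to bytes, which is what a socket or HTTP response yields on Python 3. A minimal sketch of the usual remedy is to decode the reply before matching. The pattern and function name below are illustrative assumptions and are not copied from ner/client.py; UTF-8 is also an assumption about the server's encoding.

```python
import re

# Illustrative inline-XML pattern: <TAG>text</TAG>
INLINE_XML = re.compile(r"<([A-Z]+)>(.+?)</\1>")


def parse_inline_xml(tagged_text):
    """Return (tag, entity) tuples from inline-XML tagged text.

    Accepts bytes or str; bytes are decoded as UTF-8 first so the
    str pattern can be applied.
    """
    if isinstance(tagged_text, bytes):
        tagged_text = tagged_text.decode("utf-8")
    return [match.groups() for match in INLINE_XML.finditer(tagged_text)]
```

With this approach the decoding happens in one place, so the rest of the parsing code only ever sees str.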
