geotext's People

Contributors

albertc1, blakejakopovic, elyase, iamnottheway, joseluizcoe

geotext's Issues

USA not recognized

"USA" is not being detected. I have to replace "USA" with "United States" for the country to be detected.
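Until abbreviations like this are recognized, one workaround is to expand them before calling GeoText. The sketch below assumes a small, hypothetical alias table (`COUNTRY_ALIASES`) and a helper `expand_aliases` of our own; neither is part of geotext. Word boundaries keep longer tokens such as "USAF" untouched.

```python
import re

# Hypothetical alias table (not part of geotext); extend as needed.
COUNTRY_ALIASES = {
    "USA": "United States",
    "UAE": "United Arab Emirates",
}

# One alternation, longest alias first; \b keeps e.g. "USAF" unchanged.
_ALIAS_RE = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(COUNTRY_ALIASES, key=len, reverse=True))
    + r")\b"
)


def expand_aliases(text):
    """Replace whole-word country abbreviations with their full names."""
    return _ALIAS_RE.sub(lambda m: COUNTRY_ALIASES[m.group(1)], text)


print(expand_aliases("Made in USA"))  # Made in United States
```

The expanded string is then passed on as usual, e.g. `GeoText(expand_aliases(text)).country_mentions`.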

Multiple City Names in capitalized sentences

Oops:

In [1]: from geotext import GeoText

In [2]: places1 = GeoText('I love London and Brussels.')

In [3]: places1.cities
Out[3]: ['London', 'Brussels']

In [4]: places2 = GeoText('I Love London and Brussels.')

In [5]: places2.cities
Out[5]: ['Brussels']
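The output is consistent with a matcher that takes adjacent capitalized words as a single candidate: in the second sentence "Love London" would be looked up as one name and miss. The sketch below reproduces that behavior with an illustrative regex and a toy gazetteer (`KNOWN_CITIES`), both assumptions rather than geotext's actual internals, and then shows one alternative: look up every contiguous sub-span of a capitalized run, longest match first.

```python
import re

# Toy gazetteer standing in for a real city list (assumption).
KNOWN_CITIES = {"London", "Brussels", "New York"}

# Illustrative candidate pattern (not geotext's): a run of Capitalized words.
CAPITALIZED_RUN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def naive_cities(text):
    """Look up each whole run; 'Love London' is not a city, so it is lost."""
    return [run for run in CAPITALIZED_RUN.findall(text) if run in KNOWN_CITIES]


def subspan_cities(text, max_words=3):
    """Look up contiguous sub-spans of each run, longest match first."""
    found = []
    for run in CAPITALIZED_RUN.findall(text):
        words = run.split()
        i = 0
        while i < len(words):
            for n in range(min(max_words, len(words) - i), 0, -1):
                name = " ".join(words[i:i + n])
                if name in KNOWN_CITIES:
                    found.append(name)
                    i += n
                    break
            else:  # no sub-span starting at words[i] matched
                i += 1
    return found


print(naive_cities("I Love London and Brussels."))    # ['Brussels']
print(subspan_cities("I Love London and Brussels."))  # ['London', 'Brussels']
```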

Cities not identified

I have found two cities that geotext does not identify.

The cities "Ventalló" and "Sant Cugat del Vallès" exist in http://www.geonames.org, but geotext cannot find them:

GeoText("Ventallo").cities
[]
GeoText("Sant Cugat del Vallés").cities
[]
GeoText("Alcalá de Henares").cities
['Alcalá de Henares']
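Part of the mismatch may simply be spelling: the queries use "Ventallo" and "Vallés", while the names the reporter found on GeoNames are "Ventalló" and "Vallès". (It is also possible that very small towns are absent from the city list geotext bundles, which folding cannot fix.) A common remedy for the spelling side is to compare accent-folded, case-folded keys. The sketch below uses the standard `unicodedata` module; the three-entry gazetteer and the `fold`/`lookup` helpers are hypothetical.

```python
import unicodedata


def fold(name):
    """Strip accents and case-fold: 'Vallès' and 'Vallés' both give 'valles'."""
    decomposed = unicodedata.normalize("NFKD", name)  # 'ò' -> 'o' + combining mark
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Toy gazetteer (assumption): the spellings given in the issue.
CITIES = ["Ventalló", "Sant Cugat del Vallès", "Alcalá de Henares"]
FOLDED_INDEX = {fold(name): name for name in CITIES}


def lookup(name):
    """Return the canonical spelling for an accent/case variant, or None."""
    return FOLDED_INDEX.get(fold(name))


print(lookup("Ventallo"))               # Ventalló
print(lookup("Sant Cugat del Vallés"))  # Sant Cugat del Vallès
```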

Not able to derive city names using Geotext library

Hi,

I have texts extracted from certain contexts of files like:

'. Education : 05/2012 DePaul University Graduate School Master of Science in E-Commerce Technology Morgan in E-Commerce Technology Morgan State University 05/88 BS in Computer Science Technical Training or Certifications '

'Information Technology Southern New Hampshire University Expected 2015 Associate of Arts , Graphic Design Penn'

These contain university names along with some place names, like 'Morgan city' in the first sentence and 'New Hampshire' in the second. I am using the code below to extract the city names from the text with the geotext Python library:

from geotext import GeoText
places = GeoText(sent1)  # or sent2
print(places.cities)

I installed it with pip install geotext on Python 3 (Anaconda 3.0) on Windows 7.
The output I get is ['University'] and ['University', 'University'], which are clearly not city names.

I should mention that after installation I got some 'expecting bytes not strings' errors and 'cannot find name GeoText' errors, which I corrected manually:

I changed the import statement in __init__.py to import geotext instead of GeoText, and
I changed string to string.encode() to fix the byte array errors.
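Getting 'University' back as a city suggests that the underlying city list contains a place with that literal name, so any capitalized "University" in the text matches it. A simple post-filter is sketched below; the `STOPLIST` contents and the `filter_cities` helper are hypothetical choices for résumé-style text, not something geotext provides, and would need tuning to the data.

```python
# Hypothetical stoplist (not from geotext): words that may exist as place
# names in a large gazetteer but are ordinary words in résumé-style text.
STOPLIST = {"University"}


def filter_cities(cities, stoplist=STOPLIST):
    """Drop matches that are on the stoplist, preserving order and duplicates."""
    return [city for city in cities if city not in stoplist]


print(filter_cities(["University", "Morgan City"]))  # ['Morgan City']
```

Applied to the report, this would be `filter_cities(GeoText(sent1).cities)`.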

Melbourne and Bristol coming up as US only...

Hi, I am running single cities through country_mentions, and both of them come back only as "OrderedDict([('US', 1)])":

from geotext import GeoText

cities = ['Melbourne', 'Bristol']

for city in cities:
    country_dict = GeoText(city.title()).country_mentions
    print(country_dict)

I understand that there are places with these names in the US, but Melbourne is obviously significant in Australia, as is Bristol in the UK. Shouldn't the dict come back with several country mentions?

Thanks!
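A single result per city is what you would expect if the name index stores one country per city name, so a later entry (here, a US place) overwrites an earlier one. That is an inference from the output, not a reading of geotext's code. The sketch below shows the alternative data structure: a multimap from each name to every country code that has a place with that name. The rows and helper names are a toy, hypothetical sample; a plain `dict(ROWS)` would keep only the last country seen for each name.

```python
from collections import defaultdict

# Toy (city, ISO country code) rows; Melbourne, Florida and Bristol,
# Tennessee are real US namesakes.
ROWS = [
    ("Melbourne", "AU"), ("Melbourne", "US"),
    ("Bristol", "GB"), ("Bristol", "US"),
]


def build_multimap(rows):
    """Map each city name to the set of all countries that have it."""
    index = defaultdict(set)
    for name, country in rows:
        index[name].add(country)
    return index


CITY_COUNTRIES = build_multimap(ROWS)


def candidate_countries(city):
    """Sorted country codes for a city name (empty list if unknown)."""
    return sorted(CITY_COUNTRIES.get(city, ()))


for city in ["Melbourne", "Bristol"]:
    print(city, candidate_countries(city))  # e.g. Melbourne ['AU', 'US']
```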

'UK' is in country mentions

"The official ISO country code for the United Kingdom is 'GB'. The code 'UK' is reserved."
For some reason, both UK and GB are returned in my country mentions, and I'm not sure where the UK ones come from. (I'm running this on a huge file, so there's no easy way to tell which places it is classifying as UK.)
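Whatever the source of the "UK" entries, their counts can be folded into "GB" after the fact. The sketch assumes `country_mentions` returns an `OrderedDict` mapping country code to count, as in the outputs shown in these issues, and uses a hypothetical `CODE_ALIASES` table and `normalize_mentions` helper.

```python
from collections import Counter, OrderedDict

# Hypothetical table folding non-ISO codes into ISO 3166-1 alpha-2 codes.
CODE_ALIASES = {"UK": "GB"}


def normalize_mentions(mentions):
    """Merge counts filed under alias codes, most frequent first."""
    merged = Counter()
    for code, count in mentions.items():
        merged[CODE_ALIASES.get(code, code)] += count
    return OrderedDict(merged.most_common())


print(normalize_mentions(OrderedDict([("GB", 2), ("UK", 1), ("US", 1)])))  # GB: 3, US: 1
```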

Sensitivity to capitalization, punctuation, and places sharing a name.

Hi @elyase, this is great work, thanks; it's very fast. However, I am encountering a few reliability issues. Specifically, the library is very sensitive to capitalization and punctuation (it ignores lowercase names, and it ignores countries followed by other capitalized words), and it has trouble disambiguating between places that share a name. For example:

GeoText("France Is A Country").country_mentions
>>OrderedDict()

GeoText("paris France").country_mentions
>>OrderedDict([('FR', 1)])

GeoText("Paris France").country_mentions
>>OrderedDict()

GeoText("Paris, France").country_mentions
>> OrderedDict([('FR', 1), ('US', 1)])

(Presumably because there are also American cities named Paris?)

Just wanted to flag this for future updates. Thanks!
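For the "Paris, France" case, one way to disambiguate is to let country names in the same text vote: keep only those candidate countries for a city that the text also mentions. The sketch below is a hypothetical post-processing step over a toy city-to-countries table (`CITY_COUNTRIES`, `resolve`); geotext is not assumed to expose such a table.

```python
# Toy city -> candidate countries table (assumption): Paris, Texas and
# London, Ontario are real namesakes of the better-known cities.
CITY_COUNTRIES = {"Paris": {"FR", "US"}, "London": {"GB", "CA"}}


def resolve(city, mentioned_countries):
    """Prefer the city's countries that the same text also mentions.

    Falls back to every candidate when the text offers no usable hint.
    """
    candidates = CITY_COUNTRIES.get(city, set())
    hinted = candidates & set(mentioned_countries)
    return sorted(hinted or candidates)


print(resolve("Paris", ["FR"]))  # ['FR']
print(resolve("Paris", []))      # ['FR', 'US']
```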

Cannot extract city when the sentence is too long.

OS name and version

Windows 10, Version 1803

Any details about your local setup that might be helpful in troubleshooting.

I am working on a chatbot assignment that tells the user the weather.

I use geotext to extract cities from the user's input.

However, when the sentence is too long, it does not return the city I want.

Detailed steps to reproduce the bug.

Code:

from geotext import GeoText


def main():
    while True:
        request = input("Enter sentence containing a location: ")
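        places = GeoText(request.title())

        # places.cities is a list, so pass it as a separate argument
        # (concatenating it to a str would raise TypeError)
        print("Cities in the sentence:", places.cities)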


if __name__ == '__main__':
    main()

Output:

Enter sentence containing a location: what is the weather today in kuala lumpur?
Cities in the sentence: ['Kuala Lumpur']
Enter sentence containing a location: please tell me the weather today in New York
Cities in the sentence: ['York']
Enter sentence containing a location: Washington
Cities in the sentence: ['Washington']

Enter sentence containing a location: can you please tell me the weather today in kuala lumpur?
Cities in the sentence: []
Enter sentence containing a location: can you please tell me the weather today in london?
Cities in the sentence: []
Enter sentence containing a location: please tell me the weather today in Washington
Cities in the sentence: []
Enter sentence containing a location: what is the weather in Washington?
Cities in the sentence: []
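Calling `.title()` capitalizes every word, so a matcher that relies on capitalization sees many capitalized neighbours; if it groups adjacent capitalized words, phrases like "In Washington" or "In New" can swallow the city. An alternative that sidesteps capitalization is to tokenize the text, ignore punctuation, and look up word n-grams case-insensitively, preferring the longest match. The sketch below does this against a toy gazetteer; `GAZETTEER` and `find_cities` are hypothetical, not geotext APIs. Case-insensitive matching trades capitalization cues for more false positives on city names that are also common words.

```python
import re

# Toy gazetteer keyed by case-folded name (assumption, not geotext's data).
GAZETTEER = {
    name.casefold(): name
    for name in ["London", "Washington", "New York", "York", "Kuala Lumpur"]
}

# Runs of letters only, so "london?" yields "london".
WORD = re.compile(r"[^\W\d_]+")


def find_cities(text, max_words=3):
    """Greedy longest match over word n-grams, ignoring case and punctuation."""
    words = WORD.findall(text)
    found, i = [], 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            key = " ".join(words[i:i + n]).casefold()
            if key in GAZETTEER:
                found.append(GAZETTEER[key])
                i += n
                break
        else:  # no n-gram starts at words[i]
            i += 1
    return found


print(find_cities("can you please tell me the weather today in kuala lumpur?"))
# ['Kuala Lumpur']
```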

Cities with three or more words

When I try to recognize cities whose names have more than two words, they are not recognized.

Examples: Rio de Janeiro, Mar del Plata, Rio das Ostras.
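Names like these join capitalized words with lower-case particles ("de", "del", "das"), which a pattern built only from capitalized words cannot span. Below is a sketch of a candidate pattern that allows one such particle between capitalized words. The particle list and the `place_candidates` helper are assumptions, the character classes are ASCII-only for brevity, and each candidate would still need a gazetteer lookup.

```python
import re

# Lower-case connectors found in many Romance-language place names.
# The list is an assumption; longer forms come first in the alternation.
PARTICLES = ["das", "dos", "del", "de", "da", "do"]

WORD = r"[A-Z][a-z]+"  # ASCII-only capitalized word, for brevity
CANDIDATE = re.compile(
    WORD + r"(?: (?:(?:" + "|".join(PARTICLES) + r") )?" + WORD + r")*"
)


def place_candidates(text):
    """Capitalized runs that may contain one particle between two words."""
    return CANDIDATE.findall(text)


print(place_candidates("I flew from Rio de Janeiro to Mar del Plata"))
# ['Rio de Janeiro', 'Mar del Plata']
```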

Comma Issue

Hi,

I have encountered a strange issue while testing the library. When I enter the following string, it gives me back "Parsippany" as the city:

54 Manchester Rd, Parsippany , NJ 07054 07054

but for the following, I don't get any:

54 Manchester Rd Parsippany , NJ 07054 07054

The only difference is the comma "," between Rd and Parsippany.

Any ideas?

Tests fail

The tests in the test folder fail. Please take a look.

Numerous False Negatives

Hello Elyase, I'm very glad you created and maintain this useful Python library. I'm currently using it to help parse a large amount of data from the USPTO. I noticed quite a few cases where the library didn't capture the city and/or country from the string. Below are examples from the source data where the city and/or country was not picked out. Hopefully these cases can help you improve the library.

INDIANAPOLIS INDIANA.
BARDSLEY, ENGLAND
ST. LOUIS, MO.
WHITING, INDIANA, AND CHICAGO, ILLINOIS.
PHILADELPHIA PA.
LEROY, N.Y.
LYNDONVILLE, VT.
AMENIA, N. Y.
COPPERHILL, TENN.
DETROIT AND JOSEPH CAMPAU AT THE RIVER,MICH.
IVORYTON, CONN.
ST. LOUIS, MO. CORPORATION OF MISSOURI.
OGDENSBURG, N.Y.
NEAR SHEFFIELD, ENGLAND
INDIANAPOLIS IND.
BASLE,
ST. LOUIS, MO. REPUBLISHED BY MONSANTO COMPANY,/ST. LOUIS, MO.
LABORATORY PARK DECATUR, ILL.
1006 OAZA KADOMA, KADOMA-CHO KITAKAWACHI-GUN, OSAKA,
3501 W. 48TH PLACE CHICAGO 32, ILL.
700 BROADWAY NEW YORK, N.Y.
811 WYANDOTTE KANSAS CITY, MO.
835 S. 8TH ST. ST. LOUIS 2, MO.
47/51 EXMOUTH MARKET, ROSEBERRY AVE. LONDON E.C.1, ENGLAND
1407 CUMMINGS DRIVE RICHMOND 20, VA.
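Most of these strings are entirely upper case, which defeats any capitalization-based matcher. A possible preprocessing step, sketched below under that assumption, is to title-case fully upper-case strings and split them into comma-separated fields that can be checked one at a time. It does not handle abbreviations such as "N. Y." or "ILL.", or historical spellings such as "BASLE", which would need their own mapping. `address_fields` is a hypothetical helper.

```python
def address_fields(text):
    """Split an address-like string into comma-separated fields.

    Fully upper-case input (as in these USPTO strings) is title-cased
    first, so a capitalization-based matcher has something to work with;
    mixed-case input keeps its original capitalization.
    """
    if text.isupper():
        text = text.title()
    return [field.strip() for field in text.split(",") if field.strip()]


print(address_fields("ST. LOUIS, MO."))  # ['St. Louis', 'Mo.']
```

Each field can then be checked on its own, e.g. `[GeoText(f).cities for f in address_fields(s)]`.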

Case-insensitive option

Hi,

First of all, thanks for your work; it works well and it's really useful.

What do you think of an option to make the search for city/country names case-insensitive? I'm trying to do it for my project, and I can send you a PR if you want and I succeed.

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

from geotext import GeoText
places = GeoText("London is a great city")
places.cities
GeoText('New York, Texas, and also China').country_mentions

My system is Windows 10, and the code fragment above throws this error:

"D:\Program Files\Python3\python.exe" D:/OneDrive/Programs/Jieba/ExtractLocation.py
Traceback (most recent call last):
  File "D:/OneDrive/Programs/Jieba/ExtractLocation.py", line 20, in <module>
    from geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\__init__.py", line 7, in <module>
    from .geotext import GeoText
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 87, in <module>
    class GeoText(object):
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 103, in GeoText
    index = build_index()
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 74, in build_index
    get_data_path('countryInfo.txt'), usecols=[4, 0], skip=1)
  File "C:\Users\Du Fei\AppData\Roaming\Python\Python36\site-packages\geotext\geotext.py", line 48, in read_table
    next(f)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence
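The traceback shows `read_table` failing on `next(f)` near the start of `countryInfo.txt` with the 'gbk' codec, which is Python's default text encoding on a Chinese-locale Windows system, so the data file appears to be opened without an explicit encoding. The byte 0xBF at position 2 could be the last byte of a UTF-8 byte-order mark (EF BB BF) or part of any other non-ASCII UTF-8 sequence; the message alone does not say which. Reading with an explicit encoding removes the dependence on the locale. The sketch below is a hypothetical `read_table` modeled on the call in the traceback (`usecols`, `skip`); it is not geotext's actual code. `encoding="utf-8-sig"` decodes UTF-8 and drops a leading BOM if one is present.

```python
def read_table(path, usecols, skip=0, sep="\t"):
    """Hypothetical reader: return selected columns of a tab-separated file.

    The explicit encoding makes the result independent of the Windows
    code page (e.g. gbk); "utf-8-sig" also strips a leading byte-order mark.
    """
    rows = []
    with open(path, encoding="utf-8-sig") as f:
        for _ in range(skip):
            next(f)  # skip header lines, mirroring the call in the traceback
        for line in f:
            fields = line.rstrip("\n").split(sep)
            rows.append(tuple(fields[i] for i in usecols))
    return rows
```

On Python 3.7 and later, setting the environment variable `PYTHONUTF8=1` (UTF-8 mode) also makes `open()` default to UTF-8 without code changes; the traceback shows Python 3.6, where that mode is not available.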
