
Comments (3)

charlieg commented on July 19, 2024

Very strange. The older version of CLAVIN running on the web demo (circa 2012?) handles this input just fine, but the most recent version I've got yields the same incorrect output you're seeing. It's either a problem in the gazetteer data, or the result of some algorithmic tweaks.

from cliff-annotator.

rahulbot commented on July 19, 2024

Our algorithm fails on this case. It picks Bangor, NIR, GB because it has the largest population of the candidates (60385, versus 33039 for Bangor, ME, US). Since the US isn't mentioned, passes 0 and 1 pick nothing, so pass 2 of our heuristic has no country to work from and doesn't use Maine to steer the choice. One possible fix would be to grab exact-match states right after picking countries. We'd have to run that against our test corpora to see whether it improves or reduces accuracy. The debug log for this input:

10:41:54.530 [main] DEBUG o.m.c.p.d.HeuristicDisambiguationStrategy - Starting with 2 lists to do:
10:41:54.530 [main] DEBUG o.m.c.p.d.HeuristicDisambiguationStrategy -   Location: Bangor@14
10:41:54.530 [main] DEBUG o.m.c.p.d.GenericPass -     2656396 Bangor, NIR, GB / 1.0 / 60385 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     4957280 Bangor, ME, US / 1.0 / 33039 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     2656397 Bangor, WLS, GB / 1.0 / 15449 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     5178996 Bangor, PA, US / 1.0 / 5273 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     4984863 Bangor, MI, US / 1.0 / 1885 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     5244626 Bangor, WI, US / 1.0 / 1459 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     6614461 Bangor, A2, FR / 1.0 / 926 / A ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     3035146 Bangor, A2, FR / 1.0 / 782 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     5326001 Bangor, CA, US / 1.0 / 646 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     2966367 Bangor, C, IE / 1.0 / 293 / P ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.HeuristicDisambiguationStrategy -   Location: Maine@22
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     4971068 Maine, ME, US / 1.0 / 1325518 / A ( isExactMatch=true )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     2996665 Maine-de-Boixe, B7, FR / 1.0 / 397 / P ( isExactMatch=false )
10:41:54.531 [main] DEBUG o.m.c.p.d.GenericPass -     6064408 Maine, 10, CA / 1.0 / 0 / L ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2996667 Maine, B5, FR / 1.0 / 0 / H ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2996668 Maine, B5, FR / 1.0 / 0 / H ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2996669 Maine, B5, FR / 1.0 / 0 / H ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2996670 Maine, B7, FR / 1.0 / 0 / H ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2996671 Maine, B5, FR / 1.0 / 0 / L ( isExactMatch=true )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2441450 Mainé, 04, NE / 1.0 / 0 / P ( isExactMatch=false )
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2428692 Maïné, 05, TD / 1.0 / 0 / P ( isExactMatch=false )
10:41:54.532 [main] DEBUG o.m.c.p.d.MultiplePassChain - Pass 0: Pick large areas
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass - Still have 2 lists to do
10:41:54.532 [main] DEBUG o.m.c.p.d.MultiplePassChain - Pass 1: Pick countries that might not be an exact match
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass - Still have 2 lists to do
10:41:54.532 [main] DEBUG o.m.c.p.d.MultiplePassChain - Pass 2: Looking for top populated exact match in same countries as best results so far
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -   PICKED: Bangor@14
10:41:54.532 [main] DEBUG o.m.c.p.d.GenericPass -     2656396 Bangor, NIR, GB / 1.0 / 60385 / P ( isExactMatch=true )
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass - Still have 1 lists to do
10:41:54.533 [main] DEBUG o.m.c.p.d.MultiplePassChain - Pass 3: Pick the top Admin Region or Populated Place remaining that is in a country we found already
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -   PICKED: Bangor@14
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -     2656396 Bangor, NIR, GB / 1.0 / 60385 / P ( isExactMatch=true )
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass - Still have 1 lists to do
10:41:54.533 [main] DEBUG o.m.c.p.d.MultiplePassChain - Pass 4: Pick the top Admin Region or Populated Place remaining
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -   PICKED: Bangor@14
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -     2656396 Bangor, NIR, GB / 1.0 / 60385 / P ( isExactMatch=true )
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -   PICKED: Maine@22
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass -     4971068 Maine, ME, US / 1.0 / 1325518 / A ( isExactMatch=true )
10:41:54.533 [main] DEBUG o.m.c.p.d.GenericPass - Still have 0 lists to do
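
A rough sketch of the "grab exact-match states first" idea, using a few of the candidates from the log above. This is not CLIFF's code: `Candidate`, `pick_states`, and `pick_places` are hypothetical names, and the `feature_code` values are assumptions, since the log prints only the feature class (A, P, ...). The feature code matters because Bangor also has an exact-match class A entry in France, so filtering on class A alone would not be enough to find states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    geoname_id: int
    name: str
    admin1: str
    country: str
    population: int
    feature_code: str  # assumed values; the log shows only the feature class
    exact: bool


def pick_states(mentions):
    """Hypothetical pass: resolve mentions that exactly match a first-level admin region."""
    picked = {}
    for mention, candidates in mentions.items():
        states = [c for c in candidates if c.exact and c.feature_code == "ADM1"]
        if states:
            picked[mention] = max(states, key=lambda c: c.population)
    return picked


def pick_places(mentions, picked):
    """Resolve the remaining mentions, preferring exact matches inside a picked state."""
    regions = {(c.country, c.admin1) for c in picked.values()}
    result = dict(picked)
    for mention, candidates in mentions.items():
        if mention in result:
            continue
        exact = [c for c in candidates if c.exact]
        in_region = [c for c in exact if (c.country, c.admin1) in regions]
        # Fall back to plain "top populated" when no picked state helps.
        pool = in_region or exact or candidates
        if pool:
            result[mention] = max(pool, key=lambda c: c.population)
    return result


# A subset of the candidates from the log; feature codes are illustrative.
mentions = {
    "Bangor@14": [
        Candidate(2656396, "Bangor", "NIR", "GB", 60385, "PPL", True),
        Candidate(4957280, "Bangor", "ME", "US", 33039, "PPL", True),
        Candidate(6614461, "Bangor", "A2", "FR", 926, "ADM4", True),
    ],
    "Maine@22": [
        Candidate(4971068, "Maine", "ME", "US", 1325518, "ADM1", True),
        Candidate(2996665, "Maine-de-Boixe", "B7", "FR", 397, "PPL", False),
    ],
}

resolved = pick_places(mentions, pick_states(mentions))
without_state_pass = pick_places(mentions, {})
print(resolved["Bangor@14"].geoname_id)            # 4957280 (Bangor, ME, US)
print(without_state_pass["Bangor@14"].geoname_id)  # 2656396 (Bangor, NIR, GB)
```

With the state pass, Maine is resolved first and Bangor is then restricted to candidates in ME, US. Skipping it reproduces the population-only pick seen in the log. Whether this helps overall would still need checking against the test corpora.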


rahulbot commented on July 19, 2024

Fixed in release v1.3.0.
