
topomatch's People

Contributors

famuvie

topomatch's Issues

Include alternative matching algorithms

Although the local-alignment algorithm currently in use generally outperforms alternatives such as fuzzy matching or distance-based methods, it sometimes fails to match fairly obvious terms.

Implement other competing methods, to be used either as alternatives or simultaneously. If used simultaneously, the best guess from each method can be presented to the user as candidates for the manual fixes.
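A minimal sketch of the "simultaneous" option, in Python for illustration only. It does not reproduce the package's local-alignment scorer. The two scorers (`similarity_ratio`, `edit_similarity`) and the `best_guesses` helper are hypothetical names standing in for whichever competing methods end up being implemented.

```python
from difflib import SequenceMatcher


def similarity_ratio(a: str, b: str) -> float:
    """Fuzzy score in [0, 1] based on difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to a similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    longest = max(len(a), len(b)) or 1
    return 1.0 - prev[-1] / longest


# Hypothetical registry: one entry per competing method.
METHODS = {"ratio": similarity_ratio, "edit": edit_similarity}


def best_guesses(term, candidates):
    """Return the top-scoring candidate from each method, keyed by method name."""
    return {
        name: max(candidates, key=lambda c: score(term, c))
        for name, score in METHODS.items()
    }
```

When the methods disagree, the distinct values of the returned dictionary are the alternatives to show the user.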

Transliteration on systems with extended ASCII encodings

I'm currently transliterating from UTF-8 to ASCII to standardise toponyms.
However, on Windows systems that use an extended ASCII encoding (Latin-1/ISO-8859-1 or similar), this has caused some problems.

Test this and implement a more robust approach that takes into account the current locale.
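One possible direction, sketched in Python for illustration only: decode the input with an explicitly stated encoding instead of relying on the platform default, and only then transliterate. The function names `to_ascii` and `standardise` are hypothetical, and the package's actual transliteration step may work differently.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping non-ASCII marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def standardise(raw, encoding="utf-8"):
    """Decode bytes with an explicit encoding first, then transliterate."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return to_ascii(text).strip().lower()
```

Characters with no Unicode decomposition (for example "ø" or "ß") are silently dropped by this approach, so a robust implementation would need an explicit substitution table for them.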

Function to interactively build fixes

Given a topomatch object, write a function that walks the user interactively through all the unmatched toponyms, presenting the candidate terms ordered by score and letting the user pick the right choice (or NA for no match).

Return a named vector with fixes suitable to be included as the fixes argument of transcribe().
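The walk-through could look roughly like the following, sketched in Python for illustration only. The input structure (a mapping from each unmatched term to scored candidates) and the `build_fixes` name are assumptions, not the layout of the real topomatch object. `None` stands in for NA, and the returned dictionary plays the role of the named vector.

```python
def build_fixes(unmatched, ask=input):
    """Walk through unmatched terms and collect the user's choices.

    `unmatched` maps each term to a list of (candidate, score) pairs.
    `ask` is any callable that takes a prompt and returns the reply text.
    """
    fixes = {}
    for term, scored in unmatched.items():
        # Present candidates from best to worst score.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        print(f"\nUnmatched: {term}")
        for i, (candidate, score) in enumerate(ranked, start=1):
            print(f"  {i}. {candidate} (score {score:.2f})")
        reply = ask("Pick a number, or press Enter for no match: ").strip()
        if reply.isdecimal() and 1 <= int(reply) <= len(ranked):
            fixes[term] = ranked[int(reply) - 1][0]
        else:
            fixes[term] = None  # stands in for NA (no match)
    return fixes
```

Passing the prompt function as an argument keeps the loop testable without a live terminal.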

Handle hierarchical layers of toponyms

Use case: province and district names both need to be matched between two sources.

  1. Match province names
  2. Match district names within provinces (this functionality is missing).

Idea: add an argument to topomatch() that performs the matching within corresponding groups, given two vectors of corresponding province names.
This is a bit fragile. Think about it.
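To make the idea concrete, here is a rough sketch of group-wise matching, in Python for illustration only. It uses `difflib.get_close_matches` as a stand-in scorer, not the package's local-alignment algorithm. The function name, the dictionary-based inputs and the place names in the usage example are all made up.

```python
from difflib import get_close_matches


def match_within_groups(districts_a, districts_b, province_map):
    """Match district names only inside corresponding provinces.

    districts_a / districts_b: dict mapping province -> list of district names.
    province_map: dict mapping provinces of source A to provinces of source B
                  (the outcome of step 1).
    Returns {(province_a, district_a): district_b or None}.
    """
    result = {}
    for prov_a, names in districts_a.items():
        prov_b = province_map.get(prov_a)
        pool = districts_b.get(prov_b, [])  # empty if the province is unmatched
        for name in names:
            hits = get_close_matches(name, pool, n=1, cutoff=0.6)
            result[(prov_a, name)] = hits[0] if hits else None
    return result


# Usage with made-up names: the same district name occurs in two provinces.
source_a = {"North": ["San Pedro"], "South": ["San Pedro"]}
source_b = {"Norte": ["S. Pedro", "Rio Alto"], "Sur": ["San Pedro Sur"]}
matches = match_within_groups(source_a, source_b, {"North": "Norte", "South": "Sur"})
```

The fragility shows up here too: any province left unmatched in step 1 leaves all of its districts without candidates.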
