cirad-astre / topomatch
Match toponyms by similarity
While the local-alignment algorithm currently used seems to perform better than alternatives such as fuzzy-matching or distance-based methods in most cases, sometimes it fails to match pretty obvious terms.
Implement competing methods that can be used either instead of it or alongside it. If used alongside, the best guesses from each method can be presented to the user as alternatives when making manual fixes.
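As a rough illustration of one such competing method, the sketch below ranks candidates by normalised Levenshtein distance using utils::adist() from base R. The helper name best_by_distance() and its arguments are hypothetical and not part of topomatch; it only shows how a second method could produce its own list of best guesses.

```r
# Hypothetical helper, not part of topomatch: rank candidate toponyms
# by a similarity score derived from the Levenshtein distance.
best_by_distance <- function(x, candidates, n = 3) {
  # Edit-distance matrix: one row per term in x, one column per candidate
  d <- utils::adist(x, candidates, ignore.case = TRUE)
  # Normalise by the longer of the two strings so scores fall in [0, 1]
  len <- outer(nchar(x), nchar(candidates), pmax)
  score <- 1 - d / len
  # For each term, keep the n highest-scoring candidates
  lapply(seq_along(x), function(i) {
    ord <- order(score[i, ], decreasing = TRUE)
    head(setNames(score[i, ord], candidates[ord]), n)
  })
}
```

The top guesses from this function could then be shown next to those of the local-alignment method, leaving the final choice to the user.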
I'm currently performing transliteration from UTF-8 to ASCII for standardising toponyms.
However, on Windows systems that use an extended ASCII encoding (Latin-1/ISO-8859-1 or similar), this has caused some problems.
Test this and implement a more robust approach that takes into account the current locale.
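One possible direction, sketched under assumptions: normalise the input to UTF-8 with enc2utf8(), which converts from the declared or native encoding, before transliterating with iconv(). The function name to_ascii() is hypothetical. The exact output of "ASCII//TRANSLIT" depends on the platform's iconv implementation, so this would still need testing on Windows.

```r
# Hypothetical helper, not part of topomatch.
to_ascii <- function(x) {
  # Convert from the declared (or native, locale-dependent) encoding to UTF-8,
  # so the `from` argument below is correct whatever the current locale is
  x <- enc2utf8(as.character(x))
  # Transliterate to ASCII; characters that cannot be converted become "?"
  # instead of turning the whole string into NA
  iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "?")
}

# l10n_info() reports whether the current locale is UTF-8 or Latin-1,
# which may help when diagnosing platform-specific failures
str(l10n_info())
```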
Given a topomatch object, write a function that walks the user interactively through all the unmatched toponyms, presenting the candidate terms ordered by score and letting the user pick the right choice (or NA for no match).
Return a named vector with fixes suitable to be included as the fixes argument of transcribe().
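A minimal sketch of such a walk-through, using utils::menu() from base R. The input structure is an assumption: `candidates` is taken to be a named list with one element per unmatched toponym, each a named numeric vector of candidate scores. This is not the actual layout of a topomatch object, and fix_interactively() is a hypothetical name. The `choose` argument exists only so the prompt can be replaced in non-interactive sessions.

```r
# Hypothetical sketch, not part of topomatch.
fix_interactively <- function(candidates, choose = utils::menu) {
  fixes <- character(0)
  for (term in names(candidates)) {
    # Present candidates from highest to lowest score
    sc <- sort(candidates[[term]], decreasing = TRUE)
    opts <- c(sprintf("%s (%.2f)", names(sc), sc), "NA (no match)")
    pick <- choose(opts, title = sprintf("Match for '%s'?", term))
    if (pick == 0) break  # menu() returns 0 when the user exits
    # The last option records an explicit "no match"
    fixes[term] <- if (pick == length(opts)) NA_character_ else names(sc)[pick]
  }
  fixes  # named character vector: unmatched term -> chosen candidate
}
```

Exiting the menu early returns the fixes collected so far, so a long session does not have to be completed in one go.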
Use case: need to match both province and district names for two sources.
Idea: additional argument to topomatch() that performs matches by corresponding groups given two vectors of corresponding province names.
This is a bit fragile. Think about it.
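To make the idea concrete, here is a sketch of matching within groups. It assumes the province names in `gx` and `gy` have already been reconciled so that equal strings denote the same province, and it uses utils::adist() as a stand-in for the package's own matching. match_by_group() and its arguments are hypothetical.

```r
# Hypothetical sketch, not part of topomatch.
# x, y: district names from the two sources; gx, gy: their province names.
match_by_group <- function(x, gx, y, gy) {
  stopifnot(length(x) == length(gx), length(y) == length(gy))
  out <- rep(NA_character_, length(x))
  for (g in unique(gx)) {
    ix <- which(gx == g)
    cand <- y[which(gy == g)]
    if (length(cand) == 0) next  # province absent from the second source
    # Restrict the comparison to candidates within the same province
    d <- utils::adist(x[ix], cand, ignore.case = TRUE)
    out[ix] <- cand[apply(d, 1, which.min)]
  }
  setNames(out, x)
}
```

The fragility shows up directly: any province whose name differs between the two sources yields no candidates, so all of its districts come back as NA.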