reconcile-csv

An OpenRefine reconciliation service that works from CSV files.

Introduction

Someone just handed you some datasets they claim are related, and you should be able to mash them up easily. No problem, you're the data wizard. Except there is no clear unique identifier you could use to join the datasets. All you know is that if several columns hold the same values, the rows should describe the same thing. Add typos and misspellings and there you go: a long night ahead.

Reconcile-CSV aims to ease your pain: it acts as an OpenRefine reconciliation service and performs fuzzy matching to identify related entries.
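To give a feel for what fuzzy matching means here, the sketch below scores two strings with a Dice coefficient over character bigrams, so that small typos still produce a high score. It is an illustration only: the function names are hypothetical, and this text does not state which similarity measure Reconcile-CSV uses internally.

```python
from collections import Counter


def bigrams(text):
    """Return the multiset of adjacent character pairs in a normalized string."""
    s = text.strip().lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a, b):
    """Dice coefficient over character bigrams, between 0.0 and 1.0."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both strings are shorter than two characters: fall back to equality.
        return 1.0 if a.strip().lower() == b.strip().lower() else 0.0
    # Counter & Counter keeps the minimum count of each shared bigram.
    overlap = sum((ba & bb).values())
    return 2.0 * overlap / total


def best_match(query, candidates):
    """Return (candidate, score) for the highest-scoring candidate."""
    scored = ((c, dice_similarity(query, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])
```

With this scoring, a misspelled name such as "Open Knowledge Fundation" still ranks "Open Knowledge Foundation" above unrelated candidates, which is the behaviour a reconciliation service needs.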

Usage

Make sure the CSV file has a column with unique IDs; these are the values a match will return.
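If the file has no ID column yet, one can be added before starting the service. The following is a minimal sketch, assuming a CSV with a header row; the function name and the default column name "id" are hypothetical and can be anything you like.

```python
import csv
import io


def add_id_column(src, dst, id_column="id"):
    """Copy CSV rows from src to dst, prepending a sequential ID column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow([id_column] + header)
    # Number the data rows starting at 1.
    for n, row in enumerate(reader, start=1):
        writer.writerow([str(n)] + row)


if __name__ == "__main__":
    # In-memory demo; for real files, open them with newline="".
    source = io.StringIO("name,city\nAnna,Wien\nBob,Graz\n")
    target = io.StringIO()
    add_id_column(source, target)
    print(target.getvalue(), end="")
```

Sequential integers are enough here because the IDs only need to be unique within the one file.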

Pre-compiled:

java -Xmx2g -jar reconcile-csv-1.0.1-SNAPSHOT-standalone.jar <file> <primary search column> <column with IDs>

With Leiningen:

lein run <file> <primary search column> <column with IDs>

Then add http://localhost:8000/reconcile as a reconciliation service in OpenRefine. You can add more columns to the match through the reconciliation interface in OpenRefine.
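To check the running service from a script rather than from OpenRefine, something like the sketch below may work. It assumes the service accepts the `queries` parameter of the standard reconciliation API over GET and answers with `result` lists whose entries carry `id` and `score` fields; neither is confirmed by the notes here, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE_URL = "http://localhost:8000/reconcile"  # address from the usage notes


def build_query_url(names, service_url=SERVICE_URL):
    """Build a batch query URL with keys q0, q1, ... (assumed request shape)."""
    queries = {"q%d" % i: {"query": name} for i, name in enumerate(names)}
    return service_url + "?" + urlencode({"queries": json.dumps(queries)})


def top_ids(response):
    """Map each query key to the id of its best-scored candidate, or None."""
    best_by_key = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        best = max(results, key=lambda r: r.get("score", 0), default=None)
        best_by_key[key] = best["id"] if best else None
    return best_by_key


def reconcile(names, service_url=SERVICE_URL):
    """Send the queries to a running service and return the best id per query."""
    with urlopen(build_query_url(names, service_url)) as resp:
        return top_ids(json.load(resp))
```

Calling `reconcile(["some name"])` only works while the service is running; `build_query_url` and `top_ids` can be tried without it.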

Reconcile away!

Then use:

cell.recon.match.id

to get the ID from the match.

License

Copyright © 2013 Michael Bauer, Open Knowledge Foundation

Distributed under the BSD 2-Clause license. See LICENSE for details.
