pleiades-dump's Introduction

Pleiades Data Dumps

This package dumps abridged Pleiades gazetteer content to comma-separated values (CSV) files on a recurring basis. More detailed documentation can be found in docs/README.txt.

pleiades-dump's People

Contributors

alecpm, cguardia, davisagli, paregorios, sgillies

pleiades-dump's Issues

featureType or featureTypes in location?

The docs here say that the location table has a field called featureTypes, but in the most recent version of the dumps the column seems to be named featureType. Should the docs be updated accordingly? Or should the CSV files be changed instead, since the field holds comma-separated values and the places table does have a column called featureTypes?
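Until the docs and the dump agree, a script that reads the file can accept either spelling. The following is only a minimal sketch: the helper name feature_type_column and the sample rows are invented for illustration, and the only things taken from this issue are the two column names.

```python
import csv
import io

def feature_type_column(fieldnames):
    """Return whichever feature-type column name is present in the header."""
    for candidate in ("featureTypes", "featureType"):
        if candidate in fieldnames:
            return candidate
    raise KeyError("no featureType/featureTypes column in header")

# Invented sample standing in for a locations dump file.
sample = io.StringIO("pid,featureType\r\n/places/1,temple\r\n")
reader = csv.DictReader(sample)
column = feature_type_column(reader.fieldnames)
rows = [row[column] for row in reader]
```

This only works around the mismatch on the reading side; it does not settle which name the dump should use.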

Add coordinate precision to dumps

Since dumps come from the catalog, this will require cataloging of the precision value. It also raises the question of whether positional-accuracy precision can be merged with our other sense of precision (precise vs. rough).

Remove \r\n from inside fields?

Some records in the dump have newlines inside fields. This is fine when you load the whole CSV as a spreadsheet in order to work on it, but it means that many line-oriented command-line tools will have trouble with the file. For example, I was using grep to select entries that had a featureType of temple, and the newlines meant that those entries got split mid-field.

For example, Pleiades place 786059 has a newline in the description field, which shows up as \r\n in the JSON.

Is there any way to represent the newline differently inside fields in the dump, to allow use of such tools? (It's not clear to me that the newlines are needed or desirable anyway.)
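In the meantime, one possible workaround on the consumer side is to pass the dump through a CSV-aware filter that replaces line breaks inside fields before handing the file to line-oriented tools. The sketch below uses Python's standard csv module; the function name flatten_newlines, the choice of a single space as the replacement, and the sample row are all assumptions for illustration, not part of this package.

```python
import csv
import io

def flatten_newlines(src, dst, replacement=" "):
    """Copy CSV rows from src to dst, replacing line breaks inside fields."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
                 .replace("\r", replacement)
                 .replace("\n", replacement)
            for field in row
        ]
        writer.writerow(cleaned)

# Invented sample with a \r\n inside a quoted description field.
sample = io.StringIO(
    'id,description,featureType\r\n'
    '1,"A sanctuary\r\nwith later additions",temple\r\n'
)
out = io.StringIO()
flatten_newlines(sample, out)
result = out.getvalue()
```

With real files, open both the input and the output with newline="" as the csv module documentation recommends, so that embedded line breaks reach the reader unchanged. After this pass each record occupies exactly one line, so grep and similar tools see whole records.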

Which columns to use in order to join the 3 data sets?

Hi

Can you please indicate how the 3 data sets are to be joined with each other (that is, which column names to use), and whether it is normal to expect duplicate records when the data sets are merged?

I was merging the locations and names data sets into the places data set using pid = path, but I end up getting duplicate rows.

Thanks a lot
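One likely reason for the repeated rows, sketched here under assumptions rather than confirmed against the dump documentation: if a place has more than one name or more than one location, a flat join on pid = path produces one row per combination, so the place columns repeat. The rows below are invented; only the pid and path column names come from the question above.

```python
from collections import defaultdict

# Invented rows; only the pid and path column names come from the question.
places = [{"path": "/places/1", "title": "Example"}]
names = [
    {"pid": "/places/1", "name": "Alpha"},
    {"pid": "/places/1", "name": "Beta"},
]
locations = [
    {"pid": "/places/1", "location": "Site A"},
    {"pid": "/places/1", "location": "Site B"},
]

def group_by(rows, key):
    """Index a list of row dicts by the value of one column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped

names_by_pid = group_by(names, "pid")
locations_by_pid = group_by(locations, "pid")

# Flat join: one output row per (place, name, location) combination.
flat = [
    {**place, "name": n["name"], "location": loc["location"]}
    for place in places
    for n in names_by_pid.get(place["path"], [])
    for loc in locations_by_pid.get(place["path"], [])
]

# Nested alternative: one record per place, related rows kept as lists.
nested = [
    {
        **place,
        "names": [n["name"] for n in names_by_pid.get(place["path"], [])],
        "locations": [l["location"] for l in locations_by_pid.get(place["path"], [])],
    }
    for place in places
]
```

Here the flat join yields four rows for a single place (two names times two locations), while the nested form keeps one record per place. If one row per place is what you need, joining names and locations to places separately, or aggregating them first, avoids the multiplication.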
