csvy.github.io's Introduction

csvy.org

This repo contains the specification for YAML front matter in CSV files, published at http://csvy.org.

csvy.github.io's People

Contributors

charlesnepote, hadley, jcolomb, jrovegno, leeper

csvy.github.io's Issues

(New) Ruby CsvReader Library / Gem Supports CSV with meta data block (front matter) in YAML (CSVY) "out-of-the-box"

Hello,
In the latest update (v1.1) of the Ruby csvreader library / gem (see https://github.com/csvreader/csvreader), I've added support for CSV with a meta data block (front matter) in YAML (CSVY) "out-of-the-box". Yes, it's turned on by default in the CSV v1.0 "The Right Way" format / flavor. See https://github.com/csvreader/csvreader/blob/master/test/test_parser_meta.rb for (test) examples. Keep up the great work with CSVY. Cheers. Prost.

PS: Here's the code snippet:

def test_parse
  records = [["a", "b", "c"],
             ["1", "2", "3"]]

  assert_equal records, parser.parse( <<TXT )
# with meta data
## see https://blog.datacite.org/using-yaml-frontmatter-with-csv/
---
columns:
- title: Purchase Date
  type: date
- title: Item
  type: string
- title: Amount (€)
  type: float
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = { "columns"=>
             [{"title"=>"Purchase Date", "type"=>"date"},
              {"title"=>"Item",          "type"=>"string"},
              {"title"=>"Amount (€)",    "type"=>"float"}]
         }
  assert_equal meta, parser.meta


  assert_equal records, parser.parse( <<TXT )
# with (empty) meta data
---
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = {}
  assert_equal meta, parser.meta



  assert_equal records, parser.parse( <<TXT )
# without meta data
a,b,c
1,2,3
TXT

  assert_nil parser.meta
end
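
For readers without the gem, here is a minimal sketch of the same idea using only Ruby's standard `yaml` and `csv` libraries. It is not csvreader's implementation: the helper name `split_csvy` is hypothetical, and skipping leading `#` lines simply mirrors the test input above.

```ruby
require "yaml"
require "csv"

# Hypothetical helper: split CSVY text into [meta, rows].
# meta is nil when the text has no front matter block.
def split_csvy(text)
  lines = text.lines
  # Skip leading comment lines, as in the test input above.
  lines.shift while lines.first&.start_with?("#")
  return [nil, CSV.parse(lines.join)] unless lines.first&.strip == "---"

  lines.shift # drop the opening "---"
  closing = lines.index { |l| l.strip == "---" }
  raise ArgumentError, "unterminated front matter" if closing.nil?

  # An empty block parses to nil (or false), so fall back to {}.
  meta = YAML.safe_load(lines[0...closing].join) || {}
  [meta, CSV.parse(lines[(closing + 1)..].join)]
end

meta, rows = split_csvy("---\ncolumns:\n- title: Item\n  type: string\n---\na,b,c\n1,2,3\n")
pp meta
pp rows
```

The three cases in the test (meta data, empty meta data, no meta data) map to a hash, an empty hash, and `nil` respectively in this sketch.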

Standard in the metadata format

I am quite excited: metadata one can actually read: sounds cool.

However, how could we bring some standards to the metadata format itself? In the example, the date format is not given; shouldn't we try to add some structure? Maybe by versioning the metadata format itself? Going semantic from the start?

Sorry if the question is stupid, I am no specialist...

#csvy already taken: how to tweet this?

It is not as stupid as calling an R package "twitteR", but it is still annoying. The options: change the name, find a different hashtag which we make official here, or get one million tweets tagged #csvy, so that the other users of the hashtag change theirs?

(OK, now I go to bed...)

Comment block

It's common (though not specified) for lines beginning with # at the start of a CSV file to be treated as comments (with a "citation needed" caveat - I'm not sure which parsers, if any, do this by default).

If the YAML header were to have a '#' at the start of each line, might this be more compatible with parsers that aren't expecting a YAML block?
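
A minimal sketch of how a reader could handle such a commented header, assuming every header line starts with exactly one `#` and the data starts at the first line without one. The layout and the helper name `read_commented_header` are assumptions for illustration, not part of the CSVY spec.

```ruby
require "yaml"

# Hypothetical helper: read a YAML header whose lines are all prefixed with "#".
# Returns [meta, body]; meta is nil when the text has no commented header.
def read_commented_header(text)
  lines  = text.lines
  header = lines.take_while { |l| l.start_with?("#") }
  body   = lines.drop(header.length).join
  return [nil, body] if header.empty?

  # Strip only the "#" so the YAML indentation survives,
  # then drop the "---" delimiter lines before parsing.
  yaml = header.map { |l| l.sub(/\A#/, "") }
               .reject { |l| l.strip == "---" }
               .join
  [YAML.safe_load(yaml) || {}, body]
end

text = "#---\n#name: my-dataset\n#fields:\n#  - name: var1\n#    type: string\n#---\nvar1,var2\nA,1\n"
meta, body = read_commented_header(text)
pp meta
puts body
```

A parser that already skips `#` lines would see only the CSV body, which is the compatibility benefit the suggestion is after.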

a completely related strategy to make it human writable

Dear all,

I think finding a way to pool metadata and data into one file is great and the way to go. I love the idea of csvy. The thing is, if the metadata needs a different program to be written, it will be done only by techies.

When I saw the new read_csv function from readr, I got an idea... My proposal would be the following: take everything that is good in YAML, but transform its format to be CSV. This would be the example:

---,---
name, my-dataset
description, "this is an example of the new csv format, including metadata on top."
author, "Julien Colomb"

fields, name, title, type, description, constraints
fields,var1,variable 1, string, explaining var1, required
fields, var2, variable 2, integer
fields, var3, variable 3, number

---,---
var1,var2,var3
A,1,2.5
B,3,4.3

Why I think it could be cool:

  • Scan the first 8 characters: if the first 3 are "-", the fourth one is the delimiter and the 8th is the line break. This would make the CSV file readable whatever the delimiter is, and would make compatibility issues (with Excel) obsolete. In addition, if the first 3 characters are not "-", there is no metadata and the program can (try to) read the file as a normal CSV.

  • You can read and write the metadata in your normal spreadsheet program.

  • If templates are made for field names, it is easy for any user to enter metadata: you just copy and paste-transpose the headers and fill in the table.
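
The 8-character scan from the first point could be sketched as follows. This is only an illustration of the proposal: the function name `detect_delimiter` is hypothetical, and accepting `\r` as well as `\n` in the 8th position is my own assumption.

```ruby
# Hypothetical helper: detect the delimiter from a "---<delim>---" marker line.
# Returns nil when the text does not start with such a marker (plain CSV).
def detect_delimiter(text)
  head = text[0, 8]
  return nil unless head && head.length == 8
  # Characters 1-3 and 5-7 must be dashes.
  return nil unless head[0, 3] == "---" && head[4, 3] == "---"
  # The 8th character must be a line break.
  return nil unless head[7] == "\n" || head[7] == "\r"

  head[3] # the 4th character is the delimiter
end

p detect_delimiter("---,---\nname, my-dataset\n")
p detect_delimiter("---;---\nname; my-dataset\n")
p detect_delimiter("var1,var2,var3\nA,1,2.5\n")
```

A reader would call this first, then pass the detected delimiter to its normal CSV parser for both the metadata rows and the data rows.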

What do you think?

This issue is licenced CC0.

Validation rules for header/value consistency?

Hello,
I am curious if there is a list of requirements for consistency between the header description and the data values?

For example:

  • Does the number of described variables have to match between the header and the CSV section?
  • Do the variable names have to match between header and CSV section?
  • Do variable names have to be distinct from each other?
  • Do variable names have to begin with an alphabetic character, or are these valid variable names: "?Strange_var", "101.5s"
  • Other rules?
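
To make the questions concrete, here is a minimal sketch of what such checks might look like. These are candidate rules taken from the list above, not requirements from the CSVY spec, and the function name `check_consistency` is hypothetical.

```ruby
# Hypothetical checks comparing the field names described in the YAML header
# with the column names in the CSV header row. Returns a list of problems.
def check_consistency(field_names, csv_header)
  problems = []
  if field_names.length != csv_header.length
    problems << "field count differs: #{field_names.length} described, #{csv_header.length} in CSV"
  end
  missing = field_names - csv_header
  problems << "described but not in CSV: #{missing.join(', ')}" unless missing.empty?
  extra = csv_header - field_names
  problems << "in CSV but not described: #{extra.join(', ')}" unless extra.empty?
  dupes = csv_header.group_by(&:itself).select { |_, v| v.length > 1 }.keys
  problems << "duplicate CSV column names: #{dupes.join(', ')}" unless dupes.empty?
  # Candidate naming rule: names must begin with an ASCII letter.
  odd = csv_header.reject { |n| n.match?(/\A[A-Za-z]/) }
  problems << "names not starting with a letter: #{odd.join(', ')}" unless odd.empty?
  problems
end

puts check_consistency(%w[var1 var2], ["var1", "?Strange_var", "101.5s"])
```

Whether any of these should be errors, warnings, or allowed is exactly what the question asks the spec to decide.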

On a related note, are there any "best practices" for standard field names in the YAML section? For example, how would someone best describe the units and other standard metadata for the list of variables in the file? If there were such standards, then automatically reading in and using the metadata would be much more straightforward. Otherwise, it seems that all of that potentially useful metadata is not guaranteed to be in any predictable form, and automated readers will be throwing darts to see if it's there.

Also, are there plans for checking variable names and/or units against standards, such as CF-conventions and SI units?

Thanks!

Limit csvy to only one data resource per file

@leeper pointed out: I don't think I would ever encourage creating a single file with multiple tables in it. So, I'd say the new spec looks good except I would limit it to one table per file. As such, I think you could use the spec as currently stated on the webpage AND a simplified version that just contains the dialect element and the fields sub-element of resources.

Reference leeper/csvy/issues/13

WIP in branch bug-19

labview extension

I just realised that the data I got out of LabVIEW were already of this sort: the first 21 lines were metadata, the rest was data (as CSV).
It would probably be good to see whether the LabVIEW metadata could be made compatible.

I will just upload an example, pasting it here is not working because of "markdown" translations...
