csvy / csvy.github.io

CSVY: YAML front matter for the CSV file format

Home Page: http://csvy.org

License: Other

HTML 12.76% JavaScript 4.42% CSS 82.81%

csvy.github.io's Introduction

csvy.org

This repo contains the specification of YAML front matter for the CSV file format, published at http://csvy.org.

csvy.github.io's People

jcolomb kwcityhk jrovegno charlesnepote leeper zhangangus pushpak0209

csvy.github.io's Issues

(New) Ruby CsvReader Library / Gem Supports CSV with meta data block (front matter) in YAML (CSVY) "out-of-the-box"

Hello,
In the latest update, v1.1, of the Ruby csvreader library / gem (see https://github.com/csvreader/csvreader), I've added support for CSV with a meta data block (front matter) in YAML (CSVY) "out-of-the-box". Yes, it's turned on by default in the CSV v1.0 "The Right Way" format / flavor. See https://github.com/csvreader/csvreader/blob/master/test/test_parser_meta.rb for (test) examples. Keep up the great work with CSVY. Cheers. Prost.

PS: Here's the code snippet:

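# NOTE: `parser` is not defined in this snippet; it is assumed to be a helper
# provided by the enclosing test class (see the linked test file).
def test_parse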
  records = [["a", "b", "c"],
             ["1", "2", "3"]]

  assert_equal records, parser.parse( <<TXT )
# with meta data
## see https://blog.datacite.org/using-yaml-frontmatter-with-csv/
---
columns:
- title: Purchase Date
  type: date
- title: Item
  type: string
- title: Amount (€)
  type: float
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = { "columns"=>
             [{"title"=>"Purchase Date", "type"=>"date"},
              {"title"=>"Item",          "type"=>"string"},
              {"title"=>"Amount (€)",    "type"=>"float"}]
         }
  assert_equal meta, parser.meta


  assert_equal records, parser.parse( <<TXT )
# with (empty) meta data
---
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = {}
  assert_equal meta, parser.meta



  assert_equal records, parser.parse( <<TXT )
# without meta data
a,b,c
1,2,3
TXT

  assert_nil parser.meta
end

Support ... to end YAML header

The YAML specification uses --- to start a (first or next) YAML document and ... to end a document.

YAML blocks in Pandoc Markdown start with --- and end with --- or ...

So YAML headers in CSVY should also start with --- and end with either --- or ...
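
A minimal sketch of what a reader accepting both closing markers could look like, using only Ruby's standard yaml and csv libraries. The name read_csvy is made up for illustration and is not taken from any existing CSVY implementation.

```ruby
require "yaml"
require "csv"

# Split CSVY text into [metadata, rows].
# The header must open with "---" on the first line and close with
# either "---" or "..." on a line of its own (the proposal in this issue).
def read_csvy(text)
  lines = text.lines
  # No opening marker: treat the whole input as plain CSV.
  return [nil, CSV.parse(text)] unless lines.first&.strip == "---"

  # Position of the first closing marker, counted from the second line.
  close = lines[1..].index { |line| ["---", "..."].include?(line.strip) }
  raise ArgumentError, "unterminated YAML header" if close.nil?

  yaml_text = lines[1, close].join
  csv_text  = lines[(close + 2)..].join
  # An empty header parses to nil/false, so fall back to an empty Hash.
  meta = YAML.safe_load(yaml_text) || {}
  [meta, CSV.parse(csv_text)]
end

meta, rows = read_csvy("---\nname: demo\n...\na,b\n1,2\n")
```

Here the header is closed with ... and the same call works unchanged when it is closed with --- instead.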

Standard in the metadata format

I am quite excited: metadata one can actually read: sounds cool.

However, how could we/you bring some standards into the metadata format itself? In the example, the date format is not given; shouldn't we try to organise this somehow? Maybe with versioning of the metadata format itself? Going semantic from the start?

Sorry if the question is stupid, I am no specialist...

http://csvy.org/ not working

I get:
ERR_NAME_NOT_RESOLVED.

Align with csvw spec

Closely related, but more complex and web-focused: http://w3c.github.io/csvw/metadata/

Explicit license for csvy format and website content

We need to know the legal conditions behind csvy usage and the website contents.

I would find Creative Commons BY 4.0 interesting.

#csvy already taken: how to tweet this?

It is not as bad as calling an R package "twitteR", but it is still annoying. Should we change the name, find a different hashtag which we will make official here, or get one million tweets with #csvy, so that the other users of the hashtag change theirs?

(ok now I go to bed...)

Inline Table Schema on csvy.org

I think the website would benefit from inlining more content rather than linking to it. I think that will aid adoption.

JSON Table Schema renamed to Table Schema

frictionlessdata/datapackage#334

http://specs.frictionlessdata.io/table-schema/

Comment block

It's common (though not specified) for lines beginning with # at the start of a CSV file to be treated as comments (with a "citation needed" caveat - I'm not sure which parsers, if any, do this by default).

If the YAML header were to have a '#' at the start of each line, this might be more compatible with parsers that aren't expecting a YAML block?
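
To make the idea concrete, here is a rough sketch of a reader for such a commented header. The convention it assumes (every header line starts with "#", optionally followed by one space, and the first line without "#" starts the data) is my own guess and not part of CSVY; read_commented_csvy is a made-up name.

```ruby
require "yaml"
require "csv"

# Read a CSV whose YAML header is carried in leading "#" comment lines.
# Assumed convention (not part of CSVY): every header line starts with "#",
# optionally followed by one space; the first non-"#" line starts the data.
def read_commented_csvy(text)
  lines  = text.lines
  header = lines.take_while { |line| line.start_with?("#") }
  body   = lines.drop(header.length)

  yaml_text = header
    .map { |line| line.sub(/\A# ?/, "") }                      # drop the comment prefix
    .reject { |line| ["---", "..."].include?(line.strip) }     # drop the YAML markers
    .join
  meta = YAML.safe_load(yaml_text) || {}
  [meta, CSV.parse(body.join)]
end

text = "# ---\n# name: demo\n# fields:\n#   - name: a\n# ---\na,b\n1,2\n"
meta, rows = read_commented_csvy(text)
```

One drawback of this sketch: ordinary leading comment lines that are not YAML would be fed to the YAML parser too, so a real format would need some way to tell the two apart.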

a completely related strategy to make it human writable

Dear all,

I think finding a way to pool metadata and data into one file is great and the way to go. I love the idea of csvy. The thing is, if the metadata needs a different program to be written, it will be done only by techies.

When I saw the new read_csv function from readr, I got an idea... My proposition would be the following, take everything that is good in YAML, but transform its format to be csv. This would be the example:

---,---
name, my-dataset
description, "this is an example of the new csv format, including metadata on top."
author, "Julien Colomb"

fields, name, title, type, description, constraints
fields,var1,variable 1, string, explaining var1, required
fields, var2, variable 2, integer
fields, var3, variable 3, number

---,---
var1,var2,var3
A,1,2.5
B,3,4.3

why I think it could be cool:

Scanning the first 8 characters is enough: if the first 3 are "-", the fourth is the delimiter and the 8th is the line break. This makes the CSV file readable whatever the delimiter is, and removes compatibility issues (with Excel). In addition, if the first 3 characters are not "-", there is no metadata and the program can (try to) read the file as a normal CSV.
You can read and write the metadata in your normal spreadsheet program.
If templates are made for field names, it is easy for any user to enter metadata: you just copy and paste-transpose the headers and fill in the table.
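
The first point can be sketched in a few lines of Ruby. The function name sniff_delimiter is hypothetical, and the sketch assumes "\n" line endings (a file with "\r\n" endings would need a slightly longer window).

```ruby
# Inspect the first 8 characters of a file, following the proposal above:
# characters 1-3 are "---", character 4 is the delimiter, characters 5-7
# are "---" again, and character 8 is the line break.
# Returns the delimiter, or nil when the file has no metadata block.
def sniff_delimiter(text)
  head = text[0, 8].to_s
  return nil unless head.length == 8
  return nil unless head[0, 3] == "---" && head[4, 3] == "---"
  return nil unless head[7] == "\n" # assumption: Unix line endings

  head[3]
end

comma     = sniff_delimiter("---,---\nname, my-dataset\n")
semicolon = sniff_delimiter("---;---\nname; my-dataset\n")
plain     = sniff_delimiter("var1,var2,var3\nA,1,2.5\n")
```

A reader would call this first and fall back to ordinary CSV parsing whenever it returns nil.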

What do you think?

This issue is licenced CC0.

More CSV Inline Meta Data Format Alternatives @ csv,specs

Hello, might be of interest to you (and yes, it includes front matter in YAML too). I've started a collection of inline meta data formats (that are supported by the csvreader library in ruby). For now the formats include: CSV in CSV, Attribute-Relation "Classic", Attribute-Relation "Inline" and, yes, front matter in YAML.
See https://github.com/csvspecs/csv-meta for all formats. Cheers. Prost.

Include CSV Dialect Description Format in csvy

Why not include CSV Dialect Description Format in csvy?

It can also be easily expressed in YAML syntax, can't it?
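
As a rough illustration, a dialect block in the YAML header could be mapped onto the options of Ruby's standard CSV library as below. The key names delimiter, quoteChar and header come from the CSV Dialect Description Format; nesting them under a "dialect" key in CSVY, and the helper name csv_options, are assumptions of this sketch.

```ruby
require "yaml"
require "csv"

# Hypothetical CSVY header carrying a "dialect" block (assumed layout).
HEADER_TEXT = <<~TEXT
  dialect:
    delimiter: ";"
    quoteChar: "'"
    header: true
TEXT

# Translate the dialect keys into options understood by Ruby's CSV library.
# Defaults follow common CSV practice when a key is missing.
def csv_options(dialect)
  {
    col_sep:    dialect.fetch("delimiter", ","),
    quote_char: dialect.fetch("quoteChar", '"'),
    headers:    dialect.fetch("header", true)
  }
end

dialect = YAML.safe_load(HEADER_TEXT).fetch("dialect")
table   = CSV.parse("a;b\n'x;y';2\n", **csv_options(dialect))
```

With these options the quoted field 'x;y' is read as a single value even though it contains the delimiter.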

Validation rules for header/value consistency?

Hello,
I am curious if there is a list of requirements for consistency between the header description and the data values?

For example:

Does the number of described variables have to match between the header and the CSV section?
Do the variable names have to match between header and CSV section?
Do variable names have to be distinct from each other?
Do variable names have to begin with an alphabetic character, or are these valid variable names: "?Strange_var", "101.5s"
Other rules?
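
For discussion, here is how such checks might look if they were adopted. None of these rules is confirmed by the CSVY spec; they are just the candidates listed above, and consistency_problems is a made-up name.

```ruby
# Compare the field names declared in the YAML header with the CSV header row.
# Candidate rules (hypothetical, taken from the questions above): same count,
# same names in the same order, no duplicates, names start with a letter.
def consistency_problems(declared, csv_header)
  problems = []
  if declared.length != csv_header.length
    problems << "field count differs: #{declared.length} declared, " \
                "#{csv_header.length} in CSV"
  end
  problems << "field names differ from the CSV header" if declared != csv_header

  duplicates = csv_header.tally.select { |_name, count| count > 1 }.keys
  problems << "duplicate names: #{duplicates.join(', ')}" unless duplicates.empty?

  odd = csv_header.reject { |name| name.match?(/\A[[:alpha:]]/) }
  problems << "names not starting with a letter: #{odd.join(', ')}" unless odd.empty?
  problems
end

ok  = consistency_problems(%w[var1 var2], %w[var1 var2])
bad = consistency_problems(["var1"], ["?Strange_var", "101.5s"])
```

Returning a list of messages rather than raising on the first mismatch lets a reader decide which of the rules it wants to treat as errors.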

On a related note, are there any "best practices" for standard field names in the YAML section? For example, how would someone best describe the units and other standard metadata for the list of variables in the file? If there were such standards, then automatically reading in and using the metadata would be much more straightforward. Otherwise, it seems that all of that potentially useful metadata is not guaranteed to be in any predictable form, and automated readers will be throwing darts to see if it's there.

Also, are there plans for checking variable names and/or units against standards, such as CF-conventions and SI units?

Thanks!

Limit csvy to only one data resource per file

@leeper pointed out: I don't think I would ever encourage creating a single file with multiple tables in it. So, I'd say the new spec looks good except I would limit it to one table per file. As such, I think you could use the spec as currently stated on the webpage AND a simplified version that just contains the dialect element and the fields sub-element of resources.

Reference leeper/csvy/issues/13

WIP in branch bug-19

Create minimal code to open csvy in pandas

This issue is related to pandas-dev/pandas#9613

labview extension

I just realised that the data I got out of LabVIEW were already of this sort: the first 21 lines were metadata, the rest was data (as CSV).
It would probably be good to see if the LabVIEW metadata could be made compatible.

I will just upload an example; pasting it here does not work because of "markdown" translations...