This repo contains the specs of yaml frontmatter for csv file format on http://csvy.org.
Home Page: http://csvy.org
License: Other
Hello,
In the latest update v1.1 of the ruby csvreader library / gem - see https://github.com/csvreader/csvreader - I've added support for CSV with meta data block (front matter) in YAML (CSVY) "out-of-the-box". Yes, it's turned on by default in the CSV v1.0 "The Right Way" format / flavor. See https://github.com/csvreader/csvreader/blob/master/test/test_parser_meta.rb for (test) examples. Keep up the great work with CSVY. Cheers. Prost.
PS: Here's the code snippet:
def test_parse
  records = [["a", "b", "c"],
             ["1", "2", "3"]]

  assert_equal records, parser.parse( <<TXT )
# with meta data
## see https://blog.datacite.org/using-yaml-frontmatter-with-csv/
---
columns:
- title: Purchase Date
  type: date
- title: Item
  type: string
- title: Amount (€)
  type: float
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = { "columns"=>
           [{"title"=>"Purchase Date", "type"=>"date"},
            {"title"=>"Item", "type"=>"string"},
            {"title"=>"Amount (€)", "type"=>"float"}]
         }
  assert_equal meta, parser.meta

  assert_equal records, parser.parse( <<TXT )
# with (empty) meta data
---
---
a,b,c
1,2,3
TXT

  pp parser.meta
  meta = {}
  assert_equal meta, parser.meta

  assert_equal records, parser.parse( <<TXT )
# without meta data
a,b,c
1,2,3
TXT

  assert_nil parser.meta
end
The YAML specification uses "---" to start a (first or next) YAML document and "..." to end a document.
YAML blocks in Pandoc Markdown start with "---" and end with "---" or "...".
So YAML headers in CSVY should also start with "---" and end with either "---" or "...".
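To make the suggestion concrete, here is a minimal sketch in Ruby of a reader that accepts either closing delimiter. The helper name split_csvy is hypothetical and is not part of any CSVY library. It assumes the header opens on the very first line, so comment lines before the opening "---" are not handled.

```ruby
require "yaml"

# Hypothetical helper: split CSVY text into [meta, csv_body].
# meta is nil when the text does not open with a "---" line.
def split_csvy(text)
  lines = text.lines
  return [nil, text] unless lines.first&.chomp == "---"

  # The header ends at the next line that is exactly "---" or "...".
  close = lines[1..].index { |line| ["---", "..."].include?(line.chomp) }
  raise ArgumentError, "unterminated YAML header" if close.nil?

  # An empty header parses to nil/false, so fall back to an empty hash.
  meta = YAML.safe_load(lines[1, close].join) || {}
  [meta, lines[(close + 2)..].join]
end
```

Calling split_csvy on a file without a header returns nil for the metadata and the text unchanged, which mirrors the three cases (metadata, empty metadata, no metadata) in the test snippet earlier on this page.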
I am quite excited: metadata one can actually read sounds cool.
However, how could we bring some standards to the metadata format itself? In the example, the date format is not given; shouldn't we try to organise this? Maybe with versioning of the metadata format itself? Going semantic from the start?
Sorry if the question is stupid, I am no specialist...
I get:
ERR_NAME_NOT_RESOLVED.
Closely related, but more complex and web-focused: http://w3c.github.io/csvw/metadata/
We need to know the legal conditions behind csvy usage and the website contents.
I would find Creative Commons BY 4.0 interesting.
It is not as stupid as calling an R package "twitteR", but still annoying. Change the name, find a different hashtag which we will make official here, or get one million tweets with #csvy, so that other users of the hashtag will change theirs?
(ok now I go to bed...)
I think the website would benefit from inlining more content rather than linking to it. I think that will aid adoption.
It's common (though not specified) for lines beginning with # at the start of a CSV file to be treated as comments (with a "citation needed" caveat: I'm not sure which parsers, if any, do this by default).
If the YAML header was to have a '#' at the start of each line, this might be more compatible with parsers that aren't expecting a YAML block?
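A rough sketch of what that could look like, in Ruby. Both helper names are hypothetical, and the "# " prefix convention is only the idea floated here, not something the CSVY spec defines.

```ruby
# Hypothetical helper: prefix every header line (delimiters included) with "# ".
def comment_header(yaml_header)
  yaml_header.lines.map { |line| "# #{line}" }.join
end

# Hypothetical helper: collect the leading "#" lines of a file and strip
# the prefix again. Returns [header_text, remaining_csv].
def uncomment_header(text)
  lines = text.lines
  header = lines.take_while { |line| line.start_with?("#") }
  yaml = header.map { |line| line.sub(/\A# ?/, "") }.join
  [yaml, lines.drop(header.size).join]
end
```

Only one space after the "#" is removed, so the indentation of nested YAML keys survives the round trip.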
Dear all,
I think finding a way to pool metadata and data into one file is great and the way to go. I love the idea of csvy. The thing is, if the metadata needs a different program to be written, it will be done only by techies.
When I saw the new read_csv function from readr, I got an idea... My proposal would be the following: take everything that is good in YAML, but transform its format to be CSV. This would be the example:
---,---
name, my-dataset
description, "this is an example of the new csv format, including metadata on top."
author, "Julien Colomb"
fields, name, title, type, description, constraints
fields,var1,variable 1, string, explaining var1, required
fields, var2, variable 2, integer
fields, var3, variable 3, number
---,---
var1,var2,var3
A,1,2.5
B,3,4.3
Why I think it could be cool:
Scan the first 8 characters: if the first 3 are "-", the fourth one is the delimiter and the 8th is the line break. This would make the CSV file readable whatever the delimiter is, and makes compatibility issues (with Excel) obsolete. In addition, if the first 3 characters are not "-", there is no metadata and the program can (try to) read the file as a normal CSV.
You can read and write the metadata in your normal spreadsheet program.
If templates are made for field names, it is easy for any user to enter metadata: you just copy the headers, paste them transposed, and fill in the table.
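The 8-character scan described above could be sketched like this in Ruby. The function name sniff_delimiter is hypothetical. Accepting "\r" as the 8th character is an added assumption, so that files with Windows line endings also match.

```ruby
# Hypothetical helper: inspect the first 8 characters of a file that uses
# the proposed "---<delimiter>---" marker line.
# Returns the delimiter character, or nil when no marker is present.
def sniff_delimiter(text)
  head = text[0, 8].to_s
  return nil unless head.length == 8
  return nil unless head[0, 3] == "---" && head[4, 3] == "---"
  # Assumption: the 8th character may be "\n" or the "\r" of a "\r\n" pair.
  return nil unless ["\n", "\r"].include?(head[7])

  head[3]
end
```

A nil result means the file has no metadata marker and can be handed to a normal CSV reader unchanged.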
What do you think?
This issue is licenced CC0.
Hello, this might be of interest to you (and yes, it includes front matter in YAML too). I've started a collection of inline meta data formats (that are supported by the csvreader library in ruby). For now the formats include: CSV in CSV, Attribute-Relation "Classic", Attribute-Relation "Inline" and, yes, front matter in YAML.
See https://github.com/csvspecs/csv-meta for all formats. Cheers. Prost.
Why not include CSV Dialect Description Format in csvy?
It can also be easily expressed in YAML syntax, can't it?
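As an illustration only, a dialect block in the YAML header could be mapped onto the options of Ruby's standard CSV library roughly as follows. The key spellings "dialect", "delimiter" and "quoteChar" are assumptions here and should be checked against the CSV Dialect Description Format version in use. Both helper names are hypothetical.

```ruby
require "csv"
require "yaml"

# Hypothetical helper: translate the (assumed) dialect keys into
# keyword options understood by Ruby's CSV.parse.
def csv_options(dialect)
  mapping = { "delimiter" => :col_sep, "quoteChar" => :quote_char }
  mapping.each_with_object({}) do |(key, option), options|
    options[option] = dialect[key] if dialect.key?(key)
  end
end

# Hypothetical helper: parse csv_text using the dialect found in yaml_header.
# A missing header or missing dialect falls back to CSV's defaults.
def parse_with_dialect(yaml_header, csv_text)
  meta = YAML.safe_load(yaml_header) || {}
  CSV.parse(csv_text, **csv_options(meta.fetch("dialect", {})))
end
```

With a header that sets delimiter to ";" and quoteChar to "'", a field such as 'x;y' is read as a single value.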
Hello,
I am curious if there is a list of requirements for consistency between the header description and the data values?
For example:
On a related note, are there any "best practices" for standard field names in the YAML section? For example, how would someone best describe the units and other standard metadata for the list of variables in the file? If there were such standards, then automatically reading in and using the metadata would be much more straightforward. Otherwise, it seems that all of that potentially useful metadata is not guaranteed to be in any predictable form, and automated readers will be throwing darts to see if it's there.
Also, are there plans for checking variable names and/or units against standards, such as CF-conventions and SI units?
Thanks!
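One possible reading of such a consistency requirement, sketched in Ruby: the field names in the header must match the CSV header row, and each value must parse as its declared type. The function name, the "name"/"type" keys and the two type patterns are all assumptions for illustration; CSVY does not define them in this thread.

```ruby
# Hypothetical check: returns a list of human-readable problems;
# an empty list means header description and data agree.
def check_consistency(fields, header, rows)
  problems = []
  names = fields.map { |field| field["name"] }
  unless names == header
    problems << "header #{header.inspect} does not match fields #{names.inspect}"
  end

  # Assumed type vocabulary; unknown types (e.g. "string") are not checked.
  patterns = { "integer" => /\A-?\d+\z/, "number" => /\A-?\d+(\.\d+)?\z/ }
  rows.each_with_index do |row, i|
    fields.each_with_index do |field, j|
      pattern = patterns[field["type"]]
      next if pattern.nil? || row[j].nil? || row[j].match?(pattern)

      problems << "row #{i + 1}, #{field["name"]}: #{row[j].inspect} is not a valid #{field["type"]}"
    end
  end
  problems
end
```

Units or standard vocabularies would need further keys in each field entry; this sketch covers names and basic types only.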
@leeper pointed out: I don't think I would ever encourage creating a single file with multiple tables in it. So, I'd say the new spec looks good except I would limit it to one table per file. As such, I think you could use the spec as currently stated on the webpage AND a simplified version that just contains the dialect element and the fields sub-element of resources.
Reference leeper/csvy/issues/13
WIP in branch bug-19
This issue is related to pandas-dev/pandas#9613
I just realised that the data I got out of LabVIEW were already of this sort: the first 21 lines were metadata, the rest was data (as CSV).
It would probably be good to see if the LabVIEW metadata could be compatible.
I will just upload an example; pasting it here does not work because of the "markdown" translations...