culturehack / data-tool

A collection of cultural data sets and sources & a website to browse them.

License: MIT License

Ruby 17.23% CSS 22.10% HTML 60.67%

data-tool's Introduction

data-tool

A collection of cultural data sets and sources & a website to browse them.

Adding data

To add a page about a data set, you just need to create a text file in the /sources folder.

Please look through the site to make sure your data set is not already included. If you'd like to add more information to an existing page, there is a link to each page's source file on GitHub at the bottom.

To create a new entry on GitHub:

Create an Artisanal Integer* (eg from London Integers) to act as the unique identifier for your entry.
Create a file in the sources folder, using the 'page plus' icon next to the path to the folder

GitHub might 'fork' the repository at this point. Don't panic, that's fine: it's like a 'Save As' so we don't overwrite each other's changes.

Name your new file, using the format integer-Name-of-Source.md for the file name.
Fill out the information in your newly-minted text file.

You can use one of the existing sources as a guide, or have a look at the blank and template examples in documentation.

You can add metadata using the YAML 'frontmatter' format – but don’t worry about this if you don’t know what that means - just follow the structure in the example file!

If you can, describe your source in detail, in a way that would be useful for anyone thinking about building something using that dataset.

If you know of any examples of things made using your data set, add a link in your description.

The description uses the Markdown syntax for text formatting.

The Dataset Size, Licensing and Contact information sections at the bottom of each page are actually pulled in from the YAML frontmatter at the top of the file - you don't need to add them again.

'Commit' your new file when you're done (it's like saving), using the green button at the foot of the page below the editing area.

Add a bit of description about the changes you're making. Keep the commit summary short and to the point!

If you've 'forked' the repository, saving will issue a Pull Request for someone at Culture Hack to accept your change.

* Artisanal Integers are a collection of web services which generate numbers (integers) that are guaranteed to be unique. We use them to make sure that there are no conflicts when merging together forked versions of this codebase.
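As an illustration of the frontmatter format mentioned in the steps above, here is a minimal Ruby sketch of how a source file might be split into its YAML metadata and its Markdown description. The helper name `split_frontmatter` and the field names in the example are assumptions made for illustration; they are not taken from site.rb.

```ruby
require 'yaml'

# Hypothetical helper: split a source file into its YAML frontmatter
# (a Hash) and its Markdown description (a String).
def split_frontmatter(text)
  match = text.match(/\A---[ \t]*\n(.*?)\n---[ \t]*\n?(.*)\z/m)
  return [{}, text] unless match # no frontmatter: everything is description

  metadata = YAML.safe_load(match[1]) || {}
  [metadata, match[2]]
end

# The field names below are illustrative, not the project's real schema.
example = <<~SOURCE
  ---
  title: Example Source
  size: small
  licence: PD
  ---
  A description written in *Markdown*.
SOURCE

metadata, body = split_frontmatter(example)
puts metadata['title'] # prints "Example Source"
puts body
```

Because the size, licensing and contact details live in the metadata hash, a page template can read them from there, which is why they do not need repeating in the description.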


data-tool's Issues

Add cache headers

Should add some cache headers so that the pages can be stored in public proxy caches, for even speedier loading.

Suggest expiry of 10 mins?
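A rough sketch of what this could look like, assuming the site runs as a Rack application (not confirmed here). The class name `PublicCache` is made up for illustration; the 600 seconds correspond to the suggested 10 minutes.

```ruby
# Minimal Rack-style middleware sketch: mark successful responses as
# publicly cacheable for a fixed number of seconds.
class PublicCache
  def initialize(app, max_age: 600) # 600 seconds = 10 minutes
    @app = app
    @max_age = max_age
  end

  def call(env)
    status, headers, body = @app.call(env)
    # HTTP header names are case-insensitive; Rack 3 expects lowercase keys.
    headers = headers.merge('cache-control' => "public, max-age=#{@max_age}") if status == 200
    [status, headers, body]
  end
end

# Stand-in application used only to exercise the middleware.
app = ->(_env) { [200, { 'content-type' => 'text/html' }, ['<h1>Datasets</h1>']] }
_status, headers, _body = PublicCache.new(app).call({})
puts headers['cache-control'] # prints "public, max-age=600"
```

Only 200 responses get the header in this sketch, so error pages are not stored by proxies.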

Build Failing - `parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Hello

Travis is helpfully telling me the most recent builds are failing, but not giving me tremendously useful feedback about why.

Build #62 was broken.    31 seconds
Kim Plowright   2140777 Changeset →
Add DigitalNZ, edit other Sources
    Cooper Hewiitt and Open Library with more data. Digital NZ apis added
    https://travis-ci.org/culturehack/data-tool/builds/13781983

    Build #63 is still failing.  33 seconds
Kim Plowright   08a08e5 Changeset →
correct empty yaml value
    Attempting to fix the Travis Error being thrown. Unclear *which* file is making it barf, other than that it's a line 19. This doc has an empty value at line 19 and is in the commit that made it barf. perhaps this is the culprit. (NB build is not failing for me locally)
    https://travis-ci.org/culturehack/data-tool/builds/13800862

Seems to be a parse error in site.rb -

`parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Another possibility: the test file is named 9999999999-whatereveritwas. Is the number causing the problem?

Any ideas? LMK what the solution is so I can fix for myself in future too!
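One way to find out which file is at fault is to syntax-check each file's YAML separately and pass the file name to the parser, so the error names the file instead of `<unknown>`. The sketch below is an assumption about how that could be done, not what site.rb does; the helper name and the sample inputs are made up. The second sample has an unterminated quote followed by a `---` line, which is one input Psych rejects with a syntax error.

```ruby
require 'yaml'

# Hypothetical helper: syntax-check each YAML snippet and collect failures.
# `sources` maps a file name to the YAML text taken from that file.
def yaml_syntax_errors(sources)
  sources.each_with_object({}) do |(name, yaml), errors|
    begin
      YAML.parse(yaml, filename: name) # parses without building Ruby objects
    rescue Psych::SyntaxError => e
      errors[name] = "line #{e.line}, column #{e.column}: #{e.problem}"
    end
  end
end

sources = {
  'good.md' => "title: \"Pepys' Diary\"\nsize: small\n",
  'bad.md'  => "title: \"Unclosed quote\nsize: small\n---\n"
}
yaml_syntax_errors(sources).each { |name, message| puts "#{name}: #{message}" }
```

Running something like this locally over the files in the sources folder should point at the offending entry even when the CI log does not.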

Ability to search datasets by title / description

This would be pretty useful.

Currently unsure on approach.

Could import all the entries into postgres upon launch and use the postgres full text search feature. Has the advantage of built-in features like stemming and spelling correction. Disadvantages: another dependency, makes site more complicated to install, etc.

Alternatively, we could implement some basic in-memory text searching. It wouldn't be too tricky to simply return matching results, but it wouldn't be as sophisticated.
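A minimal sketch of the in-memory option, to show roughly how little code it needs. The `Dataset` struct and the sample descriptions are hypothetical; matching is a plain case-insensitive substring test, with none of the stemming or spelling correction that Postgres would provide.

```ruby
# Hypothetical record type; the real site may store entries differently.
Dataset = Struct.new(:title, :description)

# Return the datasets whose title or description contains every query word.
def search(datasets, query)
  terms = query.downcase.split
  return [] if terms.empty?

  datasets.select do |dataset|
    haystack = "#{dataset.title} #{dataset.description}".downcase
    terms.all? { |term| haystack.include?(term) }
  end
end

# Sample entries with made-up descriptions.
datasets = [
  Dataset.new('Pepys Diary', 'Daily entries from the diary of Samuel Pepys.'),
  Dataset.new('Open Library', 'Catalogue records for books.')
]
puts search(datasets, 'diary pepys').map(&:title) # prints "Pepys Diary"
```

A linear scan like this should be fine for a few hundred entries; it only becomes a concern with a much larger collection.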

Add first published / update frequency information

Would be useful to have this on the dataset pages...

Choice of displayed licences

Creative Commons licences as mentioned on /about are not really suitable for data, they are licenses for content. See http://opendatacommons.org/licenses/ for some suitable data licenses, and http://opendatacommons.org/faq/licenses/#Why_Not_Use_a_Creative_Commons_or_FreeOpen_Source_Software_License_for_Databases for the explanation on CC. Or OSM's move from CC to ODbL: http://www.osmfoundation.org/wiki/License/We_Are_Changing_The_License#Why_are_we_changing_the_license.3F

Caper: check UX sketches and feed back

In the UX folder of Dropbox - UX and IA ideas.pdf

Question - Post Doc Researcher?

we're looking for a post-doc researcher to work with

Caper: check and sign SOW, provide terms/PO

Question - 50 entries v 200 entries

How many data set entries do we want / need?

Question for Rachel, really: would she prefer 50 really rich well described ones, or 200 less well described ones?

Caper strategy document mentions 200...

Left-hand category filter

The filtering is a bit confusing; my instinct is always to click through the top category list, and each time I find it a bit weird that more categories, rather than different categories, are being displayed. Could you change this to toggle through the list please, rather than add each category to the display? And then keep the small/medium/large as a filter on each category.

Research - single text files data descriptors?

Are there any existing methods of describing data / datasets etc with such a simple YAML format? Can we point to anything?

Related - how do we integrate with other data sources in future - CKAN interoperability?

Status of copy

Obviously it's possible/likely that the copy describing each bit of data will be a bit of a moveable feast, but looking at the site now, I'm not sure which bits might have been placeholder text put in by Frankie and which are ones put in by you. This one is a case in point: http://data.culturehack.org.uk/dataset/37251027-Pepys-Diary All for a jolly tone of voice, but not sure about the "It's great" and the typo.

Redirect culturehack.org.uk/data to data.culturehack.org.uk

Currently http://www.culturehack.org.uk/data resolves to http://culturehack.org.uk/2012/06/16/data-sets-by-type/

it should probably be changed to resolve to
http://data.culturehack.org.uk

Assigning to @jamesjefferies for whenever he has a moment; this is low priority.

Rogue entry

http://data.culturehack.org.uk/dataset/99999999-Template-Entry

Figure out how to set file name for new source in prose.io

Give more context to Licensing labels

Is it possible to make the licensing info a click through to something? I'm not sure what PD means, for instance, on this page http://data.culturehack.org.uk/dataset/37251027-Pepys-Diary and there's no way of finding out.

Categories: setting them in code, extending them

Currently, our categories are (I think) defined in site.rb lines 6-14

  CATEGORIES = [
    'Art', 
    'Literature',
    'Music', 
    'Performance', 
    'Fashion', 
    'Media', 
    'History'
  ]

They're also then listed out in _prose.yml line 18 on

  - name: "categories"
    field:
      element: "multiselect"
      label: "Categories"
      options:
        - name: "Art"
          value: "Art"

QUESTIONS:

To add a new category, or rename an existing category, is it just a case of editing them in those two places?
Can Categories take spaces? If so, do they need to be surrounded by quotes in site.rb and any of the source.md files?
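On the quoting question, a small sketch may help. In Ruby every string literal is quoted whether or not it contains a space, while in YAML a plain value may contain spaces, so quotes are optional there. The category 'Visual Art' below is made up, and whether the rest of the site (URLs, filters) copes with spaces is not something this sketch checks.

```ruby
require 'yaml'

# 'Visual Art' is a made-up category used only to show a name with a space.
EXAMPLE_CATEGORIES = ['Art', 'Literature', 'Visual Art'].freeze

frontmatter = <<~FRONTMATTER
  categories:
    - Art
    - Visual Art
    - "Literature"
FRONTMATTER

categories = YAML.safe_load(frontmatter)['categories']
p categories # prints ["Art", "Visual Art", "Literature"]

# Check that every category in the file is one the site knows about.
p categories.all? { |category| EXAMPLE_CATEGORIES.include?(category) } # prints true
```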

Ability to filter by licence

Would be easy to add. Not sure how much of a priority this is though.
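For a sense of scale, a filter along these lines could be as small as the sketch below. It assumes each entry is available as a hash with a `licence` key, which is a guess at the data shape rather than the site's actual structure; the second entry is invented.

```ruby
# Hypothetical in-memory representation of parsed source files.
datasets = [
  { 'title' => 'Pepys Diary', 'licence' => 'PD' },
  { 'title' => 'Example Source', 'licence' => 'CC-BY' }
]

# Keep only entries whose licence matches, ignoring case.
def filter_by_licence(datasets, licence)
  datasets.select { |dataset| dataset['licence'].to_s.casecmp?(licence) }
end

puts filter_by_licence(datasets, 'pd').map { |d| d['title'] } # prints "Pepys Diary"
```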

FR iterating the UX sketches based on the documents sent on friday

Initial work done, needs proper data adding.

YAML frontmatter: clarify media

In one file there is a media: data pair in the YAML frontmatter (it also appears as media: text in 37251018-British-Museum-object-catalog.md)

media: doesn't seem to be defined in _prose.yml

Q: What was media:? what were we going to do with it? Did we define a list of options?

ALSO

Am I correct in saying: YAML is flexible and doesn't mind if you add additional values in there? So we could just make up fields on the fly?
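A quick sketch of what happens with extra fields: the YAML parser itself accepts any keys and simply returns them in the resulting hash, so whether a field is used depends on the code and templates reading it. The list of known fields below is an assumption for illustration, as is the helper name.

```ruby
require 'yaml'

# Assumed set of fields the templates use; the real list lives in the project.
KNOWN_FIELDS = %w[title categories size licence].freeze

# Hypothetical helper: list the frontmatter keys that are not in `known`.
def unknown_fields(yaml_text, known = KNOWN_FIELDS)
  metadata = YAML.safe_load(yaml_text) || {}
  metadata.keys - known
end

frontmatter = <<~FRONTMATTER
  title: British Museum object catalog
  media: text
  made_up_field: anything
FRONTMATTER

p unknown_fields(frontmatter) # prints ["media", "made_up_field"]
```

A check like this could also be used to flag stray or misspelled fields across the sources folder.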

Confirm Introductory Copy

@we-are-caper Rachel - you were looking at the intro copy on the main page. Can you confirm what wording you'd like here?

Currently:

Explore open data about arts and culture, and the creative things people have done with it. Find out more →

Previous version you were checking with Katy:

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. To get started, search or filter our list of data sources using the categories to the left.
Find Out More →

Suggest we can reword that slightly

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. Search or filter our list of XX data sources, or contribute a new entry
Find Out More →

Ability to include 'sample data'

It'd be useful if you could include links to sample data files (eg CSV, JSON) on the dataset pages.

Could be hosted externally, or within the project.

Visit Site opening in new window?

Would it be a nicer UX thing if the "visit site" opened in a new window, or would that just be annoying in a different way?

Empty categories

Obviously I don't expect every category to be populated, but filtering "art" by "small" and "medium" returns 0 entries, which might be seen to look a bit bad at launch, as it's the first set of filters. Is it possible to pop something in here for cosmetic purposes pls?!

rewrite instructions and templates to be clearer about which bits are automatically pulled in etc

See the SOCH merge notes.

Smithsonian 3D showcase

https://twitter.com/GuWa/status/400647398302547969

http://3d.si.edu/

Not really the right place to put this, but it's pretty cool

Create initial proof-of-concept views

Need to write some code that takes the data sources and creates webpages for each of them.

Website produced by Kim and Frankie

Can you take this out of the footer and add it to the About page pls?

Schedule meeting - strategy

schedule a meeting with the four of us soon (and James, if you're planning to be in London at any point?) as would be useful to discuss some of these as we consider next steps/develop the strategy.

Feature - text file creation button with Artisanal Integer call

Manually creating a text file is a bit of a PITA

We know that files will follow a standard template

The process would be roughly

visit Artisanal Integer site, get number
create text file with this filename
copy paste in template layout
do all data entry
append text file name with slug?

is this possible to script? It would make sense.
It might be possible to create a new file in github via the API...
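The naming step at least is straightforward to script. The sketch below builds a file name in the integer-Name-of-Source.md format from an identifier and a title. The helper name is made up, and fetching the integer or creating the file through the GitHub API is deliberately left out, since neither interface is described here.

```ruby
# Hypothetical helper: build "integer-Name-of-Source.md" from an identifier
# and a human-readable title.
def source_filename(integer, name)
  # Replace runs of non-alphanumeric characters with a single hyphen,
  # then trim any hyphens left at either end.
  slug = name.strip.gsub(/[^A-Za-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  "#{integer}-#{slug}.md"
end

puts source_filename(37251027, 'Pepys Diary') # prints "37251027-Pepys-Diary.md"
```

The same function could be reused by a button or script that also pastes in the template layout.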

requires RC to be signed in to the _connect site
when opened as a file remotely, just shows a load of JS errors... no content.

Need to copy-paste the actual page contents!

QUESTION: How to integrate editorial with data tool pages?

How do we cross reference between data tool and editorial on main CH site?

In order to pull WP posts through / evidence hacks, how do we best combine this with the data tool entry points?