data-tool's Introduction

data-tool

A collection of cultural data sets and sources & a website to browse them.

Adding data

To add a page about a data set, you just need to create a text file in the /sources folder.

Please look through the site to make sure your data set is not already included. If you'd like to add more information to an existing page, there is a link to each page's source file on GitHub at the bottom.

To create a new entry on GitHub:

  1. Create an Artisanal Integer* (eg from London Integers) to act as the unique identifier for your entry.

  2. Create a file in the sources folder, using the 'page plus' icon next to the path to the folder.

GitHub might 'fork' your file at this point: don't panic, that's fine. It's like a 'Save As' so we don't overwrite each other's changes.

  3. Name your new file, using the format integer-Name-of-Source.md for the file name.

  4. Fill out the information in your newly-minted text file.

You can use one of the existing sources as a guide, or have a look at the blank and template examples in the documentation.

You can add metadata using the YAML 'frontmatter' format – but don’t worry about this if you don’t know what that means - just follow the structure in the example file!

If you can, describe your source in detail, in a way that would be useful for anyone thinking about building something using that dataset.

If you know of any examples of things made using your data set, add a link in your description.

The description uses the Markdown syntax for text formatting.

The Dataset Size, Licensing and Contact information sections at the bottom of each page are actually pulled in from the YAML frontmatter at the top of the file - you don't need to add them again.

  5. 'Commit' your new file when you're done (it's like saving), using the green button at the foot of the page below the editing area.

Add a bit of description about the changes you're making. Keep the commit summary short and to the point!

  6. If you've 'forked' the repository, saving will issue a Pull Request for someone at Culture Hack to accept your change.

* Artisanal Integers are a collection of web services which generate numbers (integers) that are guaranteed to be unique. We use them to make sure that there are no conflicts when merging together forked versions of this codebase.
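
As a rough illustration of the integer-Name-of-Source.md naming format described above, here is a minimal Ruby sketch that checks a file name and splits it into its parts. The helper name and the exact pattern are assumptions made for this example (it assumes the name part uses only letters, digits and hyphens); nothing here is taken from the data-tool code itself.

```ruby
# Hypothetical helper, not part of the data-tool codebase.
# Assumes the name part contains only letters, digits and hyphens.
SOURCE_FILENAME = /\A(?<id>\d+)-(?<name>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)\.md\z/

# Returns { id: Integer, name: String } for a well-formed file name, or nil.
def parse_source_filename(filename)
  match = SOURCE_FILENAME.match(filename)
  return nil unless match

  { id: match[:id].to_i, name: match[:name].tr('-', ' ') }
end

p parse_source_filename('37251027-Pepys-Diary.md') # well-formed: id plus name
p parse_source_filename('Pepys Diary.md')          # no integer prefix, so nil
```

A check like this could be run before committing, to catch a missing integer or stray spaces in the file name.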

data-tool's People

Contributors

barrynorton, carwash, dracos, fionaroberto, frankieroberto, mildlydiverting, pidg, skraphog

data-tool's Issues

Add cache headers

Should add some cache headers so that the pages can be stored in public proxy caches, for even speedier loading.

Suggest expiry of 10 mins?
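
One way this could look, sketched as a small Rack-style middleware in plain Ruby. The class name and the idea of wrapping the app this way are assumptions for illustration; how site.rb actually serves pages is not shown in this issue. The 600 seconds matches the 10 minute expiry suggested above.

```ruby
# Hypothetical Rack-style middleware; the class name and the way it would be
# wired into the site are assumptions, not taken from the data-tool code.
class PublicCacheHeaders
  def initialize(app, max_age: 600) # 600 seconds = 10 minutes
    @app = app
    @max_age = max_age
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Only mark successful responses as cacheable by shared proxy caches.
    if status == 200
      headers = headers.merge('cache-control' => "public, max-age=#{@max_age}")
    end
    [status, headers, body]
  end
end

# Stand-in for the real application, so the sketch runs on its own.
app = ->(_env) { [200, { 'content-type' => 'text/html' }, ['<h1>Datasets</h1>']] }
_status, headers, _body = PublicCacheHeaders.new(app).call({})
puts headers['cache-control'] # prints "public, max-age=600"
```

The `public` directive is what allows shared proxy caches to store the page; `max-age` sets how long they may reuse it.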

Build Failing - `parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Hello

Travis is helpfully telling me the most recent builds are failing, but not giving me tremendously useful feedback about why.

Build #62 was broken.    31 seconds
Kim Plowright   2140777 Changeset →
Add DigitalNZ, edit other Sources
    Cooper Hewiitt and Open Library with more data. Digital NZ apis added
    https://travis-ci.org/culturehack/data-tool/builds/13781983

    Build #63 is still failing.  33 seconds
Kim Plowright   08a08e5 Changeset →
correct empty yaml value
    Attempting to fix the Travis Error being thrown. Unclear *which* file is making it barf, other than that it's a line 19. This doc has an empty value at line 19 and is in the commit that made it barf. perhaps this is the culprit. (NB build is not failing for me locally)
    https://travis-ci.org/culturehack/data-tool/builds/13800862

Seems to be a parse error in site.rb -

`parse': (<unknown>): found unexpected document indicator while scanning a quoted scalar at line 19 column 22 (Psych::SyntaxError)

Other possibility: the number in the name of the test file (9999999999-whatereveritwas) is what's causing the problem.

Any ideas? LMK what the solution is so I can fix for myself in future too!
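
Since the error doesn't name the file, one way to narrow it down locally might be to syntax-check the frontmatter of every source file in turn. This is only a sketch: the helper names are made up, it assumes each file starts with a `---` line and that the files live in `sources/`, and the site itself may read the files differently.

```ruby
require 'psych'

# Hypothetical helper: pulls out the text between the leading '---' line and
# the next '---' line. Assumes each source file starts with YAML frontmatter.
def extract_frontmatter(text)
  match = text.match(/\A---[ \t]*\n(.*?)^---[ \t]*$/m)
  match && match[1]
end

# Returns nil when the frontmatter parses, otherwise a short error string.
# Line numbers are relative to the frontmatter block, not the whole file.
def frontmatter_error(text)
  yaml = extract_frontmatter(text)
  return 'no frontmatter found' if yaml.nil?

  Psych.parse(yaml) # syntax check only; does not build Ruby objects
  nil
rescue Psych::SyntaxError => e
  "line #{e.line}, column #{e.column}: #{e.problem}"
end

# Check every file in the sources folder (the path is an assumption).
Dir.glob('sources/*.md').sort.each do |path|
  error = frontmatter_error(File.read(path))
  puts "#{path}: #{error}" if error
end
```

A quoted value that is never closed is one plausible cause of a "while scanning a quoted scalar" error, and a check like this would report the file it occurs in.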

Ability to search datasets by title / description

This would be pretty useful.

Currently unsure on approach.

Could import all the entries into postgres upon launch and use the postgres full text search feature. Has the advantage of built-in features like stemming and spelling correction. Disadvantages: another dependency, makes site more complicated to install, etc.

Alternative could implement some basic in-memory text searching. Wouldn't be too tricky to simply return matching results, but wouldn't be as sophisticated.
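
A minimal sketch of the in-memory option, to give a feel for how little code it needs. The `:title` and `:description` keys and the sample entries are made up for the example; the real in-memory structure in site.rb may differ. It does plain case-insensitive substring matching, with no stemming or spelling correction.

```ruby
# Hypothetical in-memory search; the :title and :description keys are
# assumptions about how entries might be held in memory.
def search_entries(entries, query)
  terms = query.downcase.split
  return entries if terms.empty?

  entries.select do |entry|
    haystack = "#{entry[:title]} #{entry[:description]}".downcase
    # An entry matches only if every search term appears somewhere.
    terms.all? { |term| haystack.include?(term) }
  end
end

# Made-up sample entries, for illustration only.
entries = [
  { title: "Pepys' Diary", description: 'Daily diary entries from the 1660s.' },
  { title: 'Open Library', description: 'Catalogue records for books.' }
]

search_entries(entries, 'diary').each { |entry| puts entry[:title] }
```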

Choice of displayed licences

Creative Commons licences as mentioned on /about are not really suitable for data, they are licenses for content. See http://opendatacommons.org/licenses/ for some suitable data licenses, and http://opendatacommons.org/faq/licenses/#Why_Not_Use_a_Creative_Commons_or_FreeOpen_Source_Software_License_for_Databases for the explanation on CC. Or OSM's move from CC to ODbL: http://www.osmfoundation.org/wiki/License/We_Are_Changing_The_License#Why_are_we_changing_the_license.3F

Question - 50 entries v 200 entries

How many data set entries do we want / need?

Question for Rachel, really: would she prefer 50 really rich well described ones, or 200 less well described ones?

Caper strategy document mentions 200...

Left-hand category filter

The filtering is a bit confusing; my instinct is always to click through the top category list, and each time I find it a bit weird that more categories, rather than different categories, are being displayed. Could you change this to toggle through the list please, rather than add each category to the display? And then keep the small/medium/large as a filter on each category.

Ta

Research - single text files data descriptors?

Are there any existing methods of describing data / datasets etc with such a simple YAML format? Can we point to anything?

Related - how do we then integrate with other data sources in future - CKAN interoperability?

Status of copy

Obviously it's possible/likely that the copy describing each bit of data will be a bit of a moveable feast, but looking at the site now, I'm not sure which bits might have been placeholder text put in by Frankie and which are ones put in by you. This one is a case in point: http://data.culturehack.org.uk/dataset/37251027-Pepys-Diary All for a jolly tone of voice, but not sure about the "It's great" and the typo.

Categories: setting them in code, extending them

Currently, our categories are (I think) defined in site.rb lines 6-14:

  CATEGORIES = [
    'Art', 
    'Literature',
    'Music', 
    'Performance', 
    'Fashion', 
    'Media', 
    'History'
  ]

They're also then listed out in _prose.yml, from line 18 onwards:

      - name: "categories"
        field:
          element: "multiselect"
          label: "Categories"
          options:
            - name: "Art"
              value: "Art"

QUESTIONS:

To add a new category, or rename an existing category, is it just a case of editing them in those two places?
Can Categories take spaces? If so, do they need to be surrounded by quotes in site.rb and any of the source.md files?
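
On the quoting question, a quick check of what the YAML parser itself does may help. In YAML, a plain (unquoted) value can contain spaces; in Ruby source such as site.rb, every string literal needs quotes anyway. The category 'Visual Art' below is made up for the example, and whether the site's filtering and URLs cope with spaces is a separate question this sketch does not answer.

```ruby
require 'yaml'

# 'Visual Art' is a made-up category, used only to show a value with a space.
frontmatter = <<~FRONTMATTER
  categories:
    - Visual Art
    - "Performance"
FRONTMATTER

data = YAML.safe_load(frontmatter)
p data['categories'] # prints ["Visual Art", "Performance"]
```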

YAML frontmatter: clarify media

In one file there is a media: data pair in the YAML frontmatter (it also appears as media: text in 37251018-British-Museum-object-catalog.md)

media: doesn't seem to be defined in _prose.yml

Q: What was media: for? What were we going to do with it? Did we define a list of options?

ALSO

Am I correct in saying: YAML is flexible and doesn't mind if you add additional values in there? So we could just make up fields on the fly?
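
As far as the parser goes, that looks right: YAML has no fixed list of allowed keys, so extra fields simply come through as extra entries. A small sketch (the field names other than media are invented for the example):

```ruby
require 'yaml'

# 'made_up_field' is invented here to stand in for any ad hoc key.
frontmatter = <<~FRONTMATTER
  title: Example source
  media: text
  made_up_field: anything
FRONTMATTER

data = YAML.safe_load(frontmatter)
p data.keys           # prints ["title", "media", "made_up_field"]
p data.fetch('media') # prints "text"
```

Parsing without complaint is not the same as being used, though: a new field would presumably only show up on the site, or in the Prose editing form, if the templates and _prose.yml were updated to know about it.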

Confirm Introductory Copy

@we-are-caper Rachel - you were looking at the intro copy on the main page. Can you confirm what wording you'd like here?

Currently:

Explore open data about arts and culture, and the creative things people have done with it. Find out more →

Previous version you were checking with Katy:

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. To get started, search or filter our list of data sources using the categories to the left.
Find Out More →

Suggest we can reword that slightly

Culture Hack Data is a simple way to explore open data about arts and culture, and the creative things people do with it. Search or filter our list of XX data sources, or contribute a new entry
Find Out More →

Ability to include 'sample data'

It'd be useful if you could include links to sample data files (eg CSV, JSON) on the dataset pages.

Could be hosted externally, or within the project.

Empty categories

Obviously I don't expect every category to be populated, but filtering "art" by "small" and "medium" returns 0 entries, which might be seen to look a bit bad at launch, as it's the first set of filters. Is it possible to pop something in here for cosmetic purposes pls?!

Schedule meeting - strategy

Schedule a meeting with the four of us soon (and James, if you're planning to be in London at any point?), as it would be useful to discuss some of these as we consider next steps/develop the strategy.

Feature - text file creation button with Artisanal Integer call

Manually creating a text file is a bit of a PITA

We know that files will follow a standard template

The process would be roughly

  • visit Artisanal Integers site, get number
  • create text file with this filename
  • copy paste in template layout
  • do all data entry
  • append text file name with slug?

Is this possible to script? It would make sense.
It might be possible to create a new file in github via the API...
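
The local half of this looks scriptable. Here is a rough Ruby sketch of the file-creation steps listed above. The template fields are placeholders rather than the project's real frontmatter keys, the function names are made up, and the integer is passed in by hand: fetching it from an Artisanal Integers service, or creating the file through the GitHub API, is not shown.

```ruby
require 'fileutils'

# Hypothetical template; the field names are placeholders, not the project's
# real frontmatter keys. Titles containing double quotes would need escaping.
TEMPLATE = <<~TEMPLATE
  ---
  title: "%<title>s"
  categories: []
  ---

  Describe the data set here.
TEMPLATE

# Turns "Pepys' Diary" into "Pepys-Diary".
def slugify(title)
  title.gsub(/[^A-Za-z0-9\s-]/, '').strip.split(/[\s-]+/).join('-')
end

# Writes <dir>/<integer>-<Slug>.md and returns its path. The integer is
# passed in; obtaining it from an Artisanal Integers service is not shown.
def create_source_file(integer, title, dir: 'sources')
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{integer}-#{slugify(title)}.md")
  raise ArgumentError, "#{path} already exists" if File.exist?(path)

  File.write(path, format(TEMPLATE, title: title))
  path
end

# Example call (commented out so that loading the sketch writes nothing):
# create_source_file(123456789, "Pepys' Diary")
```

Refusing to overwrite an existing path keeps the script from clobbering an entry if the same integer is ever passed in twice.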

Update results via AJAX when filtering

This would be nice. Not sure how necessary it really is though, as the site is pretty speedy already (given that there are no database calls involved, and all the data is in memory).

Analytics

I forgot to add any analytics tracking to the site.

Probably best to use the same tracking account as the main Culture Hack site, I’d guess? (that way journeys between the two bits of the site could be tracked).
