oystr's Introduction

oystr

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Purpose

Handle TfL Oyster journey history data. Under development.

You can use an Oyster card to pay for public transit on Transport for London (TfL) services. You can opt in to monthly emails with your journey history attached as a CSV. Functions in this package help to read, handle and summarise these data.

I, and this package, are not associated officially with TfL.

Install

The package is under development with no guarantees whatsoever.

Install with remotes::install_github("matt-dray/oystr").

Functions

Functions under development:

  • oy_read() reads and checks multiple raw journey history files from a folder
  • oy_clean() cleans journey history data and engineers new variables
  • oy_lineplot() plots features over time (restricted to train journeys for now)
  • oy_summary() summarises the main statistics (restricted to train and bus journeys only)
  • oy_cols() contains the TfL colour palette

There's also anonymised journey history data:

  • journeys_read is an example of anonymised data read with oy_read()
  • journeys_clean is the result of using oy_clean() on the journeys_read data

Dependencies

Developing this package is an exercise in working with minimal dependencies (hopefully zero) and working with good ol' base R functions.

Limitations

The format of journey history data from TfL has remained pretty consistent for a number of years and there's no reason to believe that this will change anytime soon. It could, though, in which case these functions may fail.

Also, I asked TfL for details of all the possible forms of their column 'Journey/Action', which includes things like station start and end, bus route and much more. They were unable to provide this information. Therefore, the oy_clean() function can only parse formats that I'm personally aware of (train and bus, mostly) given my own Oyster history data. For example, I know the exact string for bus journeys is "Bus journey, route ". I don't know what format this string takes if you travel on a boat, for example.
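As a rough illustration of what parsing the one known bus format could look like, here is a minimal base R sketch. The helper name extract_bus_route() is hypothetical and is not part of oystr; the only thing taken from the text above is the prefix "Bus journey, route ".

```r
# Hypothetical helper (not an oystr function): pull the route out of strings
# that start with the known prefix "Bus journey, route ".
extract_bus_route <- function(journey_action) {
  prefix <- "Bus journey, route "
  # Guard against NA so the logical index below never contains NA
  is_bus <- !is.na(journey_action) & startsWith(journey_action, prefix)
  route <- rep(NA_character_, length(journey_action))
  route[is_bus] <- substring(journey_action[is_bus], nchar(prefix) + 1)
  route
}

actions <- c("Bus journey, route 999", "X to Y", "Entered and exited X")
routes <- extract_bus_route(actions)
print(routes)
```

Anything that does not start with the prefix comes back as NA, which is one way of flagging formats that are not yet understood.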

I would be extremely pleased if someone could share this information.

Contributing

Please note that the 'oystr' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

oystr's Issues

oy_clean() steps

These should be dates/times:

  • Date
  • Start.Time
  • End.Time
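A minimal sketch of the date/time conversion in base R. The helper to_datetime() is hypothetical, and the date format is passed in explicitly because the exact format used in the Oyster CSVs isn't stated here; the values below are made up for illustration.

```r
# Hypothetical helper: combine a date column and an "HH:MM" time column
# into POSIXct. `date_format` is an assumption you must check against your CSV.
to_datetime <- function(date, time, date_format, tz = "Europe/London") {
  # Treat empty strings as missing so they parse to NA
  time[is.na(time) | time == ""] <- NA_character_
  as.POSIXct(paste(date, time), format = paste(date_format, "%H:%M"), tz = tz)
}

# Made-up example rows; the second has no end time
journeys <- data.frame(
  Date = c("03/08/2019", "03/08/2019"),
  Start.Time = c("13:46", "15:02"),
  End.Time = c("13:59", ""),
  stringsAsFactors = FALSE
)

journeys$start <- to_datetime(journeys$Date, journeys$Start.Time, "%d/%m/%Y")
journeys$end <- to_datetime(journeys$Date, journeys$End.Time, "%d/%m/%Y")
journeys$duration_mins <- as.numeric(
  difftime(journeys$end, journeys$start, units = "mins")
)
print(journeys)
```

Keeping the full date-time (rather than a bare time) makes journey duration a simple subtraction.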

Journey.Action should be split into multiple columns, like:

  • mode (train, bus, etc)
  • from station
  • to station
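One possible shape for the split, as a hedged sketch: split_journey() is a hypothetical helper, and it only understands "X to Y" (optionally followed by a bracketed annotation) and the bus prefix. Labelling every "X to Y" string as "train" is an assumption. Anything else is left as NA.

```r
# Hypothetical helper: derive mode, from and to columns from Journey.Action.
split_journey <- function(journey_action) {
  mode <- rep(NA_character_, length(journey_action))
  from <- mode
  to <- mode

  is_bus <- grepl("^Bus journey, route ", journey_action)
  mode[is_bus] <- "bus"

  # "X to Y", optionally followed by "[...]" or "(...)".
  # Neither side may begin with a bracket, so failed taps are not matched.
  pattern <- "^([^\\[]+?) to ([^\\[(]+?)(\\s*[\\[(].*)?$"
  is_train <- !is_bus & grepl(pattern, journey_action, perl = TRUE)
  mode[is_train] <- "train"
  from[is_train] <- trimws(sub(pattern, "\\1", journey_action[is_train], perl = TRUE))
  to[is_train] <- trimws(sub(pattern, "\\2", journey_action[is_train], perl = TRUE))

  data.frame(mode = mode, from = from, to = to, stringsAsFactors = FALSE)
}

parts <- split_journey(c(
  "X to Y [National Rail]",
  "Bus journey, route 999",
  "X to [No touch-out]",
  "A to B (C, D lines)"
))
print(parts)
```

A station name that itself contains brackets or " to " would trip this up, so treat it as a starting point only.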

Charge is already numeric, with zero where nothing was charged. Credit is money, so it should be numeric, but should NA become 0? Maybe not: is there a conceptual difference between no charge for a journey (e.g. because of a season ticket) and credit, which is like one-off repayments? Balance is numeric and okay.

Need some Note examples to see how they could be split to provide new columns of data.

Improve oy_cols() input handling

  • Accept mixed input (indices/names/NA)
  • 'Return what you can', so oy_cols(1, 2, NA, 100) should return the first and second colours and provide warnings
  • Handle passing non-vectors (dataframes, matrices, arrays, lists)
  • Handle negative indices ()
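A sketch of the 'return what you can' idea, under loud assumptions: pick_cols() and demo_palette are hypothetical stand-ins (this is not how oy_cols() is implemented), and the hex values are placeholders rather than real TfL colours. Negative indices are simply treated as unmatched here.

```r
# Placeholder palette: names follow the "line_*" pattern, hex values are made up
demo_palette <- c(line_victoria = "#000001", line_circle = "#000002")

# Hypothetical stand-in for oy_cols(): accepts mixed indices, names and NA,
# returns whatever can be matched and warns about the rest.
pick_cols <- function(..., palette = demo_palette) {
  # Flatten every argument into a list of single elements, keeping their types
  keys <- unlist(lapply(list(...), as.list), recursive = FALSE)

  resolve <- function(key) {
    if (length(key) != 1 || is.na(key)) return(NA_character_)
    if (is.character(key) && key %in% names(palette)) return(key)
    if (is.numeric(key) && key == round(key) &&
        key >= 1 && key <= length(palette)) {
      return(names(palette)[key])
    }
    NA_character_
  }

  resolved <- unname(vapply(keys, resolve, character(1)))
  bad <- is.na(resolved)
  if (any(bad)) {
    warning(sum(bad), " input(s) could not be matched and were dropped",
            call. = FALSE)
  }
  palette[resolved[!bad]]
}

print(pick_cols(1, 2, NA, 100))       # warns, returns the first two colours
print(pick_cols("line_circle", 1))    # mixed names and indices
```

Because each argument is flattened first, pick_cols(c("line_victoria", "line_circle")) and pick_cols("line_victoria", "line_circle") behave the same way.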

Stretch

  • Ask 'did you mean?' in cases like oy_cols("victoria") instead of oy_cols("line_victoria").
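One way a 'did you mean?' hint might work is to suggest the valid name with the smallest edit distance, using utils::adist() from base R. suggest_name() is a hypothetical helper and the two names are just the ones mentioned in these issues.

```r
# Hypothetical helper: suggest the closest valid name by edit distance
suggest_name <- function(key, valid_names) {
  distances <- utils::adist(key, valid_names)[1, ]
  valid_names[which.min(distances)]
}

valid <- c("line_victoria", "line_circle")
hint <- suggest_name("victoria", valid)
message(sprintf('Unknown colour "victoria". Did you mean "%s"?', hint))
```

A real version would probably want a distance threshold so that wildly wrong input gets no suggestion at all.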

Contactless Cards

Hi,

Great library!

I might be wrong but as far as I can see the data format provided for contactless card journey history is different to that for Oyster cards.

It would be really cool if the library could support contactless cards, albeit with less data to play with.

The format I have for contactless card journey files is:

Date,Time,Journey,Charge (GBP),Capped,Notes
03/08/2019,13:46 - 13:59,Gospel Oak to Harringay Green Lanes,-1.50,N,

Thanks!
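For reference, a minimal base R sketch of reading the contactless sample above and splitting its combined columns. It assumes the date is day-first (dd/mm/yyyy) and that Journey always has the form "X to Y"; neither is confirmed beyond the single row shown.

```r
# The sample from this issue, read from text rather than a file
csv_text <- "Date,Time,Journey,Charge (GBP),Capped,Notes
03/08/2019,13:46 - 13:59,Gospel Oak to Harringay Green Lanes,-1.50,N,"

contactless <- read.csv(
  text = csv_text,
  stringsAsFactors = FALSE,
  check.names = FALSE  # keep "Charge (GBP)" as written
)

# Split "13:46 - 13:59" into start and end times
times <- strsplit(contactless$Time, " - ", fixed = TRUE)
contactless$start <- vapply(times, `[`, character(1), 1)
contactless$end <- vapply(times, `[`, character(1), 2)

# Split "X to Y" into from and to stations
stations <- strsplit(contactless$Journey, " to ", fixed = TRUE)
contactless$from <- vapply(stations, `[`, character(1), 1)
contactless$to <- vapply(stations, `[`, character(1), 2)

# Assumption: dates are day-first
contactless$Date <- as.Date(contactless$Date, format = "%d/%m/%Y")
print(contactless)
```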

What should oy_plot() do?

Produce line charts of journey duration (available for trains only?) and amount spent between provided dates?

oy_plot(data, start, end, weekdays_only, type)

  • data — data.frame of Oyster data passed through oy_read() and oy_clean()
  • start and end — select a time frame for the x axis
  • weekdays_only — to isolate weekdays (like, commute days)
  • type — "duration" (journey time) or "spend" (i.e. balance remaining)
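A rough sketch of the filtering such a function would need before it draws anything. filter_journeys() is a hypothetical helper, and the date and balance columns are assumed names for cleaned data, not confirmed output of oy_clean().

```r
# Hypothetical helper: keep rows between two dates, optionally weekdays only.
# Assumes `data` has a Date-class column called `date`.
filter_journeys <- function(data, start, end, weekdays_only = FALSE) {
  keep <- data$date >= as.Date(start) & data$date <= as.Date(end)
  if (weekdays_only) {
    # POSIXlt wday runs 0 (Sunday) to 6 (Saturday), so 1:5 is Monday to Friday
    keep <- keep & as.POSIXlt(data$date)$wday %in% 1:5
  }
  data[keep, , drop = FALSE]
}

# Made-up data: a Friday, a Saturday, a Monday and a date outside the range
demo <- data.frame(
  date = as.Date(c("2019-08-02", "2019-08-03", "2019-08-05", "2019-09-02")),
  balance = c(20.0, 18.5, 15.6, 30.0)
)

commutes <- filter_journeys(demo, "2019-08-01", "2019-08-31",
                            weekdays_only = TRUE)
print(commutes)

# Draw only in an interactive session so the sketch also runs in batch mode
if (interactive()) {
  plot(commutes$date, commutes$balance, type = "l",
       xlab = "Date", ylab = "Balance")
}
```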

Improve data sets for demonstration purposes

  • journeys_read.rda should contain data from more than one month; after all, it's meant to demonstrate what happens when files from two different months are read and combined
  • separate CSVs of these two months should be added to data/ so that they can be fetched by the vignette when demonstrating the effect of oy_read() (I don't know whether this approach is legit)
  • a non-Oyster CSV should also be added to demonstrate the warning you get from oy_read() when one file is not in the right format

oy_cols() input should be c()

So you supply oy_cols(c("line_victoria", "line_circle")) rather than oy_cols("line_victoria", "line_circle").

Is this the behaviour users would expect?

Log: known string formats in Oyster data

Examples of unique string formats from Oyster data sent by email from TfL (example CSV). I assume that more are possible. What are they?

Journey/Action column

Train travel

  • X to Y
  • X to Y [National Rail]
  • X to Y [London Underground]
  • X to Y [London Underground / National Rail]
  • X to Y (A, B lines)

Start and end the same

  • Entered and exited X

Failed tap

  • X to [No touch-out]
  • [No touch-in] to X

Other transport modes

  • Bus journey, route 999

Fares

  • Season ticket added on touch in, X
  • Topped up,
  • Automated Refund, X
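The Journey/Action formats logged above could be told apart with a handful of patterns. This is only a sketch: classify_action() and its category labels are hypothetical, and any string not listed above falls through to "unknown".

```r
# Hypothetical helper: label each Journey/Action string by its known format.
# Later assignments override earlier ones, so the generic " to " check goes first.
classify_action <- function(x) {
  out <- rep("unknown", length(x))
  out[grepl(" to ", x, fixed = TRUE)] <- "train"
  out[grepl("^Entered and exited ", x)] <- "same_station"
  out[grepl("\\[No touch-(in|out)\\]", x)] <- "failed_tap"
  out[grepl("^Bus journey, route ", x)] <- "bus"
  out[grepl("^Season ticket added on touch in", x)] <- "season_ticket"
  out[grepl("^Topped up", x)] <- "top_up"
  out[grepl("^Automated Refund", x)] <- "refund"
  out
}

examples <- c(
  "X to Y",
  "X to Y [London Underground / National Rail]",
  "Entered and exited X",
  "X to [No touch-out]",
  "[No touch-in] to X",
  "Bus journey, route 999",
  "Season ticket added on touch in, X",
  "Topped up,",
  "Automated Refund, X",
  "Something not seen before"
)
labels <- classify_action(examples)
print(data.frame(action = examples, label = labels))
```

Counting the "unknown" rows in real data would be a quick way to discover formats that are missing from this log.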

Note column

  • NA
  • We are not able to show where you touched out during this journey
  • We are not able to show where you touched in during this journey
  • You have been charged for travelling in zones not covered by your Travelcard.

`oy_read()` should check that CSVs are journey history files

Currently the function reads the CSVs and binds them all.

do.call("rbind", lapply(oy_files, function(x) read.csv(x)))

Probably want to read the files, then check content (perhaps dimensions, headers), then bind the ones that are the correct format.
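A minimal sketch of the read, check, then bind approach in base R. read_valid_csvs() is a hypothetical helper, the expected column names are an assumption based on the columns mentioned in these issues, and the demo files are made up.

```r
# Hypothetical helper: read every CSV, keep only those whose headers match,
# warn about the rest, then bind the valid ones.
read_valid_csvs <- function(paths, expected_cols) {
  tables <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  is_valid <- vapply(
    tables,
    function(tbl) identical(names(tbl), expected_cols),
    logical(1)
  )
  if (any(!is_valid)) {
    warning("Skipped file(s) with unexpected headers: ",
            paste(basename(paths[!is_valid]), collapse = ", "),
            call. = FALSE)
  }
  do.call("rbind", tables[is_valid])
}

# Assumed header names after read.csv() has tidied them
expected_cols <- c("Date", "Start.Time", "End.Time", "Journey.Action",
                   "Charge", "Credit", "Balance", "Note")

# Made-up demo files: two in the expected format, one not
demo_dir <- tempfile("oyster_demo")
dir.create(demo_dir)
header <- "Date,Start Time,End Time,Journey/Action,Charge,Credit,Balance,Note"
writeLines(c(header, "01-Aug-2019,08:00,08:30,X to Y,2.4,,17.6,"),
           file.path(demo_dir, "august.csv"))
writeLines(c(header, "02-Sep-2019,08:05,08:35,X to Y,2.4,,15.2,"),
           file.path(demo_dir, "september.csv"))
writeLines(c("a,b", "1,2"), file.path(demo_dir, "other.csv"))

oy_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
journeys <- read_valid_csvs(oy_files, expected_cols)  # warns about other.csv
print(journeys)
```

Checking headers is cheap; checking dimensions or column types as well would catch more, at the cost of being stricter about small format changes.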
