Comments (11)
Aha, didn't even realise it was down! Hope this info is of use and not a hassle. Will share with you any code we write that may be of use to you - not huge changes, but fiddly ones that TfL's ad-hoc data file naming system seems to have caused!
from bikedata.
May be able to submit a patch.
from bikedata.
Should also work with gsub().
from bikedata.
Also, seems 2014 data is duplicated:
from bikedata.
Thanks @Robinlovelace, I'll check it out
from bikedata.
Updated cleaned data, with missing ids in grey:
from bikedata.
Ground truth data from TfL daily counts, cc @rogerbeecham.
library(ggplot2)
u = "https://data.london.gov.uk/download/number-bicycle-hires/ac29363e-e0cb-47cc-a97a-e216d900a6b0/tfl-daily-cycle-hires.xls"
download.file(u, "tfl-daily-cycle-hires.xls", mode = "wb") # binary mode keeps the .xls intact on Windows
daily_hires_schema = readxl::read_excel("tfl-daily-cycle-hires.xls")
daily_hires = readxl::read_excel("tfl-daily-cycle-hires.xls", sheet = 2)
#> New names:
#> * `Number of Bicycle Hires` -> `Number of Bicycle Hires...2`
#> * `` -> ...3
#> * Month -> Month...4
#> * `Number of Bicycle Hires` -> `Number of Bicycle Hires...5`
#> * `` -> ...6
#> * … and 3 more problems
range(daily_hires$Day)
#> [1] "2010-07-30 UTC" "2019-07-31 UTC"
names(daily_hires)[2] = "Number of hires"
daily_hires$Monthly = RcppRoll::roll_mean(x = daily_hires$`Number of hires`, n = 30, fill = NA)
daily_hires$Yearly = RcppRoll::roll_mean(x = daily_hires$`Number of hires`, n = 365, fill = NA)
ggplot(daily_hires, aes(Day, `Number of hires`)) +
  geom_point(alpha = 0.1) +
  geom_line(aes(Day, Monthly), lwd = 1) +
  geom_line(aes(Day, Yearly), colour = "blue", lwd = 1) +
  xlab("Year") +
  ylim(c(0, 50000)) +
  xlim(as.POSIXlt(c("2010-01-01", "2019-10-01")))
#> Warning: Removed 2 rows containing missing values (geom_point).
#> Warning: Removed 29 rows containing missing values (geom_path).
#> Warning: Removed 364 rows containing missing values (geom_path).
Created on 2019-08-29 by the reprex package (v0.3.0)
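The two geom_path warnings follow from the rolling means: a window of n values has no complete window at n - 1 positions, and those are filled with NA. A minimal sketch of the same idea in base R, assuming a centred window and using stats::filter() in place of RcppRoll (the helper roll_mean_base() and the toy series are made up for illustration, and alignment for even window sizes may differ from RcppRoll):

```r
# Centred rolling mean in base R; positions where the window does not
# fit inside the series are returned as NA.
roll_mean_base = function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))
}

x = 1:400                        # toy series standing in for daily counts
monthly = roll_mean_base(x, 30)
yearly = roll_mean_base(x, 365)
sum(is.na(monthly))              # n - 1 = 29 positions without a full window
sum(is.na(yearly))               # n - 1 = 364 positions without a full window
```

These counts match the 29 and 364 rows reported as removed in the warnings above.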
from bikedata.
Thanks @Robinlovelace, I'll clean all these up next week. For London, it's usually just minor formatting and file naming stuff, so it shouldn't be too hard.
from bikedata.
Another illustration of this - should be able to find duplicate files that go into the DB with this:
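One possible way to spot such files, sketched under the assumption that the duplicates are byte-identical copies saved under different names (not necessarily the approach referred to above; the file names and contents below are hypothetical):

```r
# Hypothetical setup: three files in a temporary directory, two of which
# have identical contents but different names.
d = tempfile("trip-files")
dir.create(d)
writeLines(c("id,start,end", "1,a,b"), file.path(d, "trips-2014.csv"))
writeLines(c("id,start,end", "1,a,b"), file.path(d, "trips-2014-copy.csv"))
writeLines(c("id,start,end", "2,c,d"), file.path(d, "trips-2015.csv"))

files = list.files(d, full.names = TRUE)
hashes = tools::md5sum(files)    # one checksum per file
dup_files = basename(files[duplicated(hashes)])
dup_files                        # files whose contents repeat an earlier file
```

Checksums only catch exact copies; files that repeat the same trips with different ids or formatting would need a row-level comparison instead.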
from bikedata.
The distinct() function in dplyr can find them; frustratingly, the trip ids differ:
> nrow(trips_df_duplicated) / nrow(trips_df) #
[1] 0.824824
> trips_df_duplicated = distinct(trips_df %>% select(start_time, stop_time, start_station_id, end_station_id))
> nrow(trips_df_duplicated) / nrow(trips_df) #
[1] 0.8273564
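To illustrate why the ids get in the way, here is a minimal sketch on a made-up trips table (all values are hypothetical; only the column names are taken from the snippet above). Rows that describe the same trip under different ids survive a plain distinct(), but collapse once the id column is left out of the comparison:

```r
library(dplyr)

# Toy trips table (hypothetical values): rows 1 and 2 are the same trip
# recorded under two different ids.
trips_df = data.frame(
  id = c(101, 202, 303),
  start_time = c("2014-01-01 08:00", "2014-01-01 08:00", "2014-01-01 09:30"),
  stop_time = c("2014-01-01 08:20", "2014-01-01 08:20", "2014-01-01 09:45"),
  start_station_id = c(1, 1, 2),
  end_station_id = c(5, 5, 7)
)

nrow(distinct(trips_df)) / nrow(trips_df)   # ids differ, so no row is dropped
trips_unique = distinct(trips_df, start_time, stop_time, start_station_id, end_station_id)
nrow(trips_unique) / nrow(trips_df)         # the repeated trip is collapsed
```

Passing the columns directly to distinct() is equivalent to the select() followed by distinct() used above; either way the result keeps only the compared columns.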
from bikedata.
Thnx @Robinlovelace, and sorry this has taken so long to get around to. I'm trying to get the pkg back on CRAN, but having a bit of trouble there. Once it's back up, I'll get straight onto this.
from bikedata.