Comments (8)
That is one advantage of a spatialite database, that when the spatialite extension is not available it reverts back to a standard sqlite database with the only issue being you can't run spatial queries on the database. There shouldn't be any additional performance problems from having a spatialite database although the database would be slightly larger from having the additional database tables.
I'll make some changes to just use a standard COUNT(*) and GROUP BY.
from bikedata.
Q1: yeah, it's coz i got quite frustrated trying to achieve the COUNT
and GROUP BY
approach in pure dplyr
, and ended up just being happy that i had any kind of (non-dplyr
) solution. I make no claim for optimality, so please feel absolutely free to improve!
Q2: Not exactly sure what you mean there - do you mean parameterise everything in the way the dates and times already are? If so, then yeah, I'd agree with that.
from bikedata.
I was thinking of just constructing the query with COUNT(*) and GROUP BY directly so you don't need to rely on dplyr to do the summarise and group by.
In terms of parameterised queries, I was thinking more of using something like dbBind to bind values to a particular query rather than building the query directly. As I said, perhaps it isn't really needed in this case but protecting against SQL injection is fairly common practice and also has the added benefit of protecting against some likely errors.
from bikedata.
yeah, that's a good point. I hadn't even begun to consider the potential consequences of failing to adequately protect SQL queries - now I understand what you mean. And I guess using DBI
would also be good in terms of standardisation. One thing I had in mind was the potential non-necessity of actually using a spatialite
DB over a non-spatial sqlite3
- NYC is a rarity in having coordinates along with the trip data, but most systems do not have that. I had in mind the notion of comparing the two in terms both of speed and DB size. Either way, DBI::dbBind
-type interfaces would aid standardisation no matter what storage we ultimately implement.
In keeping with my professed ignorance of many matters database related, would you like to have a first cut at a DBI
implementation using COUNT
and GROUP BY
, and we can see where that gets us?
from bikedata.
From the list in #2, I just checked both Boston and Chicago and both have the coordinates of the stations. For Boston there is a link to a separate file containing the stations, for Chicago there is a json file that you admittedly have to dig for.
I haven't looked at the others yet but I would be surprised if they weren't available anywhere.
from bikedata.
all systems provide station coordinates, i just meant that not all include them explicitly with the actual ride data (foremost London, which also has to be incorporated, and requires an OAuth key). DC is another strange case, where some stations have been moved a little, so coordinates change while the stations stay notionally the same (plus a couple of problematic cases where coordinates remain almost identical while stations names and purposes actually change - they use experimental stations for their own testing purposes, the clever folk). All fiddly parts of pre-processing that i'll deal with soon.
I suspect a productive division of labour at present would be for me to concentrate on getting the data in, while you concentrate more on getting the data out. I'll aim to have all data in (start_id, start_name, start_time, end_id, end_name, end_time, ...), where (...) is additional user data provided by systems like NYC, but not London - so pretty much identical to the current NYC tables. I should have time to get all other systems programmed in this week.
from bikedata.
I see what you mean now, yes changing coordinates is an issue.
Sounds like a reasonable division of labour to me. I'm working on changes to tripmat() now.
from bikedata.
I have switched to use RSQLite/DBI in #7.
from bikedata.
Related Issues (20)
- check UBSAN issues from examples HOT 1
- ditch reshape2 for tidyr HOT 1
- integer overflow
- Vanouver Mobi data
- get back on CRAN
- Issue downloading 2020 data from london HOT 2
- Guadalaraja data breaks with 2021-06 data HOT 3
- potentially inverted lat/lng in sf bay area stations HOT 5
- naive question: is fast c++ csv/zip --> sqlite3 useful enough to be its own package? HOT 2
- put citation in readme
- London dates not parsed correctly in bike_daily_trips() HOT 4
- Fix Guadalajara stations
- CRAN fails
- auto-download niceridemn
- bikedata misses numerous csv files from London HOT 11
- consider updating main readme to match vignette HOT 1
- fix tests HOT 1
- Removed from CRAN HOT 1
- Unable to store NYC data from 2018 and 2019
- Update all tests and examples to reflect new latest_lo_stns param
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bikedata.