Comments (10)

rpruim commented on August 27, 2024

I guess I won't send that email I was writing...

from nycflights13.

hadley commented on August 27, 2024

I think I made it as large as I could without running into CRAN problems :/

rpruim commented on August 27, 2024

What about working around GitHub's single-file size limit by having the package combine data from multiple files at install time? That could deliver data an order of magnitude larger than GitHub would otherwise allow. Is there any reason this is a bad idea?

hadley commented on August 27, 2024

It's not the GitHub file size that's the main problem; it's the CRAN file size. It's possible to work around it by downloading datasets after the package is installed, but that would require a decent amount of thinking to make sure it worked in a wide range of situations.

beanumber commented on August 27, 2024

The airlines package will do exactly that as soon as I get around to finishing it. But please feel free to take it out for a spin now!

More generally, I'm working on an etl package that will provide a general framework for having small R packages provide interfaces to medium-sized data sets. Your comments are welcome.

nicholasjhorton commented on August 27, 2024

Ben,

I suspect that we need a version of the airlines package that doesn't require use of a database backend (for smaller medium-sized datasets).

Nick

On Oct 30, 2015, at 9:48 AM, Ben Baumer wrote:

The airlines package will do exactly that as soon as I get around to finishing it. But please feel free to take it out for a spin now!

More generally, I'm working on an etl package that will provide a general framework for having small R packages provide interfaces to medium-sized data sets. Your comments are welcome.


Nicholas Horton
Professor of Statistics
Department of Mathematics and Statistics, Amherst College
Box 2239, 31 Quadrangle Dr
Amherst, MA 01002-5000
https://www.amherst.edu/people/facstaff/nhorton

beanumber commented on August 27, 2024

Sure, we could modify things so that you don't have to use the database backend. But then where are you going to store the data? In .rda files? In CSVs? I'd worry that this will lead to a "not the right tool for the job" situation. If you really want to work with data of this size (dozens of GBs) and structure (relational), it may not be reasonable to expect to get away without using an RDBMS.

nicholasjhorton commented on August 27, 2024

I'm thinking of a different use case, with data on the order of tens of megabytes (current CRAN limit?) up to a handful of GB.

For tens of GB or larger, your approach with etl and airlines is the best bet.

All the best,

Nick

On Oct 30, 2015, at 10:15 AM, Ben Baumer wrote:

Sure, we could modify things so that you don't have to use the database backend. But then where are you going to store the data? In .rda files? In CSVs? I'd worry that this will lead to a "not the right tool for the job" situation, and that if you really want to work with data of this size (dozens of GBs) and structure (relational), that maybe it's not reasonable to think that you are going to get away with this without using an RDBMS.


beanumber commented on August 27, 2024

I believe the current CRAN limit is 5 MB.

jennybc commented on August 27, 2024

The solution that @richfitz is working on might qualify as "a decent amount of thinking to make sure it worked in a wide range of situations":

https://github.com/richfitz/datastorr
