
Comments (3)

MarcusKlik commented on June 19, 2024

Thanks, it's good to see that many people benefit from the fst package in their work! Your request is somewhat similar to the requests made in issue #12. I believe a first step could be to use R's internal serialization mechanism for serializing 'complex types', but with the LZ4 and ZSTD compressors instead of the default compressors for speed. In that case, you would still have random row access to elements in list-type columns. Later, I could optimize further by using fst serialization for list elements of known types inside the list-type columns (recursively), increasing speed further.
Thanks for the request; it's definitely on the list for one of the next versions.
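
A rough sketch of the first step described above: serialize an arbitrary R object with base R's serialize(), then compress the resulting raw vector with fst's LZ4/ZSTD compressors. It assumes an fst release that exports compress_fst() and decompress_fst(). The helper names pack_object() and unpack_object() are made up for illustration and are not part of fst. This only covers an in-memory round trip, not random row access to list-type columns.

```r
library(fst)

# Hypothetical helper (not part of fst): serialize any R object,
# then compress the raw bytes with ZSTD (or "LZ4").
pack_object <- function(obj, compressor = "ZSTD", compression = 50) {
  raw_bytes <- serialize(obj, connection = NULL)
  compress_fst(raw_bytes, compressor = compressor, compression = compression)
}

# Hypothetical helper: reverse of pack_object().
unpack_object <- function(packed) {
  unserialize(decompress_fst(packed))
}

# A nested list holding data frames, i.e. a 'complex type'.
nested <- list(id = 1:3, tables = list(a = head(mtcars), b = head(iris)))

packed <- pack_object(nested)
restored <- unpack_object(packed)
```

The compressed raw vector could then be written with writeBin() and read back with readBin(); whether that beats saveRDS() for a given object would need to be measured.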

from fst.

derekholmes commented on June 19, 2024

FWIW, here is the code I wrote to do this. I have a function called cAssign() which takes a possible list of dataframes and either assigns them to a different environment and/or saves them to disk. (This is self-rolled persistence.) The name is passed in as a string, and if one of the data frames is large enough, it is split into a separate file using fst.

cAssign<-function(x,dbg=TRUE,silent=FALSE,copysilent=FALSE,trace=FALSE,dpath=datapath,nbig=10000,title="",usefst=TRUE) {
   # x: character vector of object names; dpath: output directory prefix (datapath is expected to be defined globally)
   ppp=lapply(x,function(y){
      fname=paste0(dpath,y,".RD")
      if(usefst) {
        # look the object up in the frame that called cAssign()
        cadtmp=get(y,pos=parent.frame(n=3))
        if("list" %in% class(cadtmp)) {
          # fall back to A1, A2, ... when the list has no names
          listonames = c(names(cadtmp),paste0("A",seq_along(cadtmp)))[seq_along(cadtmp)]
          for(i in seq_along(listonames)) {
            if("data.frame" %in% class(cadtmp[[i]]) && nrow(cadtmp[[i]])>=nbig) {
               message(" Splitting ",listonames[[i]], " from ",y)
               newfilename=paste0(y,"_",listonames[[i]],".fst")
               # large tables go to their own fst file; the list keeps only the file name
               write.fst(cadtmp[[i]], paste0(dpath,newfilename),compress=20)
               cadtmp[[i]]=newfilename
            }
          }
        }
        # save the (possibly slimmed-down) object under its original name
        e1<-new.env()
        assign(y,cadtmp,envir=e1)
        save(list=c(y),envir=e1,file=fname)
      } else {
        save(list=c(y),file=fname)
      }
      if(!copysilent) { message("GlobalAssign and Saving ", y, " to disk as ",y,".RD (filesize:",file.size(fname),")") }
   })
}
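
For completeness, a hedged sketch of the reverse direction: loading an .RD file written this way and swapping any element that holds an .fst file name back for the table on disk. cLoad() and its arguments are hypothetical (they are not from the code above). It assumes the same dpath layout, an fst release that exports read_fst(), and that no genuine list element is itself a single string ending in ".fst".

```r
library(fst)

# Hypothetical counterpart to cAssign(): restore one saved object by name.
cLoad <- function(y, dpath) {
  e1 <- new.env()
  load(paste0(dpath, y, ".RD"), envir = e1)
  obj <- get(y, envir = e1)
  if (is.list(obj) && !is.data.frame(obj)) {
    for (i in seq_along(obj)) {
      el <- obj[[i]]
      # cAssign() replaced each large data frame with its .fst file name
      if (is.character(el) && length(el) == 1 && grepl("\\.fst$", el)) {
        obj[[i]] <- read_fst(paste0(dpath, el))
      }
    }
  }
  obj
}
```

Usage would look like restored <- cLoad("myobj", dpath = datapath), where "myobj" is whatever name was passed to cAssign().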


MarcusKlik commented on June 19, 2024

Hi @derekholmes, thanks for sharing! So basically you need to store a list with several components, the largest of which are data.tables. You would like fst to be able to store a list and, if a list element is a data.table (or vector), still have random access to that structure?

Supporting lists would certainly be possible. To store a table with random access inside that list, the fst format would need to support nested structures. I think that would be a very interesting and useful feature. The current format could be maintained as is: when you need a list, you could use a data.table with a single column of the list type. The same holds for vectors.

The speed of a nested list structure would probably be lower due to additional file-pointer jumps, but when the data.table elements are comparatively large, the effect would be small.

Thanks for your feature request. When the list type is implemented, I'll make sure that the format is prepared for recursive structures as well!

