Comments (3)
Thanks, it's good to see that many people benefit from the fst package in their work! Your request is somewhat similar to requests made in issue #12. I believe a first step could be to use R's internal serialization mechanism for serializing 'complex types', but with the LZ4 and ZSTD compressors instead of the default compressors for speed. In that case, you would still have random row access to elements in list-type columns. Later, I could optimize further by using fst serialization for list elements of known types inside the list-type columns (recursively), increasing speed further.
Thanks for the request, it's definitely on the list for one of the next versions.
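As a rough illustration of that first step, here is a minimal sketch that serializes an arbitrary R object with `serialize()` and compresses the resulting raw vector with `fst::compress_fst()`. The helper names `pack_object()` and `unpack_object()` are made up for this example, and the gzip fallback used when fst is not installed is only an assumption to keep the snippet runnable. This is not how fst stores list columns internally.

```r
# Sketch: serialize an arbitrary R object, then compress the raw bytes.
# pack_object() and unpack_object() are hypothetical helper names.
pack_object <- function(x, compressor = "ZSTD", level = 50) {
  raw_bytes <- serialize(x, connection = NULL)  # R's internal serialization
  if (requireNamespace("fst", quietly = TRUE)) {
    # compress_fst() takes a raw vector; compressor is "LZ4" or "ZSTD"
    fst::compress_fst(raw_bytes, compressor = compressor, compression = level)
  } else {
    # Fallback for this sketch only: base R gzip compression
    memCompress(raw_bytes, type = "gzip")
  }
}

unpack_object <- function(packed) {
  raw_bytes <- if (requireNamespace("fst", quietly = TRUE)) {
    fst::decompress_fst(packed)
  } else {
    memDecompress(packed, type = "gzip")
  }
  unserialize(raw_bytes)
}

# One "cell" of a list-type column: any R object will do
cell <- list(a = 1:10, b = data.frame(x = c(1.5, 2.5), y = c("u", "v")))
packed <- pack_object(cell)
restored <- unpack_object(packed)
```

Packing each list element separately like this is what would keep random row access possible: only the requested elements need to be decompressed and unserialized.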
from fst.
FWIW, here is the code I wrote to do this. I have a function called cAssign() which takes a possible list of dataframes and either assigns them to a different environment and/or saves them to disk. (This is self-rolled persistence.) The name is passed in as a string, and if one of the data frames is large enough, it is split into a separate file using fst.
cAssign <- function(x, dbg=TRUE, silent=FALSE, copysilent=FALSE, trace=FALSE, dpath=datapath, nbig=10000, title="", usefst=TRUE) {
  # x is a character vector of object names; each object is saved to <name>.RD
  ppp = lapply(x, function(y) {
    fname = paste0(dpath, paste0(y, ".RD"))
    if (usefst) {
      # Look up the object by name in the caller's environment
      cadtmp = get(y, pos=parent.frame(n=3))
      if ("list" %in% class(cadtmp)) {
        # Use the list's names if it has any, otherwise A1, A2, ...
        listonames = c(names(cadtmp), paste0("A", seq_along(cadtmp)))[seq_along(cadtmp)]
        for (i in seq_along(listonames)) {
          if ("data.frame" %in% class(cadtmp[[i]]) && nrow(cadtmp[[i]]) >= nbig) {
            # Large data frame: write it to its own fst file and
            # keep only the file name in the list
            message(" Splitting ", listonames[[i]], " from ", y)
            newfilename = paste0(y, "_", listonames[[i]], ".fst")
            write.fst(cadtmp[[i]], paste0(dpath, newfilename), compress=20)
            cadtmp[[i]] = newfilename
          }
        }
      }
      # Save the (possibly slimmed-down) object under its original name
      e1 <- new.env()
      assign(y, cadtmp, envir=e1)
      save(list=c(y), envir=e1, file=fname)
    } else {
      save(list=c(y), file=fname)
    }
    if (!copysilent) { message("GlobalAssign and Saving ", y, " to disk as ", y, ".RD (filesize:", file.size(fname), ")") }
  })
}
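For completeness, reading such a file back could look roughly like the sketch below. `cLoad()` is a hypothetical counterpart that was not part of the code posted above. It assumes that any list element that is a single string ending in `.fst` and pointing at an existing file under `dpath` is a split-out data frame, which would misfire on a genuine string element with the same shape.

```r
# Hypothetical reader for files written by cAssign(); not part of fst.
cLoad <- function(name, dpath) {
  e1 <- new.env()
  load(paste0(dpath, name, ".RD"), envir = e1)
  obj <- get(name, envir = e1)
  if (is.list(obj) && !is.data.frame(obj)) {
    for (i in seq_along(obj)) {
      el <- obj[[i]]
      # A split-out element was replaced by the name of its .fst file
      is_ref <- is.character(el) && length(el) == 1 && grepl("\\.fst$", el)
      if (is_ref && file.exists(paste0(dpath, el))) {
        # Restore the full data frame from its separate fst file
        obj[[i]] <- fst::read_fst(paste0(dpath, el))
      }
    }
  }
  obj
}
```

A call such as `cLoad("mydata", datapath)` would then return the list with the large data frames restored; `"mydata"` is a placeholder name. Since `read_fst()` also accepts `from` and `to` arguments, a variant could fetch only a row range of a split-out table instead of reading it whole.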
Hi @derekholmes, thanks for sharing! So basically you need to store a list with several components, the largest of which are data.table's. You would like fst to be able to store a list and, if a list element is a data.table (or vector), still have random access to that structure?
Supporting lists would certainly be possible. For storing a table with random access inside that list, the fst format would need to support nested structures. That would be a very interesting and useful feature I think. The current format could be maintained as is, but when you need a list, you can use a single column data.table containing 1 column of the list type. The same holds for vectors.
The speed of a nested list structure would probably be lower due to additional file-pointer jumps, but when the data.table elements are comparatively large, the effect would be small.
Thanks for your feature request. When the list type is implemented, I'll make sure that the format is prepared for recursive structures as well!