Hi @tafia, first of all, congratulations on filing the first issue in the fstlib repository :-)
Until now, the core C++ code of R's fst package was part of the R package itself. I have now published the library as a separate component so that it can be used from languages other than R.
As you noticed, I have yet to write documentation on the fstlib API and will do so in the coming months. In short, with the fstlib library you can (or will be able to):
- Write in-memory datasets to a file using the fst format
- Have random access to that fst file, both row- and column-wise
- Use custom type-specific compression on each column in the fst file
- Very fast multi-threaded compression of memory blocks
- Very fast multi-threaded hashing of memory blocks
- Add new datasets to existing fst files (row-binding) (future expansion, but the format is ready)
- Add new columns to existing fst files (column-binding) (future expansion, but the format is ready)
- Retrieve data using on-the-fly subsetting (e.g. YEAR == 2016) without any memory overhead (future expansion, but the format is ready)
- On-the-fly ('chunked') operations on data in an fst file, similar to applying map-reduce type algorithms to chunked data. This will be a fully multi-threaded feature (future expansion)
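The chunked map-reduce idea in the last item can be illustrated generically. The Rust sketch below is not fstlib code and uses no fstlib API: it assumes a single in-memory `i32` column, splits it into fixed-size chunks, reduces each chunk on its own thread (the map step), and then combines the partial results (the reduce step). The function `chunked_sum` and the parameter `chunk_rows` are hypothetical names chosen for illustration.

```rust
use std::thread;

/// Hypothetical sketch (not part of fstlib): sum a column by reducing
/// fixed-size chunks in parallel and then combining the partial sums.
fn chunked_sum(column: &[i32], chunk_rows: usize) -> i64 {
    assert!(chunk_rows > 0, "chunk_rows must be positive");
    thread::scope(|s| {
        // Map step: one scoped thread per chunk computes a partial sum.
        let handles: Vec<_> = column
            .chunks(chunk_rows)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&v| v as i64).sum::<i64>()))
            .collect();
        // Reduce step: wait for every chunk and add up the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<i64>()
    })
}

fn main() {
    let column: Vec<i32> = (1..=1_000).collect();
    let total = chunked_sum(&column, 128);
    println!("{}", total); // 500500
}
```

Spawning one thread per chunk keeps the example short. A real implementation would more likely use a fixed-size thread pool and stream chunks from disk rather than hold the whole column in memory.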
The future expansion features will be developed over the coming period, using the R package as a technology driver.
IO operations using fstlib are designed to be as fast as possible, typically exceeding (thanks to compression) the maximum speed of an (NVMe) SSD drive. At the same time, the library will be very small, so it can easily be included in other packages or components.
Having a Rust binding would be great!
> first of all, congratulations on filing the first issue in the fstlib repository :-)

🥇

> As you noticed, I have yet to write documentation on the fstlib API and will do so in the coming months.

You sure have a lot of work to do! I certainly don't want to bother you too much. For the moment, I'll split my input file into as many chunks as necessary.
For now, I am mainly interested in creating fst files (writing in-memory datasets and saving them to disk). There are examples in the tests; I guess if I manage to get Rust bindings working, that should be enough for me.
That's great, please let me know if you need anything. The Visual Studio 2017 solution contains 4 projects:
- Project `fstcpp`: a very basic implementation of an `fstlib` wrapper in C++ (let's say the C++ variant of the `R` package).
- Project `fstlib`: the `fstlib` library itself.
- Project `fstlibtest`: a Google Test project to test basic functionality. Currently I mostly use this to track and debug issues reported by users of the `R` package. Eventually, this will be the main test repository for `fstlib`.
- Project `googletests`: the Google library for writing unit tests.
Unfortunately, I have no experience with Rust, but if you can write a wrapper for C++ code, you should have no problems. It would be nice if you could put your work in a GitHub repository, so that we can learn from the process!
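A common way to call C++ code from Rust is to expose a thin `extern "C"` layer on the C++ side and declare those functions in an `extern "C"` block in Rust, wrapping each unsafe call in a safe function. The sketch below shows only the Rust half of that pattern. Because the thread defines no C-level fstlib API, it binds the C standard library's `strlen` as a stand-in; a real binding would instead declare functions exported by a hypothetical C++ shim around `fstlib`. The wrapper name `c_string_length` is also hypothetical.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Foreign declarations with C linkage. `strlen` from the C standard library
// stands in for functions that a hypothetical fstlib C++ shim would export.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: performs the conversion to a NUL-terminated string and
/// confines the `unsafe` call to a single place.
fn c_string_length(text: &str) -> Option<usize> {
    // CString::new fails if `text` contains an interior NUL byte.
    let c_text = CString::new(text).ok()?;
    // SAFETY: `c_text` is a valid NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(c_text.as_ptr()) })
}

fn main() {
    println!("{:?}", c_string_length("fstlib")); // Some(6)
    println!("{:?}", c_string_length("bad\0input")); // None
}
```

Keeping all raw pointers and `unsafe` blocks inside small wrappers like this lets the rest of a Rust crate use ordinary owned types. The same structure applies when the declared functions come from a C++ library compiled with an `extern "C"` interface.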