
Comments (3)

MarcusKlik commented on July 26, 2024

Hi @tafia, first of all, congratulations on filing the first issue in the fstlib repository :-)

Until now, the core C++ code of R's fst package was part of the R package itself. I've now published it as a separate library so that it can be used from languages other than R.

As you noticed, I have yet to write documentation on the fstlib API and will do so in the coming months. In short, with the fstlib library you can and will be able to:

  • Write in-memory datasets to a file using the fst format
  • Have random access to that fst file, both row- and column-wise
  • Use custom type-specific compression for each column in the fst file
  • Perform very fast multi-threaded compression of memory blocks
  • Perform very fast multi-threaded hashing of memory blocks
  • Add new datasets to existing fst files (row binding); future expansion, but the format is ready
  • Add new columns to existing fst files (column binding); future expansion, but the format is ready
  • Retrieve data using on-the-fly subsetting (e.g. YEAR == 2016) without any memory overhead; future expansion, but the format is ready
  • Run on-the-fly ('chunked') operations on data in an fst file, similar to applying map-reduce type algorithms to chunked data. This will be a fully multi-threaded feature; future expansion

The future-expansion features will be developed over the coming period, with the R package serving as the technology driver.
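The multi-threaded hashing of memory blocks mentioned in the list can be illustrated with a minimal sketch, assuming a simple scheme: the buffer is cut into fixed-size blocks, worker threads hash the blocks independently, and the per-block hashes are then hashed again in block order. FNV-1a is used only as a simple stand-in hash, and `parallel_block_hash` is a hypothetical name; this is not fstlib's implementation or API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// 64-bit FNV-1a over a byte range (a simple stand-in hash, not fstlib's).
std::uint64_t fnv1a64(const unsigned char* data, std::size_t len,
                      std::uint64_t h = kFnvOffset) {
  for (std::size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= kFnvPrime;
  }
  return h;
}

// Hypothetical helper: hash `len` bytes as fixed-size blocks on `num_threads`
// threads, then combine the per-block hashes in block order.
std::uint64_t parallel_block_hash(const unsigned char* data, std::size_t len,
                                  std::size_t block_size, unsigned num_threads) {
  if (block_size == 0) block_size = 1;
  if (num_threads == 0) num_threads = 1;
  const std::size_t num_blocks = (len + block_size - 1) / block_size;
  std::vector<std::uint64_t> block_hashes(num_blocks);

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Thread t handles blocks t, t + num_threads, t + 2 * num_threads, ...
      for (std::size_t b = t; b < num_blocks; b += num_threads) {
        const std::size_t begin = b * block_size;
        assert(begin < len);
        const std::size_t n = std::min(block_size, len - begin);
        block_hashes[b] = fnv1a64(data + begin, n);  // threads write distinct slots
      }
    });
  }
  for (std::thread& w : workers) w.join();

  // Combine: feed each block hash into FNV-1a byte by byte (least significant
  // byte first), so the result does not depend on the machine's endianness.
  std::uint64_t h = kFnvOffset;
  for (std::uint64_t bh : block_hashes) {
    for (int i = 0; i < 8; ++i) {
      h ^= (bh >> (8 * i)) & 0xFFu;
      h *= kFnvPrime;
    }
  }
  return h;
}
```

Because the block size, not the thread count, decides how the data is split, the result is the same whether one thread or many are used, which is what makes a scheme like this safe to parallelize.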

IO operations using fstlib are designed to be as fast as possible; thanks to compression, they typically exceed the maximum speed of an (NVMe) SSD drive. At the same time, the library will be very small, so it can easily be included in other packages or components.
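A back-of-the-envelope way to see how compression lets reads outpace the drive, assuming reading and multi-threaded decompression overlap: let B_disk be the drive's read speed (compressed bytes per second), r the compression ratio (uncompressed size divided by compressed size), and B_cpu the maximum rate at which the decompressor can produce uncompressed bytes. The numbers in the example are purely illustrative, not fstlib benchmarks.

$$
B_{\text{eff}} \approx \min\left(r \cdot B_{\text{disk}},\; B_{\text{cpu}}\right),
\qquad \text{e.g. } \min\left(3 \times 3\ \text{GB/s},\; 12\ \text{GB/s}\right) = 9\ \text{GB/s}
$$

As long as decompression keeps up, the effective rate scales with the compression ratio; once B_cpu becomes the bottleneck, adding threads matters more than a faster drive.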

Having a Rust binding would be great!


tafia commented on July 26, 2024

"First of all, congratulations on filing the first issue in the fstlib repository :-)"

🥇

"As you noticed, I have yet to write documentation on the fstlib API and will do so in the coming months."

You sure have a lot of work to do! I certainly don't want to bother you too much. For the moment, I'll split my input file into as many chunks as necessary.

For now, I am mainly interested in creating fst files (writing in-memory datasets and saving them to disk). There are examples in the tests; I guess if I manage to get Rust bindings working, that should be enough for me.


MarcusKlik commented on July 26, 2024

That's great; please let me know if you need anything. The Visual Studio 2017 solution contains 4 projects:

  • Project fstcpp: a very basic implementation of an fstlib wrapper in C++ (think of it as the C++ counterpart of the R package).
  • Project fstlib: that's the fstlib library.
  • Project fstlibtest: a Google Test project to test basic functionality. Currently, I mostly use it to track and debug issues reported by users of the R package. Eventually, this will be the main test repository for fstlib.
  • Project googletests: the Google library for writing unit tests


Unfortunately, I have no experience with Rust, but if you can write a wrapper around C++ code, you should have no problems. It would be nice if you could put your work in a GitHub repository, so that we can learn from the process!
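Rust's `extern` blocks bind to the C ABI rather than to C++ directly, so a common way to wrap C++ code for Rust is a thin layer of `extern "C"` functions around an opaque handle. The sketch below assumes that approach; the `demo::Table` class and every `demo_table_*` function are hypothetical stand-ins for illustration, not part of fstlib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <utility>
#include <vector>

namespace demo {

// Hypothetical C++ class standing in for a library type (NOT the fstlib API).
class Table {
 public:
  // Adds an int32 column; returns false if its length differs from nrow().
  bool add_int_column(std::string name, std::vector<std::int32_t> values) {
    if (!columns_.empty() && values.size() != nrow()) return false;
    names_.reserve(names_.size() + 1);      // reserve first so the push_backs
    columns_.reserve(columns_.size() + 1);  // below never reallocate or throw
    names_.push_back(std::move(name));
    columns_.push_back(std::move(values));
    assert(names_.size() == columns_.size());
    return true;
  }
  std::size_t ncol() const { return columns_.size(); }
  std::size_t nrow() const { return columns_.empty() ? 0 : columns_.front().size(); }

 private:
  std::vector<std::string> names_;
  std::vector<std::vector<std::int32_t>> columns_;
};

}  // namespace demo

// Opaque handle: C or Rust code only ever sees a pointer to `demo_table`.
struct demo_table {
  demo::Table impl;
};

extern "C" {

// Returns nullptr if allocation fails.
demo_table* demo_table_new() { return new (std::nothrow) demo_table{}; }

// Passing nullptr is allowed (deleting a null pointer is a no-op).
void demo_table_free(demo_table* t) { delete t; }

// Returns 0 on success, -1 on invalid arguments or internal failure,
// -2 if the column length does not match the existing row count.
int demo_table_add_int_column(demo_table* t, const char* name,
                              const std::int32_t* data, std::size_t len) {
  if (t == nullptr || name == nullptr || (data == nullptr && len > 0)) return -1;
  try {
    std::vector<std::int32_t> values(data, data + len);
    return t->impl.add_int_column(name, std::move(values)) ? 0 : -2;
  } catch (...) {
    return -1;  // C++ exceptions must never unwind across the C ABI
  }
}

std::size_t demo_table_ncol(const demo_table* t) { return t ? t->impl.ncol() : 0; }
std::size_t demo_table_nrow(const demo_table* t) { return t ? t->impl.nrow() : 0; }

}  // extern "C"
```

On the Rust side, such functions would be declared in an `extern "C"` block and typically wrapped in a Rust struct whose `Drop` implementation calls `demo_table_free`, so the handle is released automatically. Returning status codes instead of throwing keeps C++ exceptions from crossing into Rust.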

