Git Product home page Git Product logo

fcsd's Introduction

Front-coded string dictionary: Fast and compact indexed string set

Documentation Crates.io License: MIT

This is a Rust library to store an indexed set of strings and support fast queires. The data structure is a plain front-coded string dictionary described in Martínez-Prieto et al., Practical compressed string dictionaries, INFOSYS 2016.

Japanese description

Features

  • Indexed set. Fcsd implements an indexed set of strings in a compressed format. n strings in the set are indexed with integers from [0..n-1] and assigned in the lexicographical order.
  • Simple and fast compression/decompression. Fcsd maintains a set of strings in a compressed space through front coding, a differential compression technique for strings, allowing for fast decompression operations.
  • Random access. Fcsd maintains strings through a bucketization technique enabling to directly decompress arbitrary strings and perform binary search for strings.

Example

use fcsd::Set;

// Input string keys should be sorted and unique.
let keys = ["ICDM", "ICML", "SIGIR", "SIGKDD", "SIGMOD"];

// Builds an indexed set.
let set = Set::new(keys).unwrap();
assert_eq!(set.len(), keys.len());

// Gets indexes associated with given keys.
let mut locator = set.locator();
assert_eq!(locator.run(b"ICML"), Some(1));
assert_eq!(locator.run(b"SIGMOD"), Some(4));
assert_eq!(locator.run(b"SIGSPATIAL"), None);

// Decodes string keys from given indexes.
let mut decoder = set.decoder();
assert_eq!(decoder.run(0), b"ICDM".to_vec());
assert_eq!(decoder.run(3), b"SIGKDD".to_vec());

// Enumerates indexes and keys stored in the set.
let mut iter = set.iter();
assert_eq!(iter.next(), Some((0, b"ICDM".to_vec())));
assert_eq!(iter.next(), Some((1, b"ICML".to_vec())));
assert_eq!(iter.next(), Some((2, b"SIGIR".to_vec())));
assert_eq!(iter.next(), Some((3, b"SIGKDD".to_vec())));
assert_eq!(iter.next(), Some((4, b"SIGMOD".to_vec())));
assert_eq!(iter.next(), None);

// Enumerates indexes and keys starting with a prefix.
let mut iter = set.predictive_iter(b"SIG");
assert_eq!(iter.next(), Some((2, b"SIGIR".to_vec())));
assert_eq!(iter.next(), Some((3, b"SIGKDD".to_vec())));
assert_eq!(iter.next(), Some((4, b"SIGMOD".to_vec())));
assert_eq!(iter.next(), None);

// Serialization / Deserialization
let mut data = Vec::<u8>::new();
set.serialize_into(&mut data).unwrap();
assert_eq!(data.len(), set.size_in_bytes());
let other = Set::deserialize_from(&data[..]).unwrap();
assert_eq!(data.len(), other.size_in_bytes());

Todo

  • Add benchmarking codes.
  • Add RePair compressed veriants.

Licensing

This library is free software provided under MIT.

fcsd's People

Contributors

kampersanda avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.