Sufex

Overview

Sufex is an indexing and search system for linguistic purposes. It can also be used as a C++ library of search-related data structures. I am currently re-implementing it and will add parts as they become ready.

How to build

$> mkdir build
$> cd build
$> cmake ../src
$> make

Trigram generation and sorting

sux/trigram.hpp provides utilities for generating and sorting trigrams, as required by the DC (difference cover) algorithm for suffix array construction.

Trigram generation

Use the sux::TrigramMaker<TGImpl,CharType,PosType> template. It takes three template arguments:

  • The trigram implementation (TGImpl::tuple, TGImpl::arraytuple or TGImpl::pointer)
  • The character type (e.g. char)
  • The position type (e.g. unsigned long)

Here is a piece of code that shows how to use this:

#include <iostream>
#include <iterator>
#include <string>

#include <sux/trigram.hpp>

using namespace std;
using namespace sux;

int main()
{
  string input { "abcabeabxd" };

  /* Generating trigrams. */
  auto trigrams =
      TrigramMaker<TGImpl::tuple,string::value_type,size_t>::make_23trigrams(begin(input),end(input));

  /* Printing them, one trigram per line. */
  for (const auto &trigram : trigrams)
    cout << triget1(trigram) << triget2(trigram) << triget3(trigram) << '\n';
  cout.flush();
}

The code above makes use of the functions triget1, triget2 and triget3 to access the individual characters belonging to a trigram.

There is a convenience function, sux::string_to_23trigrams, which can be applied to any instance of std::basic_string<CharType>.
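To illustrate what "2,3-trigrams" usually means in the difference cover approach, here is a minimal, library-independent sketch. It does not use or reproduce the sux implementation: the Trigram alias, the function name sketch_23trigrams, the choice of 0-based positions p with p % 3 != 0, and the '\0' padding at the end of the text are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a trigram: three characters plus the
// position in the text where the trigram starts.
using Trigram = std::tuple<char, char, char, std::size_t>;

// Collects the trigrams starting at every 0-based position p with
// p % 3 != 0. Characters past the end of the text are padded with '\0'.
// Both the position convention and the padding are assumptions.
std::vector<Trigram> sketch_23trigrams(const std::string &text)
{
  std::vector<Trigram> result;
  // Returns the character at index i, or the padding character.
  auto at = [&text](std::size_t i) { return i < text.size() ? text[i] : '\0'; };
  for (std::size_t p = 0; p < text.size(); ++p) {
    if (p % 3 == 0)
      continue;  // positions divisible by 3 are not part of the sample
    result.emplace_back(at(p), at(p + 1), at(p + 2), p);
  }
  return result;
}
```

Under these assumptions, the input "abcabeabxd" yields six trigrams, starting at positions 1, 2, 4, 5, 7 and 8. The actual output of make_23trigrams may differ in padding and ordering.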

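Trigram sorting

Use sort_23trigrams to sort a sequence of 2,3-trigrams. Its second argument is the number of parallel threads to use: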

/* Input. */
string input { "abcabeabxd" };

/* 2,3-Trigrams (convenience function). */
auto trigrams = string_to_23trigrams(input);

/* Trigram sorting using 2 parallel threads. */
sort_23trigrams(trigrams,2);
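For readers unfamiliar with how trigrams are typically sorted in this context, the following sketch shows a single-threaded least-significant-digit radix sort: three stable counting-sort passes, one per character. It is not the sux implementation and says nothing about how sort_23trigrams works internally or how it distributes work across threads. The Trigram alias and the names counting_pass and sketch_sort_trigrams are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a trigram: three characters plus start position.
using Trigram = std::tuple<char, char, char, std::size_t>;

// One stable counting-sort pass keyed on the character at tuple index I.
template <std::size_t I>
void counting_pass(std::vector<Trigram> &trigrams)
{
  // start[c + 1] first holds the number of occurrences of character c.
  std::array<std::size_t, 257> start{};
  for (const auto &t : trigrams)
    ++start[static_cast<unsigned char>(std::get<I>(t)) + 1];
  // Prefix sums turn the counts into the first output index per character.
  for (std::size_t c = 1; c < start.size(); ++c)
    start[c] += start[c - 1];
  // Scatter in input order, which keeps the pass stable.
  std::vector<Trigram> sorted(trigrams.size());
  for (const auto &t : trigrams)
    sorted[start[static_cast<unsigned char>(std::get<I>(t))]++] = t;
  trigrams.swap(sorted);
}

// Sorts trigrams lexicographically by their three characters,
// starting with the last character and ending with the first.
void sketch_sort_trigrams(std::vector<Trigram> &trigrams)
{
  counting_pass<2>(trigrams);
  counting_pass<1>(trigrams);
  counting_pass<0>(trigrams);
}
```

Because every pass is stable, trigrams with identical characters keep their original relative order. Each pass runs in time linear in the number of trigrams plus the alphabet size, which is why radix sorting is a common choice for this step.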
