prob-phoc

PyTorch functions to compute meaningful probabilistic relevance scores from PHOC (Pyramid of Histograms of Characters) embeddings. Despite the name, a PHOC is in practice a pyramid of bags of characters: each level records which characters are present in each region of the word, not how many times they occur. As a result, each word is represented by a high-dimensional binary vector.
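To make the "pyramid of bags of characters" idea concrete, here is a minimal sketch of how a binary PHOC vector could be built. It is not part of this library: the function name build_phoc, the default alphabet, and the default levels are illustrative assumptions, and it uses the common rule that a character belongs to a region when at least half of the character falls inside it. Real PHOC variants often add more levels and bigrams.

```python
from typing import List, Sequence


def build_phoc(
    word: str,
    alphabet: str = "abcdefghijklmnopqrstuvwxyz",  # assumed alphabet
    levels: Sequence[int] = (1, 2, 3),  # assumed pyramid levels
) -> List[int]:
    """Return a simplified (unigram-only) binary PHOC vector for `word`."""
    n = len(word)
    phoc: List[int] = []
    for level in levels:
        for region in range(level):
            # Work in integer units of 1 / (n * level) to avoid float rounding.
            r_start, r_end = region * n, (region + 1) * n
            bag = [0] * len(alphabet)
            for k, ch in enumerate(word):
                idx = alphabet.find(ch)
                if idx < 0:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = k * level, (k + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # The character spans `level` units; keep it if at least
                # half of it lies inside this region.
                if 2 * overlap >= level:
                    bag[idx] = 1
            phoc.extend(bag)
    return phoc


# One bag per region: 1 region at level 1, then 2 regions at level 2.
vector = build_phoc("ab", alphabet="ab", levels=(1, 2))
print(vector)
```

The vector length is len(alphabet) times the total number of regions, which is why PHOC embeddings are high-dimensional even for short alphabets.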

See the wiki for additional details.

Usage

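The library provides two functions, cphoc and pphoc, which are analogous to SciPy's cdist and pdist: cphoc scores all pairs of rows taken from two matrices, and pphoc scores all pairs of distinct rows within a single matrix.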

Both functions can operate with PHOC embeddings in the probability space (where each dimension is a real number in the range [0, 1]), or in the log-probability space (where each dimension is the logarithm of a probability). These are also sometimes referred to as the Real and Log semirings, respectively.
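The relationship between the two spaces can be sketched in plain Python. This is only an illustration of the semiring operations, not the library's implementation, and the helper logsumexp is defined here for the example. In the Real semiring, "times" is multiplication and "plus" is addition; in the Log semiring, "times" becomes addition and "plus" becomes log-sum-exp.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf  # every term is log(0)
    return m + math.log(sum(math.exp(v - m) for v in values))


probs = [0.9, 0.5, 0.2]
logps = [math.log(p) for p in probs]

# Real semiring: multiply and add probabilities directly.
real_product = math.prod(probs)
real_sum = sum(probs)

# Log semiring: the same quantities, computed on log-probabilities.
log_product = sum(logps)
log_sum = logsumexp(logps)

# Both routes agree up to floating-point error.
assert math.isclose(math.exp(log_product), real_product)
assert math.isclose(math.exp(log_sum), real_sum)

# A long product of small probabilities underflows to 0.0 in the Real
# semiring but stays finite in the Log semiring.
tiny = [1e-5] * 1000
underflowed = math.prod(tiny)
stable = sum(math.log(p) for p in tiny)
```

Working in log space is a common choice for high-dimensional vectors, since products over many dimensions can underflow in the probability space.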

import torch
from prob_phoc import cphoc, pphoc

x = torch.Tensor(...)
y = torch.Tensor(...)

# Compute the log-relevance scores between all pairs of rows in x, y.
# Note: x and y must have the PHOC log-probabilities.
logprob = cphoc(x, y)

# This is equivalent to:
logprob = cphoc(x, y, method="sum_prod_log")

# If your matrices have probabilities instead of log-probabilities, use:
prob = cphoc(x, y, method="sum_prod_real")

# Compute the log-relevance scores between all pairs of distinct rows in x.
# Note: The output is a vector with N * (N - 1) / 2 elements.
logprob = pphoc(x)
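The N * (N - 1) / 2 length mentioned above matches the "condensed" layout that SciPy's pdist uses, where pairs (i, j) with i < j are listed row by row. The sketch below shows that layout in plain Python; the helper names are hypothetical, and whether pphoc orders its output exactly this way is an assumption based on the comparison with pdist, so check it against the library before relying on it.

```python
from itertools import combinations
from typing import List, Tuple


def condensed_pairs(n: int) -> List[Tuple[int, int]]:
    """All pairs (i, j) with i < j, in SciPy pdist order."""
    return list(combinations(range(n), 2))


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of the pair (i, j), with i < j, in the condensed vector."""
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    # Rows 0..i-1 contribute (n - 1) + (n - 2) + ... + (n - i) entries.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = 4
pairs = condensed_pairs(n)
assert len(pairs) == n * (n - 1) // 2  # 6 pairs for 4 rows
assert all(condensed_index(i, j, n) == k for k, (i, j) in enumerate(pairs))
```

Under that assumption, the score for rows i and j of x would be found at logprob[condensed_index(i, j, N)].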

Installation

The easiest way is to install the package from PyPI:

pip install prob-phoc

If you want to install the latest version from the repository, clone it and use the setup.py script to compile and install the library.

python setup.py install

You will need a C++11 compiler (tested with GCC 4.9). If you want to compile with CUDA support, you will also need to install the CUDA Toolkit (tested with versions 8.0, 9.0 and 10.0).

Tests and benchmarks

After the installation, you can run the tests to ensure that everything is working fine.

python -m prob_phoc.test

I have also included some benchmarks that compare CPU and CUDA performance for different matrix sizes and floating-point precisions. These take quite a long time to run, so don't hold your breath.

python -m prob_phoc.benchmark

Contributors

jpuigcerver
