plasticityai / magnitude

A fast, efficient universal vector embedding utility package.

License: MIT License

Magnitude: a fast, simple vector embedding utility library

A feature-packed Python package and vector storage file format, developed by Plasticity, for using vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended to be a simpler / faster alternative to Gensim, but it can also be used as a generic key-vector store for domains outside NLP. It offers unique features like out-of-vocabulary lookups and streaming of large models over HTTP. Magnitude was published in our paper at EMNLP 2018, which is available on arXiv.

Installation
Motivation
Benchmarks and Features
Pre-converted Magnitude Formats of Popular Embeddings Models
Using the Library
Concurrency and Parallelism
File Format and Converter
Remote Loading
Remote Streaming over HTTP
Other Documentation
Other Languages
Other Programming Languages
Other Domains
Contributing
Roadmap
Other Notable Projects
Citing this Repository
LICENSE and Attribution

Installation

You can install this package with pip:

pip install pymagnitude # Python 2.7
pip3 install pymagnitude # Python 3

Google Colaboratory has conflicting dependencies that can break a normal Magnitude install. You can use the following snippet to install Magnitude on Google Colaboratory instead:

# Install Magnitude on Google Colab
! echo "Installing Magnitude.... (please wait, can take a while)"
! (curl https://raw.githubusercontent.com/plasticityai/magnitude/master/install-colab.sh | /bin/bash 1>/dev/null 2>/dev/null)
! echo "Done installing Magnitude."

Motivation

Vector space embedding models have become increasingly common in machine learning and have traditionally been popular for natural language processing applications. However, a fast, lightweight tool for consuming these large vector space embedding models efficiently has been lacking.

The Magnitude file format (.magnitude) for vector embeddings is intended to be a more efficient universal vector embedding format. Its features include:

- lazy-loading for faster cold starts in development
- LRU memory caching for performance in production
- multiple key queries
- direct featurization to the inputs for a neural network
- performant similarity calculations
- other nice-to-have features for edge cases, such as handling out-of-vocabulary or misspelled keys and concatenating multiple vector models together

It is also intended to work with large vector models that may not fit in memory.

It uses SQLite, a fast, popular embedded database, as its underlying data store. It uses indexes for fast key lookups. It also uses memory mapping, SIMD instructions, and spatial indexing for fast similarity search in the vector space off-disk, with good memory performance even across multiple processes. Moreover, memory maps are cached between runs, so the speed improvements persist even after a process is closed.
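The sketch below illustrates why an indexed embedded database gives fast key lookups. It builds a small key-to-vector table with Python's built-in sqlite3 module. The table name, columns, and float32 packing are assumptions made for this example. The real .magnitude schema is defined in the pymagnitude source and is not reproduced here.

```python
import sqlite3
import struct


def build_store(pairs, dim):
    # Hypothetical schema: one row per key, vector packed as little-endian float32 bytes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vectors (key TEXT, vec BLOB)")
    conn.executemany(
        "INSERT INTO vectors (key, vec) VALUES (?, ?)",
        ((k, struct.pack("<%df" % dim, *v)) for k, v in pairs),
    )
    # An index on the key column turns lookups into B-tree searches instead of table scans.
    conn.execute("CREATE UNIQUE INDEX vectors_key_idx ON vectors (key)")
    conn.commit()
    return conn


def lookup(conn, key, dim):
    row = conn.execute("SELECT vec FROM vectors WHERE key = ?", (key,)).fetchone()
    if row is None:
        return None
    return list(struct.unpack("<%df" % dim, row[0]))


store = build_store([("cat", [0.5, 0.25]), ("dog", [0.75, -1.0])], dim=2)
print(lookup(store, "dog", 2))   # [0.75, -1.0]
print(lookup(store, "fish", 2))  # None
```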

Benchmarks and Features

Metric	Magnitude Light	Magnitude Medium	Magnitude Heavy	Magnitude Stream
Initial load time	0.7210s	━ ¹	━ ¹	7.7550s
Cold single key query	0.0001s	━ ¹	━ ¹	1.6437s
Warm single key query (same key as cold query)	0.00004s	━ ¹	━ ¹	0.0004s
Cold multiple key query (n=25)	0.0442s	━ ¹	━ ¹	1.7753s
Warm multiple key query (n=25, same keys as cold query)	0.00004s	━ ¹	━ ¹	0.0001s
First `most_similar` search query (n=10, worst case)	247.05s	━ ¹	━ ¹	-
First `most_similar` search query (n=10, average case, w/ disk persistent cache)	1.8217s	━ ¹	━ ¹	-
Subsequent `most_similar` search (n=10, different key than first query)	0.2434s	━ ¹	━ ¹	-
Warm subsequent `most_similar` search (n=10, same key as first query)	0.00004s	0.00004s	0.00004s	-
First `most_similar_approx` search query (n=10, effort=1.0, worst case)	N/A	N/A	29.610s	-
First `most_similar_approx` search query (n=10, effort=1.0, average case, w/ disk persistent cache)	N/A	N/A	0.9155s	-
Subsequent `most_similar_approx` search (n=10, effort=1.0, different key than first query)	N/A	N/A	0.1873s	-
Subsequent `most_similar_approx` search (n=10, effort=0.1, different key than first query)	N/A	N/A	0.0199s	-
Warm subsequent `most_similar_approx` search (n=10, effort=1.0, same key as first query)	N/A	N/A	0.00004s	-
File size	4.21GB	5.29GB	10.74GB	0.00GB
Process memory (RAM) utilization	18KB	━ ¹	━ ¹	1.71MB
Process memory (RAM) utilization after 100 key queries	168KB	━ ¹	━ ¹	1.91MB
Process memory (RAM) utilization after 100 key queries + similarity search	342KB²	━ ¹	━ ¹	-
Integrity checks and tests	✅	✅	✅	✅
Universal format between word2vec (`.txt`, `.bin`), GloVe (`.txt`), fastText (`.vec`), and ELMo (`.hdf5`) with converter utility	✅	✅	✅	✅
Simple, Pythonic interface	✅	✅	✅	✅
Few dependencies	✅	✅	✅	✅
Support for larger than memory models	✅	✅	✅	✅
Lazy loading whenever possible for speed and performance	✅	✅	✅	✅
Optimized for `threading` and `multiprocessing`	✅	✅	✅	✅
Bulk and multiple key lookup with padding, truncation, placeholder, and featurization support	✅	✅	✅	✅
Concatenating multiple vector models together	✅	✅	✅	✅
Basic out-of-vocabulary key lookup (character n-gram feature hashing)	✅	✅	✅	✅
Advanced out-of-vocabulary key lookup with support for misspellings (character n-gram feature hashing to similar in-vocabulary keys)	❌	✅	✅	✅
Approximate most similar search with an Annoy index	❌	❌	✅	✅
Built-in training for new models	❌	❌	❌	❌

¹ Same value as the previous column.
² Uses mmap to read from disk. The OS will still allocate pages of memory when memory is available, but those pages can be shared between processes and are not managed within each process. This is a performance win for extremely large files.
* All benchmarks were performed on the Google News pre-trained word vectors (GoogleNews-vectors-negative300.bin) with a MacBook Pro (Retina, 15-inch, Mid 2014) 2.2GHz quad-core Intel Core i7 @ 16GB RAM on SSD, averaged over multiple trials where feasible.

Pre-converted Magnitude Formats of Popular Embeddings Models

Popular embedding models have been pre-converted to the .magnitude format for immediate download and usage:

Contributor	Data	Light (basic support for out-of-vocabulary keys)	Medium, recommended (advanced support for out-of-vocabulary keys)	Heavy (advanced support for out-of-vocabulary keys and faster most_similar_approx)
Google - word2vec	Google News 100B	300D	300D	300D
Stanford - GloVe	Wikipedia 2014 + Gigaword 5 6B	50D, 100D, 200D, 300D	50D, 100D, 200D, 300D	50D, 100D, 200D, 300D
Stanford - GloVe	Wikipedia 2014 + Gigaword 5 6B (lemmatized by Plasticity)	50D, 100D, 200D, 300D	50D, 100D, 200D, 300D	50D, 100D, 200D, 300D
Stanford - GloVe	Common Crawl 840B	300D	300D	300D
Stanford - GloVe	Twitter 27B	25D, 50D, 100D, 200D	25D, 50D, 100D, 200D	25D, 50D, 100D, 200D
Facebook - fastText	English Wikipedia 2017 16B	300D	300D	300D
Facebook - fastText	English Wikipedia 2017 + subword 16B	300D	300D	300D
Facebook - fastText	Common Crawl 600B	300D	300D	300D
AI2 - AllenNLP ELMo	ELMo Models	ELMo Models	ELMo Models	ELMo Models
Google - BERT	Coming Soon...	Coming Soon...	Coming Soon...	Coming Soon...

There are instructions below for converting any .bin, .txt, .vec, .hdf5 file to a .magnitude file.

Using the Library

Constructing a Magnitude Object

You can create a Magnitude object like so:

from pymagnitude import *
vectors = Magnitude("/path/to/vectors.magnitude")

For convenience, you can also open a .bin, .txt, .vec, or .hdf5 file directly with Magnitude. This is, however, less efficient and very slow for large models. On the first run, the file is converted to a .magnitude file in a temporary directory. That directory is not guaranteed to persist, and it does not survive a reboot. For faster speeds you should typically pre-convert .bin, .txt, .vec, and .hdf5 files with python -m pymagnitude.converter. Opening them directly is still useful for one-off use cases. A warning is generated when instantiating a Magnitude object directly with a .bin, .txt, .vec, or .hdf5 file. You can suppress warnings by setting the supress_warnings argument in the constructor to True.

By default, lazy loading is enabled. You can pass an optional lazy_loading argument to the constructor:
- -1 disables lazy-loading and pre-loads all vectors into memory (à la Gensim).
- 0 (the default) enables lazy-loading with an unbounded in-memory LRU cache.
- An integer X greater than zero enables lazy-loading with an LRU cache that holds the X most recently used vectors in memory.
If you want the data for the most_similar functions to be pre-loaded eagerly on initialization, set eager to True.
Note that even when lazy_loading is set to -1 or eager is set to True, data will be pre-loaded into memory in a background thread. This prevents the constructor from blocking for a few minutes on large models. If you really want blocking behavior, you can pass True to the blocking argument.
By default, unit-length normalized vectors are returned, unless you are loading an ELMo model. Set the optional argument normalized to False if you wish to receive the raw, non-normalized vectors instead.
By default, NumPy arrays are returned for queries. Set the optional argument use_numpy to False if you wish to receive Python lists instead.
By default, querying for keys is case-sensitive. Set the optional argument case_insensitive to True if you wish to perform case-insensitive searches.
Optionally, you can include the pad_to_length argument, which specifies the length all examples should be padded to when passing in multiple examples. Any examples that are longer than the pad length will be truncated.
Optionally, you can set the truncate_left argument to True if you want the beginning of the list of keys in each example to be truncated instead of the end when it is longer than pad_to_length.
Optionally, you can set the pad_left argument to True if you want the padding to appear at the beginning instead of the end (which is the default).
Optionally, you can pass in the placeholders argument, which increases the dimensions of each vector by placeholders extra dimensions and zero-pads them. This is useful if you plan to add other values and information to the vectors and want that space pre-allocated in the vectors for efficiency.
Optionally, you can pass in the language argument with an ISO 639-1 language code. If you are using Magnitude for word vectors, this ensures the library respects stemming and other language-specific features for that language. The default is en for English. You can also pass in None if you are not using Magnitude for word vectors.
Optionally, you can pass in the dtype argument to control the data type of the NumPy arrays returned by Magnitude.
Optionally, you can pass in the devices argument to control GPU usage when the underlying model supports it. This argument should be a list of integers, where each integer is a GPU device number (0, 1, etc.).
Optionally, you can pass in the temp_dir argument to control the location of the temporary directory Magnitude will use.
Optionally, you can pass in the log argument to have Magnitude log progress to standard error when slow operations are taking place.
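The lazy_loading semantics described above can be sketched with an OrderedDict-based LRU cache. A value of 0 means an unbounded cache, and a positive X keeps the X most recently used vectors. This is a hypothetical illustration, not Magnitude's internal cache. The loader callable stands in for a read from the underlying store, and eager pre-loading (-1) is not modeled.

```python
from collections import OrderedDict


class LazyVectorCache:
    """Hypothetical LRU cache: lazy_loading=0 is unbounded, X > 0 keeps the X most recent."""

    def __init__(self, loader, lazy_loading=0):
        self.loader = loader          # callable key -> vector (stands in for a disk read)
        self.capacity = lazy_loading  # 0 means "no limit"
        self.cache = OrderedDict()
        self.loads = 0                # how many times the loader was actually called

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        value = self.loader(key)
        self.loads += 1
        self.cache[key] = value
        if self.capacity > 0 and len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


cache = LazyVectorCache(lambda key: [float(len(key))], lazy_loading=2)
cache.get("cat")
cache.get("dog")
cache.get("cat")   # cache hit; "cat" becomes the most recently used key
cache.get("bird")  # over capacity, so "dog" is evicted
print(list(cache.cache))  # ['cat', 'bird']
print(cache.loads)        # 3
```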

Querying

You can query the total number of vectors in the file like so:

len(vectors)

You can query the dimensions of the vectors like so:

vectors.dim

You can check if a key is in the vocabulary like so:

"cat" in vectors

You can iterate through all keys and vectors like so:

for key, vector in vectors:
  ...

You can query for the vector of a key like so:

vectors.query("cat")

You can index for the n-th key and vector like so:

vectors[42]

You can query for the vector of multiple keys like so:

vectors.query(["I", "read", "a", "book"])

A 2D array (keys by vectors) will be returned.

You can query for the vector of multiple examples like so:

vectors.query([["I", "read", "a", "book"], ["I", "read", "a", "magazine"]])

A 3D array (examples by keys by vectors) will be returned. If pad_to_length is not specified, and the size of each example is uneven, they will be padded to the length of the longest example.
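The padding and truncation rules can be sketched at the key level, before any vectors are looked up. These rules cover pad_to_length, truncate_left, pad_left, and padding to the longest example by default. Mapping each key to its vector afterwards gives the examples by keys by vectors shape described above. The helper and its `<PAD>` placeholder token are hypothetical. Magnitude fills padded positions in the returned array itself.

```python
PAD = "<PAD>"  # hypothetical placeholder token for padded positions


def pad_examples(examples, pad_to_length=None, truncate_left=False, pad_left=False):
    # Without pad_to_length, every example is padded to the longest one.
    length = pad_to_length if pad_to_length is not None else max(len(e) for e in examples)
    padded = []
    for keys in examples:
        keys = list(keys)
        if len(keys) > length:
            # Drop keys from the beginning (truncate_left) or from the end (default).
            keys = keys[len(keys) - length:] if truncate_left else keys[:length]
        filler = [PAD] * (length - len(keys))
        padded.append(filler + keys if pad_left else keys + filler)
    return padded


examples = [["I", "read", "a", "book"], ["I", "read"]]
print(pad_examples(examples))
# [['I', 'read', 'a', 'book'], ['I', 'read', '<PAD>', '<PAD>']]
print(pad_examples(examples, pad_to_length=3, truncate_left=True, pad_left=True))
# [['read', 'a', 'book'], ['<PAD>', 'I', 'read']]
```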

You can index for the keys and vectors of multiple indices like so:

vectors[:42] # slice notation
vectors[42, 1337, 2001] # tuple notation

You can query the distance of two or multiple keys like so:

vectors.distance("cat", "dog")
vectors.distance("cat", ["dog", "tiger"])

You can query the similarity of two or multiple keys like so:

vectors.similarity("cat", "dog")
vectors.similarity("cat", ["dog", "tiger"])

You can query for the most similar key out of a list of keys to a given key like so:

vectors.most_similar_to_given("cat", ["dog", "television", "laptop"]) # dog

You can query for which key doesn't match a list of keys to a given key like so:

vectors.doesnt_match(["breakfast", "cereal", "dinner", "lunch"]) # cereal
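The similarity helpers above can be sketched on made-up 2-dimensional vectors. The sketch assumes that similarity is cosine similarity and that distance is the Euclidean distance between unit-length vectors. Check the pymagnitude source for the exact definitions. The doesnt_match sketch picks the key least similar to the mean of the unit vectors. That gensim-style rule is an assumption about how Magnitude behaves.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def similarity(a, b):
    # Cosine similarity: the dot product of the unit-length vectors.
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def distance(a, b):
    # Euclidean distance between the unit-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(unit(a), unit(b))))


def most_similar_to_given(vecs, key, candidates):
    return max(candidates, key=lambda c: similarity(vecs[key], vecs[c]))


def doesnt_match(vecs, keys):
    # Assumed rule: the key least similar to the mean of all the unit vectors.
    units = [unit(vecs[k]) for k in keys]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return min(keys, key=lambda k: similarity(vecs[k], mean))


# Made-up 2-D vectors for illustration only.
animals = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "television": [0.0, 1.0], "laptop": [0.1, 1.0]}
meals = {"breakfast": [1.0, 0.2], "dinner": [0.9, 0.3], "lunch": [1.0, 0.25], "cereal": [0.1, 1.0]}
print(most_similar_to_given(animals, "cat", ["dog", "television", "laptop"]))  # dog
print(doesnt_match(meals, ["breakfast", "cereal", "dinner", "lunch"]))        # cereal
```

For unit-length vectors the two measures are tied together by distance² = 2 − 2 · similarity. Ranking by smallest distance therefore gives the same order as ranking by largest similarity.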

You can query for the most similar (nearest neighbors) keys like so:

vectors.most_similar("cat", topn = 100) # Most similar by key
vectors.most_similar(vectors.query("cat"), topn = 100) # Most similar by vector

Optionally, you can pass a min_similarity argument to most_similar. Values in the range [-1.0, 1.0] are valid.

You can also query for the most similar keys giving positive and negative examples (which, incidentally, solves analogies) like so:

vectors.most_similar(positive = ["woman", "king"], negative = ["man"]) # queen

Similar to vectors.most_similar, a vectors.most_similar_cosmul function exists that uses the 3CosMul function from Levy and Goldberg:

vectors.most_similar_cosmul(positive = ["woman", "king"], negative = ["man"]) # queen
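The positive/negative queries above can be sketched with two scoring rules on made-up 3-dimensional vectors:

- The additive rule, often called 3CosAdd, ranks keys by cosine similarity to the sum of the positive unit vectors minus the negative ones.
- 3CosMul, following Levy and Goldberg and the form used by gensim's most_similar_cosmul, multiplies similarities shifted into [0, 1] and divides by the negative ones.

This illustrates the two scoring rules and is not Magnitude's implementation. The toy vectors and the epsilon value are assumptions.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cos(a, b):
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def most_similar(vecs, positive, negative=(), topn=1):
    # 3CosAdd: cosine similarity to sum(positive unit vectors) - sum(negative unit vectors).
    dim = len(next(iter(vecs.values())))
    target = [0.0] * dim
    for k in positive:
        target = [t + x for t, x in zip(target, unit(vecs[k]))]
    for k in negative:
        target = [t - x for t, x in zip(target, unit(vecs[k]))]
    candidates = [k for k in vecs if k not in positive and k not in negative]
    return sorted(candidates, key=lambda k: cos(vecs[k], target), reverse=True)[:topn]


def most_similar_cosmul(vecs, positive, negative=(), topn=1, eps=1e-6):
    # 3CosMul: similarities shifted to [0, 1], multiplied for positives, divided for negatives.
    def shifted(a, b):
        return (cos(a, b) + 1) / 2

    def score(k):
        num = math.prod(shifted(vecs[k], vecs[p]) for p in positive)
        den = math.prod(shifted(vecs[k], vecs[n]) for n in negative)
        return num / (den + eps)

    candidates = [k for k in vecs if k not in positive and k not in negative]
    return sorted(candidates, key=score, reverse=True)[:topn]


# Made-up axes: [royalty, male, female].
vecs = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}
print(most_similar(vecs, positive=["woman", "king"], negative=["man"]))         # ['queen']
print(most_similar_cosmul(vecs, positive=["woman", "king"], negative=["man"]))  # ['queen']
```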

You can also query for the most similar keys using an approximate nearest neighbors index which is much faster, but doesn't guarantee the exact answer:

vectors.most_similar_approx("cat")
vectors.most_similar_approx(positive = ["woman", "king"], negative = ["man"])

Optionally, you can pass an effort argument with a value in the range [0.0, 1.0] to the most_similar_approx function to trade accuracy for runtime. The default effort is 1.0, which takes the longest but gives the most accurate result.

You can query for all keys closer to a key than another key is like so:

vectors.closer_than("cat", "rabbit") # ["dog", ...]
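closer_than can be sketched as a threshold query. Take the similarity between the two given keys, then return every other key that is strictly more similar to the first key. The vectors below are made up, and cosine similarity is assumed.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def closer_than(vecs, key, other):
    # Every key (besides `key` itself) strictly more similar to `key` than `other` is.
    threshold = cosine(vecs[key], vecs[other])
    return [k for k in vecs if k != key and cosine(vecs[key], vecs[k]) > threshold]


vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "tiger": [0.8, 0.3], "rabbit": [0.5, 0.5], "car": [0.0, 1.0]}
print(closer_than(vecs, "cat", "rabbit"))  # ['dog', 'tiger']
```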

You can access all of the underlying vectors in the model in a large numpy.memmap array of size (len(vectors) x vectors.emb_dim) like so:

vectors.get_vectors_mmap()

You can clean up all associated resources, open files, and database connections like so:

vectors.close()

Basic Out-of-Vocabulary Keys

For word vector representations, handling out-of-vocabulary keys matters for three reasons. It handles new words not in the trained model. It handles misspellings and typos. It also makes models trained on the word vector representations more robust in general.

Out-of-vocabulary keys are handled by assigning them a random vector value. However, the randomness is deterministic. If the same out-of-vocabulary key is encountered twice, it gets the same random vector value both times, so models can be trained on those out-of-vocabulary keys. Moreover, if two out-of-vocabulary keys share similar character n-grams ("uberx", "uberxl"), they will be placed close to each other even though neither is in the vocabulary:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"uberx" in vectors # False
"uberxl" in vectors # False
vectors.query("uberx") # array([ 5.07109939e-02, -7.08248823e-02, -2.74812328e-02, ... ])
vectors.query("uberxl") # array([ 0.04734962, -0.08237578, -0.0333479, -0.00229564, ... ])
vectors.similarity("uberx", "uberxl") # 0.955000000200815
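The deterministic n-gram behavior described above can be sketched with feature hashing:

1. Split the key into character n-grams.
2. Derive a pseudo-random vector from a stable hash of each n-gram.
3. Sum the vectors and normalize the result.

Keys that share many n-grams, like "uberx" and "uberxl", end up with similar vectors. Several details are assumptions for illustration: the n-gram range (3 to 5 with boundary markers), the SHA-256-seeded Gaussian vectors, and the 300 dimensions. Magnitude's exact scheme may differ.

```python
import hashlib
import math
import random

DIM = 300  # assumed dimensionality for the sketch


def char_ngrams(key, n_min=3, n_max=5):
    # "<" and ">" mark word boundaries so prefixes and suffixes get distinct n-grams.
    s = "<" + key + ">"
    return [s[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(s) - n + 1)]


def ngram_vector(ngram):
    # A stable digest seeds the PRNG, so each n-gram always gets the same "random" vector.
    seed = int.from_bytes(hashlib.sha256(ngram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def oov_vector(key):
    total = [0.0] * DIM
    for gram in char_ngrams(key):
        total = [t + x for t, x in zip(total, ngram_vector(gram))]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def similarity(a, b):
    return sum(x * y for x, y in zip(a, b))  # both inputs are unit length


uberx = oov_vector("uberx")
print(uberx == oov_vector("uberx"))  # True: the same key always gets the same vector
# "uberx" shares most of its n-grams with "uberxl" and none with "kettle".
print(similarity(uberx, oov_vector("uberxl")) > similarity(uberx, oov_vector("kettle")))
```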

Advanced Out-of-Vocabulary Keys

If using a Magnitude file with advanced out-of-vocabulary support (Medium or Heavy), out-of-vocabulary keys will also be embedded close to similar keys (determined by string similarity) that are in the vocabulary:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"uberx" in vectors # False
"uberification" in vectors # False
"uber" in vectors # True
vectors.similarity("uberx", "uber") # 0.7383483267618451
vectors.similarity("uberification", "uber") # 0.745452837882727
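One way to picture the advanced behavior is to blend a key's hashed vector with the vectors of in-vocabulary keys that are spelled similarly. The sketch below uses difflib.get_close_matches from the standard library as a stand-in string-similarity measure, with a made-up 0.7 blending weight. Both are assumptions. Magnitude's own matching and weighting may differ and are not reproduced here.

```python
import difflib
import hashlib
import math
import random


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hashed_vector(key, dim):
    # Stand-in for the character n-gram vector of the basic out-of-vocabulary method.
    seed = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def advanced_oov_vector(key, vocab, n_matches=3, weight=0.7):
    # Hypothetical: pull the hashed vector toward similarly spelled in-vocabulary keys.
    dim = len(next(iter(vocab.values())))
    base = hashed_vector(key, dim)
    matches = difflib.get_close_matches(key, list(vocab), n=n_matches, cutoff=0.5)
    if not matches:
        return base
    neighbor = unit([sum(col) / len(matches) for col in zip(*(vocab[m] for m in matches))])
    return unit([weight * a + (1 - weight) * b for a, b in zip(neighbor, base)])


# Made-up 50-dimensional "vocabulary" vectors.
vocab = {k: hashed_vector(k, 50) for k in ["uber", "taxi", "mississippi", "discriminatory"]}
plain = hashed_vector("uberx", 50)
blended = advanced_oov_vector("uberx", vocab)
print(dot(blended, vocab["uber"]) > dot(plain, vocab["uber"]))  # True
```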

Handling Misspellings and Typos

This also makes Magnitude robust to a lot of spelling errors:

vectors = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
"missispi" in vectors # False
vectors.similarity("missispi", "mississippi") # 0.35961736624824003
"discrimnatory" in vectors # False
vectors.similarity("discrimnatory", "discriminatory") # 0.8309152561753461
"hiiiiiiiiii" in vectors # False
vectors.similarity("hiiiiiiiiii", "hi") # 0.7069775034853861

Character n-grams are used to create this effect for out-of-vocabulary keys. The inspiration for this feature was taken from Facebook AI Research's Enriching Word Vectors with Subword Information, but instead of utilizing character n-grams at train time, character n-grams are used at inference so the effect can be somewhat replicated (but not perfectly replicated) in older models that were not trained with character n-grams like word2vec and GloVe.

Concatenation of Multiple Models

Optionally, you can combine vectors from multiple models to feed stronger information into a machine learning model like so:

from pymagnitude import *
word2vec = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
glove = Magnitude("/path/to/glove.6B.50d.magnitude")
vectors = Magnitude(word2vec, glove) # concatenate word2vec with glove
vectors.query("cat") # returns 350-dimensional NumPy array ('cat' from word2vec concatenated with 'cat' from glove)
vectors.query(("cat", "cats")) # returns 350-dimensional NumPy array ('cat' from word2vec concatenated with 'cats' from glove)

You can concatenate more than two vector models simply by passing more arguments to the constructor.
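Concatenation can be sketched with plain dictionaries standing in for the models. A single key is looked up in every model. A tuple supplies one key per model, in model order, as in the ("cat", "cats") example above. The output dimension is the sum of the model dimensions, which is why word2vec (300) plus GloVe (50) gives 350. This is a hypothetical helper, not the Magnitude API.

```python
def concat_query(models, key):
    # A plain key is looked up in every model; a tuple gives one key per model.
    keys = key if isinstance(key, tuple) else (key,) * len(models)
    if len(keys) != len(models):
        raise ValueError("expected one key per model")
    vector = []
    for model, k in zip(models, keys):
        vector.extend(model[k])
    return vector


word2vec_like = {"cat": [0.1, 0.2, 0.3], "cats": [0.1, 0.2, 0.4]}  # made-up 3-D "model"
glove_like = {"cat": [0.9, 0.8], "cats": [0.7, 0.6]}               # made-up 2-D "model"
print(concat_query([word2vec_like, glove_like], "cat"))            # [0.1, 0.2, 0.3, 0.9, 0.8]
print(concat_query([word2vec_like, glove_like], ("cat", "cats")))  # [0.1, 0.2, 0.3, 0.7, 0.6]
```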

Additional Featurization (Parts of Speech, etc.)

You can automatically create vectors from additional features you may have such as parts of speech, syntax dependency information, or any other information using the FeaturizerMagnitude class:

from pymagnitude import *
pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
pos_vectors.dim # 4 - number of dims automatically determined by Magnitude from 100
pos_vectors.query("NN") # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192]) 
pos_vectors.query("JJ") # - array([-0.11681135,  0.10259253,  0.8841201 , -0.44063763])
pos_vectors.query("NN") # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192]) (deterministic hashing so the same value is returned every time for the same key)
dependency_vectors = FeaturizerMagnitude(100, namespace = "SyntaxDependencies")
dependency_vectors.dim # 4 - number of dims automatically determined by Magnitude from 100
dependency_vectors.query("nsubj") # - array([-0.81043793,  0.55401352, -0.10838071,  0.15656626])
dependency_vectors.query("prep") # - array([-0.30862918, -0.44487267, -0.0054573 , -0.84071788])

Magnitude will use the feature hashing trick internally to directly use the hash of the feature value to create a unique vector for that feature value.

The first argument to FeaturizerMagnitude should be an approximate upper bound on the number of values for the feature. There are fewer than 100 parts of speech tags and fewer than 100 syntax dependencies, so we choose 100 for both in the example above. The value chosen determines how many dimensions Magnitude automatically assigns to the FeaturizerMagnitude object, in order to reduce the chance of a hash collision. The namespace argument can be any string that describes your additional feature. It is optional, but highly recommended.
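The feature hashing idea can be sketched as follows. A stable hash of the namespace and feature value seeds a pseudo-random unit vector. The same value therefore always maps to the same vector without storing any vocabulary, and the namespace keeps identical strings from different features apart. The dimension is passed in directly here (4, to match the example above). How Magnitude derives the dimension from the upper-bound argument is not shown, and the SHA-256 seeding is an assumption.

```python
import hashlib
import math
import random


def featurize(value, dim=4, namespace=None):
    # Hash (namespace, value) to a stable 64-bit seed; Python's built-in hash() is
    # randomized per process, so a cryptographic digest is used instead.
    token = "%s:%s" % (namespace or "", value)
    seed = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


nn = featurize("NN", namespace="PartsOfSpeech")
jj = featurize("JJ", namespace="PartsOfSpeech")
print(nn == featurize("NN", namespace="PartsOfSpeech"))       # True: same key, same vector
print(nn == featurize("NN", namespace="SyntaxDependencies"))  # False: namespaces differ
```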

You can then concatenate these features for use with a standard Magnitude object:

from pymagnitude import *
word2vec = Magnitude("/path/to/GoogleNews-vectors-negative300.magnitude")
pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
dependency_vectors = FeaturizerMagnitude(100, namespace = "SyntaxDependencies")
vectors = Magnitude(word2vec, pos_vectors, dependency_vectors) # concatenate word2vec with pos and dependencies
vectors.query([
    ("I", "PRP", "nsubj"), 
    ("saw", "VBD", "ROOT"), 
    ("a", "DT", "det"), 
    ("cat", "NN", "dobj"), 
    (".",  ".", "punct")
  ]) # array of size 5 x (300 + 4 + 4) or 5 x 308

# Or get a unique vector for every 'buffalo' in:
# "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo"
# (https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo)
vectors.query([
    ("Buffalo", "JJ", "amod"), 
    ("buffalo", "NNS", "nsubj"), 
    ("Buffalo", "JJ", "amod"), 
    ("buffalo", "NNS", "nsubj"), 
    ("buffalo",  "VBP", "rcmod"),
    ("buffalo",  "VB", "ROOT"),
    ("Buffalo",  "JJ", "amod"),
    ("buffalo",  "NNS", "dobj")
  ]) # array of size 8 x (300 + 4 + 4) or 8 x 308

A machine learning model, given this output, now has access to parts of speech information and syntax dependency information instead of just word vector information. In this case, this additional information can give neural networks stronger signal for semantic information and reduce the need for training data.

Using Magnitude with a ML library

Magnitude makes it very easy to quickly build and iterate on models that need to use vector representations by taking care of a lot of pre-processing code to convert a dataset of text (or keys) into vectors. Moreover, it can make these models more robust to out-of-vocabulary words and misspellings.

There is example code available using Magnitude to build an intent classification model for the ATIS (Airline Travel Information Systems) dataset (Train/Test), used for chatbots or conversational interfaces, in a few popular machine learning libraries below.

Keras

You can access a guide for using Magnitude with Keras (which supports TensorFlow, Theano, CNTK) at this Google Colaboratory Python notebook.

PyTorch

The PyTorch guide is coming soon.

TFLearn

The TFLearn guide is coming soon.

Utils

You can use the MagnitudeUtils class for convenient access to functions that may be useful when creating machine learning models.

You can import MagnitudeUtils like so:

  from pymagnitude import MagnitudeUtils

You can download a Magnitude model from a remote source like so:

  vecs = Magnitude(MagnitudeUtils.download_model('word2vec/heavy/GoogleNews-vectors-negative300'))

By default, download_model will download files from http://magnitude.plasticity.ai to a ~/.magnitude folder created automatically. If the file has already been downloaded, it will not be downloaded again. You can change the directory of the local download folder using the optional download_dir argument. You can change the domain from which models will be downloaded with the optional remote_path argument.
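The download-once behavior can be sketched as follows. The function name download_once, the injected fetch callable, and the local file layout are hypothetical, chosen so the example runs without network access. Only the URL shape (remote path, model name, then the .magnitude extension) follows the remote loading examples in this README.

```python
import os
import tempfile


def download_once(model, fetch, download_dir="~/.magnitude",
                  remote_path="http://magnitude.plasticity.ai/"):
    # Hypothetical sketch: resolve the local path and fetch only if the file is missing.
    filename = model + ".magnitude"
    local_path = os.path.join(os.path.expanduser(download_dir), *filename.split("/"))
    if not os.path.isfile(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        data = fetch(remote_path.rstrip("/") + "/" + filename)
        with open(local_path, "wb") as f:
            f.write(data)
    return local_path


# Usage with a fake fetcher and a throwaway directory instead of ~/.magnitude:
fetched = []


def fake_fetch(url):
    fetched.append(url)
    return b"not a real model"


with tempfile.TemporaryDirectory() as tmp:
    first = download_once("word2vec/heavy/GoogleNews-vectors-negative300", fake_fetch, download_dir=tmp)
    second = download_once("word2vec/heavy/GoogleNews-vectors-negative300", fake_fetch, download_dir=tmp)
    first_exists = os.path.isfile(first)

print(len(fetched))     # 1: the second call found the file on disk
print(first == second)  # True
```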

You can create a batch generator for X and y data with batchify, like so:

  X = [.3, .2, .7, .8, .1]
  y = [0, 0, 1, 1, 0]
  batch_gen = MagnitudeUtils.batchify(X, y, 2)
  for X_batch, y_batch in batch_gen:
    print(X_batch, y_batch)
  # Returns:
  # 1st loop: X_batch = [.3, .2], y_batch = [0, 0]
  # 2nd loop: X_batch = [.7, .8], y_batch = [1, 1]
  # 3rd loop: X_batch = [.1], y_batch = [0]
  # next loop: repeats infinitely...
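The infinite batching shown above can be sketched as a generator that cycles over the batch start offsets forever. The name batch_generator is hypothetical. This sketch illustrates the documented behavior and does not reproduce MagnitudeUtils.batchify.

```python
import itertools


def batch_generator(X, y, batch_size):
    # Cycle over the batch start offsets forever; the last batch in each pass may be short.
    for start in itertools.cycle(range(0, len(X), batch_size)):
        yield X[start:start + batch_size], y[start:start + batch_size]


X = [.3, .2, .7, .8, .1]
y = [0, 0, 1, 1, 0]
first_four = list(itertools.islice(batch_generator(X, y, 2), 4))
print(first_four)
# [([0.3, 0.2], [0, 0]), ([0.7, 0.8], [1, 1]), ([0.1], [0]), ([0.3, 0.2], [0, 0])]
```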

You can encode class labels to integers and back with class_encoding, like so:

  add_class, class_to_int, int_to_class = MagnitudeUtils.class_encoding()
  add_class("cat") # Returns: 0
  add_class("dog") # Returns: 1
  add_class("cat") # Returns: 0
  class_to_int("dog") # Returns: 1
  class_to_int("cat") # Returns: 0
  int_to_class(1) # Returns: "dog"
  int_to_class(0) # Returns: "cat"
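class_encoding returns three functions that share state. A minimal sketch of that design uses closures over two dictionaries. This illustrates the behavior shown above and is not the MagnitudeUtils source. Raising KeyError for unknown labels is an assumption.

```python
def class_encoding():
    # The three closures share these two dictionaries.
    class_to_int_map = {}
    int_to_class_map = {}

    def add_class(label):
        # Assign the next integer the first time a label is seen; reuse it afterwards.
        if label not in class_to_int_map:
            index = len(class_to_int_map)
            class_to_int_map[label] = index
            int_to_class_map[index] = label
        return class_to_int_map[label]

    def class_to_int(label):
        return class_to_int_map[label]

    def int_to_class(index):
        return int_to_class_map[index]

    return add_class, class_to_int, int_to_class


add_class, class_to_int, int_to_class = class_encoding()
print(add_class("cat"), add_class("dog"), add_class("cat"))  # 0 1 0
print(int_to_class(1))  # dog
```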

You can convert categorical data with class integers to one-hot NumPy arrays with to_categorical, like so:

  y = [1, 5, 2]
  MagnitudeUtils.to_categorical(y, num_classes = 6) # num_classes is optional
  # Returns:
  # array([[0., 1., 0., 0., 0., 0.],
  #        [0., 0., 0., 0., 0., 1.],
  #        [0., 0., 1., 0., 0., 0.]])

You can convert from one-hot NumPy arrays back to a 1D NumPy array of class integers with from_categorical, like so:

  y_c = [[0., 1., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 1.]]
  MagnitudeUtils.from_categorical(y_c)
  # Returns: 
  # array([1., 5.])
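Without NumPy, the two conversions reduce to building one-hot rows and taking a per-row argmax. This pure-Python sketch mirrors the examples above but returns lists instead of NumPy arrays. Defaulting num_classes to max(y) + 1 is an assumption.

```python
def to_categorical(y, num_classes=None):
    # One row per label with 1.0 in the label's column; num_classes defaults to max(y) + 1.
    n = num_classes if num_classes is not None else max(y) + 1
    return [[1.0 if col == label else 0.0 for col in range(n)] for label in y]


def from_categorical(rows):
    # Per-row argmax, returned as floats to mirror the example above.
    return [float(max(range(len(row)), key=row.__getitem__)) for row in rows]


one_hot = to_categorical([1, 5, 2], num_classes=6)
print(from_categorical(one_hot))  # [1.0, 5.0, 2.0]
```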

Concurrency and Parallelism

The library is thread-safe, because it uses a different connection to the underlying store per thread. It is also read-only: it never writes to the file.

Because of its light memory usage, you can also run it in multiple processes (or use multiprocessing) with different address spaces. You don't have to duplicate the data in memory as with other libraries, and you don't have to create a multi-process shared variable, since data is read off-disk and each process keeps its own LRU memory cache. For heavier functions, like most_similar, a shared memory-mapped file is created to share memory between processes.

File Format and Converter

The Magnitude package uses the .magnitude file format instead of .bin, .txt, .vec, or .hdf5 as with other vector models like word2vec, GloVe, fastText, and ELMo. There is an included command-line utility for converting word2vec, GloVe, fastText, and ELMo files to Magnitude files.

You can convert them like so:

python -m pymagnitude.converter -i <PATH TO FILE TO BE CONVERTED> -o <OUTPUT PATH FOR MAGNITUDE FILE>

The input format will automatically be determined by the extension and/or the contents of the input file. You should only need to perform this conversion once for a model. After conversion, the Magnitude file is static: it is never modified or written to, which makes concurrent read access safe.

The flags for pymagnitude.converter are specified below:

You can pass in the -h flag for help and to list all flags.
You can use the -p <PRECISION> flag to specify the decimal precision to retain (selecting a lower number will create smaller files). The actual underlying values are stored as integers instead of floats so this is essentially quantization for smaller model footprints.
You can add an approximate nearest neighbors index to the file (increases size) with the -a flag, which enables the most_similar_approx function. When used with -a, the -t <TREES> flag controls the number of trees in the approximate nearest neighbors index (higher is more accurate). If -t is not supplied, the number of trees is determined automatically.
You can pass the -s flag to skip adding subword information to the file. This makes the file smaller but disables advanced out-of-vocabulary key support.
If converting a model that has no vocabulary like ELMo, you can pass the -v flag along with the path to another Magnitude file you would like to take the vocabulary from.
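The -p flag's fixed-point idea can be sketched in a few lines. Scale each value by 10 to the power of the precision, round to an integer, and divide again when reading. This only illustrates the quantization trade-off. The converter's actual integer encoding and storage are not reproduced here.

```python
def quantize(vector, precision):
    # Store each value as an integer scaled by 10**precision.
    scale = 10 ** precision
    return [int(round(x * scale)) for x in vector]


def dequantize(ints, precision):
    scale = 10 ** precision
    return [i / scale for i in ints]


v = [0.123456, -0.98765, 0.5]
q = quantize(v, 3)
print(q)                 # [123, -988, 500]
print(dequantize(q, 3))  # [0.123, -0.988, 0.5]
```

A lower precision means smaller integers and a smaller file. The rounding error per value is at most half of 10^-precision.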

Optionally, you can bulk convert many files by passing an input folder and output folder instead of an input file and output file. All .txt, .bin, .vec, and .hdf5 files in the input folder will be converted to .magnitude files in the output folder. The output folder must exist before a bulk conversion operation.

Remote Loading

You can instruct Magnitude to download and open a model from Magnitude's remote repository instead of a local file path. On the first run, the file is automatically downloaded to ~/.magnitude/. On later runs the download is skipped if the file already exists locally.

  vecs = Magnitude('http://magnitude.plasticity.ai/word2vec/heavy/GoogleNews-vectors-negative300.magnitude') # full url
  vecs = Magnitude('word2vec/heavy/GoogleNews-vectors-negative300') # or, use the shorthand for the url

For more control over the remote download domain and local download directory, see how to use MagnitudeUtils.download_model.

Remote Streaming over HTTP

Magnitude models are generally large files (multiple GB) that take up a lot of disk space, even though the .magnitude format makes it fast to utilize the vectors. Magnitude has an option to stream these large files over HTTP. This is explicitly different from the remote loading feature, in that the model doesn't even need to be downloaded at all. You can begin querying models immediately with no disk space used at all.

  vecs = Magnitude('http://magnitude.plasticity.ai/word2vec/heavy/GoogleNews-vectors-negative300.magnitude', stream=True) # full url
  vecs = Magnitude('word2vec/heavy/GoogleNews-vectors-negative300', stream=True) # or, use the shorthand for the url

  vecs.query("king") # Returns: the vector for "king" quickly, even with no local model file downloaded

You can play around with a demo of this in a Google Colaboratory Python Notebook.

This feature is extremely useful in three situations: when your computing environment is resource-constrained (low RAM and low disk space), when you want to experiment quickly with vectors without downloading and setting up large model files, or when you are training a small model. Streaming adds some network latency, but Magnitude still uses an in-memory cache as specified by the lazy_loading constructor parameter. Languages generally follow a Zipfian distribution, so a few queries warm the cache and network latency should then largely stop being an issue.

Vectors are queried directly off a static HTTP web server using HTTP Range request headers. All Magnitude methods support streaming. However, most_similar and most_similar_approx may be slow because they are not yet optimized for streaming. You can see how this streaming mode currently performs in the benchmarks, and it will get faster as we optimize it in the future!
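A single HTTP Range request can be built with the standard library as shown below. The URL is the one from the streaming example above, and the offsets are arbitrary. How Magnitude maps its reads onto byte ranges is not shown. A server that honors the header replies with 206 Partial Content and only the requested bytes.

```python
import urllib.request


def range_request(url, offset, length):
    # Byte ranges are inclusive on both ends, so the last byte is offset + length - 1.
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (offset, offset + length - 1))
    return req


req = range_request(
    "http://magnitude.plasticity.ai/word2vec/heavy/GoogleNews-vectors-negative300.magnitude",
    offset=4096,
    length=4096,
)
print(req.get_header("Range"))  # bytes=4096-8191
# To actually fetch the bytes (requires network access):
# with urllib.request.urlopen(req) as resp:
#     chunk = resp.read()
```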

Other Languages

Currently, we only provide English word vector models pre-converted to the .magnitude format on this page. You can, however, still use Magnitude with word vectors for other languages. Facebook has trained their fastText vectors for many different languages. You can download the .vec file for any language you want and then convert it to .magnitude with the converter.

Other Programming Languages

Currently, reading Magnitude files is only supported in Python, since it has become the de-facto language for machine learning. This is sufficient for most use cases. Extending the file format to other languages shouldn't be difficult as SQLite has a native C implementation and has bindings in most languages. The file format itself and the protocol for reading and searching is also fairly straightforward upon reading the source code of this repository.

Other Domains

Currently, natural language processing is the most popular domain that uses pre-trained vector embedding models for word vector representations. There are, however, other domains like computer vision that have started using pre-trained vector embedding models like Deep1B for image representation. This library intends to stay agnostic to various domains and instead provides a generic key-vector store and interface that is useful for all domains.

Contributing

The main repository for this project can be found on GitLab. The GitHub repository is only a mirror. Pull requests are welcome on GitLab for more tests, better error-checking, bug fixes, performance improvements, documentation, or additional utilities / functionalities.

You can contact us at [email protected].

Roadmap

Speed optimizations on remote streaming and exposing stream cache configuration options
Make most_similar_approx optimized for streaming
In addition to the "Light", "Medium", and "Heavy" flavors, add a "Ludicrous" flavor that will be of an even larger file size but removes the constraint of the initially slow most_similar lookups.
Add Google BERT support
Support fastText .bin format

Other Notable Projects

spotify/annoy - Powers the approximate nearest neighbors algorithm behind most_similar_approx in Magnitude using random-projection trees and hierarchical 2-means. Thanks to author Erik Bernhardsson for helping out with some of the integration details between Magnitude and Annoy.

Citing this Repository

If you'd like to cite our paper at EMNLP 2018, you can use the following BibTeX citation:

@inproceedings{patel2018magnitude,
  title={Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package},
  author={Patel, Ajay and Sands, Alexander and Callison-Burch, Chris and Apidianaki, Marianna},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={120--126},
  year={2018}
}

or follow the Google Scholar link for other ways to cite the paper.

If you'd like to cite this repository you can use the following DOI badge:

Clicking on the badge will lead to a page that will help you generate proper BibTeX citations, JSON-LD citations, and other citations.

LICENSE and Attribution

This repository is licensed under the license found here.

“Seismic” icon by JohnnyZi from the Noun Project.

magnitude's Issues

Lots of pip requirements

Many of the modules, like torch, lz4, and many more, have to be installed after installing pymagnitude. Time consuming.

Non-unit length word vectors

Hi, occasionally I want to use raw vectors and not demean and normalize. Is there a way to do this?

magnitude download feature

I was wondering if magnitude could have a feature like NLTK download where embeddings could be downloaded from your website.

A minor change needed in the most_similar function when used by vector

Run the following to reproduce the issue:

from pymagnitude import *
glove = Magnitude("path/to/glove.6B.300d.magnitude")
print(glove.most_similar("cat", topn = 2)) # Most similar by key
print(glove.most_similar(glove.query("cat"), topn = 2)) # Most similar by vector

The output is as follows:

[('dog', 0.6816746), ('cats', 0.68158376)]
[('cat', 1.0), ('dog', 0.6816746)]


As one can clearly see, most_similar works perfectly when called with a key, but when passed that word's vector it returns the same word. This should not be the case. A minor modification to the code should exclude the query word itself from the output.
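One possible approach, sketched with plain numpy rather than the library's internals (the function name and toy vectors are hypothetical): rank by cosine similarity and skip any row whose vector equals the query. Note that this also skips a different key that happens to share an identical vector, which may or may not be what you want:

```python
import numpy as np

def most_similar_by_vector(keys, matrix, query, topn=2, exclude_exact=True):
    """Brute-force cosine top-n; optionally drop rows identical to the query."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = unit @ q
    results = []
    for i in np.argsort(-sims):  # highest similarity first
        if exclude_exact and np.allclose(matrix[i], query):
            continue  # this row is the query vector itself: skip it
        results.append((keys[i], float(sims[i])))
        if len(results) == topn:
            break
    return results

keys = ["cat", "dog", "cats", "car"]
matrix = np.array([[1.0, 0.0], [0.8, 0.6], [0.9, 0.1], [0.0, 1.0]])
print(most_similar_by_vector(keys, matrix, matrix[0]))
```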

Can I add a new word and its vector without regenerating the whole .magnitude file?

i.e. I generated a .magnitude file for a service, and then I got a new word and its dense vector. Can I just insert it (overwriting it if it already exists)?

Error with most_similar "too many SQL variables"

Trying one of the examples on the main page (version 0.1.5 from pypi)

Using the fasttext common crawl vectors (medium)

http://magnitude.plasticity.ai/fasttext+subword/crawl-300d-2M.magnitude

>>> vectors = Magnitude("crawl-300d-2M.magnitude")
>>> vectors.most_similar("cat", topn = 100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ian/anaconda3/lib/python3.5/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 354, in cached_wrapper
    val = func(*args, **kwargs)
  File "/home/ian/anaconda3/lib/python3.5/site-packages/pymagnitude/__init__.py", line 962, in most_similar
    return_similarities=return_similarities, method='distance')
  File "/home/ian/anaconda3/lib/python3.5/site-packages/pymagnitude/__init__.py", line 916, in _db_query_similarity
    return_vector = False)
  File "/home/ian/anaconda3/lib/python3.5/site-packages/pymagnitude/__init__.py", line 760, in index
    return self._keys_for_indices(q, return_vector=return_vector)
  File "/home/ian/anaconda3/lib/python3.5/site-packages/pymagnitude/__init__.py", line 672, in _keys_for_indices
    unseen_indices)
sqlite3.OperationalError: too many SQL variables

Any other info that would be helpful here?
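For context: SQLite caps the number of bound parameters in a single statement (SQLITE_MAX_VARIABLE_NUMBER, 999 by default in builds before SQLite 3.32.0), and the traceback ends in `_keys_for_indices`, which suggests that lookup binds more parameters than this SQLite build allows. A common workaround is to split the lookup into batches. The sketch below assumes a hypothetical table `magnitude` addressed by `rowid`; it is not the library's actual query:

```python
import sqlite3

MAX_VARS = 999  # conservative: the default limit before SQLite 3.32.0

def fetch_keys_for_indices(conn, indices, chunk=MAX_VARS):
    """Look up rowids in batches so no single query exceeds the variable limit."""
    found = {}
    for start in range(0, len(indices), chunk):
        batch = indices[start:start + chunk]
        placeholders = ",".join("?" * len(batch))  # "?,?,...,?"
        rows = conn.execute(
            "SELECT rowid, key FROM magnitude WHERE rowid IN (%s)" % placeholders,
            batch,
        ).fetchall()
        found.update(rows)
    # Preserve the caller's order; missing indices map to None.
    return [found.get(i) for i in indices]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE magnitude (key TEXT)")
conn.executemany(
    "INSERT INTO magnitude (key) VALUES (?)", [("w%d" % i,) for i in range(5000)]
)
keys = fetch_keys_for_indices(conn, list(range(1, 2501)))
print(len(keys), keys[0], keys[-1])
```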

OperationalError: FTS expression tree is too large (maximum depth 12)

Should the caller check the length of the queried words? Passing a very long string like the one below causes an SQLite error:

v.query(1026*'ab')

slice operator does not work.

I get an error when using slice notation on the vector object.

Traceback (most recent call last):
  File "magnitude_test.py", line 9, in <module>
    test_list = [vectors[3:20],
  File "pymagnitude\__init__.py", line 1257, in __getitem__
    return_vector=True)
  File "pymagnitude\__init__.py", line 795, in index
    return self._key_for_index_cached(q, return_vector=return_vector)
  File "pymagnitude\third_party\repoze\lru\__init__.py", line 354, in cached_wrapper
    val = func(*args, **kwargs)
  File "pymagnitude\__init__.py", line 276, in _key_for_index_cached
    return self._key_for_index(*args, **kwargs)
  File "pymagnitude\__init__.py", line 670, in _key_for_index
    (int(index + 1),)).fetchall()
TypeError: unsupported operand type(s) for +: 'range' and 'int'

The mistake seems to be in the index method:

    def index(self, q, return_vector=True):
        """Gets a key for an index or multiple indices."""
        if isinstance(q, list) or isinstance(q, tuple):
            return self._keys_for_indices(q, return_vector=return_vector)
        else:
            return self._key_for_index_cached(q, return_vector=return_vector)

This does not branch properly, since what it receives is a range. The following should work for anything that is iterable, and it fixes the bug for me.

    def index(self, q, return_vector=True):
        """Gets a key for an index or multiple indices."""
        if hasattr(q,"__iter__"):
            return self._keys_for_indices(q, return_vector=return_vector)
        else:
            return self._key_for_index_cached(q, return_vector=return_vector)
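A slightly more defensive variant, sketched on a hypothetical stand-in class rather than the real Magnitude class: `hasattr(q, "__iter__")` is also true for strings, so this version converts a `slice` with `slice.indices()` (which resolves `None` and negative bounds against the length) and then checks explicitly for lists, tuples, and ranges:

```python
class IndexedVectors:
    """Hypothetical stand-in showing int / slice / list handling in __getitem__."""

    def __init__(self, keys):
        self._keys = list(keys)

    def __len__(self):
        return len(self._keys)

    def _key_for_index(self, i):
        return self._keys[i]

    def __getitem__(self, q):
        if isinstance(q, slice):
            # Turn e.g. 3:20 or ::2 into a concrete range of indices
            q = range(*q.indices(len(self)))
        if isinstance(q, (list, tuple, range)):
            return [self._key_for_index(i) for i in q]
        return self._key_for_index(q)

v = IndexedVectors(["a", "b", "c", "d", "e"])
print(v[1:4], v[::2], v[-1], v[[0, 2]])
```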

Recursion error

I ran into an interesting RecursionError with a string in a corpus I was using recently:

text = [
 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
]
fasttext_embedding = Magnitude(
    fasttext_embedding_path, pad_to_length=500, pad_left=True
)
twitter_embedding = Magnitude(
    twitter_embedding_path, pad_to_length=500, pad_left=True
)
concatenated_embeddings = Magnitude(fasttext_embedding, twitter_embedding)
concatenated_embeddings.query(text)

Results in the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 2086, in query
    for i, m in enumerate(self.magnitudes)]
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 2086, in <listcomp>
    for i, m in enumerate(self.magnitudes)]
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 1219, in query
    vectors = self._vectors_for_keys_cached(q, normalized)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 1107, in _vectors_for_keys_cached
    unseen_keys[i], normalized, force=force)
  File "/lib/python3.6/site-packages/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 482, in _out_of_vocab_vector_cached
    return self._out_of_vocab_vector(*args, **kwargs)
  File "lib/python3.6/site-packages/pymagnitude/__init__.py", line 990, in _out_of_vocab_vector
    normalized=normalized) *
  File "lib/python3.6/site-packages/pymagnitude/__init__.py", line 753, in _db_query_similar_keys_vector
    key_stemmed = self._oov_stem(orig_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 722, in _oov_stem
    return self._oov_english_stem_english_ixes(key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 715, in _oov_english_stem_english_ixes
    return self._oov_english_stem_english_ixes(stripped_key)
  [Previous line repeated 979 more times]
  File "/lib/python3.6/site-packages/pymagnitude/__init__.py", line 702, in _oov_english_stem_english_ixes
    if key_lower[:len(p)] == p:
RecursionError: maximum recursion depth exceeded in comparison

Any ideas on what might have caused this? Obviously the string should likely be removed, but curious why I ran into the error.
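The traceback shows `_oov_english_stem_english_ixes` calling itself once per stripped affix, so an input from which an affix can be stripped more than about a thousand times (Python's default recursion limit is 1000) overflows the stack. A loop avoids that. The sketch below is hypothetical: the affix lists are made up, and a one-letter prefix like "a" stands in for whatever affix matches the repeated characters here:

```python
# Hypothetical affix lists for illustration; not the library's real lists.
PREFIXES = ["un", "re", "a"]
SUFFIXES = ["ing", "ly", "s"]

def strip_affixes(key, min_len=3):
    """Strip known prefixes/suffixes in a loop until none applies.

    Iteration instead of recursion means very long inputs cannot hit
    Python's recursion limit; min_len keeps the stem from vanishing.
    """
    while True:
        lower = key.lower()
        for p in PREFIXES:
            if lower.startswith(p) and len(key) - len(p) >= min_len:
                key = key[len(p):]
                break
        else:
            for s in SUFFIXES:
                if lower.endswith(s) and len(key) - len(s) >= min_len:
                    key = key[:-len(s)]
                    break
            else:
                return key  # nothing stripped this round: done

print(strip_affixes("unhelpfully"), len(strip_affixes("a" * 5000)))
```

A caller-side alternative is simply to drop or truncate absurdly long tokens before querying.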

most_similar not working as expected with GoogleNews Light Magnitude

Sample snippet below.
The word most similar to "help" is definitely not "foster_feeling". Also, one would expect the similarity to be between -1.0 and 1.0. It seems like it is calculating Euclidean distance.

Am I doing something wrong, or is this an issue?

from pymagnitude import Magnitude
wv = Magnitude("~/work/data/GoogleNews-vectors-negative300.magnitude.1", normalized = False)
wv.most_similar("help")
[('foster_feeling', 4.73423), ('eyewitness_accounts_background', 3.8858936), ('material_objectionable', 3.84861), ('Trail_rides_signify', 3.5484445), ('daring_TMV', 3.496218), ('free_Yahoo!_Account', 3.1454005), ('please_visit_http://www.TradeTheTrend.com', 2.9243984), ('Live_PR.com', 2.8446026), ('containing_inappropriate_links_obscenities', 2.7534983), ('Help_ICAL_CSV', 2.7428944)

Thanks
Ram
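One plausible reading of the output above, not confirmed in this thread: scores above 1.0 look more like a raw dot product than Euclidean distance, since with `normalized = False` the vectors keep their original lengths. A dot product is unbounded and rewards long vectors, whereas cosine similarity divides by both norms and stays within [-1, 1]. This numpy sketch with made-up two-dimensional vectors shows how the ranking can flip:

```python
import numpy as np

def dot_scores(query, matrix):
    """Raw dot product: grows with vector length."""
    return matrix @ query

def cosine_scores(query, matrix):
    """Dot product divided by both norms: always within [-1, 1]."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return (matrix @ query) / norms

query = np.array([1.0, 0.0])
matrix = np.array([
    [0.9, 0.1],  # short vector pointing almost the same way as the query
    [4.0, 3.0],  # long vector, less aligned with the query
])
print(dot_scores(query, matrix))     # long vector wins
print(cosine_scores(query, matrix))  # aligned vector wins
```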

Not possible to install pymagnitude in multiple python environments

Collecting pymagnitude
  Using cached https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-PBUBVG/pymagnitude/setup.py", line 178, in <module>
        'a+')
    IOError: [Errno 13] Permission denied: '/tmp/magnitude.install'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-PBUBVG/pymagnitude/

I am trying to install pymagnitude in a conda environment on a Linux machine. I also have permission to write to the /tmp directory, but it seems that if some other user has already installed pymagnitude, other users can't install it.
During installation, pymagnitude tries to create a magnitude.install file inside the /tmp directory, but if magnitude.install already exists inside /tmp and another user owns it, pip install pymagnitude fails with a permission error.
One possible solution:
add a random number to the end of the install file name in the /tmp folder.
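A sketch of the suggested idea, using the user name rather than a random number so the same user always gets the same file (the helper name is hypothetical and this is not the project's setup.py):

```python
import getpass
import os
import tempfile
import uuid

def install_marker_path(name="magnitude.install"):
    """Hypothetical per-user marker path in the system temp directory.

    Suffixing the user name keeps users on a shared machine from colliding
    on one file owned by whoever installed first.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # getuser() can fail in minimal containers; fall back to a random suffix
        user = uuid.uuid4().hex
    return os.path.join(tempfile.gettempdir(), "%s.%s" % (name, user))

path = install_marker_path()
print(path)
```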

converting .vec with annoy causes an infinite loop

Trying to convert a .vec file to .magnitude along with an Annoy approximate index falls into an infinite loop in which it continually prints 100.0%.

Using the repo's test/models/fasttext.vec as a minimal example (though the same behavior occurs with real .vec files):

$ python -m pymagnitude.converter -i fasttext.vec -o out.magnitude -a  
/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125: RuntimeWarning: 'pymagnitude.converter' found in sys.modules after import of package 'pymagnitude', but prior to execution of 'pymagnitude.converter'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Loading vectors... (this may take some time)
Found 5 key(s)
Each vector has 2 dimension(s)
Creating magnitude format...
Writing vectors... (this may take some time)
0% completed
20% completed
40% completed
60% completed
80% completed
Committing written vectors... (this may take some time)
Entropy of dimension 0 is 2.321928
Entropy of dimension 1 is 2.321928
Creating search index... (this may take some time)
Creating spatial search index for dimension 0 (it has high entropy)... (this may take some time)
Creating approximate nearest neighbors index... (this may take some time)
Dumping approximate nearest neighbors index... (this may take some time)
Compressing approximate nearest neighbors index... (this may take some time)
100.0%
100.0%
100.0%           <-- This line prints forever

This suggests that something is amiss in the following loop, starting at line 392 of converter.py:

for i, chunk in enumerate(iter(partial(ifh.read, chunk_size), '')):
    if i == 0:
        chunk = compressor.begin() + compressor.compress(chunk)
    else:
        chunk = compressor.compress(chunk)
    eprint(str((ifh.tell() / float(full_size)) * 100.0) + "%")

Likely unrelated, but I'm not sure what's causing the RuntimeWarning either. It's thrown even without -a, and the conversion seems to work just fine in that case.
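A likely culprit, assuming `ifh` is opened in binary mode (`'rb'`): at end of file `read()` returns `b''`, which in Python 3 never equals the `''` sentinel, so `iter()` keeps calling `read()` forever while `tell()` stays at the end (hence 100.0% on every pass). A minimal sketch of a type-matched sentinel, using a hypothetical `read_chunks` helper:

```python
import io
from functools import partial

def read_chunks(fh, chunk_size=4):
    """Yield chunks until EOF, with a sentinel matching the file's mode.

    read() returns b"" at EOF for binary files and "" for text files;
    b"" != "" in Python 3, so a mismatched sentinel never stops the loop.
    """
    sentinel = b"" if isinstance(fh.read(0), bytes) else ""
    return iter(partial(fh.read, chunk_size), sentinel)

print(list(read_chunks(io.BytesIO(b"0123456789"))))
```

In the converter's loop itself, the equivalent change would be `iter(partial(ifh.read, chunk_size), b'')`.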

pad_to_length doesn't work when using concatenated Magnitude vectors in call to query()

Code to reproduce:

from pymagnitude import MagnitudeUtils, Magnitude
size = 384
vecs1 = Magnitude(MagnitudeUtils.download_model('glove/medium/glove.6B.100d'))
vecs2 = Magnitude(MagnitudeUtils.download_model('fasttext/medium/wiki-news-300d-1M-subword'))
vecs = Magnitude(vecs1, vecs2)
sents = [["I", "read", "a", "book"], ["I", "read", "a", "magazine"]]
a = vecs.query(sents, pad_to_length=size, pad_left=False, truncate_left=False)
print(a.shape)

Output:
(2, 4, 400)

I don't need an alternative; I want this to work in my project. So, please fix it as soon as possible.

Thank you.
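Until the concatenated query honors `pad_to_length`, one workaround for other readers is to pad the returned array along the token axis yourself. A minimal numpy sketch; the helper is hypothetical, it only mirrors the parameter names from the snippet above, and zero padding is an assumption about what the library would use:

```python
import numpy as np

def pad_batch(batch, pad_to_length, pad_left=False, truncate_left=False):
    """Pad or truncate a (num_sents, num_tokens, dim) array on the token axis."""
    n, t, d = batch.shape
    if t > pad_to_length:
        # Keep the last tokens when truncating from the left, else the first ones
        return batch[:, t - pad_to_length:] if truncate_left else batch[:, :pad_to_length]
    pad = np.zeros((n, pad_to_length - t, d), dtype=batch.dtype)
    parts = [pad, batch] if pad_left else [batch, pad]
    return np.concatenate(parts, axis=1)

a = np.ones((2, 4, 400), dtype=np.float32)
print(pad_batch(a, 384).shape)
```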

Largest Heavy ELMO Datasets fail to load

Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from pymagnitude import *
>>> elmo_vecs = Magnitude("/mnt/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights_GoogleNews_vocab.magnitude")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dan/magnitude/pymagnitude/__init__.py", line 360, in __init__
    "SELECT value FROM magnitude_format WHERE key='size'")
sqlite3.DatabaseError: malformed database schema (magnitude) - too many columns on magnitude
>>> elmo_vecs = Magnitude("/mnt/elmo_2x4096_512_2048cnn_2xhighway_weights_GoogleNews_vocab.magnitude")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dan/magnitude/pymagnitude/__init__.py", line 360, in __init__
    "SELECT value FROM magnitude_format WHERE key='size'")
sqlite3.DatabaseError: malformed database schema (magnitude) - too many columns on magnitude

Encoding error at conversion

Hi Ajay!

I have a .bin sentence embedding file generated in the word2vec .bin format, and am trying to use the Magnitude converter utility to convert it to a .magnitude file.

I believe a .bin generated with fastText should work for conversion, as mentioned in your documentation, but I'm hitting an encoding error (see the attached file).

Could you shed some light on this, since there's no encoding option for the converter?

Is there a solution?

Thanks!

Jessica T
magnitude_encoding_error.txt
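Since the converter takes no encoding option, one workaround is to read the binary file yourself with a lenient decoder and write out a UTF-8 .vec text file for the converter. The sketch below assumes the common word2vec binary layout (a text header "<count> <dim>", then for each entry the word bytes, a space, and `dim` little-endian float32 values, optionally followed by a newline); the helper names are hypothetical:

```python
import io
import struct

def read_word2vec_bin(fh, encoding="utf-8", errors="replace"):
    """Read word2vec-style binary vectors, decoding words with a chosen encoding.

    errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    """
    count, dim = (int(x) for x in fh.readline().split())
    vectors = {}
    for _ in range(count):
        word = bytearray()
        while True:
            ch = fh.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("truncated file")
            if ch != b"\n":  # skip the newline left after the previous vector
                word.extend(ch)
        values = struct.unpack("<%df" % dim, fh.read(4 * dim))
        vectors[word.decode(encoding, errors=errors)] = list(values)
    return vectors

def demo_bytes():
    """Two entries; the second word is Latin-1 encoded (invalid as UTF-8)."""
    return (b"2 2\n"
            + b"cat " + struct.pack("<2f", 0.5, 1.0) + b"\n"
            + b"caf\xe9 " + struct.pack("<2f", 0.25, -1.0) + b"\n")

vecs = read_word2vec_bin(io.BytesIO(demo_bytes()))
print(sorted(vecs))
```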

pip3 install gets stuck

Hey,

This project looks really exciting and kudos on the great work! I am trying to install the pymagnitude package using the command pip3 install pymagnitude, but it gets stuck after downloading and doesn't proceed to install. I supplied the verbose flag to the command to see where it's getting stuck, and it turns out it's getting stuck at:

Running setup.py (path:/tmp/pip-build-60o8i7g_/pymagnitude/setup.py) egg_info for package pymagnitude
Running command python setup.py egg_info

I tried installing it in a fresh virtual environment just to ensure that it was not the fault of my environment, but it did not work there either. Is it actually getting stuck, or am I just being impatient and the package takes 5+ minutes to install?

For your reference, I am running Python 3.6.5 on Ubuntu 16.04.

Any help would be appreciated. Thank you!

most_similar() anomalies

Hi, I converted bilingual FastText embeddings into a medium magnitude model and I'm getting some questionable results:

>>> xlvecs=Magnitude("wiki.+de+en.tag.vec.magnitude")
>>> katze=xlvecs.most_similar("katze@de@", topn=5)
>>> print(katze)
[('rabbit,@en@', 0.3190704584121704), ('dogs,@en@', 0.31559139490127563), ('chickenhound@en@', 0.3059767484664917), ('rabbity@en@', 0.30381107330322266), ('#mouse@en@', 0.29921069741249084)]
>>> xlvecs.similarity("katze@de@", "cat@en@")
0.4569693
>>> xlvecs.similarity("katze@de@", "cats@en@")
0.38769498
>>> xlvecs.similarity("katze@de@", "dog@en@")
0.42773518
>>> xlvecs.similarity("katze@de@", "rabbit@en@")
0.40975133

"cat@en@", "cats@en@", "dog@en@", and even the actual "rabbit@en@" (no spurious comma) are more similar to "katze@de@", but instead I'm getting "rabbit,@en@". Am I misunderstanding what most_similar is supposed to do?

I thought maybe I could try setting max_distance to just a hair above xlvecs.distance("katze@de@", "cat@en@") to see what would happen, but I got TypeError: most_similar() got an unexpected keyword argument 'max_distance'

I'm on version 0.1.48

Difficulties with pretrained ELMo models

SQLite has a default limit of 2000 columns, meaning your 3072-column tables simply don't work. The limit can be raised when SQLite is compiled. You also need to raise the limit on SQL variables from 999, at least for your converter. I don't know if that's enough; it's still working.
Many of your large ELMo tables do not appear to have a populated magnitude_format field and thus do not work.
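To check whether a given SQLite build can handle such tables at all, one can probe it directly. SQLite's compile-time column limit (SQLITE_MAX_COLUMN) defaults to 2000; the sketch below (a hypothetical helper) simply tries to create a table with the requested number of columns, e.g. a key column plus 3072 dimensions:

```python
import sqlite3

def can_create_columns(n):
    """Return True if this SQLite build accepts a table with n columns."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join("c%d REAL" % i for i in range(n))
    try:
        conn.execute("CREATE TABLE probe (%s)" % cols)
        return True
    except sqlite3.DatabaseError:  # e.g. "too many columns on probe"
        return False
    finally:
        conn.close()

print(sqlite3.sqlite_version, can_create_columns(300), can_create_columns(3073))
```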

Magnitude throws DatabaseError when instantiated from a .vec file

Possibly related to #2.

This happens to me when trying to create a Magnitude object from a .vec file, on both linux and macOS, under python 3.6.

Both self.fd and self.path seem to be set correctly, but something breaks in the call to self._db(). Here's an excerpt from a minimal session using one of the pretrained fastText .vec files:

In [3]: Magnitude("wiki-news-300d-1M.vec")
DatabaseError                             Traceback (most recent call last)
<ipython-input-3-d79f82a84c59> in <module>()
----> 1 Magnitude("wiki-news-300d-1M.vec")
/usr/local/lib/python3.6/site-packages/pymagnitude/__init__.py in __init__(self, path, lazy_loading, blocking, use_numpy, case_insensitive, pad_to_length, truncate_left, pad_left, placeholders, ngram_oov, supress_warnings, batch_size, eager, dtype, _namespace, _number_of_values)
    195         # Get metadata about the vectors
    196         self.length = self._db().execute(
--> 197             "SELECT COUNT(key) FROM magnitude") \
    198             .fetchall()[0][0]
    199         self.original_length = self._db().execute(

DatabaseError: file is not a database

Popping into the debugger, self.path points to the temporary .magnitude file, and instantiating a Magnitude object directly from there works just fine.

ipdb> p self.path
'/var/folders/vc/70hjbvsd3pq85t7x693365rc0000gn/T/c620a226ebc2c24c12ee1be5127d1448.magnitude'
ipdb> Magnitude(self.path)
<pymagnitude.Magnitude object at 0x10f2dea58>

So I modified the definition of self._db in __init__.py to just use

conn = sqlite3.connect(self.path,
          check_same_thread=False)

irrespective of the OS, and that seemed to solve the problem. There's likely a better way of doing things though. (I haven't looked through the code very carefully, and I don't totally understand the nuances of /dev/fd/)

Long time to install

I have seen your efforts on installing this package and its dependencies, and I appreciate your great work.
But it still takes more than an hour to install this package in China, with or without SKIP_.
So I hope you could build most of the dependencies as binary wheels and make them downloadable from GitHub for faster installation.
Again, thank you for your time.

Running most_similar on concatenated model

Is there any way to run most_similar() on two concatenated models?

Can't open db file when creating Magnitude Object

I tried following the instructions, downloaded a db file and tried out the code

vectors = Magnitude(MAG_PATH)

The file is definitely there, but I got an error:
sqlite3.OperationalError: unable to open database file

When looking at the code this seems to get executed starting at line 346

    if self.fd:
        conn = sqlite3.connect('/dev/fd/%d' % self.fd,
                               check_same_thread=False)
    else:
        conn = sqlite3.connect(self.path, check_same_thread=False)
        self._create_empty_db(conn.cursor())

right at the start of the _db() method.

self.fd seems to get set whenever path is not None in the init method;
it is the file descriptor returned by os.open(path).

So the path it tries to open is "/dev/fd/3/" instead of the actual database path.

It seems to work when I change the line

conn = sqlite3.connect('/dev/fd/%d' % self.fd,
          check_same_thread=False)

to

conn = sqlite3.connect(self.path,
          check_same_thread=False)
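Rather than dropping the /dev/fd route entirely, a caller could try it and fall back to the plain path when it fails. A minimal sketch of that idea (hypothetical helper, not the library's `_db()`): it probes the connection with a cheap query, because "file is not a database" only surfaces on the first statement, while "unable to open database file" can be raised by `connect()` itself:

```python
import os
import sqlite3
import tempfile

def connect_vectors(path, fd=None):
    """Open an SQLite file via /dev/fd/<fd> if that works, else via its path."""
    if fd is not None and os.path.exists("/dev/fd/%d" % fd):
        conn = None
        try:
            conn = sqlite3.connect("/dev/fd/%d" % fd, check_same_thread=False)
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
            return conn
        except sqlite3.DatabaseError:  # OperationalError is a subclass
            if conn is not None:
                conn.close()
    return sqlite3.connect(path, check_same_thread=False)

# Demo: build a tiny database file, then open it through the helper.
tmp = tempfile.NamedTemporaryFile(suffix=".magnitude", delete=False)
tmp.close()
setup = sqlite3.connect(tmp.name)
setup.execute("CREATE TABLE magnitude (key TEXT)")
setup.execute("INSERT INTO magnitude VALUES ('cat')")
setup.commit()
setup.close()

fd = os.open(tmp.name, os.O_RDONLY)
conn = connect_vectors(tmp.name, fd)
count = conn.execute("SELECT COUNT(key) FROM magnitude").fetchone()[0]
conn.close()
os.close(fd)
os.remove(tmp.name)
print(count)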

Pip install is broken since v0.1.35

Hi,

It seems that the v0.1.35 release, in which you moved the installation files, broke pip installation.

I've tried both in a Docker container running Debian Jessie and on my Mac.

Steps to reproduce:

# Breaks
pip install pymagnitude
pip install pymagnitude==0.1.35

# This works
pip install pymagnitude==0.1.34

Error:

Building wheels for collected packages: pymagnitude
  Running setup.py bdist_wheel for pymagnitude ... done
  Stored in directory: /Users/mycoolusername/Library/Caches/pip/wheels/d3/b0/69/5c2868a48835e8e79c2580c641548e8fdf953bfc29df4a
Successfully built pymagnitude
Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/site-packages/numpy-1.15.0.dist-info/METADATA'

I believe the problem is with the Python package file move; the solution would most likely be to move it back to where it was originally or to edit the package config file (I forget which).

Thank you for making such an awesome project.

[Just a question] Curious about why you made this as opposed to using something like FAISS

Problems with memory mapped files and vectors.most_similar()

Hello. I am on Windows using Python 3.6.

There seems to be a problem with creating the magmmap files used by the vectors.most_similar() method. Error:

  File "\pymagnitude\__init__.py", line 1074, in get_vectors_mmap
    os.rename(path_to_mmap_temp, self.path_to_mmap)

PermissionError: [WinError 32] The process cannot access the file because it is in use by another process \\AppDataTemp\\82ce74f40baf842d23c45b0e90688b9f.magmmap.tmp' -> '\AppData\\Local\\Temp\\82ce74f40baf842d23c45b0e90688b9f.magmmap'

I looked into the code, and it seems you are creating and filling these files in the background.
It works fine for the vectors.most_similar_approx() method; the memmap file for approx seems to be created just fine.

The problem persists even when I use blocking=True when constructing the Magnitude object.
Then I can't even use the approx method; it just hangs while initializing the object.

Thank you for your time.

Segmentation fault on pymagnitude.converter with -a flag

Hi! I am getting the Segmentation fault (core dumped) error when creating approximate nearest neighbors index...

Training of new embeddings

First, thanks for open sourcing this.

Am I correct that there is currently no support for training new embeddings on a corpus of text? This seems like a critical feature for any word2vec implementation. Is there a plan for this to be added?

Thanks.

Requirements do not properly install

A clean install in a virtual environment does not install the requirements.

pip 18.0    
Python 3.6.3    

➜  virtualenv -p python3 foo; source foo/bin/activate; pip3 install pymagnitude                                         
In [1]: from pymagnitude import Magnitude                           
---------------------------------------------------------------------------                                                             
ModuleNotFoundError                       Traceback (most recent call last)                                                             
<ipython-input-1-a4fc6d35defa> in <module>()                        
----> 1 from pymagnitude import Magnitude                           

/tmp/foo/lib/python3.6/site-packages/pymagnitude/__init__.py in <module>()                                                              
     11 import hashlib                                              
     12 import heapq                                                
---> 13 import lz4.frame                                            
     14 import math                                                 
     15 import operator                                             

ModuleNotFoundError: No module named 'lz4'

How does magnitude generate ELMO vectors for single words?

I'm using this model:

http://magnitude.plasticity.ai/elmo/medium/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.magnitude

I've extracted ELMO embeddings for personality traits, computed pairwise cosine similarity, performed multidimensional scaling, and then visualized the result:

As you can see, the results don't make much sense. For example, with other embeddings (e.g., word2vec, paragram-sl999), you'll at least get positive traits on one side and negative traits on the other. I don't see much rhyme or reason in the above plot.

I get better results if I get vectors for the above traits by putting each of them in a 'sentence' with the word 'trait'. And I also get decent results if I use Allen NLP's elmo implementation even when not contextualizing the trait words.

I've also tried regressing human judgments about masculinity and femininity directly on the embeddings, and I get pretty much random predictions, whereas using other vectors (again, word2vec, paragram) or getting ELMo vectors contextualized by the word 'trait' predicts the human judgments pretty well.

Not possible to download the datasets

I attempted a few times to download the datasets, but each time the download stopped after less than 100k of data was received. I don't know whether this is a temporary server issue. Do you have plans to host the files somewhere else?

The conversion from Glove to your format worked flawlessly, but maybe not everyone has the time and resources to perform it.

Error Loading ELMo

I'm just trying to use the ELMo embedding feature and getting the following issue:

using a fresh pip install in a virtualenv
(D) jsedoc@****:$ pip install -U pymagnitude
Requirement already up-to-date: pymagnitude in /home/jsedoc/venvs/D/lib/python3.6/site-packages
Requirement already up-to-date: numpy>=1.14.0 in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: xxhash>=1.0.1 in /usr/local/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: fasteners>=0.14.1 in /usr/local/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: annoy>=1.11.4 in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: lz4>=1.0.0 in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: h5py>=2.8.0 in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: torch in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from pymagnitude)
Requirement already up-to-date: monotonic>=0.1 in /usr/local/lib/python3.6/site-packages (from fasteners>=0.14.1->pymagnitude)
Requirement already up-to-date: six in /home/jsedoc/venvs/D/lib/python3.6/site-packages (from fasteners>=0.14.1->pymagnitude)
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

(D***) jsedoc@***:$ python
Python 3.6.4 (default, Jan 22 2018, 23:35:54)
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymagnitude import *
>>> w2v_vecs = Magnitude('/data1/embeddings/pymagnitude/GoogleNews-vectors-negative300.magnitude')
>>> elmo_vecs = Magnitude('/data1/embeddings/pymagnitude/elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights_GoogleNews_vocab.magnitude')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jsedoc/venvs/D/lib/python3.6/site-packages/pymagnitude/__init__.py", line 361, in __init__
    .fetchall()[0][0]
IndexError: list index out of range

does magnitude support index2word attribute, similar to that of gensim?

In Gensim I follow these steps to load word vectors and then keep just the set of unique words in the next step:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True, limit=100000)
index2word_set = set(model.index2word)

In Magnitude I load the word vectors, but how do I emulate index2word?

from pymagnitude import Magnitude
model = Magnitude("crawl-300d-2M.magnitude")
index2word_set = set(model.index)  # does not work: 'method' object is not iterable
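Per the Magnitude README, iterating over a Magnitude object yields (key, vector) pairs, so a set of keys can be built without an index2word attribute. A minimal sketch; the helper name is hypothetical, `limit` only mimics gensim's `limit=` by keeping the first entries, and the demo uses a plain list standing in for a Magnitude object:

```python
from itertools import islice

def index2word_set(vectors, limit=None):
    """Collect keys from any iterable of (key, vector) pairs.

    islice(..., None) iterates everything; an int keeps only the first entries.
    """
    return {key for key, _vector in islice(vectors, limit)}

# Stand-in data; with Magnitude you would pass the Magnitude object itself.
demo = [("cat", [0.1]), ("dog", [0.2]), ("cat", [0.1])]
print(sorted(index2word_set(demo)))
```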

Is there a way to read the magnitude files from s3 ?

Is there a way to read the magnitude files from s3

Other languages

Hello. First of all thanks for your effort. This is a pretty impressive library.

I'm not very experienced in NLP, but I'm currently working on an NLP task that involves classifying some text messages without labeled data. The project I'm working on needs to process Turkish sentences. Can I somehow use this library to train on Turkish documents? If so, can you provide an example or guide me through the process? Thanks.

[FEATURE REQUEST] Show download progress when downloading magnitude files from the server

Magnitude(MagnitudeUtils.download_model('glove/medium/glove.6B.{}d'.format(self.glove_dim), download_dir=os.path.join(base_dir, 'magnitude')), case_insensitive=True)

The code above doesn't show any progress while it downloads data from the server, so the command prompt looks idle while the script runs. This could easily be done with tqdm, and the feature would be a great help.

Thank you.
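Here is a rough sketch of the requested behaviour. It is not something `MagnitudeUtils.download_model` is known to support. The file is fetched with the standard library's `urllib.request.urlretrieve`, and its `reporthook(block_num, block_size, total_size)` callback draws a text progress bar. The same hook could instead call `update()` on a tqdm bar. `make_progress_hook` and `download_with_progress` are hypothetical names, and the model file's URL is an assumption you would have to supply.

```python
import sys
import urllib.request


def make_progress_hook(stream=sys.stderr, width=40):
    """Build a urlretrieve reporthook that redraws a text progress bar in place."""
    def hook(block_num, block_size, total_size):
        downloaded = block_num * block_size
        if total_size > 0:
            # The last block may be partial, so clamp to the reported size.
            downloaded = min(downloaded, total_size)
            frac = downloaded / float(total_size)
            filled = int(width * frac)
            bar = "#" * filled + "-" * (width - filled)
            stream.write("\r[%s] %5.1f%%" % (bar, 100 * frac))
        else:
            # No Content-Length header: report raw bytes instead.
            stream.write("\r%d bytes" % downloaded)
        stream.flush()
    return hook


def download_with_progress(url, filename, stream=sys.stderr):
    """Download `url` to `filename`, drawing progress on `stream`; return the path."""
    path, _headers = urllib.request.urlretrieve(
        url, filename, reporthook=make_progress_hook(stream))
    stream.write("\n")  # move past the progress line
    return path
```

The returned path can then be handed to Magnitude as before, e.g. `Magnitude(download_with_progress(url, "glove.6B.50d.magnitude"), case_insensitive=True)`.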

Erroneous import in __init__.py: from pymagnitude.converter_shared import convert

Magnitude.__init__() tries to import a non-existent function:

from pymagnitude.converter_shared import convert as convert_vector_file  # noqa

in __init__.py, line 345.
This raises ImportError: cannot import name 'convert' when initializing Magnitude with a file that is not in .magnitude format.

A possible fix:

from pymagnitude.converter import convert as convert_vector_file  # noqa

Can't install magnitude with Python 3.7

I am trying to install the package in a Python 3.7 environment, but it looks like there is no wheel available:

$ pip install -v pymagnitude
Created temporary directory: /private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-ephem-wheel-cache-eo49q33x
Created temporary directory: /private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-req-tracker-62y2pf1i
Created requirements tracker '/private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-req-tracker-62y2pf1i'
Created temporary directory: /private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-install-zi8viiwh
Collecting pymagnitude
  1 location(s) to search for versions of pymagnitude:
  * https://pypi.org/simple/pymagnitude/
  Getting page https://pypi.org/simple/pymagnitude/
  Looking up "https://pypi.org/simple/pymagnitude/" in the cache
  Request header has "max_age" as 0, cache bypassed
  Starting new HTTPS connection (1): pypi.org:443
  https://pypi.org:443 "GET /simple/pymagnitude/ HTTP/1.1" 304 0
  Analyzing links from page https://pypi.org/simple/pymagnitude/
    Found link https://files.pythonhosted.org/packages/5f/09/ea2d04ee4ff25131d0fe2d797fbb6f0ddd031dc17c2a57a3965d77d1af7b/pymagnitude-0.0.17.tar.gz#sha256=c46741df990eb8daddedd8397e290693a59c89aad4d8ec18eda51885317b55c5 (from https://pypi.org/simple/pymagnitude/), version: 0.0.17
    Found link https://files.pythonhosted.org/packages/00/cb/d1158d99115c81c761703612d6b1670220d17084ad2dca946cdd22f1c41e/pymagnitude-0.0.19.tar.gz#sha256=8449969012a5ec49d61e90edd765199eddd04ec25ac98d33704484bb3f605b71 (from https://pypi.org/simple/pymagnitude/), version: 0.0.19
    Found link https://files.pythonhosted.org/packages/3f/fd/a9559be21ff253be71683b6d630e019a6a3a2b7c3297dce381bdd942f7a8/pymagnitude-0.0.20.tar.gz#sha256=04aeef7cc365bbdb3c451760f1f3d88818a6d8a2638ec14069bbaf1543584db9 (from https://pypi.org/simple/pymagnitude/), version: 0.0.20
    Found link https://files.pythonhosted.org/packages/9e/c9/c8d5d48240705d7395fe554b0c2cfd8fe4a45c9a3bceb283125fcc764c16/pymagnitude-0.0.21.tar.gz#sha256=bed3f35636a2a8de69e31eacfe8a9267e2eb815b83b8b181bbd91984fe1c30de (from https://pypi.org/simple/pymagnitude/), version: 0.0.21
    Found link https://files.pythonhosted.org/packages/92/e5/3da5d95b3bb43d8720be5ad626af0886a9109239d08a89e1656573942fcb/pymagnitude-0.1.0.tar.gz#sha256=9c8dc66d2a81c61bedfbfa687718bb0160ec891a5a0e86571cc1fdb7da1b66f4 (from https://pypi.org/simple/pymagnitude/), version: 0.1.0
    Found link https://files.pythonhosted.org/packages/77/1e/5d0275061319b23014e961588c7bf5d8a26f46dc1eaba124217e735d003b/pymagnitude-0.1.1.tar.gz#sha256=60be91b787c066472e997ca1553881dfc266de150df7f9bbbd22fc73c080cee9 (from https://pypi.org/simple/pymagnitude/), version: 0.1.1
    Found link https://files.pythonhosted.org/packages/2b/f2/156255688ca314cee34a7c4aadc7b878618d3424635adf7223edca5233a1/pymagnitude-0.1.2.tar.gz#sha256=c261f7e46ae202687431c7f1ea7263a96951c3d9b64d3d68fd7f05270183ae13 (from https://pypi.org/simple/pymagnitude/), version: 0.1.2
    Found link https://files.pythonhosted.org/packages/df/10/585dc445d808f8477c886e9dd55f2a415ccc0c56fc6bdf0844c72e0dea4a/pymagnitude-0.1.3.tar.gz#sha256=04236dbc8892e91b5cc4cce408e0831a4870d082ba7c056dfc60c02e75dfd58b (from https://pypi.org/simple/pymagnitude/), version: 0.1.3
    Found link https://files.pythonhosted.org/packages/4e/f0/1c64431555aa6ab1471ba81991903104a75ac2216419a1d7f98b9d2bfd67/pymagnitude-0.1.4.tar.gz#sha256=2a85592ebb34f012d84945fe9d86c380e93785ca1d7c2faf585d91113d3622db (from https://pypi.org/simple/pymagnitude/), version: 0.1.4
    Found link https://files.pythonhosted.org/packages/34/03/214bfce14844fe1e07e7e935eb69d7722a6a81c03091e7149bc43266e840/pymagnitude-0.1.5.tar.gz#sha256=3949704888c3691634b161c0d29bd49374b01ae32cced0a66da72fb89bde5405 (from https://pypi.org/simple/pymagnitude/), version: 0.1.5
    Found link https://files.pythonhosted.org/packages/72/7f/e97e9665a6d4ac916c2e7de1c0da49e4b9d3990b180879ff954b9f62cb92/pymagnitude-0.1.6.tar.gz#sha256=a208a49d0697e8ae49702e2aeff049fdb32398660cd7e05dbe992e13e4d6ec2e (from https://pypi.org/simple/pymagnitude/), version: 0.1.6
    Found link https://files.pythonhosted.org/packages/17/a2/6d2ed2b1c5b26afecddb0912e8f2a07b2e1a116d649ab8cbb4178be46073/pymagnitude-0.1.7.tar.gz#sha256=05c7c4f0c4d69cd0b7bec553492bedf9267e44dba92530357acaa355d69ba74f (from https://pypi.org/simple/pymagnitude/), version: 0.1.7
    Found link https://files.pythonhosted.org/packages/c5/2d/7c80a51a3615fc4bf836bd6c46457831fb7c55d870104d06cbf1e70ee6cd/pymagnitude-0.1.8.tar.gz#sha256=a6360ad822d133af76afa15498397a462503f2b27f9f8aa82c74f98c4bd1fdf1 (from https://pypi.org/simple/pymagnitude/), version: 0.1.8
    Found link https://files.pythonhosted.org/packages/5f/56/e4841dd7982b7ae15cf3d93f57c1194043b22085d3fb03bc01d96273c4a0/pymagnitude-0.1.9.tar.gz#sha256=1bc0d9647ad8248cfef1654bd28111cbaf6a80ab6aa93dca5baed2fd51ba5419 (from https://pypi.org/simple/pymagnitude/), version: 0.1.9
    Found link https://files.pythonhosted.org/packages/92/09/1bf024add287524bd97a9dd25e7e0aef03d4e55acbd40bfd80db9735d5ca/pymagnitude-0.1.12.tar.gz#sha256=67c0eeee8485ef83139d1c536328b6c0c306ed3e534137d599b526577c54c320 (from https://pypi.org/simple/pymagnitude/), version: 0.1.12
    Found link https://files.pythonhosted.org/packages/09/53/59bd19bc5fe7bffd3faf9045bb35dde7849b17d09a6813d576363cee0c53/pymagnitude-0.1.13.tar.gz#sha256=9a16e79a9b55aa5e99e0094c1253801f577a76c93d8454d54f1d7fd5506699fa (from https://pypi.org/simple/pymagnitude/), version: 0.1.13
    Found link https://files.pythonhosted.org/packages/74/b6/399f762f1433eba552545cc453cafe3cd07c33710b5f079dc24d1daa5679/pymagnitude-0.1.14.tar.gz#sha256=154c164502be664df7b293ba869631910d968ea9316b54a29b1dad878b2ce24a (from https://pypi.org/simple/pymagnitude/), version: 0.1.14
    Found link https://files.pythonhosted.org/packages/85/bc/b09716b1743de9e9705f6570aad672aef035c952c499ce1fa118536bff75/pymagnitude-0.1.15.tar.gz#sha256=7b6419f63d6d260420d745045f306c8cdd6c9a114f668ef6c10f3dac590dd998 (from https://pypi.org/simple/pymagnitude/), version: 0.1.15
    Found link https://files.pythonhosted.org/packages/bd/b7/a94a6f89fa7959ec6c9f2c8ceb8747a8eff376570ad2c190534c66d8029e/pymagnitude-0.1.16.tar.gz#sha256=e112c6cf63078ebed0d19ff7843ed9a3e37260db92010e1dc7f66668bb4b5b96 (from https://pypi.org/simple/pymagnitude/), version: 0.1.16
    Found link https://files.pythonhosted.org/packages/2b/cf/43864756a75dd8282743e3f4590c60c224972a6861189aebdf08747b414f/pymagnitude-0.1.17.tar.gz#sha256=8e435df85c3da34c492b195c4c4246970d1e51bb954b18d6032dc91bd3270f53 (from https://pypi.org/simple/pymagnitude/), version: 0.1.17
    Found link https://files.pythonhosted.org/packages/bf/a7/6023e66d4c7f789ab64d43b230af25d563b37a3807abd0131a1574b5ca77/pymagnitude-0.1.18.tar.gz#sha256=4bb871df83251a1b3f3d941b604b750f455ec0449dee10c7cb4b087353182c8d (from https://pypi.org/simple/pymagnitude/), version: 0.1.18
    Found link https://files.pythonhosted.org/packages/f0/c1/5258c9c29ab9af2547e8efac82da0931f5e10bd8cc0ce9168b0ccef0d578/pymagnitude-0.1.19.tar.gz#sha256=d53d1dd054ac736a90d37eb728dc1af314a206fbaab0bc40b80e1ea59be51f4c (from https://pypi.org/simple/pymagnitude/), version: 0.1.19
    Found link https://files.pythonhosted.org/packages/5d/fb/8823951b85ad88c22a05104bce272a2441479254fea1078f5ea3acb53c39/pymagnitude-0.1.20.tar.gz#sha256=fc176787175ed92f1c794265e959bf3cc73e8ed596921a14ed0ed0a3953804ce (from https://pypi.org/simple/pymagnitude/), version: 0.1.20
    Found link https://files.pythonhosted.org/packages/75/fa/f165b31dde7bb87f70e3c7ad3a167dfb8350f7ae15c95e8593e24d160095/pymagnitude-0.1.21.tar.gz#sha256=ddbf506cca2430fcb3bdee07a0224a10b62ff0479e323dbf43c32f21fb4c8f12 (from https://pypi.org/simple/pymagnitude/), version: 0.1.21
    Found link https://files.pythonhosted.org/packages/34/97/d5651fda971025f45976129d9082aba5eba465c2fe865eee9592dcb85b11/pymagnitude-0.1.22.tar.gz#sha256=dc49bfca621951abcb809fe20724bc2403ed69cf2bf68f86bf26f2e9778a3010 (from https://pypi.org/simple/pymagnitude/), version: 0.1.22
    Found link https://files.pythonhosted.org/packages/6d/64/56af6938d2d780d75320cdc210def56406e0c9a35b34ace85dcab9a48fdf/pymagnitude-0.1.23.tar.gz#sha256=c071fdf21e125361f313556602e320059946a969be130e0b23994790910b666a (from https://pypi.org/simple/pymagnitude/), version: 0.1.23
    Found link https://files.pythonhosted.org/packages/94/b7/d30d95c22996a4b1130dfa1141ba78c23aaa24d362ff9b0db2252079fcec/pymagnitude-0.1.24.tar.gz#sha256=b3eb68bdbbecc7a91aa02e5a11f6ab9efa315ff3f51347699525899dee36c91e (from https://pypi.org/simple/pymagnitude/), version: 0.1.24
    Found link https://files.pythonhosted.org/packages/f3/f9/f4249cdfd6812622d2a7ab170f71ea8247477b713d00d7ca0d5bacc293d5/pymagnitude-0.1.25.tar.gz#sha256=62ef14060eeb4f67b6e5e62835d6bd4fd1f73249d0d3619572e0d41d0be709b3 (from https://pypi.org/simple/pymagnitude/), version: 0.1.25
    Found link https://files.pythonhosted.org/packages/28/09/a38178b627f1b40bd740a2066c2ade6c0cd393b38a320db4dfd2cb7c4cc7/pymagnitude-0.1.26.tar.gz#sha256=a05032785465bcd2da7fb93d9fe094deac20ac08e38713b77c5b46246a13ce40 (from https://pypi.org/simple/pymagnitude/), version: 0.1.26
    Found link https://files.pythonhosted.org/packages/20/a7/60926d5fc5817793ee88feb4b0b663a87a08b813610ee81f104a6a36ccba/pymagnitude-0.1.27.tar.gz#sha256=c2895908b47b92bae7390dfd07ea4cd04bbb9f01f04609d26be8414a9c2b9c2f (from https://pypi.org/simple/pymagnitude/), version: 0.1.27
    Found link https://files.pythonhosted.org/packages/28/ac/a2abae2d7b50aede202a442193b4540da19d34b5c9b3693a5420afc75097/pymagnitude-0.1.28.tar.gz#sha256=af8a5c2b32cd69187d20f1769d47c8da5b48f35e29bf79e5f1a5a7a006e39ad0 (from https://pypi.org/simple/pymagnitude/), version: 0.1.28
    Found link https://files.pythonhosted.org/packages/9d/ce/8534bde704bc2940e486787c29fa7b1a041907adfc67d4163b5799871ceb/pymagnitude-0.1.29.tar.gz#sha256=60d51fd8eba43284e02ab58aba63587faaafe7661d275ed40cf96df55c27e058 (from https://pypi.org/simple/pymagnitude/), version: 0.1.29
    Found link https://files.pythonhosted.org/packages/f1/9a/414f8e34d420bf5cd8222e66dad023340cdc1ed246eee0734b95fe56e6cb/pymagnitude-0.1.30.tar.gz#sha256=409e67b4994d670c8cf5ea36019075f2e1e353f012692ced92bd11c5754189d8 (from https://pypi.org/simple/pymagnitude/), version: 0.1.30
    Found link https://files.pythonhosted.org/packages/96/ee/7652c20cf144b14ea15fff086ac0ba358032ed297a430565cd8adff27c0d/pymagnitude-0.1.31.tar.gz#sha256=8ebc6f5c1836164171bb97657fa9c6f88e133615311314d99ae90448072966f4 (from https://pypi.org/simple/pymagnitude/), version: 0.1.31
    Found link https://files.pythonhosted.org/packages/7b/f7/8e961fb57b6f206b2330128b36cf7dcce1ee41bb29b99fc041c7a612b6b8/pymagnitude-0.1.32.tar.gz#sha256=c85959e433439d2fb866383d717fcab642ef98116e1c3128b4049ad32e7826fd (from https://pypi.org/simple/pymagnitude/), version: 0.1.32
    Found link https://files.pythonhosted.org/packages/68/d6/c463ee0b6b44560918b28b7f4af15f071a18438233eef6db715dbadd9c49/pymagnitude-0.1.33.tar.gz#sha256=f3de0b01cf5dcf06cf45531c4f9ebf1ad23369cfc28127d26db2bf94e1955d66 (from https://pypi.org/simple/pymagnitude/), version: 0.1.33
    Found link https://files.pythonhosted.org/packages/9f/7a/568249b44ae4c60be75364abe802f879704afdecc7c4db219a70ee9c5e9b/pymagnitude-0.1.34.tar.gz#sha256=0dc92eab4d36dee8b46d7f54a16bb1e73abb59ac6ebe81490b441b76ee3a31a6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.34
    Found link https://files.pythonhosted.org/packages/74/b9/ebe1fa6e19a820122601ece3aae08b302a4b6312d8c66b5fc917eacd5b60/pymagnitude-0.1.35.tar.gz#sha256=2d9f1df80714d548e044aa977adcf03b92b1042493b993e0e0de0d603ec5c4c6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.35
    Found link https://files.pythonhosted.org/packages/00/c1/898ff27e00677844c46f1ba115dc73909855c8b48bac1285f391e2620597/pymagnitude-0.1.36.tar.gz#sha256=3eaa279449eefc0f57898822fc8566a2d9c4c5273446d004d1da6ec82d1a26f6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.36
    Found link https://files.pythonhosted.org/packages/32/ba/4e192621a4673e9770a4c5bfed43dfb7e35952fd17709dc2874388c11b4e/pymagnitude-0.1.37.tar.gz#sha256=ebe92fc43c1de54aac8cb550e9832079a179494ef59c33073dfa0e4862ba0b86 (from https://pypi.org/simple/pymagnitude/), version: 0.1.37
    Found link https://files.pythonhosted.org/packages/92/cf/3cc8fe084e3245b8ed1b77266bd0355f20fe8c409f2bc91230925dacc4dc/pymagnitude-0.1.38.tar.gz#sha256=a36221c2e1eed6c7382abd537dbe0fec0f6f5979630c037550809b62196a7fc0 (from https://pypi.org/simple/pymagnitude/), version: 0.1.38
    Found link https://files.pythonhosted.org/packages/e3/fc/09b0c45ba2bc9b7f5e7671c300c1bb88a5b617286f8f8a9ebb69341187f9/pymagnitude-0.1.39.tar.gz#sha256=19f3608407c4312f7f777eb36decaa1c9b6bd2b6498440a2a080e014225c93d4 (from https://pypi.org/simple/pymagnitude/), version: 0.1.39
    Found link https://files.pythonhosted.org/packages/71/83/ab0d369b563b8c009d2a1d00b80cd3c762f635178235deb000553fdacc07/pymagnitude-0.1.40.tar.gz#sha256=71e26ed3462387310b98daf3eb8719b664f2471533c907fe742b8392941e9b7d (from https://pypi.org/simple/pymagnitude/), version: 0.1.40
    Found link https://files.pythonhosted.org/packages/05/d7/bc403a9594c511e32e5f0890f2caad1bf2059f3da4dd8ee9f5a991ba9ced/pymagnitude-0.1.41.tar.gz#sha256=087f6070d78552f1f818df6520e6a614f6b618c3b56d45df75b9551e5a8985d2 (from https://pypi.org/simple/pymagnitude/), version: 0.1.41
    Found link https://files.pythonhosted.org/packages/c0/c2/5e4026fb545c919eef259b1fd28ae8b2834f4cf426a19e923e6c22f6a262/pymagnitude-0.1.42.tar.gz#sha256=7bc87897b90837eac3928af1c7c306928fb6e66e3d2bb81911aef2ffc16571d6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.42
    Found link https://files.pythonhosted.org/packages/5c/35/3cf5f27c58fe2609e4bcade0d54d19b677ef424191d7450ce0f7e718adf1/pymagnitude-0.1.43.tar.gz#sha256=0abaa85974d3f164940fd258ad8c5d6ae8c7cc079ca013784f7bb0c145b8ae60 (from https://pypi.org/simple/pymagnitude/), version: 0.1.43
    Found link https://files.pythonhosted.org/packages/83/20/ddb7409c180205094b2a042d76209a5db10055df96a0d13caacd767a22af/pymagnitude-0.1.44.tar.gz#sha256=ab19bfff030da3ea8a9f7f9d871589b99bc4656a77c56fe15be656f8d3a531c1 (from https://pypi.org/simple/pymagnitude/), version: 0.1.44
    Found link https://files.pythonhosted.org/packages/ad/d2/127f8a272c946d70b300ce392a152703098dffdffbdf0204f1b2b948b094/pymagnitude-0.1.45.tar.gz#sha256=ffd346aff8b9879045e861670a356b9d61743844c2e579fbc03e55add60ef817 (from https://pypi.org/simple/pymagnitude/), version: 0.1.45
    Found link https://files.pythonhosted.org/packages/93/b2/f953f1cd24619f37707364bd3388d5e785c68a18f529602f62c0dc10cdf0/pymagnitude-0.1.46.tar.gz#sha256=88d4fee6b0046b3cf0a7c4b94bcfdfdf0a6aa3c9eda52d3fa716c004033ec152 (from https://pypi.org/simple/pymagnitude/), version: 0.1.46
    Found link https://files.pythonhosted.org/packages/70/c7/5460e33ee13e336be3ef1ad394d4556a6570d1b33abf83ea41ebb106383a/pymagnitude-0.1.47.tar.gz#sha256=9339a6a1ece8e2db0412d4a70b67f60d3653ef6ca539082d815eb08a66549c05 (from https://pypi.org/simple/pymagnitude/), version: 0.1.47
    Found link https://files.pythonhosted.org/packages/1a/07/c3b8c598ff61a23526b22a4d4a73c3bee02e3bd463965ffabaa8a0d57c17/pymagnitude-0.1.48.tar.gz#sha256=3c6b5d89e14a48b6c9a4410f8ea06383eedb3497b551541baba2dcd7a07fae6b (from https://pypi.org/simple/pymagnitude/), version: 0.1.48
    Found link https://files.pythonhosted.org/packages/d3/cc/5de5242cd5144778ebf32553d16a1eaaf7baaebd41e31436e365f4d504c9/pymagnitude-0.1.49.tar.gz#sha256=95596d91cbef04725dd8a8d16090b495e6dca13d7ac66f9fa7713fec253017cf (from https://pypi.org/simple/pymagnitude/), version: 0.1.49
    Found link https://files.pythonhosted.org/packages/d5/8d/6c2d433f9afeba01296a69d414e7d11ce23e919a98e3aaae2833018dda77/pymagnitude-0.1.50.tar.gz#sha256=3b47034ca327b0b4e9cbbaa81455e319d361e6eca6f7ae3e4a4efdbd4cdb3409 (from https://pypi.org/simple/pymagnitude/), version: 0.1.50
    Found link https://files.pythonhosted.org/packages/c3/4e/7d1274a548a19705834e2886334bef96c3e838774f29aaddf011413b7ae1/pymagnitude-0.1.51.tar.gz#sha256=d6434ede3ffdc3107b80946456c101c251fafa261d09f3b02b90b966cbbc1d3f (from https://pypi.org/simple/pymagnitude/), version: 0.1.51
    Found link https://files.pythonhosted.org/packages/2e/89/6f6db5d7bc3cb9a021b9e7fec004482f28f4548a460524a7b22bea896cd3/pymagnitude-0.1.52.tar.gz#sha256=8ba67fe4922d4ece67aab6582a3c190d6ab3dc1e9fff08f7d951aa8500aae675 (from https://pypi.org/simple/pymagnitude/), version: 0.1.52
    Found link https://files.pythonhosted.org/packages/53/cb/73f4443760e9f011857422e75c6ac251a1afee6aaead3872d048c467a390/pymagnitude-0.1.53.tar.gz#sha256=3627dc1969bcd0742de36e78b14b54a149d27d6c55ff9ba297a5fbbf6e515a34 (from https://pypi.org/simple/pymagnitude/), version: 0.1.53
    Found link https://files.pythonhosted.org/packages/bb/8b/f73e74b956cac04c74def4bae1e6185d2b15b52944b1813b79e53525f77a/pymagnitude-0.1.55.tar.gz#sha256=310a03cb3052517f7d3b405726a9848e0a2a21439c265df02ab7cc30ad23862e (from https://pypi.org/simple/pymagnitude/), version: 0.1.55
    Found link https://files.pythonhosted.org/packages/62/cc/368a8e9cd82b6e75f5dacceda40a2a639dc52b40bfa4b9da0dfddce2dbff/pymagnitude-0.1.56.tar.gz#sha256=c614b804ce2d0607edc32a59e06708f96dc9c66f8bc826ef860758ab0134a356 (from https://pypi.org/simple/pymagnitude/), version: 0.1.56
    Found link https://files.pythonhosted.org/packages/6a/67/b3d016f816a6d9920c8b6953b89411ba7713896c30731bfb185a2309c199/pymagnitude-0.1.57.tar.gz#sha256=b3ff70a512b2659c7fa189062dce37fb300c3f390599efd59eadca846127e62b (from https://pypi.org/simple/pymagnitude/), version: 0.1.57
    Found link https://files.pythonhosted.org/packages/a5/6d/b8ad39226a1fb3ffa9af5bd386a65ae491e038f6f54ae0d968483645b36c/pymagnitude-0.1.58.tar.gz#sha256=aa20fea1d4af8c840696ebb7b8ee09c26b19aab3706be908599404fbcd768317 (from https://pypi.org/simple/pymagnitude/), version: 0.1.58
    Found link https://files.pythonhosted.org/packages/ee/de/bbb70d60dcf602dbae5048f0f903358cfd9f7046e00e745edadc30b1b7ce/pymagnitude-0.1.59.tar.gz#sha256=fcadae77ac6bd59384ac17748fe25d05ae31a9f9793dddf6f887aa236ced3907 (from https://pypi.org/simple/pymagnitude/), version: 0.1.59
    Found link https://files.pythonhosted.org/packages/46/c9/47441a58e0ef9d20aef55b2e673f9d9feb4d68d0fa883e6dc3f7a20e5c0f/pymagnitude-0.1.60.tar.gz#sha256=c5b020ab57041a02cd5889d161945a15523a54fd84d10a88bd2e3762170a986e (from https://pypi.org/simple/pymagnitude/), version: 0.1.60
    Found link https://files.pythonhosted.org/packages/46/91/e2b316587f827dda29a7e96a90e1ee5933d7b531055b6c3e12c6440d79c3/pymagnitude-0.1.62.tar.gz#sha256=236ab335077566e0aff2d9320cd20988d857d27f547979a79132ae6778a0868e (from https://pypi.org/simple/pymagnitude/), version: 0.1.62
    Found link https://files.pythonhosted.org/packages/06/be/3109ec44f4310e1b0140f80addf25c73f176c35ef2ad96492d176873b446/pymagnitude-0.1.63.tar.gz#sha256=0c62f2dfd8b4b9ac71c700b6cdfcd052789de8f5ccac883266b1e09509c4ec22 (from https://pypi.org/simple/pymagnitude/), version: 0.1.63
    Found link https://files.pythonhosted.org/packages/ab/9f/cf967306e43f6c85f708f233c28fc9a79cb28dfd66c30e484fe02306cfb4/pymagnitude-0.1.64.tar.gz#sha256=51c000fef64dbeaafaa7d8a4fe5b182edefdfaa14dcb8613712040e01ab0daf6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.64
    Found link https://files.pythonhosted.org/packages/e9/49/221306a30368d9797a2dc986a3c0f73157f4d9a72a268a997c5a4919ee97/pymagnitude-0.1.65.tar.gz#sha256=ec7f7c4ed6709b0e2116b9a6761b434967aabdcf1527ec5684119bcf8723fb70 (from https://pypi.org/simple/pymagnitude/), version: 0.1.65
    Found link https://files.pythonhosted.org/packages/a4/ad/afa5bdb223dbb0b2311f900e8205cbf05678264234210178e3bf0c827b4a/pymagnitude-0.1.66.tar.gz#sha256=509ebb04e9d21540ee6fba52ea2d4f6eeb635f098e9a329d1e992cd69db91cac (from https://pypi.org/simple/pymagnitude/), version: 0.1.66
    Found link https://files.pythonhosted.org/packages/f9/b6/d9a027422a8042fd579b497b3af9dd336c6a163442a5bf7b4c1cbadab07c/pymagnitude-0.1.67.tar.gz#sha256=be8cc81ef19be52c0f42be19b9f5b3c19fbbb98ea2c734cc0cca871cbdec679a (from https://pypi.org/simple/pymagnitude/), version: 0.1.67
    Found link https://files.pythonhosted.org/packages/05/d4/fabd773348f46ff4064ae9541c8b89e5fd7b167b0e16ccdb2a7e1eb0c9b4/pymagnitude-0.1.68.tar.gz#sha256=e54c13f06f5361f428450f2e9200a11d5e1cc8502050b163e0451c7a7c9b862b (from https://pypi.org/simple/pymagnitude/), version: 0.1.68
    Found link https://files.pythonhosted.org/packages/86/89/cecdde9bc1c0bf37405b167a6079c2f7fe1d1c00d0bc3be5c8cc9c7ad76a/pymagnitude-0.1.69.tar.gz#sha256=cd2f40ca0d8c3043cba21febe91838ff4ccf358d2d84e0fe66d5819182c4213f (from https://pypi.org/simple/pymagnitude/), version: 0.1.69
    Found link https://files.pythonhosted.org/packages/ca/e0/c6473c79aebf6dd1e05418edc41eb6e30c524ebcb5089085cee24c479618/pymagnitude-0.1.70.tar.gz#sha256=e7feb50b969704f8e507b3c2ad37ee3846079784a7c98ea8017dab25c9a07c03 (from https://pypi.org/simple/pymagnitude/), version: 0.1.70
    Found link https://files.pythonhosted.org/packages/72/db/030bd8e678b226ff5f1ee9b7ea07d56b60b9c4662d6c469586b19701807c/pymagnitude-0.1.71.tar.gz#sha256=9375b97f84beba0684e10e66f3a05333dc7fe8645f8f4de6ecf9eda52ef610a4 (from https://pypi.org/simple/pymagnitude/), version: 0.1.71
    Found link https://files.pythonhosted.org/packages/dd/5f/e7792f1ed21551e060259c681b3265e6892ec4c925059d4f7631cfeec6e8/pymagnitude-0.1.72.tar.gz#sha256=0b69391b87c6855cff397fa93e2bcebd0a650d03b64cfa45de9a1b8c57634287 (from https://pypi.org/simple/pymagnitude/), version: 0.1.72
    Found link https://files.pythonhosted.org/packages/15/58/f486236486423c1c9971cee46f3bf6403cb0f124742202f0561e9e5cfc7e/pymagnitude-0.1.73.tar.gz#sha256=d5fdd2f3fed30623b5175d268024d5f2107fcd51f2137fa32114934316a5c544 (from https://pypi.org/simple/pymagnitude/), version: 0.1.73
    Found link https://files.pythonhosted.org/packages/33/a9/ce32389eff352ad1b9948ba2009380d194fcadd8df77f5a27f7c271c968b/pymagnitude-0.1.74.tar.gz#sha256=2f654fc0b002b6e288ec599a5f06e160b61ad6045916dd32863c332ecb60095e (from https://pypi.org/simple/pymagnitude/), version: 0.1.74
    Found link https://files.pythonhosted.org/packages/e1/21/196efec9eaddce64fd6ea041caff01ccb96bd5830c6bd85e1f029c6e1043/pymagnitude-0.1.76.tar.gz#sha256=a23e79b0aef1cb0d6f5ff9f0003982d1ea53718c12db5acdd3cce8251677f54b (from https://pypi.org/simple/pymagnitude/), version: 0.1.76
    Found link https://files.pythonhosted.org/packages/0c/ce/0346291728e71327b96189c18c2e15d72206569325b54edadf728b7bf4ac/pymagnitude-0.1.77.tar.gz#sha256=299e5802abeb12949cbf335b53996bfd975f5a1acc97ad356079ac17887fd279 (from https://pypi.org/simple/pymagnitude/), version: 0.1.77
    Found link https://files.pythonhosted.org/packages/ad/b6/0abbec6818d5d635fddf2f25bf02ffeadf0f21dc2208d3c72b119d06c149/pymagnitude-0.1.78.tar.gz#sha256=640779bb6d413b17d4a96aeb31501cee4b745f6582075775561d7434350da48b (from https://pypi.org/simple/pymagnitude/), version: 0.1.78
    Found link https://files.pythonhosted.org/packages/d7/06/81aad356903eb91c0173635313e7e37426777e8a13e33a8f8caf72995a67/pymagnitude-0.1.79.tar.gz#sha256=52e8b1a574c03c0d17771325d87543fab6ea15131a2596d4c8538e7834ca8f98 (from https://pypi.org/simple/pymagnitude/), version: 0.1.79
    Found link https://files.pythonhosted.org/packages/f7/a1/48949ebe7dd0f140e09b462b470a8f6b56ad4e626bfc8c2a63ddb0cba708/pymagnitude-0.1.80.tar.gz#sha256=eb3937e7e537504b6aaf00b03d5c70b35f7e6ca6ede20b3031ad711d99f48189 (from https://pypi.org/simple/pymagnitude/), version: 0.1.80
    Found link https://files.pythonhosted.org/packages/1f/72/97de3f1aeffa972cdb8d5e507c205fea6548372e084ca9d781337ef4fc9e/pymagnitude-0.1.81.tar.gz#sha256=ab38475c888bc682895a24cafcb7ebc059d48d85b63c2a786ddd3f18aa4f2f64 (from https://pypi.org/simple/pymagnitude/), version: 0.1.81
    Found link https://files.pythonhosted.org/packages/e1/77/88faed06d39286c6aac55a761a94e2a25754f5a628ea41a644247fe2a89e/pymagnitude-0.1.82.tar.gz#sha256=6886a96d80e675978665176f860826e5a5a8bb52e62c10ce80589050b4953f28 (from https://pypi.org/simple/pymagnitude/), version: 0.1.82
    Found link https://files.pythonhosted.org/packages/c2/7b/70554a7644f5e64569a9d0dfa5a208be584b3fcfaca9c0488808234882b1/pymagnitude-0.1.83.tar.gz#sha256=9dce51e91d80b63ef39603d20eff5747aa47ba09bc1637f47257287f78d83aec (from https://pypi.org/simple/pymagnitude/), version: 0.1.83
    Found link https://files.pythonhosted.org/packages/c2/c7/e15aa9cad87e12b872d94cdfec60641041d604b4b20dfd3cc7069c98d1d9/pymagnitude-0.1.84.tar.gz#sha256=2acd4d2c66d10247d38c529ef7de2ba9e5538eb9ba013940cedfd0d0532dc125 (from https://pypi.org/simple/pymagnitude/), version: 0.1.84
    Found link https://files.pythonhosted.org/packages/42/6c/35894b8db09e619052be2a6f4a1e30269cf7f287659aa6ffd59b68ee9976/pymagnitude-0.1.85.tar.gz#sha256=81f2cfc0064d101b1416bc2d583b24e948ced279c079dc19dfee56e81239c134 (from https://pypi.org/simple/pymagnitude/), version: 0.1.85
    Found link https://files.pythonhosted.org/packages/fc/8a/ee98858549e4aea5a28b9391edbe1f4e9cc1e8d5cfd9dc71bd6ce6f5a391/pymagnitude-0.1.86.tar.gz#sha256=a48fa8ee7fb9e3cee5253bcf4c8589dbaeb61cddc3d9d7718d5683ce5758d570 (from https://pypi.org/simple/pymagnitude/), version: 0.1.86
    Found link https://files.pythonhosted.org/packages/59/c8/21443c02dedc3d7b67fe0da884cdc37fcecf295f49e4100e1126f5fc0edc/pymagnitude-0.1.87.tar.gz#sha256=6e6d1a9a22dd0448b5f5af3e4fee2eabcf502670e9e9868f6f0c6b422a066a84 (from https://pypi.org/simple/pymagnitude/), version: 0.1.87
    Found link https://files.pythonhosted.org/packages/92/62/98098665e2f4ffc76334d034684400073d3ca154d4126bf35e0558e2e793/pymagnitude-0.1.88.tar.gz#sha256=3bc70821971d0cc1148438ea249c64a1dc5141cf61aa5e86f79db4219a158973 (from https://pypi.org/simple/pymagnitude/), version: 0.1.88
    Found link https://files.pythonhosted.org/packages/6c/aa/8a0680db9e18cbae77710f33a258870ac605c8a9b358e388436f3c21e904/pymagnitude-0.1.93.tar.gz#sha256=d98d913ecacba182f6cfa559ce67d94b2c45d6e2be8e7e750fca485bb417f575 (from https://pypi.org/simple/pymagnitude/), version: 0.1.93
    Found link https://files.pythonhosted.org/packages/01/13/804dcfbe12a777f1e2d7dd7bd7cfc4e2803bf086e89bcf3429ba96e6ee65/pymagnitude-0.1.94.tar.gz#sha256=0b8fe9020257695271d49dd27cb01c97a41403ed437badc06e7991f2b7ce38fb (from https://pypi.org/simple/pymagnitude/), version: 0.1.94
    Found link https://files.pythonhosted.org/packages/a5/d7/583c70a06dc7f12292345e41aec059655eb7edfc86310b559b244b2c5f52/pymagnitude-0.1.95.tar.gz#sha256=956f6108f906451f0484a2d569574b8a39ca568563c1e5d54921bbd5938aa79a (from https://pypi.org/simple/pymagnitude/), version: 0.1.95
    Found link https://files.pythonhosted.org/packages/b2/b6/5fe1b38617e97319bd108bfe0c88a12126a1ce00f5a67689d1b9075efd28/pymagnitude-0.1.96.tar.gz#sha256=cb31417da6a1b6cecc4d2726d350a86dc74c8cf8a604538577c9703b79bd72b5 (from https://pypi.org/simple/pymagnitude/), version: 0.1.96
    Found link https://files.pythonhosted.org/packages/61/bf/ebdccde3d4dd68d059fe0ff740d280478886a7d6a2deecd1cf0656212b9e/pymagnitude-0.1.98.tar.gz#sha256=5321ea915d3adc3b255497784aad697d56a7383a6b7aeef7118638544353d64d (from https://pypi.org/simple/pymagnitude/), version: 0.1.98
    Found link https://files.pythonhosted.org/packages/cd/35/46835f8b11d6a7d03d3db3b2e88b28f6363f6ee752f2a31dbeddcfb08b58/pymagnitude-0.1.99.tar.gz#sha256=058379cb4915de345775c3880f37884c4ad4021256eb48dcc892de1ff803d54f (from https://pypi.org/simple/pymagnitude/), version: 0.1.99
    Found link https://files.pythonhosted.org/packages/b8/4e/1eada32c0f72dfd880a58de3dcaec7f92a7e131cf3162562f9ccee7d5b74/pymagnitude-0.1.100.tar.gz#sha256=391224db7b202c80d1c94ebcd71f2cb4fdd612d9d7fd9ea1351843f694a7838b (from https://pypi.org/simple/pymagnitude/), version: 0.1.100
    Found link https://files.pythonhosted.org/packages/96/22/4b4abcf6d2afa418f235e5267f92391c6b36d7d2322fbcc61f2eb6b95ff1/pymagnitude-0.1.101.tar.gz#sha256=b5c2615d1ff484d6d397ae65178897ece0512238dfbf2301bef3368f4c49b149 (from https://pypi.org/simple/pymagnitude/), version: 0.1.101
    Found link https://files.pythonhosted.org/packages/cd/ff/5bc9b6c54a68a18212b3ad69b6504aacaf370e2711afaf9253bec00d3d0a/pymagnitude-0.1.102.tar.gz#sha256=2e74d5cb45e9a5b6b238b30d98f72aaa85fc9a99d98e91a9dbfe628150accbf8 (from https://pypi.org/simple/pymagnitude/), version: 0.1.102
    Found link https://files.pythonhosted.org/packages/74/d1/7182d4383559b5cf898e6dcbc24a57ae8276d8e85c0b92056a02abdf4115/pymagnitude-0.1.103.tar.gz#sha256=16e9cd72486f6ced7eea07b23a1c56e9dd0d3f111cb02a3be7aecc38c71e02a6 (from https://pypi.org/simple/pymagnitude/), version: 0.1.103
    Found link https://files.pythonhosted.org/packages/c3/bf/db8a9473c8407c3a8f564c0d3496156e0a352ad27c4bb8f73ffd37097939/pymagnitude-0.1.104.tar.gz#sha256=214bdde1f90d6ac8886021a7e43fb90e48dcbf6a034ad3aeec68eb951e3200c2 (from https://pypi.org/simple/pymagnitude/), version: 0.1.104
    Found link https://files.pythonhosted.org/packages/d2/83/1238c3413d8e7df3d3ad9f79bc6fe38dccd0943d7b327468412f02a9fd5a/pymagnitude-0.1.105.tar.gz#sha256=404d51c7c911cc5f7f1e8345d98b41fdf6bd654c784d507386c1b5898e7d0091 (from https://pypi.org/simple/pymagnitude/), version: 0.1.105
    Found link https://files.pythonhosted.org/packages/a6/39/ee552504426c0396f82353d5741453f2b7be909da42f377a9537b6ba10ec/pymagnitude-0.1.106.tar.gz#sha256=a369e076b3c349d9e4d64b8b3907fd68b9d08762e9d6d53dd9790d049cadb7b1 (from https://pypi.org/simple/pymagnitude/), version: 0.1.106
    Found link https://files.pythonhosted.org/packages/8e/7a/1a420f0519c1fb3bfa3cafaf23b64bb2d1b81d42d0c4922f4230bf8ed17a/pymagnitude-0.1.107.tar.gz#sha256=a9b7c2d84e915389f291c99875a1dfbd57b977c0705787b0190830f8ef6779fa (from https://pypi.org/simple/pymagnitude/), version: 0.1.107
    Found link https://files.pythonhosted.org/packages/d1/e4/ae5242e97e39402026a0027c0d8bb5fb4761a57e033c494a749e0cfd94cb/pymagnitude-0.1.108.tar.gz#sha256=5b3f3090516dec1b6fb9a81f837b2c68b3145dd1cdf822bc4f8180e338a5ee27 (from https://pypi.org/simple/pymagnitude/), version: 0.1.108
    Found link https://files.pythonhosted.org/packages/f7/e5/eea31e53bb0df81bacc78dc5119fded20280fa601454aa5d03b9a191c1b7/pymagnitude-0.1.109.tar.gz#sha256=6594c1a06001a4d31693f48fb2e32c5e2c6b567f1a45a6059a53414d333f0113 (from https://pypi.org/simple/pymagnitude/), version: 0.1.109
    Found link https://files.pythonhosted.org/packages/a8/c6/8a32824e7b5892a4f389ac0d397b6e6dbed3a129d144f711918b3f62ef70/pymagnitude-0.1.111.tar.gz#sha256=779358ec06a93c1cd14798acf4afa23885993fccc81ed809b78f1372427a8c2f (from https://pypi.org/simple/pymagnitude/), version: 0.1.111
    Found link https://files.pythonhosted.org/packages/df/b6/843d8aeca327b88d55ab924714f5aba4a99b56f22721a56d9d2aa49733b9/pymagnitude-0.1.112.tar.gz#sha256=0f130f6614beaa7dd2160f012d5b1b1859e64c3e81d606c002e74d5ff35bf01b (from https://pypi.org/simple/pymagnitude/), version: 0.1.112
    Found link https://files.pythonhosted.org/packages/80/cd/ac42c271943611c3416fa0311420c882c32da7afcc10fa651431b1a91528/pymagnitude-0.1.113.tar.gz#sha256=354a88082294816a61fced8c8aa0b82bc11bf85ab48091b1c314c7d8c0e4624f (from https://pypi.org/simple/pymagnitude/), version: 0.1.113
    Found link https://files.pythonhosted.org/packages/85/98/ae02dc3fd2fd003ac1d80f4376f673d77dc4872a1b95fb2e9137273d06d0/pymagnitude-0.1.114.tar.gz#sha256=b522fd3e93228642751dd1fc8e55e2bcf8b9598efd60b20d39cd8829e7ed40f7 (from https://pypi.org/simple/pymagnitude/), version: 0.1.114
    Found link https://files.pythonhosted.org/packages/5d/f7/3705eb77e951c8f2bd9d3dfef6c025edb1b347c644fb26c79cf9b727672f/pymagnitude-0.1.115.tar.gz#sha256=b9687a63891f5acc842db58d3a4493526735ef61d95fe9e3a4e10a0d4e50826e (from https://pypi.org/simple/pymagnitude/), version: 0.1.115
    Found link https://files.pythonhosted.org/packages/f8/4b/a195b5deaae347f6b9caf31e6efdb4bbf27eb866e6871d390e0f5fc4bbe0/pymagnitude-0.1.116.tar.gz#sha256=b2ea5f08afc55c2fd735eaab2200ab954efa9180a1ce9cc181f17b5014b75b6b (from https://pypi.org/simple/pymagnitude/), version: 0.1.116
    Found link https://files.pythonhosted.org/packages/8d/40/01152525862d9f33a94fb145bac0cc799fa10edd9fffb87f480fe9625c49/pymagnitude-0.1.117.tar.gz#sha256=d19fa033aa71ef9515b9a93bcbf6f33f7b7d762cfa0f34ac7ee74a9a4b133ceb (from https://pypi.org/simple/pymagnitude/), version: 0.1.117
    Found link https://files.pythonhosted.org/packages/02/ef/2a500484c5bcd26abf8cb01cde662215217c39bc54a4cb10b7fa687b9148/pymagnitude-0.1.118.tar.gz#sha256=b16ffe6c953bb5812774bc320f6da605ce6f824770d66a4ed3f4ec1488fcec61 (from https://pypi.org/simple/pymagnitude/), version: 0.1.118
    Found link https://files.pythonhosted.org/packages/11/40/7620f7d23862fb53ea6acffcdcf11abcb0f2eae32f4e6b9bdecf74519e8a/pymagnitude-0.1.119.tar.gz#sha256=4471de6837becc1456fb98f874d2bcfcfd63e7cec0f879cb3ab11d3dc3b30612 (from https://pypi.org/simple/pymagnitude/), version: 0.1.119
    Found link https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz#sha256=0a59df1151e2859c54a4db1f6c2dc414d666e4099724516af87d6cc4f4cbe276 (from https://pypi.org/simple/pymagnitude/), version: 0.1.120
  Using version 0.1.120 (newest of versions: 0.0.17, 0.0.19, 0.0.20, 0.0.21, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.1.12, 0.1.13, 0.1.14, 0.1.15, 0.1.16, 0.1.17, 0.1.18, 0.1.19, 0.1.20, 0.1.21, 0.1.22, 0.1.23, 0.1.24, 0.1.25, 0.1.26, 0.1.27, 0.1.28, 0.1.29, 0.1.30, 0.1.31, 0.1.32, 0.1.33, 0.1.34, 0.1.35, 0.1.36, 0.1.37, 0.1.38, 0.1.39, 0.1.40, 0.1.41, 0.1.42, 0.1.43, 0.1.44, 0.1.45, 0.1.46, 0.1.47, 0.1.48, 0.1.49, 0.1.50, 0.1.51, 0.1.52, 0.1.53, 0.1.55, 0.1.56, 0.1.57, 0.1.58, 0.1.59, 0.1.60, 0.1.62, 0.1.63, 0.1.64, 0.1.65, 0.1.66, 0.1.67, 0.1.68, 0.1.69, 0.1.70, 0.1.71, 0.1.72, 0.1.73, 0.1.74, 0.1.76, 0.1.77, 0.1.78, 0.1.79, 0.1.80, 0.1.81, 0.1.82, 0.1.83, 0.1.84, 0.1.85, 0.1.86, 0.1.87, 0.1.88, 0.1.93, 0.1.94, 0.1.95, 0.1.96, 0.1.98, 0.1.99, 0.1.100, 0.1.101, 0.1.102, 0.1.103, 0.1.104, 0.1.105, 0.1.106, 0.1.107, 0.1.108, 0.1.109, 0.1.111, 0.1.112, 0.1.113, 0.1.114, 0.1.115, 0.1.116, 0.1.117, 0.1.118, 0.1.119, 0.1.120)
  Created temporary directory: /private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-unpack-a78af61l
  Looking up "https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz" in the cache
  Current age based on date: 1122
  Ignoring unknown cache-control directive: immutable
  Freshness lifetime from max-age: 365000000
  The response is "fresh", returning cached response
  365000000 > 1122
  Using cached https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz
  Downloading from URL https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz#sha256=0a59df1151e2859c54a4db1f6c2dc414d666e4099724516af87d6cc4f4cbe276 (from https://pypi.org/simple/pymagnitude/)
  Added pymagnitude from https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz#sha256=0a59df1151e2859c54a4db1f6c2dc414d666e4099724516af87d6cc4f4cbe276 to build tracker '/private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-req-tracker-62y2pf1i'
  Running setup.py (path:/private/var/folders/63/n7b6d_wd4pq_ss3xw0_7mkxmrfmxcs/T/pip-install-zi8viiwh/pymagnitude/setup.py) egg_info for package pymagnitude
    Running command python setup.py egg_info
    Downloading and installing wheel (if it exists)...
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_14_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_14_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_13_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_13_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_12_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_12_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_11_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_11_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_10_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_10_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_9_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_9_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_8_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_8_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_7_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_7_intel.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_6_x86_64.whl
    FAILED
    Trying... http://s3.amazonaws.com/magnitude.plasticity.ai/wheelhouse/pymagnitude-0.1.120-cp37-cp37m-macosx_10_6_intel.whl
...

Convert numberbatch text file to magnitude format

I am trying to convert the pre-trained word vector file "numberbatch-en.txt" to the Magnitude format using the converter, but it throws a ValueError. Is the converter only meant for GloVe and word2vec pre-trained files?
https://github.com/commonsense/conceptnet-numberbatch

@AjayP13 @acsands13 Please look into this issue. Thanks!

Bad Magnitude File

The file
English Wikipedia 2017 + subword 16B
heavy model

http://magnitude.plasticity.ai/fasttext+approx/wiki-news-300d-1M-subword.magnitude

seems to have an empty magnitude table.
When I execute len(vectors), it returns 0.
The table is also empty when I open the file in "DB Browser for SQLite".

The other file
http://magnitude.plasticity.ai/fasttext+approx/wiki-news-300d-1M.magnitude
works fine and the magnitude table is not empty.
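One quick way to confirm a report like this is to count the rows directly with Python's built-in `sqlite3` module. This is a minimal sketch that assumes, as the report suggests, that the vectors live in a SQLite table named `magnitude`; the helper name `count_vectors` is hypothetical.

```python
import os
import sqlite3

def count_vectors(path, table="magnitude"):
    """Return the number of rows in `table` of the SQLite file at `path`."""
    if not os.path.exists(path):
        # sqlite3.connect() would silently create a new, empty file instead.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        # Table names cannot be bound as ? parameters, so quote the identifier.
        (count,) = conn.execute('SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()
        return count
    finally:
        conn.close()
```

Running it on both downloaded files and getting 0 for the subword file but a positive count for the other would match the behaviour described above.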

"TypeError: must be str, not bytes" while creating deterministic hash

How to reproduce:

from pymagnitude import *
pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
print(pos_vectors.query("NN"))

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-2f9508aa33b7> in <module>()
      2 pos_vectors = FeaturizerMagnitude(100, namespace = "PartsOfSpeech")
      3 print(pos_vectors.dim) # 4 - number of dims automatically determined by Magnitude from 100
----> 4 print(pos_vectors.query("NN")) # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192])
      5 print(pos_vectors.query("JJ")) # - array([-0.11681135,  0.10259253,  0.8841201 , -0.44063763])
      6 print(pos_vectors.query("NN")) # - array([ 0.08040417, -0.71705252,  0.61228951,  0.32322192]) (deterministic hashing so the same value is returned every time for the same key)

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in query(self, q, pad_to_length, pad_left, truncate_left)
    879             vec = self._vector_for_key_cached(q)
    880             if vec is None:
--> 881                 return self._out_of_vocab_vector_cached(q)
    882             else:
    883                 return vec

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in _out_of_vocab_vector_cached(*args, **kwargs)
    334             @lru_cache(None)
    335             def _out_of_vocab_vector_cached(*args, **kwargs):
--> 336                 return self._out_of_vocab_vector(*args, **kwargs)
    337 
    338             @lru_cache(None)

C:\Python36\lib\site-packages\pymagnitude\__init__.py in _out_of_vocab_vector(self, key)
    673             random_vectors = []
    674             for i, ngram in enumerate(ngrams):
--> 675                 seed = self._seed(ngram)
    676                 Magnitude.OOV_RNG_LOCK.acquire()
    677                 np.random.seed(seed=seed)

C:\Python36\lib\site-packages\pymagnitude\__init__.py in _seed(self, val)
    646         """Returns a unique seed for val and the (optional) namespace."""
    647         if self._namespace:
--> 648             return xxhash.xxh32(self._namespace + Magnitude.RARE_CHAR +
    649                                 val.encode('utf-8')).intdigest()
    650         else:

TypeError: must be str, not bytes
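The traceback shows `self._namespace` (a `str`) being concatenated with `val.encode('utf-8')` (a `bytes` object), which Python 3 rejects. A minimal sketch of one fix is to build the whole key as text and encode it once. Two assumptions: the separator below is hypothetical (the real `Magnitude.RARE_CHAR` value may differ), and `hashlib.md5` stands in for `xxhash.xxh32` so the sketch runs on the standard library alone.

```python
import hashlib

# Hypothetical separator between namespace and key; the real
# Magnitude.RARE_CHAR value may differ.
SEPARATOR = u"\uF002"

def deterministic_seed(val, namespace=None):
    """Return a 32-bit seed that depends only on (namespace, val)."""
    # Join everything as text first, then encode once, so str and bytes never mix.
    key = namespace + SEPARATOR + val if namespace else val
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # First 4 digest bytes as an unsigned int: always in [0, 2**32),
    # which is a valid argument for np.random.seed().
    return int.from_bytes(digest[:4], "little")
```

Because the seed depends only on the namespace and the key, querying "NN" twice yields the same seed and therefore the same generated vector.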

Magnitude queries extremely slow for some queries with medium model.

I also don't seem to be getting the following advantage described in the documentation: "Moreover, memory maps are cached between runs so even after closing a process, speed improvements are reaped."

See the following log.

$ python
Python 3.4.6 (default, Mar 22 2017, 12:26:13) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymagnitude import Magnitude
>>> vectors = Magnitude('/nlp/data/embeddings_magnitude/eng/GoogleNews-vectors-negative300.magnitude.medium')
>>> from timeit import timeit
>>> timeit('vectors.query(\'cat\')', 'from __main__ import vectors', number=1)
0.0585936838760972
>>> timeit('vectors.query(\'food\')', 'from __main__ import vectors', number=1)
0.03608247195370495
>>> timeit('vectors.query(\'believe\')', 'from __main__ import vectors', number=1)
0.02389267599210143
>>> timeit('vectors.query(\'denormalization\')', 'from __main__ import vectors', number=1)
27.955912864999846
>>> timeit('vectors.query(\'tariffication\')', 'from __main__ import vectors', number=1)
36.63970931386575
>>> timeit('vectors.query(\'tariffication\')', 'from __main__ import vectors', number=1)
7.962598465383053e-05
>>> exit()
$ python
Python 3.4.6 (default, Mar 22 2017, 12:26:13) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pymagnitude import Magnitude
>>> vectors = Magnitude('/nlp/data/embeddings_magnitude/eng/GoogleNews-vectors-negative300.magnitude.medium')
>>> from timeit import timeit
>>> timeit('vectors.query(\'tariffication\')', 'from __main__ import vectors', number=1)
34.75812460412271
>>>

I understand that some queries (especially OOV ones) should be slower than others, but 36 seconds seems excessive. This issue doesn't affect all out-of-vocabulary words. For example:

>>> timeit('vectors.query(\'catdogcow\')', 'from __main__ import vectors', number=1)
1.1214001160115004
>>> 'catdogcow' in vectors
False

Is there anything I can do to make all queries run within some reasonable threshold, say 2 seconds, or to get caching to work? Maybe there could be a feature that returns a random vector, like the light model does, when an OOV query takes too long?
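The fallback proposed above can be sketched with `concurrent.futures`. This is a hypothetical wrapper, not a pymagnitude feature: the name `query_with_fallback`, the 2-second default and the seeded Gaussian fallback are all assumptions. Because a running thread cannot be interrupted, the slow query keeps running in the background; the wrapper only stops waiting for it.

```python
import concurrent.futures
import hashlib
import math
import random

# Shared worker pool; a timed-out query keeps occupying its worker until it finishes.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def query_with_fallback(query_fn, key, dim, timeout=2.0):
    """Return query_fn(key), or a deterministic random unit vector of length
    `dim` if query_fn has not finished within `timeout` seconds."""
    future = _POOL.submit(query_fn, key)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Seed from the key so the same slow key always gets the same fallback.
        seed = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "little")
        rng = random.Random(seed)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]
```

Usage would look like `query_with_fallback(vectors.query, 'tariffication', vectors.dim)`. Note that the fallback is a plain list rather than a NumPy array, and it will not match the vector the full OOV computation would eventually produce.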

Database disk image malformed with multiprocessing

Hi @AjayP13, I was curious if you had any examples of how you've used this with multiprocessing previously. I'm bumping into a pysqlite error when I try to run with multiprocessing:

from multiprocessing import Process
import tensorflow as tf

coord = tf.train.Coordinator()
processes = []
for i in range(num_processes):
    args = (texts_sliced[i], labels_sliced[i], output_files[i], concatenated_embeddings)
    p = Process(target=_convert_shard, args=args)
    p.start()
    processes.append(p)
coord.join(processes)

  File "/home/jacob/test.py", line 454, in _convert_shard 
    text_embedding = embedding.query(text) 
  File "/home/jacob/anaconda3/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper                                                                 
    val = func(*args, **kwargs) 
pysqlite2.dbapi2.DatabaseError: database disk image is malformed                                                         
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 2088, in query
    for i, m in enumerate(self.magnitudes)]
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 2088, in <listcomp>
    for i, m in enumerate(self.magnitudes)] 
  File "/home/jacob/anaconda3/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper
    val = func(*args, **kwargs)
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 1221, in query
    vectors = self._vectors_for_keys_cached(q, normalized)
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 1109, in _vectors_for_keys_cached 
    unseen_keys[i], normalized, force=force)
  File "/home/jacob/anaconda3/pymagnitude/third_party/repoze/lru/__init__.py", line 390, in cached_wrapper 
    val = func(*args, **kwargs)
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 483, in _out_of_vocab_vector_cached
    return self._out_of_vocab_vector(*args, **kwargs)
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 992, in _out_of_vocab_vector, normalized=normalized)
  File "/home/jacob/anaconda3/pymagnitude/__init__.py", line 829, in _db_query_similar_keys_vector, params).fetchall()   
pysqlite2.dbapi2.DatabaseError: database disk image is malformed

I've tried reloading the .Magnitude files as well as setting blocking=True, but can't seem to get around it. Any ideas?

Thanks!

Python 3

Nice project! Any plans for Python 3 support?

Error while trying to pip install pymagnitude

  Using cached https://files.pythonhosted.org/packages/0a/a3/b9a34d22ed8c0ed59b00ff55092129641cdfa09d82f9abdc5088051a5b0c/pymagnitude-0.1.120.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-guvdu2_b/pymagnitude/setup.py", line 178, in <module>
        'a+')
    PermissionError: [Errno 13] Permission denied: '/tmp/magnitude.install'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-guvdu2_b/pymagnitude/
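The failure comes from setup.py opening the shared path `/tmp/magnitude.install` in append mode; Errno 13 on a file in `/tmp` typically means the file already exists and belongs to another user (for example, from an earlier install run with different privileges). A minimal sketch of a more tolerant approach, assuming the file only serves as an install log (the helper name `open_install_log` is hypothetical), falls back to a fresh private file:

```python
import os
import tempfile

def open_install_log(name="magnitude.install"):
    """Open the shared install log for appending, or a private temp file if
    the shared one is not writable (e.g. created earlier by another user)."""
    shared = os.path.join(tempfile.gettempdir(), name)
    try:
        return open(shared, "a+")
    except (IOError, OSError):
        # mkstemp creates a new file readable/writable only by the current user.
        fd, _private = tempfile.mkstemp(prefix=name + ".")
        return os.fdopen(fd, "a+")
```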

Normalization

First of all: nice work implementing useful features such as lazy loading and memory-mapped files.

As a suggestion, I think it would be nice to have an option to turn normalization off. I might need the vectors as they are, not normalized.
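For context on what normalization loses: L2 normalization rescales every vector to unit length, which leaves cosine similarity unchanged but discards each vector's original norm, so raw dot products and magnitudes cannot be recovered from normalized vectors alone. A small NumPy illustration with made-up vectors:

```python
import numpy as np

def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    return v / np.linalg.norm(v)

def cosine(a, b):
    """Cosine similarity; invariant to the lengths of a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_a = np.array([3.0, 4.0])  # norm 5
raw_b = np.array([2.0, 0.0])  # norm 2

# Cosine similarity is the same before and after normalization...
same_cosine = np.isclose(cosine(raw_a, raw_b),
                         cosine(l2_normalize(raw_a), l2_normalize(raw_b)))
# ...but the dot product is not: 6.0 for the raw vectors, 0.6 once normalized.
raw_dot = float(np.dot(raw_a, raw_b))
normalized_dot = float(np.dot(l2_normalize(raw_a), l2_normalize(raw_b)))
```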

Cache is reusable by another container

I am running pymagnitude inside a Docker container. I mount the magnitude file as a volume on the container, but I believe the cache is written to a temp dir inside the container. Is it possible to configure where the cache is written? I would like to write it to the same volume mounted on the container so that the next container that starts up can use it as well.

When I develop locally, the first load of the magnitude file takes ~120 s with the options lazy_loading=-1 and blocking=True. I use these options so that my API does not start serving requests until pymagnitude gives consistent response times for most_similar(). The next time I start the API locally, pymagnitude takes only ~300 ms to load. However, when I run the API in a Docker container, the load time is consistently ~120 s. I am assuming this is because the cache is written to a temp dir.

https://github.com/plasticityai/magnitude/blob/master/pymagnitude/__init__.py#L375-L378

I may also be completely misunderstanding why it sometimes loads faster locally.

Thanks!
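If the cache location in the linked lines is derived from Python's `tempfile.gettempdir()` (an assumption worth checking against that code), pointing `TMPDIR` at the mounted volume would move it, because `gettempdir()` consults the `TMPDIR`, `TEMP` and `TMP` environment variables first. A minimal sketch of that mechanism; the helper name `use_temp_dir` is hypothetical, and it would need to run before the Magnitude object is constructed:

```python
import os
import tempfile

def use_temp_dir(path):
    """Make tempfile.gettempdir() return `path` for the rest of this process."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # gettempdir() caches its first result in tempfile.tempdir; clear it so
    # the new TMPDIR is picked up on the next call.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

In Docker, the equivalent is setting the variable at start-up, e.g. `docker run -e TMPDIR=/mnt/magnitude-cache ...` with the shared volume mounted at that (hypothetical) path.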

Query time slower than gensim?

Hi!

I really hope this question doesn't come across as critical - I think this project is a great idea and really loving the speed at which it can lazy-load models.

I had one question: loading the Google News vectors is massively quicker in Magnitude than in gensim, but I'm finding that querying is significantly slower. Is this to be expected? It is quite possible that this is a trade-off against loading time, but I want to confirm that there's nothing weird going on in my environment.

Code I'm using for testing:

import json
import os
import timeit


ITERATIONS = 500

# Tokens are loaded from disk.
# tokens = ...
tokens = json.dumps(tokens)

mag = timeit.timeit(
'''
for token in tokens:
    try:
        getVector(token)
    except:
        pass
''',
    setup =
'''
from pymagnitude import Magnitude
vec = Magnitude('/home/dom/Code/ner/ner/data/GoogleNews-vectors-negative300.magnitude')
getVector = vec.query
tokens = {}
'''.format(tokens),
    number = ITERATIONS
)

gensim = timeit.timeit(
'''
for token in tokens:
    try:
        getVector(token)
    except:
        pass
''',
    setup = 
'''
from gensim.models import KeyedVectors
vec = KeyedVectors.load('/home/dom/Code/ner/ner/data/GoogleNews-vectors-negative300.w2v', mmap='r')
getVector = vec.__getitem__
tokens = {}
'''.format(tokens),
    number = ITERATIONS
)

print('Gensim is {}x faster'.format(mag / gensim))

With the code above, I find gensim approximately 5x faster when memory-mapped, and over 13x faster when not.
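Two things can skew a comparison like this. One-off costs (lazy loading, a cold page cache) may land in the timed region, and if `tokens` contains out-of-vocabulary words, gensim raises `KeyError` and skips them while Magnitude computes an OOV vector for each, which the earlier issue shows can be slow. A hedged sketch of a fairer harness: it runs an untimed warm-up pass and catches only `KeyError` instead of using a bare `except`. The `bench` helper is hypothetical; `vec.query` and `vec.__getitem__` would be passed in as `get_vector`.

```python
import time

def bench(get_vector, tokens, passes=5, warmup=1):
    """Mean wall-clock seconds per pass of get_vector over tokens,
    excluding `warmup` untimed passes."""
    def one_pass():
        for token in tokens:
            try:
                get_vector(token)
            except KeyError:
                # gensim raises KeyError for out-of-vocabulary words.
                pass

    for _ in range(warmup):
        one_pass()
    start = time.perf_counter()
    for _ in range(passes):
        one_pass()
    return (time.perf_counter() - start) / passes
```

Timing in-vocabulary and out-of-vocabulary tokens separately would show which path accounts for the gap. Assuming your Magnitude version accepts a list of keys, timing one batched `query(tokens)` call may also be informative.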

Ignoring malformed vectors

The current code throws the error below when it encounters a malformed vector. Because the metadata is written into the database only at the end, the partially built SQLite database cannot be used to query the vectors that were already written. Wouldn't it be better to ignore malformed vectors (with a warning message so the user knows about them) and build the database anyway?

  File "/home/rajesh/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/rajesh/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rajesh/anaconda3/lib/python3.6/site-packages/pymagnitude/converter.py", line 509, in <module>
    approx=approx, approx_trees=approx_trees)
  File "/home/rajesh/anaconda3/lib/python3.6/site-packages/pymagnitude/converter.py", line 324, in convert
    for v in vector))
pysqlite2.dbapi2.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 301, and there are 166 supplied.
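The skip-and-warn behaviour proposed above could look roughly like this when reading a text-format embedding file. This is a sketch, not the converter's code: `iter_valid_vectors` is hypothetical, it assumes space-separated `word v1 ... vN` lines, and it treats any line without exactly `dim + 1` fields (compare the 301 vs. 166 bindings above) or with non-numeric values as malformed. A word2vec-style header line (`count dim`) would also be skipped with a warning.

```python
import warnings

def iter_valid_vectors(lines, dim):
    """Yield (word, values) for well-formed lines; warn about and skip the rest."""
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            warnings.warn("line %d: expected %d fields, got %d; skipping"
                          % (lineno, dim + 1, len(parts)))
            continue
        try:
            values = [float(x) for x in parts[1:]]
        except ValueError:
            warnings.warn("line %d: non-numeric value; skipping" % lineno)
            continue
        yield parts[0], values
```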

ValueError: kth(=-1) out of bounds (400000)

Steps to reproduce:

from pymagnitude import *
glove = Magnitude("../../../Datasets/Magnitude/glove.6B.300d.magnitude")
print(glove.closer_than("cat", "tiger"))

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-15ba9e6c70f4> in <module>()
----> 1 print(glove.closer_than("cat", "tiger")) # ["dog", ...]

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in closer_than(self, key, q, topn)
   1232 
   1233         return self.most_similar(key, topn=topn, min_similarity=min_similarity,
-> 1234                                  return_similarities=False)
   1235 
   1236     def get_vectors_mmap(self):

C:\Python36\lib\site-packages\pymagnitude\third_party\repoze\lru\__init__.py in cached_wrapper(*args, **kwargs)
    352             else:
    353                 if val is marker:
--> 354                     val = func(*args, **kwargs)
    355                     cache.put(key, val)
    356                 return val

C:\Python36\lib\site-packages\pymagnitude\__init__.py in most_similar(self, positive, negative, topn, min_similarity, return_similarities)
   1163                 negative),
   1164             return_similarities=return_similarities,
-> 1165             method='distance')
   1166 
   1167     @lru_cache(DEFAULT_LRU_CACHE_SIZE, ignore_unhashable_args=True)

C:\Python36\lib\site-packages\pymagnitude\__init__.py in _db_query_similarity(self, positive, negative, min_similarity, topn, exclude_keys, return_similarities, method, effort)
   1068 
   1069                 partition_results = np.argpartition(similiarities, -1 * min(
-> 1070                     filter_topn, self.batch_size - 1))[-filter_topn:]
   1071 
   1072                 for index in partition_results:

C:\Python36\lib\site-packages\numpy\core\fromnumeric.py in argpartition(a, kth, axis, kind, order)
    755 
    756     """
--> 757     return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
    758 
    759 

C:\Python36\lib\site-packages\numpy\core\fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     49 def _wrapfunc(obj, method, *args, **kwds):
     50     try:
---> 51         return getattr(obj, method)(*args, **kwds)
     52 
     53     # An AttributeError occurs if the object does not have

ValueError: kth(=-1) out of bounds (400000)
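NumPy accepts a negative `kth` by counting from the end of the array, so this message suggests the requested partition index reached past the start of the 400,000-element array, i.e. more neighbours were requested than there are vectors. That reading is an inference from the traceback, not a confirmed diagnosis. A minimal sketch of a guard that clamps `k` to the array length before calling `np.argpartition` (the helper `top_k_indices` is hypothetical):

```python
import numpy as np

def top_k_indices(similarities, k):
    """Indices of the k largest similarities, best first, with k clamped to
    the array length so np.argpartition never gets an out-of-range kth."""
    n = similarities.shape[0]
    k = max(0, min(k, n))
    if k == 0:
        return np.array([], dtype=np.intp)
    if k == n:
        idx = np.arange(n)
    else:
        idx = np.argpartition(similarities, -k)[-k:]
    # argpartition leaves the top k unordered; sort them in descending order.
    return idx[np.argsort(similarities[idx])[::-1]]
```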

pip install pymagnitude won't install on Python 2.7 because it requires PyTorch, which is not available for Python 2.7 (Windows)

C:\Python27>pip install -U pymagnitude
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Requirement already up-to-date: pymagnitude in c:\python27\lib\site-packages (0.1.120)
Requirement already satisfied, skipping upgrade: numpy>=1.14.0 in c:\python27\lib\site-packages (from pymagnitude) (1.16.2+mkl)
Requirement already satisfied, skipping upgrade: xxhash>=1.0.1 in c:\python27\lib\site-packages (from pymagnitude) (1.3.0)
Requirement already satisfied, skipping upgrade: fasteners>=0.14.1 in c:\python27\lib\site-packages (from pymagnitude) (0.14.1)
Requirement already satisfied, skipping upgrade: annoy>=1.11.4 in c:\python27\lib\site-packages (from pymagnitude) (1.15.2)
Requirement already satisfied, skipping upgrade: lz4>=1.0.0 in c:\python27\lib\site-packages (from pymagnitude) (2.1.6)
Requirement already satisfied, skipping upgrade: h5py>=2.8.0 in c:\python27\lib\site-packages (from pymagnitude) (2.9.0)
Collecting torch (from pymagnitude)
  Downloading https://files.pythonhosted.org/packages/5f/e9/bac4204fe9cb1a002ec6140b47f51affda1655379fe302a1caef421f9846/torch-0.1.2.post1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\d97ha\appdata\local\temp\pip-install-v9jcwa\torch\setup.py", line 11, in <module>
        raise RuntimeError(README)
    RuntimeError: PyTorch does not currently provide packages for PyPI (see status at pytorch/pytorch#566).

    Please follow the instructions at http://pytorch.org/ to install with miniconda instead.


    ----------------------------------------
