jsilter / parametric_tsne

Python / Tensorflow / Keras implementation of Parametric tSNE algorithm

License: MIT License

Python 100.00%

parametric_tsne's Issues

Epochs of 5000 causes fit to run endlessly

I noticed that when I set epochs to 5000 (to match what I would use in scikit-learn's t-SNE) with verbose set to 1, fit keeps running over and over: when it hits 5000/5000, it just starts back at 1. Here are my parameters:

ptSNE = Parametric_tSNE(10, 2, 20,
                        alpha=1., do_pretrain=True, batch_size=128,
                        seed=54321)

My training data is of shape (3057, 10).

The performance is VERY different to sklearn.manifold.TSNE

First of all, great work! :)
It's much needed for a bunch of reasons.

Second, the README should include a call to "ptSNE.fit(...)"; as written, the example doesn't do anything.

Third, I tried to compare this implementation to the sklearn one on MNIST, and this one performs rather poorly for me. I went up to 400 epochs (VERY slow, even on my GPUs), but saw little improvement.
Is this an error in my usage?

Here is the code:

import numpy as np
np.random.seed(71)

import matplotlib
from keras import backend as K
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD
from keras.callbacks import Callback
from keras.utils import np_utils
from keras.objectives import categorical_crossentropy
from keras.datasets import cifar10, mnist
from parametric_tSNE import Parametric_tSNE

import matplotlib.pyplot as plt

# mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
n, row, col = X_train.shape
channel = 1

X_train = X_train.reshape(-1, channel * row * col)
X_test = X_test.reshape(-1, channel * row * col)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# Subsample so this runs faster
X_train = X_train[:5000, :]
X_test = X_test[:500, :]
y_train = y_train[:5000]
y_test = y_test[:500]

print("X_train.shape:", X_train.shape)
print("X_test.shape:", X_test.shape)

high_dims = X_train.shape[1]
num_outputs = 2
perplexity = 20
ptSNE = Parametric_tSNE(high_dims, num_outputs, perplexity)
ptSNE.fit(X_train, verbose=1,epochs=400)
output_res = ptSNE.transform(X_train)
output_res2 = ptSNE.transform(X_test)

# Start a new figure for each plot so the saved images do not overlap
plt.figure()
plt.scatter(output_res[:, 0], output_res[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result.png")

plt.figure()
plt.scatter(output_res2[:, 0], output_res2[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result2.png")

from sklearn import manifold
cdata_tsne0 = manifold.TSNE(n_components=2, init='random', random_state=0, perplexity=20,
                            learning_rate=300, n_iter=400)
print("Doing tSNE")
cdata_tsne_out0 = cdata_tsne0.fit_transform(X_train)
plt.figure()
plt.scatter(cdata_tsne_out0[:, 0], cdata_tsne_out0[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result_orig.png")

from IPython.display import Image, display
display(Image("tsne_train_result.png"))
display(Image("tsne_train_result2.png"))
display(Image("tsne_train_result_orig.png"))
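
Scatter plots alone can be hard to compare. One rough, quantitative check is to measure how many of each point's nearest neighbors in the original space are still nearest neighbors in the embedding. The sketch below is a plain-Python illustration and not part of either library: `knn_indices` and `neighbor_preservation` are hypothetical names, and the brute-force search is only practical for small subsets of the data.

```python
import math


def knn_indices(points, k):
    """Return, for every point, the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        # Brute force: sort all other points by Euclidean distance to p
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        result.append(set(order[:k]))
    return result


def neighbor_preservation(high, low, k=5):
    """Mean fraction of high-dimensional k-NN that are also k-NN in the embedding."""
    high_nn = knn_indices(high, k)
    low_nn = knn_indices(low, k)
    overlaps = [len(h & l) / k for h, l in zip(high_nn, low_nn)]
    return sum(overlaps) / len(overlaps)
```

With the variables from the script above, a call such as neighbor_preservation(X_train[:500].tolist(), output_res[:500].tolist()) could be compared against the same call on cdata_tsne_out0[:500]; a score closer to 1 means more local structure was kept.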

Can you Package This

Great work!

Could you package this so that it is easier to use? Right now, from core import Parametric... doesn't work.

Supply custom metric to the neighbor embedding?

Is there a way to supply a custom distance metric other than Euclidean to the neighbor embedding? This would be very useful for feature vectors of varying sizes.
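
For context, t-SNE only needs a matrix of pairwise dissimilarities to build its neighbor probabilities, so in principle any metric can be substituted at that step. The sketch below is a standalone illustration and not the library's API: `pairwise_sq_distances`, `conditional_probs`, and `length_aware_metric` are hypothetical names, and a fixed `beta` stands in for the per-point precision that a perplexity search would normally choose.

```python
import math


def pairwise_sq_distances(items, metric):
    """Squared dissimilarity matrix under a user-supplied metric (hypothetical helper)."""
    n = len(items)
    dists = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(items[i], items[j]) ** 2
            dists[i][j] = dists[j][i] = d
    return dists


def conditional_probs(sq_dists, beta=1.0):
    """Row-normalised Gaussian affinities p(j|i); beta is a fixed precision here."""
    probs = []
    for i, row in enumerate(sq_dists):
        # A point is never its own neighbour, so the diagonal weight is zero
        weights = [0.0 if j == i else math.exp(-beta * d) for j, d in enumerate(row)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def length_aware_metric(a, b):
    """Example metric for vectors of different lengths: zero-pad the shorter one."""
    size = max(len(a), len(b))
    a = list(a) + [0.0] * (size - len(a))
    b = list(b) + [0.0] * (size - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Three feature vectors of different sizes
items = [[1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 2.0, 0.0, 1.0]]
P = conditional_probs(pairwise_sq_distances(items, length_aware_metric))
```

Zero-padding is only one possible choice; any function that returns a non-negative dissimilarity for a pair of items could be passed as `metric`.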

xrange called, which errors in Python 3

Thanks for publishing this code, excited to try it out. Running into an error with xrange:

Toy code:

import numpy as np
from parametric_tSNE import Parametric_tSNE

X = np.array([[1,0,1],
          [0,1,0]
         ])

ptsne = Parametric_tSNE(3, 2, 5)
ptsne.fit(X)

Results in error:

parametric_tSNE/utils.py in calc_betas_loop(indata, perplexity, tol, max_tries)
132     in_sq_diffs = get_squared_cross_diff_np(indata)
133  
--> 134     loop_samps = xrange(num_samps)
135     for ss in loop_samps:
136         betamin = -np.inf

NameError: name 'xrange' is not defined

IIUC, xrange in Py2 became range in Py3. Is your code intended for Py2 or 3?

Thanks
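
A common way to let code that calls xrange run on both Python 2 and Python 3 is a small compatibility alias near the top of the module. The sketch below is illustrative only and is not taken from the repository; the helper `count_samples` is a hypothetical stand-in for the loop shown in the traceback.

```python
# Compatibility shim: Python 3 has no xrange, so alias it to range,
# which is already lazy there.
try:
    xrange  # Python 2: the builtin exists and is left alone
except NameError:
    xrange = range  # Python 3: fall back to range


def count_samples(num_samps):
    """Hypothetical helper that iterates the way the failing loop does."""
    loop_samps = xrange(num_samps)
    return [ss for ss in loop_samps]
```

Replacing xrange with range directly also works on Python 3, at the cost of building a full list on Python 2.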

Publish this on PyPI for easier use

Hello and thank you for making this library.

Do you think it could be possible for it to be published on PyPI?
If necessary I would gladly help prepare the package.

Best,
Luca

Implement multiscale tSNE

A comment (at http://www.jacobsilterra.com/2017/12/11/classifying-and-clustering-with-fasttext/#comment-13461) suggested use of multi-scale t-SNE or UMAP. Multi-scale tSNE in particular seems like it could fit in easily.

Reconstruction from the compressed data

Hello. Thank you for this wonderful library. Would it be possible to reconstruct the data, i.e., to project the transformed data from the low-dimensional space back to the high-dimensional space?

Sincerely,
Asha
