jsilter / parametric_tsne
Python / Tensorflow / Keras implementation of Parametric tSNE algorithm
License: MIT License
I noticed that when I set epochs to 5000 (to match what I would use with scikit-learn's t-SNE) and have verbose set to 1, fit keeps running over and over again: when it reaches 5000/5000, it starts back at 1. Here are my parameters:
ptSNE = Parametric_tSNE(10, 2, 20,
alpha=1., do_pretrain=True, batch_size=128,
seed=54321)
My training data is of shape (3057, 10).
First of all, great work! :)
It's much needed for a bunch of reasons.
Second, the README example should include a call to "ptSNE.fit(...)"; as written now, it doesn't do anything.
Third, I tried to compare this implementation to the sklearn one on MNIST, and this one performs rather poorly for me. I tried going as high as 400 epochs (VERY slow, even on my GPUs), but saw little improvement.
Is this my error in usage?
Here is the code:
import numpy as np
np.random.seed(71)
import matplotlib.pyplot as plt
from keras.datasets import mnist
from sklearn import manifold
from parametric_tSNE import Parametric_tSNE
# Load MNIST and flatten each image into a vector
(X_train, y_train), (X_test, y_test) = mnist.load_data()
n, row, col = X_train.shape
channel = 1
X_train = X_train.reshape(-1, channel * row * col)
X_test = X_test.reshape(-1, channel * row * col)
# Scale pixel values to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# Subsample just to make this run faster; labels must match the data rows
X_train = X_train[:5000, :]
X_test = X_test[:500, :]
y_train = y_train[:5000]
y_test = y_test[:500]
print("X_train.shape:", X_train.shape)
print("X_test.shape:", X_test.shape)
high_dims = X_train.shape[1]
num_outputs = 2
perplexity = 20
# Fit parametric t-SNE on the training set, then embed both sets
ptSNE = Parametric_tSNE(high_dims, num_outputs, perplexity)
ptSNE.fit(X_train, verbose=1, epochs=400)
output_res = ptSNE.transform(X_train)
output_res2 = ptSNE.transform(X_test)
# Start a new figure per plot so the scatters do not overlap
plt.figure()
plt.scatter(output_res[:, 0], output_res[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result.png")
plt.figure()
plt.scatter(output_res2[:, 0], output_res2[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result2.png")
# Reference embedding from scikit-learn's (non-parametric) t-SNE
cdata_tsne0 = manifold.TSNE(n_components=2, init='random', random_state=0, perplexity=20,
                            learning_rate=300, n_iter=400)
print("Doing tSNE")
cdata_tsne_out0 = cdata_tsne0.fit_transform(X_train)
plt.figure()
plt.scatter(cdata_tsne_out0[:, 0], cdata_tsne_out0[:, 1], marker='o', s=4, edgecolor='none')
plt.savefig("tsne_train_result_orig.png")
# Display the saved figures (in a notebook, put each Image call in its own cell)
from IPython.display import Image
Image("tsne_train_result.png")
Image("tsne_train_result2.png")
Image("tsne_train_result_orig.png")
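A side note on the comparison above: judging two scatter plots by eye is subjective, so a numeric score may help. The sketch below is not part of parametric_tsne or scikit-learn; `knn_indices` and `knn_overlap` are hypothetical helper names, and the score (the fraction of each point's k nearest neighbors that survive the embedding) is just one possible choice. It builds a full n-by-n distance matrix, so a subsample of a few thousand points is assumed.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors (Euclidean) of every row."""
    points = np.asarray(points, dtype=float)
    sq_norms = np.sum(points ** 2, axis=1)
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def knn_overlap(high, low, k=10):
    """Mean fraction of k-nearest neighbors shared by the two spaces (0 to 1)."""
    hi_nn = knn_indices(high, k)
    lo_nn = knn_indices(low, k)
    shared = [len(set(h) & set(l)) for h, l in zip(hi_nn, lo_nn)]
    return float(np.mean(shared)) / k

# Synthetic stand-in data; with the script above one might instead pass
# (X_train, output_res) and (X_train, cdata_tsne_out0).
rng = np.random.RandomState(0)
demo_high = rng.rand(50, 5)
demo_low = demo_high[:, :2]  # crude "embedding": keep the first two columns
print(knn_overlap(demo_high, demo_low, k=5))
```

Scoring both embeddings against the same high-dimensional data with the same k puts the two methods on a common scale, although it says nothing about how well global structure is preserved.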
Great work!
Could you package this so that it is easier to use? Right now, from core import Parametric... doesn't work.
Is there a way to supply a custom distance metric other than Euclidean to the neighbor embedding? This would be very useful for feature vectors of varying sizes.
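For context on the request above: t-SNE derives its input similarities from pairwise distances between samples, so a custom metric would most naturally plug in at that step. The following is only a sketch of the idea, not the library's API. `pairwise_sq_dists` and `cosine_distance` are hypothetical names, whether parametric_tsne exposes such a hook is not confirmed here, and the sketch assumes fixed-length vectors (variable-length inputs would need a metric that accepts them and a list instead of an array).

```python
import numpy as np

def pairwise_sq_dists(data, metric=None):
    """Return an (n, n) matrix of squared pairwise distances.

    metric: optional callable (a, b) -> float (hypothetical hook);
    when omitted, plain Euclidean distance is used.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if metric is None:
        diffs = data[:, None, :] - data[None, :, :]
        return np.sum(diffs ** 2, axis=-1)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(data[i], data[j])
            out[i, j] = out[j, i] = d ** 2  # symmetric, zero diagonal
    return out

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
euclid_sq = pairwise_sq_dists(X_demo)
cosine_sq = pairwise_sq_dists(X_demo, metric=cosine_distance)
```

The resulting matrix would then stand in for the squared Euclidean distances when computing the conditional probabilities for a given perplexity.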
Thanks for publishing this code, excited to try it out. Running into an error with xrange:
Toy code:
import numpy as np
from parametric_tSNE import Parametric_tSNE
X = np.array([[1,0,1],
[0,1,0]
])
ptsne = Parametric_tSNE(3, 2, 5)
ptsne.fit(X)
Results in error:
parametric_tSNE/utils.py in calc_betas_loop(indata, perplexity, tol, max_tries)
132 in_sq_diffs = get_squared_cross_diff_np(indata)
133
--> 134 loop_samps = xrange(num_samps)
135 for ss in loop_samps:
136 betamin = -np.inf
NameError: name 'xrange' is not defined
IIUC, xrange in Py2 became range in Py3. Is your code intended for Py2 or 3?
Thanks
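A common workaround for this kind of NameError is a small compatibility shim that aliases xrange to range when running on Python 3. The snippet below is a generic sketch, not the fix the library itself adopted; `num_samps` is set to a stand-in value here only so the snippet runs on its own.

```python
# Python 3 removed xrange; its range is already lazy, so alias it.
try:
    xrange  # defined on Python 2 only
except NameError:
    xrange = range

num_samps = 3  # stand-in for the variable of the same name in the traceback
loop_samps = xrange(num_samps)
visited = [ss for ss in loop_samps]
print(visited)
```

Placed near the top of a module, the shim lets the same loop run under both Python versions without touching the loop body.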
Hello and thank you for making this library.
Do you think it could be possible for it to be published on PyPI?
If necessary I would gladly help prepare the package.
Best,
Luca
A comment (at http://www.jacobsilterra.com/2017/12/11/classifying-and-clustering-with-fasttext/#comment-13461) suggested use of multi-scale t-SNE or UMAP. Multi-scale tSNE in particular seems like it could fit in easily.
Hello. Thank you for this wonderful library. Would it be possible to reconstruct the data, i.e., to project the transformed data from the low-dimensional space back to the high-dimensional space?
Sincerely,
Asha