lferry007 / largevis

License: Apache License 2.0

Python 3.82% C++ 90.59% C 5.49% Shell 0.10%

largevis's Issues

typo in LargeVis.cpp

Line 38 is this:

neg_table NULL;

It needs an equals sign or it won't compile.

Any recommendations to run this distributed (Spark) ?

Hi,

If I have a lot of data on HDFS or an object store and want to leverage Spark,
what would be the best approach?

Any examples/suggestions would be appreciated.

Thanks,
Rajesh

Dimensionality reduction clumps all but one point together

I'm able to reproduce the MNIST reduction as in the description, but the output of the program for my own data is nonsense. Any leads appreciated. Running on OS X.

The output of LargeVis is one outlier point, with all other points in a very tightly-clumped diagonal line: first 10 lines of the file are here. All the remaining points are right next to the last 8 points here.

10000 2
-57.236931 0.250471
1.739228 0.140431
1.739322 0.140431
1.739219 0.140431
1.739558 0.140430
1.739269 0.140430
1.739119 0.140431
1.739207 0.140431
1.739546 0.140430

Perhaps I'm misunderstanding some particularity of the input data format? Here's what a sample of it looks like: the format appears to me to be the same as the MNIST file, except that I have negative numbers. The full test set (20MB) is here.

➜  Linux git:(master) ✗ head ../as_text.txt| cut -c 1-140                            
10000 640
0.068507 0.088455 0.004352 0.062336 -0.008105 -0.065166 0.005332 -0.004465 0.009418 0.053710 0.021793 0.002761 -0.045826 0.047004 -0.021048 
0.030815 0.061551 0.055325 0.014904 0.009537 0.003453 -0.041773 0.070575 0.004215 0.034589 0.026759 0.009715 -0.037361 0.003642 -0.062977 -0
0.044672 0.028437 0.024890 -0.025580 -0.002071 -0.013081 -0.038324 0.007230 0.024878 -0.006843 -0.022699 -0.018267 -0.048828 0.053914 -0.038
-0.003441 0.067980 0.047075 -0.006172 -0.017513 0.022899 0.013291 0.032307 -0.071118 -0.007152 0.019992 -0.019428 -0.069072 0.058524 0.01285
-0.010718 -0.002089 -0.008822 -0.035114 -0.066692 0.038011 -0.019087 0.011121 -0.029621 -0.024403 -0.052654 0.047402 0.006711 -0.064290 -0.0
0.052364 -0.007353 0.006950 0.039280 0.018387 -0.083283 -0.038789 0.022860 -0.029142 0.029422 0.011834 0.073171 -0.025516 0.064107 -0.001747
-0.029875 0.070031 -0.011460 -0.003957 0.025676 0.002881 0.041085 0.009806 0.015105 -0.051295 -0.029721 -0.003456 -0.072049 0.012853 0.05745
0.007060 0.103973 0.024584 0.031729 -0.031754 -0.024805 0.051161 0.042864 -0.021417 0.027601 0.017241 -0.017261 -0.043754 0.008115 -0.017126
-0.055455 -0.063698 0.063268 0.012776 0.005479 -0.033595 -0.063750 0.038983 -0.025671 -0.002447 0.044772 -0.005042 -0.047169 0.030342 0.0006
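The sample above suggests the feature-file layout: a header line with the number of points and the dimension, then one whitespace-separated row per point. Below is a minimal Python sketch for writing and sanity-checking a file under that assumed layout. The helper names `write_feature_file` and `check_feature_file` are hypothetical and not part of LargeVis; the check only catches formatting problems (wrong row length, wrong row count, NaN or inf), not the clumping itself.

```python
import math


def write_feature_file(rows, path):
    """Write rows (a list of equal-length float lists) as an 'N D' header plus N lines."""
    n, d = len(rows), len(rows[0])
    if any(len(row) != d for row in rows):
        raise ValueError("all rows must have the same dimension")
    with open(path, "w") as f:
        f.write(f"{n} {d}\n")
        for row in rows:
            f.write(" ".join(f"{x:.6f}" for x in row) + "\n")


def check_feature_file(path):
    """Return (n, d) if the header matches the body and every value is finite."""
    with open(path) as f:
        n, d = map(int, f.readline().split())
        count = 0
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            values = [float(tok) for tok in line.split()]
            if len(values) != d:
                raise ValueError(f"row {count} has {len(values)} values, expected {d}")
            if not all(math.isfinite(v) for v in values):
                raise ValueError(f"row {count} contains NaN or inf")
            count += 1
    if count != n:
        raise ValueError(f"header says {n} rows, file has {count}")
    return n, d
```

Running `check_feature_file("../as_text.txt")` on the file from this report would be expected to return `(10000, 640)` if the header and body agree.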

Computation on CondMat_network example takes too much time

With 4 threads and alpha = 4, the run takes 80 minutes. My machine has two 2 GHz cores and 16 GB of RAM. The process used 428 MB of memory, and CPU utilization stayed between 90% and 100% the whole time.

Has anyone else had this problem? Or any insights?

Windows install error: command '…\Microsoft Visual Studio 10.0\\VC\\BIN\\cl.exe' failed with exit status 2

I've got problems installing LargeVis in Windows:
D:\largeVis\LargeVis-master\Windows>python setup.py install
running install
running build
running build_ext
building 'LargeVis' extension
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -ID:\boost_1_58_0 -IC:\Users\sbt-zhidkov-yi\AppData\Local\Continuum\Anaconda2\include -IC:\Users\sbt-zhidkov-yi\AppData\Local\Continuum\Anaconda2\PC /TpLargeVis.cpp /Fobuild\temp.win-amd64-2.7\Release\LargeVis.obj /Ox
LargeVis.cpp
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\xlocale(323) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
LargeVis.cpp(43) : warning C4244: '=' : conversion from 'double' to '__int64', possible loss of data
LargeVis.cpp(63) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(234) : see declaration of 'fopen'
LargeVis.cpp(70) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(253) : see declaration of 'fscanf'
LargeVis.cpp(76) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(253) : see declaration of 'fscanf'
LargeVis.cpp(102) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(234) : see declaration of 'fopen'
LargeVis.cpp(109) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(253) : see declaration of 'fscanf'
LargeVis.cpp(115) : warning C4244: 'argument' : conversion from '__int64' to 'const int', possible loss of data
LargeVis.cpp(116) : warning C4244: 'argument' : conversion from '__int64' to 'const int', possible loss of data
LargeVis.cpp(142) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(234) : see declaration of 'fopen'
LargeVis.cpp(266) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(272) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(273) : warning C4018: '<' : signed/unsigned mismatch
LargeVis.cpp(286) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(288) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(289) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(313) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(319) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(329) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(432) : warning C4018: '<' : signed/unsigned mismatch
LargeVis.cpp(485) : warning C4244: '=' : conversion from 'double' to '__int64', possible loss of data
LargeVis.cpp(495) : warning C4018: '<' : signed/unsigned mismatch
LargeVis.cpp(511) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h(234) : see declaration of 'fopen'
LargeVis.cpp(531) : error C2666: 'pow' : 6 overloads have similar conversions
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(583): could be 'long double pow(long double,int)'
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(581): or 'long double pow(long double,long double)'
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(535): or 'float pow(float,int)'
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(533): or 'float pow(float,float)'
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(497): or 'double pow(double,int)'
        C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\math.h(122): or 'double pow(double,double)'
        while trying to match the argument list '(real, double)'
LargeVis.cpp(539) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data
LargeVis.cpp(563) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(564) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(568) : warning C4244: 'argument' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(568) : warning C4244: 'argument' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(583) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(608) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(627) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(634) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
LargeVis.cpp(635) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 10.0\\VC\\BIN\\amd64\\cl.exe' failed with exit status 2

I'm using MS Visual Studio 2010 and boost v1.58.0.
The system is Windows 7 x64.
There were no modifications to setup.py.
boost v1.58.0 is extracted into "D:\boost_1_58_0" (the same path as in setup.py).

Largevis doesn't read the complete edgelist (Run using cpp)

Segmentation fault (core dumped)

Running the network example I get the following error:

Total vertices : 7564   Total edges : 102372
Fitting model   Alpha: 0.989569 Progress: 1.043%Segmentation fault (core dumped)

Running it several times it manages to progress from around 1% to around 4% but never beyond. Any insights?

running build_ext error: [WinError 2]

Hi! When I run 'python setup.py install' to install LargeVis on Windows 10,
I get:
running install
running build
running build_ext
error: [WinError 2]
I have searched for a long time but can't find a solution. Could you help me? Many thanks!

Why is the number of nodes in 2D vectors reduced?

Is the node order of the original features consistent with the generated 2-D vectors?

Hi, there is an issue that puzzles me. In LargeVis, -fea specifies whether the input file contains high-dimensional feature vectors (1) or a network (0); the default is 1. I have a file of feature vectors, and when I run LargeVis on it I get a set of 2-D vectors. I want to know whether the order of the nodes in the original feature file is consistent with the order of the generated 2-D vectors. I tried reading the source code but could not find the answer. Thank you.

Speed Improvements

Hey guys,
Thanks for the great implementation, has really helped visualise data that I'm working on.
However, I can see two ways in which the implementation could be improved.
(1) NMSLib can compute K-NN trees 10 times as quickly as Annoy, and allows for 10 times as many queries per second.
(2) When computing the objective of the model, you could use a GPU library like pytorch and batch compute. This might speed up the calculations by a big factor if you can allow for large batches.

I'd be willing to work on this as a project, if you guys are up for it. I'm not sure about the optimisation tricks you've used in (2) training the objective, so would be less likely to try to implement by myself.

Thanks,
Max

cosine distance support

I think it would be great to have support for cosine distance as in largeVis R package:
https://cran.r-project.org/web/packages/largeVis/vignettes/largeVis.html

Duplicates?

How are you handling duplicates?

There are cases - which come up surprisingly often with embeddings - where a point will have a number of duplicate points > K.

It seems incorrect to simply leave a point as a nearest neighbor of only duplicate points, but not a nearest neighbor of other duplicates by random chance.
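To see whether a dataset actually hits this case, one option is to count exact duplicates before building the neighbor graph. The following is a small Python sketch under that assumption; `duplicate_groups` is a hypothetical helper, it only detects bit-identical rows, and it says nothing about how LargeVis itself treats them.

```python
from collections import Counter


def duplicate_groups(rows, k):
    """Return {row: count} for rows that occur more than k + 1 times.

    A point with more than k exact copies can fill its whole
    k-nearest-neighbor list with duplicates of itself.
    """
    counts = Counter(tuple(row) for row in rows)
    # count - 1 is the number of *other* points identical to this one
    return {row: count for row, count in counts.items() if count - 1 > k}


if __name__ == "__main__":
    rows = [[0.0, 0.0]] * 5 + [[1.0, 1.0]]
    print(duplicate_groups(rows, k=3))
```

If the returned dictionary is non-empty, deduplicating the input (and mapping the copies back onto the same 2-D position afterwards) is one way to sidestep the question.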

Lonely points

Hi,
I'm trying to use LargeVis to visualize my doc2vec features of 20NG (7532 test documents, 100 features each). I'm using all the default parameters, and I get the following result.

I was surprised by the lonely points in the data, as their corresponding documents were not noticeably different from the others in their category. I tried running the algorithm after taking these documents out of the dataset, but got a similar pattern of results: a few (roughly 6-9) lonely points representing seemingly normal documents. I previously modeled this data using various t-SNE methods and none showed such a pattern. I am wondering if there is a simple explanation or something I am overlooking?

Also, plot.py only works for me if I change vec[1], vec[2] to vec[0], vec[1] on line 29.
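The index change suggests that output rows may or may not start with an id column, depending on the input type. Below is a Python sketch of a reader that accepts both layouts. It assumes the output starts with an "N D" header (as in the output sample quoted in an earlier issue) and that an id column, when present, is a single leading token; that second point is inferred from the plot.py indices and is not confirmed in this thread. `read_embedding` is a hypothetical helper name.

```python
def read_embedding(lines):
    """Parse LargeVis-style output: header 'N D', then N rows.

    Each row holds either D coordinates, or an id followed by D coordinates.
    Returns (ids, points); ids are row indices (as strings) when no id
    column is present.
    """
    it = iter(lines)
    n, d = map(int, next(it).split())
    ids, points = [], []
    for i, line in enumerate(l for l in it if l.strip()):
        tokens = line.split()
        if len(tokens) == d + 1:      # id column present
            ids.append(tokens[0])
            coords = tokens[1:]
        elif len(tokens) == d:        # coordinates only
            ids.append(str(i))
            coords = tokens
        else:
            raise ValueError(f"row {i}: expected {d} or {d + 1} fields")
        points.append(tuple(float(c) for c in coords))
    if len(points) != n:
        raise ValueError(f"header says {n} rows, found {len(points)}")
    return ids, points
```

A plotting script can then use `points[i][0]` and `points[i][1]` regardless of which layout the file has, for example via `with open(path) as f: ids, points = read_embedding(f)`.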

Thanks in advance
Shani

mac unable to install

The command-line output indicates that gcc could not compile the .cpp file with #include:
warning: include path for stdlibc++ headers not found; pass '-std=libc++' on the command line to use the libc++ standard library instead [-Wstdlibcxx-not-found]

What's the complexity of LargeVis with respect to the dimension of the data points?

e.g. I have 10k data points each with 40k dimension.

How to import LargeVis? Running LargeVis_run.py raises ModuleNotFoundError: No module named LargeVis
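A ModuleNotFoundError usually means the extension was never built, or was installed for a different interpreter than the one running the script. As a quick diagnostic, the Python sketch below reports whether the interpreter in use can locate a module by name. It uses only the standard library; `module_status` is a hypothetical helper, and the sketch assumes the module is expected to come from running `python setup.py install` as in the other reports on this page.

```python
import importlib.util
import sys


def module_status(name):
    """Report whether `name` is importable by the interpreter running this script."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name} not found for {sys.executable}"
    return f"{name} found at {spec.origin}"


print(module_status("LargeVis"))
```

If the module is reported as not found, checking that `python setup.py install` was run with the same `python` executable printed here is a reasonable first step.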

Installation failed on Windows 10 (Python 2.7 and 3.6)

Hi there. I'm really new to computer science. I'm a humanities student preparing part of my thesis in computational linguistics, and I need LargeVis to visualize my data (vectors created with the doc2vec algorithm).

I'm on Windows and have installed boost 1.58.0, the Python compiler and Visual Studio. I cannot install the module, and I'm surely making some beginner mistakes. The boost path is set correctly and I'm using the VS 2017 command prompt. Am I missing something? Do I need to do some preliminary steps? Maybe I'm missing something that the documentation presupposes. If someone could help me I would be grateful.

Here is my output:

c:\Users\mrleo\Downloads\LargeVis-master\Windows>python setup.py install running install running build running build_ext building 'LargeVis' extension C:\Users\mrleo\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG "-IC:\Program Files\boost_1_58_0" -IC:\Python27\include -IC:\Python27\PC /TpLargeVis.cpp /Fobuild\temp.win-amd64-2.7\Release\LargeVis.obj /Ox LargeVis.cpp C:\Python27\include\xlocale(342) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc C:\Program Files\boost_1_58_0\boost/random/detail/polynomial.hpp(114) : warning C4996: 'std::fill_n': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators' C:\Python27\include\xutility(3283) : see declaration of 'std::fill_n' C:\Program Files\boost_1_58_0\boost/random/detail/polynomial.hpp(256) : warning C4996: 'std::fill_n': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. See documentation on how to use Visual C++ 'Checked Iterators' C:\Python27\include\xutility(3283) : see declaration of 'std::fill_n' C:\Program Files\boost_1_58_0\boost/random/detail/polynomial.hpp(264) : warning C4996: 'std::fill_n': Function call with parameters that may be unsafe - this call relies on the caller to check that the passed values are correct. To disable this warning, use -D_SCL_SECURE_NO_WARNINGS. 
See documentation on how to use Visual C++ 'Checked Iterators' C:\Python27\include\xutility(3283) : see declaration of 'std::fill_n' c:\users\mrleo\downloads\largevis-master\windows\annoy\stdint.h(241) : warning C4005: 'INTMAX_C' : macro redefinition C:\Program Files\boost_1_58_0\boost/cstdint.hpp(460) : see previous definition of 'INTMAX_C' c:\users\mrleo\downloads\largevis-master\windows\annoy\stdint.h(242) : warning C4005: 'UINTMAX_C' : macro redefinition C:\Program Files\boost_1_58_0\boost/cstdint.hpp(461) : see previous definition of 'UINTMAX_C' LargeVis.cpp(43) : warning C4244: '=' : conversion from 'double' to '__int64', possible loss of data LargeVis.cpp(63) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. C:\Python27\include\stdio.h(237) : see declaration of 'fopen' LargeVis.cpp(70) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. C:\Python27\include\stdio.h(256) : see declaration of 'fscanf' LargeVis.cpp(71) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(76) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. C:\Python27\include\stdio.h(256) : see declaration of 'fscanf' LargeVis.cpp(102) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. C:\Python27\include\stdio.h(237) : see declaration of 'fopen' LargeVis.cpp(109) : warning C4996: 'fscanf': This function or variable may be unsafe. Consider using fscanf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. 
See online help for details. C:\Python27\include\stdio.h(256) : see declaration of 'fscanf' LargeVis.cpp(115) : warning C4244: 'argument' : conversion from '__int64' to 'const int', possible loss of data LargeVis.cpp(116) : warning C4244: 'argument' : conversion from '__int64' to 'const int', possible loss of data LargeVis.cpp(129) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(133) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(134) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(142) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. C:\Python27\include\stdio.h(237) : see declaration of 'fopen' LargeVis.cpp(146) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(175) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(210) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(211) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(213) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(214) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(215) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(221) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(222) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data 
LargeVis.cpp(266) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(273) : warning C4244: 'argument' : conversion from '__int64' to 'size_t', possible loss of data LargeVis.cpp(273) : warning C4244: 'argument' : conversion from '__int64' to 'size_t', possible loss of data LargeVis.cpp(273) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(275) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(277) : warning C4244: 'argument' : conversion from '__int64' to '__w64 int', possible loss of data LargeVis.cpp(287) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(289) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(290) : warning C4244: 'argument' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(292) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(294) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(308) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(314) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(319) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(320) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(326) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(328) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(330) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(351) : 
warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(352) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(375) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(377) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(378) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(393) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(395) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(395) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(397) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(399) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(411) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(413) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(414) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(416) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(418) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(427) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(436) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(445) : warning C4244: 'initializing' : 
conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(450) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(457) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(459) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(460) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(468) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(471) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(471) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(472) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(472) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(472) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(472) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(486) : warning C4244: '=' : conversion from 'double' to '__int64', possible loss of data LargeVis.cpp(496) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(512) : warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. 
C:\Python27\include\stdio.h(237) : see declaration of 'fopen' LargeVis.cpp(515) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(515) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(515) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(524) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(528) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(530) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(532) : error C2666: 'pow' : 6 overloads have similar conversions C:\Python27\include\math.h(575): could be 'long double pow(long double,int)' C:\Python27\include\math.h(573): or 'long double pow(long double,long double)' C:\Python27\include\math.h(527): or 'float pow(float,int)' C:\Python27\include\math.h(525): or 'float pow(float,float)' C:\Python27\include\math.h(489): or 'double pow(double,int)' C:\Python27\include\math.h(123): or 'double pow(double,double)' while trying to match the argument list '(real, double)' LargeVis.cpp(536) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(540) : warning C4244: '=' : conversion from '__int64' to 'int', possible loss of data LargeVis.cpp(554) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(555) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(564) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(565) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(569) : warning C4244: 'argument' : 
conversion from 'double' to 'real', possible loss of data LargeVis.cpp(569) : warning C4244: 'argument' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(570) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(571) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(579) : warning C4244: 'argument' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(584) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(608) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(609) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(612) : warning C4244: 'initializing' : conversion from '__int64' to 'unsigned int', possible loss of data LargeVis.cpp(628) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(635) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data LargeVis.cpp(636) : warning C4244: '=' : conversion from 'double' to 'real', possible loss of data error: command '"C:\Users\mrleo\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\bin\cl.exe"' failed with exit status 2

Adding points

Have you folks looked at, or made any progress toward, adding new points to an existing visualization?

MacOS install error

Hello Team,

I get this error when compiling on macOS:

clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -I/usr/local/opt/openjdk/include -I/usr/local/include -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c LargeVismodule.cpp -o build/temp.macosx-12-x86_64-cpython-39/LargeVismodule.o "-lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math"
clang-13: warning: -lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math: 'linker' input unused [-Wunused-command-line-argument]
LargeVismodule.cpp:97:18: error: use of undeclared identifier 'PyString_AsString'
real x = atof(PyString_AsString(PyObject_Str(PyList_GetItem(vec, j))));
^
LargeVismodule.cpp:130:2: error: use of undeclared identifier 'Py_InitModule'
Py_InitModule("LargeVis", PyExtMethods);
^
2 errors generated.
error: command '/usr/local/opt/llvm/bin/clang' failed with exit code 1

Thanks,

Jianshu

How to Keep Consistence in Asynchronous Stochastic Gradient Descent

Hi, the paper says that asynchronous updates seldom conflict. However, when I add a polynomial kernel function to the calculation of the probability of observing an edge in the low-dimensional space, the multithreaded run gives incorrect results, while a single thread works well.

Do you have suggestions for keeping the parameters consistent? Currently I am thinking of imitating a parameter server to keep the data consistent.

To further accelerate the training process, we adopt the asynchronous stochastic gradient descent, which is very efficient and effective on sparse graphs [19]. The reason is that when different threads sample different edges for model updating, as the graph is very sparse, the vertices of the sampled edges in different threads seldom overlap
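The sparsity argument in the quoted passage can be checked empirically with a small simulation. The Python sketch below draws one edge per simulated thread and counts how often two of the drawn edges touch the same vertex. It is only an illustration on a uniformly random graph, not the LargeVis sampling code (which, per the paper, samples edges by weight), and the function names are hypothetical. It does not explain why a modified kernel would break under multithreading; it only shows how rare overlapping updates are when the graph is sparse.

```python
import random


def random_sparse_graph(n_vertices, n_edges, seed=0):
    """Generate n_edges random edges (u, v) with u != v."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n_vertices), 2)) for _ in range(n_edges)]


def overlap_rate(edges, n_threads, n_rounds, seed=0):
    """Estimate how often concurrently sampled edges share a vertex.

    Each round draws one edge per thread; a round counts as a collision
    if any vertex appears in more than one of the drawn edges.
    """
    rng = random.Random(seed)
    collisions = 0
    for _ in range(n_rounds):
        seen = set()
        clash = False
        for u, v in (rng.choice(edges) for _ in range(n_threads)):
            if u in seen or v in seen:
                clash = True
            seen.update((u, v))
        collisions += clash
    return collisions / n_rounds


if __name__ == "__main__":
    sparse = random_sparse_graph(10_000, 50_000)
    print("sparse graph:", overlap_rate(sparse, n_threads=4, n_rounds=2_000))
    tiny = random_sparse_graph(5, 20)
    print("tiny graph:  ", overlap_rate(tiny, n_threads=4, n_rounds=2_000))
```

On the tiny five-vertex graph every round collides, since four edges cannot fit on five vertices without sharing one. That is the regime where unsynchronized updates are most likely to interfere.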

Is this the official reference implementation?

Hi,

Can I get some background on the author(s) and status of lferry007/LargeVis? Is this "THE" reference implementation of LargeVis (written or backed by the authors of the paper), or is it just another implementation?

As far as I know, the code related to the paper has not been published (or is this it?). If this is not it, is the original reference implementation available, and if so, what does this implementation add?

I want to use LargeVis/tSNE in my master's thesis with this implementation, but that requires a bit more information about its background.

Tx,

Jos

KNN-graph as file

It would be great to be able to export the KNN graph as a file (perhaps as a LargeVis option) in a format supported by Gephi.
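As a rough illustration of what such an export could look like, the sketch below writes a neighbor list as a CSV edge list with `Source,Target,Weight` columns, one of the spreadsheet layouts Gephi can import. `KnnGraph` and `write_edge_list_csv` are hypothetical names, and the in-memory layout is an assumption made for the example, not the structure LargeVis uses internally.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Assumed layout: neighbors[i] holds (neighbor id, weight) pairs for vertex i.
using KnnGraph = std::vector<std::vector<std::pair<long long, float>>>;

// Write one directed edge per line under a Source,Target,Weight header.
void write_edge_list_csv(const KnnGraph &neighbors, std::ostream &out) {
    out << "Source,Target,Weight\n";
    for (std::size_t x = 0; x < neighbors.size(); ++x) {
        for (const auto &nb : neighbors[x]) {
            out << x << ',' << nb.first << ',' << nb.second << '\n';
        }
    }
}
```

Passing a `std::ofstream` instead of a string stream would write the file directly.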
https://gephi.org/users/supported-graph-formats/

LargeVis.cpp:351:48: warning: format specifies type 'int' but the argument has type 'long long'

Environment: macOS Sierra v.10.12

So after modifying line 347 of annoylib.h to change lseek64 to lseek, I compile the source file (in the Linux folder) via:

g++ LargeVis.cpp main.cpp -o LargeVis -lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math -L/usr/local/lib -I/usr/local/include

But I got this warning:

LargeVis.cpp:351:48: warning: format specifies type 'int' but the argument has type 'long long' [-Wformat]
                printf("Running propagation %d/%d%c", i + 1, n_propagations, 13);
                                               ~~            ^~~~~~~~~~~~~~
                                               %lld
1 warning generated.
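The warning goes away once the format specifier matches the argument type, as the compiler's own `%lld` hint suggests. A minimal sketch, assuming `n_propagations` is a `long long` (as the warning states) and the loop index is an `int`; `propagation_progress` is a hypothetical helper that formats into a string so the result is easy to inspect:

```cpp
#include <cstdio>
#include <string>

// Format the progress line with %lld for the long long argument.
// The trailing character 13 is a carriage return, as in the original call.
std::string propagation_progress(int i, long long n_propagations) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "Running propagation %d/%lld%c",
                  i + 1, n_propagations, 13);
    return std::string(buf);
}
```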

Documentation & Acknowledgment

Hi,

First, thank you for sharing this implementation of LargeVis. I like the idea of simple code doing a complex computation. That said, it would be great if you could add documentation to the source. This would make it easier to improve or extend the code.

Also, I can see that some of the methods have been taken, with minor C++ modifications, from a previous work of the original authors of LargeVis: https://github.com/tangjianpku/LINE. In my opinion, you should acknowledge that in the documentation as well.

Regards

Only installs on Python 2

The documentation doesn't indicate this, but the LargeVis Python wrapper seems to be written as an extension module for Python 2 only. If the active environment is Python 3, then python setup.py install will fail with this error:

LargeVismodule.cpp: In function ‘PyObject* initLargeVis()’:
LargeVismodule.cpp:130:40: error: ‘Py_InitModule’ was not declared in this scope
  Py_InitModule("LargeVis", PyExtMethods);

Segmentation fault

When I run LargeVis on my network, I get the following error.

Total vertices : 639 Dimension : 256
Normalizing ...... Done.
Running ANNOY ...... Done.
Running propagation 3/3
Test knn accuracy : 53.93%
Computing similarities ...... Done.
Fitting model Alpha: 0.905441 Progress: 9.456%Segmentation fault

I ran it several times, and this is the furthest it got. I don't know what causes the segmentation fault. Could you help me solve this?

Thanks in advance

Potential bug in LargeVis::search_reverse_thread

I believe there is a small bug in the search_reverse_thread call - I am not sure what the correct fix is, but the current version does not really make sense to me:

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[x]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`

In the outer edge loop, y is assigned the index of a connected node, but y is then never used. Since this is supposed to be a backwards search, my guess is that the fix is to replace head[x] with head[y] in the inner loop, so that the search walks the neighbor list of y until it finds the edge pointing back to x.

`void LargeVis::search_reverse_thread(int id)
{
    long long lo = id * n_vertices / n_threads;
    long long hi = (id + 1) * n_vertices / n_threads;
    long long x, y, p, q;
    for (x = lo; x < hi; ++x)
    {
        for (p = head[x]; p >= 0; p = next[p])
        {
            y = edge_to[p];
            for (q = head[y]; q >= 0; q = next[q])
            {
                if (edge_to[q] == x) break;
            }
            reverse[p] = q;
        }
    }
}`
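To check the proposed fix in isolation, the lookup can be reproduced on a tiny graph. The sketch below is a single-threaded, self-contained illustration: the `Graph` struct, `add_edge`, and `find_reverse_edges` are hypothetical names, and the array layout is only assumed to match what `head`, `next`, and `edge_to` mean in the snippet above.

```cpp
#include <vector>

// Linked adjacency list: head[x] is the first edge of x, next[p] the following
// edge with the same source, edge_to[p] the target of edge p; -1 ends a list.
struct Graph {
    std::vector<long long> head, next, edge_to;

    explicit Graph(long long n_vertices) : head(n_vertices, -1) {}

    void add_edge(long long x, long long y) {
        edge_to.push_back(y);
        next.push_back(head[x]);
        head[x] = static_cast<long long>(edge_to.size()) - 1;
    }
};

// For every edge p = (x -> y), find the edge q = (y -> x) by scanning the
// neighbor list of y, as the proposed fix does. reverse[p] is -1 if none exists.
std::vector<long long> find_reverse_edges(const Graph &g) {
    std::vector<long long> reverse(g.edge_to.size(), -1);
    const long long n = static_cast<long long>(g.head.size());
    for (long long x = 0; x < n; ++x) {
        for (long long p = g.head[x]; p >= 0; p = g.next[p]) {
            long long y = g.edge_to[p];
            long long q = g.head[y];
            while (q >= 0 && g.edge_to[q] != x) q = g.next[q];
            reverse[p] = q;
        }
    }
    return reverse;
}
```

With edges 0→1, 1→0, and 0→2 added in that order, the first two edges point at each other and the third has no reverse, which is what scanning `head[y]` should produce.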

Installation on Mac

Does it need any specific installation steps on Mac?
I am getting this error while compiling on Mac (El Capitan):

~/codes/LargeVis/Linux$ g++ -I/usr/local/include/ LargeVis.cpp main.cpp -o LargeVis -lm -pthread -lgsl -lgslcblas -Ofast -march=native -ffast-math
In file included from LargeVis.cpp:1:
In file included from ./LargeVis.h:10:
./ANNOY/annoylib.h:347:22: error: use of undeclared identifier 'lseek64'; did you mean 'lseek'?
    long long size = lseek64(fd, 0, SEEK_END);
                     ^~~~~~~
                     lseek
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/unistd.h:464:8: note: 'lseek' declared here
off_t    lseek(int, off_t, int);
         ^
LargeVis.cpp:350:48: warning: format specifies type 'int' but the argument has type 'long long' [-Wformat]
                printf("Running propagation %d/%d%c", i + 1, n_propagations, 13);
                                               ~~            ^~~~~~~~~~~~~~
                                               %lld
1 warning and 1 error generated.
In file included from main.cpp:1:
In file included from ./LargeVis.h:10:
./ANNOY/annoylib.h:347:22: error: use of undeclared identifier 'lseek64'; did you mean 'lseek'?
    long long size = lseek64(fd, 0, SEEK_END);
                     ^~~~~~~
                     lseek
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/unistd.h:464:8: note: 'lseek' declared here
off_t    lseek(int, off_t, int);
         ^
1 error generated.
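The compiler's suggestion points at the usual workaround: macOS does not provide `lseek64`, but its `lseek` already takes a 64-bit `off_t`. A minimal sketch of a portable size query follows; `file_size` is a hypothetical helper, not a function in annoylib.h, and on 32-bit Linux it assumes the build defines `_FILE_OFFSET_BITS=64` so that `off_t` is 64 bits wide.

```cpp
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Return the size of an open file by seeking to its end.
// Plain lseek is used instead of lseek64, which macOS does not declare.
long long file_size(int fd) {
    off_t end = lseek(fd, 0, SEEK_END);
    return static_cast<long long>(end);
}
```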

Option to specify random seed

Would be nice for repeatability to have the option to set the random seed as a parameter at runtime. I've looked through the code base, but got a little confused with tracking all the different spots the RNG gets called, and I'm not sure where it should be initialized. Thoughts?
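One possible shape for this, sketched with the C++ standard library rather than the generators the code base currently uses, is to derive one generator per worker thread from a single user-supplied seed. `make_thread_rng` and `draw` are hypothetical names. Even with fixed seeds, lock-free multithreaded updates could still make runs differ, so exact repeatability would probably also require running with a single thread.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: build a per-thread generator from a user seed and the
// thread id, so every thread gets its own reproducible stream.
std::mt19937_64 make_thread_rng(std::uint64_t seed, int thread_id) {
    std::seed_seq seq{static_cast<std::uint32_t>(seed),
                      static_cast<std::uint32_t>(seed >> 32),
                      static_cast<std::uint32_t>(thread_id)};
    return std::mt19937_64(seq);
}

// Draw n raw values from the stream of one thread (for illustration only).
std::vector<std::uint64_t> draw(std::uint64_t seed, int thread_id, int n) {
    std::mt19937_64 rng = make_thread_rng(seed, thread_id);
    std::vector<std::uint64_t> out;
    for (int i = 0; i < n; ++i) out.push_back(rng());
    return out;
}
```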