reymond-group / tmap

A very fast visualization library for large, high-dimensional data sets.

Home Page: http://tmap.gdb.tools

CMake 0.60% Batchfile 0.01% Shell 0.01% Python 1.94% C++ 96.67% CSS 0.01% HTML 0.02% C 0.52% Makefile 0.01% NASL 0.24%
visualization visualization-tools visualization-library python-library plotting graph-algorithms graphics

tmap's Introduction

tmap

tmap is a very fast visualization library for large, high-dimensional data sets. Currently, tmap is available for Python. tmap's graph layouts are based on the OGDF library.

Tutorial and Documentation

See http://tmap.gdb.tools

Examples

Name | Description
NIPS Conference Papers | A tmap visualization showing the linguistic relationship between NIPS conference papers.
Project Gutenberg | A tmap visualization of the linguistic relationships between books and authors extracted from Project Gutenberg.
MNIST | A visualization of the well known MNIST data set. No further explanation needed.
Fashion MNIST | A visualization of a more fashionable variant of MNIST.
Drugbank | A tmap visualization of all drugs registered in Drugbank.
RNAseq | RNA sequencing data of tumor samples. Visualized using tmap.
Flowcytometry | Flowcytometry data visualized using tmap.
MiniBooNE | tmap data visualization of a particle detection physics experiment.

Availability

Language | Operating System | Status
Python | Linux | Available
Python | Windows | Available [1]
Python | macOS | Available
R | | Unavailable [2]

[1] Works with WSL.
[2] FOSS R developers wanted!

Installation

tmap is installed using the conda package manager. Don't have conda? Download miniconda.

conda install -c tmap tmap

We suggest using faerun to plot the data laid out by tmap. But you can of course also use matplotlib (which might be too slow for large data sets and doesn't provide interactive features).

pip install faerun
# pip install matplotlib

tmap's Issues

kNN graph for a subset of entries in the index

Hi, examining the API for the LSHForest class, I see that it has a method get_knn_graph. If I understand correctly, it will construct the kNN graph for all entries in the index.

Would it be possible to have a method that constructs the kNN graph for a subset of entries in the index, specified as a list of VectorUint objects or a list of IDs?
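
Until such a method exists, one possible workaround is to build the full kNN graph and keep only the edges whose endpoints both belong to the subset. The sketch below assumes the graph is already available as three parallel sequences (source ids, target ids, weights); the function name filter_knn_edges is hypothetical and not part of tmap. The result is not the same as a true kNN graph computed on the subset alone, since neighbors outside the subset are simply dropped.

```python
def filter_knn_edges(sources, targets, weights, subset_ids):
    """Keep only edges whose two endpoints are both in subset_ids."""
    keep = set(subset_ids)
    return [
        (s, t, w)
        for s, t, w in zip(sources, targets, weights)
        if s in keep and t in keep
    ]


# Toy edge list standing in for the output of a full kNN graph.
sources = [0, 0, 1, 2, 3]
targets = [1, 2, 2, 3, 0]
weights = [0.1, 0.4, 0.2, 0.3, 0.5]

subset_edges = filter_knn_edges(sources, targets, weights, subset_ids=[0, 1, 2])
print(subset_edges)
```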

Map is missing São Tomé and Príncipe

Hello,

I am trying to highlight different countries on the world map using tmap and I noticed that São Tomé and Príncipe in Africa is not listed. How can I add this so I can work on my project?

Many thanks

Tag a new version

Could you add a tag to this git repository? This is useful to create a conda forge package.

Patches

Here are two patches I used to create a conda package for tmap with a modern C++11 compiler.

diff --git a/tmap/tmap/analyse.cc b/tmap/tmap/analyse.cc
index b282b74..1d6e686 100644
--- a/tmap/tmap/analyse.cc
+++ b/tmap/tmap/analyse.cc
@@ -30,7 +30,7 @@ namespace {
 
     std::vector<float> diff(weights.size());
     std::transform(weights.begin(), weights.end(), diff.begin(),
-                  std::bind2nd(std::minus<float>(), mean));
+                  std::bind(std::minus<float>(), std::placeholders::_1, mean));
     float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
     float stdev = std::sqrt(sq_sum / weights.size());
 
@@ -64,7 +64,7 @@ namespace {
 
       std::vector<float> diff(weights.size());
       std::transform(weights.begin(), weights.end(), diff.begin(),
-                    std::bind2nd(std::minus<float>(), mean));
+                    std::bind(std::minus<float>(), std::placeholders::_1, mean));
       float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
       float stdev = std::sqrt(sq_sum / weights.size());
diff --git a/tmap/setup.py b/tmap/setup.py
index 28c0d06..4d7e240 100644
--- a/tmap/setup.py
+++ b/tmap/setup.py
@@ -54,8 +54,6 @@ class CMakeBuild(build_ext):
             cmake_args += [
                 "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}".format(cfg.upper(), extdir)
             ]
-            if sys.maxsize > 2 ** 32:
-                cmake_args += ["-A", "x64"]
             build_args += ["--", "/m"]
         elif platform.system() == 'Darwin':
             cmake_args += ['-DOpenMP_C_FLAG=-fopenmp']

crash on linux calling `layout_from_lsh_forest` with 0-size `LSHForest`

Repro:

  1. build with debug symbols on linux (I used the python:3.9.10-bullseye docker image)
  2. call into layout_from_lsh_forest with a zero-size LSHForest
  3. observe a crash when de-referencing an invalid end-iterator

The cause seems to be float max_x = *max_element(x.begin(), x.end()); where we dereference the invalid end-iterator returned by max_element when called with an empty container. This should crash more often; I've not looked into it properly, but because we dereference into a float we probably just get garbage data on the other platforms.

I have a fix for this code path and MSTFromLSHForest where we return a tuple of empty objects for an empty forest. The fix is on the development branch.
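
The idea of returning early for empty input can be illustrated in Python, where max() on an empty sequence raises ValueError rather than reading invalid memory. This is only an analogy to the C++ fix, not the actual patch; bounding_box is a hypothetical helper.

```python
def bounding_box(x, y):
    """Return (min_x, max_x, min_y, max_y), or None for empty input.

    Checking for emptiness first mirrors the idea of returning early
    for an empty forest instead of dereferencing an end iterator.
    """
    if len(x) == 0 or len(y) == 0:
        return None
    return min(x), max(x), min(y), max(y)


print(bounding_box([], []))
print(bounding_box([0.5, -1.0, 2.0], [3.0, 1.0, 2.0]))
```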

Enable pickling of tmap.VectorUint

Hi, I'm trying to compute MHFP's in parallel and index them in a parallel fashion. The code looks like

def fp_function(pair):
    smi, molid = pair
    mol = AllChem.MolFromSmiles(smi)
    fp = tmap.VectorUint(enc.encode_mol(mol, min_radius=0))
    return molid, fp

num_cores = multiprocessing.cpu_count()
fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
pickle.dump(fps, open("libcomp_fps.pkl", 'wb'))

However on running this I get

Traceback (most recent call last):
  File "/Users/guha/src/tmap/lsh_query.py", line 27, in <module>
    fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 996, in __call__
    self.retrieve()
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 899, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 517, in wrap_future_result
    return future.result(timeout=timeout)
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: can't pickle tmap.VectorUint objects

Are there plans to make VectorUint pickleable? Or is there an alternative approach to parallelizing this type of computation?
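
One possible workaround, sketched here under the assumption that a fingerprint can be copied into a plain list of integers, is to have the workers return lists and to wrap them in tmap.VectorUint only in the parent process. The helper name to_plain and the toy fingerprint values are hypothetical; the sketch uses only the standard library so the pickling step can be checked in isolation.

```python
import pickle


def to_plain(fingerprint):
    """Copy an iterable of unsigned ints into a plain, picklable list."""
    return [int(value) for value in fingerprint]


# Stand-in for (molid, fingerprint) pairs returned by worker processes.
# Real fingerprints would come from enc.encode_mol(...) as in the code above.
results = [("mol-1", to_plain([3, 17, 42])), ("mol-2", to_plain([5, 8, 13]))]

# Plain lists survive a pickle round trip, so they can cross process
# boundaries and be written to disk.
restored = pickle.loads(pickle.dumps(results))
print(restored == results)

# In the parent process, each list could then be wrapped again, e.g.
# tmap.VectorUint(values), before adding it to the index.
```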

Availability of processed datasets for example reproducibility

Dear Reymond Group,

Thank you very much for the development and release of this great tool! As I started to look through some of the examples, I wondered if you might be able to make the processed data available for the worked examples. I'm guessing my data will be most similar in format/shape to the RNA Seq data, but I'm having some issues confirming that.

For instance, in the RNA Sequencing example, your input files are generically named ("data.csv.xz" and "labels.csv"):

DATA = pd.read_csv("data.csv.xz", index_col=0, sep=",")
LABELS = pd.read_csv("labels.csv", index_col=0, sep=",")

I see at the top of that file the data source is https://gdc.cancer.gov/about-data/publications/pancanatlas, but when I follow that URL, I'm not clear on which file in particular I should download and if there's any further processing required to get the "labels.csv."

Is it the EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv file (it's 1.88 GB in size)?

So, I'm wondering if you'd be able to host the processed example data files on your site? Or offer more info on their shape/format?

Thanks again,
Ian

conda install not working

I get the following error when attempting the conda install (conda install -c tmap tmap):

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • tmap

Jupyter Kernel Dying while using batch_add

Running the following commands seems to kill my Jupyter kernel, and I can't figure out why. I appreciate the help:
lf.batch_add(enc.batch_from_weight_array(fps, method="I2CWS"))
lf.index()

Thanks

Installation on Mac M1 Monterey 12.5.1

Hi there, I have been trying to install tmap on my MacBook Pro with an M1 chip and have tried various approaches (using Rosetta etc.) after conda didn't work for me. I was wondering if anyone has managed to install it successfully?
Much appreciated.
Shawn

importError on macOS Mojave

I followed the instructions on http://tmap.gdb.tools/: I installed conda first and then ran conda install -c tmap tmap.

But I got an ImportError when I tried to import tmap to follow your examples:

ImportError: dlopen(/xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libomp.dylib
  Referenced from: /xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so
  Reason: image not found

Which step did I get wrong? Thanks.

Tmap display

When I generated a tmap following the DrugBank example in your documentation, tmap generated the HTML files normally, but they did not display the tmap tree.

tree is built without leaves

Good morning,

I went through all the stages of plotting with faerun as usual, without changing anything. All of a sudden, I now get a tree, but without dots on it. I didn't find any problems with the SMILES; they should be correct. What could be the reason for this behavior?

Thank you,
Alina

Kernel dying when trying to run `tm.layout_from_edge_list()`

Hi tmap @reymond-group, @daenuprobst @undeadpixel, and community

First of all, it is a great tool, so congratulations on your work and manuscript, and thanks a lot for your effort to support it.

The tmap documentation is very clear and detailed; however, I found the following issue:

  • I am trying to run tm.layout_from_edge_list() on a pairwise similarity list, and was able to run the example provided at https://tmap.gdb.tools/#simple-graph. But when I run it on a different edge_list object, the kernel dies without any error message.
    I looked at the object type and it matches the example object exactly, so I am very puzzled.

Has anyone seen a similar issue? Does anyone know what is wrong here?
I put a Binder link online so you can directly run the notebook online.
https://mybinder.org/v2/gh/lfnothias/tmap_testing/46b4bbf833fdd22fdbc1de26dd0157cb663f4006.

Thanks in advance

Change the style of datapoints

Hi Daniel,

Is there any way the style used to draw the datapoints can be adapted? By default, every node of the tmap is visualized as a circle. I would like to show a subset of the samples not as dots but as triangles, squares, crosses or similar.

I dug through the documentation but couldn't find anything. Is there any way to control this?

Illegal instruction

Hello,

This is the first time I have encountered an Illegal instruction error.
The versions of the imports are rdkit=2022.03, python==3.9.12, tmap==1.0.6, pandas==1.4.3 and mhfp==1.9.2.
The code works until the last line, and then I get the Illegal instruction error.

import pandas as pd
import tmap as tm
from rdkit.Chem import AllChem
from mhfp.encoder import MHFPEncoder

df = pd.read_csv('test.smi', sep='\t')

enc = MHFPEncoder(1024)
lf = tm.LSHForest(1024, 64, store=True)

fps = []
smiles = []
mol_id = []
clss = []

for i, row in df.iterrows():
    mol = AllChem.MolFromSmiles(row['smiles'])
    fps.append(tm.VectorUint(enc.encode_mol(mol)))

    smiles.append(row['smiles'])
    mol_id.append(row['mol_id'])
    clss.append(row['class'])

lf.batch_add(fps)
lf.index()

cfg = tm.LayoutConfiguration()
cfg.node_size = 1 / 70
cfg.mmm_repeats = 2
cfg.sl_repeats = 2

x, y, s, t, _ = tm.layout_from_lsh_forest(lf, cfg)

Maybe you know what could potentially be the problem?

LSHForest.batch_add() resulting in empty tmap

I am analysing a batch of chemicals using the protocol on the following blog: https://xinhaoli74.github.io/posts/2020/05/TMAP/

However, the output plots are empty. I suspect this is due to the LSHForest.batch_add() command, as LSHForest.index() does not report anything. Have the functions within LSHForest changed at all over the last year? It has been a while since I ran the method and I am unsure where I am going wrong; the scripts being used are the same and the environments should be identical.

Advice on dealing with nan values for colormaps

I may want to post this issue in the faerun-python GitHub repo, but I'm posting here for now since it's relevant to tmap.

I'm creating tmaps for chemical libraries and coloring each structure by various numerical (non-categorical) properties (eg. cLogP). How would you suggest dealing with missing data points (eg. only having some cLogP measurements for some samples)? I tried using np.nan to fill the missing values but the html files come out blank, black background & menu appear but no tree or legend.

It also triggers a RuntimeWarning when the diff variable is calculated in faerun.py, because the max/min values yield nan.

RuntimeWarning: invalid value encountered in double_scalars

A couple of things I tried in editing faerun.py: changing the way diff is calculated to use np.nanmin and np.nanmax removes the runtime warning but still doesn't create the plot correctly. Similarly, trying to set the color for NaN values using cmap.set_bad(color='gray') doesn't work either.
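
One way to sidestep NaN values in the color scale is to split the points into those with a measured value and those without, and to compute the color range from the measured ones only. The sketch below uses only the standard library; split_by_missing is a hypothetical helper, and plotting the two groups as separate scatter layers (for example, a colormapped layer and a uniformly gray one) is an assumption, not a documented faerun recipe.

```python
import math


def split_by_missing(values):
    """Return (present, missing) index lists; NaN and None count as missing."""
    present, missing = [], []
    for i, v in enumerate(values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing.append(i)
        else:
            present.append(i)
    return present, missing


clogp = [1.2, float("nan"), 3.4, None, -0.5]
present, missing = split_by_missing(clogp)

# The color range is computed from measured values only, so it is never NaN.
measured = [clogp[i] for i in present]
vmin, vmax = min(measured), max(measured)
print(present, missing, vmin, vmax)
```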

Support for Python >=3.10

Description:
The codebase currently supports Python 3.9. I believe it's important for us to adapt the code to be compatible with newer versions, i.e. Python >=3.10.

Expected Behavior:
Code should be compatible with Python 3.10 without any errors or warnings.

Installation of tmap - macOS Monterey Apple M1 Chip

I ran into multiple issues trying to install tmap into my conda environments.

  1. installing via "conda install -c tmap tmap"
    This approach successfully installs a package called tmap, but I suspect it is actually the repo at https://github.com/GPZ-Bioinfo/tmap, because the file structure matches. Regardless, the install appears to succeed, but as soon as I attempt "import tmap" in a script or in the terminal, my kernel dies.

  2. installing via GitHub repo address
    After a long pause, I get an error saying that pip cannot access the GitHub repo. When git clone starts to run, the following error message appears:

fatal: unable to connect to github.com: github.com[0: 140.82.121.4]: errno=Operation timed out

  3. installing via clone and running setup.py
    This approach results in various errors, most prominently a CMake error. I also tried running this in a Rosetta terminal in case it's an incompatibility with M1, but I get the same error.
CMake Error at /opt/homebrew/Cellar/cmake/3.23.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
      Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)

Would anyone who has successfully installed tmap while using the M1 chip be able to provide some advice on how to install the package? Thank you

Embedding javascript in the output HTML tmap file

When generating a tmap in a Jupyter notebook, the output is a single HTML file that includes the JavaScript used to plot the map. Oddly, when the same code is run as a script, it generates the .js and .html files separately. Is there an option to embed the JavaScript within the HTML file, so that only one file is created?

AttributeError: module 'tmap' has no attribute 'layout_from_edge_list' on WSL2

I'm trying to run the "Laying out a Simple Graph" script on WSL2 and I'm getting this problem:

Traceback (most recent call last):
  File "tmap.py", line 40, in <module>
    main()
  File "tmap.py", line 18, in main
    x, y, s, t, _ = tm.layout_from_edge_list(
AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'

I saw that someone had this issue on macOS, and I would like to know whether the module has been tested in a WSL2 environment or whether I am doing something wrong.
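
The traceback shows that the script itself is named tmap.py, so one thing worth ruling out is that import tmap resolves to the script rather than to the installed library. The sketch below is a generic standard-library check, not part of tmap; describe_module is a hypothetical helper.

```python
import importlib.util


def describe_module(name):
    """Report where a top-level module would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # Packages have submodule search locations; plain modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


# Demonstrated on standard-library modules; pass "tmap" to inspect tmap.
print(describe_module("json"))
print(describe_module("os"))
```

If the reported origin points at your own script instead of the site-packages directory, renaming the script would be the first thing to try.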

can not search

Dear friends, the search does not respond to anything I type, whether or not the term should be searchable.
What should I do? Thank you.

Search not working as expected

Are there restrictions on what strings we can search by? I am able to search by some terms, but not all.

I am using the standard '__' for label formatting.

In the (heavily redacted) example below, I can search by NRX04W03.35091.x.36336, the ID_cycleA:, and some of the other values, but cannot search by 'WT(multi) = Binder' or by 'Selectivity(multi) = ' for example, among other strings I've tried to search. I've tried this with multiple different tmaps.

dot '.' in name will cause blank plot.

In case anyone else gets empty results for the same reason: the solution is to remove the dot from your canvas's name.
For example, if you set 'myTmap.good' in
faerun.add_scatter( 'myTmap.good',..... )
or in
point_helper='myTmap.good'
you will get empty results.
Changing 'myTmap.good' to 'myTmap_good' solves it.
This is definitely a bug.
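
As a stopgap, names can be sanitized before they are passed to the plotting calls. The helper below is a hypothetical sketch that replaces every character other than ASCII letters, digits and underscores; whether characters other than the dot also cause problems is an assumption, not something confirmed here.

```python
import re


def safe_plot_name(name):
    """Replace every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


print(safe_plot_name("myTmap.good"))
```

The same sanitized string should then be used both as the scatter name and as the point_helper value, so the two still match.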

Installation from pip

It looks like there are some C++ dependencies in tmap. Would it be possible to build some wheels and distribute them on PyPI? I'm personally not very happy to use conda as a crutch for installing things, as it introduces a manual installation step that works quite differently from system to system.

Tmap plot colors and viewing are no longer rendering correctly in Windows Chrome or Brave

Tmaps are no longer rendering properly in Chrome or Brave. This includes the drugbank example and tmaps that I generated for my own data that previously worked wonderfully.

In some tmaps (but not all!) it also acts like I'm trying to explore the tree in 3D space and lets me rotate the entire tree around as opposed to moving the tree around in 2D space. This is not the case for the drugbank example, but is true for older tmaps that I generated that previously worked fine.

github actions not producing arm64 Mach-O files but local builds do

The wheels downloaded from the GitHub Actions runs do not contain arm64 binaries for macOS, but locally built files, using the same process on BOTH Intel and arm64 machines, do.

  1. Download the artifacts from an action e.g. this one
  2. Unzip the artifacts
  3. Unzip any of the tmap_viz-*-macosx_*_arm64.whl
  4. In a terminal, from inside the unzipped directory, run file _tmap.cpython-<X>-darwin.so (also run it on the other binaries)
  5. Notice that the .so is amd64 instead of the requested arm64 format
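
The check in the steps above can also be scripted, which may help when many wheels need to be inspected. The sketch below reads the Mach-O header fields directly instead of calling the file utility; macho_arch is a hypothetical helper, it only distinguishes 64-bit little-endian thin binaries and universal binaries, and the headers used here are synthetic rather than taken from a real wheel.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF      # 64-bit Mach-O, little-endian on disk
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
FAT_MAGIC = 0xCAFEBABE        # universal binary, big-endian on disk


def macho_arch(header):
    """Classify a Mach-O file from its first 8 bytes."""
    if len(header) < 8:
        return "unknown"
    if struct.unpack(">I", header[:4])[0] == FAT_MAGIC:
        return "universal"
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return "unknown"
    return {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}.get(cputype, "unknown")


# Synthetic headers for demonstration; for a real file, read the first
# 8 bytes of the extracted .so with open(path, "rb").read(8).
arm_header = struct.pack("<II", MH_MAGIC_64, CPU_TYPE_ARM64)
intel_header = struct.pack("<II", MH_MAGIC_64, CPU_TYPE_X86_64)
print(macho_arch(arm_header), macho_arch(intel_header))
```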

ModuleNotFoundError: No module named 'tmap.core'; 'tmap' is not a package

When I try to import Faerun, I find a bug.

Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.4.0
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
from faerun import Faerun
Traceback (most recent call last):
  File "C:\Users\liu\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-f83fed409234>", line 1, in <cell line: 1>
    from faerun import Faerun
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\__init__.py", line 4, in <module>
    from .plot import FaerunPlot
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\plot.py", line 3, in <module>
    from tmap.core import TMAPEmbedding
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'tmap.core'; 'tmap' is not a package

This seems to be caused by an incorrect import in your latest version, from tmap.core import TMAPEmbedding.
I don't understand why this was imported; tmap seems to have no core or TMAPEmbedding.

I solved by downgrading to 0.4.0.
pip install faerun==0.4.0

However, I think you should fix this bug.
Thanks.

Docs on support for platform `osx-arm64`

Apparently there are some workarounds to get tmap running on macOS, but there does not seem to be a stable release that supports the arm64 architecture. Is there a chance this will happen in the future? Until then, I would suggest mentioning in the README, that tmap for macOS only works on the legacy Intel architectures and Rosetta.

Instructions for building from scratch

Last summer I really needed tmap working on my PC, but the package didn't work due to an "unspecified cpu instruction" error.
I had Ubuntu 18.06. My friend had this package working on his Mac, and another friend on Arch... well, he didn't want to build it.
Also, there was no package for Windows. So I had to build it from scratch.

Tmap assembly instructions

You need to know that:

OGDF related links:

There is also an Anaconda repository - https://anaconda.org/tmap/tmap

All packages were collected by the author through this script - https://github.com/reymond-group/tmap/blob/master/azure-pipelines.yml

To test the assembled package, run the script https://github.com/reymond-group/tmap/blob/master/tmap/tests/test_layout.py
Line 19 must work; this is critical. If you debug and get an error on this line, check the earlier errors first. If it still fails, good luck.

tldr instructions:

  1. you need clean conda or miniconda for python 3.7 (3.6 is acceptable, but definitely not 3.8 - I don't think tmap can be built on this version of python)
  2. create an environment so as not to fuck up your own
  3. do conda install conda-build
  4. do conda build ogdf-conda
  5. do conda build tmap
  6. install built packages from local source (just google, it is easy)

IMPORTANT: after steps 4 and 5, the console output shows the locations of the log files. Save them in a safe place just in case.
Side note: I had Python 3.6 and 3.7 installed, so two packages were built in my case.
