reymond-group / tmap

A very fast visualization library for large, high-dimensional data sets.

CMake 0.60% Batchfile 0.01% Shell 0.01% Python 1.94% C++ 96.67% CSS 0.01% HTML 0.02% C 0.52% Makefile 0.01% NASL 0.24%

visualization visualization-tools visualization-library python-library plotting graph-algorithms graphics

tmap's Issues

LSHForest.batch_add() resulting in empty tmap

I am analysing a batch of chemicals using the protocol on the following blog: https://xinhaoli74.github.io/posts/2020/05/TMAP/

However, the output plots are empty. I suspect this is due to the LSHForest.batch_add() command, as LSHForest.index() does not report anything. Have the functions within LSHForest changed at all over the last year? It has been a while since I ran the method and I am unsure where I am going wrong; the scripts being used are the same and the environments should be identical.

What is the license of tmap?

I'm unable to find the license of tmap

Instructions for building from scratch

Last summer I really needed tmap working on my PC, but the package didn't work due to an "unspecified cpu instruction" error.
I had Ubuntu 18.06. A friend had the package working on his Mac, and another friend on Arch (well, he didn't want to build it).
There was also no package for Windows, so I had to build it from scratch.

Tmap assembly instructions

You need to know that:

Main url - https://github.com/reymond-group/tmap
The article is here - https://arxiv.org/abs/1908.10410
The site is here (go through the web archive) - http://tmap.gdb.tools/
Tmap depends on the ogdf package. The repository contains ogdf and tmap. Although the ogdf can be downloaded pre-built from conda repository, I built both - since the error was in the ogdf part.

OGDF related links:

Also there is anaconda repository - https://anaconda.org/tmap/tmap

All packages were collected by the author through this script - https://github.com/reymond-group/tmap/blob/master/azure-pipelines.yml

The test method of the assembled package is to run the script https://github.com/reymond-group/tmap/blob/master/tmap/tests/test_layout.py
Line 19 should work; this is critical. If you debug and get an error on this line, check the previous errors. If it is still wrong, good luck.

tldr instructions:

1. You need a clean conda or miniconda for Python 3.7 (3.6 is acceptable, but definitely not 3.8; I don't think tmap can be built on that version of Python).
2. Create an environment so as not to break your own.
3. Run conda install conda-build
4. Run conda build ogdf-conda
5. Run conda build tmap
6. Install the built packages from the local source (just google it, it is easy).

IMPORTANT: after steps 4 and 5, the console output shows the locations of the log files. Save them in a safe place just in case.
Side note: I had Python 3.6 and 3.7, so two packages were built in my case.

Enable pickling of tmap.VectorUint

Hi, I'm trying to compute MHFPs in parallel and index them in a parallel fashion. The code looks like this:

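import multiprocessing
import pickle

import tmap
from joblib import Parallel, delayed
from rdkit.Chem import AllChem

# `enc` (the MHFP encoder) and `molcsv` (an iterable of (smiles, id) pairs) are defined elsewhere.
def fp_function(pair):
    smi, molid = pair
    mol = AllChem.MolFromSmiles(smi)
    fp = tmap.VectorUint(enc.encode_mol(mol, min_radius=0))
    return molid, fp

num_cores = multiprocessing.cpu_count()
fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
with open("libcomp_fps.pkl", "wb") as f:
    pickle.dump(fps, f)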

However on running this I get

Traceback (most recent call last):
  File "/Users/guha/src/tmap/lsh_query.py", line 27, in <module>
    fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 996, in __call__
    self.retrieve()
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 899, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 517, in wrap_future_result
    return future.result(timeout=timeout)
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: can't pickle tmap.VectorUint objects

Are there plans to make VectorUint pickleable? Or is there an alternative approach to parallelizing this type of computation?
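One possible workaround, sketched under the assumption that a fingerprint can be represented as a plain list of integers: have each worker return built-in types, which pickle without trouble, and wrap them in tmap.VectorUint only in the parent process. The helper names `to_picklable` and `fake_encode` are hypothetical, and `fake_encode` merely stands in for the real encoder so that the sketch runs without RDKit or tmap.

```python
import pickle


def to_picklable(fingerprint):
    """Convert an iterable of hash values into a plain list of Python ints."""
    return [int(value) for value in fingerprint]


def fp_function(pair, encode):
    """Worker: return (molid, list of ints) instead of a tmap.VectorUint."""
    smi, molid = pair
    return molid, to_picklable(encode(smi))


def fake_encode(smi):
    """Hypothetical stand-in for the real encoder call on a SMILES string."""
    return [ord(ch) for ch in smi]


results = [fp_function(pair, fake_encode) for pair in [("CCO", "mol-1"), ("CC", "mol-2")]]

# Plain lists survive a pickle round trip, so they can cross process boundaries.
restored = pickle.loads(pickle.dumps(results))

# In the parent process each list could then be wrapped again, e.g.
#   fps = [tmap.VectorUint(fp) for _, fp in restored]
```

The conversion back to tmap.VectorUint happens once, in a single process, so nothing tmap-specific ever needs to be pickled.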

Installation from pip

It looks like there are some C++ dependencies in tmap. Would it be possible to build some wheels and distribute them on PyPI? I'm personally not very happy to use conda as a crutch for installing things, as it introduces a manual installation step that works quite differently from system to system.

Docs on support for platform `osx-arm64`

Apparently there are some workarounds to get tmap running on macOS, but there does not seem to be a stable release that supports the arm64 architecture. Is there a chance this will happen in the future? Until then, I would suggest mentioning in the README, that tmap for macOS only works on the legacy Intel architectures and Rosetta.

Kernel dying when trying to run `tm.layout_from_edge_list()`

Hi tmap @reymond-group, @daenuprobst @undeadpixel, and community

First of all, it is a great tool, so congratulations on your work and manuscript, and thanks a lot for your effort in supporting it.

The tmap documentation is very clear and detailed; however, I found the following issue:

I am trying to run tm.layout_from_edge_list() on a pairwise similarity list, and was able to run the example provided at https://tmap.gdb.tools/#simple-graph. But when I run it on a different edge_list object, the kernel dies without any error message.
I looked at the object type and it matches the example object exactly, so I am very puzzled.

Has anyone seen a similar issue? Does anyone know what is wrong here?
I put a Binder link online so you can run the notebook directly.
https://mybinder.org/v2/gh/lfnothias/tmap_testing/46b4bbf833fdd22fdbc1de26dd0157cb663f4006.

Thanks in advance

importError on macOS Mojave

I followed the instructions on http://tmap.gdb.tools/: I installed conda first and then ran conda install -c tmap tmap

But I get an ImportError when I try to import tmap to follow your examples:

ImportError: dlopen(/xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libomp.dylib
  Referenced from: /xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so
  Reason: image not found

Which step did I get wrong? Thanks.

impossible to install mol-map-backend on M1 mac

I tried to install using #23, but it failed at the tmap compilation. It would be nice to have a fix for this.

Change the style of datapoints

Hi Daniel,

Is there any way the style used to draw the datapoints can be adapted? Per default, every node of the tmap is visualized as a circle. I would like to show a subset of the samples not as dots but as triangles/squares/crosses or so.

I dug through the documentation but couldn't find anything. Is there any way to control this?

Illegal instruction

Hello,

This is the first time I have encountered the error "Illegal instruction".
The versions of the imports are rdkit=2022.03, python==3.9.12, tmap==1.0.6, pandas==1.4.3 and mhfp==1.9.2.
The code works until the last line, and then I get the error "Illegal instruction".

import pandas as pd
import tmap as tm
from rdkit.Chem import AllChem
from mhfp.encoder import MHFPEncoder

df = pd.read_csv('test.smi', sep='\t')

enc = MHFPEncoder(1024)
lf = tm.LSHForest(1024, 64, store=True)

fps = []
smiles = []
mol_id = []
clss = []

for i, row in df.iterrows():
    mol = AllChem.MolFromSmiles(row['smiles'])
    fps.append(tm.VectorUint(enc.encode_mol(mol)))

    smiles.append(row['smiles'])
    mol_id.append(row['mol_id'])
    clss.append(row['class'])

lf.batch_add(fps)
lf.index()

cfg = tm.LayoutConfiguration()
cfg.node_size = 1 / 70
cfg.mmm_repeats = 2
cfg.sl_repeats = 2

x, y, s, t, _ = tm.layout_from_lsh_forest(lf, cfg)

Maybe you know what could potentially be the problem?
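"Illegal instruction" generally means that a binary executed a CPU instruction the processor does not support. A minimal diagnostic sketch for Linux follows; it only lists which feature flags the CPU reports. Which instruction sets the tmap binaries actually require is not stated in this issue, so the flags checked here (sse4_2, avx, avx2) are illustrative assumptions, and the function names are hypothetical.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("sse4_2", "avx", "avx2")):
    """List the wanted flags (an illustrative choice) that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


# On Linux, inspect the local machine; elsewhere this is skipped.
cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print("flags not reported by this CPU:", missing_flags(cpuinfo.read_text()))
```

If a flag is missing on the failing machine but present on a machine where the same package works, a locally built package is one option, as described in the build-from-scratch issue.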

kNN graph for a subset of entries in the index

Hi, examining the API for the LSHForest class, I see that it has a method get_knn_graph. If I understand correctly, it will construct the kNN graph for all entries in the index.

Would it be possible to have a method that constructs the kNN graph for a subset of entries in the index, specified as a list of VectorUint objects or a list of IDs?
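Until such a method exists, one workaround is to build the full kNN graph and filter it afterwards. The sketch below assumes the graph is available as three parallel sequences (source IDs, target IDs, weights); that layout is an assumption about the output of get_knn_graph, and `knn_subgraph` is a hypothetical helper.

```python
def knn_subgraph(sources, targets, weights, subset_ids):
    """Keep only the edges whose two endpoints both belong to subset_ids."""
    keep = set(subset_ids)
    return [
        (s, t, w)
        for s, t, w in zip(sources, targets, weights)
        if s in keep and t in keep
    ]


edges = knn_subgraph(
    sources=[0, 0, 1, 2],
    targets=[1, 2, 2, 3],
    weights=[0.1, 0.2, 0.3, 0.4],
    subset_ids=[0, 1, 2],
)
```

Note that this is not the same as a kNN graph computed within the subset: neighbours outside the subset are dropped rather than replaced by the next-nearest member of the subset.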

Installation of tmap - macOS Monterey Apple M1 Chip

I ran into multiple issues trying to install tmap into my Conda environments.

1. Installing via "conda install -c tmap tmap"
This approach successfully installs a package called tmap, but I suspect it is the repo at https://github.com/GPZ-Bioinfo/tmap instead, due to the matching file structure. Regardless, the install appears to be successful, but once I attempt "import tmap" in a script or in the terminal, my kernel immediately dies.
2. Installing via the GitHub repo address
After a long pause I get an error saying that pip cannot access the GitHub repo. When git clone starts to run, the following error message appears:

fatal: unable to connect to github.com: github.com[0: 140.82.121.4]: errno=Operation timed out

3. Installing via clone and running setup.py
This approach results in various errors, most predominantly a CMake error. I also tried running this in a Rosetta terminal in case it's an incompatibility with M1, but I am met with the same error.

CMake Error at /opt/homebrew/Cellar/cmake/3.23.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
      Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)

Would anyone who has successfully installed tmap while using the M1 chip be able to provide some advice on how to install the package? Thank you

Conda install command yields someone else's package

Using conda install -c tmap tmap
Installs someone else's software which is also named tmap (https://github.com/GPZ-Bioinfo/tmap).

Is there a way to install this package from a git clone of this repository?

AttributeError: module 'tmap' has no attribute 'Minhash'

I followed the installation instructions, but the following code fails:

dim = 2048
ENC = tm.Minhash(dim)

Running this code raises an AttributeError.

Please help!

AttributeError: module 'tmap' has no attribute 'layout_from_edge_list' on WSL2

I'm trying to run the "Laying out a Simple Graph" script on WSL2 and I'm getting this problem:

Traceback (most recent call last):
  File "tmap.py", line 40, in <module>
    main()
  File "tmap.py", line 18, in main
    x, y, s, t, _ = tm.layout_from_edge_list(
AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'

I saw that someone had this issue on macOS, and I would like to know whether the module has been tested in a WSL2 environment or if I am doing something wrong when running it.
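A quick way to narrow down "module 'tmap' has no attribute ..." errors is to check which file Python actually imports under that name. In the traceback above the script itself is called tmap.py, so `import tmap` may resolve to that script rather than to the installed library; a different package that shares the name would show up in the same way. The following is a minimal sketch; `inspect_module` is a hypothetical helper, and it is demonstrated on the standard-library json module so that it runs anywhere.

```python
import importlib
import importlib.util


def inspect_module(name, expected_attributes):
    """Report where a module is loaded from and which expected attributes it lacks."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"installed": False, "origin": None, "missing": list(expected_attributes)}
    module = importlib.import_module(name)
    missing = [attr for attr in expected_attributes if not hasattr(module, attr)]
    return {"installed": True, "origin": spec.origin, "missing": missing}


# Demonstration on a module that is always available.
report = inspect_module("json", ["dumps", "Minhash"])
print(report)

# For the errors reported here one would call, for example:
#   inspect_module("tmap", ["Minhash", "LSHForest", "layout_from_edge_list"])
```

If `origin` points at a file in the current working directory, or at a package with an unexpected file layout, the name is being shadowed and the attributes of this library cannot be found.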

ModuleNotFoundError: No module named 'tmap.core'; 'tmap' is not a package

When I try to import Faerun, I find a bug.

Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.4.0
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
from faerun import Faerun
Traceback (most recent call last):
  File "C:\Users\liu\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-f83fed409234>", line 1, in <cell line: 1>
    from faerun import Faerun
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\__init__.py", line 4, in <module>
    from .plot import FaerunPlot
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\plot.py", line 3, in <module>
    from tmap.core import TMAPEmbedding
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'tmap.core'; 'tmap' is not a package

This seems to be caused by a wrong import in your latest version, from tmap.core import TMAPEmbedding.
I don't understand why this was imported; tmap seems to have no core or TMAPEmbedding.

I solved it by downgrading faerun to 0.4.0:
pip install faerun==0.4.0

However, I think you should fix this bug.
Thanks.

Patches

Here are two patches I used to be able to create a conda package for tmap using a modern C++11 compiler.

diff --git a/tmap/tmap/analyse.cc b/tmap/tmap/analyse.cc
index b282b74..1d6e686 100644
--- a/tmap/tmap/analyse.cc
+++ b/tmap/tmap/analyse.cc
@@ -30,7 +30,7 @@ namespace {
 
     std::vector<float> diff(weights.size());
     std::transform(weights.begin(), weights.end(), diff.begin(),
-                  std::bind2nd(std::minus<float>(), mean));
+                  std::bind(std::minus<float>(), std::placeholders::_1, mean));
     float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
     float stdev = std::sqrt(sq_sum / weights.size());
 
@@ -64,7 +64,7 @@ namespace {
 
       std::vector<float> diff(weights.size());
       std::transform(weights.begin(), weights.end(), diff.begin(),
-                    std::bind2nd(std::minus<float>(), mean));
+                    std::bind(std::minus<float>(), std::placeholders::_1, mean));
       float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
       float stdev = std::sqrt(sq_sum / weights.size());

diff --git a/tmap/setup.py b/tmap/setup.py
index 28c0d06..4d7e240 100644
--- a/tmap/setup.py
+++ b/tmap/setup.py
@@ -54,8 +54,6 @@ class CMakeBuild(build_ext):
             cmake_args += [
                 "-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}".format(cfg.upper(), extdir)
             ]
-            if sys.maxsize > 2 ** 32:
-                cmake_args += ["-A", "x64"]
             build_args += ["--", "/m"]
         elif platform.system() == 'Darwin':
             cmake_args += ['-DOpenMP_C_FLAG=-fopenmp']

can not search

My dear friends, when I search, whatever I type is not found, whether it should be searchable or not; the search does not respond.
What should I do? Thank you.

AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'

Getting the following error when running the simple graph
AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'

Installation on Mac M1 Monterey 12.5.1

Hi there, I have been trying to install TMAP on my MacBook Pro with an M1 chip and have tried various means after conda didn't work for me (using Rosetta etc.). I was wondering if anyone has managed to install this successfully?
Much appreciated.
Shawn

Tmap display

When I generated a tmap following the DrugBank example in your documentation, the HTML files were generated normally, but they did not display the tmap tree.

Jupyter Kernel Dying while using batch_add

Running the following commands seems to kill my Jupyter kernel and I can't figure out why. I appreciate the help:
lf.batch_add(enc.batch_from_weight_array(fps, method="I2CWS"))
lf.index()

Thanks

Using tmap with other fingerprints

I want to use the TMAP functions with other RDKit-based fingerprints. Is there any way to do so?
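One possible route for bit-vector fingerprints is to turn each fingerprint into a dense 0/1 vector and hash that. The sketch below covers only the conversion step, in pure Python. How the dense vector is handed to tmap afterwards is an assumption and therefore appears only in comments; `on_bits_to_dense` is a hypothetical helper.

```python
def on_bits_to_dense(on_bits, n_bits):
    """Expand a list of set-bit indices into a dense list of 0/1 values."""
    dense = [0] * n_bits
    for bit in on_bits:
        if not 0 <= bit < n_bits:
            raise ValueError(f"bit index {bit} outside fingerprint of size {n_bits}")
        dense[bit] = 1
    return dense


dense = on_bits_to_dense([1, 4, 7], n_bits=8)

# With RDKit, the indices could come from something like list(fp.GetOnBits())
# for a bit-vector fingerprint.
# Assumed, untested here: the dense vector could then be wrapped and hashed with
# tmap's Minhash (e.g. a from_binary_array-style method on a tm.VectorUchar)
# before being added to an LSHForest.
```

For count-based or weighted fingerprints, the batch_from_weight_array call shown in the kernel-dying issue above suggests a weighted variant exists as well.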

Tag a new version

Could you add a tag to this git repository? This is useful to create a conda forge package.

tree is built without leaves

Good morning,

I went through all the stages of plotting with faerun as usual, without any experiments. All of a sudden, I now get a tree, but without dots on it. I didn't find any problems with the SMILES; they should be correct. What could be the reason for this behavior?

Thank you,
Alina

Tmap plot colors and viewing are no longer rendering correctly in Windows Chrome or Brave

Tmaps are no longer rendering properly in Chrome or Brave. This includes the drugbank example and tmaps that I generated for my own data that previously worked wonderfully.

In some tmaps (but not all!) it also acts like I'm trying to explore the tree in 3D space and lets me rotate the entire tree around as opposed to moving the tree around in 2D space. This is not the case for the drugbank example, but is true for older tmaps that I generated that previously worked fine.

Search not working as expected

Are there restrictions on what strings we can search by? I am able to search by some terms, but not all.

I am using the standard '__' for label formatting.

In the (heavily redacted) example below, I can search by NRX04W03.35091.x.36336, the ID_cycleA:, and some of the other values, but cannot search by 'WT(multi) = Binder' or by 'Selectivity(multi) = ' for example, among other strings I've tried to search. I've tried this with multiple different tmaps.

conda install not working

I get the following error when attempting the conda install (conda install -c tmap tmap):

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

tmap

Advice on dealing with nan values for colormaps

I may want to post this issue in the faerun-python GitHub repository, but I am posting here for now since it's relevant to tmap.

I'm creating tmaps for chemical libraries and coloring each structure by various numerical (non-categorical) properties (e.g. cLogP). How would you suggest dealing with missing data points (e.g. having cLogP measurements for only some samples)? I tried using np.nan to fill the missing values, but the HTML files come out blank: the black background and menu appear, but no tree or legend.

It also generates a runtime warning when calculating the diff variable in faerun.py because the max/min values yield nan.

RuntimeWarning: invalid value encountered in double_scalars

A couple of things I tried in editing faerun.py: changing the way diff is calculated to use np.nanmin and np.nanmax removes the runtime warning but still doesn't create the plot correctly. Similarly, trying to set the color for NaN values using cmap.set_bad(color='gray') doesn't work either.
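One way to sidestep NaN handling entirely is to split the samples into those with a measurement and those without, color only the first group by value, and draw the second group separately in a fixed color. The sketch below shows just the splitting and a NaN-aware value range in pure Python. Whether faerun can overlay two such series on the same tree is not verified here, and both function names are hypothetical.

```python
import math


def split_by_missing(values):
    """Return (indices with a value, indices where the value is NaN)."""
    present = [i for i, v in enumerate(values) if not math.isnan(v)]
    missing = [i for i, v in enumerate(values) if math.isnan(v)]
    return present, missing


def nan_aware_range(values):
    """Return (min, max) over the non-NaN values, or None if there are none."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return None
    return min(finite), max(finite)


clogp = [1.5, float("nan"), 3.0, float("nan")]
present, missing = split_by_missing(clogp)
value_range = nan_aware_range(clogp)
```

With the indices in hand, coordinates and labels can be subset in the same way, so that no NaN ever reaches the color normalization.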

github actions not producing arm64 Mach-O files but local builds do

The wheels downloaded from the GitHub Actions runs do not contain arm64 binaries for macOS, but locally built files, using the same process on BOTH Intel and arm64 machines, do.

1. Download the artifacts from an action, e.g. this one
2. Unzip the artifacts
3. Unzip any of the tmap_viz-*-macosx_*_arm64.whl files
4. From inside the unzipped location, run file _tmap.cpython-<X>-darwin.so in a terminal (also run it on the other binaries)
5. Notice that the .so is amd64 instead of the requested arm64 format
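The check in the last two steps can also be scripted. The sketch below reads the first eight bytes of a file and reports the architecture recorded in a 64-bit Mach-O header. The magic number and CPU type constants are the standard Mach-O values; the function names are hypothetical, and universal binaries are only detected, not unpacked.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O header, stored little-endian on disk
FAT_MAGIC = 0xCAFEBABE    # universal ("fat") binary, stored big-endian on disk
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_arch(header):
    """Classify a file from the first 8 bytes of its contents."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    magic, cputype = struct.unpack("<II", header[:8])
    if magic == MH_MAGIC_64:
        return CPU_TYPES.get(cputype, f"unknown (0x{cputype:08x})")
    if struct.unpack(">I", header[:4])[0] == FAT_MAGIC:
        return "universal"
    return "not a 64-bit Mach-O file"


def macho_arch_of_file(path):
    """Read the header of the file at `path` and classify it."""
    with open(path, "rb") as handle:
        return macho_arch(handle.read(8))
```

Running `macho_arch_of_file` over every .so extracted from a wheel gives a quick table of which artifacts carry the architecture their file name promises.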

Map is missing São Tomé and Príncipe

Hello,

I am trying to highlight different countries on the world map using tmap and I noticed that São Tomé and Príncipe in Africa is not listed. How can I add this so I can work on my project?

Many thanks

AttributeError: module 'tmap' has no attribute 'Minhash'

I am trying to install tmap. After installing through pip, I get the error
AttributeError: module 'tmap' has no attribute 'Minhash'. I see that this issue was previously reported in #22 (comment), but no answer was given. How can I properly install tmap?

Availability of processed datasets for example reproducibility

Dear Reymond Group,

Thank you very much for the development and release of this great tool! As I started to look through some of the examples, I wondered if you might be able to make the processed data available for the worked examples. I'm guessing my data will be most similar in format/shape to the RNA Seq data, but I'm having some issues confirming that.

For instance, in the RNA Sequencing example, your input files are generically named ("data.csv.xz" and "labels.csv"):

DATA = pd.read_csv("data.csv.xz", index_col=0, sep=",")
LABELS = pd.read_csv("labels.csv", index_col=0, sep=",")

I see at the top of that file the data source is https://gdc.cancer.gov/about-data/publications/pancanatlas, but when I follow that URL, I'm not clear on which file in particular I should download and if there's any further processing required to get the "labels.csv."

Is it the EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv file (it's 1.88 GB in size)?

So, I'm wondering if you'd be able to host the processed example data files on your site? Or offer more info on their shape/format?

Thanks again,
Ian

What if I have a distance matrix (Gram matrix)?

Hello,
It would be nice if tmap could work from a distance matrix.
That is, what I want to map are not molecules, but I have the full distance matrix for the dataset.
Regards,
F.
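One possible approach, given that tm.layout_from_edge_list is reported elsewhere in these issues to work on a pairwise similarity list, is to reduce the full distance matrix to a k-nearest-neighbour edge list first. The sketch below does this in pure Python. That the resulting (from, to, weight) tuples can be passed to tm.layout_from_edge_list together with the vertex count is an assumption, and `knn_edges_from_distance_matrix` is a hypothetical helper.

```python
def knn_edges_from_distance_matrix(dist, k):
    """Build an undirected kNN edge list (i, j, distance) with i < j."""
    n = len(dist)
    edges = set()
    for i in range(n):
        # All other vertices, closest first.
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in neighbours[:k]:
            a, b = (i, j) if i < j else (j, i)
            edges.add((a, b, dist[a][b]))
    return sorted(edges)


dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
edge_list = knn_edges_from_distance_matrix(dist, k=1)

# Assumed usage, not run here:
#   x, y, s, t, _ = tm.layout_from_edge_list(len(dist), edge_list)
```

Sorting every row makes this quadratic in memory and slightly worse in time, which is fine for a matrix that already fits in memory but gives up the scalability of the LSH-based route.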

Embedding javascript in the output HTML tmap file

When generating a tmap in a Jupyter notebook, the output is a single HTML file that includes the JavaScript used to plot the map. Oddly, when turning the same code into a script, it generates the .js and .html files separately. Is there an option to embed the JavaScript within the HTML file, so that only one file is created?

dot '.' in name will cause blank plot.

In case anyone else gets an empty plot: the solution is to remove the dot from your canvas's name.
For example, if you set 'myTmap.good' in
faerun.add_scatter( 'myTmap.good',..... )
or in
point_helper='myTmap.good'
you will get an empty plot.
Changing 'myTmap.good' to 'myTmap_good' solves it.
This is definitely a bug.
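Until this is fixed, names can be sanitized before they are passed on. A minimal sketch follows; the helper name `safe_series_name` is hypothetical, and replacing every character outside letters, digits and underscores is a conservative assumption, since only the dot is reported to cause the problem.

```python
import re


def safe_series_name(name):
    """Replace every character that is not a letter, digit or underscore."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


name = safe_series_name("myTmap.good")
```

The same sanitized string should be used everywhere the name appears, for example both in add_scatter and in point_helper, so that the two still refer to the same series.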

crash on linux calling `layout_from_lsh_forest` with 0-size `LSHForest`

Repro:

1. Build with debug symbols on Linux (I used the python:3.9.10-bullseye docker image)
2. Call into layout_from_lsh_forest with a zero-size LSHForest
3. Observe a crash when dereferencing an invalid end iterator

The cause seems to be float max_x = *max_element(x.begin(), x.end()); where we dereference the invalid end iterator returned by max_element when it is called with an empty container. This should crash more often; I have not looked into it properly, but because we dereference into a float, we probably just get garbage data on the other platforms.

I have a fix for this code path and MSTFromLSHForest where we return a tuple of empty objects for an empty forest. The fix is on the development branch.
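Until the fix is released, callers can guard against the empty case on the Python side. The sketch below is a generic wrapper: `build_layout` is a hypothetical callable that would add the fingerprints to an LSHForest, index it and call layout_from_lsh_forest, and the five-element empty result mirrors the `x, y, s, t, _` unpacking used elsewhere in these issues (an assumption about the return shape).

```python
def layout_or_empty(fingerprints, build_layout):
    """Skip the native layout call when there is nothing to lay out."""
    if len(fingerprints) == 0:
        # Empty coordinates and edges; no graph properties.
        return [], [], [], [], None
    return build_layout(fingerprints)


def failing_layout(fingerprints):
    """Stand-in for the native call that would crash on empty input."""
    raise RuntimeError("native layout called with no data")


x, y, s, t, props = layout_or_empty([], failing_layout)
```

Because the guard returns before any native code runs, the invalid iterator is never dereferenced.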

Support for Python >=3.10

Description:
The codebase currently supports Python 3.9. I believe it's important to adapt the code so that it is also compatible with newer versions, Python >=3.10.

Expected Behavior:
The code should be compatible with Python 3.10 without any errors or warnings.

Incorrect code in docs for ProteomeHD example

The code to generate the ProteomeHD tmap example is incorrect: it is a duplicate of the code for the RNA seq example.
