reymond-group / tmap
A very fast visualization library for large, high-dimensional data sets.
Home Page: http://tmap.gdb.tools
I am analysing a batch of chemicals using the protocol on the following blog: https://xinhaoli74.github.io/posts/2020/05/TMAP/
However, the output plots are empty. I suspect this is due to the LSHForest.batch_add() call, as LSHForest.index() does not report anything. Have the functions within LSHForest changed at all over the last year? It has been a while since I last ran the method and I am unsure where I am going wrong; the scripts being used are the same and the environments should be identical.
I'm unable to find the license of tmap
Last summer I really needed tmap working on my PC, but the package didn't work due to an "unspecified cpu instruction" error.
I had Ubuntu 18.06. A friend had this package working on his Mac, and another friend on Arch (well, he didn't want to build it).
There was also no package for Windows. So I had to build it from scratch.
Tmap assembly instructions
You need to know that:
OGDF related links:
Also there is anaconda repository - https://anaconda.org/tmap/tmap
All packages were collected by the author through this script - https://github.com/reymond-group/tmap/blob/master/azure-pipelines.yml
The test method of the assembled package is to run the script https://github.com/reymond-group/tmap/blob/master/tmap/tests/test_layout.py
Line 19 should work; this is critical. If you debug and get an error on this line, check the previous errors. If it is still wrong, good luck.
tldr instructions:
conda install conda-build
conda build ogdf-conda
conda build tmap
IMPORTANT: after steps 4 and 5, the console output shows the locations of the log files. Save them in a safe place just in case.
Side note: I had Python 3.6 and 3.7, so two packages were built in my case.
Hi, I'm trying to compute MHFP's in parallel and index them in a parallel fashion. The code looks like
def fp_function(pair):
    smi, molid = pair
    mol = AllChem.MolFromSmiles(smi)
    fp = tmap.VectorUint(enc.encode_mol(mol, min_radius=0))
    return molid, fp
num_cores = multiprocessing.cpu_count()
fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
pickle.dump(fps, open("libcomp_fps.pkl", 'wb'))
However on running this I get
Traceback (most recent call last):
  File "/Users/guha/src/tmap/lsh_query.py", line 27, in <module>
    fps = Parallel(n_jobs=num_cores)(delayed(fp_function)(input_pair) for input_pair in molcsv)
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 996, in __call__
    self.retrieve()
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/parallel.py", line 899, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/anaconda3/envs/rdkit/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 517, in wrap_future_result
    return future.result(timeout=timeout)
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/anaconda3/envs/rdkit/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: can't pickle tmap.VectorUint objects
Are there plans to make VectorUint pickleable? Or is there an alternative approach to parallelizing this type of computation?
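One possible workaround, sketched under assumptions rather than confirmed by the maintainers: since the TypeError comes from pickling the VectorUint returned by the worker, the worker could return a plain Python list instead, and the parent process could wrap it in VectorUint afterwards. In the sketch below, fake_fingerprint is a hypothetical stand-in for enc.encode_mol(...) so that it runs without RDKit or tmap, and the pickle round-trip imitates what joblib does when it ships results between processes.

```python
import hashlib
import pickle


def fake_fingerprint(smiles, n_perm=8):
    """Hypothetical stand-in for enc.encode_mol(...): a plain list of uint32 values."""
    digest = hashlib.sha256(smiles.encode("utf-8")).digest()  # 32 bytes
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") for i in range(n_perm)]


def fp_function(pair):
    """Worker: return only picklable built-in types (str and list of int)."""
    smi, molid = pair
    return molid, fake_fingerprint(smi)


pairs = [("CCO", "mol-1"), ("c1ccccc1", "mol-2")]
results = [fp_function(p) for p in pairs]

# joblib pickles worker results; a round-trip shows plain lists survive it.
restored = pickle.loads(pickle.dumps(results))

# In the parent process the lists could then be wrapped, for example:
#   fps = [tmap.VectorUint(fp) for _, fp in restored]
# (assumes VectorUint accepts a sequence of ints, as the call in the
# original snippet suggests).
```

The same results list can be written with pickle.dump as in the original script, because it no longer contains tmap objects.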
It looks like there are some C++ dependencies in tmap. Would it be possible to build some wheels and distribute them on PyPI? I'm personally not very happy to use conda as a crutch for installing things, as it introduces a manual installation step that works quite differently from system to system.
Apparently there are some workarounds to get tmap running on macOS, but there does not seem to be a stable release that supports the arm64 architecture. Is there a chance this will happen in the future? Until then, I would suggest mentioning in the README that tmap for macOS only works on the legacy Intel architectures and Rosetta.
Hi tmap @reymond-group, @daenuprobst @undeadpixel, and community
First of all, it is a great tool, so congratulations on your work and manuscript, and thanks a lot for your effort in supporting it.
The tmap documentation is very clear and detailed; however, I found the following issue:
I used tm.layout_from_edge_list() on a pairwise similarity list, and was able to run the example provided on https://tmap.gdb.tools/#simple-graph. But when trying to run it on a different edge_list object, the kernel dies without any error message, so I am very puzzled. Has anyone seen a similar issue? Does anyone know what is wrong here?
I put a Binder link online so you can directly run the notebook online.
https://mybinder.org/v2/gh/lfnothias/tmap_testing/46b4bbf833fdd22fdbc1de26dd0157cb663f4006.
Thanks in advance
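One thing that may be worth ruling out before calling into the native code (an assumption, not a confirmed diagnosis of this crash) is whether every edge refers to a vertex index inside the declared vertex count and carries a finite weight. The helper below is hypothetical and not part of tmap; it assumes the edge list is a sequence of (source, target, weight) tuples as in the Simple Graph example.

```python
import math
import numbers


def check_edge_list(n_vertices, edge_list):
    """Return a list of human-readable problems found in (source, target, weight) edges."""
    problems = []
    for pos, edge in enumerate(edge_list):
        if len(edge) != 3:
            problems.append(f"edge {pos}: expected 3 fields, got {len(edge)}")
            continue
        src, dst, weight = edge
        for label, node in (("source", src), ("target", dst)):
            # numbers.Integral also accepts NumPy integer scalars.
            if not isinstance(node, numbers.Integral) or not 0 <= node < n_vertices:
                problems.append(
                    f"edge {pos}: {label} {node!r} is not an integer in [0, {n_vertices})"
                )
        if not math.isfinite(weight):
            problems.append(f"edge {pos}: weight {weight!r} is not finite")
    return problems
```

An empty return value only means these particular checks passed; it does not guarantee that the layout call will succeed.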
I followed the instructions on http://tmap.gdb.tools/: I installed conda first and then ran conda install -c tmap tmap.
But I got an ImportError when I tried to import tmap to follow your examples:
ImportError: dlopen(/xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libomp.dylib
Referenced from: /xxx/envs/lib/python3.7/site-packages/tmap.cpython-37m-darwin.so
Reason: image not found
Which step did I get wrong? Thanks.
I tried to install using #23, but it failed at the tmap compilation step. It would be nice to have a fix for this.
Hi Daniel,
Is there any way to adapt the style used to draw the data points? By default, every node of the tmap is visualized as a circle. I would like to show a subset of the samples not as dots but as triangles, squares, crosses or similar.
I dug through the documentation but couldn't find anything. Is there any way to control this?
Hello,
This is the first time I have encountered the error Illegal instruction.
The versions of the imports are rdkit=2022.03, python==3.9.12, tmap==1.0.6, pandas==1.4.3 and mhfp==1.9.2.
The code works until the last line, and then I get the error Illegal instruction:
import pandas as pd
import tmap as tm
from rdkit.Chem import AllChem
from mhfp.encoder import MHFPEncoder
df = pd.read_csv('test.smi', sep='\t')
enc = MHFPEncoder(1024)
lf = tm.LSHForest(1024, 64, store=True)
fps = []
smiles = []
mol_id = []
clss = []
for i, row in df.iterrows():
    mol = AllChem.MolFromSmiles(row['smiles'])
    fps.append(tm.VectorUint(enc.encode_mol(mol)))
    smiles.append(row['smiles'])
    mol_id.append(row['mol_id'])
    clss.append(row['class'])
lf.batch_add(fps)
lf.index()
cfg = tm.LayoutConfiguration()
cfg.node_size = 1 / 70
cfg.mmm_repeats = 2
cfg.sl_repeats = 2
x, y, s, t, _ = tm.layout_from_lsh_forest(lf, cfg)
Maybe you know what could potentially be the problem?
Hi, examining the API for the LSHForest class, I see that it has a method get_knn_graph. If I understand correctly, it will construct the kNN graph for all entries in the index.
Would it be possible to have a method that constructs the kNN graph for a subset of entries in the index, specified as a list of VectorUint objects or a list of ids?
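Until such a method exists, one stopgap (a sketch, not a library feature) would be to build the full kNN graph and then keep only the edges whose endpoints both fall in the subset. The function below is hypothetical and assumes the graph is available as three parallel sequences of source ids, target ids and weights. Note that this is not equivalent to a kNN graph computed on the subset alone, because neighbours outside the subset are simply dropped.

```python
def subset_knn_edges(sources, targets, weights, keep_ids):
    """Keep only edges whose source and target are both in keep_ids."""
    keep = set(keep_ids)
    return [
        (s, t, w)
        for s, t, w in zip(sources, targets, weights)
        if s in keep and t in keep
    ]


# Example with a tiny hand-written graph.
sources = [0, 0, 1, 2, 3]
targets = [1, 2, 2, 3, 0]
weights = [0.1, 0.4, 0.2, 0.3, 0.9]
edges = subset_knn_edges(sources, targets, weights, keep_ids=[0, 1, 2])
```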
I ran into multiple issues trying to install tmap into my Conda environments.
installing via "conda install -c tmap tmap"
This approach successfully installs a package called tmap, but I suspect it is the repo contained at https://github.com/GPZ-Bioinfo/tmap instead due to the matching file structure. Regardless, the install appears to be successful but once I attempt "import tmap" in a script or in the terminal, my kernel immediately dies.
installing via GitHub repo address
I am met with an error after a long pause where pip cannot access the GitHub repo. When git clone starts to run, the following error messages appear:
fatal: unable to connect to github.com: github.com[0: 140.82.121.4]: errno=Operation timed out
CMake Error at /opt/homebrew/Cellar/cmake/3.23.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
Would anyone who has successfully installed tmap while using the M1 chip be able to provide some advice on how to install the package? Thank you
Using conda install -c tmap tmap installs someone else's software, which is also named tmap (https://github.com/GPZ-Bioinfo/tmap).
Is there a way to install this package from a git clone of this repository?
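To see which tmap an environment actually resolves, the import machinery can be queried without importing the module, which matters here because some reports say that importing kills the kernel. This is a small diagnostic sketch; describe_install is a hypothetical helper name. Reading the result relies on an assumption drawn from the tracebacks in this thread: this project's build appears to be a single compiled extension file (for example tmap.cpython-37m-darwin.so), while a directory-style package named tmap would point to a different project.

```python
import importlib.util


def describe_install(name):
    """Report where a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,  # file path the import would load
        # Packages (directories) have search locations; single-file modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


info = describe_install("tmap")
print(info)
```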
I followed the installation instructions, but when I run this code:
dim = 2048
ENC = tm.Minhash(dim)
an AttributeError is raised.
Please help!
I'm trying to run the "Laying out a Simple Graph" script on WSL2 and I'm getting this problem:
Traceback (most recent call last):
  File "tmap.py", line 40, in <module>
    main()
  File "tmap.py", line 18, in main
    x, y, s, t, _ = tm.layout_from_edge_list(
AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'
I saw that someone had this issue on macOS, and I would like to know whether the module has been tested in a WSL2 environment or whether I am doing something wrong when running it.
When I try to import Faerun, I find a bug.
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
PyDev console: using IPython 8.4.0
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
from faerun import Faerun
Traceback (most recent call last):
  File "C:\Users\liu\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3398, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-f83fed409234>", line 1, in <cell line: 1>
    from faerun import Faerun
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\__init__.py", line 4, in <module>
    from .plot import FaerunPlot
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\liu\anaconda3\lib\site-packages\faerun\plot.py", line 3, in <module>
    from tmap.core import TMAPEmbedding
  File "C:\Program Files\JetBrains\PyCharm 2021.3.2\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'tmap.core'; 'tmap' is not a package
This seems to be caused by a wrong import in your latest version, from tmap.core import TMAPEmbedding. I don't understand why this was imported; tmap seems to have no core or TMAPEmbedding.
I solved it by downgrading to 0.4.0:
pip install faerun==0.4.0
However, I think you should fix this bug.
Thanks.
Here are two patches I used to create a conda package for tmap using a modern compiler (C++11).
diff --git a/tmap/tmap/analyse.cc b/tmap/tmap/analyse.cc
index b282b74..1d6e686 100644
--- a/tmap/tmap/analyse.cc
+++ b/tmap/tmap/analyse.cc
@@ -30,7 +30,7 @@ namespace {
std::vector<float> diff(weights.size());
std::transform(weights.begin(), weights.end(), diff.begin(),
- std::bind2nd(std::minus<float>(), mean));
+ std::bind(std::minus<float>(), std::placeholders::_1, mean));
float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
float stdev = std::sqrt(sq_sum / weights.size());
@@ -64,7 +64,7 @@ namespace {
std::vector<float> diff(weights.size());
std::transform(weights.begin(), weights.end(), diff.begin(),
- std::bind2nd(std::minus<float>(), mean));
+ std::bind(std::minus<float>(), std::placeholders::_1, mean));
float sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
float stdev = std::sqrt(sq_sum / weights.size());
diff --git a/tmap/setup.py b/tmap/setup.py
index 28c0d06..4d7e240 100644
--- a/tmap/setup.py
+++ b/tmap/setup.py
@@ -54,8 +54,6 @@ class CMakeBuild(build_ext):
cmake_args += [
"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}".format(cfg.upper(), extdir)
]
- if sys.maxsize > 2 ** 32:
- cmake_args += ["-A", "x64"]
build_args += ["--", "/m"]
elif platform.system() == 'Darwin':
cmake_args += ['-DOpenMP_C_FLAG=-fopenmp']
I am getting the following error when running the simple graph example:
AttributeError: module 'tmap' has no attribute 'layout_from_edge_list'
Hi there, I have been trying to install TMAP on my MacBook Pro with an M1 chip and have tried various approaches (Rosetta etc.) after conda didn't work for me. I was wondering if anyone has managed to install it successfully?
Much appreciated.
Shawn
When I generated a tmap following the DrugBank example in your documentation, the HTML files were generated normally, but the tmap tree was not displayed.
Running the following commands seems to kill my Jupyter kernel and I can't figure out why. I appreciate the help:
lf.batch_add(enc.batch_from_weight_array(fps, method="I2CWS"))
lf.index()
Thanks
I want to use the TMAP functions with other RDKit-based fingerprints. Is there any way to do so?
Could you add a tag to this git repository? This is useful to create a conda forge package.
Tmaps are no longer rendering properly in Chrome or Brave. This includes the drugbank example and tmaps that I generated for my own data that previously worked wonderfully.
In some tmaps (but not all!) it also acts like I'm trying to explore the tree in 3D space and lets me rotate the entire tree around as opposed to moving the tree around in 2D space. This is not the case for the drugbank example, but is true for older tmaps that I generated that previously worked fine.
Are there restrictions on what strings we can search by? I am able to search by some terms, but not all.
I am using the standard '__' for label formatting.
In the (heavily redacted) example below, I can search by NRX04W03.35091.x.36336, the ID_cycleA:, and some of the other values, but cannot search by 'WT(multi) = Binder' or by 'Selectivity(multi) = ' for example, among other strings I've tried to search. I've tried this with multiple different tmaps.
I get the following error when attempting the conda install (conda install -c tmap tmap):
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
I may want to post this issue in the faerun-python GitHub repository, but I am posting here for now since it's relevant to tmap.
I'm creating tmaps for chemical libraries and coloring each structure by various numerical (non-categorical) properties (e.g. cLogP). How would you suggest dealing with missing data points (e.g. only having cLogP measurements for some samples)? I tried using np.nan to fill the missing values, but the HTML files come out blank: the black background and menu appear, but no tree or legend.
It also generates a RuntimeWarning when calculating the diff variable in faerun.py, because the max/min values yield nan:
RuntimeWarning: invalid value encountered in double_scalars
A couple of things I tried in editing faerun.py: changing the way diff is calculated to use np.nanmin and np.nanmax removes the runtime warning but still doesn't create the plot correctly. Similarly, trying to set the color for NaN values using cmap.set_bad(color='gray') doesn't work either.
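One approach that avoids passing NaN to the plotting code at all (an untested suggestion, not something confirmed to work with Faerun) is to split the samples into those with a measurement and those without, color the first group by value, and give the second group a fixed color as a separate series. The helpers below are hypothetical and use only the standard library.

```python
import math


def split_by_missing(values):
    """Return (indices with a value, indices where the value is NaN)."""
    present = [i for i, v in enumerate(values) if not math.isnan(v)]
    missing = [i for i, v in enumerate(values) if math.isnan(v)]
    return present, missing


def value_range(values):
    """Min and max over the non-NaN values, or None if every value is missing."""
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        return None
    return min(finite), max(finite)


clogp = [1.2, float("nan"), 3.4, float("nan"), 0.5]
present, missing = split_by_missing(clogp)
bounds = value_range(clogp)
```

The x and y coordinates for each group can then be picked with the two index lists, so the value-colored series never contains NaN.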
The wheels downloaded from the GitHub Actions do not contain arm64-format binaries for macOS, but locally built files, using the same process on both Intel and arm64 machines, do.
tmap_viz-*-macosx_*_arm64.whl
file _tmap.cpython-<X>-darwin.so (also run it on the other binaries)
The binary is in amd64 format instead of the requested arm64 format.
Hello,
I am trying to highlight different countries on the world map using tmap and I noticed that São Tomé and Príncipe in Africa is not listed. How can I add this so I can work on my project?
Many thanks
I am trying to install tmap. After installing through pip, I get the error
AttributeError: module 'tmap' has no attribute 'Minhash'
I see that this issue was previously raised in #22 (comment), but no answer was given. How can I properly install tmap?
Dear Reymond Group,
Thank you very much for the development and release of this great tool! As I started to look through some of the examples, I wondered if you might be able to make the processed data available for the worked examples. I'm guessing my data will be most similar in format/shape to the RNA Seq data, but I'm having some issues confirming that.
For instance, in the RNA Sequencing example, your input files are generically named ("data.csv.xz" and "labels.csv"):
DATA = pd.read_csv("data.csv.xz", index_col=0, sep=",")
LABELS = pd.read_csv("labels.csv", index_col=0, sep=",")
I see at the top of that file the data source is https://gdc.cancer.gov/about-data/publications/pancanatlas, but when I follow that URL, I'm not clear on which file in particular I should download and if there's any further processing required to get the "labels.csv."
Is it the EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv file (it's 1.88 GB in size)?
So, I'm wondering if you'd be able to host the processed example data files on your site? Or offer more info on their shape/format?
Thanks again,
Ian
Hello,
It would be nice if tmap could work from a distance matrix.
That is, what I want to map are not molecules, but I have the full distance matrix for the dataset.
Regards,
F.
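A possible route with the existing API, sketched under assumptions: turn the distance matrix into a sparse k-nearest-neighbour edge list and hand that to the edge-list layout shown in the Simple Graph example. The function below is hypothetical, assumes a symmetric matrix given as nested lists, and uses only the standard library; the final tmap call appears only as a comment because its exact signature is taken from that example rather than verified here.

```python
def knn_edges_from_distances(dist, k):
    """Build an undirected edge list (i, j, distance) linking each row to its k nearest rows."""
    n = len(dist)
    edges = set()
    for i in range(n):
        # Sort all other points by their distance to point i and keep the closest k.
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            a, b = (i, j) if i < j else (j, i)  # store each undirected edge once
            edges.add((a, b, dist[a][b]))
    return sorted(edges)


dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
edge_list = knn_edges_from_distances(dist, k=1)

# Assumed follow-up, based on the Simple Graph example:
#   x, y, s, t, _ = tm.layout_from_edge_list(len(dist), edge_list)
```

Choosing k is a trade-off: a small k keeps the graph sparse, but too small a value can leave the graph disconnected.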
When generating a tmap in a Jupyter notebook, the output is a single HTML file that includes the JavaScript used to plot the map. Oddly, when turning the same code into a script, it generates the .js and .html files separately. Is there an option to embed the JavaScript within the HTML file, so that only one file is created?
Just in case anyone has the same issue of getting empty results:
the solution is to remove the dot from your canvas's name.
For example, if you set 'myTmap.good' in
faerun.add_scatter( 'myTmap.good',..... )
Or in
point_helper='myTmap.good'
you will get empty results.
Changing 'myTmap.good' to 'myTmap_good' will solve it.
This is definitely a bug.
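Until this is fixed, names can be cleaned before they are passed on. The helper below is a hypothetical sketch: only the dot is reported as a problem above, so replacing every other character outside letters, digits and underscores is a cautious assumption rather than a known requirement.

```python
import re


def safe_series_name(name):
    """Replace every character that is not a letter, digit or underscore with '_'."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


name = safe_series_name("myTmap.good")
```

The cleaned name would then be used consistently in both places, the add_scatter call and point_helper.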
Repro: layout_from_lsh_forest with a zero-size LSHForest
The cause seems to be float max_x = *max_element(x.begin(), x.end()); where we dereference the invalid end-iterator returned by max_element when called with an empty container. This should crash more often; I've not looked into it properly, but because we dereference into a float we probably just get garbage data on the other platforms.
I have a fix for this code path and MSTFromLSHForest where we return a tuple of empty objects for an empty forest. The fix is on the development branch.
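The same guard pattern, illustrated in Python rather than C++ (an analogue for illustration only, not the actual fix on the development branch): check for the empty case first and return an empty result before asking for a minimum or maximum. In Python, max() on an empty sequence raises ValueError instead of reading invalid memory, but the early return is the same idea. The normalise function is hypothetical.

```python
def normalise(values):
    """Scale values to [0, 1]; an empty input yields an empty output."""
    if not values:
        return []  # early return: never call min()/max() on an empty sequence
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # all values identical
    return [(v - lo) / span for v in values]
```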
Description:
The codebase currently supports Python 3.9. I believe it's important to adapt the code to be compatible with newer versions, Python >=3.10.
Expected Behavior:
Code should be compatible with Python 3.10 without any errors or warnings.
The code to generate the ProteomeHD tmap example is incorrect; it is a duplicate of the code for the RNA seq example.