ma-compbio / phylo-hmrf

License: MIT License

Python 60.73% MATLAB 2.80% C++ 36.46%

3d-genome comparative-genomics gaussian-process machine-learning

phylo-hmrf's Issues

Error using example input

I tried to run the example input with the command
python phylo_hmrf.py -n 20 -p $input_dir -r 1 --reload 0 --chromvec 21,22 --miter 100
with $input_dir set to Phylo-HMRF-master/example_input.

But I got this error:

out of bound error! 1
out of bound error! 2
out of bound error! 3
ou_optimize_init likelihood -10.2005793894
number in the cluster 12511 8
out of bound error! 1
/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py:1257: RuntimeWarning: divide by zero encountered in divide
ratio1 = lambda1/(2*beta1)
/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py:1271: RuntimeWarning: invalid value encountered in double_scalars
values[i,1] = ratio1[i]*(1-beta1_exp[i]**2) + values[p_idx[i],1]*(beta1_exp[i]**2)
Traceback (most recent call last):
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 1760, in <module>
opts.simu_version, opts.annotation, opts.reload, opts.dtype, opts.miter, opts.resolution, opts.quantile, opts.ref_species, opts.output)
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 1738, in run
params_vec1, params_vec2, params_vecList, iter_id1, iter_id2, cost_vec, state_vec = tree1.fit_accumulate_test(samples, len_vec, threshold, filename, m_iter)
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/base.py", line 307, in fit_accumulate_test
self._init(X, lengths=lengths)
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 246, in _init
self.init_ou_params = self.init_ou_param(X1[0:select_num1], init_label[0:select_num1], self.means)
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 196, in _init_ou_param
cur_param, lik = self._ou_optimize_init(x1, mean_values[i])
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 1438, in _ou_optimize_init
flag, params1 = self._ou_optimize_init_unit(X, mean_values)
File "/home/cheyizhuo/project/cross_Fo_HiC/mycode/Phylo-HMRF-master/phylo_hmrf.py", line 1493, in _ou_optimize_init_unit
constraints=con1, tol=1e-6, options={'disp': False})
File "/home/cheyizhuo/miniconda3/envs/py27/lib/python2.7/site-packages/scipy/optimize/_minimize.py", line 458, in minimize
constraints, callback=callback, **options)
File "/home/cheyizhuo/miniconda3/envs/py27/lib/python2.7/site-packages/scipy/optimize/slsqp.py", line 370, in _minimize_slsqp
raise ValueError("Objective function must return a scalar")
ValueError: Objective function must return a scalar

Hi,
Thanks for developing this cool tool.
The description of the Hi-C input files is a bit vague. Would it be possible to provide detailed documentation and a small test dataset somewhere, so that users can make sure the scripts are working properly?

Thanks!

How do I know the correspondence between numbers and states?

Hi,
How can I relate the numbers to the states? For example, with 1-NC−pan_low and 2-NC−pyg_high, how can I know the correspondence?

Thanks!

random : input of edges and branch length as newick?

Hi,
I know this is a bit random, but I stumbled here when a colleague asked me for help to understand your input format.
I was able to help him. However, I was thinking that your edges.txt and branch.txt could advantageously be replaced by a single tree in Newick format, which is the accepted standard way of representing simple binary trees with branch lengths in the phylogenetic community.

Perhaps there are technical or model-based limitations that would prevent this?

Anyhow, I just wished to try to help you improve on your software which seems, otherwise, quite useful.

cheers.
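
The suggestion above can be sketched in Python. This is only an illustration, not part of Phylo-HMRF: it assumes each entry of edges.txt is a (parent, child) pair of node indices and that branch.txt gives one length per edge in the same order. The function `edges_to_newick` and the optional `names` mapping are hypothetical. The edge and branch lists used below are the ones printed in a log further down this page.

```python
from collections import defaultdict


def edges_to_newick(edges, branch_lengths, names=None):
    """Build a Newick string from (parent, child) edges and per-edge lengths.

    Assumption: edges[i] is a (parent, child) pair and branch_lengths[i]
    is the length of that edge. `names` optionally maps node index to label.
    """
    children = defaultdict(list)
    length = {}
    child_nodes = set()
    for (parent, child), bl in zip(edges, branch_lengths):
        children[parent].append(child)
        length[child] = bl
        child_nodes.add(child)

    # The root is the only parent that never appears as a child.
    roots = [p for p in children if p not in child_nodes]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))
    names = names or {}

    def render(node):
        label = names.get(node, "")
        if node in children:
            inner = ",".join(render(c) for c in children[node])
            text = "(" + inner + ")" + label
        else:
            # Unnamed leaves fall back to their node index.
            text = label or str(node)
        if node in length:
            text += ":%g" % length[node]
        return text

    return render(roots[0]) + ";"


edges = [[0, 1], [1, 2], [1, 3], [3, 4], [4, 5], [4, 6], [3, 7]]
branches = [0.0, 32.0, 20.0, 6.0, 6.0, 6.0, 12.0]
newick = edges_to_newick(edges, branches)
print(newick)
```

Going the other way (Newick to the two text files) would need a Newick parser, and node 0 here has a single child with a zero-length branch, so a round trip may not reproduce the original numbering.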

Hi-C contact frequency files - which genome to use?

Hi, thank you for developing the tool. I am confused as to which reference genome is being used when generating Hi-C contact frequency files of each species (to be stored in path_list.txt file).
"The normalized Hi-C contact frequency file on a chromosome could be extracted from the raw genome-wide Hi-C contact frequency file of the species using tools such as the Juicer Tools."
For the .hic file used to extract the Hi-C contact frequencies: is it generated by aligning the Hi-C sequencing reads to the reference genome, or to the genome of the species the reads came from (non-reference)?

Can Phylo-HMRF do pairwise comparison?

Hi,
Phylo-HMRF takes a very elegant approach! I would like to try it in my research.
I have a question, though: can Phylo-HMRF compare just two species, and if so, how can I do it?

cheers.

Error running on example input

Hello,

I'm trying to run phyloHMRF on the example data provided in the repo with the example command provided:

python phylo_hmrf.py -n 20 -r 1 --reload 0 --chromvec 21,22 --miter 100 -p example_input/

However, I get the following error:

NameError: global name 'filename3' is not defined

Thank you,
Kathleen

About the synteny block

Hi,
Thanks for developing this wonderful tool.
I notice that this tool is based on synteny blocks between different species, and that these synteny blocks are identified by inferCARs (Ma et al., 2006, http://www.bx.psu.edu/miller_lab/car/). I know this software was also developed by Prof. Ma. Did you use the original version of inferCARs or an updated version? I ask because I find it difficult to obtain the input files for the original inferCARs. For example, could you please tell me how you obtained the net and chain files for each chromosome when identifying synteny blocks between human and chimpanzee (hg38 vs panTro5)? I can only find hg38.panTro5.all.chain.gz and hg38.panTro5.net.gz on http://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsPanTro5/, but inferCARs seems to require per-chromosome files such as chr1.chain, chr1.net, chr2.chain, chr2.net, etc.

Thanks for your help and I'm looking forward to your reply!
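
For the chain files, one possible workaround is to split the genome-wide file by reference chromosome. The sketch below is written for Python 3 and relies only on the UCSC chain format, where each chain starts with a header line `chain score tName tSize ...` followed by its alignment lines. The function name `split_chain_by_target` is hypothetical, and whether inferCARs accepts files produced this way is an assumption that has not been checked here. Net files have a different, nested format and are not handled.

```python
import gzip
import os


def split_chain_by_target(chain_path, out_dir):
    """Write one <tName>.chain file per reference (target) sequence.

    Returns the sorted list of target names that were seen.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    opener = gzip.open if chain_path.endswith(".gz") else open
    handles = {}
    current = None
    try:
        with opener(chain_path, "rt") as src:
            for line in src:
                if line.startswith("chain "):
                    # Header fields: chain score tName tSize tStrand ...
                    t_name = line.split()[2]
                    if t_name not in handles:
                        out_path = os.path.join(out_dir, t_name + ".chain")
                        handles[t_name] = open(out_path, "w")
                    current = handles[t_name]
                # Lines before the first header (e.g. comments) are skipped.
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

A call such as `split_chain_by_target("hg38.panTro5.all.chain.gz", "chains")` would then produce chains/chr1.chain, chains/chr2.chain, and so on. The function keeps one output file open per target sequence, so an assembly with very many scaffolds could hit the open-file limit of the system.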

read_state_test.m error

I got this error message during the MATLAB stage:
Undefined function or variable 'color_map_sub'.
Error in read_state_test (line 29)
What do you think the reason for this error is? I would appreciate it very much if you could help me! Thanks.

Conda easy-ish installation guide

Dear Yang Yang,

This may be a bit random of an issue, but I thought I would like to simply share the following with the Phylo-HMRF community.

First of all, I honestly believe that Phylo-HMRF is a very nice tool, with a great potential. Congrats on the work and the idea!

I've spent a couple of days struggling to install the package and trying to make it work, thus I thought that it could be worth sharing this modest tutorial install guide... Not sure if it will be a universal solution, but it worked in our Linux cluster and could also serve as an inspiration. ;)

I thought that the easiest would be to put it into a conda environment. Note that some package versions are not exactly the same as the ones Yang Yang used because they were not available through conda and I wanted to stick to it as much as possible, but the test run with the example_input provided by the authors worked perfectly (~30 minutes on an 8-CPU, 24 GB machine).

# Download packages and setup a new conda environment
conda create -n phyloHMRF # create a new conda environment
conda activate phyloHMRF # activate it!
conda install python=2.7
conda install -c anaconda scikit-learn=0.19.0
conda install -c anaconda pandas=0.20.3
pip install medpy==0.3.0 # This had to be installed through pip
# conda install -c conda-forge scikit-image=0.12.3  # Not working; version not available.
# Substitute with:
pip install scikit-image==0.12.3 # that should do it :)

# Getting Phylo-HMRF
cd ~/software/ # Make or go to a directory where you would like to put the Phylo-HMRF package
git clone https://github.com/ma-compbio/Phylo-HMRF.git # git clone this very same repo!

# Installing the Python wrapper for pygco:
wget https://github.com/yujiali/pygco/archive/refs/heads/master.zip # Download the wrapper
unzip master.zip 
rm master.zip # clean

# As pointed out by the authors of Phylo-HMRF, the original source binary files from `pygco` are not present in repo above. 
# Luckily, Yang et al., have stored a copy of the files under Phylo-HMRF/gco_source/
# We can simply transfer those files to the pygco repo downloaded earlier, like this:
cp ~/software/Phylo-HMRF/gco_source/* ~/software/pygco-master/gco_source/

# Now we have to compile those pygco libs:
cd ~/software/pygco-master/
make all
make test_wrapper

# Add the current folder to your  $LD_LIBRARY_PATH, and also to the $PYTHONPATH
# You can add these lines to your bashrc init file 
# Substitute the path for your own path, so that the program can find those pygco libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/people/juarod/software/pygco-master/ 
# The same for python to find the python wrapper functions 
export PYTHONPATH=$PYTHONPATH:/home/people/juarod/software/pygco-master/

# TEST THE BINARY OF THE WRAPPER
./test_wrapper
# You should see the following output:
# labels = [ 0 2 2 1 ], energy=19
# data energy=15, smooth energy=4

# NOW TEST THE PYTHON WRAPPER
# Uncomment the last lines in test.py invoking the __main__ call and run it like:
python test.py
# You should see no errors and no output. Some images should also have been generated in the images/ folder.

# TIME TO TEST RUN Phylo-HMRF!:
cd /home/people/juarod/software/Phylo-HMRF # Go to your Phylo-HMRF downloaded folder
conda activate phyloHMRF # activate the conda environment (if you have closed the session, or if you are working on another tab)
# Run the program!
python phylo_hmrf.py -n 20 -r 1 --reload 0 --chromvec 21,22 --miter 100 -p /home/people/juarod/software/Phylo-HMRF/example_input
# It is advisable to add the full path to the example_input folder, with the `-p` option, so that the program can always find the input files
# … And that should be all!

Hope someone finds it useful!
I am also attaching the yaml file, obtained from my work environment (seems like *.yaml extension files can't be uploaded in here, so just change the extension to that.)
phyloHMRF.txt

Thank you so much, Yang!

Cheers,
Juan

AttributeError: 'module' object has no attribute 'cut_general_graph'

Hello,

I am having trouble with the pygco installation. I tried all of the suggested installation methods, including pip install gco-wrapper, which appeared to work. I installed the wrapper with the following:
pip install -r requirements.txt
python setup.py install
and this said it installed the dependencies successfully.

However, when I run the example for phylohmrf I keep getting the attached error.
Command:
python phylo_hmrf.py -n 20 -r 1 --reload 0 --chromvec 21,22 --miter 100 -p ./example_input/
phylohmrf_gco_error.txt

Any feedback on how to solve this issue would be appreciated.

Thank you,
Nicole
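
One way to narrow down an error like this is to check which `pygco` module Python actually imports and whether it exposes the function named in the error. The sketch below is a generic diagnostic: `describe_module` is a hypothetical helper, and the only names taken from this page are the module name `pygco` and the attribute `cut_general_graph`.

```python
import importlib


def describe_module(name, required=()):
    """Report where a module was imported from and which attributes it lacks."""
    try:
        module = importlib.import_module(name)
    except (ImportError, OSError) as exc:
        # OSError covers wrappers that fail while loading a shared library.
        return {"name": name, "found": False, "error": str(exc)}
    return {
        "name": name,
        "found": True,
        "path": getattr(module, "__file__", None),
        "missing": [attr for attr in required if not hasattr(module, attr)],
    }


report = describe_module("pygco", required=["cut_general_graph"])
print(report)
```

If `path` points somewhere other than the wrapper directory that was compiled and added to PYTHONPATH, a different package with the same module name may be shadowing it. That is a possibility to check, not a confirmed diagnosis of this report.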

Compatibility with python v3?

Hello,

Thanks for developing this cool tool! I noticed it is tested on Python 2.7.15. Is it also compatible with Python 3? If not, are there plans to add that compatibility, given that Python 2.7 is being deprecated soon?

Thanks!
Kathleen

Error on the example input "AttributeError: 'bool' object has no attribute 'keys'"

Hi,

When I'm trying to test the code with the example data in the "Phylo-HMRF/example_input" directory with the command:
python ../phylo_hmrf.py -n 20 -r 1 --reload 0 --chromvec 21,22 --miter 100 ./

I got the following error messages:

estimate type 0
edge list loaded
[[0, 1], [1, 2], [1, 3], [3, 4], [4, 5], [4, 6], [3, 7]]
branch list loaded
[0.0, 32.0, 20.0, 6.0, 6.0, 6.0, 12.0]
species names loaded
gorGor4
path list loaded
['example_input/test_data/hic_gorGor4', 'example_input/test_data/hic_panTro5', 'example_input/test_data/hic_panPan2', 'example_input/test_data/hic_hg38']
21
/jdfssz1/ST_EARTH/P18Z10200N0122/zhouyang/software/Phylo-HMRF/utility.py:2575: FutureWarning: read_table is deprecated, use read_csv instead.
  data1 = pd.read_table(filename1,header=None,sep='\t')
934.0
File example_input/test_data/hic_gorGor4/chr21.50K.txt does not exist. Please check.
Traceback (most recent call last):
  File "../phylo_hmrf.py", line 1761, in <module>
    opts.simu_version, opts.annotation, opts.reload, opts.dtype, opts.miter, opts.resolution, opts.quantile, opts.ref_species, opts.output)
  File "../phylo_hmrf.py", line 1661, in run
    m_vec_list = utility.quantile_contact_vec(chrom_vec,resolution,ref_filename,filename_list,species)
  File "/jdfssz1/ST_EARTH/P18Z10200N0122/zhouyang/software/Phylo-HMRF/utility.py", line 2468, in quantile_contact_vec
    m_vec = quantile_contact(chrom,resolution,ref_filename,filename_list, species)
  File "/jdfssz1/ST_EARTH/P18Z10200N0122/zhouyang/software/Phylo-HMRF/utility.py", line 2482, in quantile_contact
    keys_vec = list(vec1.keys())
AttributeError: 'bool' object has no attribute 'keys'

Could you help look into it?

Besides, I'm planning to use the software to investigate Hi-C changes across a wider range of species. However, when the Hi-C reads are mapped to human (as the reference), species close to human have a higher mapping rate and more Hi-C contact reads, while species distant from human have fewer. I'm wondering whether that matters, and what maximum phylogenetic distance (e.g. 30 million years) you would recommend including when using the software.

Many thanks

Best,
Yang
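
The "does not exist" message in the log above shows a path relative to the repository root being looked up from a different working directory. A quick pre-flight check can list such files before a run. This is a minimal sketch with assumptions: `missing_contact_files` is a hypothetical helper, path_list.txt is assumed to hold one directory per line, and the file name pattern `chr<N>.<resolution>.txt` is inferred only from the `chr21.50K.txt` name in the log.

```python
import os


def missing_contact_files(path_list_file, chroms, resolution_tag="50K", base_dir=None):
    """Return the expected contact files that cannot be found.

    Relative directories in path_list_file are resolved against base_dir
    (default: the current working directory); absolute ones are used as-is.
    """
    base_dir = base_dir or os.getcwd()
    with open(path_list_file) as fh:
        species_dirs = [line.strip() for line in fh if line.strip()]
    missing = []
    for species_dir in species_dirs:
        for chrom in chroms:
            file_name = "chr%s.%s.txt" % (chrom, resolution_tag)
            candidate = os.path.join(base_dir, species_dir, file_name)
            if not os.path.isfile(candidate):
                missing.append(candidate)
    return missing
```

Calling `missing_contact_files("example_input/path_list.txt", [21, 22])` from the repository root and again from inside example_input/ would show whether the working directory is what makes the files unreachable.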

ValueError: shape mismatch

Hello,

I have created all of the files needed to run phylo-hmrf, with the correct formatting and file names. I am not using hg38, so I commented out lines 387-393 in utility.py; please let me know if this is not the correct approach. I also tried running the software without commenting out these lines, and the same error resulted.

The software starts running and then fails with the error in the attached file. Any feedback about what might be causing this error would be greatly appreciated.

Thank you,
Nicole

I ran the following command:
python phylo_hmrf.py -r 1 --resolution 5000 --chromvec 1 --ref_species dmel -p ./dmel_phylo/

phylohmrf_error.txt
