higashi's Introduction

Higashi: Multiscale and integrative scHi-C analysis


https://doi.org/10.1038/s41587-021-01034-y

As a computational framework for scHi-C analysis, Higashi has the following features:

  • Higashi represents a scHi-C dataset as a hypergraph
    • Each cell and each genomic bin is represented as a node (cell nodes and genomic-bin nodes).
    • Each non-zero entry in a single-cell contact map is modeled as a hyperedge connecting one cell node and two genomic-bin nodes.
    • The read count of each chromatin interaction serves as the attribute of the corresponding hyperedge.
  • Higashi uses a hypergraph neural network to unveil high-order interaction patterns within this constructed hypergraph.
  • Higashi can produce embeddings of the scHi-C data for downstream analysis.
  • Higashi can impute single-cell Hi-C contact maps, enabling detailed characterization of 3D genome features such as TAD-like domain boundaries and A/B compartment scores at single-cell resolution.
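As an illustration, the hypergraph construction above can be sketched as follows (toy data; the names and structures here are ours, not Higashi's internal API):

```python
# Sketch of the scHi-C-as-hypergraph idea (illustrative only, not Higashi code).
# Each non-zero contact-map entry becomes a hyperedge connecting one cell node
# with the two genomic-bin nodes it involves; the read count is the hyperedge
# attribute.
contacts = [
    # (cell_id, bin_i, bin_j, read_count)
    (0, 10, 12, 3),
    (0, 10, 45, 1),
    (1, 10, 12, 5),
]

hyperedges = [
    {"nodes": (f"cell_{c}", f"bin_{i}", f"bin_{j}"), "weight": w}
    for c, i, j, w in contacts
]

print(len(hyperedges))  # one hyperedge per non-zero entry -> 3
```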

figs/Overview.png

Installation

Fast-Higashi is now available on conda:

conda install -c ruochiz fasthigashi

Conda support for Higashi itself is still an ongoing effort. Currently, you can install it from source:

git clone https://github.com/ma-compbio/Higashi/
cd Higashi
python setup.py install

It is recommended to install PyTorch (with CUDA support when applicable) after installing higashi / fast-higashi.

Documentation

Please see the wiki for extensive documentation and example tutorials.

Higashi is constantly being updated; see the change log for the update history.

Cite

Cite our paper with:

@article{Zhang2020multiscale,
	author = {Zhang, Ruochi and Zhou, Tianming and Ma, Jian},
	title = {Multiscale and integrative single-cell Hi-C analysis with Higashi},
	journal = {Nature Biotechnology},
	publisher = {Nature Publishing Group},
	year = {2021}
}


See also

Fast-Higashi, for more efficient and robust scHi-C embeddings: https://www.cell.com/cell-systems/fulltext/S2405-4712(22)00395-7

https://github.com/ma-compbio/Fast-Higashi

Contact

Please contact [email protected] or raise an issue in the GitHub repo with any questions about installation or usage.

higashi's People

Contributors

ma-compbio, ruochiz


higashi's Issues

About parameter k in JSON file

Hi, sorry to ask another question. In your description, you said: "Train Higashi with cell-dependent GNN, but with k=k in the configuration JSON. When {START_STEP} is 1, the program would execute steps 1, 2, 3 sequentially." However, I didn't see k in the JSON-file tutorial. What do I need to input if I only want the first step?

By the way, I noticed that the code runs at least twice, once with 20 epochs and once with 120 epochs (there may be more afterwards, but I didn't continue). Are these steps 1, 2, and 3?

EDIT:
After reading the code, I think the second question is answered; the first one is still unclear. One more question: can I change the number of epochs? The results are not changing much on my dataset. In your experience, is this specific to my dataset, or does the accuracy generally not change much across epochs?

Higashi stops at train_for_imputation_nbr_0 in both the API and the CLI

Hi, ruochi! Higashi runs smoothly until it reaches training for imputation, where it hangs with no error or warning.
(screenshots of the stalled run attached)
And here is my Higashi environment:
Package Version


asciitree 0.3.3
asttokens 2.0.5
attrs 21.4.0
backcall 0.2.0
bleach 5.0.0
bokeh 3.0.0.dev5
brotlipy 0.7.0
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.4
click 8.1.2
cooler 0.8.11
cryptography 36.0.0
cycler 0.11.0
Cython 3.0.0a10
cytoolz 0.10.1
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.4
entrypoints 0.4
executing 0.8.3
fastjsonschema 2.15.3
fbpca 1.0
fonttools 4.33.2
h5py 3.6.0
higashi 0.1.0a0
idna 3.3
importlib-metadata 4.11.3
importlib-resources 5.7.1
ipykernel 6.9.1
ipython 8.2.0
ipython-genutils 0.2.0
ipywidgets 7.7.0
jedi 0.18.1
Jinja2 3.1.1
joblib 1.1.0
jsonschema 4.4.0
jupyter-client 7.2.2
jupyter-core 4.9.2
jupyterlab-widgets 1.1.0
kiwisolver 1.4.2
llvmlite 0.38.0
MarkupSafe 2.0.1
matplotlib 3.5.1
matplotlib-inline 0.1.2
mistune 0.8.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
multiprocess 0.70.12.2
nbconvert 5.6.1
nbformat 5.3.0
nest-asyncio 1.5.5
notebook 5.7.11
numba 0.55.1
numpy 1.21.5
packaging 21.3
pandas 1.3.4
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.0.1
pip 21.2.4
prometheus-client 0.14.1
prompt-toolkit 3.0.20
ptyprocess 0.7.0
pure-eval 0.2.2
pycparser 2.21
pyfaidx 0.6.4
Pygments 2.11.2
pynndescent 0.5.6
pyOpenSSL 22.0.0
pypairix 0.3.7
pyparsing 3.0.8
pyrsistent 0.18.0
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2022.1
PyYAML 6.0
pyzmq 22.3.0
requests 2.27.1
scikit-learn 1.0.2
scipy 1.7.3
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 61.2.0
simplejson 3.17.6
six 1.16.0
stack-data 0.2.0
terminado 0.13.3
testpath 0.6.0
threadpoolctl 3.1.0
toolz 0.11.2
torch 1.11.0
torchaudio 0.11.0
torchvision 0.12.0
tornado 6.1
tqdm 4.64.0
traitlets 5.1.1
typing_extensions 4.1.1
umap-learn 0.5.3
urllib3 1.26.9
wcwidth 0.2.5
webencodings 0.5.1
wheel 0.37.1
widgetsnbextension 3.6.0
xyzservices 2022.4.0
zipp 3.8.0

Predicting structures from embedding vector

Hi ruochiz,

Thank you for your wonderful tool. I started using the original Higashi API (not Fast-Higashi).
I followed the Ramani et al. ipynb tutorial and it runs smoothly.
After training, I want to generate predicted contact matrices for new samples.
Is there a way to feed the coordinates of an embedding vector to the higashi_model object to generate the corresponding contact matrices (virtual 'raw', 'impute no nbr' and 'impute with nbr' matrices)?

Donald

About parameter neighbor_num

Hi, I'm a little confused about the parameter neighbor_num in the JSON file. neighbor_num should control the number of neighboring cells to incorporate when making the imputation. The wiki in Higashi-Usage, Step 3, says:

  1. Train Higashi with cell-dependent GNN, but with k=0

but the wiki in Output-of-Higashi-main says:

The {k} can be either 1 or the {neighbor_num} parameter specified in the configuration file. When {k}=1, it represents the imputation results without using any neighboring cell information.

which does not agree with the Higashi-Usage wiki or the paper.
Is the wording in the Output-of-Higashi-main section a mistake? Can I take the

{chrom_name}_{embedding_name}_nbr_1_impute.hdf5

file I get to be the "0 nbr" result described in the paper?

Another question: when nbr=0, does the hypergraph built for the GNN use only the information from a single cell when imputing that cell?

Thanks!

Error in fh_model.prep_dataset() "Pack from sparse mtx to tensors"

Hi, @ruochiz
I ran into a problem when packing data from sparse matrices to sparse tensors. What can I do to solve it? (In fact, I hit another problem before (KeyError: "batch id"), so I commented out the "batch_id": "batch id" entry when constructing config_info.)

data.txt was downloaded from https://drive.google.com/drive/u/0/folders/1SuzqQ_9dliAmTb-fGprFnN3aZrfWS-Fg

Script:

workdir = "/data/home/ruanlab/xiongguangzhou/01.Clustering/02.FastHigashi/"
datadir = workdir  # note: `datadir` is used below but was not defined in the original post

import pickle
import numpy as np

label_info = {'name': np.arange(1000), 'age': np.ones(1000)}
pickle.dump(label_info, open(datadir + "label_info.pickle", "wb"))

import pandas as pd
data = pd.read_csv(datadir+"data.txt", sep = "\t", nrows = 5)

config = workdir+"config_m3c_pfc_500Kb.JSON"
config_info = {
    "data_dir": datadir,
    "structured": True,
    "input_format": 'higashi_v1',
    "temp_dir": datadir+"Temp",
    "genome_reference_path": datadir+"hg19.chrom.sizes.txt",
    "cytoband_path": datadir+"cytoBand_hg19.txt",
    "chrom_list": ["chr1", "chr2", "chr3", "chr4", "chr5", 
                   "chr6", "chr7", "chr8", "chr9", "chr10", 
                   "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
                   "chr17", "chr18", "chr19", "chr20", "chr21", "chr22"],
    "resolution": 500000,
    "resolution_cell": 500000,
    "resolution_fh": [500000],
    "embedding_name": "test",
    "minimum_distance": 500000,
    "maximum_distance": -1,
    "local_transfer_range": 0,
    "loss_mode": "zinb",
    "dimensions": 128,
    "impute_list": ["chr1", "chr2", "chr3", "chr4", "chr5", 
                   "chr6", "chr7", "chr8", "chr9", "chr10", 
                   "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
                   "chr17", "chr18", "chr19", "chr20", "chr21", "chr22"],
    "neighbor_num": 5,
    "cpu_num": 10,
    "gpu_num": 8,
    #"batch_id": "batch id",
    "embedding_epoch": 60,
    "correct_be_impute": True
}

import json
with open(config, "w") as f:
    json.dump(config_info, f, indent = 6)

from higashi.Higashi_wrapper import *
from fasthigashi.FastHigashi_Wrapper import *

higashi_model = Higashi(config)
higashi_model.process_data()

fh_model = FastHigashi(config_path = config,
                       path2input_cache = datadir+"Temp",
                       path2result_dir = workdir, 
                       off_diag = 100,
                       filter = False,
                       do_conv = False,
                       do_rwr = False, 
                       do_col = False,
                       no_col = False)

fh_model.prep_dataset(batch_norm = False)

I have tried both fh_model.prep_dataset() and fh_model.prep_dataset(batch_norm = False).

Error:
(screenshot of the traceback attached)

Version:

List of packages in environment: "/data/home/ruanlab/xiongguangzhou/software/micromamba/mambaforge/envs/fasthigashi"

  Name                 Version       Build                        Channel           
──────────────────────────────────────────────────────────────────────────────────────
  _libgcc_mutex        0.1           conda_forge                  conda-forge       
  _openmp_mutex        4.5           2_gnu                        conda-forge       
  abseil-cpp           20211102.0    hd4dd3e8_0                   anaconda/pkgs/main
  arrow-cpp            8.0.0         py39h60b952e_0               anaconda/pkgs/main
  asciitree            0.3.3         py_2                         anaconda/pkgs/main
  aws-c-common         0.4.57        he6710b0_1                   anaconda/pkgs/main
  aws-c-event-stream   0.1.6         h2531618_5                   anaconda/pkgs/main
  aws-checksums        0.1.9         he6710b0_0                   anaconda/pkgs/main
  aws-sdk-cpp          1.8.185       hce553d0_0                   anaconda/pkgs/main
  bedtools             2.30.0        h7d7f7ad_2                   bioconda          
  biopython            1.78          py39h7f8727e_0               anaconda/pkgs/main
  blas                 1.0           mkl                          anaconda/pkgs/main
  bokeh                3.2.1         py39h2f386ee_0               anaconda/pkgs/main
  boost-cpp            1.73.0        h7f8727e_12                  anaconda/pkgs/main
  bottleneck           1.3.7         py39h389d5f1_0               conda-forge       
  brotli               1.0.9         he6710b0_2                   anaconda/pkgs/main
  brotli-python        1.0.9         py39h5a03fae_9               conda-forge       
  bzip2                1.0.8         h7b6447c_0                   anaconda/pkgs/main
  c-ares               1.19.0        h5eee18b_0                   anaconda/pkgs/main
  ca-certificates      2023.05.30    h06a4308_0                   anaconda/pkgs/main
  certifi              2023.7.22     pyhd8ed1ab_0                 conda-forge       
  cffi                 1.15.1        py39h5eee18b_3               anaconda/pkgs/main
  charset-normalizer   3.2.0         pyhd8ed1ab_0                 conda-forge       
  click                8.0.4         py39h06a4308_0               anaconda/pkgs/main
  cloudpickle          2.2.1         py39h06a4308_0               anaconda/pkgs/main
  contourpy            1.0.5         py39hdb19cb5_0               anaconda/pkgs/main
  cooler               0.9.2         pyh7cba7a3_0                 bioconda          
  cudatoolkit          10.2.89       hfd86e86_1                   anaconda/pkgs/main
  curl                 7.88.1        h37d81fd_2                   anaconda/pkgs/main
  cycler               0.11.0        pyhd3eb1b0_0                 anaconda/pkgs/main
  cython               3.0.0         py39h5eee18b_0               anaconda/pkgs/main
  cytoolz              0.12.0        py39h5eee18b_0               anaconda/pkgs/main
  dask                 2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  dask-core            2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  dbus                 1.13.18       hb2f20db_0                   anaconda/pkgs/main
  dill                 0.3.6         py39h06a4308_0               anaconda/pkgs/main
  distributed          2023.6.0      py39h06a4308_0               anaconda/pkgs/main
  expat                2.4.9         h6a678d5_0                   anaconda/pkgs/main
  fasthigashi          0.1.1         py_0                         ruochiz           
  fbpca                1.0           py_0                         conda-forge       
  filelock             3.9.0         py39h06a4308_0               anaconda/pkgs/main
  fontconfig           2.14.1        hef1e5e3_0                   anaconda/pkgs/main
  fonttools            4.25.0        pyhd3eb1b0_0                 anaconda/pkgs/main
  freetype             2.12.1        h4a9f257_0                   anaconda/pkgs/main
  fsspec               2023.4.0      py39h06a4308_0               anaconda/pkgs/main
  gflags               2.2.2         he6710b0_0                   anaconda/pkgs/main
  giflib               5.2.1         h5eee18b_3                   anaconda/pkgs/main
  glib                 2.69.1        he621ea3_2                   anaconda/pkgs/main
  glog                 0.5.0         h2531618_0                   anaconda/pkgs/main
  gmp                  6.2.1         h295c915_3                   anaconda/pkgs/main
  gmpy2                2.1.2         py39heeb90bb_0               anaconda/pkgs/main
  grpc-cpp             1.46.1        h33aed49_1                   anaconda/pkgs/main
  gst-plugins-base     1.14.1        h6a678d5_1                   anaconda/pkgs/main
  gstreamer            1.14.1        h5eee18b_1                   anaconda/pkgs/main
  h5py                 3.7.0         py39h737f45e_0               anaconda/pkgs/main
  hdf5                 1.10.6        h3ffc7dd_1                   anaconda/pkgs/main
  heapdict             1.0.1         pyhd3eb1b0_0                 anaconda/pkgs/main
  higashi              0.1.1a1       py_0                         ruochiz           
  htslib               1.14          h9093b5e_0                   bioconda          
  icu                  58.2          he6710b0_3                   anaconda/pkgs/main
  idna                 3.4           pyhd8ed1ab_0                 conda-forge       
  importlib-metadata   6.0.0         py39h06a4308_0               anaconda/pkgs/main
  importlib_resources  5.2.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  intel-openmp         2021.4.0      h06a4308_3561                anaconda/pkgs/main
  jinja2               3.1.2         py39h06a4308_0               anaconda/pkgs/main
  joblib               1.2.0         py39h06a4308_0               anaconda/pkgs/main
  jpeg                 9e            h5eee18b_1                   anaconda/pkgs/main
  keyutils             1.6.1         h166bdaf_0                   conda-forge       
  kiwisolver           1.4.4         py39h6a678d5_0               anaconda/pkgs/main
  krb5                 1.20.1        h568e23c_1                   anaconda/pkgs/main
  lcms2                2.12          h3be6417_0                   anaconda/pkgs/main
  ld_impl_linux-64     2.38          h1181459_1                   anaconda/pkgs/main
  libblas              3.9.0         12_linux64_mkl               conda-forge       
  libboost             1.73.0        h28710b8_12                  anaconda/pkgs/main
  libcblas             3.9.0         12_linux64_mkl               conda-forge       
  libclang             10.0.1        default_hb85057a_2           anaconda/pkgs/main
  libcurl              7.88.1        h91b91d3_2                   anaconda/pkgs/main
  libdeflate           1.7           h27cfd23_5                   anaconda/pkgs/main
  libedit              3.1.20221030  h5eee18b_0                   anaconda/pkgs/main
  libev                4.33          h7f8727e_1                   anaconda/pkgs/main
  libevent             2.1.12        h8f2d780_0                   anaconda/pkgs/main
  libffi               3.4.4         h6a678d5_0                   anaconda/pkgs/main
  libgcc-ng            12.2.0        h65d4601_19                  conda-forge       
  libgfortran-ng       13.1.0        h69a702a_0                   conda-forge       
  libgfortran5         13.1.0        h15d22d2_0                   conda-forge       
  libgomp              12.2.0        h65d4601_19                  conda-forge       
  liblapack            3.9.0         12_linux64_mkl               conda-forge       
  libllvm10            10.0.1        hbcb73fb_5                   anaconda/pkgs/main
  libnghttp2           1.52.0        ha637b67_1                   anaconda/pkgs/main
  libnsl               2.0.0         h7f98852_0                   conda-forge       
  libopenblas          0.3.21        h043d6bf_0                   anaconda/pkgs/main
  libpng               1.6.37        hbc83047_0                   anaconda/pkgs/main
  libpq                12.15         h37d81fd_1                   anaconda/pkgs/main
  libprotobuf          3.20.3        he621ea3_0                   anaconda/pkgs/main
  libsqlite            3.42.0        h2797004_0                   conda-forge       
  libssh2              1.10.0        h37d81fd_2                   anaconda/pkgs/main
  libstdcxx-ng         12.2.0        h46fd767_19                  conda-forge       
  libthrift            0.15.0        h0d84882_2                   anaconda/pkgs/main
  libtiff              4.2.0         hecacb30_2                   anaconda/pkgs/main
  libuuid              2.38.1        h0b41bf4_0                   conda-forge       
  libwebp              1.2.4         h11a3e52_1                   anaconda/pkgs/main
  libwebp-base         1.2.4         h5eee18b_1                   anaconda/pkgs/main
  libxcb               1.15          h7f8727e_0                   anaconda/pkgs/main
  libxkbcommon         1.0.1         hfa300c1_0                   anaconda/pkgs/main
  libxml2              2.9.14        h74e7548_0                   anaconda/pkgs/main
  libxslt              1.1.35        h4e12654_0                   anaconda/pkgs/main
  libzlib              1.2.13        h166bdaf_4                   conda-forge       
  llvmlite             0.36.0        py39h612dafd_4               anaconda/pkgs/main
  locket               1.0.0         py39h06a4308_0               anaconda/pkgs/main
  lz4                  4.3.2         py39h5eee18b_0               anaconda/pkgs/main
  lz4-c                1.9.4         h6a678d5_0                   anaconda/pkgs/main
  markupsafe           2.1.1         py39h7f8727e_0               anaconda/pkgs/main
  matplotlib           3.7.0         py39h06a4308_0               anaconda/pkgs/main
  matplotlib-base      3.7.0         py39h417a72b_0               anaconda/pkgs/main
  mkl                  2021.4.0      h06a4308_640                 anaconda/pkgs/main
  mkl-service          2.4.0         py39h7e14d7c_0               conda-forge       
  mkl_fft              1.3.1         py39h0c7bc48_1               conda-forge       
  mkl_random           1.2.2         py39hde0f152_0               conda-forge       
  mpc                  1.1.0         h10f8cd9_1                   anaconda/pkgs/main
  mpfr                 4.0.2         hb69a4c5_1                   anaconda/pkgs/main
  mpmath               1.3.0         py39h06a4308_0               anaconda/pkgs/main
  msgpack-python       1.0.3         py39hd09550d_0               anaconda/pkgs/main
  multiprocess         0.70.14       py39h06a4308_0               anaconda/pkgs/main
  munkres              1.1.4         py_0                         anaconda/pkgs/main
  ncurses              6.4           h6a678d5_0                   anaconda/pkgs/main
  networkx             3.1           py39h06a4308_0               anaconda/pkgs/main
  ninja                1.10.2        h06a4308_5                   anaconda/pkgs/main
  ninja-base           1.10.2        hd09550d_5                   anaconda/pkgs/main
  nspr                 4.35          h6a678d5_0                   anaconda/pkgs/main
  nss                  3.89.1        h6a678d5_0                   anaconda/pkgs/main
  numba                0.53.1        py39ha9443f7_0               anaconda/pkgs/main
  numexpr              2.8.4         py39he184ba9_0               anaconda/pkgs/main
  numpy                1.24.3        py39h14f4228_0               anaconda/pkgs/main
  numpy-base           1.24.3        py39h31eccc5_0               anaconda/pkgs/main
  openssl              1.1.1v        h7f8727e_0                   anaconda/pkgs/main
  opt_einsum           3.3.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  orc                  1.7.4         hb3bc3d3_1                   anaconda/pkgs/main
  packaging            23.1          pyhd8ed1ab_0                 conda-forge       
  pairix               0.3.7         py39h3d4b85c_5               bioconda          
  pandas               2.0.3         py39h40cae4c_1               conda-forge       
  partd                1.2.0         pyhd3eb1b0_1                 anaconda/pkgs/main
  pcre                 8.45          h295c915_0                   anaconda/pkgs/main
  pillow               9.4.0         py39h6a678d5_0               anaconda/pkgs/main
  pip                  23.2.1        pyhd8ed1ab_0                 conda-forge       
  platformdirs         3.10.0        pyhd8ed1ab_0                 conda-forge       
  ply                  3.11          py39h06a4308_0               anaconda/pkgs/main
  pooch                1.7.0         pyha770c72_3                 conda-forge       
  psutil               5.9.0         py39h5eee18b_0               anaconda/pkgs/main
  pyarrow              8.0.0         py39h992f0b0_0               anaconda/pkgs/main
  pybedtools           0.9.0         py39hd65a603_2               bioconda          
  pycparser            2.21          pyhd3eb1b0_0                 anaconda/pkgs/main
  pyfaidx              0.7.2.1       pyh7cba7a3_1                 bioconda          
  pynndescent          0.5.10        py39h06a4308_0               anaconda/pkgs/main
  pyparsing            3.0.9         py39h06a4308_0               anaconda/pkgs/main
  pyqt                 5.15.7        py39h6a678d5_1               anaconda/pkgs/main
  pyqt5-sip            12.11.0       py39h6a678d5_1               anaconda/pkgs/main
  pysam                0.17.0        py39h051187c_0               bioconda          
  pysocks              1.7.1         pyha2e5f31_6                 conda-forge       
  python               3.9.16        h7a1cb2a_2                   anaconda/pkgs/main
  python-dateutil      2.8.2         pyhd3eb1b0_0                 anaconda/pkgs/main
  python-lmdb          1.4.1         py39h6a678d5_0               anaconda/pkgs/main
  python-tzdata        2023.3        pyhd8ed1ab_0                 conda-forge       
  python_abi           3.9           1_cp39                       conda-forge       
  pytorch              1.8.0         py3.9_cuda10.2_cudnn7.6.5_0  pytorch           
  pytz                 2023.3        pyhd8ed1ab_0                 conda-forge       
  pyvcf3               1.0.3         pyhdfd78af_0                 bioconda          
  pyyaml               6.0           py39h5eee18b_1               anaconda/pkgs/main
  qt-main              5.15.2        h327a75a_7                   anaconda/pkgs/main
  qt-webengine         5.15.9        hd2b0992_4                   anaconda/pkgs/main
  qtwebkit             5.212         h4eab89a_4                   anaconda/pkgs/main
  re2                  2022.04.01    h295c915_0                   anaconda/pkgs/main
  readline             8.2           h5eee18b_0                   anaconda/pkgs/main
  requests             2.31.0        pyhd8ed1ab_0                 conda-forge       
  scikit-learn         1.3.0         py39h1128e8f_0               anaconda/pkgs/main
  scipy                1.11.1        py39h6183b62_0               conda-forge       
  seaborn              0.12.2        py39h06a4308_0               anaconda/pkgs/main
  setuptools           68.0.0        pyhd8ed1ab_0                 conda-forge       
  simplejson           3.17.6        py39h7f8727e_0               anaconda/pkgs/main
  sip                  6.6.2         py39h6a678d5_0               anaconda/pkgs/main
  six                  1.16.0        pyhd3eb1b0_1                 anaconda/pkgs/main
  snappy               1.1.9         h295c915_0                   anaconda/pkgs/main
  sortedcontainers     2.4.0         pyhd3eb1b0_0                 anaconda/pkgs/main
  sqlite               3.41.2        h5eee18b_0                   anaconda/pkgs/main
  sympy                1.11.1        py39h06a4308_0               anaconda/pkgs/main
  tbb                  2020.3        hfd86e86_0                   anaconda/pkgs/main
  tblib                1.7.0         pyhd3eb1b0_0                 anaconda/pkgs/main
  threadpoolctl        2.2.0         pyh0d69192_0                 anaconda/pkgs/main
  tk                   8.6.12        h1ccaba5_0                   anaconda/pkgs/main
  toml                 0.10.2        pyhd3eb1b0_0                 anaconda/pkgs/main
  toolz                0.12.0        py39h06a4308_0               anaconda/pkgs/main
  tornado              6.3.2         py39h5eee18b_0               anaconda/pkgs/main
  tqdm                 4.65.0        py39hb070fc8_0               anaconda/pkgs/main
  typing-extensions    4.7.1         hd8ed1ab_0                   conda-forge       
  typing_extensions    4.7.1         pyha770c72_0                 conda-forge       
  tzdata               2023c         h71feb2d_0                   conda-forge       
  umap-learn           0.5.3         py39h06a4308_0               anaconda/pkgs/main
  urllib3              2.0.4         pyhd8ed1ab_0                 conda-forge       
  utf8proc             2.6.1         h27cfd23_0                   anaconda/pkgs/main
  wheel                0.41.0        pyhd8ed1ab_0                 conda-forge       
  xyzservices          2022.9.0      py39h06a4308_1               anaconda/pkgs/main
  xz                   5.2.10        h5eee18b_1                   anaconda/pkgs/main
  yaml                 0.2.5         h7b6447c_0                   anaconda/pkgs/main
  zict                 2.2.0         py39h06a4308_0               anaconda/pkgs/main
  zipp                 3.11.0        py39h06a4308_0               anaconda/pkgs/main
  zlib                 1.2.13        h166bdaf_4                   conda-forge       
  zstd                 1.5.2         hfc55251_7                   conda-forge       

ValueError: setting an array element with a sequence.

Hi Dr. Zhang,

When I execute Higashi's higashi_model.train_for_imputation_nbr_0(), I get the following error:

Traceback (most recent call last):
File "/home/zzl/.conda/envs/higashi/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/ygc/Higashi/higashi/Higashi_wrapper.py", line 324, in one_thread_generate_neg
to_neighs = np.array(to_neighs)[:-1]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2515,) + inhomogeneous part.

I tried changing the NumPy version, but with 1.22.0, 1.19.2, and 1.24.6 the problem persisted.
Could you help me solve this problem? Thank you very much.
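For context, this class of error can be reproduced outside Higashi with a toy example: recent NumPy versions (>= 1.24) refuse to build an array from a ragged nested list unless dtype=object is requested explicitly (older versions only emitted a deprecation warning).

```python
import numpy as np

# Toy reproduction (not Higashi code): `to_neighs` is ragged, i.e. its
# sub-lists have different lengths, so np.array() cannot form a rectangular
# array.
to_neighs = [[1, 2, 3], [4, 5], [6]]

try:
    arr = np.array(to_neighs)  # NumPy >= 1.24 raises ValueError here
except ValueError:
    arr = None

# Requesting an object array sidesteps the error on all NumPy versions:
arr_obj = np.array(to_neighs, dtype=object)[:-1]
print(arr_obj.shape)  # (2,)
```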

batch information not loaded?

Hi Ruochi,
I tried the new version of Higashi on the classic Lee et al. 2019 data, but found that the result still doesn't include the batch information. The zip file contains my JSON file and pickle file; I think the batch is clearly included. Do you know what the error might be?

(screenshot attached)

DataInfo.zip

Yours sincerely,
Siqi Shen

higashi.process_data() won't finish

Hello! I'm trying to process ~5000 cells using Higashi. The Fast-Higashi model works on my data, but I'd like to do downstream TAD and other analyses, so I tried to initialize the Higashi model using higashi.process_data(); it won't finish even after a day of running. I used 128 GB of memory, 2 GPUs, and 16 CPUs, and it always gets stuck in the middle of create_matrix. Is there anything I could do to fix this?
(screenshots attached)

Thank you so much!

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (15361,) + inhomogeneous part.

Dear developers, I encountered this error when running higashi_model.train_for_imputation_nbr_0().

[...]/mamba_root/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 321, in one_thread_generate_neg
    to_neighs = np.array(to_neighs)[:-1]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (15361,) + inhomogeneous part.
"""

Do you have any clues about what the problem might be? Maybe my version of NumPy is too recent?
Any advice is appreciated.

My script looks like this.

from higashi.Higashi_wrapper import *
from fasthigashi.FastHigashi_Wrapper import *
config = './../higashi_WGD/config_wgd_chr21.JSON'
higashi_model = Higashi(config)
import os

num_gpus = len(os.environ['CUDA_VISIBLE_DEVICES'].split(','))
print(num_gpus)
print(torch.cuda.is_available())

local_rank = 0 
torch.cuda.set_device(local_rank)

os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(f'{i}' for i in range(num_gpus))

print("Processing data...")
higashi_model.process_data()

print("Initializing...")
# Initialize the model
fh_model = FastHigashi(
    config_path=config,
    path2input_cache="../higashi_WGD/tmp/",
    path2result_dir="../higashi_WGD/tmp/",
    off_diag=100,  # 0-100th diag of the contact maps would be used.
    filter=False,  # fit the model on high quality cells, transform the rest
    do_conv=False,  # linear convolution imputation
    do_rwr=False,  # partial random walk with restart imputation
    do_col=False,  # sqrt_vc normalization
    no_col=False,
)  # force to not do sqrt_vc normalization

print("Preparing...")
# Pack from sparse mtx to tensors
fh_model.prep_dataset()


print("Running...")
fh_model.run_model(dim1=0.6, rank=16, n_iter_parafac=1, extra="WGD")

print("Preparing and training...")
higashi_model.prep_model()
# Stage 1 training
higashi_model.train_for_embeddings()

print("Preparing for inputation...")
higashi_model.train_for_imputation_nbr_0()
higashi_model.impute_no_nbr()

higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()


Here is my conda env:

higashi.txt

Unrelated, but maybe useful for you: I had to force the CUDA device to zero in get_free_gpu() to use it on a Slurm cluster. It oddly kept switching to GPU 1 when only one GPU (GPU 0) was requested, blocking execution.

Integration with deeply sequenced bulk Hi-C

Hi ruochiz,
I just tested Higashi on scHi-C data with low coverage per cell (~10,000 cis reads per cell) and found that the cell diversity is low. Is it possible to combine scHi-C data with deeply sequenced bulk Hi-C data to improve the result? By the way, adding the bulk data as a single node seems to make no difference.
Thanks!

main_cell.py is very slow

Hello
When I run main_cell.py, it is very slow. I started running it on Monday this week, but this is all the progress so far:

[ Epoch 29 of 80 ]

  • (Training) BCE: 0.348 MSE: 0.719 Loss: 0.349 norm_ratio: 0.00: 32%|▎| 321/1000 [44:21<1:30:05, 7.96s/it]

So I would like to know how to make it faster. Thank you very much.

About batch effect

Hi, I have a question about batch effects. In the pickle file we can input the batch for the data, but when I look at the output, the batch effect is not eliminated. I looked at the paper and it doesn't discuss batch effect removal, so I just want to confirm whether it is removed in your model.

quick question

Hey @ruochiz ,
So in the output embedding file {embedding_name}_0_origin.npy, the rows are the cells in the same order as the cell_id we give (say 0, 1, 2, ..., #cells-1) to generate the data.txt file, right?
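For what it's worth, the assumed convention can be exercised on a toy array; whether rows truly follow cell_id order is exactly what the question asks the maintainers to confirm, and the filename below is just a stand-in.

```python
import numpy as np

# Toy stand-in for {embedding_name}_0_origin.npy; rows are assumed to
# follow the cell_id order (0, 1, ..., #cells-1) used to build data.txt.
n_cells, dim = 5, 8
emb = np.random.default_rng(0).normal(size=(n_cells, dim))
np.save("demo_embed_0_origin.npy", emb)

loaded = np.load("demo_embed_0_origin.npy")
cell_id = 3
vec = loaded[cell_id]  # under the assumption: embedding of cell_id 3
```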

Error running simulated data

Hi, I ran into the following error while trying to run some simulated datasets. We have several settings, and two of them gave us the same error regardless of the parameter settings.

100%|██████████| 166314/166314 [00:00<00:00, 309719.43it/s]
generating start/end dict for chromosome
extracting from data.txt
generating contact maps for baseline
data loaded
166314 False
creating matrices tasks: 100%|██████████| 1/1 [00:00<00:00,  2.69it/s]
100%|██████████| 1/1 [00:00<00:00, 20.34it/s]
0.608: 100%|██████████| 300/300 [00:00<00:00, 300.39it/s]
100%|██████████| 1/1 [00:00<00:00, 46091.25it/s]
100%|██████████| 1/1 [00:00<00:00, 54471.48it/s]
total_feats_size 80
cpu_num 1
setting to gpu:0
training on data from: ['chr1']
total_sparsity_cell 0.8946236559139785
contractive loss
batch_size 256
Node type num [100  61] [100 161]
start making attribute
loss 0.6080093383789062 loss best 0.6089616417884827 epochs 299

initializing data generator
initializing data generator
First stage training
[ Epoch 0 of 60 ]
 - (Training) :   0%|          | 0/1000 [00:00<?, ?it/s]concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/.conda/envs/higashi/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 248, in one_thread_generate_neg
    raise EOFError
EOFError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Data/run_higashi.py", line 10, in <module>
    higashi_model.train_for_embeddings()
  File "/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 1336, in train_for_embeddings
    self.train(
  File "/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 1113, in train
    bce_loss, mse_loss, train_accu, auc1, auc2, str1, str2, train_pool, train_p_list = self.train_epoch(
  File "/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 914, in train_epoch
    batch_edge_big, batch_y_big, batch_edge_weight_big, batch_chrom_big, batch_to_neighs_big, chroms_in_batch = p.result()
  File "/.conda/envs/higashi/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/.conda/envs/higashi/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
EOFError

The JSON input:

{
    "config_name": "test run",
    "input_format": "higashi_v1",
    "data_dir": "/Data/simba1_7k",
    "temp_dir": "/Data/simba1_7k/temp",
    "genome_reference_path": "/Data/test_run/simulation/chrom.size",
    "cytoband_path": "/Data/test_run/simulation/cytoband.txt",
    "chrom_list": [
        "chr1"
    ],
    "resolution": 1000000,
    "resolution_cell": 1000000,
    "minimum_distance": 2000000,
    "maximum_distance": -1,
    "local_transfer_range": 1,
    "dimensions": 30,
    "impute_list": [
        "chr1"
    ],
    "minimum_impute_distance": 0,
    "maximum_impute_distance": -1,
    "neighbor_num": 5,
    "call_tads": false,
    "embedding_name": "exp_zinb3",
    "cpu_num": 1,
    "gpu_num": 1,
    "optional_smooth": false,
    "optional_quantile": false,
    "rank_thres": 1,
    "loss_mode": "zinb",
    "random_walk": false,
    "UMAP_params": {
        "n_neighbors": 20
    }
}

I did a little debugging myself: for the runs that failed (very early in training), len(neg_list) (around line 248 in Higashi_wrapper.py) is usually < 5. In the runs that didn't fail, it's usually around 100.
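A quick way to check whether extreme per-cell sparsity is behind the small neg_list is to count contacts per cell in data.txt. This is only a sketch: the exact column layout of a higashi_v1 data.txt is an assumption here, so check the wiki for the real format.

```python
from collections import Counter
import io

# Toy stand-in for a Higashi-style data.txt (tab-separated with a header;
# the column names below are assumed, not taken from the wiki verbatim).
data_txt = io.StringIO(
    "cell_id\tchrom1\tpos1\tchrom2\tpos2\tcount\n"
    "0\tchr1\t1000000\tchr1\t5000000\t2\n"
    "0\tchr1\t2000000\tchr1\t9000000\t1\n"
    "1\tchr1\t1000000\tchr1\t3000000\t4\n"
)

contacts_per_cell = Counter()
data_txt.readline()  # skip header
for line in data_txt:
    cell_id = line.split("\t")[0]
    contacts_per_cell[cell_id] += 1

# Cells with only a handful of contacts leave very few candidate entries
# for negative sampling in one_thread_generate_neg.
sparse_cells = [c for c, n in contacts_per_cell.items() if n < 5]
```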

scTAD calling

Hey @ruochiz ,
I am trying to call TADs and compartments on my imputed maps at 1 Mb resolution.
In the scTAD.py code, I use -n --window_ins 1000000 --window_tad 1000000.

Although the wiki says that

When including --neighbor or -n, the algorithm will call TADs on the imputed contact maps with neighboring cells' information utilized.

I get the error:

usage: scTAD.py [-h] [-c CONFIG] [-n NEIGHBOR] [--window_ins WINDOW_INS]
                [--window_tad WINDOW_TAD]
scTAD.py: error: argument -n/--neighbor: expected one argument

Also, when I explicitly specify -n 5, matching the "neighbor_num": 5 I give in the JSON file, I get the following error:

unable to open *_nbr_1_impute.hdf5

Any help would be great!

Your TAD/compartment result figures are really useful and awesome! Is there a tutorial available for creating figures like those?
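From the usage string above, -n/--neighbor takes the neighbor count as its argument, so the flag cannot sit bare in front of --window_ins. A sketch of a consistent invocation follows; note the hedges: the -n value should match "neighbor_num" in the config, and the *_nbr_5_impute.hdf5 files only exist after train_for_imputation_with_nbr() / impute_with_nbr() have completed, which would explain the "unable to open" error.

```python
import shlex

# Hypothetical corrected command line: -n gets an explicit value.
cmd = "python scTAD.py -c config.JSON -n 5 --window_ins 1000000 --window_tad 1000000"
args = shlex.split(cmd)

# The value after -n should equal "neighbor_num" from the config (here 5).
neighbor = args[args.index("-n") + 1]
```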

Problem running Higashi on Ramani et al.

Hi Ruochi,

Hope you are doing well! I am trying to run Higashi on the Ramani et al. dataset with all the parameters in the JSON file at their defaults, except that the resolution is 100 kb. I tried several times, but I keep getting the following errors:

Traceback (most recent call last):
  File "/home/.../Higashi/higashi.py", line 19, in <module>
    higashi_model.train_for_imputation_nbr_0()
  File "/home/.../Higashi/higashi/Higashi_wrapper.py", line 1350, in train_for_imputation_nbr_0
    self.train_for_imputation_no_nbr()
  File "/home/.../Higashi/higashi/Higashi_wrapper.py", line 1356, in train_for_imputation_no_nbr
    self.higashi_model = torch.load(self.save_path + "_stage1_model", map_location=self.current_device)
  File "/home/.../sdontalk/lib/python3.9/site-packages/torch/serialization.py", line 909, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/.../sdontalk/lib/python3.9/site-packages/torch/serialization.py", line 358, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/.../sdontalk/lib/python3.9/site-packages/torch/serialization.py", line 339, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/.../Ramani_100k/model/model.chkpt_stage1_model'

I have used Higashi before; the last time I ran it was about half a year ago, and I had not seen this error then. The old version of the code no longer runs because of updates to many dependent packages, so I downloaded the newest version of Higashi. Do you have any idea why this error happens?

Thanks!
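Since the missing file is the stage-1 checkpoint that train_for_embeddings() writes, a quick pre-flight check can confirm whether stage 1 actually finished before stage 2 starts. The paths below are hypothetical placeholders mirroring the traceback; substitute your own temp_dir from the config.

```python
import os

# Hypothetical paths mirroring the traceback above.
temp_dir = "/home/user/Ramani_100k"
save_path = os.path.join(temp_dir, "model", "model.chkpt")
stage1 = save_path + "_stage1_model"

if not os.path.exists(stage1):
    # Stage 1 either crashed or wrote into a different temp_dir;
    # re-run higashi_model.train_for_embeddings() and watch for errors.
    print("stage-1 checkpoint missing:", stage1)
```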

Error running Ramani data

Hello,

After installing Higashi, I downloaded the Ramani dataset (this one) and the JSON file, and tried to run the analysis.

config = "config_ramani.JSON"
higashi_model = Higashi(config)
higashi_model.process_data()

And got this error:

generating start/end dict for chromosome
extracting from data.txt
generating contact maps for baseline
data loaded
4110311 False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 459, in process_data
    self.create_matrix()
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Higashi_wrapper.py", line 492, in create_matrix
    create_matrix(self.config)
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Process.py", line 549, in create_matrix
    bin_adj = pseudo_bulk()
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/higashi-0.1.0a0-py3.9.egg/higashi/Process.py", line 539, in pseudo_bulk
    bin_adj = csr_matrix((temp_weight_mask, (
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/scipy-1.7.3-py3.9-linux-x86_64.egg/scipy/sparse/compressed.py", line 54, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape,
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/scipy-1.7.3-py3.9-linux-x86_64.egg/scipy/sparse/coo.py", line 196, in __init__
    self._check()
  File "/home/gdstantonlab/mxw010/.conda/envs/higashi/lib/python3.9/site-packages/scipy-1.7.3-py3.9-linux-x86_64.egg/scipy/sparse/coo.py", line 283, in _check
    raise ValueError('row index exceeds matrix dimensions')
ValueError: row index exceeds matrix dimensions
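"row index exceeds matrix dimensions" typically means some contact position maps to a bin beyond the chromosome length taken from genome_reference_path. A small sketch of that check follows; the chromosome size is a toy value, and the floor-division binning formula is an assumption for illustration, not Higashi's exact code.

```python
# Toy check that every contact position fits inside its chromosome,
# based on the chrom.sizes format (name<TAB>length).
chrom_sizes = {"chr1": 248_956_422}  # toy hg19/hg38-like entry
resolution = 100_000                 # matches the 100 kb run above

def bin_ok(chrom, pos):
    """True if pos maps to an existing bin of chrom (sketch formula)."""
    if chrom not in chrom_sizes:
        return False
    n_bins = chrom_sizes[chrom] // resolution + 1
    return 0 <= pos // resolution < n_bins

good = bin_ok("chr1", 5_000_000)
bad = bin_ok("chr1", 900_000_000)  # past the end of chr1 -> would overflow
```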

ValueError: Found array with 1 feature(s) (shape=(250, 1)) while a minimum of 2 is required by TruncatedSVD

Hi!
I have simulated mouse data and I would like to perform cell clustering using the Higashi program.
But I always get this error when running it. It seems like the Temp objects do not contain any data.

generating start/end dict for chromosome
extracting from data.txt
100%|████████████████████████████████████████████████████████████████████████████████| 39410250/39410250 [01:58<00:00, 332438.27it/s]
generating contact maps for baseline
data loaded
250 False
creating matrices tasks: 100%|█████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.19it/s]
total_feats_size 168
  0%|          | 0/1 [00:00<?, ?it/s]Done here 1 -1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fosam16/owner/Higashi/higashi/Higashi_wrapper.py", line 459, in process_data
    self.create_matrix()
  File "/home/fosam16/owner/Higashi/higashi/Higashi_wrapper.py", line 492, in create_matrix
    create_matrix(self.config)
  File "/home/fosam16/owner/Higashi/higashi/Process.py", line 717, in create_matrix
    temp1, c = generate_feats_one(temp[0], temp[1], size, length, c, qc_list[c])
  File "/home/fosam16/owner/Higashi/higashi/Process.py", line 971, in generate_feats_one
    temp1 = TruncatedSVD(n_components=2, algorithm='randomized', n_iter=2).fit_transform(temp1)
  File "/home/fosam16/owner/myvenvhig/lib/python3.8/site-packages/sklearn/decomposition/_truncated_svd.py", line 218, in fit_transform
    X = self._validate_data(X, accept_sparse=["csr", "csc"], ensure_min_features=2)
  File "/home/fosam16/owner/myvenvhig/lib/python3.8/site-packages/sklearn/base.py", line 577, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File "/home/fosam16/owner/myvenvhig/lib/python3.8/site-packages/sklearn/utils/validation.py", line 918, in check_array
    raise ValueError(
ValueError: Found array with 1 feature(s) (shape=(250, 1)) while a minimum of 2 is required by TruncatedSVD.

Can you please help me figure out what I am doing wrong, or what the problem is?
Here is everything about the config file and the mouse cell data.
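The error itself is a hard scikit-learn constraint: TruncatedSVD(n_components=2) refuses matrices with a single feature column. A minimal sketch of the guard follows; interpreting the (250, 1) shape as "only one genomic bin carried signal for that chromosome" is an assumption about the simulated data, not a confirmed diagnosis.

```python
import numpy as np

# Shape reported in the traceback above: 250 cells, 1 feature column.
feats = np.zeros((250, 1))

n_components = 2
if feats.shape[1] < n_components:
    print("Too few feature columns for SVD; check that each chromosome "
          "spans more than one bin at the chosen resolution.")
```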

Module import error

Sorry to bother you again.
I've successfully installed Higashi. When I try to run the tutorials, it produces the error shown in the figure.
It seems that there is something wrong in Process.py; how can I fix it?
image
image

Illegal instruction

Hi Ruochi,
I was trying the new version of Higashi, but when it comes to the first stage of training, it shows "illegal instruction". Do you know the reason? The torch package seems fine, since when I open Python and load it directly, it runs normally.


npy file not found error

Hi, I am trying to use the Process.py file, but during "generating node attributes" it reports that the file "chr1_cell_adj.npy" is not found. I guess this file is generated while the code runs, because I didn't see it in the README. Do you know what might cause the error?

Below is the full message:

Code: python Process.py -c /p/keles/schic/volumeA/Ren2019/higashi_files/data.json

scale_factor 1
cpu_num 56
max bin 100000.0
min bin 1
generating node attributes
Traceback (most recent call last):
  File "Process.py", line 403, in generate_feats
    temp = np.load(os.path.join(temp_dir, "%s_cell_%s.npy" % (c, ext_str)), allow_pickle=True).item()
  File "/u/s/s/sshen82/miniconda3/envs/higashi/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/p/keles/schic/volumeA/Ren2019/higashi_files/temp/chr1_cell_adj.npy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Process.py", line 533, in <module>
    generate_feats(optional_smooth_flag)
  File "Process.py", line 405, in generate_feats
    temp = np.load(os.path.join(temp_dir, "%s_cell_%s.npy" % (c, ext_str)), allow_pickle=True)
  File "/u/s/s/sshen82/miniconda3/envs/higashi/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/p/keles/schic/volumeA/Ren2019/higashi_files/temp/chr1_cell_adj.npy'

question about cell order

Hi Ruochi,
I am a little confused about the order of cells in the Fast-Higashi output. In fetch_cell_embedding the default is restore_order=False, and the wiki mentions that cells which pass QC filtering are placed in front of the 'bad' cells.

Now, if I only want to check the embeddings of the QC-passing cells, I call get_qc() to obtain them: kept, read_count_all = wrapper.get_qc()

Is the order in kept the same as the order of the input data, or has it been reordered?

Best,
Yang
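A toy illustration of how the mask would be used if `kept` is indeed a boolean mask in the original input order; this is exactly the assumption being asked about, so confirm it with the maintainers (and pair it with restore_order=True) before relying on it.

```python
import numpy as np

# ASSUMPTION under test: `kept` (from wrapper.get_qc()) is a boolean mask
# over cells in the ORIGINAL input order.
kept = np.array([True, False, True, True])

# Stand-in for fetch_cell_embedding(restore_order=True): one row per
# input cell, in input order.
emb_input_order = np.arange(4 * 3).reshape(4, 3)

qc_embeddings = emb_input_order[kept]  # rows of the QC-passing cells
```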

Higashi gets error during higashi_model.process_data()

Hi, Ruochi! I'm trying to run Higashi at a fine resolution, like 10 kb.
It seems to trigger a concurrent.futures.process.BrokenProcessPool error when creating matrices tasks.
I tried running Higashi from the CLI instead of the API, but the error persisted, as below.

image

Then I tried changing "cpu_num" in the config from -1 to 1, and a new memory error occurred, as below.

Do you know how to solve this problem with the matrix-creation tasks in higashi_model.process_data()?

Thanks!
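One low-effort mitigation worth trying, offered as an assumption based on the BrokenProcessPool/memory symptoms above rather than a confirmed fix: reduce the worker count, and first validate the pipeline at a coarser resolution before going down to 10 kb. Only the relevant config fields are shown:

```json
{
    "cpu_num": 1,
    "resolution": 100000,
    "resolution_cell": 100000
}
```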

FastHigashi wrapper.prep_dataset: 'int' object has no attribute 'shape'

Hi Ruochi,
Thank you so much for the great packages!
I have recently been trying FastHigashi on our single-cell Hi-C dataset (I ran the analysis with Higashi before and the results were good). But when running wrapper.prep_dataset I encountered an issue:
issue1

Loading one npy file to check:
issue2

I assume chrXXX_sparse_adj.npy should store sparse matrices, but it seems that my first element is 0... Any hints on this?

Thanks again,
Yang

Problem on running FastHigashi on Ramani et al. dataset

Hi Ruochi! I used Higashi on the Ramani et al. data you provided in your tutorial before, so I am trying to run the same dataset with FastHigashi to see the improvement. I copied the first several lines of code you provided in the Lee et al. tutorial; below is the code for running FastHigashi.

Code

from higashi.Higashi_wrapper import *
from fasthigashi.FastHigashi_Wrapper import *

config = '.../ram_data/config_Ramani_1m.JSON'
higashi_model = Higashi(config)
higashi_model.process_data()

# Initialize the model
fh_model = FastHigashi(
    config_path=config,
    path2input_cache=".../ram_fast",
    path2result_dir=".../ram_fast",
    off_diag=100,   # 0-100th diag of the contact maps would be used.
    filter=False,   # fit the model on high quality cells, transform the rest
    do_conv=False,  # linear convolution imputation
    do_rwr=False,   # partial random walk with restart imputation
    do_col=False,   # sqrt_vc normalization
    no_col=False,   # force to not do sqrt_vc normalization
)

# Pack from sparse mtx to tensors
fh_model.prep_dataset()

fh_model.run_model(dim1=0.6, rank=256, n_iter_parafac=1, extra="")

And next is the content of the config file I provided to the program,

Config file

config_info = {
"data_dir": ".../ram_data",
"input_format": 'higashi_v1',
"temp_dir": ".../ram_fast",
"genome_reference_path": ".../4DN_data/hg19.chrom.sizes.txt",
"cytoband_path": ".../4DN_data/cytoBand.txt",
"chrom_list": ["chr1", "chr2","chr3","chr4","chr5",
"chr6","chr7","chr8","chr9","chr10",
"chr11","chr12","chr13","chr14","chr15",
"chr16","chr17","chr18","chr19","chr20",
"chr21","chr22","chrX"],
"resolution": 1000000,
"resolution_cell": 1000000,
"resolution_fh": [1000000],
"embedding_name": "exp_zinb3",
"minimum_distance": 2000000,
"maximum_distance": -1,
"local_transfer_range": 1,
"loss_mode": "zinb",
"dimensions": 64,
"impute_list":["chr1", "chr2","chr3","chr4","chr5",
"chr6","chr7","chr8","chr9","chr10",
"chr11","chr12","chr13","chr14","chr15",
"chr16","chr17","chr18","chr19","chr20",
"chr21","chr22","chrX"],
"neighbor_num": 4,
"cpu_num": 10,
"gpu_num": 2,
"embedding_epoch":60,
}

Problem

The problem is that at the iteration step of the pipeline I get many NaNs, as follows:

Starting iteration 0
1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan; 1, nan;
PARAFAC2 re=9.902 takes 5.5s

I don't see this kind of thing in your tutorial, and I got a quite bad embedding plot, as follows:
celltype
I don't know why that is; do you have any idea? Thanks!

Higashi stuck on training at higashi_model.train_for_imputation_nbr_0() on SLURM system

Good morning,
Asking once more for your input on deploying Higashi!
I have set up a test run that runs efficiently on my local NVIDIA GPU (chr21, 1 Mb windows). Everything works on my laptop (16 GB VRAM, Intel i7).
However, when I move the computation to our SLURM cluster, the process gets stuck at higashi_model.train_for_imputation_nbr_0(), like this:

Preparing for imputation...
100%|██████████| 1/1 [00:00<00:00, 3305.20it/s]
100%|██████████| 1/1 [00:00<00:00, 10591.68it/s]
pass_pseudo_id False
pass_pseudo_id False
Second stage training

[ Epoch 0 of 45 ]
 - (Training) :   0%|          | 0/1000

I have replicated my local environment on the cluster (higashi_env.txt). I also contacted our cluster support, but it is very hard to understand what is happening here, especially since the process hangs and no error is thrown. I am using 16 CPUs per task and have access to 40 GB of VRAM, so resource-wise this should not be a problem. I checked that torch is working correctly, and torch.cuda.is_available() returns True.

As I mentioned in a previous issue, I had to force chosen_id=0 in the get_free_gpu() instances throughout the code to prevent unexpected behavior when checking available memory on multiple GPUs. I installed Higashi and Fast-Higashi by cloning the repo and running pip install .. Please let me know if you have any idea how I could debug this. I appreciate your responses a lot, as I look forward to having imputed matrices from Higashi!

EDIT: our HPC expert suggested this function might be the sticking point, but I haven't checked myself:

def one_thread_generate_neg(edges_part, edges_chrom, edge_weight,

Conda install doesn't work as expected

I tried to install Higashi with
'conda install -c ruochiz higashi',
but get an error: feature:/linux-64::__glibc==2.27=0 feature:|@/linux-64::__glibc==2.27=0

image

I don't know how to solve this problem, or which version of glibc I should use.

RuntimeError: received 0 items of ancdata

Dear developers, I was running the Ramani code and replaced the data with my own, but I encountered the following error:

`[ Epoch 28 of 45 ]

  • (Train) bce: 0.2533, mse: 0.0000, acc: 95.492 %, pearson: 0.852, spearman: 0.609, elapse: 27.909 s
  • (Valid) bce: 0.2033, acc: 98.862 %,pearson: 0.956, spearman: 0.644,elapse: 0.089 s
    no improve: 1
    [ Epoch 29 of 45 ]
  • (Train) bce: 0.2529, mse: 0.0000, acc: 95.531 %, pearson: 0.854, spearman: 0.609, elapse: 26.552 s
  • (Valid) bce: 0.2068, acc: 98.737 %,pearson: 0.952, spearman: 0.643,elapse: 0.094 s
    no improve: 2
    [ Epoch 30 of 45 ]
  • (Train) bce: 0.2528, mse: 0.0000, acc: 95.534 %, pearson: 0.854, spearman: 0.609, elapse: 26.419 s
  • (Valid) bce: 0.2070, acc: 98.739 %,pearson: 0.952, spearman: 0.644,elapse: 0.086 s
    no improve: 3
    [ Epoch 31 of 45 ]
  • (Train) bce: 0.2528, mse: 0.0000, acc: 95.545 %, pearson: 0.854, spearman: 0.609, elapse: 27.653 s
  • (Valid) bce: 0.2084, acc: 98.711 %,pearson: 0.951, spearman: 0.643,elapse: 0.095 s
    • (Validation) : 10%|█ | 1/10 [00:08<01:12, 8.09s/it]concurrent.futures.process._RemoteTraceback:
      '''
      Traceback (most recent call last):
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/concurrent/futures/process.py", line 387, in wait_result_broken_or_wakeup
      result_item = result_reader.recv()
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/multiprocessing/connection.py", line 251, in recv
      return _ForkingPickler.loads(buf.getbuffer())
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 355, in rebuild_storage_fd
      fd = df.detach()
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/multiprocessing/resource_sharer.py", line 58, in detach
      return reduction.recv_handle(conn)
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/multiprocessing/reduction.py", line 189, in recv_handle
      return recvfds(s, 1)[0]
      File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/multiprocessing/reduction.py", line 164, in recvfds
      raise RuntimeError('received %d items of ancdata' %
      RuntimeError: received 0 items of ancdata
      '''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data02/home/scv9147/run/24.3.21-higashi_K562_HAP1_500K.py", line 17, in <module>
    higashi_model.train_for_imputation_nbr_0()
  File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 1367, in train_for_imputation_nbr_0
    self.train(
  File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 1141, in train
    valid_bce_loss, valid_accu, valid_auc1, valid_auc2, _, _ = self.eval_epoch(validation_data_generator)
  File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 1005, in eval_epoch
    batch_x, batch_y, batch_w, batch_chrom, batch_to_neighs, chroms_in_batch = p.result()
  File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/data02/home/scv9147/run/miniconda3/envs/higashi/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.`

How do I solve this problem?
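"received 0 items of ancdata" is a generic PyTorch multiprocessing symptom of running out of open file descriptors, not something Higashi-specific. Two standard mitigations are sketched below, hedged: they address the usual cause and may not apply to every setup.

```python
import resource

# 1) Raise the soft open-file limit toward the hard limit (a shell-level
#    `ulimit -n 4096` before launching does the same thing).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft != resource.RLIM_INFINITY and soft < 4096:
    new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

# 2) Make PyTorch share tensors through the filesystem instead of passing
#    file descriptors over sockets (guarded so this sketch runs without torch).
try:
    import torch.multiprocessing as mp
    mp.set_sharing_strategy("file_system")
except ImportError:
    pass
```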

Time consumption

Hi @ruochiz ,
I would like to try Higashi on public data. Is there any information about the time consumption of model training? Does an entry-level PC (31 GB RAM, i7-4770, GTX 750 Ti) meet the hardware requirements?

conda install not working

Attempting to install on a MacBook Pro with OS 10.15.7, I get the error:

PackagesNotFoundError: The following packages are not available from current channels:

  • higashi

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

Problem with UMAP plot colors

Hi,
When I change the UMAP plot color, it does not change, and I don't know why. My label_info.pickle does contain the cell information.
Thank you for your reply.

NameError: name 'neg_num' is not defined

Hi Ruochi,

I followed the tutorial of Higashi on Ramani et al. and encountered the following error message:

Traceback (most recent call last):
  File "/Users/user/opt/anaconda3/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/user/Desktop/Higashi/Higashi/higashi/Higashi_wrapper.py", line 239, in one_thread_generate_neg
    if neg_num == 0:
NameError: name 'neg_num' is not defined

It seems the neg_num global variable is not defined anywhere else. How can I address this problem?

Thanks.

"chrom_list" error

Hey Ruochi,
Previously I tried using Higashi with the config file (copied below), but I am getting the following error:

Traceback (most recent call last):
  File "Process.py", line 627, in <module>
    chrom_list = config['chrom_list']
TypeError: list indices must be integers or slices, not str

Any help with that would be great!

Note: I am using the same chromosome names in all input files.

config.json file:

[[
    {
    "data_dir":"./64CSE_1",
    "structured":false,
    "temp_dir":"./Temp",
    "genome_reference_path": "./64CSE_1/chrom_mm10.sizes",
    "cytoband_path": "./64CSE_1/cytoBand.txt",
    "chrom_list":["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chrX","chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
    "resolution" : 1000000,
    "resolution_cell": 1000000,
	"local_transfer_range":1,
	"dimensions":64,
	"loss_mode":"classification",
	"rank_thres":1,
	"embedding_epoch":60,
	"no_nbr_epoch":45,
	"with_nbr_epoch":30,
	"embedding_name":"test_1",
	"impute_list":["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chrX","chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
	"minimum_distance":1000000,
	"maximum_distance":-1,
	"neighbor_num":5,
	"impute_no_nbr":false,
	"impute_with_nbr":true,
	"cpu_num":-1,
	"gpu_num":4,
	"UMAP_params":{"n_neighbors":30,"min_dist":0.3},
	"TSNE_params":{"n_neighbors":15},
	"random_walk":false
	}
]]
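The TypeError above is explained by the config's outermost [[ ... ]]: json.load then returns a list, so config['chrom_list'] indexes that list with a string. The top level must be a single JSON object (and, separately, each chromosome should be listed once; the chrom_list above repeats chr1-chr19). A minimal reproduction:

```python
import json

# A valid config has a single object at the top level:
good = json.loads('{"chrom_list": ["chr1", "chr2"]}')
chroms = good["chrom_list"]  # works: dict lookup by key

# The posted file wraps the object in [[ ... ]], so parsing yields a list
# and a string index raises exactly the TypeError reported above.
bad = json.loads('[[{"chrom_list": ["chr1", "chr2"]}]]')
# bad["chrom_list"]  -> TypeError: list indices must be integers or slices, not str
```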


bug encountered

Sorry to bother you again.
I got an exception like this when doing imputation:

[ Epoch 0 of 60 ]

  • (Training) :   0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/service/Higashi-main/Code/main_cell.py", line 1164, in <module>
    load_first=False, save_embed=True, save_name="_stage1")
  File "/opt/service/Higashi-main/Code/main_cell.py", line 549, in train
    model, loss, training_data_generator, optimizer)
  File "/opt/service/Higashi-main/Code/main_cell.py", line 136, in train_epoch
    batch_edge_weight, batch_chrom, batch_to_neighs, y=batch_y)
  File "/opt/service/Higashi-main/Code/main_cell.py", line 47, in forward_batch_hyperedge
    pred, pred_var, pred_proba = model(x, (batch_chrom, batch_to_neighs))
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/service/Higashi-main/Code/Higashi_backend/Modules.py", line 737, in forward
    dynamic, static, attn = self.get_embedding(x, x_chrom)
  File "/opt/service/Higashi-main/Code/Higashi_backend/Modules.py", line 704, in get_embedding
    dynamic, static, attn = self.encode1(x, x, x_chrom, slf_attn_mask, non_pad_mask)
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/service/Higashi-main/Code/Higashi_backend/Modules.py", line 1136, in forward
    dynamic = self.pff_n1(dynamic)  # * non_pad_mask
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/service/Higashi-main/Code/Higashi_backend/Modules.py", line 858, in forward
    output = self.w_stack[i](output)
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/service/miniconda3/envs/higishi_env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 295, in _conv_forward
    self.padding, self.dilation, self.groups)
TypeError: conv1d() received an invalid combination of arguments - got (Tensor, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (Tensor, Parameter, Parameter, tuple, tuple, tuple, int)
  • (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
    didn't match because some of the arguments have invalid types: (Tensor, Parameter, Parameter, tuple, tuple, tuple, int)

I've done imputation on this dataset on another computer and it worked fine, but it goes wrong on this computer. Can you help me debug this problem?
thank you for your time!

General question

Hi Ruochi,
This is an excellent time to have come across Higashi.
I have not yet started installing or working with the method, but I wanted to ask whether it would be possible to find subclusters (using TADs, compartments) in a single-cell dataset with ~100 cells. Is that number too low?

Thanks!

some question about json file

Hello,
Regarding your test ipynb file [see https://github.com/ma-compbio/Higashi/blob/main/tutorials/4DN_sci-Hi-C_Kim%20et%20al.ipynb], I have some questions.
1. What is label_info.pickle? I have data.txt, but I don't know how to create the pickle file.
2. In the JSON, what data goes in data_dir and temp_dir?

Thank you for your reply.
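On question 1, the wiki describes label_info.pickle as a pickled dict of per-cell label lists, one entry per cell in the same order as the cell ids used for data.txt. A minimal sketch follows; the key names here are just examples, not required fields.

```python
import pickle

# One list entry per cell, in the same order as the cell ids in data.txt
# (key names are illustrative, not mandated by Higashi).
label_info = {
    "cell type": ["GM12878", "K562", "GM12878"],
    "batch":     ["b1", "b1", "b2"],
}
with open("label_info.pickle", "wb") as f:
    pickle.dump(label_info, f)

with open("label_info.pickle", "rb") as f:
    loaded = pickle.load(f)
```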

ERROR when running Process.py: no config file

Hi!
When I run Process.py (python Process.py) I get an error like this:
No such file or directory: './config.JSON'
Do I need to construct the file on my own, or am I missing an important step that generates the JSON file?
How can I get this JSON file?

Thank you very much.
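Process.py is not supposed to generate the JSON for you: the config is hand-written (see the wiki for the fields) and its path is passed with -c, e.g. `python Process.py -c /path/to/config.JSON`. A tiny sketch of the pre-flight check; the path below is a hypothetical placeholder.

```python
import os

# Process.py falls back to ./config.JSON when -c is not given; the file
# is never auto-generated, so write it yourself first.
config_path = "/path/to/config.JSON"   # hypothetical location
exists = os.path.exists(config_path)   # False until you create the file
if not exists:
    print("write the config by hand, then run: python Process.py -c", config_path)
```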

What do the config options mean?

I'm so glad to have found Higashi and Fast-Higashi.
But I can't run these tools because I don't understand the arguments in the config.json file.
There isn't any information about the config on GitHub, and while the Higashi article has some information about it, most of the options aren't described (as far as I can see).

Are there default values, and what is the reasoning behind the values in the test config file?
I want to know what the "-1" value means (False?) and why you chose the particular resolution, distance, and other values.
I would like to see explanations for all of them, but if that's difficult, I'd like to know about the main options.

Also, in Fast-Higashi there is a "vis_palette" option.
Is it right that it can only be used when I already know which clusters exist?
And is it then impossible to use this option in Higashi?

Higashi_vis set up

Hi Ruochi,

Thanks for developing the wonderful tool Higashi. I'm testing the visualization part with Higashi_vis.

As instructed, I first created the directory below:
.
├── Higashi
│   ├── higashi
│   └── config_dir
│       └── visual_config.JSON

{
"config_list": [
{"path:/scHiC/Higashi/Ramani"},
]
}

Then I ran the bokeh serve command as instructed:
bokeh serve --port=$PORT1 --address=localhost --allow-websocket-origin=*:$PORT1 Higashi_vis/
and got:
ERROR: No 'main.py' or 'main.ipynb' in path to /scHiC/Higashi/higashi/Higashi_vis

Did I miss something in the tutorial? What files should be in Higashi_vis? Thank you.

Error when running scTAD.py

Hello,
I get the following error when running scTAD.py with the defaults (just python Higashi/higashi/scTAD.py -c config.JSON):
[screenshot of error message]

However, if I run with --window_ins 2000000 --window_tad 2000000, it seems to work. Could you let me know why this is the case? Do I need to increase the resolution in the previous steps? I set the resolution in my config file as follows:
[screenshot of config file]

Thank you!!

wrapper.fast_process_data() - method does not exist

Dear developers, I am trying to run your notebook tutorials and encountered this issue with Fast-Higashi. I am on the Ramani notebook, but it seems to be a shared problem.
The method does not seem to exist (I also checked with help(wrapper)). Could you please advise?
I installed Fast-Higashi from conda as recommended, and also checked for updates.

In addition, I found other minor mistakes in the Ramani notebook, such as 'batch_id' instead of batch, and higashi_model instead of higashi. I hope this helps.
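One quick way to confirm whether the installed package matches the notebook is to list the methods the wrapper class actually exposes. A generic sketch with a stand-in class (the real FastHigashi class and its method names may differ):

```python
import inspect

class Wrapper:
    """Stand-in for the installed FastHigashi wrapper class."""
    def process_data(self): ...
    def prep_model(self): ...

# Compare the exposed method names against those the notebook calls
# (e.g. fast_process_data) to spot a version mismatch.
methods = [name for name, _ in inspect.getmembers(Wrapper, inspect.isfunction)]
print(methods)
```

With the real wrapper object, `[m for m in dir(wrapper) if not m.startswith("_")]` gives the same information.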

Stops with OSError when running higashi_model.train_for_imputation_nbr_0()

Hi Ruochiz,

Higashi ran without any errors when the resolution was 1M (JSON option "resolution") on my CentOS 7 system.

However, whenever I increased the resolution, regardless of the value, the step higashi_model.train_for_imputation_nbr_0() always stopped with the following OSError:

[ Epoch 42 of 45 ]
  • (Train) bce: 0.3479, mse: 0.0000, acc: 96.450 %, pearson: 0.571, spearman: 0.634, elapse: 97.359 s
  • (Valid) bce: 2.7560, acc: 97.025 %, pearson: 0.187, spearman: 0.635, elapse: 0.296 s
  no improve: 1
[ Epoch 43 of 45 ]
  • (Train) bce: 0.3542, mse: 0.0000, acc: 96.321 %, pearson: 0.557, spearman: 0.633, elapse: 96.495 s
  • (Valid) bce: 3.0346, acc: 96.726 %, pearson: 0.123, spearman: 0.633, elapse: 0.355 s
  no improve: 2
[ Epoch 44 of 45 ]
  • (Train) bce: 0.3466, mse: 0.0000, acc: 96.488 %, pearson: 0.599, spearman: 0.634, elapse: 99.352 s
  • (Valid) bce: 3.6074, acc: 97.016 %, pearson: 0.157, spearman: 0.636, elapse: 0.356 s
  no improve: 3
(Validation) : 0%| | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 1, in
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 1367, in train_for_imputation_nbr_0
    self.train(
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 1141, in train
    valid_bce_loss, valid_accu, valid_auc1, valid_auc2, _, _ = self.eval_epoch(validation_data_generator)
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/site-packages/higashi/Higashi_wrapper.py", line 994, in eval_epoch
    pool = ProcessPoolExecutor(max_workers=cpu_num)
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/concurrent/futures/process.py", line 658, in __init__
    self._result_queue = mp_context.SimpleQueue()
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/multiprocessing/queues.py", line 340, in __init__
    self._reader, self._writer = connection.Pipe(duplex=False)
  File "/data/yufan/biotools/anaconda/anaconda2023/envs/higashi/lib/python3.9/multiprocessing/connection.py", line 527, in Pipe
    fd1, fd2 = os.pipe()
OSError: [Errno 24] Too many open files

Would you please let me know the reason for this issue?

Thanks a lot.

Yufan (Harry) Zhou
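Not an official fix, but Errno 24 in this traceback means the process has hit its per-process open-file limit: each pipe created for the ProcessPoolExecutor workers consumes file descriptors, and higher resolutions mean more work chunks and pipes. A common workaround on POSIX systems is to raise the soft RLIMIT_NOFILE before training:

```python
import resource

# Raise the soft open-file limit toward the hard limit (POSIX only).
# 65536 is an arbitrary fallback when the hard limit is unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
new_soft = hard if hard != resource.RLIM_INFINITY else 65536
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print("open-file limit raised from", soft, "to", new_soft)
```

The shell equivalent is `ulimit -n <N>` before launching Python; since the traceback shows `max_workers=cpu_num`, lowering cpu_num in the config should also reduce the number of pipes.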
