Comments (21)
Thank you for your interest in Higashi! When using it through the CLI mode, did it just hang like this (stuck at 0% without any error), or did it quit with an error message? If it's the former, could you attach the log from when you kill the process (Ctrl+C), so that I can try to figure out which process is hanging? Thanks!
from higashi.
Here is the error; let me know if you need the complete log file.
- (Training) : 0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/biogenger/Biosoftwares/Higashi/higashi/main_cell.py", line 1362, in <module>
    train(higashi_model,
  File "/home/biogenger/Biosoftwares/Higashi/higashi/main_cell.py", line 653, in train
    bce_loss, mse_loss, train_accu, auc1, auc2, str1, str2, train_pool, train_p_list = train_epoch(
  File "/home/biogenger/Biosoftwares/Higashi/higashi/main_cell.py", line 124, in train_epoch
    for p in as_completed(train_p_list):
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/concurrent/futures/_base.py", line 245, in as_completed
    waiter.event.wait(wait_timeout)
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py", line 574, in wait
    signaled = self._cond.wait(timeout)
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
KeyboardInterrupt
Exception ignored in: <module 'threading' from '/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py", line 1411, in _shutdown
    atexit_call()
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/concurrent/futures/process.py", line 95, in _python_exit
    t.join()
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py", line 1029, in join
    self._wait_for_tstate_lock()
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/threading.py", line 1045, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt:
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Hi, ruochi. I wonder if the bug is related to the PyTorch version? Or did I actually fail to install Higashi from git successfully?
I don't think it has to do with the torch version, as 1.11.0 is something I have tested on. The deadlock seems to be triggered by the multiprocessing part. I will run some tests on my end. Meanwhile, could you share the config JSON file you created for this run? Thx.
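For readers hitting the same hang: the traceback above bottoms out in concurrent.futures' as_completed waiting on an event. A minimal, self-contained sketch of that pattern (make_batch and the pool size are hypothetical stand-ins, not Higashi's actual code):

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, as_completed

def make_batch(i):
    # hypothetical stand-in for the batch-generation work that Higashi
    # dispatches to worker processes
    return i * 2

# Use the fork start method explicitly (the default on Linux). The hang
# reported above is inside as_completed(), which blocks until every
# submitted future finishes -- if a worker never completes, this wait
# (waiter.event.wait in the traceback) never returns.
with ProcessPoolExecutor(
    max_workers=2, mp_context=multiprocessing.get_context("fork")
) as pool:
    futures = [pool.submit(make_batch, i) for i in range(8)]
    results = [f.result() for f in as_completed(futures)]

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

When the workers do finish, the pattern completes normally, which is why the problem only shows up as a silent stall at 0% rather than an exception.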
{
"config_name": "Cere-24-20220416",
"data_dir": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/7_higashi_input",
"input_format": "higashi_v1",
"structured": "true",
"temp_dir": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/8_higashi_out",
"genome_reference_path": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/GRCm39.chr.sizes.txt",
"cytoband_path": "/media/biogenger/D/Projects/CZP/Cere-24-20220416/GRCm39_cytoband.txt",
"chrom_list": ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
"resolution": 1000000,
"resolution_cell": 1000000,
"local_transfer_range": 1,
"dimensions": 64,
"loss_mode": "zinb",
"rank_thres": 1,
"embedding_epoch": 80,
"no_nbr_epoch": 80,
"with_nbr_epoch": 60,
"embedding_name": "Cere-24-20220416_zinb",
"impute_list": ["chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19"],
"minimum_distance": 1000000,
"maximum_distance": -1,
"neighbor_num": 5,
"cpu_num": -1,
"gpu_num": 0,
"UMAP_params": {"n_neighbors": 20}
}
And my python version is 3.9.0. :)
Hi, I just updated the code base (specifically the main_cell.py file). Could you set cpu_num to 1 and run Higashi with the CLI approach (python higashi/main_cell.py -c ../...JSON -s 2)? The -s 2 flag makes the program start at the training-for-imputation step, and setting cpu_num = 1 in the JSON file disables multiprocessing. Let's see if there is any error without multiprocessing. If it hangs again, please interrupt it and attach the logs. Thx.
It seems to work, ruochi.
0%| | 0/24 [00:00<?, ?it/s]
100%|██████████| 24/24 [00:00<00:00, 412554.49it/s]
0%| | 0/24 [00:00<?, ?it/s]
100%|██████████| 24/24 [00:00<00:00, 521571.48it/s]
0%| | 0/24 [00:00<?, ?it/s]
25%|██▌ | 6/24 [00:00<00:00, 57.71it/s]
100%|██████████| 24/24 [00:00<00:00, 123.71it/s]
0%| | 0/24 [00:00<?, ?it/s]
100%|██████████| 24/24 [00:00<00:00, 759.27it/s]
- (Training) : 0%| | 0/1000 [00:00<?, ?it/s]
- (Training) : 0%| | 1/1000 [00:00<04:23, 3.79it/s]
- (Training) BCE: 0.797 MSE: 0.000 Loss: 0.797 norm_ratio: 0.00: 0%| | 2/1000 [00:00<03:47, 4.39it/s]
- (Training) BCE: 0.879 MSE: 0.000 Loss: 0.879 norm_ratio: 0.00: 0%| | 3/1000 [00:00<03:49, 4.34it/s]
- (Training) BCE: 0.870 MSE: 0.000 Loss: 0.870 norm_ratio: 0.00: 0%| | 4/1000 [00:00<03:52, 4.29it/s]
- (Training) BCE: 0.818 MSE: 0.000 Loss: 0.818 norm_ratio: 0.00: 0%| | 5/1000 [00:01<03:39, 4.53it/s]
- (Training) BCE: 0.781 MSE: 0.000 Loss: 0.781 norm_ratio: 0.00: 1%| | 6/1000 [00:01<03:46, 4.38it/s]
- (Training) BCE: 0.775 MSE: 0.000 Loss: 0.775 norm_ratio: 0.00: 1%| | 7/1000 [00:01<03:50, 4.32it/s]
- (Training) BCE: 0.822 MSE: 0.000 Loss: 0.822 norm_ratio: 0.00: 1%| | 8/1000 [00:01<03:42, 4.46it/s]
- (Training) BCE: 0.766 MSE: 0.000 Loss: 0.766 norm_ratio: 0.00: 1%| | 9/1000 [00:02<03:46, 4.37it/s]
- (Training) BCE: 0.827 MSE: 0.000 Loss: 0.827 norm_ratio: 0.00: 1%| | 10/1000 [00:02<03:41, 4.48it/s]
- (Training) BCE: 0.849 MSE: 0.000 Loss: 0.849 norm_ratio: 0.00: 1%| | 11/1000 [00:02<03:47, 4.34it/s]
- (Training) BCE: 0.834 MSE: 0.000 Loss: 0.834 norm_ratio: 0.00: 1%| | 12/1000 [00:02<03:56, 4.18it/s]
- (Training) BCE: 0.748 MSE: 0.000 Loss: 0.748 norm_ratio: 0.00: 1%|▏ | 13/1000 [00:03<04:08, 3.96it/s]
- (Training) BCE: 0.856 MSE: 0.000 Loss: 0.856 norm_ratio: 0.00: 1%|▏ | 14/1000 [00:03<03:58, 4.13it/s]
But what if I want to use multiple CPUs?
And when I tested it with cpu=-1, the same error occurs.
That's... unexpected... cpu=1 was just for debugging; I thought the error would persist, and it's simply easier to debug without multiprocessing. What if you set cpu to 2 or 3? Would that trigger the error?
Yeah... I tried cpu=2 and cpu=8, and both trigger the same error, but cpu=1 works.
Let me try to run the code on my CPU server and get back to you. If cpu=1 works, then it has nothing to do with the data itself. I do have something I suspect might be the reason, though. Will get back with more details.
I found that it actually created multiple processes, but the processes seemed to be sleeping.
Hi, ruochi. Has the issue been solved?
Sorry for the late reply; I was on a trip. I tested it on the CPU machine I have (Linux), and the multiprocessing seems to be working fine. I am planning to test it on a Windows PC. Configuring the environment takes a while, as I have never used that PC to run Python programs before. I will post an update later.
Hi, ruochi. My computer is Linux as well; I wonder if I actually failed to install Higashi successfully after all?
Recently I have run into some more problems.
1. When I set cpu=1 and run the CLI, the .err file is:
0%| | 0/19 [00:00<?, ?it/s]
100%|██████████| 19/19 [00:00<00:00, 520861.28it/s]
0%| | 0/19 [00:00<?, ?it/s]
100%|██████████| 19/19 [00:00<00:00, 664098.13it/s]
Traceback (most recent call last):
  File "main_cell.py", line 1328, in <module>
    checkpoint = torch.load(save_path+"_stage1", map_location=current_device)
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.7/site-packages/torch/serialization.py", line 699, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.7/site-packages/torch/serialization.py", line 231, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/biogenger/miniconda3/envs/higashi/lib/python3.7/site-packages/torch/serialization.py", line 212, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/media/biogenger/D/Projects/CZP/Cere-24-20220416/8_higashi_out/model/model.chkpt_stage1'
and part of the .log file is:
layer_norm1.weight True torch.Size([64])
layer_norm1.bias True torch.Size([64])
layer_norm2.weight True torch.Size([64])
layer_norm2.bias True torch.Size([64])
extra_proba.w_stack.0.weight True torch.Size([4, 41])
extra_proba.w_stack.0.bias True torch.Size([4])
extra_proba.w_stack.1.weight True torch.Size([1, 4])
extra_proba.w_stack.1.bias True torch.Size([1])
extra_proba2.w_stack.0.weight True torch.Size([4, 41])
extra_proba2.w_stack.0.bias True torch.Size([4])
extra_proba2.w_stack.1.weight True torch.Size([1, 4])
extra_proba2.w_stack.1.bias True torch.Size([1])
extra_proba3.w_stack.0.weight True torch.Size([4, 41])
extra_proba3.w_stack.0.bias True torch.Size([4])
extra_proba3.w_stack.1.weight True torch.Size([1, 4])
extra_proba3.w_stack.1.bias True torch.Size([1])
attribute_dict_embedding.weight False torch.Size([4826, 20])
params to be trained 738082
initializing data generator
initializing data generator
2. When I set cpu=1 in a Jupyter notebook, it seems Higashi broke during the imputation step? I'm confused by the str-vs-int error, because the imputation process had already been running for a while.
These two are triggered by different reasons. The first one is caused by there being no stage-1 model trained for that JSON. If you haven't trained the model before when using the CLI mode, you should run python main_cell.py -c xxx -s 1 instead of -s 2.
For the second one, the error is triggered by the cytoband file you provided containing a str in the "start" column. Could you attach your cytoband file here for reference? I can push a fix soon to make the code more robust when it encounters a str in the "start" column, but it would be helpful to see why a str is there at all.
OK, here is my cytoband file.
GRCm39_cytoband.txt
Ah, I see. It's because the first line (#chrom, chromStart, chromEnd) is interpreted as content, not as a header. Delete the first line and the code should be fine. The cytoband file I downloaded from UCSC doesn't contain a header, which is why I assumed there wouldn't be one by default. I can add some code to make the program ignore lines that start with #.
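For anyone else with a cytoband file that carries a commented header: I don't know Higashi's exact parser, but as an illustration (using pandas purely as an assumption), such a line can be skipped at read time so the start column parses as integers:

```python
import io

import pandas as pd

# A UCSC-style cytoband table, here with the commented header line that
# caused the error above. If that line is read as data, the "start"
# column comes back as str instead of int.
cytoband = io.StringIO(
    "#chrom\tchromStart\tchromEnd\tname\tgieStain\n"
    "chr1\t0\t8840000\tqA1\tgpos100\n"
    "chr1\t8840000\t12278390\tqA2\tgneg\n"
)

# comment='#' tells pandas to drop any line starting with '#', so the
# header line is ignored and the numeric columns parse as integers.
tab = pd.read_csv(
    cytoband, sep="\t", header=None, comment="#",
    names=["chrom", "start", "end", "name", "gieStain"],
)
print(tab["start"].dtype)  # int64
```

Deleting the first line of the file by hand, as suggested above, achieves the same thing without any code change.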
OK~thanks
I just added some code to support a new parameter in the JSON file. If you set "cpu_num_torch": -1 but "cpu_num": 1, the code will still use multiprocessing for the PyTorch training, but only one CPU process for generating training batches. This is a temporary solution and is not as optimized as the original version. But since I cannot replicate the error on my end, I would have to guess what triggers it, which could take a while.
I will close this issue for now, but if I have more updates, I will post them here.
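For reference, the workaround described above would look like this in the config file (a fragment only; the other fields stay as in the JSON posted earlier in the thread):

```json
{
  "cpu_num": 1,
  "cpu_num_torch": -1
}
```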