koszullab / hic-box

GUI-based pipeline for Hi-C data processing and visualization

Python 99.82% Shell 0.18%

hic-box's People

cerebis than9th baudrly bagavi wangzhennan14 smediterranea

hic-box's Issues

Unable to run main.py + suggestions

Good afternoon!

I am trying to explore and use HiC-box to build a contact map from a metagenomics sample for which we have both a shotgun sequencing library and a HiC library available, and hopefully extend the analysis with GRAAL. However, I am running into trouble with the first steps of the installation, more specifically running the main.py script.

When I run python main.py, I get this error: Unable to access the X Display, is $DISPLAY set properly?

I have done some research and I think this error is related to the wxpython module, but I have tried installing several versions of it without success. Could you point me in the right direction? Any idea is very welcome!
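The message comes from wxPython, which on Linux (wxGTK) needs an X display to open its windows; it typically appears when a GUI script is started over SSH without X forwarding or on a headless server. A minimal check like the one below could run before wx is imported. It assumes only that the DISPLAY environment variable advertises the display, and x_display_available is a hypothetical name, not part of HiC-Box.

```python
import os
import sys


def x_display_available(environ=None, platform=None):
    """Best-effort check that an X display is configured before wx is imported.

    Assumption: on Linux, wxPython (wxGTK) needs an X display, advertised through
    the DISPLAY environment variable. Other platforms are not checked.
    """
    env = os.environ if environ is None else environ
    plat = sys.platform if platform is None else platform
    if not plat.startswith("linux"):
        return True  # no opinion outside Linux
    return bool(env.get("DISPLAY"))
```

If it returns False, connecting with ssh -X (or ssh -Y) or starting main.py from a desktop session should provide a display.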

Apart from this, I have a couple of suggestions to add in the initial documentation:

Specify that HiC-Box requires a Python 2.7 environment (perhaps obvious, but many users now default to Python 3).
Add cython (required by mirnylib) and wxpython (which provides the wx module, where I suspect my issue lies) to the Python dependencies listed for HiC-Box. Both can be installed easily through conda:

conda install -c anaconda wxpython cython
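Until the README lists these requirements, a pre-flight check along the following lines could report them before the GUI starts. The module names (wx, Cython, mirnylib) come from this issue; the function names are hypothetical, and the Python 2.7 requirement is as reported above rather than confirmed from the HiC-Box source.

```python
import sys


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def preflight(version_info=None, required=("wx", "Cython", "mirnylib")):
    """Return a list of human-readable problems (empty if none were found).

    Assumption: HiC-Box targets Python 2.7, as reported in the issue.
    """
    vi = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(vi[:2]) != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d" % (vi[0], vi[1]))
    for name in missing_modules(required):
        problems.append("missing Python module: %s" % name)
    return problems
```

Printing each entry of preflight() and exiting when the list is non-empty would give users one clear message instead of an import traceback.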

That's all; I hope to hear from you soon, and I will be very happy to provide more feedback once I get this working. Thanks for the development, the tool looks promising!

Best,
Juanma

Insufficient documentation

I'd like to use HiC-Box to prepare my data for genome finishing using GRAAL, but I have a number of questions and points to raise about the documentation.

1. The main page doesn't describe what the software does.
2. There is no link from the HiC-Box page to the GRAAL page, or from the GRAAL page to HiC-Box. It is not apparent that the two tools are designed to work together, even though they are.
3. There is no guidance for people who have already mapped and/or processed their sequencing reads and want to start HiC-Box downstream of the mapping step using, for example, a BAM file or bowtie output.
4. There is no description of the advanced parameters or guidance on how to use them. The README says "tweak if needed" but doesn't say how to determine when tweaking is needed. What is "Total reads length"? Can HiC-Box not handle reads of different lengths, for example from different experiments or after trimming? What is "Tag length"? My reads don't have a tag. Is HiC-Box going to trim 6 bp off anyway?
5. Running "python main.py" opens a window that prompts for reads in FASTQ format. 3C/Hi-C data are paired-end, so there are two FASTQ files (one for read 1 and one for read 2), but there is only one box and it apparently accepts only one filename. I don't know how to proceed. The advanced settings box has a "Paired wise FASTQ" option that can be checked, which I presume relates to this, as well as a "Length paired wise FASTQ" option that is set to 3 by default. I don't know what this means. Does it mean the FASTQ reads should be supplied in interleaved format in groups of three? If so, this calls for pre-processing for which no instructions are given. Can it handle multiple FASTQ files per read? Bowtie simply accepts a comma-separated list, but it's unclear what HiC-Box expects. Can it handle gzipped files? Also unclear.
6. The instructions say to build a pyramid, but GRAAL also has a pyramid-building step. Is this redundant? At which step am I supposed to stop with HiC-Box? The instructions are unclear.
7. A comment related to point 3: if HiC-Box simply took bowtie output as its input, many of these questions would not arise, since bowtie is already well documented. One problem is that HiC-Box wraps bowtie's functionality in an opaque way (a "black box"). Is this necessary? If so, it would help to explain what it does and why (again related to point 1).
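HiC-Box's expected input format for paired reads is not documented in this thread, so the following is only a generic pre-processing sketch, not HiC-Box behavior: it concatenates several FASTQ files for one read end, plain or gzipped, into a single uncompressed file for tools that accept only one filename per read. It is a standalone Python 3 script, and the function names are hypothetical.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file for reading as text, decompressing .gz files on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fastq(paths):
    """Yield (header, sequence, plus, quality) records from several files in order."""
    for path in paths:
        with open_fastq(path) as handle:
            while True:
                header = handle.readline().rstrip("\n")
                if not header:
                    break  # end of this file
                seq = handle.readline().rstrip("\n")
                plus = handle.readline().rstrip("\n")
                qual = handle.readline().rstrip("\n")
                if not header.startswith("@") or not plus.startswith("+"):
                    raise ValueError("malformed FASTQ record in %s: %r" % (path, header))
                yield header, seq, plus, qual


def merge_fastq(paths, out_path):
    """Write all records from paths into one plain FASTQ file; return the count."""
    count = 0
    with open(out_path, "w") as out:
        for record in iter_fastq(paths):
            out.write("\n".join(record) + "\n")
            count += 1
    return count
```

For example, merge_fastq(["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"], "all_R1.fastq") would produce one read-1 file (the file names are placeholders); the same would be done separately for read 2, keeping the file order identical so mates stay paired.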

I'm fully aware that in a research environment it's difficult to keep the documentation up to date with the latest projects. If my comments seem long-winded, it's because I'm trying to help by giving thorough feedback. That said, I'd appreciate any advice or updates you can give. Thanks!

ValueError during alignment

When I run the HiC-Box alignment, I receive the following error:

[...]
start computing biases...
data exist. loading...
done
data collect_done
 numpy size shape =  (0,)
 numpy gc shape =  (0,)
 np vect size shape =  (95581,)
 np vect gc shape =  (95581,)
Traceback (most recent call last):
  File "main.py", line 278, in OnAlign
    ncpu=self.ncpu)
  File "/Python/HiC-Box/analysis_main.py", line 110, in analyze
    hic_bank.gc_size_bias()
  File "/Python/HiC-Box/hic_exp.py", line 165, in gc_size_bias
    mat_gc, mat_size,steps_gc,steps_length =  hic_analysis.gc_size_bias(self.folder_analysis,self.dict_fragments, self.fragments_contacts_file)
  File "/Python/HiC-Box/hic_analysis.py", line 203, in gc_size_bias
    size_min = numpy_size.min()
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

Could you help me figure out what went wrong here?
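The log shows numpy_size and numpy_gc with shape (0,) while the per-fragment vectors have 95581 entries, so whatever selection feeds gc_size_bias apparently kept no fragments, and NumPy cannot take the minimum of an empty array. A guard such as the hypothetical checked_range below would not fix the underlying cause, but it would replace the NumPy error with a message that points at it; the suggested causes in the message are assumptions.

```python
import numpy as np


def checked_range(values, label):
    """Return (min, max) of values, or raise a descriptive error if there are none.

    NumPy cannot take the minimum of an empty array ("zero-size array to
    reduction operation minimum which has no identity"), so check first.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        raise ValueError(
            "%s is empty: no fragments are left to bin by size/GC content; "
            "check the fragment contacts file and any filtering before this step" % label
        )
    return arr.min(), arr.max()
```

In hic_analysis.py this would mean calling something like checked_range(numpy_size, "numpy_size") where numpy_size.min() is currently called.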

bug related to bowtie indexing

I'm getting the error shown below. The proximate cause seems to be that hic_exp.py populates the dictionary dict_fragments from a file named "list_contig_names.txt" in the main output directory. In my case that file is empty, so the dictionary is empty and line 76 raises a KeyError. I can see on line 39 that this file is generated from a call to bowtie2-inspect -n . The problem for me is that there is no genome index.

This is where more detailed documentation would come in handy. Am I supposed to generate the index myself? If so, this needs to be stated explicitly in the README. Also, what is "Bowtie folder" in the advanced options? When I enter values into the advanced options and click "apply", it gives the warning message "Could not find bowtie folder". This also needs to be explained.
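Whether HiC-Box expects you to build the index yourself is exactly the open question here, so this is only a sketch of checks a wrapper could run before calling bowtie2-inspect. It assumes the standard Bowtie 2 index layout (six files per prefix, with a .bt2 or, for large genomes, .bt2l extension) and the bowtie2-build <reference_in> <bt2_base> usage; the helper names are hypothetical, and load_contig_names only stands in for whatever hic_exp.py does with list_contig_names.txt. Failing at this point would replace the later KeyError with a message that names the real problem.

```python
import os

# Files written by bowtie2-build for an index prefix (".bt2l" for large genomes).
BT2_PARTS = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")


def bowtie2_index_exists(prefix):
    """Return True if a complete Bowtie 2 index with this prefix is on disk."""
    for ext in (".bt2", ".bt2l"):
        if all(os.path.isfile(prefix + part + ext) for part in BT2_PARTS):
            return True
    return False


def bowtie2_build_command(fasta_path, prefix):
    """Argument list that builds the index (e.g. for subprocess.check_call)."""
    return ["bowtie2-build", fasta_path, prefix]


def load_contig_names(path):
    """Read one contig name per line, failing early if there are none.

    Hypothetical stand-in for the step that reads list_contig_names.txt.
    """
    with open(path) as handle:
        names = [line.rstrip("\n") for line in handle if line.strip()]
    if not names:
        raise RuntimeError(
            "%s is empty: bowtie2-inspect listed no contigs; "
            "check that the Bowtie 2 index exists" % path
        )
    return names
```

For example, if bowtie2_index_exists("genome") is False, running subprocess.check_call(bowtie2_build_command("genome.fa", "genome")) would create the index before the contig list is generated.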

Also, as an unrelated side note, 22270 seconds (about 6 hours) seems like a long time just to make a restriction map of a 1.2 Gb genome. This should probably be optimized. I suspect the bottleneck is the Biopython Restriction module: in the past I've found some (but not all) of its functions to be very slow, to the point that I just used regular expressions as a workaround. Just an idea.

Restriction map generated in 22270.876373 s
filling list of contigs ..
[]
filling dictionnary of fragments ...
Traceback (most recent call last):
  File "main.py", line 278, in OnAlign
    ncpu=self.ncpu)
  File "/home/tom/Desktop/HiC-Box-master/analysis_main.py", line 102, in analyze
    len_paired_wise_fastq)
  File "/home/tom/Desktop/HiC-Box-master/hic_exp.py", line 76, in __init__
    dict_fragments[a_tmp[1]].append(int(a_tmp[0]))
KeyError: 'gi|526059867|ref|NW_004823088.1| Melopsittacus undulatus unplaced genomic scaffold, Melopsittacus_undulatus_6.3 budgerigar_v6.3_scf900160251875, whole genome shotgun sequence'
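On the side note about the 22270 s restriction map: the log does not show where the time is spent, so the Biopython hypothesis is unconfirmed. For comparison, the regex-based approach suggested there could look like the sketch below. It assumes a palindromic recognition site without ambiguity codes (for example HindIII, AAGCTT, cutting after the first base) and returns half-open fragment coordinates; restriction_fragments is a hypothetical name, not a HiC-Box or Biopython function.

```python
import re


def restriction_fragments(seq, site, cut_offset):
    """Split seq into restriction fragments using a plain regex search.

    site: recognition sequence, assumed palindromic (e.g. "AAGCTT" for HindIII),
    so scanning one strand finds every occurrence.
    cut_offset: cut position within the site on the top strand (1 for A^AGCTT).
    Returns (start, end) half-open intervals that tile seq.
    """
    # The lookahead makes overlapping occurrences count as separate matches.
    pattern = re.compile("(?=%s)" % re.escape(site.upper()))
    cuts = [m.start() + cut_offset for m in pattern.finditer(seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces, e.g. a cut exactly at position 0.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
```

Because seq.upper() copies the sequence, running this one contig at a time keeps memory use reasonable on a 1.2 Gb assembly. Non-palindromic enzymes would also require searching the reverse complement.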

Non-ASCII characters and faulty import

In the current version the tool can't be launched. There are two reasons.

The script pyramid_sparse.py contains non-ASCII characters in this recurring line:
p = ProgressBar('blue', width=20, block='▣', empty='□')
All of these lines are commented out anyway, so deleting them fixes the problem.
The script main.py imports weave from scipy. However, weave has since become a separate project, so it needs to be imported directly; otherwise it can't be found.
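Both launch problems can also be handled without editing every affected line. Python 2 rejects a source file containing non-ASCII bytes unless it declares its encoding (PEP 263), for example with # -*- coding: utf-8 -*- on the first or second line, so adding that declaration to pyramid_sparse.py is an alternative to deleting the lines. The sketch below, with hypothetical helper names, locates such lines and imports weave from whichever location is available.

```python
import importlib


def non_ascii_lines(text):
    """Return 1-based numbers of lines containing non-ASCII characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(ord(ch) > 127 for ch in line)
    ]


def import_first(*names):
    """Return the first module in names that imports successfully, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Try the old SciPy location first, then the standalone weave package.
weave = import_first("scipy.weave", "weave")
```

main.py could then check that weave is not None and print an installation hint otherwise, rather than failing on the import.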
