koszullab / hic-box
GUI-based pipeline for Hi-C data processing and visualization
Good afternoon!
I am trying to explore HiC-box and use it to build a contact map from a metagenomics sample for which we have both a shotgun sequencing library and a Hi-C library available, and hopefully to extend the analysis with GRAAL. However, I am running into some trouble with the first steps of the installation, more specifically, running the main.py script.
When I call python main.py, I get this error:
Unable to access the X Display, is $DISPLAY set properly?
I have done some research and I think this error is related to the wxpython module, but I have tried installing several versions of it and have not been able to solve it. Could you point me in the right direction? Any idea is very welcome!
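On Linux, wxPython draws through an X server whose address is read from the DISPLAY environment variable, so this message usually points at the environment main.py runs in (for example a plain SSH session on a remote server) rather than at the installed wxpython version. A minimal pre-flight check is sketched below; gui_display_available is a hypothetical helper, and it is only a heuristic for X11 (macOS and Windows do not use DISPLAY).

```python
import os
import sys


def gui_display_available(environ=None, platform=None):
    """Heuristic: can a wx GUI open a window in this environment?

    Hypothetical helper. On Linux/BSD, wxPython (GTK) needs an X display,
    which is advertised through the DISPLAY environment variable.
    macOS and Windows do not rely on DISPLAY, so they are assumed fine.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith(("win", "darwin")):
        return True
    return bool(environ.get("DISPLAY"))
```

If it returns False on a remote machine, connecting with ssh -X (X11 forwarding) is the usual way to give the GUI a display.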
Apart from this, I have a couple of suggestions for the initial documentation:
Specify that HiC-Box requires a Python 2.7 environment (maybe a bit obvious, but many users already rely on Python 3).
Add cython (required by mirnylib) and wxpython (required by the wx module, which is where I suspect my issues lie) to the Python dependencies of HiC-Box. Both can be installed easily through conda:
conda install -c anaconda wxpython cython
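The requirements suggested above could also be checked explicitly at start-up. The sketch below is hypothetical (missing_modules and check_environment are not part of HiC-Box), and its module list is an assumption assembled from this thread: numpy and scipy appear in the tracebacks and the weave report, Cython, wx and mirnylib in the suggestions. Note that the import names differ from the package names: cython is imported as Cython and wxpython as wx.

```python
import sys

# Assumed module list, gathered from this thread; not an official requirements file.
REQUIRED_MODULES = ["numpy", "scipy", "Cython", "wx", "mirnylib"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def check_environment(names=REQUIRED_MODULES):
    """Collect human-readable problems: wrong interpreter, missing modules."""
    problems = []
    if sys.version_info[:2] != (2, 7):
        problems.append("HiC-Box expects Python 2.7, found %d.%d" % sys.version_info[:2])
    problems.extend("missing module: " + name for name in missing_modules(names))
    return problems
```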
Nothing else. I hope to hear from you soon, and I will be very happy to provide more feedback if I get this working! Thanks for developing it; the tool looks promising!
Best,
Juanma
I'd like to use HiC-Box to prepare my data for genome finishing using GRAAL, but I have a number of questions and points to raise about the documentation.
1. The main page doesn't describe what the software does.
2. There is no link from the HiC-Box page to the GRAAL page, or from the GRAAL page to HiC-Box. It is not apparent that the two tools are designed to work together, even though they are.
3. There is no guidance for people who have already mapped and/or processed their sequencing reads and want to start HiC-Box downstream of the mapping step using, for example, a BAM file or bowtie output.
4. There is no description of the advanced parameters or guidance on how to use them. The README says "tweak if needed" but doesn't say how to determine when tweaking is needed. What is "Total reads length"? Can it not handle reads of different lengths, for example from different experiments or due to trimming? What is "Tag length"? My reads don't have a tag. Is HiC-Box going to try to trim 6 bp off anyway?
5. Upon running "python main.py", a window pops up that prompts the user for reads in FASTQ format. Obviously 3C/Hi-C data is paired-end, so there are two FASTQ files (one for read 1 and one for read 2), but there is only one box, and it apparently accepts only one filename. This presents a problem, and I don't know how to proceed. The advanced settings box has a "Paired wise FASTQ" option that can be checked, which I presume relates to this, as well as a "Length paired wise FASTQ" option that is set to 3 by default. I don't know what this means. Does it mean that the FASTQ reads are meant to be supplied in interleaved format in groups of three? If so, this calls for pre-processing for which no instructions are given. Also, can it handle multiple FASTQ files for each read? Bowtie simply accepts a comma-separated list, but it's unclear what HiC-Box expects. Can it handle gzipped files? Also unclear.
6. The instructions say to build a pyramid, but GRAAL also has a pyramid-building step. Is this redundant? At which step am I supposed to stop with HiC-Box? The instructions are unclear.
7. A comment related to point 3: if HiC-Box simply took bowtie output as its input, there would be no need to ask many of these questions, since bowtie is already well documented. One problem is that HiC-Box packages the functionality of bowtie in an obscure way (a "black box"). Is this necessary? If so, it would be beneficial to explain what it does and why (again related to point 1).
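Whether HiC-Box wants a single interleaved file, and what "Length paired wise FASTQ" means, are exactly the open questions in point 5 and are not answered here. If it does turn out to expect interleaved input, the pre-processing could look like the following Python 3 sketch; open_fastq, fastq_records, interleave and write_interleaved are hypothetical helpers, not part of HiC-Box. It also shows that accepting gzipped input only takes a check on the .gz extension.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a FASTQ file as text; '.gz' files are decompressed on the fly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def fastq_records(lines):
    """Yield FASTQ records as (header, sequence, separator, quality) tuples."""
    it = iter(lines)
    while True:
        header = next(it, "")
        if not header:
            return  # clean end of input
        rest = [next(it, "") for _ in range(3)]
        if not rest[2]:
            raise ValueError("truncated FASTQ record: %r" % header)
        yield tuple(line.rstrip("\r\n") for line in [header] + rest)


def interleave(r1_lines, r2_lines):
    """Alternate records: R1 #1, R2 #1, R1 #2, R2 #2, ..."""
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield rec1
        yield rec2


def write_interleaved(r1_path, r2_path, out_path):
    """Write one interleaved, uncompressed FASTQ file from an R1/R2 pair."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2, open(out_path, "w") as out:
        for record in interleave(r1, r2):
            out.write("\n".join(record) + "\n")
```

Using zip_longest rather than zip makes a length mismatch between R1 and R2 an error instead of silently dropping the unpaired tail.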
I'm fully aware that in a research environment it's difficult to keep the documentation up to speed with the latest projects. If my comments here seem long-winded, it's because I'm trying to help by giving thorough feedback. That said, I'd appreciate any advice or updates you can give. Thanks!
When I run the HiC-Box alignment, I receive the following error:
[...]
start computing biases...
data exist. loading...
done
data collect_done
numpy size shape = (0,)
numpy gc shape = (0,)
np vect size shape = (95581,)
np vect gc shape = (95581,)
Traceback (most recent call last):
File "main.py", line 278, in OnAlign
ncpu=self.ncpu)
File "/Python/HiC-Box/analysis_main.py", line 110, in analyze
hic_bank.gc_size_bias()
File "/Python/HiC-Box/hic_exp.py", line 165, in gc_size_bias
mat_gc, mat_size,steps_gc,steps_length = hic_analysis.gc_size_bias(self.folder_analysis,self.dict_fragments, self.fragments_contacts_file)
File "/Python/HiC-Box/hic_analysis.py", line 203, in gc_size_bias
size_min = numpy_size.min()
File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 29, in _amin
return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity
Could you help me figure out what went wrong here?
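In the log above, "numpy size" and "numpy gc" have shape (0,) while "np vect size" and "np vect gc" hold 95581 entries, and the crash is numpy_size.min() in hic_analysis.py. NumPy raises exactly this ValueError whenever min() or max() is called on an empty array. Why the array is empty is not established in this thread; one plausible (unconfirmed) reading is that no contacts were read from the fragments contacts file. A guard like the hypothetical checked_min_max below would at least replace the generic numpy message with one that names the empty array.

```python
import numpy as np


def checked_min_max(values, label):
    """Return (min, max) of `values`, failing clearly if it is empty.

    numpy's .min()/.max() raise "zero-size array to reduction operation ..."
    on empty input; this wrapper reports *which* array was empty instead.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        raise RuntimeError(
            "%s is empty, so its min/max cannot be computed; "
            "check the steps that were supposed to fill it" % label)
    return arr.min(), arr.max()
```

In gc_size_bias this would replace the bare call, e.g. size_min, size_max = checked_min_max(numpy_size, "numpy_size").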
I'm getting the error shown below. The proximate cause seems to be that hic_exp.py populates the dictionary dict_fragments from a file in the main output directory named "list_contig_names.txt". In my directory this file is empty, so the dictionary is empty, and line 76 therefore raises a KeyError. I can see on line 39 that this file is generated from a call to bowtie2-inspect -n . The problem for me is that there is no genome index.
This is where more detailed documentation would come in handy. Am I supposed to generate the index myself? If so, this needs to be stated explicitly in the README. Also, what is "Bowtie folder" in the advanced options? When I enter values into the advanced options and click "apply", it gives the warning message "Could not find bowtie folder". This also needs to be explained.
Also, as an unrelated side note, 22270 seconds (about 6 hours) seems like a long time just to make a restriction map of a 1.2 Gb genome. This should probably be optimized. I'm guessing the bottleneck is the Biopython restriction module: in the past I've noticed that some (but not all) of its functions were extremely slow, to the point that I used regular expressions as a workaround. Just an idea.
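The regular-expression workaround mentioned above can be sketched as follows. restriction_sites and fragments are hypothetical helpers; GATC (the DpnII/MboI site) is only an example, degenerate recognition sequences are not handled, and each cut is placed at the start of the site rather than at the enzyme's real cut offset. No claim is made here about how its speed compares with Biopython's Bio.Restriction on a 1.2 Gb genome.

```python
import re


def restriction_sites(sequence, site):
    """0-based start positions of every match of `site`, overlaps included."""
    # A zero-width lookahead lets finditer report overlapping matches too.
    pattern = re.compile("(?=%s)" % re.escape(site.upper()))
    return [m.start() for m in pattern.finditer(sequence.upper())]


def fragments(sequence, site):
    """Split a contig at restriction sites into (start, end) intervals.

    Simplification: the cut is placed at the first base of each site.
    """
    cuts = [pos for pos in restriction_sites(sequence, site) if pos > 0]
    bounds = [0] + cuts + [len(sequence)]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]
```

For example, fragments("AAGATCTTGATC", "GATC") splits the 12 bp contig into (0, 2), (2, 8) and (8, 12).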
Restriction map generated in 22270.876373 s
filling list of contigs ..
[]
filling dictionnary of fragments ...
Traceback (most recent call last):
File "main.py", line 278, in OnAlign
ncpu=self.ncpu)
File "/home/tom/Desktop/HiC-Box-master/analysis_main.py", line 102, in analyze
len_paired_wise_fastq)
File "/home/tom/Desktop/HiC-Box-master/hic_exp.py", line 76, in init
dict_fragments[a_tmp[1]].append(int(a_tmp[0]))
KeyError: 'gi|526059867|ref|NW_004823088.1| Melopsittacus undulatus unplaced genomic scaffold, Melopsittacus_undulatus_6.3 budgerigar_v6.3_scf900160251875, whole genome shotgun sequence'
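The empty [] printed after "filling list of contigs" is consistent with the diagnosis above: with no index for bowtie2-inspect -n to read, list_contig_names.txt stays empty. A hypothetical pre-flight check is sketched below. It relies only on bowtie2's own conventions: bowtie2-build writes six files per index (prefix.1.bt2 to prefix.4.bt2 plus prefix.rev.1.bt2 and prefix.rev.2.bt2, with .bt2l instead of .bt2 for very large genomes), and bowtie2-inspect -n prints the reference sequence names. Which prefix HiC-Box looks for, and what its "Bowtie folder" option should point to, is not documented in this thread, so the prefix is a placeholder.

```python
import os
import subprocess

# The six files bowtie2-build writes for one index (suffix before .bt2/.bt2l).
INDEX_PARTS = ["1", "2", "3", "4", "rev.1", "rev.2"]


def bowtie2_index_exists(prefix, isfile=os.path.isfile):
    """True if a complete small (.bt2) or large (.bt2l) index exists for prefix."""
    for ext in ("bt2", "bt2l"):
        if all(isfile("%s.%s.%s" % (prefix, part, ext)) for part in INDEX_PARTS):
            return True
    return False


def build_index_command(fasta_path, prefix):
    """Command line that would create the index (returned, not run)."""
    return ["bowtie2-build", fasta_path, prefix]


def contig_names(prefix):
    """Reference names as printed by bowtie2-inspect -n (needs bowtie2 on PATH)."""
    output = subprocess.check_output(["bowtie2-inspect", "-n", prefix])
    return output.decode("utf-8").splitlines()
```

The isfile parameter exists only so the check can be exercised without touching the file system; in normal use it defaults to os.path.isfile.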
In the current version, the tool can't be launched, for two reasons.
The script pyramid_sparse.py contains non-ASCII characters, namely this recurring line:
p = ProgressBar('blue', width=20, block='▣', empty='□')
All of these lines are commented out anyway, so deleting them fixes the problem.
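This is a Python 2 rule: without an encoding declaration, Python 2 rejects any non-ASCII byte in a source file, even inside a comment, which is why the already commented-out lines still break the script. Besides deleting them, adding a PEP 263 declaration such as # -*- coding: utf-8 -*- as the first or second line of pyramid_sparse.py would also be accepted. To find every offending line across the sources, a small scanner such as the hypothetical non_ascii_lines below could be used.

```python
import io


def non_ascii_lines(text):
    """Return (line_number, line) pairs for lines with any non-ASCII character."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), 1)
            if any(ord(ch) > 127 for ch in line)]


def scan_source(path):
    """Read a source file as UTF-8 and report its non-ASCII lines."""
    with io.open(path, encoding="utf-8") as handle:
        return non_ascii_lines(handle.read())
```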
The script main.py imports weave from scipy. However, weave has since become its own project, so it needs to be imported directly; otherwise it can't be found.
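One way to support both layouts is to try the old scipy.weave location first and fall back to the standalone weave package. The sketch below uses a hypothetical helper, import_first; note that weave itself only ever supported Python 2, so this does not make main.py run under Python 3.

```python
import importlib


def import_first(candidates):
    """Import and return the first module in `candidates` that can be imported."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s (%s)" % (name, exc))
    raise ImportError("none of these modules could be imported: " + "; ".join(errors))


# In main.py this would replace "from scipy import weave":
# weave = import_first(["scipy.weave", "weave"])
```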