hic-box's Issues

Unable to run main.py + suggestions

Good afternoon!

I am trying to use HiC-box to build a contact map from a metagenomics sample for which we have both a shotgun sequencing library and a HiC library available, and hopefully to extend the analysis with GRAAL. However, I am running into trouble with the first steps of the installation, specifically when running the main.py script.

When I call python main.py,
I get this error: Unable to access the X Display, is $DISPLAY set properly?

I have done some research and I think this error is related to the wxpython module, but I have tried installing several versions of it and have not been able to solve it. Could you point me to how to proceed here? Any idea is very welcome!
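A message like this commonly appears when no X server is reachable, for example on a headless machine or in an SSH session without X forwarding; whether that applies to this setup is an assumption. A minimal sketch of a pre-flight check that could run before the GUI is started (the function name `display_available` is hypothetical and not part of HiC-Box):

```python
import os


def display_available(environ=None):
    """Return True if a non-empty DISPLAY variable is set.

    This only checks that the variable exists; it does not verify
    that the X server it names is actually reachable.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY"))


if __name__ == "__main__":
    if not display_available():
        print("DISPLAY is not set; a wx GUI will probably fail to start.")
```

If the check fails over SSH, reconnecting with X forwarding enabled (`ssh -X`) is one thing to try.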

Apart from this, I have a couple of suggestions to add in the initial documentation:

  1. Specify that this requires a Python2.7 environment (maybe a bit obvious, but several users already rely on Python3)

  2. Add cython (required by mirnylib) and wxpython (which provides the wx module, and is where I suspect my issue lies) to the Python dependencies required by HiC-Box. Both can be installed easily through conda:

conda install -c anaconda wxpython cython
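To confirm that the dependencies are importable after installation, a small check along these lines may help. It is a sketch only: `missing_modules` is a hypothetical helper, and the import names in the usage comment (`Cython`, `wx`, `mirnylib`) are the usual import names for those packages rather than something the HiC-Box documentation states.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example usage (module names are assumptions, see above):
# print(missing_modules(["Cython", "wx", "mirnylib"]))
```

An empty list means every module was found in the active environment.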

Nothing else, I hope to hear from you soon, and I will be very happy to provide you with more feedback if this keeps working! Thanks for the development, the tool looks promising!

Best,
Juanma

Insufficient documentation

I'd like to use HiC-Box to prepare my data for genome finishing using GRAAL, but I have a number of questions and points to raise about the documentation.

  1. The main page doesn't describe what the software does.

  2. There is no link from the HiC-Box page to the GRAAL page, or from the GRAAL page to HiC-Box. It is not apparent that the two programs are designed to work together, even though they are.

  3. There is no guidance for people who have already mapped and/or processed their sequencing reads and want to start HiC-Box downstream of the mapping step using, for example, a bam file or bowtie output.

  4. There is no description of the advanced parameters or guidance as to how to use them. The README says "tweak if needed" but doesn't say how to determine when tweaking is needed. What is "Total reads length"? Can it not handle reads of different lengths, for example from different experiments, or due to trimming? What is "Tag length"? My reads don't have a tag. Is HiC-Box going to try and trim 6 bp off anyway?

  5. Upon running "python main.py" a window pops up that prompts the user for reads in fastq format. Obviously 3C/Hi-C data is paired-end, and hence there are two fastq files (one for read one and one for read two), but there is only one box and it apparently accepts only one filename argument. This presents a problem, and I don't know how to proceed. The advanced settings box has a "Paired wise FASTQ" option that can be checked, which I presume relates to this, as well as a "Length paired wise FASTQ" option that is set to 3 by default. I don't know what this means. Does it mean that the FASTQ reads are meant to be supplied in interleaved format in groups of three? If so, this calls for pre-processing for which no instructions are given. Also, is it able to handle multiple fastq files for each read? Bowtie simply accepts a comma-separated list, but it's unclear what HiC-Box expects. Can it handle gzipped files? Also unclear.

  6. The instructions say to build a pyramid, but GRAAL also has a pyramid building step. Is this redundant? At which step am I supposed to stop with HiC-Box? Instructions are unclear.

  7. A comment related to point 3: If HiC-Box just took bowtie output as its input, there would be no need to ask many of these questions, since bowtie is already well-documented. One problem is that HiC-Box packages the functionality of bowtie in an obscure way (a "black box"). Is this necessary? If so, it would be beneficial to explain what it does and why (again related to point 1).

I'm fully aware that in a research environment it's difficult to keep the documentation up to speed with the latest projects--If my comments here seem long-winded it's because I'm trying to help by giving thorough feedback. That said, I'd appreciate any advice/updates you can give. Thanks!

ValueError during alignment

When I run the HiC-Box alignment, I receive the following error:

[...]
start computing biases...
data exist. loading...
done
data collect_done
 numpy size shape =  (0,)
 numpy gc shape =  (0,)
 np vect size shape =  (95581,)
 np vect gc shape =  (95581,)
Traceback (most recent call last):
  File "main.py", line 278, in OnAlign
    ncpu=self.ncpu)
  File "/Python/HiC-Box/analysis_main.py", line 110, in analyze
    hic_bank.gc_size_bias()
  File "/Python/HiC-Box/hic_exp.py", line 165, in gc_size_bias
    mat_gc, mat_size,steps_gc,steps_length =  hic_analysis.gc_size_bias(self.folder_analysis,self.dict_fragments, self.fragments_contacts_file)
  File "/Python/HiC-Box/hic_analysis.py", line 203, in gc_size_bias
    size_min = numpy_size.min()
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

Could you help me figure out what went wrong here?
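The log shows `numpy size shape =  (0,)` just before the failure, so the array reaching `numpy_size.min()` appears to be empty, and a minimum over an empty array is undefined. Why the array is empty is not visible from the log. As a sketch, a guard like the following would turn the failure into a more descriptive message; `checked_min` is a hypothetical helper, not HiC-Box code, and it works on NumPy arrays and plain lists alike:

```python
def checked_min(values, label):
    """Return the minimum of values, or raise a descriptive error if empty."""
    if len(values) == 0:
        raise ValueError(
            "{0} is empty; no data were collected for this step".format(label)
        )
    return min(values)


# Example usage with a plain list standing in for the fragment sizes:
# size_min = checked_min(fragment_sizes, "fragment sizes")
```

This does not fix the underlying cause; it only reports the empty input at the point where it first matters.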

bug related to bowtie indexing

I'm getting the error shown below. The proximal cause seems to be that hic_exp.py populates the dictionary dict_fragments from a file in the main output directory named "list_contig_names.txt". The reason for the error is the file in my directory is empty, thus the dictionary is empty, and on line 76 we therefore get a key error. I can see on line 39 that this file is generated from a call to bowtie2-inspect -n . The problem for me is there is no genome index.

This is where some more detailed documentation would come in handy. Am I supposed to generate the index myself? If so, this needs to be explicitly stated in the readme. Also what is "Bowtie folder" in the advanced options? When I enter stuff into the advanced options and click "apply" it gives the warning message "Could not find bowtie folder". This also needs to be explained.

Also, as an unrelated side-note, 22270 seconds (i.e. about 6 hours) seems like a long time for just making a restriction map of a 1.2 Gb genome. This should probably be optimized. I'm guessing the problem is the biopython restriction module. In the past I've noticed that some (but not all) of those functions were super slow, to the point that I just used regular expressions as a workaround. Just an idea.
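The regular-expression workaround mentioned above could look roughly like this. It is a sketch under assumptions: `find_sites` is a hypothetical name, the site is taken to be a plain string without ambiguity codes, and GATC (the DpnII recognition site) is used purely as an example, not as the enzyme used in this experiment.

```python
import re


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of site.

    A zero-width lookahead is used so that overlapping occurrences
    are also reported.
    """
    pattern = re.compile("(?=" + re.escape(site) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


print(find_sites("AAGATCGGATCGATC", "GATC"))  # prints [2, 7, 11]
```

Whether this is actually faster than the biopython restriction module on a genome of this size would need to be measured.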

Restriction map generated in 22270.876373 s
filling list of contigs ..
[]
filling dictionnary of fragments ...
Traceback (most recent call last):
  File "main.py", line 278, in OnAlign
    ncpu=self.ncpu)
  File "/home/tom/Desktop/HiC-Box-master/analysis_main.py", line 102, in analyze
    len_paired_wise_fastq)
  File "/home/tom/Desktop/HiC-Box-master/hic_exp.py", line 76, in __init__
    dict_fragments[a_tmp[1]].append(int(a_tmp[0]))
KeyError: 'gi|526059867|ref|NW_004823088.1| Melopsittacus undulatus
unplaced genomic scaffold, Melopsittacus_undulatus_6.3
budgerigar_v6.3_scf900160251875, whole genome shotgun sequence'
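The empty list printed after "filling list of contigs" matches the diagnosis above: the dictionary has no keys, so the first fragment lookup fails. A sketch of how the fill step could fail earlier and more clearly is shown below. The function name and the (fragment id, contig name) row layout are assumptions inferred from the traceback line `dict_fragments[a_tmp[1]].append(int(a_tmp[0]))`, not the actual HiC-Box code.

```python
def build_fragment_dict(contig_names, fragment_rows):
    """Map each contig name to the list of its fragment ids.

    fragment_rows is an iterable of (fragment_id, contig_name) pairs.
    """
    if not contig_names:
        raise ValueError(
            "no contig names were read; check that the genome index exists"
        )
    dict_fragments = {name: [] for name in contig_names}
    for fragment_id, contig in fragment_rows:
        if contig not in dict_fragments:
            raise KeyError("fragment refers to unknown contig: " + contig)
        dict_fragments[contig].append(int(fragment_id))
    return dict_fragments
```

With this check, an empty "list_contig_names.txt" would be reported directly instead of surfacing as a KeyError on the first contig.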

Non-ASCII characters and faulty import

In the current version the tool can't be launched. There are two reasons.

  1. The script pyramid_sparse.py contains non-ASCII characters, namely this recurring line:
    p = ProgressBar('blue', width=20, block='▣', empty='□')
    All of these lines are commented out anyway, so deleting them fixes the problem.

  2. The script main.py imports weave from scipy. However, weave became its own project, so it needs to be imported directly; otherwise it can't be found.
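For the second point, one way to support both older and newer installations is to try each import location in turn. This is only a sketch: `import_first` is a hypothetical helper, and the candidate names in the usage comment assume the standalone package is importable as `weave`.

```python
import importlib


def import_first(candidates):
    """Return the first module in candidates that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(candidates))


# Example usage in main.py (module names are assumptions, see above):
# weave = import_first(["scipy.weave", "weave"])
```

The rest of the script can then keep using the name `weave` unchanged.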
