bleties's Introduction

Basic Long-read Enabled Toolkit for Interspersed DNA Elimination Studies (BleTIES)

BleTIES is a tool for prediction and targeted assembly of internally eliminated sequences (IESs) in ciliate genomes, using single-molecule long-read sequencing. The design and name of the software were inspired by ParTIES.

The required inputs are a ciliate MAC genome assembly and a long-read sequencing library (PacBio subreads or error-corrected CCS reads, or Nanopore reads) mapped onto that assembly. The mapper should report a valid CIGAR string and an NM tag (edit distance to the reference) for each aligned read. The mapping is assumed to be accurate.
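As a quick sanity check of the mapping requirement above, the following minimal sketch tests whether a single SAM-format alignment line carries a usable CIGAR string and an NM tag. It is not part of BleTIES; the function name `has_cigar_and_nm` and the example records are hypothetical, and it assumes alignments are available as SAM text (for example from `samtools view`).

```python
import re

# One or more <length><operation> pairs, using the CIGAR operations in the SAM format
CIGAR_RE = re.compile(r"^(\d+[MIDNSHP=X])+$")

def has_cigar_and_nm(sam_line):
    """Return True if a SAM alignment line has a usable CIGAR string and an NM tag."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # fewer than the 11 mandatory SAM columns
    cigar = fields[5]
    if cigar == "*" or not CIGAR_RE.match(cigar):
        return False  # "*" means that no CIGAR is available
    # Optional fields start at column 12 and are formatted TAG:TYPE:VALUE
    return any(field.startswith("NM:i:") for field in fields[11:])

# Hypothetical records, for illustration only
good = "read1\t0\tContig_9\t100\t60\t5M2I3M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:2"
bad = "read2\t0\tContig_9\t100\t60\t*\t*\t0\t0\tACGTACGTAC\t*"
print(has_cigar_and_nm(good), has_cigar_and_nm(bad))
```

Running a check like this over a sample of alignments before starting BleTIES can catch a mapper configuration that omits the NM tag.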

Read more in our paper.

Installation

Install released version with Conda

The released versions are distributed via Bioconda, and can be installed with Conda:

# Create new environment called "bleties"
conda create -c conda-forge -c bioconda -n bleties bleties
# Activate environment
conda activate bleties
# Check version and view help message
bleties --version
bleties --help
# Run tests
python -m unittest -v bleties.TestModule

Install development version

If you want to test the latest development version, clone this Git repository, then install with pip.

Dependencies are specified as a Conda environment YAML file env.yaml. Create a Conda environment with the specified dependencies, then install with pip:

git clone git@github.com:Swart-lab/bleties.git
cd bleties
conda env create -f env.yaml -n bleties_dev
conda activate bleties_dev
pip install .

Run tests after installation:

python -m unittest -v bleties.TestModule

Usage

For help, use the -h or --help option, with or without the subworkflow names:

bleties --help
bleties milraa --help
bleties miser --help
bleties milret --help
bleties milcor --help
bleties miltel --help
bleties insert --help

The main module for IES reconstruction is MILRAA. At the moment, it only handles non-scrambled IESs.

Two scripts are available for plotting visualizations of MILRAA and MILCOR results: milcor_plot.py and milraa_plot.py.

Additional useful scripts are in the scripts/ folder.

Refer to the full documentation for more information.

Citations

BleTIES is research software. Please cite us if you use the software in a publication.

bleties's Issues

Change `ta_pointer_start` and `ta_pointer_end` reporting

These are presently reported for both junction and segment IES features.

However, for junction features, which are zero-length, ta_pointer_start equals the start coordinate if the IES is bounded by a TA repeat. The resulting ta_pointer_start coordinate is correct in Python's zero-based numbering system but not in GFF's one-based numbering.

For junction features, ta_pointer_end equals ta_pointer_start, which doesn't make sense either.

Maybe just have a ta_offset attribute that states how many bases the start/end coordinates need to be adjusted. This would then be robust against any change in coordinate systems and also be suitable for both insertions and deletions.
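A minimal sketch of why an offset would be robust to coordinate-system changes: a difference between two positions stays the same when both are shifted by the same constant. The function names and example coordinates are hypothetical and are not taken from the BleTIES code.

```python
def ta_offset(feature_start, pointer_start):
    """Signed number of bases between a feature's start and its TA pointer start."""
    return pointer_start - feature_start

def apply_offset(start, end, offset):
    """Shift both feature coordinates by the same offset."""
    return start + offset, end + offset

# Hypothetical positions in zero-based numbering
start0, pointer0 = 100, 98
# The same two positions in one-based numbering are each shifted by +1
start1, pointer1 = start0 + 1, pointer0 + 1

# The offset is identical in both systems
print(ta_offset(start0, pointer0), ta_offset(start1, pointer1))

# Applying the offset to a hypothetical one-based feature spanning 101..150
print(apply_offset(101, 150, ta_offset(start1, pointer1)))
```

Because the attribute stores only a difference, the same value would apply to both junction and segment features, whichever coordinate convention the output file uses.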

Milraa failed in subread mode

Error message:

Traceback (most recent call last):
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/bin/bleties", line 349, in <module>
    args.func(args)
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/lib/python3.7/site-packages/bleties/main.py", line 114, in milraa
    args.min_break_coverage, args.min_del_coverage)
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/lib/python3.7/site-packages/bleties/Milraa.py", line 1425, in reportPutativeIesInsertSubreads
    rname, gffpos, gffpos, consseq, breakpointid)
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/lib/python3.7/site-packages/bleties/Milraa.py", line 895, in reportAdjustPointers
    self._refgenome[ctg], ins_start, ins_end, consseq, breakpointid)
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/lib/python3.7/site-packages/bleties/Milraa.py", line 263, in getPointers
    while iesseq[i] == seq[end+i]:
  File "/ebio/abt2_projects/ag-swart-blepharisma/data/bleties/envs/bleties_env/lib/python3.7/site-packages/Bio/Seq.py", line 323, in __getitem__
    return self._data[index]
IndexError: string index out of range

BleTIES v0.1.4
With Contig_9 and Contig_14 in test data
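The traceback ends at the loop condition `while iesseq[i] == seq[end+i]:`, which indexes both sequences without checking their lengths. The sketch below shows one way such a comparison could be bounds-checked. The rest of `getPointers` is not shown in this report, so the function name `shared_prefix_length` and its return value are hypothetical and this is not the actual fix.

```python
def shared_prefix_length(iesseq, seq, end):
    """Count leading bases of iesseq that match seq starting at index end.

    Stops at the first mismatch or at the end of either sequence, so it
    cannot raise IndexError.
    """
    i = 0
    while i < len(iesseq) and end + i < len(seq) and iesseq[i] == seq[end + i]:
        i += 1
    return i

# The insert matches the reference all the way to the end of the insert;
# an unguarded loop would then try to read iesseq[4] and fail.
print(shared_prefix_length("TAGG", "AAATAGG", 3))

# The match runs into the end of the reference contig instead.
print(shared_prefix_length("TAC", "GGTA", 2))
```

Both situations, a fully repeated insert and an insert near a contig end, are plausible triggers for the error, but which one occurs with Contig_9 and Contig_14 has not been confirmed here.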

Bin reads by mapping to transcriptome

Like MILCOR, but with a transcriptome as the reference instead, e.g. for the use case where no reference genome is available.

Distinguish introns from IESs by GT/AG boundaries and by 100% "retention" of introns (vs. IESs, if the library is only an enrichment).

However, splicing may result in erroneous indel calls. For the minimap2 mapper, "splice" mode is not suitable for our needs because it assumes that RNAseq reads are being mapped onto a genomic reference, not the other way around. What we could do is compare predicted intron positions on different isoforms, if available.
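The two criteria above could be combined roughly as follows. This is only a sketch of the proposed feature, which does not exist in BleTIES: the function names, the `min_retention` parameter, and the example sequences are hypothetical. It assumes each candidate is an insert sequence (relative to the transcript) with a retention value between 0 and 1, and checks both strands because the insert may be reported in either orientation.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA sequence; unknown characters become N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))

def has_gt_ag_boundaries(insert_seq):
    """True if the insert starts with GT and ends with AG on either strand."""
    if len(insert_seq) < 4:
        return False
    for candidate in (insert_seq.upper(), reverse_complement(insert_seq)):
        if candidate.startswith("GT") and candidate.endswith("AG"):
            return True
    return False

def looks_like_intron(insert_seq, retention, min_retention=1.0):
    """Apply both criteria: GT/AG boundaries and (near-)complete retention."""
    return has_gt_ag_boundaries(insert_seq) and retention >= min_retention

print(looks_like_intron("GTAAGTTTTCAG", 1.0))  # GT...AG and fully retained
print(looks_like_intron("GTAAGTTTTCAG", 0.6))  # GT...AG but partly excised
print(looks_like_intron("TAACCGGTTA", 1.0))    # no GT/AG boundaries
```

In practice the retention threshold would probably need to be slightly below 1.0 to tolerate mapping errors.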

Retention scores systematically underestimated

Both MILRET and MILRAA count only reads that completely span an IES and either contain an insert or do not. If inserts/IESs are too long, the mapping will have a clip instead of an insert. For short IESs this is negligible, but as IESs get longer, a greater proportion of mapped IES+ reads will have clips, and these are currently ignored when calculating the retention score.
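A small numeric illustration of the effect, assuming the retention score is computed as IES+ / (IES+ + IES-). That formula is an assumption made for this example, and the read counts are invented.

```python
def retention_score(ies_plus, ies_minus):
    """Fraction of reads carrying the IES; None if there are no reads."""
    total = ies_plus + ies_minus
    if total == 0:
        return None
    return ies_plus / total

# Hypothetical read counts at one long IES
spanning_ies_plus = 6   # IES+ reads with the IES as an insert (counted)
clipped_ies_plus = 4    # IES+ reads where the IES is soft-clipped (ignored)
ies_minus = 10          # reads without the IES

counted_only = retention_score(spanning_ies_plus, ies_minus)
with_clips = retention_score(spanning_ies_plus + clipped_ies_plus, ies_minus)
print(counted_only, with_clips)
```

With these numbers the score drops from 0.5 to 0.375 when clipped reads are ignored, and the gap widens as the clipped fraction grows.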

Milraa string index out of range

I tried using Milraa (both the released version and the development version) on Nanopore reads and the Oxytricha Mac genome from https://knot.math.usf.edu/mds_ies_db/downloads.html. I used the --secondary=no and --MD options as instructed when first aligning with minimap2. However, I get a "string index out of range" error as follows:

Traceback (most recent call last):
  File ".../.conda/envs/bleties/bin/bleties", line 386, in <module>
    args.func(args)
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/bleties/main.py", line 152, in milraa
    args.subreads_cons_len_threshold)
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/bleties/Milraa.py", line 1486, in reportPutativeIesInsertSubreads
    rname, gffpos, gffpos, consseq, breakpointid)
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/bleties/Milraa.py", line 917, in reportAdjustPointers
    self._refgenome[ctg], ins_start, ins_end, consseq, breakpointid)
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/bleties/Milraa.py", line 263, in getPointers
    while iesseq[i] == seq[end+i]:
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/Bio/SeqRecord.py", line 455, in __getitem__
    return self.seq[index]
  File ".../.conda/envs/bleties/lib/python3.7/site-packages/Bio/Seq.py", line 323, in __getitem__
    return self._data[index]
IndexError: string index out of range

Mask selected regions in MILRAA

Add a user option to define regions that are to be skipped in the initial MILRAA step, for example with a GFF file of low-complexity regions or telomere regions, because these tend to attract poor-quality mappings.

An alternative is to filter the MILRAA GFF output afterwards with a mask definition file.
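The post-filtering alternative could look roughly like the sketch below, which drops GFF features that overlap any masked interval on the same contig. No such option exists in BleTIES yet; the function names, the mask structure (a dict of one-based, inclusive intervals per contig), and the example lines are all hypothetical.

```python
def overlaps(start, end, mask_start, mask_end):
    """True if two one-based, inclusive intervals share at least one base."""
    return start <= mask_end and mask_start <= end

def filter_gff_lines(gff_lines, masks):
    """Keep comment lines and features that do not overlap a masked region.

    masks maps a sequence ID to a list of (start, end) tuples.
    """
    kept = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        regions = masks.get(seqid, [])
        if not any(overlaps(start, end, ms, me) for ms, me in regions):
            kept.append(line)
    return kept

# Hypothetical MILRAA-style output and a mask over a telomere-proximal region
gff = [
    "##gff-version 3",
    "Contig_9\tMILRAA\tjunction\t50\t50\t.\t.\t.\tID=ies_a",
    "Contig_9\tMILRAA\tjunction\t5000\t5000\t.\t.\t.\tID=ies_b",
]
masks = {"Contig_9": [(1, 100)]}
filtered = filter_gff_lines(gff, masks)
print("\n".join(filtered))
```

Filtering afterwards keeps the MILRAA run itself unchanged, at the cost of still spending compute on the masked regions.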

Multi-segment CDS entries not handled by Insert module

Describe the bug
The Insert module does not handle multi-line GFF3 entries, e.g. a CDS that is split across several lines (separated by introns), where each line shares the same ID attribute. The Gff class in SharedFunctions has to be updated to handle multi-line GFF entries. This was not considered initially because IES entries are always single-line.
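One possible way to represent such entries is to collect all segments that share an ID into a single list, as in the sketch below. It does not reflect the actual Gff class in SharedFunctions; the function names and the example CDS lines are hypothetical.

```python
from collections import defaultdict

def parse_attributes(column9):
    """Parse the GFF3 attributes column (key=value pairs separated by ';')."""
    attrs = {}
    for pair in column9.strip().strip(";").split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def group_segments_by_id(gff_lines):
    """Map each feature ID to a sorted list of its (start, end) segments."""
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        feature_id = parse_attributes(fields[8]).get("ID")
        if feature_id is not None:
            segments[feature_id].append((int(fields[3]), int(fields[4])))
    return {fid: sorted(segs) for fid, segs in segments.items()}

# Hypothetical CDS split into two segments by an intron, plus a one-line gene
gff = [
    "Contig_9\tpred\tCDS\t300\t400\t.\t+\t0\tID=cds1;Parent=mrna1",
    "Contig_9\tpred\tCDS\t100\t200\t.\t+\t0\tID=cds1;Parent=mrna1",
    "Contig_9\tpred\tgene\t100\t400\t.\t+\t.\tID=gene1",
]
grouped = group_segments_by_id(gff)
print(grouped)
```

With this representation, a coordinate shift caused by an inserted IES can be applied to each segment of a feature in turn rather than to a single start/end pair.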

Please attach/copy the following:

NA

  • The command that resulted in the error
  • The log file from the BleTIES run (default name: bleties.log)
  • Any error messages that appeared on the console

Version info (please complete the following information):

  • OS: Linux
  • BleTIES version 0.1.10-alpha
  • How you installed BleTIES: Conda

Order of execution of MILRET

It's not clear from the README or the program help whether MILRET should normally be given the GFF output from MILRAA or be run without it. Please clarify.

MILRAA report deletions for subreads

Currently, subreads mode only reports inserts relative to reference. CCS mode reports both insertions and deletions relative to the reference.

MILRET muscle version number regexp returning error

Running bleties with the MILRAA option gives:

File "..miniconda3/envs/bleties/lib/python3.7/site-packages/bleties/main.py", line 35, in check_env
    dep_vers['muscle'] = re.search(r"v\S+", dep_vers['muscle']).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Running the command:

muscle -version

muscle 5.1.linux64 []
Built Feb 24 2022 03:16:15
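The regular expression in the traceback, `r"v\S+"`, requires a "v" before the version number, which the MUSCLE 5 output above does not contain, so `re.search` returns None. The sketch below shows a more tolerant pattern that also guards against a failed match. The function name `parse_muscle_version` is hypothetical, and the MUSCLE 3 banner used in the example is an assumption about the older output format.

```python
import re

def parse_muscle_version(version_output):
    """Extract a dotted version number from MUSCLE's version banner.

    Accepts an optional "v" prefix and returns None instead of raising
    when no version number is found.
    """
    match = re.search(r"(?i)muscle\s+v?(\d+(?:\.\d+)*)", version_output)
    return match.group(1) if match else None

# Banner reported in this issue
muscle5 = "muscle 5.1.linux64 []\nBuilt Feb 24 2022 03:16:15"
# Assumed banner style of an older release, with a "v" prefix
muscle3 = "MUSCLE v3.8.1551 by Robert C. Edgar"

print(re.search(r"v\S+", muscle5))  # the original pattern finds nothing
print(parse_muscle_version(muscle5), parse_muscle_version(muscle3))
```

Returning None lets the caller report an unknown version instead of stopping the run with an AttributeError.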

"Junction" in MILRAA gff3 output clashes with "junction" from mapped reads

I couldn't see the "junction" annotations when I imported them into Geneious. I could, however, see the ones imported along with the mapped reads from the BAM file. Can we choose a different name for these annotations (e.g. "IES_junction")? I relabelled my "junction" annotations and was then able to see them.
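Until the feature type is renamed in BleTIES itself, the relabelling can be scripted. The sketch below rewrites only the type column (column 3) of matching GFF lines; the function name and the example lines are hypothetical.

```python
def rename_feature_type(gff_lines, old="junction", new="IES_junction"):
    """Replace the feature type in column 3 of matching GFF lines."""
    renamed = []
    for line in gff_lines:
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) >= 9 and fields[2] == old:
            fields[2] = new
            line = "\t".join(fields)
        renamed.append(line)
    return renamed

# Hypothetical MILRAA-style output lines
gff = [
    "##gff-version 3",
    "Contig_9\tMILRAA\tjunction\t50\t50\t.\t.\t.\tID=ies_a",
    "Contig_9\tMILRAA\tsegment\t500\t560\t.\t.\t.\tID=ies_b",
]
relabelled = rename_feature_type(gff)
print("\n".join(relabelled))
```

Matching on the type column alone avoids touching IDs or attribute values that happen to contain the word "junction".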
