rnanue's People

Contributors

riasc

rnanue's Issues

Error when running RNAnue without control sample

Hi @ChristopherAdelmann,

I ran into the following issue when no control data is provided, i.e. when --ctrls is left empty in the config file. The preproc step works, but subsequent steps such as align report the following error:

[2024-06-27 10:28:30] ctrls: "/data/results_20160829/preproc/ctrls"
[2024-06-27 10:28:30] ### ERROR - "/data/results_20160829/preproc/ctrls" has not been found in the filesystem!

The reason is that in Data.cpp the ctrls and trtms paths are forwarded unchecked to the getGroupsPath function.

RNAnue/src/Data.cpp

Lines 37 to 38 in 995cb80

GroupsPath groups = getGroupsPath(ctrlsPath, trtmsPath);
getCondition(groups);

So you might want to add a check to all steps to make sure that --ctrls is provided, like so?

  fs::path ctrlsPath = "";
  if(params["ctrls"].as<std::string>() != "") {
      ctrlsPath = fs::path(params["outdir"].as<std::string>()) / "preproc/ctrls";
  }
  fs::path trtmsPath = fs::path(params["outdir"].as<std::string>()) / "preproc/trtms";

Also when substituting:

RNAnue/src/Data.cpp

Lines 136 to 137 in 995cb80

std::cout << helper::getTime() << "### ERROR - " << group.second << " has not been found in the filesystem!\n";
exit(EXIT_FAILURE);

with:

std::cout << "has not been found in the filesystem! ### ERROR ### \n";

The output should appear a bit nicer... e.g.,

[2024-06-27 10:28:30] ctrls: "/data/results_20160829/preproc/ctrls" has not been found in the filesystem! ### ERROR ###
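
The same check could be written once and reused by every step. The following is only a minimal sketch: the helper `groupPath`, its `previousStep` parameter, and the `std::map` that stands in for the parsed options are hypothetical and not part of RNAnue. It returns no path at all for a group that was left empty, so the filesystem check can be skipped for it.

```cpp
#include <filesystem>
#include <map>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not RNAnue code): returns the directory a step should
// read for a group ("ctrls" or "trtms"), or std::nullopt when that group was
// left empty or is missing in the config.
// Assumes "outdir" is always set; params.at() throws std::out_of_range otherwise.
std::optional<fs::path> groupPath(const std::map<std::string, std::string>& params,
                                  const std::string& group,
                                  const std::string& previousStep) {
    auto it = params.find(group);
    if (it == params.end() || it->second.empty()) {
        return std::nullopt;  // group not configured: nothing to look up
    }
    return fs::path(params.at("outdir")) / previousStep / group;
}
```

A caller would then only test for existence in the filesystem when the returned optional holds a value.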

Preproc Error SLURM

When using the image on Singularity with SLURM, I'm getting a terminate recursive error.

Minimal dataset/example for testing

Dear Richard (I believe you are the only developer, right?), I've been trying to run RNAnue on the provided container unsuccessfully.

  • I have set up a params.cfg file, and I am running the container via apptainer (previously called singularity):
apptainer exec --bind /data/results/RNAnue:/mnt/output/ --bind /data/reads:/mnt/data/reads --bind /data/cfg_files:/mnt/cfg_files --bind /data/genomes:/mnt/data/genomes rnanue_latest.sif /RNAnue/build/RNAnue allign  --config /mnt/cfg_files/RNAnue/params.cfg
  • my params.cfg file looks a bit like this:
### GENERAL
readtype = SE # paired-end (PE) or single-end (SE)

# absolute path of dirs containing the raw reads (additional dir for each library)
trtms = /mnt/data/reads
outdir = /mnt/output

threads = 48 # number of threads 
quality = 20 # lower limit for the average quality (Phred Quality Score) of the reads
mapquality = 20 # lower limit for the average quality (Phred Quality Score) of the alignment
minlen = 20 # minimum length of the reads
splicing = 0 # include splicing (=1) or not (=0)

### ALIGNMENT (forwarded to segemehl.x)
dbref = /mnt/data/genomes/genome.fasta
accuracy = 90 # min percentage of matches per read in semi-global alignment
minfragsco = 15 # min score of a spliced fragment 
minfraglen = 15 # min length of a spliced fragment
minsplicecov = 80 # min coverage for spliced transcripts
exclclipping = 0 # exclude soft clipping from 

### SPLIT READ CALLING 
sitelenratio = 0.0
cmplmin = 0.0 # complementarity cutoff - consider only split reads that exceed cmplmin
nrgmax = 0 # hybridization energy cutoff - consider only split reads that fall beneath nrgmax

### CLUSTERING
clust = 1 # clustering of the split reads can either be omitted (=0) or included (=1)
clustdist = 0 # minimum distance between clusters

### ANALYIS
# specify the annotations of the organism of interest (optional)
# features = /Users/riasc/Documents/work/projects/RNAnue/build/ecoli_BL21_DE3.gff # GFF3 feature file

# OUTPUT
stats = 1 # produce a statistics of the libraries
outcnt = 1 # (additionally) produce a count table as output
outjgf = 1 # (additionally) produce a JSON graph file for visualization
  • I get the following error message (which I can't quite debug on my own):
RNAnue v0.1.0 - Detect RNA-RNA interactions from Direct-Duplex-Detection (DDD) data.
reads the data to align
skipping preprocessing
terminate called after throwing an instance of 'boost::wrapexcept<boost::bad_any_cast>'
  what():  boost::bad_any_cast: failed conversion using boost::any_cast
Aborted

Would you be able to help me run the tool? Maybe you could provide a minimal dataset for testing?
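
For context: with Boost.Program_options, one common way to get `boost::bad_any_cast` is to call `as<T>()` on an option that was never set or that holds a different type. Whether that is what happens here is an assumption, since the failing option is not named in the log. The sketch below illustrates a defensive lookup using `std::any` as a stand-in so that it compiles without Boost; `Params` and `getOr` are hypothetical names, not RNAnue code.

```cpp
#include <any>
#include <map>
#include <string>

// Stand-in for the parsed options. std::any is used here only so the sketch
// has no external dependencies; it is not how RNAnue stores its options.
using Params = std::map<std::string, std::any>;

// Return the value stored under `key` if it exists and has type T,
// otherwise return `fallback` instead of throwing.
template <typename T>
T getOr(const Params& params, const std::string& key, T fallback) {
    auto it = params.find(key);
    if (it == params.end()) {
        return fallback;  // option missing from the config
    }
    // The pointer form of std::any_cast yields nullptr on a type mismatch
    // (or an empty std::any) rather than throwing std::bad_any_cast.
    if (const T* value = std::any_cast<T>(&it->second)) {
        return *value;
    }
    return fallback;
}
```

With such a lookup, a config without ctrls would yield an empty string for that option rather than aborting the program.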

Errors when running the RNAnue compiled from source

Hi, Rich

I am creating a separate issue for running the non-Docker RNAnue build (compiled from source).
I am encountering the following errors.

  1. RNAnue complete -c crashes.
    Call:

RNAnue complete -c /data/meyer/egor/influenza_splash/2cimpl/RNAnue_pipeline/params_clean_bin
Output:
terminate called after throwing an instance of 'boost::wrapexcept<boost::property_tree::ptree_bad_path>'
what(): No such node (complete)

I attach the full stderr and stdout for this error.
The config file "params_clean_bin" is also attached.

  2. Clustering step produces no output. Only empty folders are created.

Call:

(base) [esemenc@max028:/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/updated] $ RNAnue clustering -c /data/egor/influenza_splash/2cimpl/RNAnue_pipeline/params_clean_bin

Output:

after parameters
RNAnue v0.1.0 - Detect RNA-RNA interactions from Direct-Duplex-Detection (DDD) data.
reads the data to align
*** collect the data
### WARNING - this call runs RNAnue without control data
### WARNING - "--ctrls" has not been set
trtms: "/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/rnanue-output/align/trtms"
...has been found in the filesystem!
wsn1_a
create Base
cluster the split reads
create clustering object
InAndOut
"/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/rnanue-output" already exists.
"/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/rnanue-output/clustering" already exists.
"/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/rnanue-output/clustering/trtms" already exists.
"/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/rnanue-output/clustering/trtms/wsn1_a" already exists.
"Neeeeinnnnnn" - Biggi
There are no errors in stderr

  3. This is not a big issue, but it is probably still unwanted behavior: the help message (--help or -h) is not shown unless a subcall and a config file are specified.

(base) [esemenc@max028:/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/updated] $ RNAnue -h
after parameters
configuration fileparams.cfgcould not be opened!
(base) [esemenc@max028:/data/egor/influenza_splash/2cimpl/RNAnue_pipeline/updated] $ RNAnue --help
after parameters

But it works fine with the following call:

RNAnue --subcall clustering -c /data/egor/influenza_splash/2cimpl/RNAnue_pipeline/params_clean_bin -h
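
One possible way to make -h and --help work on their own would be to look for them before the config file is opened. This is a minimal sketch, assuming a hypothetical helper `helpRequested` that is called at the very start of main; RNAnue's actual argument handling may be organized differently.

```cpp
#include <string_view>

// Hypothetical helper (not RNAnue code): true if -h or --help appears
// anywhere on the command line, regardless of subcall or config options.
bool helpRequested(int argc, const char* const argv[]) {
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        std::string_view arg{argv[i]};
        if (arg == "-h" || arg == "--help") {
            return true;
        }
    }
    return false;
}
```

If it returns true, the program could print the usage text and exit before any attempt to read params.cfg.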

Attached files:
clustreing.stdout.txt
complete.stderr.txt
complete.stdout.txt
params_clean_bin.txt

Could you provide any test data that could be used to troubleshoot the clustering step? Currently, it does not output a completion status or error messages that I could use to figure out the problem. It is possible that something is wrong with my input data.

Docker Image launch errors

I tried running the Docker image available on Docker Hub under the tag v0.2.
I have not been able to run the latest version because I do not have an arm64/v8 machine.

I have tried 2 options:

sudo docker run -v ./requirements/:/tmp/ cobirna/rnanue:v0.2 /RNAnue/build/RNAnue preproc --config tmp/params.cfg

and

sudo docker run -v ./requirements/:/tmp/ cobirna/rnanue:v0.2 /RNAnue/build/RNAnue complete --config tmp/params.cfg

For the "complete" run, no software is started at all; only an output folder is created.

RNAnue v0.2.0 - Detect RNA-RNA interactions from Direct-Duplex-Detection (DDD) data.
[2024-05-30 12:33:57] Create directory to store the results (specified via --outdir)
[2024-05-30 12:33:57] The directory "/tmp/output/" already exists
[2024-05-30 12:33:57] The directory "/tmp/output/complete" already exists
[2024-05-30 12:33:57] The directory "/tmp/output/tmp" already exists

For the "preproc" run, I get a Boost library error:

RNAnue v0.2.0 - Detect RNA-RNA interactions from Direct-Duplex-Detection (DDD) data.
terminate called after throwing an instance of 'boost::wrapexcept<boost::bad_any_cast>'
  what():  boost::bad_any_cast: failed conversion using boost::any_cast

My params.cfg file is:

### GENERAL
readtype = SE # paired-end (PE) or single-end (SE)

# absolute path of dirs containing the raw reads (additional dir for each library)
trtms = /tmp/data_check/ # treatments 
#ctrls = /Users/riasc/Documents/work/projects/RNAnue/rawreads_100k/ctrls/ # controls 
outdir = /tmp/output/ # dir 

threads = 1 # number of threads 
quality = 20 # lower limit for the average quality (Phred Quality Score) of the reads
mapquality = 20 # lower limit for the average quality (Phred Quality Score) of the alignment
minlen = 20 # minimum length of the reads
splicing = 0 # include splicing (=1) or not (=0)

### DATA PREPROCESSING 
preproc = 1 # preprocessing of the reads can be either omitted (=0) or included (=1) 
modetrm = 1 # mode of the trimming: only 5' (=0) and 3' (=1) or both (=2) 
# sequence preceeding 5'-end (N for arbitrary bp) in .fa format
adpt5 =  
# sequence succeeding 3'-end (N for arbitrary bp) in fa. format
adpt3 = /RNAnue/build/adapters3.fa 
wtrim = 0 # on whether (=1) or not (=0) to include window quality trimming
# rate of mismatches allowed when aligning adapters with read sequence 
mmrate = 0.1 # e.g., 0.1 on a sequence length of 10 results in
wsize = 3 # window size 
minovlps = 5 # minimum overlaps required when merging paired-end reads

### ALIGNMENT (forwarded to segemehl.x)
dbref = /tmp/genome_gencode/human/GRCh38.p13.genome.fa
accuracy = 90 # min percentage of matches per read in semi-global alignment
minfragsco = 15 # min score of a spliced fragment 
minfraglen = 15 # min length of a spliced fragment
minsplicecov = 80 # min coverage for spliced transcripts
exclclipping = 0 # exclude soft clipping from 

### SPLIT READ CALLING 
sitelenratio = 0.0
cmplmin = 0.0 # complementarity cutoff - consider only split reads that exceed cmplmin
nrgmax = 0 # hybridization energy cutoff - consider only split reads that fall beneath nrgmax

### CLUSTERING
clust = 1 # clustering of the split reads can either be omitted (=0) or included (=1)
clustdist = 0 # minimum distance between clusters

### ANALYIS
# specify the annotations of the organism of interest (optional)
features = /tmp/genome_gencode/human/gencode.v42.chr_patch_hapl_scaff.annotation.gff3 # GFF3 feature file

# OUTPUT
stats = 1 # produce a statistics of the libraries
outcnt = 1 # (additionally) produce a count table as output
outjgf = 1 # (additionally) produce a JSON graph file for visualization

This is further confirmation that, unfortunately, the problem is not related to the transformation of the Docker container into a Singularity image file.

Running RNAnue from container

Hello,

Thank you for creating the pipeline.

I am encountering problems when trying to run RNAnue from the Docker container.
I have PE reads and use a config file created from the template provided here. The file is attached.

When I run:
singularity exec rnanue_latest.sif /RNAnue/build/RNAnue subcall -c /data/meyer/egor/RNAnue_pipeline/params.cfg

I get the following output, without any downstream errors, warnings, or other output.

	after parameters
	RNAnue v0.1.0 - Detect RNA-RNA interactions from Direct-Duplex-Detection (DDD) data.
	create Base
	"Ja gut aehh..." - Randy OR  <some random citation>
  • Since it does not output any error log information, I cannot troubleshoot the problem or determine its source.

  • Could you provide some examples of how to run the pipeline partially, i.e. command lines for the different parts and a list of the mandatory options for each? Or any tips on configuring a test run?

  • The call RNAnue subcall --config on the GitHub page is the only usage example so far. I believe some documentation on running the separate parts of the RNAnue pipeline would be generally useful for people.

  • I also had a problem with unrecognized parameters in the original config template file.
    Unless the corresponding lines are removed or commented out in the config file, running
    /RNAnue/build/RNAnue subcall -c /data/meyer/egor/RNAnue_pipeline/params.cfg
    produces errors like:

    after parameters
    unrecognised option 'mapquality'
    unrecognised option 'wsize'
    unrecognised option 'exclclipping'
    

Here are some details on my configuration:

  • The HPC environment that I am using has the Singularity container toolset, which should be compatible with Docker. Invoking RNAnue from the container seems to respond, i.e. I can read the help messages and I receive errors when the option lines are purposely made incorrect.

  • The gff and fna files that I use were downloaded from the NCBI database and are unchanged, except that '3' has been appended to '.gff' to make the file extension '.gff3'.

  • The structure of the paired-end reads (one PE sample, no controls) is as follows:

	/data/egor/RNAnue_pipeline/paired/trtms/R1.fastq
	/data/egor/RNAnue_pipeline/paired/trtms/R1.fastq

In the config file:
trtms = /data/egor/RNAnue_pipeline/paired/trtms

I would greatly appreciate any help.

params.cfg.txt

Integrate multi splits

Integrating multi-splits while accounting for mapping uncertainty (e.g., EM algorithm) may enhance the overall yield and prediction accuracy.
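
To make the idea concrete, here is a minimal sketch of a plain EM loop that distributes multi-mapped reads across candidate loci. It is a generic illustration, not a proposal for RNAnue's data structures: `Candidates` and `estimateAbundances` are hypothetical names, and the sketch assumes that all candidate alignments of a read are equally good a priori (no alignment scores or hybridization energies are used).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each read, the indices of the loci (e.g., candidate interactions)
// it is compatible with. A uniquely mapped read has exactly one entry.
using Candidates = std::vector<std::vector<std::size_t>>;

// Estimate relative locus abundances with a plain EM loop.
// Assumption: every candidate of a read is equally likely a priori.
std::vector<double> estimateAbundances(const Candidates& reads,
                                       std::size_t numLoci,
                                       int iterations = 100) {
    assert(numLoci > 0);
    std::vector<double> theta(numLoci, 1.0 / static_cast<double>(numLoci));
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> counts(numLoci, 0.0);
        double assigned = 0.0;
        // E-step: split each read across its candidates in proportion to theta.
        for (const auto& cands : reads) {
            double norm = 0.0;
            for (std::size_t l : cands) {
                assert(l < numLoci);
                norm += theta[l];
            }
            if (norm <= 0.0) continue;  // no usable candidate for this read
            for (std::size_t l : cands) counts[l] += theta[l] / norm;
            assigned += 1.0;
        }
        if (assigned == 0.0) break;  // nothing to estimate from
        // M-step: abundances become the normalized expected counts.
        for (std::size_t l = 0; l < numLoci; ++l) theta[l] = counts[l] / assigned;
    }
    return theta;
}
```

For example, with two reads unique to locus 0, one read unique to locus 1, and one read compatible with both, the estimate for locus 0 approaches 2/3, so the shared read is mostly attributed to the better supported locus.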

Fails to build from source

I'm using boost 1.77.0, segemehl 0.3.4, viennarna 2.4.18, seqan 3.0.3, and GCC 10.
Unfortunately RNAnue fails to build:

In file included from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/detail/record.hpp:19,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/sam_file/input.hpp:34,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/sam_file/all.hpp:37,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/alignment_file/all.hpp:16,
                 from /tmp/guix-build-rnanue-0.1.0-2.f8696dd.drv-0/source/source/../include/SplitReadCalling.hpp:26,
                 from /tmp/guix-build-rnanue-0.1.0-2.f8696dd.drv-0/source/source/SplitReadCalling.cpp:1:
/gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/record.hpp:354:30: note: declared here
  354 | SEQAN3_DEPRECATED_310 auto & get(record<field_types, field_ids> & r)
      |                              ^~~
/tmp/guix-build-rnanue-0.1.0-2.f8696dd.drv-0/source/source/SplitReadCalling.cpp:180:35: error: no matching function for call to ‘seqan3::cigar::cigar(<brace-enclosed initializer list>)’
  180 |                     seqan3::cigar cigarElement{cigarOpSize, cigarOp};
      |                                   ^~~~~~~~~~~~
In file included from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/alphabet/cigar/cigar.hpp:19,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/sam_file/detail/cigar.hpp:22,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/sam_file/format_bam.hpp:25,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/sam_file/all.hpp:34,
                 from /gnu/store/gi3jk5wmnh6m7y55hqsd9jkwm1vvg4fh-seqan-3.0.3/include/seqan3/io/alignment_file/all.hpp:16,
                 from /tmp/guix-build-rnanue-0.1.0-2.f8696dd.drv-0/source/source/../include/SplitReadCalling.hpp:26,
                 from /tmp/guix-build-rnanue-0.1.0-2.f8696dd.drv-0/source/source/SplitReadCalling.cpp:1:
