anansnake's Introduction

Anansnake

Link seq2science output to ANANSE with 2 sample tables and one config file.

Installation

mamba create -n anansnake -c bioconda anansnake

Don't forget to activate the conda environment with mamba activate anansnake.

Running anansnake on the example data

The anansnake GitHub repository contains an example folder that can be downloaded to try the workflow. Check its README for details!

Running anansnake

Anansnake works with seq2science input and output: the RNA-seq and ATAC-seq samples.tsv files are the same ones you used for seq2science, with one addition (see below). The counts tables are seq2science output files, used without any changes.

The RNA-seq and ATAC-seq samples are combined via a shared column in the samples.tsv files. In the example data, this is the anansnake column. The conditions from the anansnake column that are compared are set in the config.yaml file, under contrasts.
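A minimal Python sketch of this linking step, assuming tab-separated samples.tsv files whose first column holds the sample names and whose shared column is called anansnake (as in the example data). It also assumes contrasts follow the seq2science-style pattern <column>_<target>_<reference>, as in the contrast name anansesnake_CD4-Naive_average that appears in the issues below. The helpers samples_per_condition and parse_contrast are hypothetical and not part of anansnake; the sketch only shows how matching values in the shared column tie RNA-seq and ATAC-seq samples to the two conditions named in a contrast.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def samples_per_condition(samples_tsv: str, column: str = "anansnake") -> Dict[str, List[str]]:
    """Group sample names by their value in the shared column (hypothetical helper).

    Assumes a tab-separated table whose first header field is the sample column.
    Samples with an empty value in the shared column are left out.
    """
    reader = csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")
    sample_col = reader.fieldnames[0]
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        condition = (row.get(column) or "").strip()
        if condition:
            groups[condition].append(row[sample_col])
    return dict(groups)


def parse_contrast(contrast: str, column: str = "anansnake") -> Tuple[str, str]:
    """Split '<column>_<target>_<reference>' into (target, reference).

    Assumes the condition names themselves contain no underscores.
    """
    prefix = f"{column}_"
    if not contrast.startswith(prefix):
        raise ValueError(f"contrast {contrast!r} does not start with {prefix!r}")
    parts = contrast[len(prefix):].split("_")
    if len(parts) != 2:
        raise ValueError(f"expected <target>_<reference> after {prefix!r}, got {contrast!r}")
    return parts[0], parts[1]


if __name__ == "__main__":
    # Toy tables reusing sample names from the example log further down this page.
    rna = "sample\tanansnake\n128-cell-1\tgroup1\n128-cell-2\tgroup1\n1k-cell-1\tgroup2\n"
    atac = "sample\tanansnake\nGSM3756599\tgroup1\nGSM3756606\tgroup2\n"
    rna_groups = samples_per_condition(rna)
    atac_groups = samples_per_condition(atac)
    target, reference = parse_contrast("anansnake_group2_group1")  # hypothetical contrast
    for condition in (target, reference):
        print(condition, rna_groups.get(condition, []), atac_groups.get(condition, []))
```

A condition only makes sense in a contrast if it has samples in both tables, which is why the shared column must use identical values in the RNA-seq and ATAC-seq samples.tsv files.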

For example files, settings, and command line examples, check out the example folder.

Troubleshooting

ANANSE can use a lot of memory. If your machine freezes, reduce the number of threads or mem_mb.

anansnake's People

Contributors

siebrenf, jgasmits

Watchers

Simon van Heeringen

Forkers

jgasmits

anansnake's Issues

Error loading ggplot2

Hi.

I have been facing this error many times with different data.

There is a log file in the deseq2 folder:

Error: package ‘ggplot2’ could not be loaded
In addition: Warning message:
package ‘ggplot2’ was built under R version 4.2.3 
Execution halted

I am not sure which environment it refers to, because I have installed ggplot2 in the anansnake conda environment.

Here is the message from the console:

Activating conda environment: .snakemake/conda/ce7d794d0661510a0c65df6077b773c0_
[Fri Mar 24 20:34:59 2023]
Error in rule deseq2:
    jobid: 9
    input: /mnt/d/32_publication_scANANSE/scANANSE/analysis/samplefile.tsv, /mnt/d/32_publication_scANANSE/scANANSE/analysis/RNA_Counts.tsv
    output: /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/hg38-anansesnake_CD4-Naive_average.diffexp.tsv
    log: /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/log_anansesnake_CD4-Naive_average.txt (check log file(s) for error details)
    shell:

        outdir=$(dirname /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/hg38-anansesnake_CD4-Naive_average.diffexp.tsv)

        # for the log
        mkdir -p $outdir

        deseq2science         anansesnake_CD4-Naive_average         /mnt/d/32_publication_scANANSE/scANANSE/analysis/samplefile.tsv         /mnt/d/32_publication_scANANSE/scANANSE/analysis/RNA_Counts.tsv         $outdir         --assembly hg38         > /mnt/d/32_publication_scANANSE/scANANSE/analysis/deseq2/log_anansesnake_CD4-Naive_average.txt 2>&1

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Errors with the real run of the example provided

Hello @JGASmits,

Although the dry run appears to have worked, the real run did not.
Here is the log file:

(anansnake) iMac-Pro:anansnake pediatrics$ less Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log
Complete: No such file or directory
log:: No such file or directory
Press RETURN to continue

plot_type : png

Resources
mem_mb : 48000
_cores : 12
deseq2 : 1

Conditions
group2 :
  RNA-seq samples:  ['1k-cell-1', '1k-cell-2', 'GSM1483740']
  ATAC-seq samples: ['GSM3756606', 'GSM3756607', 'GSM3756608']
group1 :
  RNA-seq samples:  ['128-cell-1', '128-cell-2', 'GSM1483739']
  ATAC-seq samples: ['GSM3756599', 'GSM3756600']

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=48000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              2              1              1
deseq2               2              1              1
influence            2              1              1
maelstrom            1             12             12
motif2factors        1             12             12
network              2              1              1
pfmscorefile         1             12             12
plot                 2              1              1
total               14              1             12

Select jobs to execute...

[Thu Aug 31 14:33:05 2023]
rule motif2factors:
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt
jobid: 5
reason: Missing output files: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
threads: 12
resources: tmpdir=/var/folders/2c/zzjsgs_53vqflzjl28hf1x7r0000gn/T

Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
Activating conda environment: .snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_
[Thu Aug 31 14:33:36 2023]
Error in rule motif2factors:
jobid: 5
input: /Users/pediatrics/anansnake/GRCz11
output: /Users/pediatrics/anansnake/example/outdir/gimme/GRCz11.gimme.vertebrate.v5.0.pfm
log: /Users/pediatrics/anansnake/example/outdir/gimme/log_GRCz11_m2f.txt (check log file(s) for error message)
conda-env: /Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_

RuleException:
CalledProcessError in line 24 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk:
Command 'source /Users/pediatrics/anaconda3/envs/anansnake/bin/activate '/Users/pediatrics/anansnake/.snakemake/conda/3f88efe941f72bcdb4d5867b0d6db92f_'; set -euo pipefail; python /Users/pediatrics/anansnake/.snakemake/scripts/tmp5kcimtt8.motif2factors.py' returned non-zero exit status 1.
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk", line 24, in __rule_motif2factors
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-31T143302.657342.snakemake.log

Your input would be much appreciated!

Thank you!

Run gets stuck in the binding step (anansnake)

I submitted a job using anansnake, and it does not seem to be progressing. This is the log file of the job after 120 hours on 6 cores. The result was similar after 40 hours using 40 cores.

Config
rna_samples            : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
rna_tpms               : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/TPM.tsv
rna_counts             : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/RNA_Counts.tsv
atac_samples           : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/samplefile.tsv
atac_counts            : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv
genome                 : hg38
result_dir             : /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6
contrasts              : ['anansesnake_B-cells_average', 'anansesnake_agg-type_average', 'anansesnake_NK-cells_average', 'anansesnake_Myeloid-cells_average']
database               : gimme.vertebrate.v5.0
jaccard                : 0.1
edges                  : 500000
padj                   : 0.05
plot_type              : png
tmp_dir                : None

Resources
mem_mb                 : 60000
_cores                 : 6
deseq2                 : 1

Conditions
B-cells                :
  RNA-seq samples:  ['B-cells']
  ATAC-seq samples: ['B-cells']
average                :
  RNA-seq samples:  ['average']
  ATAC-seq samples: ['average']
agg-type               :
  RNA-seq samples:  ['agg-type']
  ATAC-seq samples: ['agg-type']
NK-cells               :
  RNA-seq samples:  ['NK-cells']
  ATAC-seq samples: ['NK-cells']
Myeloid-cells          :
  RNA-seq samples:  ['Myeloid-cells']
  ATAC-seq samples: ['Myeloid-cells']

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=60000, deseq2=1
Job stats:
job              count    min threads    max threads
-------------  -------  -------------  -------------
all                  1              1              1
binding              5              1              1
influence            4              1              1
maelstrom            1              6              6
motif2factors        1              6              6
network              5              1              1
pfmscorefile         1              6              6
plot                 4              1              1
total               22              1              6

Select jobs to execute...

[Mon Mar 25 18:18:31 2024]
rule motif2factors:
    input: /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_m2f.txt
    jobid: 5
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 18:18:44 2024]
Finished job 5.
1 of 22 steps (5%) done
Select jobs to execute...

[Mon Mar 25 18:18:45 2024]
rule maelstrom:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_maelstrom.txt
    jobid: 25
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38-maelstrom; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp, mem_mb=40000

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:20:17 2024]
Finished job 25.
2 of 22 steps (9%) done
Select jobs to execute...

[Mon Mar 25 19:20:17 2024]
rule pfmscorefile:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/log_hg38_pfmscorefile.txt
    jobid: 6
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    threads: 6
    resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/b43fe9c7085f26662cb0116147fff2a2_
[Mon Mar 25 19:56:53 2024]
Finished job 6.
3 of 22 steps (14%) done
Select jobs to execute...

[Mon Mar 25 19:56:53 2024]
rule binding:
    input: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/Peak_Counts.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/genomes/hg38
    output: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5
    log: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/log_Myeloid-cells.txt
    jobid: 23
    benchmark: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/benchmarks/binding_Myeloid-cells.txt
    reason: Missing output files: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/binding/Myeloid-cells.h5; Input files updated by another job: /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/pfmscorefile.tsv, /beegfs/desy/user/nourisaj/op_multiomics_grn/output/infer/ananse6/gimme/hg38.gimme.vertebrate.v5.0.pfm
    wildcards: condition=Myeloid-cells
    resources: tmpdir=/tmp, mem_mb=40000

Activating conda environment: .snakemake/conda/d744163a4690c04ba52f3bf00737fc7a_
slurmstepd: error: *** JOB 6311320 ON max-wn050 CANCELLED AT 2024-03-30T18:18:31 DUE TO TIME LIMIT ***

Why does anansnake need to create a new env every time it runs?

Thanks for the work.

I am running the provided example, but I have one question: does it have to create a new .snakemake folder?

I checked the folder, and there are conda environments inside. The "Downloading and installing remote packages" step can take more than an hour, and afterwards it will most likely throw an error saying that an R package is missing.

This has made me very blue, because I do not know which packages are going to be used.

Problem with running the example file

@siebrenf @JGASmits
Hi,
I'm encountering problems running the example file.
The error reads as follows:

could not find 'rna_samples' in /Users/pediatrics/Desktop/R_projects/scANANSE/example/rna_samples.tsv
SystemExit in line 29 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/configuration.smk:
1
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/Snakefile", line 4, in
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/configuration.smk", line 29, in

Can either of you help me get past this problem?

Installation issues

Hello!

Is it absolutely essential to use mamba for this purpose?

I was trying to use anansnake to run scANANSE, but ran into the following error:

Error in rule maelstrom:
jobid: 40
input: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/Peak_Counts.tsv, /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10.gimme.vertebrate.v5.0.pfm, /Users/pediatrics/Desktop/R_projects/scANANSE/data/mm10
output: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/mm10-maelstrom
log: /Users/pediatrics/Desktop/R_projects/scANANSE/analysis/gimme/log_mm10_maelstrom.txt (check log file(s) for error message)
conda-env: /Users/pediatrics/.snakemake/conda/52b8cb6f4a78ac804c0afe54b1ecb2c2_

RuleException:
CalledProcessError in line 80 of /Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk:
Command 'source /Users/pediatrics/anaconda3/envs/anansnake/bin/activate '/Users/pediatrics/.snakemake/conda/52b8cb6f4a78ac804c0afe54b1ecb2c2_'; set -euo pipefail; python /Users/pediatrics/.snakemake/scripts/tmpj_rlppgz.maelstrom.py' returned non-zero exit status 1.
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/site-packages/anansnake/rules/gimme.smk", line 80, in __rule_maelstrom
File "/Users/pediatrics/anaconda3/envs/anansnake/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-08-23T152751.937220.snakemake.log

Can you help me understand where the problem lies?

Thank you!
