
fetchngs's Introduction

nf-core/fetchngs




Introduction

nf-core/fetchngs is a bioinformatics pipeline to fetch metadata and raw FastQ files from public databases. At present, the pipeline supports SRA / ENA / DDBJ / GEO ids (see usage docs).

nf-core/fetchngs metro map

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

ids.csv:

SRR9984183
SRR13191702
ERR1160846
ERR1109373
DRR028935
DRR026872

Each line represents a database id. Please see the next section for the supported ids.

Now, you can run the pipeline using:

nextflow run nf-core/fetchngs \
   -profile <docker/singularity/.../institute> \
   --input ids.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
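For example, the same parameters could be supplied via a params file instead of the CLI. A minimal sketch (the file name params.yaml is arbitrary):

# params.yaml, used as: nextflow run nf-core/fetchngs -params-file params.yaml -profile docker
input: "ids.csv"
outdir: "./results"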

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Supported ids

Via a single file of ids provided one per line (see example input file), the pipeline performs the following steps:

SRA / ENA / DDBJ / GEO ids

  1. Resolve database ids back to appropriate experiment-level ids compatible with the ENA API
  2. Fetch extensive id metadata via ENA API
  3. Download FastQ files (see the example command after this list):
    • If direct download links are available from the ENA API:
      • Fetch in parallel via wget and perform md5sum check (--download_method ftp; default).
      • Fetch in parallel via aspera-cli and perform md5sum check. Use --download_method aspera to force this behaviour.
    • Otherwise use sra-tools to download .sra files and convert them to FastQ. Use --download_method sratools to force this behaviour.
  4. Collate id metadata and paths to FastQ files in a single samplesheet
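For example, to force the sra-tools route described in step 3 instead of direct FTP/Aspera downloads (a sketch reusing the ids.csv file from above):

nextflow run nf-core/fetchngs \
   -profile <docker/singularity/.../institute> \
   --input ids.csv \
   --outdir <OUTDIR> \
   --download_method sratools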

Pipeline output

The columns in the output samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines (see the usage docs for the pipelines currently supported).

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/fetchngs was originally written by Harshil Patel (@drpatelh) from Seqera Labs, Spain and Jose Espinosa-Carrasco (@JoseEspinosa) from The Comparative Bioinformatics Group at The Centre for Genomic Regulation, Spain. Support for download of sequencing reads without FTP links via sra-tools was added by Moritz E. Beber (@Midnighter) from Unseen Bio ApS, Denmark.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #fetchngs channel (you can join with this invite).

Citations

If you use nf-core/fetchngs for your analysis, please cite it using the following doi: 10.5281/zenodo.5070524

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

fetchngs's People

Contributors

adamrtalbot, alaindomissy, daisyhan97, davidecarlson, drpatelh, edmundmiller, ejseqera, ewels, jahdoos, jfy133, joseespinosa, kevinmenden, maxibor, maxulysse, mcmandr, midnighter, nf-core-bot, nicolae06, ojziff, robsyme, sateeshperi, sirclockalot


fetchngs's Issues

Add support for prefetch argument `--max-size`

Description of feature

I was trying to download some data, and apparently one of the files was 'too big' for the sra-tools prefetch step.

Seems like the solution is given in the message. I will try specifying it with a custom modules.conf, but if it works I think it would be good to add built-in support 👍

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR059917)` terminated with an error exit status (3)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "44fc8155-3f0b-4ef8-a7c2-6d375100ae27"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi
  
  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR059917
  
  vdb-validate SRR059917
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  3

Command output:
  
  2021-12-13T11:41:44 prefetch.2.11.0: 1) 'SRR059917' (34GB) is larger than maximum allowed: skipped 
  
  Download of some files was skipped because they are too large
  You can change size download limit by setting
  --min-size and --max-size command line arguments

Command error:
  WARNING: While bind mounting '/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70:/mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70': destination is already in the mount point list
  2021-12-13T11:41:44 prefetch.2.11.0 warn: Maximum file size download limit is 20GB 
  2021-12-13T11:41:44 vdb-validate.2.11.0 info: 'SRR059917' could not be found

Work dir:
  /mnt/archgen/microbiome_misc/denisova_sediment_blocks/03-data/public/work/79/4c5787dfdd11d80fd9f6e06dbf0a70

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
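Until built-in support is added, a workaround along the lines the reporter mentions is to raise the prefetch size limit through a custom config using the ext.args mechanism (a sketch only; the 100G value is an arbitrary example):

process {
    withName: 'SRATOOLS_PREFETCH' {
        // Raise the default 20 GB prefetch size cap
        ext.args = '--max-size 100G'
    }
}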

sra_ids_to_runinfo.py: command not found

Description of the bug

I can't run the workflow using an SRA ID. The sample input file is literally just a one line ID.

Command used and terminal output

nextflow run nf-core/fetchngs -profile singularity --outdir fastq --input samples.csv

Relevant files

samples.csv:
PRJNA656253

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (PRJNA656253)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (PRJNA656253)` terminated with an error exit status (127)

Command executed:

  echo PRJNA656253 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      PRJNA656253.runinfo.tsv \


  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  127

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMPDIR as environment variable will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  .command.sh: line 4: sra_ids_to_runinfo.py: command not found

Work dir:
  /fast/users/knighto_c/scratch/tmp/nf_work/2c/76d70604fca24cc45d7498ea45e8da

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

System information

N E X T F L O W ~ version 22.10.6
HPC

Allow a prepared ENA metadata samplesheet as input

Description of feature

I've created a branch https://github.com/nf-core/fetchngs/tree/skip-runinfo where one can skip the first step. So instead of providing a list of ids for which metadata is fetched, the metadata is provided directly as input and that step is skipped. It works, but it was just a quick hack. If this is a generally desirable feature I can code it up in a better way, maybe using input_type == 'metadata' instead of a new skip_runinfo parameter.

Final samplesheet is not created when --skip_fastq_download is provided

Description of the bug

The --skip_fastq_download flag implies that only the FastQ download will be skipped. When using --skip_fastq_download and --nf_core_pipeline rnaseq together, the FastQ files are not downloaded (expected), but the samplesheet is not created either (unexpected).

Command used and terminal output

nextflow run nf-core/fetchngs -profile docker --nf_core_pipeline rnaseq --outdir test --input temp.txt --max_cpus 2 --max_memory 5.GB --skip_fastq_download

Relevant files

temp.txt

System information

N E X T F L O W ~ version 22.04.5
Launching https://github.com/nf-core/fetchngs [nostalgic_meitner] DSL2 - revision: b79cde2 [master]

add --save_merged_fastq for merging technical repeats to fetchngs

Description of feature

Hi nfcore/fetchngs team,

It would be awesome to add --save_merged_fastq to fetchngs for merging technical repeats. It would be very useful for downstream programs that can't deal with technical repeats.

It is often easy to identify technical repeats from GEO because, although technical repeats each have a unique run_accession, they share the same experiment_alias.

Many thanks,
Oliver

Change input mimetype to csv

Description of feature

This will allow using Tower Datasets with the pipeline as only tsv / csv is currently supported.

[ERROR] Please provide a valid value for --ena_metadata_fields!

Description of the bug

Over the past few days I have encountered an issue when running nextflow run nf-core/fetchngs that is related to the accepted values for --ena_metadata_fields. No changes were made in the script or parameter files between successful and incomplete runs.

It appears that the hard-coded preset metadata fields are no longer found when referencing the list online. The current list of accepted values has possibly changed, as quite a few of the preset metadata fields are no longer in the list of accepted ENA metadata values as defined in get_ena_fields().

Hard-coded metadata fields from sra_ids_to_runinfo.py:
ENA_METADATA_FIELDS = ( "run_accession", "experiment_accession", "sample_accession", "secondary_sample_accession", "study_accession", "secondary_study_accession", "submission_accession", "run_alias", "experiment_alias", "sample_alias", "study_alias", "library_layout", "library_selection", "library_source", "library_strategy", "library_name", "instrument_model", "instrument_platform", "base_count", "read_count", "tax_id", "scientific_name", "sample_title", "experiment_title", "study_title", "sample_description", "fastq_md5", "fastq_bytes", "fastq_ftp", "fastq_galaxy", "fastq_aspera", )

Command used and terminal output

$ nextflow run nf-core/fetchngs -r 1.9 --input data/acc/acc_all.csv --input_type 'sra' --outdir output/ -profile docker 


Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR13499410)` terminated with an error exit status (1)

Command executed:

  echo SRR13499410 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR13499410.runinfo.tsv \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  [ERROR] Please provide a valid value for --ena_metadata_fields!
  Provided values = accession,run_accession,experiment_accession,sample_accession,secondary_sample_accession,study_accession,secondary_study_accession,parent_study,submission_accession,run_alias,experiment_alias,sample_alias,study_alias,library_layout,library_selection,library_source,library_strategy,library_name,instrument_model,instrument_platform,base_count,read_count,tax_id,scientific_name,sample_title,experiment_title,study_title,description,sample_description,fastq_md5,fastq_bytes,fastq_ftp,fastq_galaxy,fastq_aspera
  Accepted values = aligned,altitude,assembly_quality,assembly_software,bam_aspera,bam_bytes,bam_ftp,bam_galaxy,bam_md5,base_count,binning_software,bio_material,bisulfite_protocol,broad_scale_environmental_context,broker_name,cage_protocol,cell_line,cell_type,center_name,checklist,chip_ab_provider,chip_protocol,chip_target,collected_by,collection_date,completeness_score,contamination_score,control_experiment,country,cultivar,culture_collection,depth,dev_stage,dnase_protocol,ecotype,elevation,environment_biome,environment_feature,environment_material,environmental_medium,environmental_sample,experiment_accession,experiment_alias,experiment_target,experiment_title,experimental_factor,experimental_protocol,extraction_protocol,faang_library_selection,fastq_aspera,fastq_bytes,fastq_ftp,fastq_galaxy,fastq_md5,file_location,first_created,first_public,germline,hi_c_protocol,host,host_body_site,host_genotype,host_gravidity,host_growth_conditions,host_phenotype,host_sex,host_status,host_tax_id,identified_by,instrument_model,instrument_platform,investigation_type,isolate,isolation_source,last_updated,lat,library_construction_protocol,library_gen_protocol,library_layout,library_max_fragment_size,library_min_fragment_size,library_name,library_pcr_isolation_protocol,library_prep_date,library_prep_date_format,library_prep_latitude,library_prep_location,library_prep_longitude,library_selection,library_source,library_strategy,local_environmental_context,location,location_end,location_start,lon,marine_region,mating_type,ncbi_reporting_standard,nominal_length,nominal_sdev,pcr_isolation_protocol,ph,project,protocol_label,read_count,read_strand,restriction_enzyme,restriction_enzyme_target_sequence,restriction_site,rna_integrity_num,rna_prep_3_protocol,rna_prep_5_protocol,rna_purity_230_ratio,rna_purity_280_ratio,rt_prep_protocol,run_accession,run_alias,run_date,salinity,sample_accession,sample_alias,sample_capture_status,sample_collection,sample_description,sample_material,sample_prep_interval,sample_prep_interval_units,sample_storage,sample_storage_processing,sample_title,sampling_campaign,sampling_platform,sampling_site,scientific_name,secondary_project,secondary_sample_accession,secondary_study_accession,sequencing_date,sequencing_date_format,sequencing_location,sequencing_longitude,sequencing_method,sequencing_primer_catalog,sequencing_primer_lot,sequencing_primer_provider,serotype,serovar,sex,specimen_voucher,sra_aspera,sra_bytes,sra_ftp,sra_galaxy,sra_md5,status,strain,study_accession,study_alias,study_title,sub_species,sub_strain,submission_accession,submission_tool,submitted_aspera,submitted_bytes,submitted_format,submitted_ftp,submitted_galaxy,submitted_host_sex,submitted_md5,submitted_read_type,tag,target_gene,tax_id,taxonomic_classification,taxonomic_identity_marker,temperature,tissue_lib,tissue_type,transposase_protocol,variety
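A possible interim workaround for the mismatch above is to pass an explicit list of currently accepted fields on the command line rather than relying on the hard-coded defaults (a sketch only; the subset shown is illustrative and would need to include every column the downstream samplesheet steps expect):

nextflow run nf-core/fetchngs \
    -r 1.9 \
    -profile docker \
    --input data/acc/acc_all.csv \
    --outdir output/ \
    --ena_metadata_fields 'run_accession,experiment_accession,sample_accession,study_accession,library_layout,fastq_md5,fastq_bytes,fastq_ftp'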

Relevant files

No response

System information

No response

SRA_IDS_TO_RUNINFO fails due to bad request

Description of the bug

I think this is related to issue #98, where the storage mode of files changed in the SRA. Several DRR and SRR accessions fail to be retrieved. Since this information is still available via the ENA FTP, I fixed it by bypassing the SRA efetch request and using the ENA FTP directly. I think it would be reasonable to try the SRA first (since most of my accessions worked) and fall back to the ENA FTP if the SRA efetch request fails, instead of exiting the process.

Command used and terminal output

nextflow run nf-core/fetchngs ... --force_sratools_download

Error printed for some SRR and DRR accessions:

[ERROR] The server couldn't fulfill the request.
[ERROR] Status: 400 Bad Request. Both list of IDs and query_key are empty

Relevant files

No response

System information

No response

"Local executor requires the use of POSIX compatible file system" with latest Nextflow

Description of the bug

Running with nf 23.04.0 (in Tower) produces errors like:

Uploading local `bin` scripts folder to s3://nf-tower-workdir/fetchngs/YOUR_ID/tmp/bd/205133e4da5bfe13eb99139d65b8e1/bin
Local executor requires the use of POSIX compatible file system -- offending work directory path: s3://nf-tower-workdir/fetchngs/YOUR_ID

... before any tasks start.

Pinning Nextflow back like:

export NXF_VER=22.10.7

... allows the workflow to run.

Command used and terminal output

No response

Relevant files

No response

System information

Nextflow Tower Cloud, AWS Batch compute environment

Error when using sratools: This sra toolkit installation has not been configured

Description of the bug

First of all, love this tool. Thank you for putting this together.

I'm running into an issue when using the --force_sratools_download command. It runs without issue until the prefetch step, at which point every task fails with the error listed below.

Do I need to configure this on my HPC, or is this an actual bug?

Command used and terminal output

nextflow run nf-core/fetchngs \
-r 1.5 \
--input '/projects/b1042/MisharinLab/rgrant/220226_Zhang_reanalysis/220226_Zhang_BMDM_SRR_Acc_List.txt' \
-profile nu_genomics \
--nf_core_pipeline rnaseq \
--force_sratools_download

> This sra toolkit installation has not been configured.
> Before continuing, please run: vdb-config --interactive
> For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
> Failed attempt 1 of 5. Retrying in 1 s.
> This sra toolkit installation has not been configured.
> Before continuing, please run: vdb-config --interactive
> For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
> Failed attempt 2 of 5. Retrying in 2 s.
> This sra toolkit installation has not been configured.
> Before continuing, please run: vdb-config --interactive
> For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
> Failed attempt 3 of 5. Retrying in 4 s.
> This sra toolkit installation has not been configured.
> Before continuing, please run: vdb-config --interactive
> For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
> Failed attempt 4 of 5. Retrying in 8 s.
> This sra toolkit installation has not been configured.
> Before continuing, please run: vdb-config --interactive
> For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/
> Failed after 5 attempts.

Relevant files

220226_Zhang_BMDM_SRR_Acc_List.txt

System information

Nextflow: version 21.10.5 build 5658
Hardware: HPC
Executor: slurm
OS: Red Hat Enterprise Linux Server 7.5 (Maipo)
Container: singularity version 3.8.1
Fetchngs version: 1.5

Not able to fetch metadata for ERR* ids associated with ArrayExpress

Description of feature

The current implementation in the pipeline assumes that all ERR* ids are available via the ENA API. We fetch the metadata using the URL below:
https://www.ebi.ac.uk/ena/portal/api/filereport?accession=ERR9539214&result=read_run&fields=run_accession%2Cexperiment_accession

However, this URL doesn't work for ERR9539214 and returns the error below from this code in the pipeline:

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (ERR9539214)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (ERR9539214)` terminated with an error exit status (1)

Command executed:

  echo ERR9539214 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      ERR9539214.runinfo.tsv \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [ERROR] There is no content for id ERR9539214. Maybe you lack the right permissions?

On closer inspection, this id is associated with an ArrayExpress experiment, E-MTAB-11611, and if you click the link to E-MTAB-11611.sdrf.txt on that page you get all of the metadata for ERR9539214 as well as for the other ids associated with that submission. This includes FTP links for direct download, which work when you try to download locally, so the data isn't restricted even though it is patient data, e.g.:

wget -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR953/004/ERR9539214/ERR9539214_1.fastq.gz

ArrayExpress has its own API, as documented here, and you can access all of the metadata for samples associated with an ArrayExpress id in XML format with the URL below:
https://www.ebi.ac.uk/arrayexpress/xml/v3/experiments/E-MTAB-11611/samples

Still a bit puzzled as to why we can't get any metadata for ERR9539214 from the ENA API, given that it's a native id for that database.

We should look into this further and add support for ArrayExpress ids too.

Error when using sratools: This sra toolkit installation has not been configured

Description of the bug

First of all, love this tool. Thank you for putting this together.

I'm running into an issue when using the --force_sratools_download command. It runs without issue until the prefetch step, at which point every task fails with the following error:

Command used and terminal output

No response

Relevant files

No response

System information

No response

Deal with accessions with non-existing files

Description of feature

Hi,

In SRA some of the run accessions have no associated files.
For example, bioproject PRJEB18755 has several runs that are total ghosts: ERR2013571, ERR2013572, ERR2013573, ..., while others are fine.
When these ghost accessions are provided in the input, the pipeline will first retry:

[60/81e7b9] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (ERR2013613)` failed -- Execution is retried (2)

and then terminate with errors:

Command error:
  [ERROR] There is no content for id ERR2013581. Maybe you lack the right permissions?

Of course, one thing that can be done is to filter out these entries before feeding them to the pipeline, but it would be great if these errors could be ignored.
Or maybe there is already an option like that which I am missing?
Thanks for any info on this; being able to deal with it easily would be extremely helpful!
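Until the pipeline can ignore such accessions itself, one option is to pre-filter the input against the same ENA filereport endpoint the pipeline queries and keep only ids that return at least one read_run record (a rough shell sketch; ids.txt and ids.filtered.txt are placeholder file names):

while read -r id; do
    # Keep the id only if ENA returns at least one read_run record (the first line is the TSV header)
    url="https://www.ebi.ac.uk/ena/portal/api/filereport?accession=${id}&result=read_run&fields=run_accession"
    if [ -n "$(curl -fsSL "$url" | tail -n +2)" ]; then
        echo "$id"
    fi
done < ids.txt > ids.filtered.txt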

A DataflowVariable can only be assigned once

Description of the bug

Hi,

I experienced weird problem when trying to download the dataset.
The pipeline was going OK, but gets kicked out at some point with the error

A DataflowVariable can only be assigned once. Use bind() to allow for equal values to be passed into already-bound variables.

The same thing happens after -resume.
What might be the problem here?
I used --force_sratools_download true as only the SRA is available.

Command used and terminal output

# Command
nextflow run ~/pipelines/fetchngs/main.nf \
--input s3://<bucket>/PRJDB4176_Yachida_NatMed_2019/SRR_Acc_List.txt \
--outdir s3://<bucket>/PRJDB4176_Yachida_NatMed_2019/ \
--force_sratools_download true \
-profile docker \
-work-dir s3://<bucket>/work/ \
-c awsbatch.config

# Config
aws.region          = 'ap-southeast-1'
aws.batch.cliPath   = '/home/ec2-user/miniconda/bin/aws'
process.executor    = 'awsbatch'
process.queue       = '<queue_name>'

params {
    nf_core_pipeline        = 'taxprofiler'

    // Max
    max_cpus   = 12
    max_memory = '47.GB'
    max_time   = '10.h'
}

process {
  withName: SRA_IDS_TO_RUNINFO {
    errorStrategy = 'ignore'
  }
}


# Output
[bf/d23b8b] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (DRR162776)                                                           [100%] 80 of 80, cached: 80 ✔
[d5/48f4ea] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP (80)                                                                  [100%] 80 of 80, cached: 80 ✔
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                                                            -
[9b/3a9b61] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_SRATOOLSNCBISETTINGS (ncbi-settings) [100%] 1 of 1, cached: 1 ✔
[dd/fb9afb] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_PREFETCH (DRR162776)               [100%] 80 of 80, cached: 80 ✔
[de/281c46] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_FASTERQDUMP (DRX153395_DRR162776)  [100%] 58 of 58, cached: 27, failed: 1
[ff/675871] process > NFCORE_FETCHNGS:SRA:SRA_TO_SAMPLESHEET (DRX120436_DRR127692)                                                 [100%] 57 of 57, cached: 27
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_MERGE_SAMPLESHEET                                                                    -
[-        ] process > NFCORE_FETCHNGS:SRA:MULTIQC_MAPPINGS_CONFIG                                                                  -
[-        ] process > NFCORE_FETCHNGS:SRA:CUSTOM_DUMPSOFTWAREVERSIONS                                                              -
Execution cancelled -- Finishing pending tasks before exit
WARN: Unable to get file attributes file: s3://bm-ks-nextflow-workdir/Art/2023-04-06_PRJDB4176/work/c0/3cfaf4f3e6178473eafb2345965a26/versions.yml -- Cause: com.amazonaws.AbortedException:
WARN: Unable to get file attributes file: s3://bm-ks-nextflow-workdir/Art/2023-04-06_PRJDB4176/work/9b/3a9b61d58df2eb3a2ae563006c13f3/versions.yml -- Cause: com.amazonaws.AbortedException:
WARN: Unable to get file attributes file: s3://bm-ks-nextflow-workdir/Art/2023-04-06_PRJDB4176/work/ab/1ad01b84ab8707e25d3d1630fb4a28/versions.yml -- Cause: com.amazonaws.AbortedException:
WARN: Unable to get file attributes file: s3://bm-ks-nextflow-workdir/Art/2023-04-06_PRJDB4176/work/c0/3cfaf4f3e6178473eafb2345965a26/versions.yml -- Cause: com.amazonaws.AbortedException:
WARN: Got an interrupted exception while taking agent result | java.lang.InterruptedException
Error executing process > 'NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_FASTERQDUMP (DRX120481_DRR127737)'

Caused by:
  Task failed to start - DockerTimeoutError: Could not transition to created; timed out after waiting 4m0s

Command executed:

  export NCBI_SETTINGS="$PWD/user-settings.mkfg"

  fasterq-dump \
       \
      --threads 6 \
      --outfile DRX120481_DRR127737 \
      DRR127737

  pigz \
       \
      --no-name \
      --processes 6 \
      *.fastq

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOLS_FASTERQDUMP":
      sratools: $(fasterq-dump --version 2>&1 | grep -Eo '[0-9.]+')
      pigz: $( pigz --version 2>&1 | sed 's/pigz //g' )
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://bm-ks-nextflow-workdir/Art/2023-04-06_PRJDB4176/work/d8/07dca0e134f49630ced4795526a976

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`


Unexpected error [AbortedException]


A DataflowVariable can only be assigned once. Use bind() to allow for equal values to be passed into already-bound variables.


[AWS BATCH] Waiting jobs reaper to complete (7 jobs to be terminated)

Relevant files

nextflow.log

System information

  • Nextflow version: 22.10.5 build 5840
  • Hardware: Cloud
  • Executor: AWS Batch
  • Container engine: Docker
  • Linux (Ubuntu for host submission, Amazon Linux2 for containers)
  • Version of nf-core/fetchngs: 1.9 (cloned today)

fetchngs fails if FTP is blocked

Hello!

I'm working from an institute where FTP access has been restricted. Because of this, fetchngs is not able to download any sequences and keeps failing.

Would it be possible to add an 'http' download option? Right now I'm downloading the data using prefetch, which uses http, and it works just fine.
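As a stop-gap while FTP remains blocked, the run can be forced down the sra-tools route, which fetches over HTTPS (a sketch; on recent releases the equivalent option is --download_method sratools):

nextflow run nf-core/fetchngs \
   -profile <docker/singularity/.../institute> \
   --input ids.csv \
   --outdir <OUTDIR> \
   --force_sratools_download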

Can SRA responses be trapped and reported in a user-friendly manner?

Description of feature

Running:

nextflow run nf-core/fetchngs -r 1.7 --input list.txt --outdir sra -profile docker

Pipeline fails at NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO:

  [ERROR] The server couldn't fulfill the request.
  [ERROR] Status: 429 Too Many Requests

As the error is buried in the failed task report, it can be a bit intimidating, so I was wondering if this kind of problem could be trapped and reported by the workflow itself (e.g. as messages from the pipeline, like parameter validation).

  • nextflow 21.10.6 build 5660
  • Cloud VM (OpenStack)
  • Executor: local
  • OS: "Ubuntu 20.04.3 LTS"
  • Container: Docker
  • fetchngs 1.7

Errors downloading scrnaseq data using Amazon Genomics CLI (version 1.5.2)

Description of the bug

Hello,

I'm fairly new to the fetchngs workflow, and am trying to use it to download and prepare a specific 10X-format single-cell data set (BioProject PRJNA589484). This is a rather large data set, with 8 samples that are each about 70-80 GB in size. I'm working with the AGC (AWS Genomics CLI) to make this go, and am a bit stymied still. I've worked around timeout problems on the FTP download and the maximum file size limit in sra-tools by using a custom configuration file with these settings:

withName: 'SRA_FASTQ_FTP' {
    ext.args = '--retry 5 --continue-at - --max-time 12000'
}
withName: 'SRATOOLS_PREFETCH' {
    ext.args = '-X 100000000'
}

The problem that I'm now having is getting the split of the data to occur and then successfully returning the files to the target S3 bucket. The default behaviour has been to return only one file for each sample. As these are quite large, I haven't pulled them off of the S3 bucket yet to check the exact content. I've made some progress by again modifying the custom nextflow.config with the following:

withName: 'SRATOOLS_FASTERQDUMP' {
    ext.args = '--split-files --include-technical'
}

I'm pretty sure that "--include-technical" got me to the successful split, but now the workflow is crashing with an error. I was able to find the successfully split files in the working directory.

Tracing through the log file, the specific error is this:

Nov-22 09:32:05.506 [Actor Thread 8] ERROR nextflow.extension.OperatorEx - @unknown
groovy.lang.MissingMethodException: No signature of method: java.util.ArrayList.getName() is applicable for argument types: () values: []
Possible solutions: getAt(groovy.lang.Range), getAt(java.lang.String), getAt(int), getAt(int), getAt(java.lang.String), getAt(java.util.Collection)

So my questions are:
  • am I missing something that would make this work correctly?
  • has anyone else used this workflow to pull scRNA-seq data successfully in any context?
  • is there anything I need to be aware of in running it with AGC that might also be causing issues?
  • should I also add this as an issue on the GitHub page for this workflow?

Thanks in advance for any help that can be provided.

Here are the nextflow.config contents:
process {
withName: 'SRA_FASTQ_FTP' {
ext.args = '--retry 5 --continue-at - --max-time 12000'
}
withName: 'SRATOOLS_PREFETCH' {
ext.args = '-X 100000000'
}
withName: 'SRATOOLS_FASTERQDUMP' {
ext.args = '--split-files --include-technical'
}
}

Here are the Manifest.json contents:
{
"mainWorkflowURL": "https://github.com/nf-core/fetchngs.git",
"inputFileURLs": [
"inputs.json"
],
"engineOptions": "-r 1.8 -c ./project/nextflow.config -profile docker"
}

Command used and terminal output

agc workflow run fetchngs -c fetchCtx1

The process runs to the point of fasterq-dump, but then gives an error. The nextflow log is included here.

Relevant files

Archive.zip

System information

Amazon Genomics CLI, version 1.5.2, which runs Nextflow 22.04.3
Run with docker container engine
Executor awsbatch, through agc
Version 1.8 of nf-core/fetchngs

Test fails with singularity profile

Description of the bug

I can't complete the test run with version 1.7. Do I need to try 1.8dev?

Command used and terminal output

nextflow run nf-core/fetchngs -profile test,singularity --outdir test_outdir

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR12848126)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR12848126)` terminated with an error exit status (1)

Command executed:

  echo SRR12848126 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR12848126.runinfo.tsv \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: While bind mounting '/erdos/elavelle/vigna/work/f7/81d73260a94454ca3e8486b3d42153:/erdos/elavelle/vigna/work/f7/81d73260a94454ca3e8486b3d42153': destination is already in the mount point list
  [ERROR] No matches found for database id SRR12848126!
  Line: 'SRR12848126'

Relevant files

No response

System information

Nextflow version 22.04.3
Singularity version 3.8.7-1.el8
Linux HPC

Conda is ok but singularity failed

Description of the bug

Hi, I'm trying to run a first nf-core workflow using fetchngs.

I managed to run it with conda: https://nfcore.slack.com/archives/CE6SDBX2A/p1674655464394899

But it fails when I switch to singularity.

Command used and terminal output

echo "ERR9851248" > "${OUTDIR}/input.txt"


cat << EOF > "${OUTDIR}/sge.cfg"
process {
beforeScript = "hostname 1>&2"
executor="sge"
clusterOptions = "-S /bin/bash -q max-24h.q"
cache = "lenient"
penv = "smp"
}
EOF

OK with conda

nextflow -c /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/sge.cfg run nf-core/fetchngs -profile test,conda -work-dir /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/work --input /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/input.txt --outdir /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/output -resume

but singularity doesn't work:

nextflow -c /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/sge.cfg run nf-core/fetchngs -profile test,singularity -work-dir /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/work --input /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/input.txt --outdir /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/output -resume

output

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (ERR9851248)` terminated with an error exit status (1)

Command executed:

  echo ERR9851248 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      ERR9851248.runinfo.tsv \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMP as environment variable will not be supported in the future, use APPTAINERENV_TMP instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_TMPDIR as environment variable will not be supported in the future, use APPTAINERENV_TMPDIR instead
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instead
  [ERROR] We failed to reach a server.
  [ERROR] Reason: [Errno 110] Connection timed out

Work dir:
  /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/2023/20230125.nfcore/work/8b/f1972099f26ee351e766989e4fa599

HTTPS_PROXY and HTTP_PROXY are both set.

Relevant files

No response

System information

nf 22.10.4
Linux gkq4xd2.compute.bird2.prive 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sge
singularity
linux

Single text input field for project-level identifiers

Description of feature

In my current usage of this workflow I generally have a single SRP* project identifier, and I trigger the workflow in Tower.

It's a bit painful to have to write a CSV file for that single identifier and upload it to tower as a dataset before I trigger the workflow.

Would you be amenable to a single string input for a project-level identifier (as an alternative to the file input)? I can take a crack at that if so.

Mixture of ids provided via --input: Please provide either SRA / ENA / DDBJ or Synapse ids!

Description of the bug

The PRJNA625551 id is not recognised by the pipeline. Perhaps I am using it incorrectly.

Command used and terminal output

nextflow run nf-core/fetchngs --input PRJNA625551 --outdir results --input_type sra

ERROR: Validation of pipeline parameters failed!
* --input: string [PRJNA602995] does not match pattern ^\S+\.txt$ (PRJNA602995)

Relevant files

No response

System information

Nextflow/21.10.6
nf-core/2.1
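The validation error above arises because --input expects a file of ids rather than a bare accession. A minimal way to satisfy it (a sketch):

echo "PRJNA625551" > ids.txt

nextflow run nf-core/fetchngs \
   -profile docker \
   --input ids.txt \
   --outdir results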

Bug in sra_ids_to_runinfo.py

Hi,

This should be quick: basically, the function gse_to_srx() can't recognize GEO ids.
gsm_ids = [x.split('=')[1].strip() for x in fetch_url(url) if x.startswith('GSM')]

should be

gsm_ids = [x.split('=')[1].strip() for x in fetch_url(url) if x.split('=')[1].strip().startswith('GSM')]

Thanks for creating fetchngs pipeline!

--Kai

Add GEO support

GEO support was stripped out in v1.7. See this PR for more information. We should add it back in when we have a robust solution.

some samples fail with --force_sratools_download due to changes in prefetch results

Description of the bug

It is now over a month that I have been handling my data with fetchngs and I am pretty satisfied with the results. However, I recently encountered some difficulties when trying to force data download via sra-tools. Previously everything worked fine (in this context "previously" refers to the month of May), but I had to reprocess and thus re-download some of the samples, which resulted in pipeline failures due to an error when fetching the data with prefetch.

I vaguely remember reading somewhere that the SRA made changes to its data storage policies, or something similar, around the beginning of June, and both the error I get and the timing (i.e. rerunning in June the same pipeline command as in May) hint at a connection to this change. Looking at the .command.log file of the respective jobs reveals the core of the issue: prefetch does not download the typical *.sra file but something called *.sralite, which is not recognized by the subsequent vdb-validate command, as prefetch just puts it in the temp directory and not in the ./temp_dir/SRAsomething directory expected by vdb-validate. This in turn causes the pipeline to fail. I haven't looked further into whether vdb-validate also accepts the *.sralite file, and whether the problem could be resolved by simply checking whether prefetch generates the expected folder or the *.sralite file and handling the two cases accordingly.

However, downloading the failing samples via the ENA FTP is still possible, so a temporary fix is to download everything I can with sra-tools and fetch the rest from the FTP.

Command used and terminal output

nextflow run nf-core/fetchngs ... --force_sratools_download
2022-06-24T14:44:39 prefetch.2.11.0 int: self NULL while reading file within network system module - cannot Make Compute Environment Token

2022-06-24T14:44:40 prefetch.2.11.0: 1) Downloading 'ERR1141695.sralite'...
2022-06-24T14:44:40 prefetch.2.11.0:  Downloading via HTTPS...
|-------------------------------------------------- 100%
2022-06-24T14:45:04 prefetch.2.11.0:  HTTPS download succeed
2022-06-24T14:45:05 prefetch.2.11.0:  'ERR1141695.sralite' is valid
2022-06-24T14:45:05 prefetch.2.11.0: 1) 'ERR1141695.sralite' was downloaded successfully
2022-06-24T14:45:06 vdb-validate.2.11.0 info: 'ERR1141695' could not be found

Relevant files

No response

System information

No response

Add support to download 10X Genomics data

Description of the bug

In its current form, fetchngs does not download the relevant files required for re-processing single-cell experiments from the 10X Genomics platforms.

As discussed on the Slack channel, 10X data currently gets downloaded only as a single FastQ file. However, 10X data typically contains the cell ID and UMI in Read 1 (~28 bp), while Read 2 is the RNA insert (~91 bp). Read 3 tends to be the Illumina multiplexing index (mostly irrelevant, as they should all belong to a single sample anyway). Read 1 is flagged as technical, so it currently doesn't get included when using fasterq-dump, rendering the single-cell experiment into one single big bulk RNA-seq dataset.

Note:

It is also worth noting that the ENA does not serve out technical reads at all, so 10X raw data can only be obtained via the SRA (prefetch, or fasterq-dump + accession).

Here is a description of the bug:

This is the command run by fetchngs with a 10X sample accession SRR9320616:

fasterq-dump --threads 6 SRR9320616 --outfile SRR9320616.fastq

it gives the following output:

SRR9320616.fastq

This output is arguably useless for single-cell (re-)analysis.

Proposal:

This is the command required for 10X data. It uses both --split-files and --include-technical:

fasterq-dump --threads 6 --split-files --include-technical SRR9320616 --outfile SRR9320616.fastq --progress

It gives the following output:

SRR9320616_1.fastq
SRR9320616_2.fastq
SRR9320616_3.fastq

Read 1 is the cell barcode +UMI:

@SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=26
NCACCTTCTGCTGTCGCCGATGTTGT
+SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=26
#AAFFJJJJJJJJJJJJJJJJJJJJJ

Read 2 is the RNA insert read:

@SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=98
NGTTACGCTAGTAATCCCTCTACCTTTAGCCACTCACTTGGCCCTAGGTAACTAAGACCCTGACATCACTTTGCCTCTTAGGGCACAAGGAGGAACTA
+SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=98
#A<FAFAAJFF-<FAJFF<--FFAJ-7F-7<--7-<--7-777-7<77-7F<AJJ7J-----A7-A-FFF7<-7--7F<JF---AAAJ7<J---7--F

Read 3 is the multiplexing index read (not strictly required, but it doesn't hurt and can always be deleted afterwards if desired):

@SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=8
NTTGAGAA
+SRR9320616.1 K00125:67:HHJF7BBXX:1:1101:2777:998 length=8
#AA-FFJF

Adding these options to the pipeline, either via a config file or straight within the fasterq-dump process, works fine.

process {
    withName: 'SRATOOLS_FASTERQDUMP' {
        ext.args = '--split-files --include-technical'
    }
}

Download, extraction into 3 files, and the pigz compression all appear to have worked well:

2023-04-24 10:47:08          0
2023-04-24 10:50:35          6 .command.begin
2023-04-24 11:33:02         90 .command.err
2023-04-24 11:35:08         90 .command.log
2023-04-24 11:33:01          0 .command.out
2023-04-24 10:47:08      13370 .command.run
2023-04-24 10:47:08        527 .command.sh
2023-04-24 11:33:02        261 .command.trace
2023-04-24 11:35:06          1 .exitcode
2023-04-24 11:33:03 3133859956 SRX6088086_SRR9320616_1.fastq.gz
2023-04-24 11:33:03 8441509889 SRX6088086_SRR9320616_2.fastq.gz
2023-04-24 11:33:03 1496357946 SRX6088086_SRR9320616_3.fastq.gz
2023-04-24 11:33:03        124 versions.yml

I have changed the file pattern recognition to:

fastq = meta.single_end ? '*.fastq.gz' : '*_{1,2,3,4}.fastq.gz'

However, the files then never get published, and I suspect it has to do with how the read names are extracted afterwards:

https://github.com/FelixKrueger/fetchngs/blob/62b2bc840b14465a0ff551f614d613a15fdef582/workflows/sra.nf#L120-L132

sra.nf

SRA_FASTQ_FTP
           .out
           .fastq
           .mix(FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS.out.reads)
           .map { 
               meta, fastq ->
                   def reads = meta.single_end ? [ fastq ] : fastq
                   def meta_clone = meta.clone()
                   meta_clone.fastq_1 = reads[0] ? "${params.outdir}/fastq/${reads[0].getName()}" : ''
                   meta_clone.fastq_2 = reads[1] && !meta.single_end ? "${params.outdir}/fastq/${reads[1].getName()}" : ''
                   return meta_clone
           }
           .set { ch_sra_metadata }

This is the error message that brings the whole process down:

Unknown method invocation `getName` on ArrayList type
-- Check script '.nextflow/assets/FelixKrueger/fetchngs/./workflows/sra.nf' at line: 128 or see 'nf-62eTOEybyloWFq.log' file for more details
WARN: Failed to publish file: s3://altos-lab-nextflow/scratch/5c32VUHOyVZskM/aa/b062914e17b4b9d68ae187ffb920a7/SRX6088086_SRR9320616_2.fastq.gz; to: s3://testbucket/results/fastq/SRX6088086_SRR9320616_2.fastq.gz [copy] -- See log file for details

It could be really trivial to get the getName() method to work with the new data structure, but I am currently at a loss as to how to fix it.
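One way this could be handled (a sketch only, not the pipeline's actual fix) is to normalise the channel value to a list and only ever call getName() on individual path objects, e.g. in the map closure quoted above:

.map { meta, fastq ->
    // Hypothetical rework: always treat fastq as a list of paths, however many files were produced
    def reads = fastq instanceof List ? fastq : [ fastq ]
    def meta_clone = meta.clone()
    meta_clone.fastq_1 = reads[0] ? "${params.outdir}/fastq/${reads[0].getName()}" : ''
    meta_clone.fastq_2 = reads.size() > 1 ? "${params.outdir}/fastq/${reads[1].getName()}" : ''
    return meta_clone
}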

Many thanks for your kind attention!

Command used and terminal output

No response

Relevant files

No response

System information

No response

Allow input pattern validation for csv/tsv/txt

Description of the bug

Minor point, but why does --input have to match the pattern ^\S+\.txt$? Since the input is just a single column of SRR identifiers, it could equally be csv or tsv.

Command used and terminal output

nextflow run nf-core/fetchngs \
--input srr_id.txt \
--outdir reads \
--force_sratools_download -profile crick -r 1.7 -resume

Referring to these two entries in the pipeline schema:

"pattern": "^\\S+\\.txt$",
and
"pattern": "^\\S+\\.txt$",

Add tower.yml to render samplesheet as Report in Tower

Description of feature

Apologies if I've missed the feature somewhere, but currently nothing appears in the Tower Reports tab when running this workflow, which to my rookie eyes looks like it's due to the lack of a tower.yml.

Could the samplesheet be rendered as a Tower report (I'll see if I can make a PR for that)? Even better if it could also be made into a Tower dataset.
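For reference, a minimal tower.yml along these lines might look like the sketch below; the glob and display name are assumptions about where the merged samplesheet is published, not the pipeline's actual configuration:

reports:
  "**/samplesheet/samplesheet.csv":
    display: "Merged samplesheet"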

Pipeline is working with SRA run IDs, but failing with corresponding Biosample IDs

Description of the bug

The pipeline works when supplying SRA run IDs (SRR***), but fails with corresponding Biosample IDs (SAMN***).

For instance, the following works:

echo 'SRR9041539
SRR9041540
SRR9041541
SRR9041542
SRR9041543
SRR9041544' > yeast-test-SRArun-ids.csv

nextflow run nf-core/fetchngs -r 1.8 --input yeast-test-SRArun-ids.csv --outdir yeast-test-SRArun-ids -profile docker 

and the following fails:

echo 'SAMN11619543
SAMN11619542
SAMN11619539
SAMN11619538
SAMN11619541
SAMN11619540' > yeast-test-sample-ids.csv

nextflow run nf-core/fetchngs -r 1.8 --input yeast-test-sample-ids.csv --outdir yeast-test-sample-ids -profile docker 

Command used and terminal output

$ nextflow run nf-core/fetchngs -r 1.8 --input yeast-test-sample-ids.csv --outdir yeast-test-sample-ids -profile docker 
N E X T F L O W  ~  version 22.10.2
Launching `https://github.com/nf-core/fetchngs` [friendly_brahmagupta] DSL2 - revision: 249210f185 [1.8]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/fetchngs v1.8
------------------------------------------------------
Core Nextflow options
  revision       : 1.8
  runName        : friendly_brahmagupta
  containerEngine: docker
  launchDir      : /data/share/nf-core
  workDir        : /data/share/nf-core/work
  projectDir     : /home/amizeranschi/.nextflow/assets/nf-core/fetchngs
  userName       : amizeranschi
  profile        : docker
  configFiles    : /home/amizeranschi/.nextflow/assets/nf-core/fetchngs/nextflow.config

Input/output options
  input          : yeast-test-sample-ids.csv
  outdir         : yeast-test-sample-ids

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/fetchngs for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.5070524

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/fetchngs/blob/master/CITATIONS.md
------------------------------------------------------
executor >  local (8)
executor >  local (8)
[a4/4cb5a2] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619540)                       [ 25%] 2 of 8, failed: 2...
executor >  local (9)
[e3/4cc40b] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619540)                       [ 25%] 2 of 8, failed: 2...
executor >  local (10)
[c7/9e9094] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619541)                       [ 33%] 3 of 9, failed: 3...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
executor >  local (10)
[c7/9e9094] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619541)                       [ 33%] 3 of 9, failed: 3...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
executor >  local (11)
[35/d8ab51] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619539)                       [ 40%] 4 of 10, failed: ...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
executor >  local (12)
[f9/7d2f51] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619538)                       [ 45%] 5 of 11, failed: ...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
[ba/8919d2] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_... [100%] 1 of 1 โœ”
executor >  local (13)
[60/d956f8] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619543)                       [ 50%] 6 of 12, failed: ...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
[ba/8919d2] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_... [100%] 1 of 1 โœ”
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
executor >  local (14)
[57/8d60bd] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619542)                       [ 53%] 7 of 13, failed: ...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
[ba/8919d2] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_... [100%] 1 of 1 โœ”
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
executor >  local (15)
[0e/099e8d] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619540)                       [ 57%] 8 of 14, failed: ...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
[ba/8919d2] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_... [100%] 1 of 1 โœ”
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_TO_SAMPLESHEET                                      -
executor >  local (19)
[5a/305aef] process > NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619538)                       [100%] 13 of 13, failed:...
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_RUNINFO_TO_FTP                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FASTQ_FTP                                           -
[ba/8919d2] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:CUSTOM_... [100%] 1 of 1 ✔
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
[-        ] process > NFCORE_FETCHNGS:SRA:FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS:SRATOOL... -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_TO_SAMPLESHEET                                      -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_MERGE_SAMPLESHEET                                   -
[-        ] process > NFCORE_FETCHNGS:SRA:MULTIQC_MAPPINGS_CONFIG                                 -
[-        ] process > NFCORE_FETCHNGS:SRA:CUSTOM_DUMPSOFTWAREVERSIONS                             -
[b2/8f976e] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619542)` terminated with an error exit status (1) -- Execution is retried (1)
[a4/4cb5a2] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619540)` terminated with an error exit status (1) -- Execution is retried (1)
[f6/368e1e] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619541)` terminated with an error exit status (1) -- Execution is retried (1)
[b3/b9bb77] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619539)` terminated with an error exit status (1) -- Execution is retried (1)
[54/a149f0] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619538)` terminated with an error exit status (1) -- Execution is retried (1)
[f9/63adbc] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619543)` terminated with an error exit status (1) -- Execution is retried (1)
[f6/68d470] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619542)` terminated with an error exit status (1) -- Execution is retried (2)
[e3/4cc40b] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619540)` terminated with an error exit status (1) -- Execution is retried (2)
[c7/9e9094] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619541)` terminated with an error exit status (1) -- Execution is retried (2)
[35/d8ab51] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619539)` terminated with an error exit status (1) -- Execution is retried (2)
[60/d956f8] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619543)` terminated with an error exit status (1) -- Execution is retried (2)
[f9/7d2f51] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619538)` terminated with an error exit status (1) -- Execution is retried (2)
Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619542)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SAMN11619542)` terminated with an error exit status (1)

Command executed:

  echo SAMN11619542 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SAMN11619542.runinfo.tsv \
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  [ERROR] The server couldn't fulfill the request.
  [ERROR] Status: 400 Bad Request

Work dir:
  /data/share/nf-core/work/57/8d60bd8d90ce9f15f3a574a7568f7c

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Relevant files

nextflow.log

System information

N E X T F L O W ~ version 22.10.2

Local; Docker; Ubuntu 20.04

nf-core/fetchngs v1.8

The input file extension matters

Description of the bug

When providing a suitable input with a file extension of .tsv, the pipeline fails as shown below. However, when providing the same file with the extension .txt, the pipeline runs successfully. This was unexpected, and the error message is not helpful.
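
In the meantime, a trivial workaround consistent with the behaviour described above (a sketch; only the file extension changes, not the content):

# Copy the id list to a .txt file and point the pipeline at that instead
cp test.tsv test.txt

nextflow run nf-core/fetchngs \
    -r '1.5' \
    --input 'test.txt' \
    -profile docker \
    -resume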

Command used and terminal output

nextflow run \
    nf-core/fetchngs \
    -r '1.5' \
    --input 'test.tsv' \
    -profile docker \
    -resume

N E X T F L O W  ~  version 21.10.5
Launching `nf-core/fetchngs` [golden_avogadro] - revision: c318ae12fc [1.5]
                                               
ERROR: Validation of pipeline parameters failed!
                                                                                               
                                                                                               
Unknown method invocation `printExceptions` on NfcoreSchema type 
                                               
 -- Check script '/home/moritz/.nextflow/assets/nf-core/fetchngs/main.nf' at line: 20 or see '.nextflow.log' file for more details

Relevant files

test.tsv

SRR2231363
SRR2231379
SRR2195657

System information

  • Nextflow 21.10.5

Use ffq to fetch id metadata

Description of feature

We should try to strip out the custom Python scripts I initially wrote for this pipeline. They have worked relatively well but are fragile to breakages in the standard database APIs. This would be true for any tool querying those databases; however, it would be nice to replace the custom scripts with a more conventional tool. ffq seems to be the perfect tool for this, and I have created an nf-core module and installed it in the pipeline in #100. We now need to mould this into a fully fledged implementation.
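
As a rough illustration of what the ffq-based step could look like on the command line (a sketch only; the --ftp flag is assumed from the ffq README, and the accession is just an example):

pip install ffq

# Full metadata for a single run accession, as JSON
ffq SRR9984183 > SRR9984183.json

# Only the FTP links to the raw FastQ files
ffq --ftp SRR9984183 > SRR9984183.ftp.json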

support SRP/PRJNA input type

Description of feature

Hi,

Thanks for the great tool! Something that might be helpful is a project number input type, as users may want to download every file from a study.

I've been using [pysradb](https://pypi.org/project/pysradb/) as an upstream process to fetch the id list, with reliable success, and it's been really helpful for avoiding manual id list downloads.
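
For reference, a minimal sketch of that upstream step (assuming pysradb's srp-to-srr subcommand; the SRP accession and output file name are just examples):

pip install pysradb

# Resolve a study accession to its run accessions and write one id per line
pysradb srp-to-srr SRP017096 \
    | awk 'NR > 1 { for (i = 1; i <= NF; i++) if ($i ~ /^[SDE]RR[0-9]+$/) print $i }' \
    > ids.txt

nextflow run nf-core/fetchngs --input ids.txt --outdir results -profile docker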

Downloading of controlled data

Description of feature

I would like to request the ability to pass in our dbGaP credentials through the --ngc flag of sra-tools so as to be able to download controlled data.
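
For context, the underlying sra-tools calls would look roughly like this (a sketch; --ngc is the dbGaP repository key option of prefetch and fasterq-dump, and the key file name and accession are placeholders):

prefetch --ngc prj_12345.ngc SRR0000001
fasterq-dump --ngc prj_12345.ngc --split-files SRR0000001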

Test fails when running with singularity

Description of the bug

Repost from closed issue: nf-core/rnaseq#383
Can't finish test run, "destination is already in the mount point list"

Command used and terminal output

nextflow run nf-core/fetchngs -profile test,singularity --outdir test_outdir

Command error:
  WARNING: While bind mounting '/erdos/elavelle/vigna/work/b5/250dd7d8fea316c771737cb79d8f39:/erdos/elavelle/vigna/work/b5/250dd7d8fea316c771737cb79d8f39': destination is already in the mount point list
  [ERROR] The server couldn't fulfill the request.
  [ERROR] Status: 400 Bad Request. Both list of IDs and query_key are empty

Relevant files

No response

System information

Nextflow version 22.04.3
Singularity version 3.8.7-1.el8
Linux HPC

Download error with docker

When I run with -profile docker, the pipeline uses Python 2.7, which produces this error:
Error executing process > 'NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (PRJNA79595)'

Caused by:
Process NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (PRJNA79595) terminated with an error exit status (1)

Command executed:

echo PRJNA79595 > id.txt
sra_ids_to_runinfo.py \
    id.txt \
    PRJNA79595.runinfo.tsv \

Command exit status:
1

Command output:
(empty)

Command error:
Traceback (most recent call last):
File "/home/rfor10/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 170, in
sys.exit(main())
File "/home/rfor10/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 167, in main
fetch_sra_runinfo(args.FILE_IN, args.FILE_OUT, ena_metadata_fields)
File "/home/rfor10/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 133, in fetch_sra_runinfo
for row in csv_dict:
File "/opt/conda/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 778: ordinal not in range(128)

Running either locally or without Docker, where Python 3.8 is on the path, works fine with no errors.

input_type is supposed to have a default of 'sra' but parameter validation fails without it

Description of the bug

https://nf-co.re/fetchngs/1.5/parameters describes input_type as having a default value of 'sra', but without specifying the argument explicitly the pipeline fails to validate parameters.

Command used and terminal output

nextflow run \
    nf-core/fetchngs \
    -r '1.5' \
    --input ids.txt \
    -profile docker \
    -resume


N E X T F L O W  ~  version 21.10.5
Launching `nf-core/fetchngs` [golden_avogadro] - revision: c318ae12fc [1.5]
                                               
ERROR: Validation of pipeline parameters failed!
                                                                                               
                                                                                               
Unknown method invocation `printExceptions` on NfcoreSchema type 
                                               
 -- Check script '/home/moritz/.nextflow/assets/nf-core/fetchngs/main.nf' at line: 20 or see '.nextflow.log' file for more details
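
A possible workaround, assuming the schema default simply isn't being applied, is to pass the parameter explicitly:

nextflow run \
    nf-core/fetchngs \
    -r '1.5' \
    --input ids.txt \
    --input_type 'sra' \
    -profile docker \
    -resume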

Relevant files

Some IDs to test but they don't really matter.

SRR2231363
SRR2231379
SRR2195657

System information

No response

Running sratools prefetch on Azure

Description of the bug

The exit status of the task that caused the workflow execution to fail was: 1

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)` terminated with an error exit status (1)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "5d0db162-e6a9-4167-a826-10b11c40e7e1"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi
  
  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR14709033
  
  vdb-validate SRR14709033
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  mkdir: can't create directory '/home/_azbatchtask_1/': Permission denied

Work dir:
  az://pipelines/work/b2/8953f0f0c34a6644be0f7ee8a84b21

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Command used and terminal output

Pipeline configuration:

{
    "description" : "Pipeline to fetch metadata and raw FastQ files from public databases.",
    "launch" : {
        "pipeline" : "https://github.com/nf-core/fetchngs",
        "revision" : "1.5",
        "configProfiles" : [ "test" ],
        "paramsText" : "outdir: 'az://pipelines/profile_test/fetchngs/results'"
    }
}

Relevant files

Configuration used to create Azure environment:

{
  "workDir" : "az://pipelines/work",
  "region" : "eastus2",
  "preRunScript" : "export NXF_VER=21.10.4",
  "forge" : {
    "vmType" : "Standard_E8ds_v4",
    "vmCount" : 60,
    "autoScale" : true,
    "disposeOnDeletion" : true
  },
  "deleteJobsOnCompletion" : "on_success",
  "deletePoolsOnCompletion" : false,
  "environment" : [ ],
  "discriminator" : "azure-batch"
}

System information

Nextflow version 21.10.4
Hardware: Azure
Container engine: Docker
Version of nf-core/fetchngs: 1.5

Choice between fastq and fasterq-dump when using --force_sratools_download

Description of feature

Hi it's me again,

After solving all the issues related to my NCBI config file, I stumbled across yet another pitfall connected to parallel file system architectures when using fasterq-dump, which is described in a bit more detail in this issue here. The core of it is that fasterq-dump with multiple threads will send the respective compute nodes into a sort of hibernation from which they can't be woken unless one reboots them. This has already caused quite a portion of our cluster to sleep. The solution proposed by our IT was to simply not use it, as there is no clear thread-count threshold at which this occurs. I updated my local copy of the pipeline accordingly and it seems to work fine. Since I won't be the only one deploying fetchngs on a compute cluster, I think it would be a valuable addition to let the user choose between fasterq-dump and fastq-dump in order to avoid breaking clusters with parallel file system architectures.
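
For illustration, the single-threaded alternative boils down to swapping one command (a sketch; the accession is just an example and flag behaviour may vary between sra-tools versions):

# Multi-threaded conversion (problematic on some parallel file systems)
# fasterq-dump --split-files --threads 6 SRR000001

# Single-threaded alternative
prefetch SRR000001
fastq-dump --split-files --gzip SRR000001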

Error when using singularity - /etc/resolv.conf doesn't exist in container


Description of the bug

Command:

nextflow run nf-core/fetchngs --input ids.txt -profile singularity

Error:

Error executing process > 'NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)'

Caused by:
  Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)` terminated with an error exit status (1)

Command executed:

  echo SRR7667101 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR7667101.runinfo.tsv \

Command exit status:
  1

Command output:
  (empty)

executor >  local (6)
[e5/c7daaf] process > NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667099) [100%] 6 of 6, failed: 6, retries: 4 ✘
[-        ] process > NFCORE_FETCHNGS:FETCHNGS:SRA_RUNINFO_TO_FTP              -
[-        ] process > NFCORE_FETCHNGS:FETCHNGS:SRA_FASTQ_FTP                   -
[-        ] process > NFCORE_FETCHNGS:FETCHNGS:SRA_TO_SAMPLESHEET              -
[-        ] process > NFCORE_FETCHNGS:FETCHNGS:SRA_MERGE_SAMPLESHEET           -
[e1/a4ddc5] NOTE: Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)` terminated with an error exit status (1) -- Execution is retried (1)
[ab/aab919] NOTE: Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667099)` terminated with an error exit status (1) -- Execution is retried (1)
[b3/e42c10] NOTE: Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)` terminated with an error exit status (1) -- Execution is retried (2)
[ee/bafd63] NOTE: Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667099)` terminated with an error exit status (1) -- Execution is retried (2)
Error executing process > 'NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)'

Caused by:
  Process `NFCORE_FETCHNGS:FETCHNGS:SRA_IDS_TO_RUNINFO (SRR7667101)` terminated with an error exit status (1)

Command executed:

  echo SRR7667101 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR7667101.runinfo.tsv \

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: While bind mounting '/lustre7/home/lustre3/harukao/work/dc/30737a68a705c5750ef5205813c04e:/lustre7/home/lustre3/harukao/work/dc/30737a68a705c5750ef5205813c04e': destination is already in the mount point list
  WARNING: Skipping mount /opt/pkg/singularity/3.7.1/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
  HTTPSConnectionPool(host='www.ebi.ac.uk', port=443): Max retries exceeded with url: /ena/portal/api/returnFields?dataPortal=ena&format=tsv&result=read_run (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2aaaac67ae50>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Work dir:
  /lustre7/home/lustre3/harukao/work/dc/30737a68a705c5750ef5205813c04e

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
$ cat ids.txt
SRR7667099
SRR7667101

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: As above
  2. See error: As above

Expected behaviour

  • Get FASTQ files for ids in ids.txt.

Log

.nextflow.log https://www.dropbox.com/sh/fiie8ian20zt17f/AACBfnMn1QiMjUi9778NL_jka?dl=0

System

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Nextflow Installation

$ nextflow -version

      N E X T F L O W
      version 21.04.1 build 5556
      created 14-05-2021 15:20 UTC (15-05-2021 00:20 JDT)
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Container engine

$ singularity --version
singularity version 3.7.1

Additional context
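
A quick way to confirm whether name resolution works inside a Singularity container on the compute node (a diagnostic sketch; the alpine image is only an example):

singularity exec docker://alpine:3.18 nslookup www.ebi.ac.uk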

Problem executing `sra_ids_to_runinfo.py`: KeyError: 'Experiment'

Description of the bug

While executing fetchngs on a list of SRA RUN accessions (attached) I got, during NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO:

    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 491, in <module>
      sys.exit(main())
    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 487, in main
      fetch_sra_runinfo(args.file_in, args.file_out, ena_metadata_fields)
    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 461, in fetch_sra_runinfo
      ids = DatabaseResolver.expand_identifier(db_id)
    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 233, in expand_identifier
      return cls._id_to_srx(identifier)
    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 262, in _id_to_srx
      return [row["Experiment"] for row in open_table(response, delimiter=",")]
    File "/home/ubuntu/.nextflow/assets/nf-core/fetchngs/bin/sra_ids_to_runinfo.py", line 262, in <listcomp>
      return [row["Experiment"] for row in open_table(response, delimiter=",")]
  KeyError: 'Experiment'

with all the samples processed.

Example IDs:

  • SRR8907310
  • SRR8907330
  • SRR8907379

It could be that "${ID}.runinfo.tsv" was not generated properly: it contains only a single header line like

accession       run_accession   experiment_accession    sample_accession        secondary_sample_accession      study_accession secondary_study_accession       parent_study submission_accession    run_alias       experiment_alias        sample_alias    study_alias     library_layout  library_selection       library_source  library_strategy     library_name    instrument_model        instrument_platform     base_count      read_count      tax_id  scientific_name sample_title    experiment_title     study_title     description     sample_description      fastq_md5       fastq_bytes     fastq_ftp       fastq_galaxy    fastq_aspera

Command used and terminal output

nextflow run nf-core/fetchngs -r 1.7 --input list.txt --outdir sra -profile docker

Relevant files

Input list: SRR_Acc_List.txt

System information

  • nextflow 21.10.6 build 5660
  • Cloud VM (OpenStack)
  • Executor: local
  • OS: "Ubuntu 20.04.3 LTS"
  • Container: Docker version 18.09.7, build 2d0083d
  • fetchngs 1.7

--force_sratools_download results in different fastq names compared to FTP download

Description of the bug

Hey,

is it intended that --force_sratools_download results in different fastq names?

Without --force_sratools_download the FastQs have the naming format SRX[0-9]+_SRR[0-9]+_[1-2].fastq.gz, whereas with --force_sratools_download the naming is SRR[0-9]+_[1-2].fastq.gz.

This means that you cannot combine downloading FastQs both with and without --force_sratools_download, e.g. in the event of EBI curl errors (without the flag):

Command error:
  INFO:    Converting SIF file to temporary sandbox...
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  curl: (7) Failed to connect to ftp.sra.ebi.ac.uk port 21 after 1614 ms: Connection refused

and fasterq-dump errors (with the flag), e.g.

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  .command.run: fork: Cannot allocate memory
  fasterq-dump was killed (signal 9 SIGKILL)
  .command.run: line 163: kill: (80666) - No such process
  INFO:    Cleaning up image...

Many thanks,
Oliver

Wrong use of --progress in prefetch command leads to failure of pipeline

Description of the bug

I am currently trying to retrieve 15k samples from SRA. Since FTP download was painfully slow due to an apparent connection limit, I defaulted to using the sra-toolkit via --force_sratools_download. However, this led to an immediate failure of the pipeline at the prefetch process stage, since it just produced retries and ultimately aborted execution due to the retry limit. After a bit of digging through the pipeline's code I found that the prefetch process is using the following command

prefetch --progress <sra_accession>

Although --progress is a valid command line parameter, its use in this case is wrong. --progress expects an integer value denoting the time period to use for displaying the progress of the download. Since no such value is given in this command, it consumes the accession itself, which is then no longer available for the tool to fetch, leading to failure. Here is the relevant part of the help message:

-p|--progress <value>            time period in minutes to display download
                                 progress (0: no progress), default: 1

I would thus recommend either removing the --progress argument or adding a value.
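
Based on the help text above, a sketch of the two options (the accession is just an example):

# Option 1: drop the flag entirely
retry_with_backoff.sh prefetch DRR078852

# Option 2: give --progress the integer it expects (report progress every minute)
retry_with_backoff.sh prefetch --progress 1 DRR078852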

Command used and terminal output

I ran the pipeline with `--force_sratools_download` and a simple modification where I added `echo $output` to the `retry_with_backoff.sh` script in the `retry_with_backoff` function to see what was wrong and got the following output in any of the .command.log files

prefetch --progress DRR078852
Usage: prefetch [options] <SRA accession | kart file> [...] Download SRA or dbGaP files and their dependencies prefetch [options] <SRA file> [...] Check SRA file for missed dependencies and download them prefetch --list <kart file> [...] List the content of a kart file
Failed attempt 1 of 5. Retrying in 1 s.
prefetch --progress DRR078852
Usage: prefetch [options] <SRA accession | kart file> [...] Download SRA or dbGaP files and their dependencies prefetch [options] <SRA file> [...] Check SRA file for missed dependencies and download them prefetch --list <kart file> [...] List the content of a kart file
Failed attempt 2 of 5. Retrying in 2 s.
prefetch --progress DRR078852
Usage: prefetch [options] <SRA accession | kart file> [...] Download SRA or dbGaP files and their dependencies prefetch [options] <SRA file> [...] Check SRA file for missed dependencies and download them prefetch --list <kart file> [...] List the content of a kart file
Failed attempt 3 of 5. Retrying in 4 s.
prefetch --progress DRR078852
Usage: prefetch [options] <SRA accession | kart file> [...] Download SRA or dbGaP files and their dependencies prefetch [options] <SRA file> [...] Check SRA file for missed dependencies and download them prefetch --list <kart file> [...] List the content of a kart file
Failed attempt 4 of 5. Retrying in 8 s.
prefetch --progress DRR078852
Usage: prefetch [options] <SRA accession | kart file> [...] Download SRA or dbGaP files and their dependencies prefetch [options] <SRA file> [...] Check SRA file for missed dependencies and download them prefetch --list <kart file> [...] List the content of a kart file
Failed after 5 attempts.

Relevant files

No response

System information

No response

WARN: Cannot read project manifest -- Cause: API rate limit exceeded

Description of the bug

Thank you for this excellent tool. It has worked fine for me in the past, but I've started getting the message below.

(I can't expect users on the system to have logged in to GitHub.)

Do you know how I can solve the problem? Other nf-core pipelines currently work on my machine.

Steps to reproduce

swingett@cell-bio-xeon:/data1_wingett/Magda/classifier/HipSci_Full_Download$ nextflow run fetcher --input to_download_ERR.txt -bg
swingett@cell-bio-xeon:/data1_wingett/Magda/classifier/HipSci_Full_Download$ N E X T F L O W ~ version 21.04.1
Pulling nextflow-io/fetcher ...
WARN: Cannot read project manifest -- Cause: API rate limit exceeded -- Provide your GitHub user name and password to get a higher rate limit
API rate limit exceeded -- Provide your GitHub user name and password to get a higher rate limit
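
One possible remedy, assuming the limit is hit by unauthenticated GitHub API calls when Nextflow pulls the pipeline, is to configure SCM credentials (a sketch following the Nextflow SCM configuration docs; the username and token values are placeholders):

mkdir -p ~/.nextflow
cat > ~/.nextflow/scm <<'EOF'
providers {
    github {
        user = 'your-github-username'
        password = 'your-personal-access-token'
    }
}
EOF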

System

  • Hardware:
  • Executor:
  • OS:
  • Version

Nextflow Installation

  • Version:

fetchngs pipeline not working on scRNAseq

Hi there!

I have used fetchngs with no problems for downloading RNA-seq data from SRA. However, when I tried the same with scRNA-seq data from SRA, the pipeline ended saying "Pipeline completed successfully", but when I opened the results folder it contained an "IDS_NOT_DOWNLOADED.txt" file listing the input IDs. This was tried with the following script:

#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --mem=4gb
#SBATCH --job-name=ids8-download
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
#SBATCH --partition=tmp_anvil,batch

module load nextflow
module load singularity

The job command(s):

nextflow run nf-core/fetchngs --input ids8.txt -profile singularity

This was tried on the following SRA IDs:
SRX7278088
SRX7278087
SRX7278086
SRX7278085
SRX7278084
SRX7278083
SRX7278082
SRX7278081

Can anybody troubleshoot this and suggest a way to make it work?

UPDATE: The fetchngs pipeline does not seem to work on any datasets besides RNA-seq. I tested the pipeline on exome sequencing and single-cell RNA-seq datasets: it does not download the data, but finishes the job reporting that the pipeline completed successfully.

regards
Arch
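
One thing that may be worth trying (an assumption on my part: runs whose FastQ files are not available via direct FTP links end up in IDS_NOT_DOWNLOADED.txt unless the sra-tools route is used) is to force the sra-tools download path:

nextflow run nf-core/fetchngs --input ids8.txt -profile singularity --force_sratools_download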

Test run for fetchngs failed

Description of the bug

Test run on the minimal dataset failed

Command used and terminal output

Command
nextflow run nf-core/fetchngs -profile test docker

Output
Core Nextflow options
  revision                  : master
  runName                   : nauseous_yalow
  launchDir                 : /home/dalin/nf_core
  workDir                   : /home/dalin/nf_core/work
  projectDir                : /home/dalin/.nextflow/assets/nf-core/fetchngs
  userName                  : dalin
  profile                   : test
  configFiles               : /home/dalin/.nextflow/assets/nf-core/fetchngs/nextflow.config

Input/output options
  input                     : https://raw.githubusercontent.com/nf-core/test-datasets/fetchngs/sra_ids_test.txt

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Max job request options
  max_cpus                  : 2
  max_memory                : 6 GB
  max_time                  : 6h

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/fetchngs for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.5070524

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/fetchngs/blob/master/CITATIONS.md
------------------------------------------------------
executor >  local (30)
[e7/f6dad9] process > NFCORE_FETCHNGS:SRA:SRA_IDS... [100%] 7 of 7
[38/346b04] process > NFCORE_FETCHNGS:SRA:SRA_RUN... [100%] 7 of 7
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FAS... -
[bd/9bd173] process > NFCORE_FETCHNGS:SRA:SRA_FAS... [100%] 9 of 9, failed: 9..
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_FAS... -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_TO_... -
[-        ] process > NFCORE_FETCHNGS:SRA:SRA_MER... -
[-        ] process > NFCORE_FETCHNGS:SRA:MULTIQC... -
[-        ] process > NFCORE_FETCHNGS:SRA:CUSTOM_... -
[68/a3b0c4] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14593545)` terminated with an error exit status (1) -- Execution is retried (1)
[2d/b2d3a5] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)` terminated with an error exit status (1) -- Execution is retried (1)
[9a/02508e] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR12848126)` terminated with an error exit status (1) -- Execution is retried (1)
[29/3ce30d] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)` terminated with an error exit status (1) -- Execution is retried (2)
[c1/c5b350] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14593545)` terminated with an error exit status (1) -- Execution is retried (2)
[b3/0c2b7d] NOTE: Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR12848126)` terminated with an error exit status (1) -- Execution is retried (2)
Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH (SRR14709033)` terminated with an error exit status (1)

Command executed:

  eval "$(vdb-config -o n NCBI_SETTINGS | sed 's/[" ]//g')"
  if [[ ! -f "${NCBI_SETTINGS}" ]]; then
      mkdir -p "$(dirname "${NCBI_SETTINGS}")"
      printf '/LIBS/GUID = "b6aa337c-d642-4239-a4ab-552792ae78b2"\n/libs/cloud/report_instance_identity = "true"\n' > "${NCBI_SETTINGS}"
  fi

  retry_with_backoff.sh prefetch \
       \
      --progress \
      SRR14709033

  vdb-validate SRR14709033

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_FASTQ_SRATOOLS:SRATOOLS_PREFETCH":
      sratools: $(prefetch --version 2>&1 | grep -Eo '[0-9.]+')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  .command.sh: line 2: vdb-config: command not found
  .command.sh: line 3: NCBI_SETTINGS: unbound variable

Work dir:
  /home/dalin/nf_core/work/6b/3f1324636644f4b5afbc809ce41f5f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`


WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

Relevant files

No response

System information

nextflow: v22.04
Hardware: Desktop
Container: Docker
OS: Ubuntu 22.04
nf-core/fetchngs: v1.5
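
Worth noting: with `-profile test docker` only the test profile is applied (the space-separated `docker` is treated as a positional argument rather than a profile), so no container engine is enabled, which would explain `vdb-config: command not found`. A corrected invocation would be:

nextflow run nf-core/fetchngs -profile test,docker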

eutils rate limits exceeded (with relatively small input set of accessions)

Description of the bug

Looks like #102 may have had some unintended side effects, although I am not 100% sure that I'm interpreting the situation correctly.

Basically I returned to use the pipeline since the last release and I was getting a lot of 429 http errors at the stage of the pipeline where SRR accessions are being resolved to SRX accessions. I think this is likely due to the pipeline exceeding the eutils rate limits (3 requests per second without an API key, 10 requests per second with an API key, per https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/)
Happily, I was able to work around it by getting the SRX numbers from the SRA Run metadata download and just supplying those directly; I assume that this simply avoids calling eutils but I guess this isn't the way it's expected to work.

It seems like adding support for API keys to eutils could be one approach but likely it wouldn't be too hard to exceed 10 requests per second either, so probably some other strategy for throttling or batching the requests would be better. Maybe not parallelizing this aspect of the pipeline and just getting the accessions in a single batch request would make more sense? It's kind of funny that the actual parallel download of fastqs is not an issue while the parallel conversion of ids seems to trigger this (at least, if I'm interpreting correctly).
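
In the meantime, a hedged sketch of the SRR-to-SRX workaround described above, using the ENA portal API instead of eutils (endpoint and field names are my assumptions based on the ENA filereport documentation; throttle as appropriate):

# Resolve each run accession to its experiment accession via the ENA portal API
while read -r acc; do
    curl -s "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=${acc}&result=read_run&fields=run_accession,experiment_accession&format=tsv" \
        | tail -n +2 \
        | cut -f2
    sleep 0.5   # stay well clear of any per-IP rate limit
done < SRR_Acc_List.txt | sort -u > SRX_Acc_List.txt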

Command used and terminal output

nextflow run nf-core/fetchngs --force_sratools_download --input alfalfa_gene_index_acclist.txt --nf_core_pipeline rnaseq --outdir alfalfa_gene_index -profile singularity

...

Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR1820232)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR1820232)` terminated with an error exit status (1)

Command executed:

  echo SRR1820232 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR1820232.runinfo.tsv \


  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instea
  WARNING: While bind mounting '/erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909:/erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909': destination is already in the mount point list
  [ERROR] The server couldn't fulfill the request.
  [ERROR] Status: 429 Too Many Requests

Work dir:
  /erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Relevant files

this file triggered the error:
SRR_Acc_List.txt
this file has the equivalent SRX accessions and worked fine
SRX_Acc_List.txt

System information

nextflow version 21.10.6.5660
nf-core/fetchngs v1.7

Check if running offline

This pipeline won't work in offline situations because it needs to download metadata and FastQ files from the web. We should probably add a check/warning/failure to make sure that there is an internet connection when running. This would have to factor in situations where login nodes have an internet connection but the compute nodes on the cluster may be offline. We should be able to add try/catch logic to the bin/sra_ids_to_runinfo.py script to detect this, or maybe we just write a separate module for this that can be re-used if required, as it will only ever be run once at the beginning of the workflow with almost zero resources.
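
A minimal sketch of such a check, runnable on a compute node before launching the workflow (it probes the same ENA endpoint the pipeline queries; the timeout is arbitrary):

if ! curl -sSf --max-time 10 \
    'https://www.ebi.ac.uk/ena/portal/api/returnFields?dataPortal=ena&format=tsv&result=read_run' > /dev/null
then
    echo 'ERROR: No route to the ENA API from this node; fetchngs cannot fetch metadata or FastQ files.' >&2
    exit 1
fi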
