pachterlab / ffq

A tool to find sequencing data and metadata from public databases.

License: MIT License

Python 98.96% Makefile 1.04%

ffq's Introduction

ffq

! NCBI is deprecating .SRA file links. This may result in an empty list with `--ncbi`.
+ Have a cool use case for ffq? Submit a PR to the `Use cases` section and we'll feature it!

Fetch metadata information from the following databases:

GEO: Gene Expression Omnibus,
SRA: Sequence Read Archive,
EMBL-EBI: European Molecular Biology Laboratory’s European Bioinformatics Institute,
DDBJ: DNA Data Bank of Japan,
NIH Biosample: Biological source materials used in experimental assays,
ENCODE: The Encyclopedia of DNA Elements.

ffq receives an accession and returns the metadata for that accession as well as the metadata for all downstream accessions, following the connections between GEO, SRA, EMBL-EBI, DDBJ, and Biosample. If you use ffq in a publication, please cite:

Gálvez-Merchán, Á., et al. (2022). Metadata retrieval from sequence databases with ffq. bioRxiv 2022.05.18.492548.

The manuscript is available here: https://doi.org/10.1101/2022.05.18.492548.

By default, ffq returns all downstream metadata down to the level of the SRR record. However, the desired level of resolution can be specified.

ffq can also skip returning the metadata, and instead return the raw data download links from any available host (FTP, AWS, GCP or NCBI) for GEO and SRA ids.

Installation

The latest release can be installed with

pip install ffq

The development version can be installed with

pip install git+https://github.com/pachterlab/ffq

Usage

Fetch information of an accession and display it in the terminal

ffq [accession]

where [accession] is either:

an SRA/EBI/DDBJ accession
- (SRR, SRX, SRS or SRP)
- (ERR, ERX, ERS or ERP)
- (DRR, DRS, DRX or DRP)
a GEO accession (GSE or GSM)
an ENCODE accession (ENCSR, ENCSB or ENCSD)
a Bioproject accession (CRX)
a Biosample accession (SAMN)
a DOI
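As a rough illustration of the accession types listed above, the following sketch sorts an input string into a coarse category by its prefix. The table and the `classify_accession` helper are hypothetical and built only from the prefixes in this list; they are not ffq's own validation logic, and the Bioproject prefix is left out.

```python
import re

# Hypothetical prefix table derived from the accession list above.
# It is an illustration only, not ffq's internal validation.
ACCESSION_PATTERNS = {
    "run": r"[SED]RR\d+",
    "experiment": r"[SED]RX\d+",
    "sample": r"[SED]RS\d+",
    "study": r"[SED]RP\d+",
    "geo_series": r"GSE\d+",
    "geo_sample": r"GSM\d+",
    "encode": r"ENCS[RBD]\w+",
    "biosample": r"SAMN\d+",
    "doi": r"10\.\d{4,9}/\S+",
}


def classify_accession(accession: str) -> str:
    """Return a coarse category for an accession, or 'unknown'."""
    for category, pattern in ACCESSION_PATTERNS.items():
        if re.fullmatch(pattern, accession):
            return category
    return "unknown"


if __name__ == "__main__":
    for example in ("SRR9990627", "GSE129845", "ENCSR998WNE"):
        print(example, classify_accession(example))
```

A check like this can catch an improperly formatted accession before any request is sent.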

Examples:

$ ffq SRR9990627
#=> Returns metadata for the SRR9990627 run.

$ ffq SRX7347523
#=> Returns metadata for the experiment SRX7347523 and for its associated SRR run.

$ ffq GSE129845
#=> Returns metadata for GSE129845 and for its 5 associated GSM, SRS, SRX and SRR ids.

$ ffq DRP004583
#=> Returns metadata for the study DRP004583 and its 104 associated DRS, DRX and DRR ids.

$ ffq ENCSR998WNE
#=> Returns metadata for the ENCODE experiment ENCSR998WNE.

Fetch information of multiple accessions and display it in the terminal

ffq [accession 1] [accession 2] ...

where [accession 1] and [accession 2] are accessions of any of the types listed above.

Examples:

$ ffq SRR11181954 SRR11181955 SRR11181956
#=> Returns metadata for the three SRR runs.

$ ffq GSM4339769 GSM4339770 GSM4339771
#=> Returns metadata for the three GSM accessions, as well as for their corresponding downstream SRS, SRX and SRR accessions.

Fetch information of an accession only down to specified level

ffq -l [level] [accession]

where [level] is the number of levels of the accession hierarchy to fetch, counting the queried accession itself as level 1.

Examples:

$ ffq -l 1 GSM4339769
#=> Returns metadata only for GSM4339769, and not for any downstream accession.

$ ffq -l 3 GSE115469
#=> Returns metadata for GSE115469 and its downstream GSM and SRS accessions.
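The two examples above suggest how `-l` counts levels. The sketch below reproduces that counting for a GEO entry point, assuming the GSE, GSM, SRS, SRX, SRR ordering implied by the examples in this section. `GEO_HIERARCHY` and `levels_returned` are illustrative names, not part of ffq.

```python
from typing import List, Optional

# Assumed ordering for a GEO series, read off the examples above
# (GSE -> GSM -> SRS -> SRX -> SRR). Other entry points may differ.
GEO_HIERARCHY = ["GSE", "GSM", "SRS", "SRX", "SRR"]


def levels_returned(start: str, level: Optional[int] = None) -> List[str]:
    """List the accession types covered when starting at `start` with `-l level`.

    With no level, everything down to the run (SRR) is included.
    """
    if level is not None and level < 1:
        raise ValueError("level must be at least 1")
    remaining = GEO_HIERARCHY[GEO_HIERARCHY.index(start):]
    return remaining if level is None else remaining[:level]


if __name__ == "__main__":
    print(levels_returned("GSM", 1))
    print(levels_returned("GSE", 3))
```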

Fetch only raw data links from the host of your choice and display them in the terminal

FTP host

ffq --ftp [accession(s)]

where [accession(s)] is either a single accession or a space-delimited list of accessions.

AWS host

ffq --aws [accession(s)]

GCP host

ffq --gcp [accession(s)]

NCBI host

ffq --ncbi [accession(s)]

Examples:

# FTP with an SRR
$ ffq --ftp SRR10668798
[
    {
        "accession": "SRR10668798",
        "filename": "SRR10668798_1.fastq.gz",
        "filetype": "fastq",
        "filesize": 31876537192,
        "filenumber": 1,
        "md5": "bf8078b5a9cc62b0fee98059f5b87fa7",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_1.fastq.gz"
    },
...

# FTP with a GSE
$ ffq --ftp GSE115469
[
    {
        "accession": "SRR7276474",
        "filename": "P1TLH.bam",
        "filetype": "bam",
        "filesize": 48545467653,
        "filenumber": 1,
        "md5": "d0fde6bf21d9f97bdf349a3d6f0a8787",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/SRA716/SRA716608/bam/P1TLH.bam"
    },
...

# AWS with SRX
$ ffq --aws SRX7347523
[
    {
        "accession": "SRR10668798",
        "filename": "T84_S1_L001_R1_001.fastq.1",
        "filetype": "fastq",
        "filesize": null,
        "filenumber": 1,
        "md5": null,
        "urltype": "aws",
        "url": "s3://sra-pub-src-6/SRR10668798/T84_S1_L001_R1_001.fastq.1"
    },
...

# GCP with ERS
$ ffq --gcp ERS3861775
[
    {
        "accession": "ERR3585496",
        "filename": "4834STDY7002879.bam.1",
        "filetype": "bam",
        "filesize": null,
        "filenumber": 1,
        "md5": null,
        "urltype": "gcp",
        "url": "gs://sra-pub-src-17/ERR3585496/4834STDY7002879.bam.1"
    }
]

# NCBI with GSM
$ ffq --ncbi GSM2905292
[
    {
        "accession": "SRR6425163",
        "filename": "SRR6425163.1",
        "filetype": "sra",
        "filesize": null,
        "filenumber": 1,
        "md5": null,
        "urltype": "ncbi",
        "url": "https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-13/SRR6425163/SRR6425163.1"
    }
]
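The link records shown above share the same fields, so they can be summarised with a few lines of Python. This is a minimal sketch: `SAMPLE_OUTPUT` combines one FTP record and one AWS record from the examples purely for illustration (a single ffq call returns one host at a time), and `summarize_links` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Two records copied from the examples above and combined for illustration.
SAMPLE_OUTPUT = """
[
    {"accession": "SRR10668798", "filename": "SRR10668798_1.fastq.gz",
     "filetype": "fastq", "filesize": 31876537192, "filenumber": 1,
     "md5": "bf8078b5a9cc62b0fee98059f5b87fa7", "urltype": "ftp",
     "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_1.fastq.gz"},
    {"accession": "SRR10668798", "filename": "T84_S1_L001_R1_001.fastq.1",
     "filetype": "fastq", "filesize": null, "filenumber": 1,
     "md5": null, "urltype": "aws",
     "url": "s3://sra-pub-src-6/SRR10668798/T84_S1_L001_R1_001.fastq.1"}
]
"""


def summarize_links(raw_json: str) -> Dict[str, Any]:
    """Collect URLs, the total of known file sizes, and files lacking an md5."""
    records = json.loads(raw_json)
    return {
        "urls": [record["url"] for record in records],
        # filesize is null for some hosts, so treat it as 0 in the total.
        "known_bytes": sum(record["filesize"] or 0 for record in records),
        "missing_md5": [r["filename"] for r in records if r["md5"] is None],
    }


if __name__ == "__main__":
    print(summarize_links(SAMPLE_OUTPUT))
```

Note that `filesize` and `md5` are `null` in the AWS, GCP and NCBI examples, so any downstream code should handle missing values.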

Write accession information to a single JSON file

ffq -o [JSON_PATH] [accession(s)]

where [JSON_PATH] is the path to the JSON file that will contain the information and [accession(s)] is either a single accession or a space-delimited list of accessions.

Write accession information to multiple JSON files, one file per accession

ffq -o [OUT_DIR] --split [accessions]

where [OUT_DIR] is the path to the directory in which to write the JSON files and [accessions] is a space-delimited list of accessions. Information about each accession will be written to its own separate JSON file named [accession].json.

Fetch information of all studies (and all of their runs) in one or more papers

ffq [DOIS]

where [DOIS] is a space-delimited list of one or more DOIs. The output is a JSON-formatted string (or a JSON file if -o is provided) with SRA study accessions as keys. When --split is also provided, each study is written to its own separate JSON.

Complete output examples

Examples of complete outputs are available in the examples directory.

Downloading data

ffq is specifically designed to download metadata and to facilitate obtaining links to sequence files. To download raw data from the links obtained with ffq you can use one of the following:

cURL and wget for FTP links,
aws for AWS links,
gsutil for GCP links,
fasterq-dump for converting SRA files to FASTQ files.

FTP

By default, cURL is installed on most computers and can be used to download files with FTP links. Alternatively, wget can be used.

# Obtain FTP links
$ ffq --ftp SRR10668798
[
    {
        "accession": "SRR10668798",
        "filename": "SRR10668798_1.fastq.gz",
        "filetype": "fastq",
        "filesize": 31876537192,
        "filenumber": 1,
        "md5": "bf8078b5a9cc62b0fee98059f5b87fa7",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_1.fastq.gz"
    },
    {
        "accession": "SRR10668798",
        "filename": "SRR10668798_2.fastq.gz",
        "filetype": "fastq",
        "filesize": 43760586944,
        "filenumber": 2,
        "md5": "351df47dca211c1f66ef327e280bd4fd",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_2.fastq.gz"
    }
]

# Download the files one-by-one
$ curl -O ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_1.fastq.gz 
$ curl -O ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR106/098/SRR10668798/SRR10668798_2.fastq.gz

Alternatively, the URLs can be extracted from the JSON output with jq and then piped into cURL.

$ ffq --ftp SRR10668798 | jq -r '.[] | .url' | xargs curl -O

If you don't have jq installed, you can use grep, which is available by default on most systems.

$ ffq --ftp SRR10668798 | grep -Eo '"url": "[^"]*"' | grep -o '"[^"]*"$' | xargs curl -O
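The FTP records include `filesize` and `md5` fields, which can be used to check a download once it has finished. The following is a minimal sketch of such a check; `md5_of_file` and `matches_record` are hypothetical helpers, not part of ffq.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Union[str, Path], record: Dict[str, Any]) -> bool:
    """Compare a local file with the filesize and md5 of one ffq link record.

    Fields that are null in the record are skipped rather than failed.
    """
    path = Path(path)
    expected_size = record.get("filesize")
    if expected_size is not None and path.stat().st_size != expected_size:
        return False
    expected_md5 = record.get("md5")
    if expected_md5 is not None and md5_of_file(path) != expected_md5:
        return False
    return True
```

The size is checked first because it is cheap, while hashing a multi-gigabyte FASTQ file takes noticeably longer.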

AWS

In order to download files from AWS, the aws tool must be installed and credentials must be set up.

# Pipe AWS links to aws s3 cp and download
$ ffq --aws SRX7347523 | jq -r '.[] | .url' | xargs -I {} aws s3 cp {} .

GCP

In order to download files from GCP, the gsutil tool must be installed and credentials must be set up.

# Pipe GCP links to gsutil cp and download
$ ffq --gcp ERS3861775 | jq -r '.[] | .url' | xargs -I {} gsutil cp {} .

NCBI-SRA

SRA files downloaded from NCBI can be converted to FASTQ files using fastq-dump or the improved fasterq-dump, both of which are installed as part of the SRA Toolkit.

# Pipe SRA link to curl and download the SRA file
$ ffq --ncbi GSM2905292 | jq -r '.[] | .url' | xargs curl -O

# Convert the SRA file to FASTQ files with one of the following
$ fastq-dump   ./SRR6425163.1 --split-files --include-technical -O ./SRR6425163 --gzip 
$ fasterq-dump ./SRR6425163.1 --split-files --include-technical -O ./SRR6425163        # fasterq-dump does not have gzip option

Use cases

ffq facilitates the acquisition of publicly available sequencing data to help answer relevant research questions.

The following was submitted by @sbooeshaghi.

# Goal: quantify publicly available scRNAseq data
$ pip install kb-python gget ffq
$ kb ref -i index.idx -g t2g.txt -f1 transcriptome.fa $(gget ref --ftp -w dna,gtf homo_sapiens)
$ kb count -i index.idx -g t2g.txt -x 10xv3 -o out $(ffq --ftp SRR10668798 | jq -r '.[] | .url' | tr '\n' ' ')
# -> count matrix in out/ folder

# Goal: count the total number of reads
$ ffq SRR10668798 | jq '.. | ."ENA-SPOT-COUNT"? | select(. != null)' |  paste -sd+ - | bc
624886427

# Goal: check the total size of the FASTQ files
$ ffq --ftp SRR10668798 | jq '.[] | .filesize ' | paste -sd+ - | bc | numfmt --to=iec-i --suffix=B
71GiB

# Goal: count the number of FASTQ files
$ ffq --ftp SRR10668798 | jq -r 'length'
2

# Goal: get sequence stats for the first 100 entries with seqkit
$ curl -s $(ffq --ftp SRR10668798 | jq -r '.[0] | .url') | zcat | head -400 | seqkit stats -a
file  format  type  num_seqs  sum_len  min_len  avg_len  max_len  Q1  Q2  Q3  sum_gap  N50  Q20(%)  Q30(%)
-     FASTQ   DNA        100    2,600       26       26       26  13  26  13        0   26   95.31   92.92

The following was submitted by @agalvezm.

# Goal: print the first 3 sequences of read 1 to the screen
$ curl -s $(ffq --ftp SRR10668798 | jq -r '.[0] | .url') | zcat | awk '(NR-2)%4==0' | head -n 3
NCCAAATAGGAATTACATACACCCCC
NAACCTGAGTAGATGTGTTGTTAACT
NGATCTGAGAACTCGGAACTATTTTC

# Goal: get number of counts per unique read sequence from the first 10000 reads
$ curl -s $(ffq --ftp accession | jq -r '.[0] | .url') | zcat | awk '(NR-2)%4==0'| head -n 10000 | sort | uniq -c | sort -r
4 TACACGACACTTAACGATCGGCCTTC
4 GTACTTTAGGCCCGTTTGTGTGCGAT
4 GACGGCTAGTACATGATATAACAAGC
...

The following was submitted by @telatin.

# Goal: concurrent download of a set of FASTQ files given a list of IDs (list.txt)
# (Requires Nextflow and Docker, or Conda, to be installed. Pipeline and dependencies will be installed automatically)
$ nextflow run telatin/getreads -r main -profile docker --list list.txt --outdir downloaded-reads/

For instructions on how to install Nextflow and Docker, or Conda, see the installation instructions.

Do you have a cool use case for ffq? Submit a PR (including the goal, code snippet, and your username) so that we can feature it here.

Failure modes

Many factors, independent of ffq, may result in failure to fetch metadata or missing metadata including:

broken internet connection
improperly formatted accession
recently submitted data to SRA (not synced with ENA)
exceeded request rate for servers
missing metadata from online database
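Transient failures such as a dropped connection or an exceeded request rate can often be handled by retrying after a pause. The sketch below shows one common approach, exponential backoff, around an arbitrary callable. `retry_with_backoff` and its parameters are hypothetical and not an ffq feature.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, doubling the wait after each failure.

    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

To wrap the command line tool, `fetch` could be a function that runs ffq through `subprocess.run` and raises when the output is empty. Whether ffq signals a failed fetch through its exit code is an assumption that should be checked first.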

If you believe you have identified a bug in ffq, please see the Contributing section.

Contributing

Thank you for wanting to improve ffq! If you have found a bug related to ffq, please create an issue. The issue should contain

the ffq command run with --verbose,
the error message, and
the ffq and python version.

Please make all Pull Requests against the devel branch and include a message detailing the exact changes made, the reasons for the change, and tests that check for the correctness of those changes.

Some tips for improving the ffq code base:

the developer dependencies can be installed with pip install -r dev-requirements.txt
unit tests can be added to the ./tests/test_*.py
code reformatting can be performed by running black ffq/
code quality can be checked by running make check
tests can be performed by running make test

Caveats and limitations

ffq relies on the information provided by the different APIs it uses to retrieve metadata (hosted by ENA, NCBI, ENCODE, etc.). Returning consistent and accurate metadata therefore depends on the accuracy and consistency of those databases. Unfortunately, we have observed instances where some APIs are updated without notice. This leads to inconsistent metadata retrieval by ffq that cannot be solved on our end.

For example, as of May 29th, the command:

ffq --ncbi SRR6835844

returned:

[{'accession': 'SRR6835844',
  'filename': 'SRR6835844.1',
  'filenumber': 1,
  'filesize': None,
  'filetype': 'sra',
  'md5': None,
  'url': 'https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-13/SRR6835844/SRR6835844.1',
  'urltype': 'ncbi'}]

On June 1st, we detected an error in one of ffq’s tests. Running the same command led to the following output:

[]

Investigating this issue, we discovered that the output of the E-utilities efetch tool had changed (to see the difference, compare the files SRR6835844_altlinks_old.txt and SRR6835844_altlinks_new.txt in tests/fixtures). In the new output, NCBI-hosted links were no longer provided. This affects a large number of accessions, not only SRR6835844. We have updated our tests accordingly and will continue to monitor the situation.
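Because a given host can stop providing links without notice, a script that consumes ffq output may want to fall back to another host when one list is empty. The sketch below assumes the per-run `files` mapping with `ftp`, `aws`, `gcp` and `ncbi` keys that appears in full ffq run metadata. `DEFAULT_HOST_ORDER` and `first_available_links` are illustrative names, not ffq settings.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Illustrative preference order; adjust to whichever hosts you can reach.
DEFAULT_HOST_ORDER = ("ftp", "aws", "gcp", "ncbi")


def first_available_links(
    files: Dict[str, List[Dict[str, Any]]],
    host_order: Sequence[str] = DEFAULT_HOST_ORDER,
) -> Optional[Tuple[str, List[Dict[str, Any]]]]:
    """Return (host, links) for the first host with a non-empty link list.

    Returns None when every host in `host_order` is empty or missing.
    """
    for host in host_order:
        links = files.get(host) or []
        if links:
            return host, links
    return None
```

For example, calling it with `host_order=("ncbi", "ftp")` on a run whose `ncbi` list is empty would return the FTP links instead.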

Naming

ffq is short for FetchFastQ.

Cite

@article{galvez2022metadata,
  title={Metadata retrieval from sequence databases with ffq},
  author={G{\'a}lvez-Merch{\'a}n, {\'A}ngel and Min, Kyung Hoi Joseph and Pachter, Lior and Booeshaghi, A. Sina},
  year={2022}
}

ffq's Issues

ffq --ftp file not found

Running the command ffq --ftp SRR13917446 returned the output below

[2024-03-19 13:53:23,828] INFO Parsing run SRR13917446
[
{
"accession": "SRR13917446",
"filename": "SRR13917446.fastq.gz",
"filetype": "fastq",
"filesize": 163077560,
"filenumber": 1,
"md5": "7ce0cb1683922dcf10e8e7e73001b515",
"urltype": "ftp",
"url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/046/SRR13917446/SRR13917446.fastq.gz"
}
]

When I tried to use $(ffq --ftp SRR13917422 | jq -r '.[] | .url' | tr '\n' ' ') (as written in the kb tutorial) it gave

Error: file not found ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/022/SRR13917422/SRR13917422.fastq.gz

This was weird because I can still access the file over both HTTPS and FTP: https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR139/022/SRR13917422/SRR13917422.fastq.gz

I used this version of ffq:

Name: ffq
Version: 0.3.1
Summary: A command line tool that makes it easier to find sequencing data from SRA / GEO / ENCODE / ENA / EBI-EMBL / DDBJ / Biosample.
Home-page: https://github.com/pachterlab/ffq
Author: Kyung Hoi (Joseph) Min, Sina Booeshaghi, Ángel Gálvez Merchán
Author-email: [email protected], [email protected], [email protected]
License: MIT
Location: /uoa/scratch/users/r04mr23/envs/preprocessing/lib/python3.9/site-packages
Requires: beautifulsoup4, frozendict, lxml, requests
Required-by:

Connection reset

Hi all!

Thank you for a great resource! I am using it to download the R1 reads of multiple bioprojects. I have created a text file with all the SRR ids associated with a bioproject, and feed that to ffq for download.

Code-
while read line;do ffq --ftp ${line} | xargs curl -O ;done < 2-accession_list_PRJNA730929.txt

where the 2-accession_list_PRJNA730929.txt consists of -
SRR14583629
SRR14583630
SRR14583631
SRR14583632
SRR14583633

After generating output for ~100 SRR ids (out of 316 ids), I get the following error. Any suggestions?

Error-
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:02:11 --:--:--     0
curl: (28) Failed to connect to ftp.sra.ebi.ac.uk port 21: Connection timed out

PRJNA format Bioproject support

Many thanks for the brilliant tool.

I am just wondering if there is support for bio project IDs that begin with PRJNA. This does not appear to be recognised.

Many thanks

Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?

Error when running ffq: ERROR Couldn't find a tree builder with the features you requested: XML. Do you need to install a parser library?

Description: I installed ffq and lxml is already installed (ffq 0.3.0)

Resolution: soup = BeautifulSoup(html, "html5lib") # or html.parser

add CRX accession parsing

e.g. CRX102289, associated with bioproject: PRJNA102289

https://ngdc.cncb.ac.cn/gsa/browse/CRA002497/CRX102289

https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA002413

unable to retrieve non-SRR runs (DRR and ERR runs)

Great tool! It works perfectly for the SRR runs I have attempted to fetch from SRA. However, when I try to fetch DRR or ERR runs, for example using ffq ERR3372842 > ERR3372842.json, I get this error:
usage: ffq [-h] [-o OUT] [-t TYPE] [--split] [--verbose] IDs [IDs ...]
ffq: error: ERR3372842 failed validation. SRRs must start with 'SRR' and end with digits.

This problem could be because of how/where these samples are stored, in which case this is not necessarily an easy fix, but if it is an internal check you are doing, it is worth noting that some run accessions start with "ERR" and "DRR" - this Biostars question might be helpful in explaining the difference between SRR/ERR/DRR runs: https://www.biostars.org/p/381527/

Suggested solution: If there is an internal check giving this error, modify internal check to accept ERR/DRR run accessions. If it is not an internal check and ERR/DRR runs not easily accessible, I don't have a solution.

Thank you!

Error: ffq GSE10320

I tried to download two different GEO series and got the same error.

ERROR No sample found. Either the provided GSM accession is invalid or raw data was not provided for this record

access original uploaded bam files from SRA?

Hi, thanks for this great tool. Is it possible to access the original uploaded files for an accession in SRA? I am trying to recover raw fastqs from 10x samples uploaded as bams to SRA. I am told by SRA that "The SRA does not properly preserve all 10X bam flags when archiving aligned 10X data. The SRA is therefore not able to correctly serve 10X data that is submitted in an aligned format via the SRA Toolkit. Because of this, the SRA makes the originally submitted files available for download directly from the SRA Run pages. For example: https://www.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR11546787&display=data-access"

Is it possible to recover these original bams with ffq? Are you aware of another tool to achieve this?

Thanks for any help!

requirements.txt in pypi source file

It seems that requirements.txt is missing in the source file distributed by pypi causing non wheel installations to crash.

`--split` throws an error

When running the following command on 0.2.1:

# Echo exemplifies the fact that my input is a list of IDs from a process
echo -e "SRR19440534\tSRR19440543" | grep . | xargs ffq --split -o T/

The program throws this error:

Traceback (most recent call last):
  File "/mnt/disk/miniconda3/bin/ffq", line 10, in <module>
    sys.exit(main())
  File "/mnt/disk/miniconda3/lib/python3.9/site-packages/ffq/main.py", line 245, in main
    with open(os.path.join(args.o, f'{result["accession"]}.json'),
TypeError: string indices must be integers

Notes:

omitting --split works
even ffq --split -o T/ SRR19440534 SRR19440543 fails, but I wanted to supply my use case

fetchngs + ffq = <3 ?

Hi,

I saw your thread on Twitter. Congrats on the new release. I'm one of the people who created fetchngs, an nf-core nextflow pipeline for downloading FASTQ files from public archives.

We have discussed a few times that ffq might be a good fit to be used in the pipeline instead of our current scripts (mentioned here and also on the nf-core Slack). @drpatelh it seems to me that with the ability to supply multiple accessions at various levels, ffq might now be a good fit. I haven't fully perused ffq's source code yet but if there are any missing features, it makes more sense to me to contribute them here. Thus ffq would be further improved and we can drop all the scripts.

ERROR Couldn't find a tree builder with the features you requested: xml.

I am using ffq v0.2.1, installed with pip install ffq.
I can't get any results when running ffq, for either SRR or SRP accessions.
I also checked the required packages. The versions are as follows.

Requirement already satisfied: beautifulsoup4>=4.8.2 in /share/home/miniconda3/lib/python3.8/site-packages (from ffq==0.2.1) (4.11.1)
Requirement already satisfied: frozendict>=2.0.3 in /share/home/miniconda3/lib/python3.8/site-packages (from ffq==0.2.1) (2.1.3)
Requirement already satisfied: lxml>=4.5.0 in /share/home/miniconda3/lib/python3.8/site-packages (from ffq==0.2.1) (4.7.1)
Requirement already satisfied: requests>=2.23.0 in /share/home/miniconda3/lib/python3.8/site-packages (from ffq==0.2.1) (2.27.1)
Requirement already satisfied: soupsieve>1.2 in /share/home/miniconda3/lib/python3.8/site-packages (from beautifulsoup4>=4.8.2->ffq==0.2.1) (2.3.1)
Requirement already satisfied: idna<4,>=2.5 in /share/home/miniconda3/lib/python3.8/site-packages (from requests>=2.23.0->ffq==0.2.1) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /share/home/miniconda3/lib/python3.8/site-packages (from requests>=2.23.0->ffq==0.2.1) (2.0.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /share/home/miniconda3/lib/python3.8/site-packages (from requests>=2.23.0->ffq==0.2.1) (1.26.9)
Requirement already satisfied: certifi>=2017.4.17 in /share/home/miniconda3/lib/python3.8/site-packages (from requests>=2.23.0->ffq==0.2.1) (2022.6.15)

run command
ffq SRR9990627
Error info:

[2022-07-19 16:11:20,162]    INFO Parsing run SRR9990627
[2022-07-19 16:11:21,280]   ERROR Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?
{}

False metadata fetched when using BioSample accession number

Hey,

Nice and useful tool! I am trying to use it to access metadata based on the BioSample number (unfortunately the info I am interested in can only be found in the BioSample entry, and not in ERS, ERR...) and it returns a completely different sample. When I use ERS and ERR numbers, it works normally!
Some of SAMEA numbers I am interested in: SAMEA3449213, SAMEA3449357, SAMEA3449368 ...
Maybe I am doing something wrong.. I have installed it with pip (as in README) and just use ffq SAMEA

Thanks!

No AWS links parsed (for run SRR8387856)

Hi everyone,

I am trying to use ffq to get file links to scRNA-seq datasets in GEO.
Currently I can see cases where AWS links are present in the description (and in efetch xml results), yet not parsed by ffq.
The smallest example would be run SRR8387856

$ ffq SRR8387856
[2022-07-27 11:16:59,893]    INFO Parsing run SRR8387856
{
    "SRR8387856": {
        "accession": "SRR8387856",
        "experiment": "SRX5197633",
        "study": "SRP175077",
        "sample": "SRS4203422",
        "title": "Illumina HiSeq 3000 paired end sequencing; GSM3535276: AXLN1; Homo sapiens; RNA-Seq",
        "attributes": {
            "loader": "fastq-load.py",
            "options": "--platform=Illumina --readTypes=TBT --read1PairFiles=AXLN1_S1_L004_R1_001.fastq --read2PairFiles=AXLN1_S1_L004_R2_001.fastq --read3PairFiles=AXLN1_S1_L004_I1_001.fastq",
            "ENA-SPOT-COUNT": 190394607,
            "ENA-BASE-COUNT": 14469990132,
            "ENA-FIRST-PUBLIC": "2019-08-09",
            "ENA-LAST-UPDATE": "2019-08-09"
        },
        "files": {
            "ftp": [
                {
                    "accession": "SRR8387856",
                    "filename": "SRR8387856.fastq.gz",
                    "filetype": "fastq",
                    "filesize": 10758070270,
                    "filenumber": 1,
                    "md5": "c149fcffaf5c1c17a4afcc75aaf3428a",
                    "urltype": "ftp",
                    "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR838/006/SRR8387856/SRR8387856.fastq.gz"
                }
            ],
            "aws": [],
            "gcp": [],
            "ncbi": []
        }
    }
}
$ ffq SRR8387856 --aws
[2022-07-27 11:17:20,265]    INFO Parsing run SRR8387856
[]

The AWS link list is returned empty even though these links exist in the efetch results. Below are shortened results for efetch -db sra -id SRR8387856 -format xml. The AWS files are also listed on the trace.ncbi page: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR8387856&display=data-access

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE EXPERIMENT_PACKAGE_SET>
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT alias="GSM3535276" accession="SRX5197633">
    ...
    <SAMPLE alias="GSM3535276" accession="SRS4203422">
    ...
    <RUN_SET>
      <RUN alias="GSM3535276_r1" accession="SRR8387856" total_spots="190394607" total_bases="20943406770" size="10330963197" load_done="true" published="2019-08-08 18:26:07" is_public="true" cluster_name="public" has_taxanalysis="1" static_data_available="1">
        ...
        <SRAFiles>
          <SRAFile cluster="public" filename="AXLN1_S1_L004_I1_001.fastq.gz" size="1318416196" date="2020-02-07 13:55:03" md5="c01103a6a0423bc81f635b8864fb1008" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_I1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_I1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="AXLN1_S1_L004_R1_001.fastq.gz" size="4125035593" date="2020-02-07 17:24:36" md5="dc361e73baa88147fd903047831b0094" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_R1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_R1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="AXLN1_S1_L004_R2_001.fastq.gz" size="11154984853" date="2020-02-07 17:53:41" md5="8022b2c985e6dcd9bbc2d81bb8ad2816" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_R2_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387856/AXLN1_S1_L004_R2_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="SRR8387856" url="https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8387856/SRR8387856" size="10330965701" date="2019-01-04 01:34:15" md5="5a1e0d1b10ccb80c03ea1813826bf9a7" semantic_name="run" supertype="Primary ETL" sratoolkit="1">
            <Alternatives url="gs://sra-pub-crun-5/SRR8387856/SRR8387856.1" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8387856/SRR8387856" free_egress="worldwide" access_type="anonymous" org="AWS"/>
          </SRAFile>
        </SRAFiles>
        ...
      <RUN alias="GSM3535276_r2" accession="SRR8387857" total_spots="154905917" total_bases="17039650870" size="8538940323" load_done="true" published="2019-08-08 18:26:07" is_public="true" cluster_name="public" has_taxanalysis="1" static_data_available="1">
        ...
        <SRAFiles>
          <SRAFile cluster="public" filename="AXLN1_S1_L005_I1_001.fastq.gz" size="1127707492" date="2020-02-07 11:31:54" md5="8548d88e1a205bb1451db78e906df398" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_I1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_I1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="AXLN1_S1_L005_R1_001.fastq.gz" size="3480060093" date="2020-02-07 13:50:19" md5="003da75c574bdf3cfe2b22fa61a37b8a" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_R1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_R1_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="AXLN1_S1_L005_R2_001.fastq.gz" size="9202131636" date="2020-02-07 13:50:41" md5="2acd6da9fb7d0e20b7b19b7befda7ecb" semantic_name="fastq" supertype="Original" sratoolkit="0">
            <Alternatives url="gs://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_R2_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="s3://sra-pub-src-3/SRR8387857/AXLN1_S1_L005_R2_001.fastq.gz" free_egress="-" access_type="Use Cloud Data Delivery" org="AWS"/>
          </SRAFile>
          <SRAFile cluster="public" filename="SRR8387857" url="https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-19/SRR8387857/SRR8387857.1" size="8538942830" date="2019-01-03 23:33:46" md5="965a29f5c96f0a225a061d083a1efdcf" semantic_name="run" supertype="Primary ETL" sratoolkit="1">
            <Alternatives url="gs://sra-pub-crun-5/SRR8387857/SRR8387857.1" free_egress="-" access_type="Use Cloud Data Delivery" org="GCP"/>
            <Alternatives url="https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos3/sra-pub-run-19/SRR8387857/SRR8387857.1" free_egress="worldwide" access_type="anonymous" org="NCBI"/>
            <Alternatives url="https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8387857/SRR8387857" free_egress="worldwide" access_type="anonymous" org="AWS"/>
          </SRAFile>
        </SRAFiles>
       ...
      </RUN>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>

FFQ is great btw, thanks for developing this cool small thing.

Cheers,
Kostya Zaitsev

No sample found for any GEO entries

Hi, I am reporting that ffq is currently unable to fetch FASTQ URLs for any GEO entries.

For example, I tried to download metadata for GSE66230 like this:

ffq --verbose GSE66230 -o ./

The log shows ERROR 404 Client Error: for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRX886447/ and WARNING No sample found for any samples.
I tried other entries many times and got the same error each time.
It looks like a problem on the ENA side, but I am reporting it here just in case.

Thanks,

'HTTPError' object has no attribute 'getcode' and HTTP 401

I am running ffq 0.2.1 on CentOS 7 (ID_LIKE="rhel fedora") using Python 3.7.5.
I am in a company network behind a proxy with authorization. The proxy IP, port, user and password are set for http, https, ftp and sftp as environment variables. This information is also in my .wgetrc, .curlrc and .netrc.
I use bash as a shell.

A call for an existing SRA experiment fails:

ffq --verbose SRX5256050
[2022-06-30 08:19:03,561] DEBUG Printing verbose output
[2022-06-30 08:19:03,561] DEBUG Namespace(IDs=['SRX5256050'], aws=False, ftp=False, gcp=False, l=None, ncbi=False, o=None, split=False, t=None, verbose=True)
[2022-06-30 08:19:03,561] INFO Parsing Experiment SRX5256050
[2022-06-30 08:19:04,016] WARNING There is 1 run for SRX5256050
[2022-06-30 08:19:04,016] INFO Parsing run SRR8449072
[2022-06-30 08:19:04,418] ERROR 'HTTPError' object has no attribute 'getcode'
Traceback (most recent call last):
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/utils.py", line 40, in cached_get
response.raise_for_status()
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRR8449072&result=read_run&fields=run_accession,fastq_ftp,fastq_md5,fastq_bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/main.py", line 212, in main
results.append(FFQ[v["prefix"]](v["accession"], args.l))
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/ffq.py", line 498, in ffq_experiment
experiment = parse_experiment_with_run(get_xml(accession), level)
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/ffq.py", line 276, in parse_experiment_with_run
runs = {run: ffq_run(run) for run in runs}
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/ffq.py", line 276, in <dictcomp>
runs = {run: ffq_run(run) for run in runs}
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/ffq.py", line 355, in ffq_run
run = parse_run(get_xml(accession))
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/ffq.py", line 116, in parse_run
ftp_files = get_files_metadata_from_run(soup)
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/utils.py", line 892, in get_files_metadata_from_run
table = parse_tsv(cached_get(fastq_url))
File "/usr/prg/python-3.7.5/lib/python3.7/site-packages/ffq/utils.py", line 42, in cached_get
if exception.getcode() == 429:
AttributeError: 'HTTPError' object has no attribute 'getcode'
{}

but
curl "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRR8449072&result=read_run&fields=run_accession,fastq_ftp,fastq_md5,fastq_bytes"
run_accession fastq_ftp fastq_md5 fastq_bytes
SRR8449072 ftp.sra.ebi.ac.uk/vol1/fastq/SRR844/002/SRR8449072/SRR8449072_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR844/002/SRR8449072/SRR8449072_2.fastq.gz 4402ac7d7929e14d1606b4c3647e4219;175d79964dd97f110ffaf10487c6e1d5 2391514918;2489429551

works as expected.
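
The second traceback fails on exception.getcode(). The first traceback shows that the exception is a requests.exceptions.HTTPError created with response=self, so its status lives on exception.response.status_code. The standard library's urllib.error.HTTPError is a different class that stores the status as .code. Below is a minimal sketch of a helper that reads the status from either kind. The name http_status is hypothetical and not part of ffq, and the SimpleNamespace object only stands in for a requests exception so the example runs without network access or third-party packages.

```python
import io
from types import SimpleNamespace
from urllib.error import HTTPError as UrllibHTTPError


def http_status(exception):
    """Return the HTTP status code carried by an exception, or None.

    Hypothetical helper for illustration; it is not ffq's code.
    """
    # requests-style: the failed response is attached to the exception.
    response = getattr(exception, "response", None)
    status = getattr(response, "status_code", None)
    if status is not None:
        return status
    # urllib-style: the status is stored on the exception itself as .code.
    return getattr(exception, "code", None)


# Stand-in for the requests HTTPError raised by raise_for_status() (assumption).
requests_like = SimpleNamespace(response=SimpleNamespace(status_code=401))

# A real standard-library HTTPError, built locally without any request.
urllib_like = UrllibHTTPError(
    "https://example.org", 429, "Too Many Requests", {}, io.BytesIO(b"")
)

print(http_status(requests_like))
print(http_status(urllib_like))
print(http_status(ValueError("no status attached")))
```

With a helper like this, a check such as "is this a 429?" no longer depends on which HTTP library raised the error.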

ERROR 'str' object has no attribute 'copy'

I ran ffq --verbose --ftp 10.1111/jipb.13435, which should fetch the information for this paper, but I got the following error:

[2024-05-02 08:48:36,075]   DEBUG Printing verbose output
[2024-05-02 08:48:36,076]   DEBUG Namespace(IDs=['10.1111/jipb.13435'], o=None, t=None, l=None, ftp=True, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2024-05-02 08:48:36,076] WARNING Searching by DOI may result in missing information.
[2024-05-02 08:48:36,076]    INFO Searching for DOI '10.1111/JIPB.13435'
[2024-05-02 08:48:36,461]    INFO Searching for Study SRP with title 'Single‐cell RNA‐seq of <i>Lotus japonicus</i> provide insights into identification and function of root cell types of legume'
[2024-05-02 08:48:36,680] WARNING No studies found with the given title. Searching Pubmed for DOI '10.1111/JIPB.13435'
[2024-05-02 08:48:37,307]    INFO Searching for GEO record linked to Pubmed ID '36537698'
[2024-05-02 08:48:37,807] WARNING No GEO records are linked to the Pubmed ID '36537698'. Searching for SRA record linked to this Pubmed ID.
[2024-05-02 08:48:39,822] WARNING Found 2 run accessions.
[2024-05-02 08:48:39,822]    INFO Parsing run SRR19350618
[2024-05-02 08:48:40,191]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:48:40,191]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:48:41,254]    INFO Parsing run SRR19411554
[2024-05-02 08:48:41,429]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:48:41,430]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:48:42,344]   ERROR 'str' object has no attribute 'copy'

If I subsequently run ffq --verbose --ftp SRR19411554 (the second of the runs printed in the above log) I do get some useful results:

[2024-05-02 08:54:29,910]   DEBUG Printing verbose output
[2024-05-02 08:54:29,910]   DEBUG Namespace(IDs=['SRR19411554'], o=None, t=None, l=None, ftp=True, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2024-05-02 08:54:29,911]    INFO Parsing run SRR19411554
[2024-05-02 08:54:30,034]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:54:30,034]   DEBUG Encoding detection: ascii is most likely the one.
[
    {
        "accession": "SRR19411554",
        "filename": "SRR19411554_1.fastq.gz",
        "filetype": "fastq",
        "filesize": 33550492026,
        "filenumber": 1,
        "md5": "4105a4863679738623b0fbffb34b2926",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR194/054/SRR19411554/SRR19411554_1.fastq.gz"
    },
    {
        "accession": "SRR19411554",
        "filename": "SRR19411554_2.fastq.gz",
        "filetype": "fastq",
        "filesize": 38659517396,
        "filenumber": 2,
        "md5": "67d185bd850a5bc54702673289043fc2",
        "urltype": "ftp",
        "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR194/054/SRR19411554/SRR19411554_2.fastq.gz"
    }
]

If I try the first run (ffq --verbose --ftp SRR19350618) I get an empty result:

[2024-05-02 08:58:23,150]   DEBUG Printing verbose output
[2024-05-02 08:58:23,150]   DEBUG Namespace(IDs=['SRR19350618'], o=None, t=None, l=None, ftp=True, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2024-05-02 08:58:23,151]    INFO Parsing run SRR19350618
[2024-05-02 08:58:23,257]   DEBUG Encoding detection: ascii is most likely the one.
[2024-05-02 08:58:23,258]   DEBUG Encoding detection: ascii is most likely the one.
[]

Using ffq version 0.3.1
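
Since the per-run calls above behave differently from the DOI call, one possible workaround is to query each run accession separately and keep track of the ones that return nothing. The sketch below assumes that ffq --ftp prints its JSON result on standard output and its log messages elsewhere; the function name fetch_links and the fake runner are hypothetical, and the fake runner only replays the two outcomes reported above so the example runs without ffq installed.

```python
import json
import subprocess
from types import SimpleNamespace


def fetch_links(accessions, run=subprocess.run):
    """Call `ffq --ftp <accession>` once per accession.

    Returns (links, failed): the collected file entries and the accessions
    whose call failed or produced an empty result.
    Assumption: ffq writes its JSON result to stdout.
    """
    links, failed = [], []
    for accession in accessions:
        completed = run(["ffq", "--ftp", accession], capture_output=True, text=True)
        try:
            entries = json.loads(completed.stdout) if completed.returncode == 0 else []
        except json.JSONDecodeError:
            entries = []
        if isinstance(entries, list) and entries:
            links.extend(entries)
        else:
            failed.append(accession)
    return links, failed


# Fake runner replaying the outcomes from the report (illustration only).
FAKE_OUTPUT = {
    "SRR19411554": json.dumps([{"accession": "SRR19411554", "urltype": "ftp"}]),
    "SRR19350618": "[]",
}


def fake_run(command, **kwargs):
    return SimpleNamespace(returncode=0, stdout=FAKE_OUTPUT.get(command[-1], ""))


links, failed = fetch_links(["SRR19350618", "SRR19411554"], run=fake_run)
print(len(links), failed)
```

Dropping the run=fake_run argument makes the function call the installed ffq executable instead.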

biosample error

I tried to install using the instructions below, but the "devel" part caused an error:
pip install git+https://github.com/pachterlab/ffq.git@devel

WARNING: Did not find branch or tag 'devel', assuming revision or ref.
Running command git checkout -q devel
error: pathspec 'devel' did not match any file(s) known to git
error: subprocess-exited-with-error

I went to your GitHub page and installed the current version using the command shown there:

pip install git+https://github.com/pachterlab/ffq

That worked, and one of the biosample numbers I was trying to get data from worked (SAMN16895766), but a couple of others that I tried still return errors, e.g.:

ffq SAMN12326775 > Ttru_biosample_SAMN12326775_data.txt
[2022-08-10 11:42:48,468] ERROR 404 Client Error: for url: https://www.ebi.ac.uk/ena/browser/api/xml/SAMN12326775/
[2022-08-10 11:42:48,468] ERROR Provided accession is invalid

This accession (https://www.ncbi.nlm.nih.gov/biosample?Db=biosample&DbFrom=bioproject&Cmd=Link&LinkName=bioproject_biosample&LinkReadableName=BioSample&ordinalpos=1&IdsFromResult=625792) is valid. Is this still an issue with not using the current development version of ffq, or a different issue?

How open are you to larger contributions?

Hi,

I mentioned in #22 that ffq could be a good fit for some of our work. I'm now at a point where it makes sense for me to look into this. I played a little with ffq and looked at the source code. I definitely see some possible optimizations in how requests are made (basically batching them rather than always making individual requests) and also regarding code organization.

ffq already has good traction, so I was wondering how open you are to accepting some larger refactoring. It'd be good to know that someone is there to review and merge PRs before I start the work. An understanding of your philosophy on additional dependencies etc. would also help. I'm open to having a call if that simplifies things.

ERROR object of type 'NoneType' has no len()

Hello and many thanks for this much-needed tool!

I am getting a recurrent error on a few accessions, e.g.:

ffq SRP301121

leads to:

[2022-07-15 22:20:29,563]    INFO Parsing Study SRP301121
[2022-07-15 22:20:29,924]    INFO Getting Sample for SRP301121
[2022-07-15 22:20:29,926]    WARNING No samples found for study. Modify code to search through ENA
[2022-07-15 22:20:29,926]    ERROR object of type 'NoneType' has no len()

Would you have any suggestions to solve this issue?

Many thanks,
Francesca

How to Download Files Requiring a Key with prefetch

Hello,

I am currently working on a project that requires downloading datasets from dbGaP, and I understand that this process requires an access key. Specifically, I am trying to use the prefetch command from the SRA Toolkit to download files that need a key for access, similar to the following command:

prefetch --ngc /prj_1.ngc  --cart /cart_1.krt

I have already obtained the necessary .ngc file for my project, but I am unsure about the correct steps to use it with ffq to access the files I need.

Could someone please provide guidance or point me towards documentation on how to properly use the .ngc file with ffq for downloading protected datasets?

Additionally, if there are best practices for managing and using these keys securely, I would appreciate learning about those as well.

Thank you for your time and help.

Best regards,
Zikai Lin

ffq not working anymore (ffq: error)

Hi,

thanks for developing ffq. It has been really helpful so far!

However, it doesn't seem to work anymore. The following command line used to output something and I'm now getting an error message:

ffq SRR6046717
[2024-03-01 22:19:28,054] INFO Parsing run SRR6046717
[2024-03-01 22:19:28,192] ERROR 404 Client Error: for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRR6046717/
[2024-03-01 22:19:28,192] ERROR Provided accession is invalid
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose] [--version] IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes

Does this have to do with NCBI deprecating the SRA IDs?
Is there a workaround?

Thanks in advance if you could help me.

Conda version out of date

The Conda version is quite out of date relative to the PyPI version. Some features did not work for me when I installed using conda.

Convert json output to a tsv

Hi guys - great work on the tool, it's impressively robust (for such a messy problem in particular!)

Is there a standard way to convert the JSON output to a flat TSV with selected ID types? What I mean is either a table looking like this

GSE<tab>GSM<tab>SRS<tab>SRX<tab>SRR

(with potentially multiple lines per GSM etc), or

GSE<tab>GSM<tab>SRS<tab>SRX<tab>SRR1,SRR2,SRR3

etc. Thank you in advance!
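
One way to approach this, sketched below, is to walk the nested JSON and emit one row per run, carrying along the accessions seen on the way down. This is an illustration under assumptions, not an ffq feature: the walker only relies on each record having an "accession" field and on the GSE/GSM/SRS/SRX/SRR prefixes, the key names in the sample data ("geo_samples", "samples", "experiments", "runs") are assumed, and the accessions are placeholders.

```python
COLUMNS = ("GSE", "GSM", "SRS", "SRX", "SRR")


def iter_rows(node, context=None):
    """Yield one tuple (GSE, GSM, SRS, SRX, SRR) per run found in the tree."""
    context = dict(context or {})  # copy so siblings do not share state
    if isinstance(node, dict):
        accession = node.get("accession")
        if isinstance(accession, str):
            for prefix in COLUMNS:
                if accession.startswith(prefix):
                    context[prefix] = accession
            if accession.startswith("SRR"):
                # A run is the deepest level wanted; do not descend into files.
                yield tuple(context.get(column, "") for column in COLUMNS)
                return
        for value in node.values():
            yield from iter_rows(value, context)
    elif isinstance(node, list):
        for item in node:
            yield from iter_rows(item, context)


def to_tsv(metadata):
    """Render the rows as tab-separated text with a header line."""
    lines = ["\t".join(COLUMNS)]
    lines += ["\t".join(row) for row in iter_rows(metadata)]
    return "\n".join(lines)


# Placeholder data; the nesting and key names are assumptions.
metadata = {
    "GSE1": {
        "accession": "GSE1",
        "geo_samples": {
            "GSM1": {
                "accession": "GSM1",
                "samples": {
                    "SRS1": {
                        "accession": "SRS1",
                        "experiments": {
                            "SRX1": {
                                "accession": "SRX1",
                                "runs": {
                                    "SRR1": {"accession": "SRR1", "files": {}},
                                    "SRR2": {"accession": "SRR2", "files": {}},
                                },
                            }
                        },
                    }
                },
            }
        },
    }
}

print(to_tsv(metadata))
```

For the second layout (one line per experiment with comma-joined runs), the rows can be grouped by their first four columns before printing. ERR and DRR style accessions would need their prefixes added to the prefix checks.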

Question on ffq [DOIS]

Greetings ffq developers,

Could anyone help explain why my first attempt to fetch the accession information for one paper (https://doi.org/10.1016/j.cell.2023.01.034) fails, while my second attempt for another paper (https://doi.org/10.1093/nar/gkac908) succeeds? The commands and corresponding outputs for the two studies are shared below. In both cases, ffq 0.3.0 and Python 3.11.1 were used.

In each attempt, similar warning messages appear in my terminal: "WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...". For the first study, no JSON output file is produced. For the second study, however, even though the same kind of warning appears for SRX8785942, the output file nar.json contains the information for SRX8785942, as shown below.

[2023-08-19 16:10:13,676] INFO Parsing Experiment SRX8785942
[2023-08-19 16:10:13,681] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:14,088] WARNING There are 0 runs for SRX8785942

"experiments": {
    "SRX8785942": {
        "accession": "SRX8785942",
        "title": "Illumina NovaSeq 6000 paired end sequencing; GSM4681765: Breast cancer cells - MCF7 CL; Homo sapiens; RNA-Seq",
        "platform": "ILLUMINA",
        "instrument": "Illumina NovaSeq 6000",
        "runs": {}
    }

Look forward to learning from your insights. I really appreciate your help in advance.

ffq -o ./json_file/cell.json 10.1016/j.cell.2023.01.034 --verbose

Output:
[2023-08-19 15:40:47,871] DEBUG Printing verbose output
[2023-08-19 15:40:47,871] DEBUG Namespace(IDs=['10.1016/j.cell.2023.01.034'], o=None, t=None, l=None, ftp=False, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2023-08-19 15:40:47,871] WARNING Searching by DOI may result in missing information.
[2023-08-19 15:40:47,871] INFO Searching for DOI '10.1016/J.CELL.2023.01.034'
[2023-08-19 15:40:47,978] INFO Searching for Study SRP with title 'Cell lineage-specific mitochondrial resilience during mammalian organogenesis'
[2023-08-19 15:40:48,510] INFO Found 2 studies that match this title: SRP370066, SRP400925
[2023-08-19 15:40:48,511] INFO Parsing Study SRP370066
[2023-08-19 15:40:48,936] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:48,937] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:48,939] INFO Getting Sample for SRP370066
[2023-08-19 15:40:53,327] WARNING There are 5 samples for SRP370066
[2023-08-19 15:40:53,327] INFO Parsing sample SRS12598636
[2023-08-19 15:40:53,741] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:53,742] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:53,745] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:40:54,462] INFO Getting Experiment for SRS12598636
[2023-08-19 15:40:54,462] INFO Parsing Experiment SRX14843696
[2023-08-19 15:40:54,882] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:54,883] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:54,886] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:40:55,573] WARNING There is 1 run for SRX14843696
[2023-08-19 15:40:55,574] INFO Parsing run SRR18743675
[2023-08-19 15:40:55,999] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:55,999] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:57,846] INFO Parsing sample SRS12598635
[2023-08-19 15:40:58,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:58,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:58,263] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:40:58,670] INFO Getting Experiment for SRS12598635
[2023-08-19 15:40:58,670] INFO Parsing Experiment SRX14843695
[2023-08-19 15:40:59,088] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:59,089] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:59,093] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:40:59,501] WARNING There is 1 run for SRX14843695
[2023-08-19 15:40:59,501] INFO Parsing run SRR18743676
[2023-08-19 15:40:59,903] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:40:59,903] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:02,002] INFO Parsing sample SRS12598634
[2023-08-19 15:41:02,413] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:02,413] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:02,416] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:02,952] INFO Getting Experiment for SRS12598634
[2023-08-19 15:41:02,952] INFO Parsing Experiment SRX14843694
[2023-08-19 15:41:03,361] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:03,362] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:03,365] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:03,780] WARNING There is 1 run for SRX14843694
[2023-08-19 15:41:03,780] INFO Parsing run SRR18743677
[2023-08-19 15:41:04,187] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:04,187] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:07,105] INFO Parsing sample SRS12598633
[2023-08-19 15:41:07,512] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:07,512] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:07,515] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:07,921] INFO Getting Experiment for SRS12598633
[2023-08-19 15:41:07,921] INFO Parsing Experiment SRX14843693
[2023-08-19 15:41:08,331] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:08,332] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:08,336] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:08,744] WARNING There is 1 run for SRX14843693
[2023-08-19 15:41:08,745] INFO Parsing run SRR18743678
[2023-08-19 15:41:09,147] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:09,147] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:11,117] INFO Parsing sample SRS12598637
[2023-08-19 15:41:11,532] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:11,532] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:11,535] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:12,129] INFO Getting Experiment for SRS12598637
[2023-08-19 15:41:12,129] INFO Parsing Experiment SRX14843697
[2023-08-19 15:41:12,544] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:12,545] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:12,548] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:12,956] WARNING There is 1 run for SRX14843697
[2023-08-19 15:41:12,956] INFO Parsing run SRR18743674
[2023-08-19 15:41:13,365] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:13,366] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:15,265] INFO Parsing Study SRP400925
[2023-08-19 15:41:15,683] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:15,684] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:15,685] INFO Getting Sample for SRP400925
[2023-08-19 15:41:30,185] WARNING There are 20 samples for SRP400925
[2023-08-19 15:41:30,185] INFO Parsing sample SRS15319024
[2023-08-19 15:41:30,626] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:30,626] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:30,631] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:31,051] INFO Getting Experiment for SRS15319024
[2023-08-19 15:41:31,052] INFO Parsing Experiment SRX17793201
[2023-08-19 15:41:31,487] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:31,487] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:31,492] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:32,709] WARNING There are 2 runs for SRX17793201
[2023-08-19 15:41:32,709] INFO Parsing run SRR21799208
[2023-08-19 15:41:33,141] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:33,142] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:35,187] INFO Parsing run SRR21799209
[2023-08-19 15:41:35,614] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:35,615] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:37,563] INFO Parsing sample SRS15319023
[2023-08-19 15:41:37,991] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:37,992] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:37,997] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:38,414] INFO Getting Experiment for SRS15319023
[2023-08-19 15:41:38,415] INFO Parsing Experiment SRX17793200
[2023-08-19 15:41:38,839] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:38,840] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:38,847] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:39,272] WARNING There are 2 runs for SRX17793200
[2023-08-19 15:41:39,272] INFO Parsing run SRR21799210
[2023-08-19 15:41:39,693] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:39,694] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:41,682] INFO Parsing run SRR21799211
[2023-08-19 15:41:42,102] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:42,102] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:44,049] INFO Parsing sample SRS15319023
[2023-08-19 15:41:44,055] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:44,055] INFO Getting Experiment for SRS15319023
[2023-08-19 15:41:44,055] INFO Parsing Experiment SRX17793200
[2023-08-19 15:41:44,059] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:44,060] WARNING There are 2 runs for SRX17793200
[2023-08-19 15:41:44,060] INFO Parsing run SRR21799210
[2023-08-19 15:41:44,317] INFO Parsing run SRR21799211
[2023-08-19 15:41:44,591] INFO Parsing sample SRS15319020
[2023-08-19 15:41:45,020] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:45,021] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:45,025] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:45,438] INFO Getting Experiment for SRS15319020
[2023-08-19 15:41:45,438] INFO Parsing Experiment SRX17793197
[2023-08-19 15:41:45,871] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:45,871] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:45,878] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:46,292] WARNING There are 2 runs for SRX17793197
[2023-08-19 15:41:46,292] INFO Parsing run SRR21799216
[2023-08-19 15:41:46,714] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:46,715] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:48,669] INFO Parsing run SRR21799217
[2023-08-19 15:41:49,102] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:49,103] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:51,084] INFO Parsing sample SRS15319017
[2023-08-19 15:41:51,514] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:51,515] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:51,521] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:51,944] INFO Getting Experiment for SRS15319017
[2023-08-19 15:41:51,944] INFO Parsing Experiment SRX17793194
[2023-08-19 15:41:52,365] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:52,366] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:52,371] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:52,791] WARNING There are 2 runs for SRX17793194
[2023-08-19 15:41:52,791] INFO Parsing run SRR21799222
[2023-08-19 15:41:53,222] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:53,223] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:55,341] INFO Parsing run SRR21799223
[2023-08-19 15:41:55,761] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:55,761] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:57,729] INFO Parsing sample SRS15319017
[2023-08-19 15:41:57,735] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:57,735] INFO Getting Experiment for SRS15319017
[2023-08-19 15:41:57,735] INFO Parsing Experiment SRX17793194
[2023-08-19 15:41:57,740] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:57,740] WARNING There are 2 runs for SRX17793194
[2023-08-19 15:41:57,740] INFO Parsing run SRR21799222
[2023-08-19 15:41:58,001] INFO Parsing run SRR21799223
[2023-08-19 15:41:58,289] INFO Parsing sample SRS15319016
[2023-08-19 15:41:58,714] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:58,715] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:58,725] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:59,142] INFO Getting Experiment for SRS15319016
[2023-08-19 15:41:59,142] INFO Parsing Experiment SRX17793193
[2023-08-19 15:41:59,570] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:59,570] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:41:59,581] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:41:59,996] WARNING There are 2 runs for SRX17793193
[2023-08-19 15:41:59,996] INFO Parsing run SRR21799225
[2023-08-19 15:42:00,427] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:00,427] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:02,381] INFO Parsing run SRR21799224
[2023-08-19 15:42:02,808] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:02,809] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:04,780] INFO Parsing sample SRS15319022
[2023-08-19 15:42:05,203] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:05,203] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:05,208] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:05,746] INFO Getting Experiment for SRS15319022
[2023-08-19 15:42:05,746] INFO Parsing Experiment SRX17793199
[2023-08-19 15:42:06,180] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:06,181] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:06,188] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:06,610] WARNING There are 2 runs for SRX17793199
[2023-08-19 15:42:06,611] INFO Parsing run SRR21799212
[2023-08-19 15:42:07,031] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:07,032] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:08,995] INFO Parsing run SRR21799213
[2023-08-19 15:42:09,423] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:09,425] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:11,391] INFO Parsing sample SRS15319021
[2023-08-19 15:42:11,822] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:11,823] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:11,828] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:12,243] INFO Getting Experiment for SRS15319021
[2023-08-19 15:42:12,243] INFO Parsing Experiment SRX17793198
[2023-08-19 15:42:12,667] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:12,669] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:12,674] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:13,114] WARNING There are 2 runs for SRX17793198
[2023-08-19 15:42:13,114] INFO Parsing run SRR21799215
[2023-08-19 15:42:13,533] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:13,534] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:15,519] INFO Parsing run SRR21799214
[2023-08-19 15:42:15,939] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:15,940] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:17,899] INFO Parsing sample SRS15319020
[2023-08-19 15:42:17,906] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:17,906] INFO Getting Experiment for SRS15319020
[2023-08-19 15:42:17,907] INFO Parsing Experiment SRX17793197
[2023-08-19 15:42:17,912] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:17,912] WARNING There are 2 runs for SRX17793197
[2023-08-19 15:42:17,912] INFO Parsing run SRR21799216
[2023-08-19 15:42:18,174] INFO Parsing run SRR21799217
[2023-08-19 15:42:18,437] INFO Parsing sample SRS15319018
[2023-08-19 15:42:18,847] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:18,848] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:18,852] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:19,265] INFO Getting Experiment for SRS15319018
[2023-08-19 15:42:19,265] INFO Parsing Experiment SRX17793195
[2023-08-19 15:42:19,688] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:19,688] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:19,692] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:20,100] WARNING There are 2 runs for SRX17793195
[2023-08-19 15:42:20,101] INFO Parsing run SRR21799220
[2023-08-19 15:42:20,508] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:20,508] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:22,444] INFO Parsing run SRR21799221
[2023-08-19 15:42:22,851] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:22,852] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:24,727] INFO Parsing sample SRS15319015
[2023-08-19 15:42:25,138] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:25,138] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:25,141] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:25,554] INFO Getting Experiment for SRS15319015
[2023-08-19 15:42:25,554] INFO Parsing Experiment SRX17793192
[2023-08-19 15:42:25,965] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:25,965] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:25,969] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:26,376] WARNING There are 2 runs for SRX17793192
[2023-08-19 15:42:26,376] INFO Parsing run SRR21799226
[2023-08-19 15:42:26,781] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:26,782] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:28,665] INFO Parsing run SRR21799227
[2023-08-19 15:42:29,076] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:29,076] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 15:42:31,326] INFO Parsing sample SRS15319015
[2023-08-19 15:42:31,327] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:31,327] INFO Getting Experiment for SRS15319015
[2023-08-19 15:42:31,327] INFO Parsing Experiment SRX17793192
[2023-08-19 15:42:31,329] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:31,329] WARNING There are 2 runs for SRX17793192
[2023-08-19 15:42:31,329] INFO Parsing run SRR21799226
[2023-08-19 15:42:31,569] INFO Parsing run SRR21799227
[2023-08-19 15:42:31,832] INFO Parsing sample SRS15319024
[2023-08-19 15:42:31,835] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:31,835] INFO Getting Experiment for SRS15319024
[2023-08-19 15:42:31,835] INFO Parsing Experiment SRX17793201
[2023-08-19 15:42:31,837] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 15:42:31,837] WARNING There are 2 runs for SRX17793201
[2023-08-19 15:42:31,838] INFO Parsing run SRR21799208
[2023-08-19 15:42:32,090] INFO Parsing run SRR21799209
[2023-08-19 15:42:32,161] ERROR 429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=SRR21799209&retmode=xml
[2023-08-19 15:42:32,161] ERROR Provided accession is invalid
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose] [--version] IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes

ffq -o ./json_file/nar.json 10.1093/nar/gkac908 --verbose

Output:
[2023-08-19 16:09:47,143] DEBUG Printing verbose output
[2023-08-19 16:09:47,143] DEBUG Namespace(IDs=['10.1093/nar/gkac908'], o='./json_file/nar.json', t=None, l=None, ftp=False, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2023-08-19 16:09:47,143] WARNING Searching by DOI may result in missing information.
[2023-08-19 16:09:47,143] INFO Searching for DOI '10.1093/NAR/GKAC908'
[2023-08-19 16:09:47,245] INFO Searching for Study SRP with title 'Estrogen regulates divergent transcriptional and epigenetic cell states in breast cancer'
[2023-08-19 16:09:48,612] WARNING No studies found with the given title. Searching Pubmed for DOI '10.1093/NAR/GKAC908'
[2023-08-19 16:09:48,880] INFO Searching for GEO record linked to Pubmed ID '36318267'
[2023-08-19 16:09:49,226] INFO Found 1 GEO Accessions: GSE154873
[2023-08-19 16:09:50,232] INFO Parsing GEO GSE154873
[2023-08-19 16:09:50,413] INFO Finding supplementary files for GEO GSE154873
[2023-08-19 16:09:52,959] WARNING There are 14 samples for GSE154873
[2023-08-19 16:09:52,959] INFO Parsing GSM GSM4681761
[2023-08-19 16:09:53,417] INFO Finding supplementary files for GSM GSM4681761
[2023-08-19 16:09:55,289] INFO Getting sample for GSM4681761
[2023-08-19 16:09:55,898] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:09:55,898] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:09:55,901] INFO Parsing sample SRS7054554
[2023-08-19 16:09:56,315] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:09:56,316] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:09:56,319] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:09:56,723] WARNING No experiment found
[2023-08-19 16:09:56,723] INFO Getting Experiment for SRS7054554
[2023-08-19 16:09:57,192] INFO Parsing Experiment SRX8785938
[2023-08-19 16:09:57,196] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:09:57,601] WARNING There are 0 runs for SRX8785938
[2023-08-19 16:09:57,601] INFO Parsing GSM GSM4681762
[2023-08-19 16:09:57,815] INFO Finding supplementary files for GSM GSM4681762
[2023-08-19 16:09:59,995] INFO Getting sample for GSM4681762
[2023-08-19 16:10:00,555] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:00,555] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:00,557] INFO Parsing sample SRS7054555
[2023-08-19 16:10:00,973] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:00,973] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:00,976] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:01,381] WARNING No experiment found
[2023-08-19 16:10:01,382] INFO Getting Experiment for SRS7054555
[2023-08-19 16:10:01,780] INFO Parsing Experiment SRX8785939
[2023-08-19 16:10:01,784] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:02,190] WARNING There are 0 runs for SRX8785939
[2023-08-19 16:10:02,190] INFO Parsing GSM GSM4681763
[2023-08-19 16:10:02,424] INFO Finding supplementary files for GSM GSM4681763
[2023-08-19 16:10:03,940] INFO Getting sample for GSM4681763
[2023-08-19 16:10:04,503] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:04,504] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:04,506] INFO Parsing sample SRS7054556
[2023-08-19 16:10:04,930] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:04,930] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:04,933] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:05,344] WARNING No experiment found
[2023-08-19 16:10:05,344] INFO Getting Experiment for SRS7054556
[2023-08-19 16:10:05,742] INFO Parsing Experiment SRX8785940
[2023-08-19 16:10:05,746] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:06,154] WARNING There are 0 runs for SRX8785940
[2023-08-19 16:10:06,155] INFO Parsing GSM GSM4681764
[2023-08-19 16:10:06,386] INFO Finding supplementary files for GSM GSM4681764
[2023-08-19 16:10:07,928] INFO Getting sample for GSM4681764
[2023-08-19 16:10:08,482] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:08,483] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:08,485] INFO Parsing sample SRS7054557
[2023-08-19 16:10:08,898] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:08,898] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:08,900] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:09,303] WARNING No experiment found
[2023-08-19 16:10:09,303] INFO Getting Experiment for SRS7054557
[2023-08-19 16:10:09,795] INFO Parsing Experiment SRX8785941
[2023-08-19 16:10:09,799] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:10,211] WARNING There are 0 runs for SRX8785941
[2023-08-19 16:10:10,211] INFO Parsing GSM GSM4681765
[2023-08-19 16:10:10,439] INFO Finding supplementary files for GSM GSM4681765
[2023-08-19 16:10:11,943] INFO Getting sample for GSM4681765
[2023-08-19 16:10:12,496] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:12,496] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:12,499] INFO Parsing sample SRS7054558
[2023-08-19 16:10:12,907] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:12,908] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:12,911] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:13,314] WARNING No experiment found
[2023-08-19 16:10:13,314] INFO Getting Experiment for SRS7054558
[2023-08-19 16:10:13,676] INFO Parsing Experiment SRX8785942
[2023-08-19 16:10:13,681] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:14,088] WARNING There are 0 runs for SRX8785942
[2023-08-19 16:10:14,088] INFO Parsing GSM GSM6418438
[2023-08-19 16:10:14,288] INFO Finding supplementary files for GSM GSM6418438
[2023-08-19 16:10:15,626] INFO No supplementary files found for GSM6418438
[2023-08-19 16:10:15,999] INFO Getting sample for GSM6418438
[2023-08-19 16:10:16,633] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:16,634] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:16,636] INFO Parsing sample SRS14360572
[2023-08-19 16:10:17,039] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:17,040] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:17,043] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:17,446] INFO Getting Experiment for SRS14360572
[2023-08-19 16:10:17,446] INFO Parsing Experiment SRX16732252
[2023-08-19 16:10:17,449] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:17,855] WARNING There are 2 runs for SRX16732252
[2023-08-19 16:10:17,855] INFO Parsing run SRR20711356
[2023-08-19 16:10:18,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:18,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:20,138] INFO Parsing run SRR20711355
[2023-08-19 16:10:20,544] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:20,545] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:22,428] INFO Parsing GSM GSM6418439
[2023-08-19 16:10:23,090] INFO Finding supplementary files for GSM GSM6418439
[2023-08-19 16:10:24,182] INFO No supplementary files found for GSM6418439
[2023-08-19 16:10:24,692] INFO Getting sample for GSM6418439
[2023-08-19 16:10:25,245] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:25,245] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:25,247] INFO Parsing sample SRS14360573
[2023-08-19 16:10:25,652] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:25,653] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:25,656] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:26,062] INFO Getting Experiment for SRS14360573
[2023-08-19 16:10:26,062] INFO Parsing Experiment SRX16732253
[2023-08-19 16:10:26,066] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:26,491] WARNING There are 4 runs for SRX16732253
[2023-08-19 16:10:26,491] INFO Parsing run SRR20711357
[2023-08-19 16:10:26,902] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:26,902] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:28,868] INFO Parsing run SRR20711359
[2023-08-19 16:10:29,283] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:29,283] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:31,229] INFO Parsing run SRR20711358
[2023-08-19 16:10:31,641] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:31,642] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:33,682] INFO Parsing run SRR20711360
[2023-08-19 16:10:34,094] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:34,095] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:36,038] INFO Parsing GSM GSM6418440
[2023-08-19 16:10:36,903] INFO Finding supplementary files for GSM GSM6418440
[2023-08-19 16:10:38,072] INFO No supplementary files found for GSM6418440
[2023-08-19 16:10:38,444] INFO Getting sample for GSM6418440
[2023-08-19 16:10:39,518] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:39,519] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:39,520] INFO Parsing sample SRS14360574
[2023-08-19 16:10:39,937] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:39,937] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:39,940] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:40,344] INFO Getting Experiment for SRS14360574
[2023-08-19 16:10:40,344] INFO Parsing Experiment SRX16732254
[2023-08-19 16:10:40,349] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:40,762] WARNING There is 1 run for SRX16732254
[2023-08-19 16:10:40,762] INFO Parsing run SRR20711361
[2023-08-19 16:10:41,183] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:41,183] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:43,138] INFO Parsing GSM GSM6418441
[2023-08-19 16:10:43,890] INFO Finding supplementary files for GSM GSM6418441
[2023-08-19 16:10:44,994] INFO No supplementary files found for GSM6418441
[2023-08-19 16:10:45,413] INFO Getting sample for GSM6418441
[2023-08-19 16:10:46,000] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:46,001] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:46,003] INFO Parsing sample SRS14360575
[2023-08-19 16:10:46,417] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:46,418] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:46,420] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:46,832] INFO Getting Experiment for SRS14360575
[2023-08-19 16:10:46,832] INFO Parsing Experiment SRX16732255
[2023-08-19 16:10:46,836] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:47,241] WARNING There is 1 run for SRX16732255
[2023-08-19 16:10:47,241] INFO Parsing run SRR20711362
[2023-08-19 16:10:47,647] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:47,648] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:49,547] INFO Parsing GSM GSM6418442
[2023-08-19 16:10:49,766] INFO Finding supplementary files for GSM GSM6418442
[2023-08-19 16:10:50,855] INFO No supplementary files found for GSM6418442
[2023-08-19 16:10:51,245] INFO Getting sample for GSM6418442
[2023-08-19 16:10:51,819] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:51,819] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:51,821] INFO Parsing sample SRS14360576
[2023-08-19 16:10:52,236] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:52,236] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:52,238] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:52,650] INFO Getting Experiment for SRS14360576
[2023-08-19 16:10:52,650] INFO Parsing Experiment SRX16732256
[2023-08-19 16:10:52,653] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:53,061] WARNING There is 1 run for SRX16732256
[2023-08-19 16:10:53,061] INFO Parsing run SRR20711363
[2023-08-19 16:10:53,483] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:53,484] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:55,436] INFO Parsing GSM GSM6418443
[2023-08-19 16:10:55,646] INFO Finding supplementary files for GSM GSM6418443
[2023-08-19 16:10:56,767] INFO No supplementary files found for GSM6418443
[2023-08-19 16:10:57,188] INFO Getting sample for GSM6418443
[2023-08-19 16:10:58,203] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:58,204] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:58,206] INFO Parsing sample SRS14360577
[2023-08-19 16:10:58,617] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:58,617] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:58,620] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:59,039] INFO Getting Experiment for SRS14360577
[2023-08-19 16:10:59,039] INFO Parsing Experiment SRX16732257
[2023-08-19 16:10:59,043] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:10:59,458] WARNING There are 2 runs for SRX16732257
[2023-08-19 16:10:59,458] INFO Parsing run SRR20711365
[2023-08-19 16:10:59,883] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:10:59,883] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:01,814] INFO Parsing run SRR20711364
[2023-08-19 16:11:02,232] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:02,233] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:04,124] INFO Parsing GSM GSM6418444
[2023-08-19 16:11:04,337] INFO Finding supplementary files for GSM GSM6418444
[2023-08-19 16:11:05,427] INFO No supplementary files found for GSM6418444
[2023-08-19 16:11:06,217] INFO Getting sample for GSM6418444
[2023-08-19 16:11:07,075] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:07,076] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:07,078] INFO Parsing sample SRS14360578
[2023-08-19 16:11:07,485] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:07,486] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:07,488] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:07,891] INFO Getting Experiment for SRS14360578
[2023-08-19 16:11:07,892] INFO Parsing Experiment SRX16732258
[2023-08-19 16:11:07,897] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:08,300] WARNING There are 2 runs for SRX16732258
[2023-08-19 16:11:08,301] INFO Parsing run SRR20711366
[2023-08-19 16:11:08,712] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:08,713] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:10,618] INFO Parsing run SRR20711367
[2023-08-19 16:11:11,032] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:11,033] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:12,943] INFO Parsing GSM GSM6418445
[2023-08-19 16:11:13,141] INFO Finding supplementary files for GSM GSM6418445
[2023-08-19 16:11:14,232] INFO No supplementary files found for GSM6418445
[2023-08-19 16:11:15,152] INFO Getting sample for GSM6418445
[2023-08-19 16:11:15,699] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:15,700] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:15,702] INFO Parsing sample SRS14360579
[2023-08-19 16:11:16,115] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:16,115] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:16,118] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:16,523] INFO Getting Experiment for SRS14360579
[2023-08-19 16:11:16,524] INFO Parsing Experiment SRX16732259
[2023-08-19 16:11:16,526] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:16,939] WARNING There are 8 runs for SRX16732259
[2023-08-19 16:11:16,940] INFO Parsing run SRR20711374
[2023-08-19 16:11:17,359] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:17,359] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:19,432] INFO Parsing run SRR20711375
[2023-08-19 16:11:19,848] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:19,849] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:21,902] INFO Parsing run SRR20711368
[2023-08-19 16:11:22,312] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:22,312] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:24,435] INFO Parsing run SRR20711370
[2023-08-19 16:11:24,849] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:24,849] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:26,932] INFO Parsing run SRR20711371
[2023-08-19 16:11:27,358] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:27,359] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:29,574] INFO Parsing run SRR20711373
[2023-08-19 16:11:30,001] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:30,002] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:32,149] INFO Parsing run SRR20711369
[2023-08-19 16:11:32,581] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:32,581] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:34,850] INFO Parsing run SRR20711372
[2023-08-19 16:11:35,281] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:35,282] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:37,507] INFO Parsing GSM GSM6418446
[2023-08-19 16:11:37,730] INFO Finding supplementary files for GSM GSM6418446
[2023-08-19 16:11:39,076] INFO No supplementary files found for GSM6418446
[2023-08-19 16:11:39,536] INFO Getting sample for GSM6418446
[2023-08-19 16:11:40,159] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:40,162] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:40,165] INFO Parsing sample SRS14360580
[2023-08-19 16:11:40,597] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:40,597] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:40,601] WARNING Failed to parse sample information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:41,012] INFO Getting Experiment for SRS14360580
[2023-08-19 16:11:41,012] INFO Parsing Experiment SRX16732260
[2023-08-19 16:11:41,020] WARNING Failed to parse experiment information from ENA XML. Falling back to ENA search...
[2023-08-19 16:11:41,469] WARNING There are 8 runs for SRX16732260
[2023-08-19 16:11:41,469] INFO Parsing run SRR20711376
[2023-08-19 16:11:41,896] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:41,897] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:44,034] INFO Parsing run SRR20711377
[2023-08-19 16:11:44,457] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:44,458] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:46,581] INFO Parsing run SRR20711379
[2023-08-19 16:11:47,008] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:47,009] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:49,133] INFO Parsing run SRR20711380
[2023-08-19 16:11:49,559] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:49,560] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:51,665] INFO Parsing run SRR20711381
[2023-08-19 16:11:52,098] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:52,099] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:54,265] INFO Parsing run SRR20711382
[2023-08-19 16:11:54,701] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:54,702] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:56,834] INFO Parsing run SRR20711378
[2023-08-19 16:11:57,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:57,260] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:59,410] INFO Parsing run SRR20711383
[2023-08-19 16:11:59,832] DEBUG Encoding detection: utf_8 is most likely the one.
[2023-08-19 16:11:59,833] DEBUG Encoding detection: utf_8 is most likely the one.

Incomplete metadata obtained

Hi there, I think that this tool is great and really useful for the community. However, it sometimes returns incomplete or incorrect information.

One example, I think, is this:

ffq SRS5634098
which returns

[2022-05-26 16:47:06,659] INFO Parsing sample SRS5634098
[2022-05-26 16:47:07,699] WARNING No experiment found
[2022-05-26 16:47:07,699] INFO Getting Experiment for SRS5634098
[2022-05-26 16:47:07,699] WARNING No Experiment found for SRS5634098
{
"SRS5634098": {
"accession": "SRS5634098",
"title": "GM12878 3T3 species mixing mouse (RNA-Seq)",
"organism": "Mus musculus",
"attributes": {
"source_name": "GM12878 3T3 species mixing",
"strain": "none",
"ENA-FIRST-PUBLIC": "2020-10-24",
"ENA-LAST-UPDATE": "2020-10-24"
},
"experiments": ""

However, the NCBI link shows that there are experiments associated with it:
https://www.ncbi.nlm.nih.gov/sra?term=SRX7124455

Edit: It seems that https://www.ebi.ac.uk/ena/browser/api/xml/SRS5634098 differs from the NCBI record, which results in this.

Incomplete output from DOI

Hi there,

When I run ffq -t DOI 10.1038/s41467-020-19648-2 -o test.json, ffq parses 7 runs but only writes the last one, SRR12623522. Is there a way around this?

This is happening on ffq 0.0.1.

Thanks!

ffq [DOI]

Hi ffq team,

I'm trying to obtain FTP links (through GEO?) for many DOIs, and I have noticed that for some DOIs, SRP accessions are found but FTP links are not returned. When I use the requests.get() call from ffq.ffq.ffq_doi() on these SRPs, I find that the responses do not contain linksets.

For example:

`
import requests

origin = 'sra'
destination = 'gds'
id = 'SRP281827'

# Ask NCBI E-utilities (elink) for GEO DataSets records linked to the SRA accession
response = requests.get(
    'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi',
    params={
        "dbfrom": origin,
        "db": destination,
        "id": id,
        "retmode": "json",
    },
)
`

But on the SRA, this accession has a GEO dataset:
10.1016/j.celrep.2021.10881 -> https://www.ncbi.nlm.nih.gov/sra?term=SRP281827 -> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157694

This is true for many SRP accessions found in these DOIs:
10.1016/j.cell.2018.08.067 -> https://www.ncbi.nlm.nih.gov/sra?term=SRP100795
10.1016/j.cell.2019.08.008 -> https://www.ncbi.nlm.nih.gov/sra?term=SRP216403

Is there a better way to return GEO links? If I can help, let me know.
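
The "no linksets" check described above can be sketched as a small helper that runs on the decoded JSON of an elink response. This is only an illustration: the function name `extract_links` is hypothetical, the `linksets` / `linksetdbs` / `links` layout is an assumption about the elink JSON format, and the IDs in the sample dictionaries are placeholders, not real records.

```python
def extract_links(elink_json, destination):
    """Return the IDs linked to `destination`, or [] when nothing is linked.

    Assumes the elink JSON layout: linksets -> linksetdbs -> links.
    """
    ids = []
    for linkset in elink_json.get("linksets", []):
        # A linkset without "linksetdbs" means no linked records were returned.
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("dbto") == destination:
                ids.extend(linksetdb.get("links", []))
    return ids


# Placeholder responses (made-up IDs) showing the two cases.
with_links = {
    "linksets": [
        {
            "dbfrom": "sra",
            "ids": ["111"],
            "linksetdbs": [{"dbto": "gds", "linkname": "sra_gds", "links": ["222"]}],
        }
    ]
}
without_links = {"linksets": [{"dbfrom": "sra", "ids": ["111"]}]}

print(extract_links(with_links, "gds"))
print(extract_links(without_links, "gds"))
```

With a live request, the same helper would be called as `extract_links(response.json(), destination)`. An empty list then signals the case reported here, where the SRP is found but no GEO link comes back.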

Private GSEs

Can ffq report when a GSE is private?

ERROR 429 Client Error: Too Many Requests for url

I'm seeing an HTTP 429 response on a GEO dataset (GSE205117):

% ffq GSE205117         
[2022-06-17 17:51:29,411]    INFO Parsing GEO GSE205117
[2022-06-17 17:51:29,573]    INFO Finding supplementary files for GEO GSE205117
[2022-06-17 17:51:31,967] WARNING There are 22 samples for GSE205117
[2022-06-17 17:51:31,968]    INFO Parsing GSM GSM6205416
[2022-06-17 17:51:32,188]    INFO Finding supplementary files for GSM GSM6205416
[2022-06-17 17:51:33,852]    INFO Getting sample for GSM6205416
[2022-06-17 17:51:34,956] WARNING No sample found
[2022-06-17 17:51:34,956]    INFO Parsing GSM GSM6205417
[2022-06-17 17:51:35,239]    INFO Finding supplementary files for GSM GSM6205417
[2022-06-17 17:51:37,184]    INFO Getting sample for GSM6205417
[2022-06-17 17:51:38,091] WARNING No sample found
[2022-06-17 17:51:38,091]    INFO Parsing GSM GSM6205418
[2022-06-17 17:51:38,300]    INFO Finding supplementary files for GSM GSM6205418
[2022-06-17 17:51:39,943]    INFO Getting sample for GSM6205418
[2022-06-17 17:51:40,669] WARNING No sample found
[2022-06-17 17:51:40,669]    INFO Parsing GSM GSM6205419
[2022-06-17 17:51:40,882]    INFO Finding supplementary files for GSM GSM6205419
[2022-06-17 17:51:42,661]    INFO Getting sample for GSM6205419
[2022-06-17 17:51:43,263] WARNING No sample found
[2022-06-17 17:51:43,263]    INFO Parsing GSM GSM6205420
[2022-06-17 17:51:43,347]   ERROR 429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gds&term=GSM6205420&retmode=json&retmax=100000
{}

The result was similar on two systems, though the point of rejection varied.

System Info

System 1
macOS 10.6.7
Python 3.8.12
ffq v0.2.1

System 2
CentOS 7.9.2009
Python 3.9.10
ffq v0.2.1
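
One common way to cope with HTTP 429 responses like the one above is to retry with exponential backoff. The sketch below is not ffq's implementation: `TooManyRequests` and `call_with_backoff` are hypothetical names, and the retry count and delays are arbitrary assumptions.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by a request helper."""


def call_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on TooManyRequests, wait base_delay * 2**attempt and retry.

    Re-raises the error once `retries` retries have been used up.
    """
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TooManyRequests:
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

In practice, `fetch` would wrap the actual request and raise `TooManyRequests` when the status code is 429. The `sleep` parameter exists only so the waiting can be replaced in tests.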

Fetching metadata from NCBI

Hi!
Thank you for the hard work you put into creating this tool. It's great! I've tried to use it to retrieve metadata from NCBI for the accession number GSE130033. My goal is to retrieve each sample's characteristics, such as sample type, sex and age.
I tried to run
ffq GSE130033
and it failed with the following error:
CalledProcessError: Command 'b'ffq GSE130033\n'' returned non-zero exit status 2.
Then I tried to run
ffq -l 2 GSE130033
and I managed to get some information, but not the metadata I need. If I try to set a number greater than 2, it fails with an error as well. My question is whether my output is correct or whether I am missing something. Thank you!

id stored in variable

Is it possible to pass to the ffq command an id that is stored inside a variable?

e.g.

srr_id = 'SRR62547667'
ffq --ftp srr_id

I keep getting a syntax error.

Thank you in advance!
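
A minimal sketch of one way to do this from Python: the variable's value has to be substituted into the command, rather than passing the literal name `srr_id`. The helpers `build_ffq_command` and `run_ffq` are hypothetical names, not part of ffq, and `run_ffq` assumes the `ffq` executable is on the PATH.

```python
import subprocess


def build_ffq_command(accession, *flags):
    """Build the argument list for an ffq call from the variable's value."""
    return ["ffq", *flags, accession]


def run_ffq(accession, *flags):
    """Run ffq and return its standard output (requires ffq to be installed)."""
    result = subprocess.run(
        build_ffq_command(accession, *flags),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


srr_id = "SRR62547667"
print(build_ffq_command(srr_id, "--ftp"))
```

In a plain shell the equivalent is `ffq --ftp "$srr_id"` after `srr_id=SRR62547667`. If the commands are being run in an IPython or Jupyter session, `!ffq --ftp $srr_id` should expand the Python variable.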

suggestion: add --api-key KEY

To allow users with an NCBI API key to bypass the

  [2022-07-14 20:02:11,572]   ERROR 429 Client Error: Too Many Requests for url: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=SRR8907352&rettype=fasta&retmode=xml

error
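
The suggestion could look roughly like the sketch below. The `--api-key` flag, the `ffq-sketch` program name, and the helpers `build_parser` and `eutils_params` are all hypothetical and not part of ffq. The one external fact relied on is that NCBI E-utilities accept an `api_key` query parameter. Note that a key raises the request rate limit rather than removing it.

```python
import argparse


def build_parser():
    """Command-line parser with a hypothetical --api-key option."""
    parser = argparse.ArgumentParser(prog="ffq-sketch")
    parser.add_argument("--api-key", default=None, help="NCBI API key (hypothetical flag)")
    parser.add_argument("IDs", nargs="+")
    return parser


def eutils_params(params, api_key=None):
    """Return a copy of the query parameters, adding api_key when one is given."""
    merged = dict(params)
    if api_key:
        merged["api_key"] = api_key
    return merged


args = build_parser().parse_args(["--api-key", "MY_KEY", "SRR8907352"])
params = eutils_params({"db": "sra", "id": args.IDs[0], "retmode": "xml"}, args.api_key)
print(params)
```

The resulting dictionary would then be passed as `params=` to each E-utilities request.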

ERROR 'NoneType' object has no attribute 'text'

ffq -t GSE GSE164829 leads to the following error message:

[2021-05-29 13:17:56,541] INFO Parsing GEO GSE164829
[2021-05-29 13:17:57,054] INFO Getting Study SRP for GSE164829
[2021-05-29 13:17:58,548] INFO Parsing Study SRP SRP301759
[2021-05-29 13:17:59,420] ERROR 'NoneType' object has no attribute 'text'

Improve tests

Tests should be added for

Encode fetching
link fetching
--split with various accessions
DOI parsing
DOI fetching
ENA search

Add --version flag

Hi. Thank you for ffq.

It would be good if it had a --version flag:

From the argparse docs you could do something like this:

from ffq import __version__ as version
...
    parser.add_argument('--version', action='version', version=f'%(prog)s {version}')

`ffq` does not work for some low-number SRRs

I'm not sure if this is expected behavior, but ffq does not work for some low-number SRRs. For instance, ffq SRR038165 gives

[2021-07-23 06:33:59,377]    INFO Parsing run SRR038165
[2021-07-23 06:34:02,904]   ERROR 400 Client Error:  for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRS011879,SRS024956-SRS025194/

despite the fact that SRR038165 is a valid SRR: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR038165

sample search gets 'NoneType' object has no attribute 'text'

ffq 0.3.1. Python 3.12.4

% ffq --verbose SAMN27382599
[2024-07-28 08:33:34,190]   DEBUG Printing verbose output
[2024-07-28 08:33:34,190]   DEBUG Namespace(IDs=['SAMN27382599'], o=None, t=None, l=None, ftp=False, aws=False, gcp=False, ncbi=False, split=False, verbose=True)
[2024-07-28 08:33:35,151]   DEBUG Encoding detection: utf_8 is most likely the one.
[2024-07-28 08:33:35,152]   DEBUG Encoding detection: utf_8 is most likely the one.
[2024-07-28 08:33:35,154]    INFO Parsing sample SAMN27382599
[2024-07-28 08:33:35,155]   ERROR 'NoneType' object has no attribute 'text'
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose]
           [--version]
           IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes

ffq crashes when run has no data

Example

ffq -t GSE GSE75790

ffq should handle the following error gracefully and continue parsing the rest of the runs instead of stopping:

INFO Parsing run SRR3729917
ERROR 'NoneType' object has no attribute 'text'
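
The requested behavior can be sketched as a loop that logs a failing run and moves on to the next one. This is an illustration under assumptions, not ffq's code: `parse_runs` is a hypothetical name, and `parse_run` stands for whatever function parses a single run.

```python
import logging

logger = logging.getLogger(__name__)


def parse_runs(run_ids, parse_run):
    """Parse each run, logging and skipping the runs that fail.

    Returns a dict of parsed runs and a list of the run IDs that failed.
    """
    parsed, failed = {}, []
    for run_id in run_ids:
        try:
            parsed[run_id] = parse_run(run_id)
        except Exception as error:  # deliberately broad for this sketch
            logger.error("Skipping run %s: %s", run_id, error)
            failed.append(run_id)
    return parsed, failed
```

Returning the failed IDs alongside the parsed results lets the caller report which runs had no data without losing the metadata of the others.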

[Feature] project level support

Dear ffq developer,

Does ffq support project-level search?
For example:

ffq SRP315963

I got:

➤ ffq SRP315963
[2022-07-06 23:45:31,532]    INFO Parsing Study SRP315963
[2022-07-06 23:45:33,077]    INFO Getting Sample for SRP315963
[2022-07-06 23:45:33,079] WARNING No samples found for study. Modify code to search through ENA
[2022-07-06 23:45:33,079]   ERROR object of type 'NoneType' has no len()
{}

Network unreachable behind proxy

Trying to run an example given in the README I get an error:

$ ffq GSE129845
[2022-06-22 10:45:15,161]    INFO Parsing GEO GSE129845
[2022-06-22 10:45:15,908]    INFO Finding supplementary files for GEO GSE129845
[2022-06-22 10:49:32,108]   ERROR [Errno 101] Network is unreachable
{}

I traced it back here:
https://github.com/pachterlab/ffq/blob/master/ffq/utils.py#L727

I have the standard proxy variables (FTP_PROXY/ftp_proxy) configured, and I can access ftp.ncbi.nlm.nih.gov with, for instance, lftp.

Looking at the config file, I realized that the other repositories use https rather than ftp. Using SRP192646 rather than GSE129845 works.

Still, is there a way to make the ftp part of ffq aware of proxy settings?

aws example no longer works

Thanks for the great tool!

Running the example from the readme to retrieve the AWS data worked on July 5th, 2022 but didn't work on the next day (July 6th):

$ ffq --aws SRX7347523
[2022-07-06 12:15:44,487]    INFO Parsing Experiment SRX7347523
[2022-07-06 12:15:44,993] WARNING There is 1 run for SRX7347523
[2022-07-06 12:15:44,993]    INFO Parsing run SRR10668798
[]

I also tried a number of other IDs, but none worked.

conda installation

Dear developers,
Could you please add your software to Bioconda? This would make it much easier to use ffq within workflows that use conda as the software manager.

feature request: return metadata when no fastq available

Hi, Thanks for this great tool.
It would be great if ffq could return metadata even when the raw FASTQ files are not provided on GEO.

Thanks!

Enable variable length SRRs

Example: SRR578548

(SRR is length 9)
https://www.ebi.ac.uk/ena/browser/view/SRR578548
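
One way to accept run accessions of varying length is to validate them with a pattern that allows six or more digits after the prefix, rather than a fixed width. This is a sketch under that assumption; `RUN_PATTERN` and `is_run_accession` are hypothetical names, and ffq's real validation logic may differ.

```python
import re

# Run accessions: an SRR/ERR/DRR prefix followed by six or more digits,
# so both 9-character IDs (SRR578548) and longer ones (SRR19450559) match.
RUN_PATTERN = re.compile(r"[SED]RR[0-9]{6,}")


def is_run_accession(accession):
    """Return True if the whole string looks like a run accession."""
    return RUN_PATTERN.fullmatch(accession) is not None
```

Using `fullmatch` rather than `match` rejects strings with trailing characters after the digits.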

HTTP 404 Error

Seeing an HTTP 404 error on an SRR (SRR19450559) that is available on SRA. The likely cause is that ffq relies on EBI for the metadata, and the record may not have been synced there yet.

% ffq --verbose SRR19450559
[2022-06-17 18:52:15,552]   DEBUG Printing verbose output
[2022-06-17 18:52:15,553]   DEBUG Namespace(IDs=['SRR19450559'], aws=False, ftp=False, gcp=False, l=None, ncbi=False, o=None, split=False, t=None, verbose=True)
[2022-06-17 18:52:15,553]    INFO Parsing run SRR19450559
[2022-06-17 18:52:16,069]   ERROR 'HTTPError' object has no attribute 'getcode'
Traceback (most recent call last):
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/ffq/utils.py", line 40, in cached_get
    response.raise_for_status()
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error:  for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRR19450559/

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/ffq/main.py", line 212, in main
    results.append(FFQ[v["prefix"]](v["accession"], args.l))
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/ffq/ffq.py", line 355, in ffq_run
    run = parse_run(get_xml(accession))
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/ffq/utils.py", line 69, in get_xml
    return BeautifulSoup(cached_get(f'{ENA_URL}/{accession}/'), 'xml')
  File "/Users/mfansler/miniconda3/lib/python3.8/site-packages/ffq/utils.py", line 42, in cached_get
    if exception.getcode() == 429:
AttributeError: 'HTTPError' object has no attribute 'getcode'
{}

System Info

macOS 10.6.7
Python 3.8.12
ffq v0.2.1
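
The second traceback shows `cached_get` calling `exception.getcode()` on a `requests.exceptions.HTTPError`. That method exists on `urllib.error.HTTPError`, whereas the `requests` exception carries the response object as `.response`. A minimal sketch of reading the status code in a way that covers both cases follows; the `http_status` helper is hypothetical and not part of ffq.

```python
def http_status(exc):
    """Return the HTTP status code carried by an exception, or None.

    Covers requests.exceptions.HTTPError (status on `exc.response`) and
    urllib.error.HTTPError (status on `exc.code`).
    """
    response = getattr(exc, "response", None)
    # Compare against None explicitly: a requests Response is falsy for
    # 4xx/5xx statuses, so `if response:` would skip exactly the error cases.
    if response is not None:
        status = getattr(response, "status_code", None)
        if status is not None:
            return status
    return getattr(exc, "code", None)
```

With such a helper, a rate-limit check could be written as `http_status(exception) == 429`, and a 404 would surface as the original HTTP error rather than as an `AttributeError`.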

[enhancement] Generate Aspera URL

Hi there,

thank you for the very useful tool. I'd like to suggest a simple feature, which is to generate URLs for downloading with Aspera (supported by the ENA). For example, using --ftp generates the following URLs:

ffq --ftp SRX079566 | grep '"url"'
# snipped
#         "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR292/SRR292241/SRR292241_1.fastq.gz"
#         "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR292/SRR292241/SRR292241_2.fastq.gz"
#         "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz"
#         "url": "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR390/SRR390728/SRR390728_2.fastq.gz"

Perhaps a new --aspera option that generates the following URLs:

ffq --aspera SRX079566 | grep '"url"'
# snipped
#         "url": "era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR292/SRR292241/SRR292241_1.fastq.gz"
#         "url": "era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR292/SRR292241/SRR292241_2.fastq.gz"
#         "url": "era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz"
#         "url": "era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR390/SRR390728/SRR390728_2.fastq.gz"

I know I can simply substitute ftp://ftp.sra.ebi.ac.uk/ with era-fasp@fasp.sra.ebi.ac.uk: but I thought it would be nice to add it natively to ffq. It may also inform people of Aspera, which performs much faster downloads.

Cheers,
Dave
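
The prefix substitution described in this request can be sketched as a small helper. It assumes that ENA's Aspera endpoint is `era-fasp@fasp.sra.ebi.ac.uk:` and that it mirrors the FTP directory layout; `ftp_to_aspera` is a hypothetical name, and ffq has no `--aspera` option.

```python
FTP_PREFIX = "ftp://ftp.sra.ebi.ac.uk/"
# Assumption: ENA's public Aspera user and host, mirroring the FTP layout.
ASPERA_PREFIX = "era-fasp@fasp.sra.ebi.ac.uk:"


def ftp_to_aspera(url):
    """Rewrite an ENA FTP URL as an Aspera source path."""
    if not url.startswith(FTP_PREFIX):
        raise ValueError(f"Not an ENA FTP URL: {url}")
    # Keep everything after the FTP host and prepend the Aspera prefix.
    return ASPERA_PREFIX + url[len(FTP_PREFIX):]
```

The function rejects URLs from other hosts rather than guessing, since the mapping is only assumed to hold for ENA's FTP server.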

Enhancements

make links to supplementary_files accessible via a --supplementary option (similar to --ftp)
implement file name -> filetype parsing
implement class-based data structure for accessions and metadata
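
As an illustration of the second item, file name to filetype parsing could look roughly like the sketch below. The extension table, the `guess_filetype` name, and the "unknown" fallback are all assumptions made for the example and do not describe ffq's implementation.

```python
from pathlib import PurePosixPath

# Assumption: illustrative tables only; a real mapping would likely be larger.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".zip"}
KNOWN_TYPES = {
    ".fastq": "fastq",
    ".fq": "fastq",
    ".bam": "bam",
    ".sra": "sra",
    ".txt": "txt",
    ".csv": "csv",
    ".tsv": "tsv",
}


def guess_filetype(filename):
    """Guess a file type from the name, ignoring compression suffixes."""
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    # Strip trailing compression suffixes so "x.fastq.gz" is seen as ".fastq".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return "unknown"
    return KNOWN_TYPES.get(suffixes[-1], "unknown")
```

Looking only at the last non-compression suffix keeps names with extra dots, such as sample identifiers, from confusing the lookup.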

Test framework `nose` is no longer maintained

The nose testing library used in ffq is no longer maintained and was last updated in 2015. It does not work with Python 3.10 because of an issue that will not be fixed.

It looks like most people have switched to pytest, and there's even an automated tool to help the conversion: nose2pytest. Is this something that would be of interest?

Phil
