Comments (6)
@ErinBecker this looks like a great tool, but maybe the HPC lesson is a better spot for this. There are a few tools for faster data transfer, and it's a great topic, but for this lesson we intentionally are using a small dataset. IMHO.
from organization-genomics.
I believe we can close this, although it seems appropriate to re-write the lesson to make use of the new tools. @ErinBecker
According to NCBI (Ben Busby?) and on https://github.com/ncbi/sra-tools,
"With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and is better suited for large-scale conversion of SRA objects into FASTQ files that are common on sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins in a way that improves performance as compared to fastq-dump, which performs joins on a per-record basis (and is single-threaded).
fastq-dump is still supported as it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.
You can get more information about fasterq-dump in our Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump."
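For anyone who wants to try the tool described in the quote above, here is a minimal sketch in Python of how a fasterq-dump call might be assembled and run. The helper names, the default values, and the accession `SRRXXXXXXX` are hypothetical placeholders, not part of the lesson. The flags `--outdir`, `--threads`, and `--split-files` are assumed to behave as described in the sra-tools wiki linked above, so check them against your installed version.

```python
import shutil
import subprocess


def build_fasterq_dump_cmd(accession, outdir=".", threads=4, split_files=True):
    """Return the argv list for a fasterq-dump call (nothing is executed here)."""
    cmd = ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)]
    if split_files:
        # Write paired reads to separate files.
        cmd.append("--split-files")
    return cmd


def run_fasterq_dump(accession, **kwargs):
    """Run fasterq-dump if it is on PATH; raise a clear error otherwise."""
    if shutil.which("fasterq-dump") is None:
        raise FileNotFoundError("fasterq-dump not found; install sra-tools >= 2.9.1")
    subprocess.run(build_fasterq_dump_cmd(accession, **kwargs), check=True)


# "SRRXXXXXXX" is a placeholder accession, not a real run.
print(build_fasterq_dump_cmd("SRRXXXXXXX", outdir="fastq", threads=2))
```

Building the argument list separately from running it makes it possible to inspect the command before starting a large download.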
Thanks for the feedback @hoytpr. I'm pinging @ACharbonneau to see if she wants to try to incorporate this into the Cloud lesson.
Arizona BugBBQ - We don't think any importing from SRA is needed for this workshop. Learners should be given skills that will make this easier on its own. There are tools that will pull from SRA without using NCBI tools, etc.
It's true that there are tons of ways to download, and life science students probably know a couple of ways. Providing the data with a link for interested learners is a great way to save time for more important items.
@JasonJWilliamsNY and @ErinBecker Because the curl and wget functions worked very well in the Arizona BugBBQ, and because the original/predecessor fastq-dump is already multi-threaded (even if not needed), I'm going to close this issue.