Great, happy to hear that it is useful!
Yes, you can combine the runs by concatenating the demuxed files, but you need to make sure that each sample has a unique label. The --mult_samples option
adds a "base name" to the sample name in each read header so that it is distinct from the names in another run, e.g.:
amptk ion -i run1.fastq.gz -o run1 --mult_samples r1 --barcode_fasta labels.fa
This creates FASTQ headers that have 'r1.' at the beginning of the sample name, e.g.:
$ amptk ion -i ion.test.fastq -o run1 --mult_samples r1
-------------------------------------------------------
[Apr 04 03:04 PM]: OS: MacOSX 10.13.3, 8 cores, ~ 17 GB RAM. Python: 2.7.11
[Apr 04 03:04 PM]: AMPtk v1.1.2-c1661a0, USEARCH v9.2.64, VSEARCH v2.7.0
[Apr 04 03:04 PM]: Foward primer: AGTGARTCATCGAATCTTTG, Rev comp'd rev primer: GCATATCAATAAGCGGAGGA
[Apr 04 03:04 PM]: Loading FASTQ Records
[Apr 04 03:04 PM]: 2,000 reads (1.6 MB)
-------------------------------------------------------
[Apr 04 03:04 PM]: Concatenating Demuxed Files
[Apr 04 03:04 PM]: 2,000 total reads
[Apr 04 03:04 PM]: 1,409 valid Barcode
[Apr 04 03:04 PM]: 1,406 Fwd Primer found, 1,151 Rev Primer found
[Apr 04 03:04 PM]: 34 discarded too short (< 100 bp)
[Apr 04 03:04 PM]: 1,372 valid output reads
[Apr 04 03:04 PM]: Found 19 barcoded samples
Sample: Count
r1.BC.27: 95
r1.BC.23: 90
r1.BC.17: 90
r1.BC.28: 88
r1.BC.20: 82
r1.BC.73: 80
r1.BC.18: 77
r1.BC.16: 72
r1.BC.15: 72
r1.BC.21: 72
r1.BC.10: 71
r1.BC.22: 68
r1.BC.11: 66
r1.BC.14: 65
r1.BC.9: 65
r1.BC.24: 62
r1.BC.12: 60
r1.BC.19: 53
r1.BC.5: 44
[Apr 04 03:04 PM]: Output file: run1.demux.fq.gz (242.0 KB)
[Apr 04 03:04 PM]: Mapping file: run1.mapping_file.txt
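One way to confirm that the prefix was applied is to tally the labels in the demuxed output. This is a minimal sketch, not part of AMPtk: `count_labels` is a hypothetical helper, and it assumes four-line FASTQ records whose headers carry the label as `barcodelabel=<name>;` (inferred from the sed command later in this reply).

```shell
# Hypothetical helper: tally reads per sample label in a gzipped demuxed FASTQ.
# Assumes 4-line records and headers containing "barcodelabel=<name>;".
count_labels() {
    gzip -dc "$1" |
        awk 'NR % 4 == 1 && match($0, /barcodelabel=[^;]*/) {
                 # skip the 13 characters of "barcodelabel=" to keep only the name
                 n[substr($0, RSTART + 13, RLENGTH - 13)]++
             }
             END { for (s in n) print s "\t" n[s] }' |
        sort
}
```

Running `count_labels run1.demux.fq.gz` should print one tab-separated label and count per line, which can be compared against the "Sample: Count" table in the log.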
The other way is to just provide unique labels in the barcode fasta file:
>sample1
CTAAGGTAAC
>sample2
TAAGGAGAAC
>sample3
AAGAGGATTC
>sample4
TACCAAGATC
...
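Since this approach relies on the names in the barcode FASTA being unique, a quick check for repeats can help. The following is a sketch only; `duplicate_fasta_names` is a hypothetical helper, and it treats the first word after `>` as the sample name.

```shell
# Hypothetical helper: print any FASTA record name that occurs more than once.
# The name is taken to be the first whitespace-delimited word after ">".
duplicate_fasta_names() {
    grep '^>' "$1" |
        sed 's/^>//; s/[[:space:]].*$//' |
        sort | uniq -d
}
```

Empty output from `duplicate_fasta_names labels.fa` means every name is unique within that file; names still need to differ between runs.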
A third option is to specify unique names in a mapping file, e.g.:
#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer phinchID DemuxReads Treatment
sample1 CAGAAGGAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGAAGGAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC r1.BC.5 44 no_data
sample2 TGAGCGGAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGTGAGCGGAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC r1.BC.9 65 no_data
sample3 CTGACCGAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGACCGAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC r1.BC.10 71 no_data
sample4 TCCTCGAATC CCATCTCATCCCTGCGTGTCTCCGACTCAGTCCTCGAATCAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC r1.BC.11 66 no_data
Then use that mapping file to demux:
$ amptk ion -i ion.test.fastq -o run1 -m my_mappingfile.txt
-------------------------------------------------------
[Apr 04 03:09 PM]: OS: MacOSX 10.13.3, 8 cores, ~ 17 GB RAM. Python: 2.7.11
[Apr 04 03:09 PM]: AMPtk v1.1.2-c1661a0, USEARCH v9.2.64, VSEARCH v2.7.0
[Apr 04 03:09 PM]: Foward primer: AGTGARTCATCGAATCTTTG, Rev comp'd rev primer: GCATATCAATAAGCGGAGGA
[Apr 04 03:09 PM]: Loading FASTQ Records
[Apr 04 03:09 PM]: 2,000 reads (1.6 MB)
-------------------------------------------------------
[Apr 04 03:09 PM]: Concatenating Demuxed Files
[Apr 04 03:09 PM]: 2,000 total reads
[Apr 04 03:09 PM]: 248 valid Barcode
[Apr 04 03:09 PM]: 248 Fwd Primer found, 210 Rev Primer found
[Apr 04 03:09 PM]: 2 discarded too short (< 100 bp)
[Apr 04 03:09 PM]: 246 valid output reads
[Apr 04 03:09 PM]: Found 4 barcoded samples
Sample: Count
sample3: 71
sample4: 66
sample2: 65
sample1: 44
[Apr 04 03:09 PM]: Output file: run1.demux.fq.gz (42.2 KB)
[Apr 04 03:09 PM]: Mapping file: my_mappingfile.txt
If you have already demuxed your runs and don't want to redo that step, you could also do a find/replace with sed on the demuxed files:
sed 's/barcodelabel=/barcodelabel=run1./g' demux.fq > demux.fixed.fq
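If the demuxed file is gzipped, the same substitution can be done in a pipeline. The sketch below assumes four-line FASTQ records and restricts the edit to header lines so that sequence and quality lines are never touched; `prefix_labels` is a hypothetical helper name, not an AMPtk command.

```shell
# Hypothetical helper: add "<prefix>." to every sample label in a gzipped FASTQ.
# Usage: prefix_labels <prefix> <input.fq.gz> <output.fq.gz>
# Assumes 4-line records; the prefix should not contain "&" or "\",
# which awk treats specially in a replacement string.
prefix_labels() {
    gzip -dc "$2" |
        awk -v p="$1" '
            NR % 4 == 1 { sub(/barcodelabel=/, "barcodelabel=" p ".") }  # header lines only
            { print }' |
        gzip > "$3"
}
```

For example, `prefix_labels run1 demux.fq.gz demux.fixed.fq.gz` would be the gzipped counterpart of the sed command above.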
Finally, you can concatenate the runs using cat, which works even on gzipped files:
cat run1.demux.fq.gz run2.demux.fq.gz > combined.demux.fq.gz
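Before concatenating, it may be worth checking that no label appears in both runs, since identical labels would be merged into one sample downstream. This is a rough sketch under the same assumptions as above (four-line records, `barcodelabel=<name>;` in the header); `run_labels` and `shared_labels` are hypothetical helpers.

```shell
# Hypothetical helper: list the distinct sample labels in a gzipped demuxed FASTQ.
run_labels() {
    gzip -dc "$1" | awk 'NR % 4 == 1' |
        sed -n 's/.*barcodelabel=\([^;]*\).*/\1/p' | sort -u
}

# Hypothetical helper: print labels that occur in both files (empty output = no clash).
shared_labels() {
    dir=$(mktemp -d "${TMPDIR:-/tmp}/labels.XXXXXX") || return 1
    run_labels "$1" > "$dir/a"
    run_labels "$2" > "$dir/b"
    comm -12 "$dir/a" "$dir/b"   # lines common to both sorted lists
    rm -rf "$dir"
}
```

If `shared_labels run1.demux.fq.gz run2.demux.fq.gz` prints nothing, the two files have no labels in common and can be combined with cat as shown.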
Brilliant, thanks for the quick and thorough reply, Jon! It is working like a charm.
Thanks Jon! Is the data quality improved on the Ion S5 XL? We are still running the PGM - no plans to upgrade at the moment, but just curious. And let me know if you run into any more problems with AMPtk or have some features that you'd like to see incorporated.