Hi, thank you so much for your help with understanding how demultiplex works. Now I am facing a new problem: since every read was falling into UNKNOWN, I subsampled the first 2500 reads to run some tests on them. My barcodes file now looks like this (it is a very large file; these are just the first four entries):
P1004_En TAGGAACTGGCC+TCCCTTGTCTCC
P1022_En CTAGCGAACATC+TCCCTTGTCTCC
P1019_En GACAGGAGATAG+TCCCTTGTCTCC
P1015_En ATTCCTGTGAGT+TCCCTTGTCTCC
And my first reads look like this:
@A00406:67:HK3KJDRXX:1:2101:1090:1000 1:N:0:TGCAGTGTGGAG+NGAGTACGGTTT
NTGTCAGCCGCCGCGGTAATACGTAGGGAGCAAGCGTTGTACGGATTTAATGGGCGTAAAGCGCGAGTAGGCGGCCCAGAAGGTCAGCTGTGAAATCTCGGGGCTAAACTACGATCTGTCAATGGAAACAGCATTGCTAGAGTGCGGAAGTGGAAACAGGAATTCTAGGTGTAGCGGTGTAATGCGAAGATATCGGGAGGAACACCGGTGGCGAAGGCGGCGTACTGGAACGCAACTGACGCTGATGAGCG
+
#::F:F:,FF::FFFF,F,,FF:FFF,F,F:FFFFF,FF,F:FF:FFF:FF,:FFF,FFF:,FFF,FFFFFF:FFF,,F:F,FF:,,,FFF:FF,:FFF,,F::FFFF:,:,FF,F,FFF,,F,::F::FFF,,FFFFF:FFF,F,:FFFFFF,:F,::FFF:FF,,FFF:FFFF,FFF,:FFFFF:FFFFFFF,FFFFF:FFF:F,F::F:F::F,:FFFF,,FFFF,FF,,,FFFF::,FFF,:,,:FF
@A00406:67:HK3KJDRXX:1:2101:1108:1000 1:N:0:GAGGAAATTAAG+NTCACAAGTTTT
NTGTCAGCCGCCGCGGTAATACGAAGGGTGCAAGCGTTTATCGGAATTACTGGGCGTAAAGCGAGCGAAGGCGGATGTGCAAGACAGGTGTGAAATCACAGGGCTTAACAAGGGAACTTCACTTGTGACTGCACGGCTGGAGTTCGGAAGAGGGGGATGGAATTCGTCGTGTAGCAGTGAAATGCGTAGATATGAGGAGGAACACCGGTGGCGAAGGCAGTCACCTGGGCCAGGACTGACGCTCATGAACG
+
#:FFFFFFFFFFFFFF:F,FFFF,FFFF,FFF::F::F,,FFFFF,FFFFFFFFFFFFFFFFF:FFF,FFFFFF,F,FF:FFF:FF:,FFFFFFFFF,F,F:F:FFF:FF,FF::,,:::FFFFFFFF::FFF,,FFF,FFFF,FF,,FF,:F:,F,,FFFFFFF,:,FFFFF:FFFFFFFFF:FFFFFFFFFFFFF:FFF:FFFFF,:FFFF,FFFFFF,,,FFFFF,FF,,,FFFFF:,,F,F,F,,:F
@A00406:67:HK3KJDRXX:1:2101:1253:1000 1:N:0:AGCTGGAAGTCC+NTCACCAGGAGT
NTGCCAGCAGCCGCGGTAATACGTCGGGTGCAAGCGTGGATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTATGCAAGACAGATGGGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGTGACTGCATGGCTCGAGTACGGCAGAGGGGGCTGGCATTCCGCGTGTAGCAGTGAAATGCGTAGATCTGCGGAGGAACCCCCATGGCGCCGGCAAACCCCAGGGCCTGTACTGACGCGCCGGCACG
+
#,:F:,:FF:F:F::F:,,FFFF:FFFFF:FFFFFF:,,FFFFFFFF:,:,:FFFF:F:F,FF,FFFF:FFFFF:F:F,FF:F,F,F::F:FFF,FFFFFF:FFFFF:FF,F,FF,FFFF,,,:FF:F:,::FF,FF,:F,F,,,,:FFF,FFF,:,F,F,F,,FFFF:FF,,FFFF,F,,F,:F:,:F,F:,:FFFF:F,:,::F,F,FFF:::,:F,:,FFF:,,FFFF::,:F,:FF:F,FF,FF,F,
@A00406:67:HK3KJDRXX:1:2101:1416:1000 1:N:0:ACGAACCCATAA+NTCTAAAAGCCA
NTGTCAGCAGCCGCGGTGACACGTAGGCACCAAGCGTTGTCCGGATTTACTGGGCGTAAAGGGATTGCAGGCTGCCCCTCAAGTGGTGCATGAAAGGGCTCGGCTCAACCCCGCTAGGTTATGCCAGACGGAGGGGCTAGAGATCGAGAGCGGGACGTGGAATTCCGGGTGTAGTGGTGAAATGCGTAGAGATCCGGAGGAACACCAGAGGCGAAGGCGGCTTCCTGGCTCGCATCTGACGCTCAGACACG
+
The code I was planning to use is the following:
demultiplex demux -e 22 lane1_barcodes.tsv subsample/Lane_1_Undetermined_I984_L1_R1.subsample.fastq subsample/Lane_1_Undetermined_I984_L1_R2.subsample.fastq
But it still classifies all the reads as unknown. When I tried the guess command, it returned the following list of possible barcodes:
1 GGGGGGGGGGGG+ACGAGACTGATT
2 GGGGGGGGGGGG+AGCGGAGGTTAG
3 GGGGGGGGGGGG+AGTTACGAGCTA
4 GGGGGGGGGGGG+ATCGCACAGTAA
5 GGGGGGGGGGGG+GCGGGCCCGCCC
6 GGGGGGGGGGGG+GCTGTACGGATT
7 GGGGGGGGGGGG+GGGGGGGGGGGG
8 GGGGGGGGGGGG+GTCGTGTAGCCT
9 GGGGGGGGGGGG+TCTTTCCCTACA
10 GGGGGGGGGGGG+TGGTCAACGATA
11 NNNNNNNNNNNN+NCTNNNNNNNNN
12 TCCTCGTCGACA+TCCCTTGTCTCC
But these do not match the ones in the headers. Do you know why that might be happening?
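In case it helps to reproduce the mismatch, here is a minimal Python sketch of how the index pairs in the read headers could be tallied and looked up in the barcodes file. It is not part of demultiplex, and the function names (`header_barcode`, `count_header_barcodes`, `read_barcodes`, `summarize`) are made up for this example. It assumes the index pair is whatever follows the last colon of each header line, that every FASTQ record is exactly four lines, and that the barcodes file has a name and a barcode separated by whitespace.

```python
from collections import Counter


def header_barcode(header):
    """Return the index pair from an Illumina-style FASTQ header line."""
    # Assumption: the pair is everything after the last colon,
    # e.g. 'TGCAGTGTGGAG+NGAGTACGGTTT'.
    return header.strip().rsplit(":", 1)[-1]


def count_header_barcodes(fastq_lines):
    """Tally index pairs; every 4th line, starting with the first, is a header."""
    return Counter(
        header_barcode(line)
        for i, line in enumerate(fastq_lines)
        if i % 4 == 0
    )


def read_barcodes(barcode_lines):
    """Parse 'name barcode' rows into a {barcode: name} dictionary."""
    barcodes = {}
    for line in barcode_lines:
        fields = line.split()
        if len(fields) >= 2:
            barcodes[fields[1]] = fields[0]
    return barcodes


def summarize(fastq_lines, barcode_lines):
    """List (index pair, read count, sample name or 'UNKNOWN'), most frequent first."""
    counts = count_header_barcodes(fastq_lines)
    known = read_barcodes(barcode_lines)
    return [(bc, n, known.get(bc, "UNKNOWN")) for bc, n in counts.most_common()]
```

Passing the two open files, for example `summarize(open(fastq_path), open("lane1_barcodes.tsv"))` with `fastq_path` pointing at the subsampled R1 file, gives one row per distinct index pair found in the headers, so it is easy to see whether any of them appear in the barcodes file exactly as written.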