MikkelSchubert commented on August 23, 2024

Dear George,

I've looked at your data, and as far as I can see there are two issues.

Both issues relate to the way AdapterRemoval aligns reads and adapters in paired-end mode, which is done by combining the read and adapter sequences and then performing a gap-less pair-wise alignment between the two sequences:

 Adapter2' + Read1
  aligned to
 Read2' + Adapter1

Once these combined sequences have been aligned, AdapterRemoval can then use alignment information to accurately trim the adapter sequence from the reads.
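To make the idea concrete, here is a toy sketch of a gapless alignment of the two combined sequences. It is not AdapterRemoval's actual implementation: the scoring (+1 per match, -1 per mismatch), the function names and the short sequences are all made up for illustration, and the prime (') is assumed to denote the reverse complement.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def combined_sequences(read1, read2, adapter1, adapter2):
    """Build Adapter2' + Read1 and Read2' + Adapter1 (' = reverse complement)."""
    top = reverse_complement(adapter2) + read1
    bottom = reverse_complement(read2) + adapter1
    return top, bottom


def ungapped_align(seq_a, seq_b):
    """Slide seq_b along seq_a without gaps; return (score, offset) of the best placement.

    Toy scoring: +1 per match, -1 per mismatch. `offset` is the index in
    seq_a where seq_b[0] lands (negative if seq_b starts before seq_a).
    """
    best = (0, 0)
    for offset in range(-len(seq_b) + 1, len(seq_a)):
        score = 0
        for i, base in enumerate(seq_b):
            j = offset + i
            if 0 <= j < len(seq_a):
                score += 1 if seq_a[j] == base else -1
        if score > best[0]:
            best = (score, offset)
    return best


# Hypothetical 12 bp insert sequenced with 15 bp reads, so each read
# runs 3 bp into its adapter. None of these sequences are real data.
insert, adapter1, adapter2 = "ACCGTTAGGCTA", "GATCGG", "AGTCCA"
read1 = insert + adapter1[:3]
read2 = reverse_complement(insert) + adapter2[:3]
top, bottom = combined_sequences(read1, read2, adapter1, adapter2)
print(ungapped_align(top, bottom))
```

With clean reads, one offset lines up the insert and both adapter fragments at once. If a base is inserted into or deleted from one of the reads, everything past that point is shifted by one, so no single offset matches along the full length and the best score drops.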

The first issue is that, because the alignment is ungapped, indels early in one of the reads can prevent a proper alignment from being found, which appears to be the case for 3 of the read pairs that you included:

HISEQ:247:C87NTANXX:7:1101:2120:2851
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTCAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

HISEQ:247:C87NTANXX:7:1101:3751:2506
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGACAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

HISEQ:247:C87NTANXX:7:1101:9224:2232
	R1	GATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGC[...]
	A1	GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTCAGTA[...]
	A2	 ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

However, it is also possible that you are using the wrong sequence for --adapter2. You could try running AdapterRemoval with the --identify-adapters option and see what AdapterRemoval reports. That option prints consensus adapter sequences obtained by aligning the mate 1 and mate 2 reads, and should hopefully correspond (with some uncertainty) to your own --adapter1/--adapter2 values.

The second problem is a bit trickier:

HISEQ:247:C87NTANXX:7:1101:2220:2855
	R1	ATCGTTAATCGATTTTCCTCGGATCGGAAGAGCACACGTCTGAACTCCAGTCACTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAA
	A1	                     GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAACCCGCGACAGCAGTTTGGTTAGGATGCGGCTTAGGGTCTTAGGTCGATCGGTAA
	A2	???

HISEQ:247:C87NTANXX:7:1101:2220:2856
	R1	ATCGTTAATCGATTTTCCTCGTAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	A1	                                                                   GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
	
	R2	TAATGCGCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAACAACCCGCGACAGCAGTTTGGTTAGGATGCGGCTTAGGGTCTTAGGTCGAAAAATAA
	A2	???

While --adapter1 is clearly present in the reads, the --adapter2 sequence is not. And if you look at the reverse complements of the reads, it doesn't seem like the two reads overlap at all. Possibly you are looking at some sort of dimer or other non-biological sequence with the primer sequence embedded near one end. Either way, since the two sequences are not complementary, AdapterRemoval correctly fails to align the two reads and therefore does not trim the embedded adapter sequence.
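A quick way to check whether two mates overlap at all is to look for an exact k-mer shared between read 1 and the reverse complement of read 2. The sketch below is only a rough heuristic, not what AdapterRemoval does internally; the function names and the default k of 12 are arbitrary choices for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def shares_kmer(seq_a, seq_b, k=12):
    """Return True if any k-mer of seq_a occurs exactly in seq_b."""
    kmers = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    return any(seq_b[i:i + k] in kmers for i in range(len(seq_b) - k + 1))


def mates_overlap(read1, read2, k=12):
    """Heuristic: do read1 and the reverse complement of read2 share a k-mer?"""
    return shares_kmer(read1, reverse_complement(read2), k)
```

Proper overlapping mates should normally return True for `mates_overlap(read1, read2)`, since they are sequenced from opposite strands of the same fragment. An exact k-mer match is sensitive to sequencing errors, so a smaller k or a mismatch-tolerant comparison may be needed on noisy data.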

You could maybe filter out reads like this after you have performed adapter trimming, which would also catch the kind of false negatives described above, but I don't have specific advice in that regard.
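One possible shape for such a post-trimming filter is sketched below, purely as an illustration. It flags reads that still contain the first few bases of the adapter as an exact substring; the function names, the 12 bp cutoff and the (name, sequence) input format are assumptions, and a real filter would probably want to tolerate mismatches.

```python
def has_adapter_remnant(read, adapter, min_len=12):
    """Return True if the first min_len bases of adapter occur exactly in read."""
    return adapter[:min_len] in read


def drop_reads_with_adapter(records, adapter, min_len=12):
    """Keep only (name, sequence) records without an exact adapter remnant."""
    return [
        (name, seq)
        for name, seq in records
        if not has_adapter_remnant(seq, adapter, min_len)
    ]


# Adapter 1 as shown in the alignments above.
ADAPTER1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
```

For paired-end data, both mates would have to be dropped together so that the two output files stay in sync.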

g-pacheco commented on August 23, 2024

Dear Mikkel,

Thanks very much for your quick reply, and apologies for my delayed one.

I have run AR with the --identify-adapters flag, as you suggested, on three of my samples, and it found the following sequences:

Adapter1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
Adapter2 (i5): AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

For Adapter 2, I reckon it makes sense and the sequence does seem to be correct. However, I have noticed that I get a better result when I use just the first part of it (AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA). As for Adapter 1, from what I could see the initial A should not really be there, but I get a much better result when I include it. I also get a better result when I use just the first part of this adapter (GATCGGAAGAGCACACGTCTGAACTCCAGTCAC) together with the initial A.
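For anyone wanting to compare their own --adapter1/--adapter2 values against the reported consensus, a small sketch like the following may help. It is not part of AdapterRemoval; the function name is made up, and treating N as a wildcard is an assumption about how the consensus should be read.

```python
def find_in_consensus(consensus, adapter):
    """Return the first offset where adapter matches consensus, or -1.

    N in either sequence is treated as a wildcard that matches any base.
    """
    for offset in range(len(consensus) - len(adapter) + 1):
        window = consensus[offset:offset + len(adapter)]
        if all(c == a or "N" in (c, a) for c, a in zip(window, adapter)):
            return offset
    return -1


# Consensus sequences reported by --identify-adapters, as quoted above.
CONSENSUS1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
CONSENSUS2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

print(find_in_consensus(CONSENSUS1, "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"))
print(find_in_consensus(CONSENSUS2, "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"))
```

A non-zero offset means the consensus has extra leading bases relative to the supplied adapter. Here the 33 bp adapter 1 sequence sits one base into the consensus, which is consistent with the extra initial A noted above.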

I have run some tests using this configuration, and I think AR is working as I would expect now.

Many thanks once again, George.
