
Comments (10)

MikkelSchubert commented on August 23, 2024

Hey Kevin,
Thank you very much for the praise.

I would expect that I can add such an option to AdapterRemoval, and will
try to get it done no later than the week after the next one (I am
currently busy with other projects).

Are the "PASSED" / "FAILED" labels on the read-name line something you need
as well, and if so, is the expectation that these labels are the only
meta-data, or can there be additional meta-data associated with a read? I.e.
which of the following are valid, other than the first one:

1: @read1/1 PASSED
2: @read1/1 PASSED other_meta_data
3: @read1/1 other_meta_data PASSED

assuming that you had a read like this:

@read1/1 other_meta_data

Option (3) would of course be the easiest to implement, but I can do
(2) as well. I'm generally not a fan of throwing away meta-information, but
I can implement (1) if that is what is expected by downstream tools.

With regards to writing to standard out, that should be possible by simply
specifying /dev/stdout as the output file, once I've added this option.
I'll double-check that it is still the case, but AdapterRemoval should not
write to STDOUT as part of the trimming operations (only STDERR).

Cheers

On Thu, Jun 23, 2016 at 8:01 AM Kevin Murray wrote:

Hi Mikkel,

Thanks for an awesome tool. One small issue we have at the moment is being
able to output all reads that pass filtering to a single file (or even
STDOUT).

How difficult would it be to implement an option --output-everything that
accepts a single filename where reads shall be written? This file would be
interleaved, and maintain correct pairing with filtered reads using the
"single N" convention that many QC tools use. This entails replacing empty
(due to collapsing, minimum length filtering or trimming) read sequences
with a single 'N', with minimum quality score (example below).

@read1/1 PASSED
ACGTACTACTG
+
IIIIIIIIIII
@read1/2 PASSED
GATACTACTGT
+
IIIIIIIIIII
@read2/1 PASSED
CTGACGTACTA
+
IIIIIIIIIII
@read2/2 FAILED
N
+

Cheers,
Kevin
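
For illustration, the "single N" convention described in this request could be sketched in Python roughly as follows. This is only a sketch, not AdapterRemoval's implementation: the `FastqRecord` type and both function names are hypothetical, and it assumes Phred+33 qualities, where `!` is the lowest score.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """Hypothetical in-memory FASTQ record (name without the leading '@')."""
    name: str
    sequence: str
    qualities: str


# Assumption: Phred+33 encoding, in which '!' is the lowest quality score.
MIN_QUALITY = "!"


def mask_if_empty(record: FastqRecord) -> FastqRecord:
    """Replace an empty (filtered or fully trimmed) read with a single 'N'."""
    if record.sequence:
        return record
    return record._replace(sequence="N", qualities=MIN_QUALITY)


def format_interleaved(
    pairs: Iterable[Tuple[FastqRecord, FastqRecord]],
) -> Iterator[str]:
    """Yield FASTQ text for each mate in turn, so that pairing is preserved."""
    for mate1, mate2 in pairs:
        for record in (mask_if_empty(mate1), mask_if_empty(mate2)):
            yield f"@{record.name}\n{record.sequence}\n+\n{record.qualities}\n"
```

Because every input read still produces exactly one output record, downstream tools can keep treating records 2k and 2k+1 as a pair.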



from adapterremoval.

kdm9 commented on August 23, 2024

Wow, thanks heaps @MikkelSchubert.

Regarding the labels, I personally have no need for these (and used them above only to illustrate what I meant). If you think others would find this useful, I think your option 3 above is the best, as this keeps any metadata that exists in the reads before QC verbatim (e.g. barcodes from a demultiplexer etc).

As an aside, how would you treat a read pair where both reads had failed? I have seen some tools that output the pair with both reads' sequences set to N, but I've always found this a little strange (as it is not required to maintain pairing). The alternative is to simply throw the whole pair out. Do you have any opinions on this?

Cheers,
Kevin


MikkelSchubert commented on August 23, 2024

Hey Kevin,
Sorry for the delay, but I did not have time to work on this when I thought I would, and then a vacation got in the way. Adding support for this turned out to be a bit more complex than anticipated, due to the number of existing options and files, but I have got it working now. I will push the changes today or tomorrow, once I'm satisfied that it works as intended, so that you can take a look and see that it does what you want.

In the initial version, I've implemented it as the option --combined-output (may be renamed), which signals that output should be combined into just the normal --output1 and --output2 files. This works both for SE and PE reads. This can further be reduced to just --output1 for PE reads by using the existing --interleave-output option. Currently, discarded reads are written even if both mates are discarded, simply because I have not added a special case for that. I don't have a strong opinion either way.

Cheers,
Mikkel


MikkelSchubert commented on August 23, 2024

Hey Kevin,
Sorry for the delay; it took a bit longer to do all the testing I wanted.
The ability to write all output to the --output1 / --output2 files is now available on the master branch, using the --combined-output flag, and you can get single-file output for paired reads by also using the --paired-output flag. As I noted above, the naming is not final, and I could potentially add an --output option that automatically enables both --combined-output and --paired-output, e.g. something like --output-everything.

Please let me know what you think.

Cheers


kdm9 commented on August 23, 2024

Thanks very much for this Mikkel. I will test this next week and report back to you.

Thanks again,
Kevin


MikkelSchubert commented on August 23, 2024

Hey Kevin,
Did you have the opportunity to test the combined output feature?
And if so, did you have any issues with how it is currently implemented?

Cheers


kdm9 commented on August 23, 2024

Hi Mikkel,

I was able to test the combined output feature. It works beautifully, with a couple of comments:

  • The "PASSED" and "FAILED" strings that are appended aren't required. I'm sorry, I never meant for these to be added; I simply put them in my example to explain what I meant. That said, they do no harm, so there is no problem with them being there.
  • Another useful feature would be to output what some people call "broken paired" format. This is exactly the same as interleaved output, except that reads that fail are excluded from the output. This means that reads no longer necessarily come in consecutive pairs. A few tools that used to require true interleaved format have now moved to requiring this format, namely BWA (see the -p flag of bwa mem for example).
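
To make the difference concrete, here is a minimal Python sketch of turning "single N" interleaved output into the "broken paired" form described above. It is not how AdapterRemoval or BWA do it; the `Record` tuple and the function names are hypothetical, and it assumes a failed read is stored as exactly one `N`, so a genuine one-base `N` read would be dropped as well.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (name, sequence, qualities).
Record = Tuple[str, str, str]


def is_placeholder(record: Record) -> bool:
    """True if the read is a single-'N' placeholder for a failed read."""
    return record[1] == "N"


def to_broken_paired(records: Iterable[Record]) -> Iterator[Record]:
    """Drop placeholder reads from an interleaved stream, keeping order.

    Surviving mates stay adjacent; a read whose mate failed is left as an
    orphan, so consumers must match mates by name rather than by position.
    """
    for record in records:
        if not is_placeholder(record):
            yield record
```

With this approach a pair in which both mates failed simply disappears from the output, since both placeholders are dropped.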

Again, thanks very much for your generous support!

Cheers,
Kevin


MikkelSchubert commented on August 23, 2024

That's good to hear.

I've removed the PASSED / FAILED strings now, sorry about the misunderstanding. With regards to 'broken' output, I should be able to add that as well, though it will take some more work, and it will probably have to wait a bit (just leave this issue open to remind me).


kdm9 commented on August 23, 2024

Beautiful, thanks!


kdm9 commented on August 23, 2024

This was resolved some time ago, thanks @MikkelSchubert

