
Comments (8)

vishnubob commented on August 22, 2024

I just pushed a commit that considerably speeds up convert_sequence_to_ints, but speeding up alignment_report is much trickier. One problem I know about is that the header of alignment_report essentially calls alignment three times (first for the match count, second for the mismatch count, and third for the report itself). The problem is that the CIGAR string does not differentiate between matched and mismatched bases, only that bases align. The reason I'm calling upper so much is to preserve the upper/lower case of the input sequences in the subsequent report. The getseq() closure is a convenience, and doesn't really lend itself to the inefficiencies you are experiencing. I've thought about caching the results, but I also wanted to avoid memory bloat. A quick win to try is setting header=False on alignment_report, or just using the alignment property and formatting your own version of the content.
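To illustrate the "format your own version" suggestion, here is a minimal sketch that derives the match line, the match count, and the mismatch count in a single pass, instead of computing the alignment once per value. It does not use the ssw API: the function name `summarize_alignment`, its parameters, and the assumption that both inputs are already gapped, equal-length strings with `-` as the gap character are all hypothetical.

```python
def summarize_alignment(ref_aligned, query_aligned, gap="-"):
    """Return (match_line, matches, mismatches) for two gapped, aligned strings.

    Hypothetical helper, not part of ssw. Bases are compared
    case-insensitively, so the original upper/lower case of the inputs
    is left untouched for later display.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned strings must have the same length")

    marks = []
    matches = 0
    mismatches = 0
    for r_base, q_base in zip(ref_aligned, query_aligned):
        if r_base == gap or q_base == gap:
            marks.append(" ")      # gap column: neither match nor mismatch
        elif r_base.upper() == q_base.upper():
            marks.append("|")      # aligned and identical
            matches += 1
        else:
            marks.append("*")      # aligned but different
            mismatches += 1
    return "".join(marks), matches, mismatches


if __name__ == "__main__":
    ref, query = "ACgT-A", "AcTTGA"
    line, n_match, n_mismatch = summarize_alignment(ref, query)
    print(ref)
    print(line)
    print(query)
    print(f"matches={n_match} mismatches={n_mismatch}")
```

Because the counts fall out of the same loop that builds the match line, a header built this way would need the aligned strings only once.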

from ssw.

ksahlin commented on August 22, 2024

Great, Thanks!

I implemented a speedup of alignment(). I have only profiled one data set, where it seems to give about a 2x speedup.

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
88540   49.172    0.001   82.287    0.001 sswobj.py:185(alignment)

Compare with the same profile row in the previous post:
88412 52.584 0.001 158.666 0.002 sswobj.py:149(alignment)
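For anyone who wants to reproduce rows like the ones above, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The helper name `profile_row` and the toy workload are hypothetical; in practice you would pass whatever callable drives your alignments.

```python
import cProfile
import io
import pstats


def profile_row(func, *args, **kwargs):
    """Run func under cProfile and return the stats text filtered to func's name.

    Hypothetical helper: the output has the same columns as the rows above
    (ncalls, tottime, percall, cumtime, percall, filename:lineno(function)).
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # The string restriction is a regular expression matched against the
    # "filename:lineno(function)" column.
    stats.sort_stats("cumulative").print_stats(func.__name__)
    return buffer.getvalue()


def toy_workload(n):
    """Stand-in for an alignment loop; replace with real code."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(profile_row(toy_workload, 100_000))
```

Comparing the cumtime column before and after a change, on the same input, gives the kind of comparison shown in this thread.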

I noticed slightly more calls in the new run, but that should be due to a change in the parameters in my software, i.e., not related to the alignment function itself (if does not call itself). I asserted that your alignment function and mine produce the same output, and the assertion passed on all of my 3700 alignments (biological data); hopefully it will pass your unit tests as well. Could you consider using this function instead? The function is found here:
https://gist.github.com/ksahlin/76a16f0b6fa19988b1db


ksahlin commented on August 22, 2024

Correction: should be "it does not call itself".


ksahlin commented on August 22, 2024

Hi again, I further optimized the alignment() function, specifically the construction of the match string, which is now:

match_seq = ''.join(['|' if r_base.upper() == q_base.upper() else '*' for (r_base, q_base) in zip(ref_piece, query_peace)])

See the gist: https://gist.github.com/ksahlin/3b740b78d1d80cbd20be
New benchmark:

    88540   35.453    0.000   61.895    0.001 sswobj.py:185(alignment)
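As a small, self-contained illustration of the match-string construction above, the sketch below wraps the per-base comparison in a function and adds one possible variant that upper-cases each piece once before comparing. The variant is an assumption of mine, not something from the gist, and both function names are hypothetical. It assumes plain ASCII sequence characters, where upper-casing never changes the string length; whether it is actually faster would need to be measured on real data.

```python
def match_line_per_base(ref_piece, query_piece):
    """Match string built with a case-insensitive comparison per base."""
    return "".join(
        ["|" if r_base.upper() == q_base.upper() else "*"
         for r_base, q_base in zip(ref_piece, query_piece)]
    )


def match_line_upper_once(ref_piece, query_piece):
    """Hypothetical variant: upper-case each piece once, then compare.

    Only the temporary copies are upper-cased, so the case of the original
    pieces is still available for the report.
    """
    ref_upper = ref_piece.upper()
    query_upper = query_piece.upper()
    return "".join(
        ["|" if r_base == q_base else "*"
         for r_base, q_base in zip(ref_upper, query_upper)]
    )


if __name__ == "__main__":
    ref_piece, query_piece = "ACgt", "acGA"
    print(match_line_per_base(ref_piece, query_piece))
    print(match_line_upper_once(ref_piece, query_piece))
```

Both functions should agree on any pair of ASCII pieces, which makes an assert-based comparison over existing alignments an easy way to check a change like this.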


ksahlin commented on August 22, 2024

Hi again,

Could you consider including the function I attached in the gist in your repo?


Chris7 commented on August 22, 2024

@ksahlin why not fork it and submit a PR?


ksahlin commented on August 22, 2024

@Chris7 Ok! Done in #5


vishnubob commented on August 22, 2024

Thanks for this PR; it has been integrated.

