Comments (10)

dtolnay commented on June 14, 2024

Thanks @mookid, I tested it in dtolnay/trybuild#27 against some projects and it is definitely a lot better. I think it is still possible to do better. In my example of diff -u <(echo '^^^^^^^^^^') <(echo '^^^^^^^^^^^^^^^') above, I believe most people would perceive this as characters added to the end of the line rather than characters inserted at the beginning, i.e. treating the common prefix as shared makes more sense. But at this point I would use this if you released it.

You wrote in #37 (comment) that you are concerned about the performance. Can you give me an idea of how bad it is? What size of input are you testing on and how long does it take to compute the diff?

from diffr.

dtolnay commented on June 14, 2024

Another example that comes out wrong:

[image: diffr]

Again it would be better to prioritize the common prefix and highlight 5 trailing adjacent ^ as added.
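
For illustration, here is a minimal sketch of what "prioritize the common prefix" could mean for the caret example, assuming a plain byte-wise comparison. `split_common_prefix` is a hypothetical helper written for this note, not a function of diffr-lib:

```rust
/// Splits `new` into (bytes shared with the start of `old`, remainder).
/// Hypothetical helper: the remainder is what would be highlighted as added.
fn split_common_prefix<'a>(old: &[u8], new: &'a [u8]) -> (&'a [u8], &'a [u8]) {
    // Length of the longest common prefix of the two byte strings.
    let n = old
        .iter()
        .zip(new.iter())
        .take_while(|(a, b)| a == b)
        .count();
    new.split_at(n)
}
```

With 10 carets on the old side and 15 on the new side, the shared part is the first 10 bytes and the 5 trailing carets are the part reported as added.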

mookid commented on June 14, 2024

Hi @dtolnay

Thanks for the suggestion and the examples. I will take a look soon.

mookid commented on June 14, 2024

Greedily expanding the snakes as much as possible is never a pessimization, but it can still miss the global optimum.

In other words, given the result yielded by diffr-lib:

-[ note: ]AAA
+[ note:] BBB[ ]CCC

the second snake can be extended one byte to the left (reducing the count of snakes by one) and this is always a win:

+[ note: ]BBB CCC

An exact solution that minimizes the number of snakes projecting to a given LCS would be better, of course. But it would not help in the second case: in the given example, as in your proposal, the number of snakes is 1.
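
The one-byte shift in the " note: BBB CCC" example can be sketched as sliding the added range to the right, which lets the preceding shared run absorb the byte. This is only an illustration under stated assumptions: `slide_insertion_right` is a hypothetical name, it works on the new side only, and it assumes the shared bytes on either side of the added range are contiguous on the old side (nothing is deleted between them). It is not diffr-lib's actual code:

```rust
use std::ops::Range;

/// Slides a non-empty added byte range to the right for as long as the byte
/// leaving the range on the left equals the byte entering it on the right.
/// Hypothetical sketch; assumes the surrounding shared bytes are adjacent
/// on the old side.
fn slide_insertion_right(new: &[u8], mut added: Range<usize>) -> Range<usize> {
    while added.start < added.end
        && added.end < new.len()
        && new[added.start] == new[added.end]
    {
        added.start += 1;
        added.end += 1;
    }
    added
}
```

On `b" note: BBB CCC"` with the added range `6..10` (" BBB"), the range slides to `7..11` ("BBB "). It then touches the next added range "CCC", so the two can be reported as a single added segment "BBB CCC" after the shared " note: ".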

mookid commented on June 14, 2024

Hi @dtolnay,

I am working on an improvement; after computing the longest common subsequence, I need to figure out the "best" partition of both parts of the diff (i.e., the one minimizing the number of different segments). I'll let you know here when I have some code that you can test.
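
To make the objective concrete, here is a small sketch of how the number of segments could be counted, assuming one side of the diff is represented as a per-byte mask where `true` means "shared" and `false` means "added or removed". Both the mask representation and the name `count_segments` are assumptions made for this note:

```rust
/// Counts maximal runs of equal values in a per-byte shared/changed mask.
/// Hypothetical helper; a lower count means a less fragmented highlight.
fn count_segments(shared: &[bool]) -> usize {
    if shared.is_empty() {
        return 0;
    }
    // Each position where the flag flips starts a new segment.
    1 + shared.windows(2).filter(|w| w[0] != w[1]).count()
}
```

For the " note: BBB CCC" line from the earlier comment, the split `[ note:] BBB[ ]CCC` has 4 segments, while `[ note: ]BBB CCC` has 2, so the second partition is the preferred one under this measure.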

mookid commented on June 14, 2024

@dtolnay I wrote some code that corresponds to the spec I have in mind; please let me know if the result corresponds to something like what you are looking for.

mookid commented on June 14, 2024

The worst case scenario I have seen is the following diff:
git/git@786dabe

for which the perf goes from ~13s to ~55s on my dev machine, which is pretty bad. In the worst case, both postprocessing steps take roughly the same time as the Myers algorithm.

I can still improve the postprocessing algorithm, but I think there will still be bad scenarios with that design. The alternative would be to tweak the Myers algorithm to yield the best split at the same time, but it's not easy.

About prioritizing the prefix or the suffix, it should be easy to do.

dtolnay commented on June 14, 2024

Thanks, that sounds like it wouldn't be a problem for my use case since the diffs I deal with are going to be ~40 lines max. I wasn't sure if you had some kind of quintic loop going on where a 40-line diff could take multiple seconds. But obviously if you can improve the performance that would be great!

Thanks for your work on this!

mookid commented on June 14, 2024

Ok, good to know! I don't think it will be a problem for that use case.

mookid commented on June 14, 2024

I published diffr-lib 0.1.3. Please let me know how it works for you!
