
edit-distance's People

Contributors

belambert, dependabot[bot], sourcery-ai[bot]

edit-distance's Issues

heap overflow with very long sequences

I see extreme memory consumption when trying to align very long sequences (strings of about 10k characters) with Python 3.6. Calling SequenceMatcher.get_opcodes() never terminates; it keeps allocating memory (up to 28 GB resident) until interrupted. This happens even when a and b are identical, as long as the sequences are long enough.

Minimal example:

from edit_distance.code import SequenceMatcher
a = u"x" * 10000 # dummy text
b = u"x" * 10000 # same
matcher = SequenceMatcher(a, b)
matcher.get_opcodes()
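
For context, a textbook edit-distance table for two sequences of 10,000 items has roughly 100 million cells, which is expensive to hold as nested Python lists. Whether that table is what exhausts memory here is an assumption; the library's internals are not shown in this report. The sketch below is not the library's code. It is a generic two-row Levenshtein routine (the name `levenshtein_distance` is made up for illustration) that keeps only two rows of the table at a time. It returns the distance only, not opcodes, since recovering opcodes needs the backtrace information that this version discards.

```python
def levenshtein_distance(a, b):
    """Edit distance between two sequences, keeping only two table rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence along the row
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]  # deleting the first i items of a
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


# Size of a full table for two 10k-item sequences, for comparison.
full_table_cells = (10_000 + 1) ** 2

print(levenshtein_distance("x" * 200, "x" * 200))
print(full_table_cells)
```

Memory use grows with the length of the shorter sequence, while running time is still proportional to the product of the two lengths.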

Interested in implementing list of list functionality?

Thanks for the code.
Any idea how to implement list-of-lists functionality?

Instead of comparing flat lists:
['a','b','c']
['a','b','c','d']

I would like to compare nested lists:
['a','b',['c']]
['a','b',['c','d','e']]

where the algorithm is allowed to insert 'd' and 'e' inside the third element.
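
One possible way to approach this, sketched under stated assumptions: let the substitution cost between two list elements be their own (recursive) edit distance, and charge insertions and deletions by the number of leaf items involved. None of this is part of edit_distance; the names `leaf_count`, `substitution_cost`, and `nested_distance` are hypothetical, and the cost model is just one reasonable choice.

```python
def leaf_count(x):
    """Number of non-list items contained in x (1 for a plain item)."""
    if isinstance(x, list):
        return sum(leaf_count(e) for e in x)
    return 1


def substitution_cost(x, y):
    """Cost of turning element x into element y."""
    if isinstance(x, list) and isinstance(y, list):
        return nested_distance(x, y)  # edit inside the nested lists
    if isinstance(x, list) or isinstance(y, list):
        return leaf_count(x) + leaf_count(y)  # delete one, insert the other
    return 0 if x == y else 1


def nested_distance(a, b):
    """Edit distance between two lists whose elements may themselves be lists."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + leaf_count(a[i - 1])
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + leaf_count(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j] + leaf_count(a[i - 1]),      # delete a[i-1]
                table[i][j - 1] + leaf_count(b[j - 1]),      # insert b[j-1]
                table[i - 1][j - 1] + substitution_cost(a[i - 1], b[j - 1]),
            )
    return table[-1][-1]


print(nested_distance(['a', 'b', ['c']], ['a', 'b', ['c', 'd', 'e']]))
```

With this cost model the example above costs 2 (inserting 'd' and 'e' inside the third element), rather than the 4 it would cost to delete ['c'] and insert ['c','d','e'] as whole elements.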

incorrect insert index?

The insert index returned by get_opcodes() appears to be off by one (one too low).

Example:
from difflib import SequenceMatcher
sm = SequenceMatcher(a='abc', b='abdc')
print(sm.get_opcodes())

from edit_distance import SequenceMatcher
sm = SequenceMatcher(a='abc', b='abdc')
print(sm.get_opcodes())

Output (difflib first, then edit_distance):
[('equal', 0, 2, 0, 2), ('insert', 2, 2, 2, 3), ('equal', 2, 3, 3, 4)]
[['equal', 0, 1, 0, 1], ['equal', 1, 2, 1, 2], ['insert', 1, 1, 2, 3], ['equal', 2, 3, 3, 4]]

The "insert" opcode index should be 2 (not 1).
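
The discrepancy can be checked mechanically. difflib documents that each opcode's i1 equals the previous opcode's i2 (and likewise for j1 and j2), so a valid list of opcodes walks through both sequences without gaps or overlaps. The sketch below uses only difflib and the edit_distance output quoted above; `apply_opcodes` and `is_contiguous` are helper names invented for this illustration.

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by replaying difflib-style opcodes."""
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.extend(a[i1:i2])
        elif tag in ("insert", "replace"):
            out.extend(b[j1:j2])
        # "delete" contributes nothing to the result
    return out


def is_contiguous(opcodes):
    """True if every opcode starts where the previous one ended, in a and in b."""
    pos_a = pos_b = 0
    for tag, i1, i2, j1, j2 in opcodes:
        if i1 != pos_a or j1 != pos_b:
            return False
        pos_a, pos_b = i2, j2
    return True


a, b = "abc", "abdc"
difflib_ops = SequenceMatcher(a=a, b=b).get_opcodes()
# Output of edit_distance as quoted in this report.
reported_ops = [['equal', 0, 1, 0, 1], ['equal', 1, 2, 1, 2],
                ['insert', 1, 1, 2, 3], ['equal', 2, 3, 3, 4]]

print("".join(apply_opcodes(a, b, difflib_ops)))
print(is_contiguous(difflib_ops))
print(is_contiguous(reported_ops))
```

The difflib opcodes pass the contiguity check, while the reported edit_distance opcodes fail it at the "insert" entry, whose i1 is 1 although the preceding opcode ended at 2.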

Assertion error evaluating `opcodes`

Hi there,

I am getting an AssertionError thrown by this line of code. What does it mean? Should I be worried? Is there any way to inspect in more depth what is happening?

The two arrays of symbols I am comparing are the following:

  • ['that', 'continuous', 'sanction', ':=', '(', 'flee', 'U', 'complain', ')', 'E', 'attendance', 'eye', '^', 'flowery', 'revelation', '^', 'ridiculous', 'destination', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>']
  • ['continuous', ':=', '(', 'sanction', '^', 'flee', '^', 'attendance', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>', '<EOS>']

Thanks in advance,
Giulio

P.S. I know my data look weird, please don't ask what they are about :-)
