> On Jun 8, 2015, at 08:15, Tal Einat <tale...@gmail.com> wrote:> > On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka <stor...@gmail.com> wrote:> >> If such function will be added, I think it needs better name. E.g.>> difflib.isclose(a, b, threshold).> > Indeed, this is somewhat similar in concept to the recently-added> math.isclose() function, and could use a similar signature, i.e.:> difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None).> > However, the real issue here is whether this is important enough to be> included in the stdlib.
One thing I found is that the fact that stdlib try to be smart (human friendly diff) make it horribly slow[1].
> SequenceMatcher tries to compute a "human-friendly diff" between two sequences.
If you are only interested in a quick ratio, especially on long sequences, I would suggest using another algorithm
which is not worse case scenario in n^3.
On some sequences computing the diff was several order of magnitude faster for me[2] (pure python).
At the point where quick-ratio was not needed.
Note also that SequeceMatcher(None, a, b) might not give the same result/ratio that SequenceMatcher(None, b, a)
>>> SequenceMatcher(None,'aba', 'bca').get_matching_blocks()
[Match(a=0, b=2, size=1), Match(a=3, b=3, size=0)] # 1 common char
>>> SequenceMatcher(None,'bca','aba').get_matching_blocks()
[Match(a=0, b=1, size=1), Match(a=2, b=2, size=1), Match(a=3, b=3, size=0)] # 2 common chars.
—
M
[1] For my application, I don’t really care about having Human Friendly diff, but the actual minimal diff.
I do understand the need for this algorithm though.
[2] Well chosen Benchmark : http://i.imgur.com/cZPwR0H.png <http://i.imgur.com/cZPwR0H.png>
_______________________________________________
Python-ideas mailing list
Pyth...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/