Re: [Python-ideas] difflib.SequenceMatcher quick_ratio

From: Matthias Bussonnier <buss...@gmail.com>

Mon, 8 Jun 2015 08:39:59 -0700

> On Jun 8, 2015, at 08:15, Tal Einat <tale...@gmail.com> wrote:> > On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka <stor...@gmail.com> wrote:> >> If such function will be added, I think it needs better name. E.g.>> difflib.isclose(a, b, threshold).> > Indeed, this is somewhat similar in concept to the recently-added> math.isclose() function, and could use a similar signature, i.e.:> difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None).> > However, the real issue here is whether this is important enough to be> included in the stdlib.


One thing I found is that the fact that stdlib try to be smart (human friendly diff) make it horribly slow[1]. 

>  SequenceMatcher tries to compute a "human-friendly diff" between two sequences.

If you are only interested in a quick ratio, especially on long sequences, I would suggest using another algorithm
which is not worse case scenario in n^3.

On some sequences computing the diff was several order of magnitude faster for me[2] (pure python).
At the point where quick-ratio was not needed.

Note also that SequeceMatcher(None, a, b) might not give the same result/ratio that SequenceMatcher(None, b, a)

>>> SequenceMatcher(None,'aba', 'bca').get_matching_blocks()
 [Match(a=0, b=2, size=1), Match(a=3, b=3, size=0)]   # 1 common char

>>> SequenceMatcher(None,'bca','aba').get_matching_blocks()
 [Match(a=0, b=1, size=1), Match(a=2, b=2, size=1), Match(a=3, b=3, size=0)] # 2 common chars.

— 
M

[1] For my application, I don’t really care about having Human Friendly diff, but the actual minimal diff. 
     I do understand the need for this algorithm though. 
[2] Well chosen Benchmark : http://i.imgur.com/cZPwR0H.png <http://i.imgur.com/cZPwR0H.png>

_______________________________________________
Python-ideas mailing list
Pyth...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

Recent Messages in this Thread
[Python-ideas] difflib.SequenceMatcher quick_ratio	floyd	Jun 08, 2015 07:56 am
Re: [Python-ideas] difflib.SequenceMatcher quick_ratio	Serhiy Storchaka	Jun 08, 2015 08:44 am
Re: [Python-ideas] difflib.SequenceMatcher quick_ratio	Tal Einat	Jun 08, 2015 03:15 pm
Re: [Python-ideas] difflib.SequenceMatcher quick_ratio	Matthias Bussonnier	Jun 08, 2015 03:39 pm
Re: [Python-ideas] difflib.SequenceMatcher quick_ratio	Andrew Barnert via Python-ideas	Jun 08, 2015 01:31 pm

◄ Messages in this thread ►

Previous post: Re: [Python-ideas] If branch merging

Next post: Re: [Python-ideas] Hooking between lexer and parser

Subscribe to the python-ideas RSS feed

Accounts

List Archives

Feedback & Information

ActiveState

© 2019 ActiveState Software Inc. All rights reserved. ActiveState®, Komodo®, ActiveState Perl Dev Kit®, ActiveState Tcl Dev Kit®, ActivePerl®, ActivePython®, and ActiveTcl® are registered trademarks of ActiveState. All other marks are property of their respective owners.