Rocco Caputo <rcap...@pobox.com> writes:
>> On Aug 27, 2015, at 14:50, Zefram <zef...@fysh.org> wrote:
>> 
>> Ed Avis wrote:
>>> although it seems from discussion here that the '' comparison will be faster
>>> if the strings are long.
>> 
>> Yeah. But you'd expect the constant cost of $_ ne '' to be greater than
>> the minimum cost of length() (on a short string), because of the greater
>> number of ops. Are you optimising for average case or for worst case?
>> For average case, how long are your strings, and how often are they
>> upgraded?
>> 
>> $_ ne '' could also be reduced to a single op via the peephole optimiser,
>> and it's probably common enough to be worth it. That would then be as
>> fast as the fully-weakened length. (The two are not quite equivalent
>> in behaviour, as Ilmari has pointed out.)
>
> I tested the premise that the cost of length() on a long UTF-8 string was faster than C<$_ eq "">. If my methodology is sound, that's false. Caveat: This is perl 5, version 20, subversion 2 (v5.20.2) built for darwin-thread-multi-2level.
>
> length() is 72 CPU nanoseconds faster than C<$_ eq "">:
>
> % perl -Mutf8 -le '$_ = "å ß†®îñ©" x 10_000_000; my $user = times(); for (my $i = 0; $i < 100_000_000; ++$i) { 1 if length } print times() - $user'
> 12.03
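length() on an upgraded (UTF-8) string caches its result in the string's
utf8 magic, so after the first iteration your loop only reads back the
cached value and the benchmark is not valid. Try appending "" on each
iteration; that invalidates the cache, as the Dump below shows: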
$ perl -MDevel::Peek -E 'my $x = "a"; utf8::upgrade($x); Dump $x; length($x); Dump $x; $x .= ""; Dump $x; length $x; Dump $x' 2>&1 | egrep 'PV|MG_LEN'
SV = PV(0x15efd70) at 0x160ef00
PV = 0x15fe350 "a"\0 [UTF8 "a"]
SV = PVMG(0x164a5b0) at 0x160ef00
PV = 0x15fe350 "a"\0 [UTF8 "a"]
MG_LEN = 1
SV = PVMG(0x164a5b0) at 0x160ef00
PV = 0x15fe350 "a"\0 [UTF8 "a"]
MG_LEN = -1
SV = PVMG(0x164a5b0) at 0x160ef00
PV = 0x15fe350 "a"\0 [UTF8 "a"]
MG_LEN = 1
--
"A disappointingly low fraction of the human race is,
at any given time, on fire." - Stig Sandbeck Mathisen