It seems the message was too large, so it is not shown in the web interface. Here is the main information; the perl -V output is in the previous message.
I have observed buggy behaviour of the built-in 'split' function under certain conditions. It is triggered when the PATTERN contains UTF8 characters from the Latin-1 Supplement block and EXPR is a non-UTF8 (ascii-only) string. After that, subsequent calls to 'split' produce erroneous results.
In the example below, the first and the last iterations of the 'for' loop are supposed to produce the same result, but the last result is different because 'split' was called as described above in the second iteration.
In addition, I have observed that in Perl 5.14 and earlier versions the buggy behaviour is also triggered by any UTF8 characters in the PATTERN combined with an ascii-only string in EXPR.
[12:46] u...@debian7 ~/test/split $ cat split.pl
# this file is encoded in UTF8, obviously
use strict;
use warnings;
use utf8;
use Data::Dumper;
sub main {
    my $split_chr = 'ä';
    my $good = "a${split_chr}b";
    my $bad = 'aab';
    for my $str ($good, $bad, $good) {
        print "Splitting: $str by pattern $split_chr; is_utf8: "
            . utf8::is_utf8($str) . "\n";
        my @sp = split /$split_chr/, $str;
        print Dumper(\@sp);
    }
}
binmode STDOUT, ':utf8';
main;
[12:45] u...@debian7 ~/test/split $ perl5.20.1 split.pl
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
'a',
'b'
];
Splitting: aab by pattern ä; is_utf8:
$VAR1 = [
'aab'
];
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
"a\x{e4}b"
];
[12:41] u...@debian7 ~/test/split $ perlbrew exec perl split.pl
perl-5.14.4
==========
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
'a',
'b'
];
Splitting: aab by pattern ä; is_utf8:
$VAR1 = [
'aab'
];
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
"a\x{e4}b"
];
perl-5.21.6
==========
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
'a',
'b'
];
Splitting: aab by pattern ä; is_utf8:
$VAR1 = [
'aab'
];
Splitting: aäb by pattern ä; is_utf8: 1
$VAR1 = [
"a\x{e4}b"
];
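For quicker comparison across several perls, the same three calls can be reduced to field counts. This is only a condensed sketch of split.pl above: the helper name split_counts is made up for illustration, and the file is assumed to be saved as UTF8, like the original script. It keeps the single 'split' inside the loop, since the report is about repeated calls to the same 'split'.

```perl
use strict;
use warnings;
use utf8;    # the source file itself is assumed to be UTF8-encoded

# Hypothetical helper (not part of the original script): performs the same
# three splits as split.pl and returns the number of fields from each call.
sub split_counts {
    my $split_chr = 'ä';
    my $good = "a${split_chr}b";    # UTF8-flagged string containing the separator
    my $bad  = 'aab';               # ascii-only string, separator absent
    my @counts;
    for my $str ($good, $bad, $good) {
        # Same split statement is reused on every iteration, as in split.pl.
        my @sp = split /$split_chr/, $str;
        push @counts, scalar @sp;
    }
    return @counts;
}

my @counts = split_counts();
print "field counts: @counts\n";
print $counts[0] == $counts[2]
    ? "first and last calls agree\n"
    : "first and last calls differ (the behaviour reported here)\n";
```

Run under 'perlbrew exec perl', this should print one short summary per installed perl instead of three Dumper blocks.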
---
via perlbug: queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=123469