
[perl #130831] Perl's open() has broken Unicode file name support

From: James E Keenan via RT <perl...@perl.org>
Sun, 26 Feb 2017 13:20:06 -0800
On Tue, 21 Feb 2017 20:58:03 GMT, p...@cpan.org wrote:
> Function open() has broken processing of non-ASCII file names.
>
> Look at these two examples:
>
> $ perl -e 'open my $file, ">", "\N{U+FF}"'
>
> $ perl -e 'open my $file, ">", "\xFF"'
>
> First one create file with name 0xc3 0xbf (ÿ), second one with name 0xff
>
> And because those two strings "\N{U+FF}" and "\xFF" are equal they must
> create same file, not two different.
>
> $ perl -e '"\xFF" eq "\N{U+FF}" && print "equal\n"'
> equal
>
> Bug is in open() implementation in PP(pp_open) in file pp_sys.c.
>
> File name is read from perl scalar to C char* as:
>
> tmps = SvPV_const(sv, len);
>
> But after that SvUTF8(sv) is *not* used to check if char* tmps is
> encoded in UTF-8 or Latin1. It pass tmps directly to do_open6() function
> without SvUTF8 information.
>
> So to fixing this bug it is needed to define how function open should
> process filename. Either as binary octets and SvPVbyte() instead of
> SvPV() should be used, or as Unicode string and SvPVutf8() instead of
> SvPV() should be used.
>
> It also means that it is needed to define what Perl_do_open6() should
> expect. Its argument for file name is of type: const char *oname. It
> should be either binary octets or UTF-8.
>
> There are basically two problems with it:
>
> 1) On some systems (e.g. on Linux) file name could be arbitrary sequence
> of binary characters. It does not have to be valid UTF-8 representation.
>
> 2) Perl modules probably already uses perl Unicode scalars as argument
> for file names
>
> And decision should still allow to open any file on VFS from 1) and
> probably should not break 2). And I'm not sure if it is possible to have
> both 1) and 2) together.
>
> Current state is worse as both 1) and 2) is broken.

ISTR seeing a fair amount of discussion of this issue on #p5p.  Would anyone care to summarize this discussion?

Thank you very much.

-- 
James E Keenan (jkee...@cpan.org)

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=130831
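
The behaviour described in the quoted report can be reproduced without touching the filesystem. The following is only a rough sketch: the helpers hex_of, internal_bytes, as_octets and as_utf8 are hypothetical names introduced here for illustration and are not part of perl or of any proposed patch. internal_bytes approximates what SvPV hands to open() when SvUTF8 is ignored, while as_octets and as_utf8 approximate the SvPVbyte() and SvPVutf8() alternatives mentioned in the report.

```perl
use strict;
use warnings;

my $native  = "\xFF";      # one character, UTF8 flag off
my $unicode = "\N{U+FF}";  # the same character, UTF8 flag on

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_of {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Hypothetical helper: approximate the raw buffer SvPV returns,
# i.e. the internal representation with the UTF8 flag ignored.
sub internal_bytes {
    my ($str) = @_;
    my $copy = $str;
    utf8::encode($copy) if utf8::is_utf8($copy);
    return hex_of($copy);
}

# Hypothetical helper: octet semantics, roughly what SvPVbyte would give.
# utf8::downgrade dies if the string holds a character above 0xFF.
sub as_octets {
    my ($str) = @_;
    my $copy = $str;
    utf8::downgrade($copy);
    return $copy;
}

# Hypothetical helper: UTF-8 semantics, roughly what SvPVutf8 would give.
sub as_utf8 {
    my ($str) = @_;
    my $copy = $str;
    utf8::encode($copy);
    return $copy;
}

print "strings are equal\n" if $native eq $unicode;
printf "native  buffer: %s\n", internal_bytes($native);   # ff
printf "unicode buffer: %s\n", internal_bytes($unicode);  # c3 bf

# Either normalisation makes both strings yield the same file name bytes.
printf "octets: %s / %s\n", hex_of(as_octets($native)), hex_of(as_octets($unicode));
printf "utf8:   %s / %s\n", hex_of(as_utf8($native)),   hex_of(as_utf8($unicode));
```

Until open() itself settles on one interpretation, a caller can get a predictable file name by applying one of the two normalisations explicitly before passing the name to open(). Which one is appropriate depends on the encoding the target filesystem expects, which this sketch does not try to detect.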
