
Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

From: eryk sun <eryk...@gmail.com>
Tue, 9 Feb 2016 07:27:39 -0600
On Tue, Feb 9, 2016 at 3:21 AM, Victor Stinner <vict...@gmail.com> wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun <eryk...@gmail.com>:
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>>     >>> locale.getpreferredencoding()
>>     'cp932'
>>     >>> open(b'\xe05', 'w').close()
>>     >>> os.listdir('.')
>>     ['・']
>>     >>> os.listdir(b'.')
>>     [b'\x81E']
>
> Hum, I'm not sure that I understand your example.

Say I create a sequence of files named "file_à[N].txt", encoded in
Latin-1 (so "à" is the byte 0xE0, a cp932 lead byte), where N is 0-2.
In a Japanese system locale they all map to the same file:

    >>> open(b'file_\xe00.txt', 'w').close(); os.listdir('.')
    ['file_・.txt']
    >>> open(b'file_\xe01.txt', 'w').close(); os.listdir('.')
    ['file_・.txt']
    >>> open(b'file_\xe02.txt', 'w').close(); os.listdir('.')
    ['file_・.txt']
    >>> os.listdir(b'.')
    [b'file_\x81E.txt']
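The collision can be modelled on any platform with a small pure-Python sketch. This is not the Windows implementation: `ansi_decode_cp932` is a hypothetical helper that encodes only the single rule described above (a lead byte in 0x81-0x9F or 0xE0-0xFC followed by a byte below 0x40 becomes the default character, and both bytes are consumed) and otherwise defers to Python's `cp932` codec, whose own error handling differs from the ANSI API.

```python
# Hypothetical model of the cp932 rule described above; NOT the actual
# Windows ANSI API, which has further cases this sketch ignores.
LEAD_BYTE_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))
DEFAULT_CHAR = "\u30fb"  # KATAKANA MIDDLE DOT, the cp932 default character


def is_lead_byte(b: int) -> bool:
    return any(lo <= b <= hi for lo, hi in LEAD_BYTE_RANGES)


def ansi_decode_cp932(name: bytes) -> str:
    """Decode a bytes filename, substituting DEFAULT_CHAR when a lead byte
    is followed by an invalid trail byte (< 0x40); both bytes are consumed."""
    out = []
    i = 0
    while i < len(name):
        if is_lead_byte(name[i]) and i + 1 < len(name):
            pair = name[i:i + 2]
            if pair[1] < 0x40:
                out.append(DEFAULT_CHAR)  # lossy: the trail byte is discarded
            else:
                out.append(pair.decode("cp932", errors="replace"))
            i += 2
        else:
            out.append(name[i:i + 1].decode("cp932", errors="replace"))
            i += 1
    return "".join(out)


created = [b"file_\xe00.txt", b"file_\xe01.txt", b"file_\xe02.txt"]
listed = {ansi_decode_cp932(n) for n in created}
print(len(listed))                                # 1
print(sorted(n.encode("cp932") for n in listed))  # [b'file_\x81E.txt']
```

Three distinct byte names collapse to one Unicode name, and re-encoding that name yields b'\x81E' for the middle dot, matching the bytes listing above.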

This isn't a problem with a single-byte codepage such as 1251. For
example, codepage 1251 doesn't define a character for b"\x98", but the
ANSI API harmlessly maps it to "\x98" (SOS, in the C1 Controls block).
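Python's own `cp1251` codec is stricter than the ANSI API here: it leaves 0x98 undefined and raises `UnicodeDecodeError`. The sketch below assumes a hypothetical error handler, registered under the made-up name "c1passthrough", that mimics the pass-through behaviour, and checks why it is harmless: every byte still decodes to a distinct character, so no two bytes filenames can collide.

```python
import codecs


def c1_passthrough(exc):
    """Hypothetical error handler: map each undecodable byte to the code
    point with the same value (0x98 -> U+0098, SOS in the C1 Controls)."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(b) for b in bad), exc.end
    raise exc


codecs.register_error("c1passthrough", c1_passthrough)

try:
    b"\x98".decode("cp1251")  # Python's strict codec rejects 0x98
except UnicodeDecodeError as e:
    print(type(e).__name__)   # UnicodeDecodeError

print(repr(b"\x98".decode("cp1251", "c1passthrough")))  # '\x98'

# All 256 single bytes decode to distinct characters (a 1:1 mapping),
# unlike the many-to-one cp932 substitution shown earlier.
decoded = {bytes([i]).decode("cp1251", "c1passthrough") for i in range(256)}
print(len(decoded))  # 256
```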

Single-byte code pages still have the problem that when a filename is
created using the wide-character API, listing it as bytes may use
either an approximate mapping (e.g. "à" => "a" in 1251) or the
codepage default character (e.g. "\xd7" => "?" in 1251).
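The reverse direction can be sketched the same way. Python's `cp1251` codec has no best-fit table, so `ansi_encode_1251` below is a hypothetical stand-in that approximates best fit by decomposing to NFKD and dropping combining marks, falling back to "?" otherwise; the real Windows best-fit tables are different and larger. It reproduces the two cases mentioned above and shows how a wide-character name can become indistinguishable from another name once listed as bytes.

```python
import unicodedata


def ansi_encode_1251(name: str) -> bytes:
    """Hypothetical model of listing a wide-character filename as cp1251
    bytes: exact mapping, else a crude best fit, else the default '?'."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode("cp1251")  # exact mapping
            continue
        except UnicodeEncodeError:
            pass
        # Crude best fit (an assumption, not the Windows table):
        # decompose and drop combining marks, e.g. 'à' -> 'a' + U+0300 -> 'a'.
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        try:
            out += base.encode("cp1251") if base else b"?"
        except UnicodeEncodeError:
            out += b"?"  # codepage default character
    return bytes(out)


print(ansi_encode_1251("file_\u00e0.txt"))  # b'file_a.txt'  ('à' -> 'a')
print(ansi_encode_1251("file_\u00d7.txt"))  # b'file_?.txt'  ('×' -> '?')
print(ansi_encode_1251("file_a.txt"))       # b'file_a.txt'  (same bytes as the first)
```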
_______________________________________________
Python-Dev mailing list
Pyth...@python.org
https://mail.python.org/mailman/listinfo/python-dev