
Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

From: eryk sun <eryk...@gmail.com>
Mon, 8 Feb 2016 18:37:13 -0600
On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker <chri...@noaa.gov> wrote:
> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes is it assuming
> some Windows ANSI-compatible encoding? (and what does it return?)
UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
bytes path that's passed to CreateFileA matches the listing from
FindFirstFileA. But for a DBCS (double-byte character set) codepage,
arbitrary bytes paths do not roundtrip: invalid byte sequences map to
the default character. Note that the default character is not always
an ASCII question mark; it depends on the codepage.
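The roundtrip property can be checked approximately with Python's own codecs. This is a minimal sketch, not the Windows conversion itself: the helper names `ansi_roundtrip` and `roundtrips` are hypothetical, `latin-1` stands in for a single-byte codepage (Python's `cp1252` codec rejects a few byte values such as 0x81), and Python's `cp932` codec may not consume the same number of bytes per invalid sequence as the ANSI API does. Only whether a path survives is meaningful here, not the exact bytes it comes back as.

```python
# Approximate the ANSI-API handling of a bytes path with Python codecs:
# decode with the codepage, substitute a default character for invalid
# sequences, re-encode, and compare with the original bytes.


def ansi_roundtrip(raw: bytes, encoding: str, default_char: str = "?") -> bytes:
    """Return the bytes a path would come back as (approximation only).

    Hypothetical helper: Python's codec may split invalid sequences
    differently from the Windows conversion, so the exact result can differ.
    """
    # 'replace' marks each invalid byte sequence with U+FFFD ...
    text = raw.decode(encoding, errors="replace")
    # ... which is swapped for the codepage's default character.
    text = text.replace("\ufffd", default_char)
    # Characters the codepage cannot encode become '?'.
    return text.encode(encoding, errors="replace")


def roundtrips(raw: bytes, encoding: str, default_char: str = "?") -> bool:
    """True if the bytes path survives the decode/encode cycle unchanged."""
    return ansi_roundtrip(raw, encoding, default_char) == raw


if __name__ == "__main__":
    # Single-byte stand-in: every byte value decodes, so everything survives.
    print(roundtrips(bytes(range(256)), "latin-1"))
    # cp932: lead byte 0xE0 followed by '5' (0x35) is an invalid sequence.
    print(roundtrips(b"\xe05", "cp932", "\u30fb"))
```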

For example, in codepage 932 (Japanese), it's an error if a lead byte
(i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
uncommon). In this case the ANSI API substitutes the default character
for Japanese, '・' (U+30FB, Katakana middle dot).

    >>> import locale, os
    >>> locale.getpreferredencoding()
    'cp932'
    >>> open(b'\xe05', 'w').close()
    >>> os.listdir('.')
    ['・']
    >>> os.listdir(b'.')
    [b'\x81E']

All invalid sequences get mapped to '・', which roundtrips as
b'\x81\x45', so you can't reliably create and open files with
arbitrary bytes paths in this locale.
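The codepage 932 substitution rule can be modeled in a few lines. This is a simplified, hypothetical model (`ansi_name` is not a Windows or Python API): it covers only the rule described above, a lead byte followed by a trail byte below 0x40, treats every other lead/trail pair as valid, and assumes the whole invalid pair collapses into '・', as the `os.listdir` output for `b'\xe05'` suggests. It reproduces the `b'\x81E'` (that is, `b'\x81\x45'`) listing and shows two different bytes names ending up with the same listed name.

```python
# Simplified model of one cp932 rule: a lead byte followed by a trail byte
# below 0x40 is invalid, and the pair is reported as the default character
# '・' (U+30FB), which cp932 encodes as b'\x81\x45'.

DEFAULT_CHAR_CP932 = "\u30fb".encode("cp932")  # b'\x81\x45'


def is_lead_byte(b: int) -> bool:
    # cp932 lead-byte ranges quoted in the message.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def ansi_name(raw: bytes) -> bytes:
    """Hypothetical: bytes name the listing would report under this model."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if is_lead_byte(b) and i + 1 < len(raw):
            trail = raw[i + 1]
            if trail < 0x40:
                # Invalid pair: both bytes collapse into the default character.
                out += DEFAULT_CHAR_CP932
            else:
                # Treated as a valid two-byte character; other invalid
                # trail values are not modeled here.
                out += bytes((b, trail))
            i += 2
        else:
            # Single-byte character, or a lone lead byte at the end, which
            # this sketch passes through unchanged (an assumption).
            out.append(b)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(ansi_name(b"\xe05"))  # b'\x81E'
    # Two distinct names ('\xe0' + '5' and '\x81' + '9') collide:
    print(ansi_name(b"\xe05") == ansi_name(b"\x819"))  # True
```

Under this model a program that remembers the bytes name it created cannot find that name again in `os.listdir(b'.')`, and distinct names can map to a single listed name.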
_______________________________________________
Python-Dev mailing list
Pyth...@python.org
https://mail.python.org/mailman/listinfo/python-dev