
Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

From: eryk sun <eryk...@gmail.com>
Tue, 9 Feb 2016 07:33:19 -0600
On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner <vict...@gmail.com> wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun <eryk...@gmail.com>:
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>>     >>> locale.getpreferredencoding()
>>     'cp932'
>>     >>> open(b'\xe05', 'w').close()
>>     >>> os.listdir('.')
>>     ['・']
>>     >>> os.listdir(b'.')
>>     [b'\x81E']
>>
>> All invalid sequences get mapped to '・', which roundtrips as
>> b'\x81\x45', so you can't reliably create and open files with
>> arbitrary bytes paths in this locale.
>
> Oh, and I forgot to ask: what is your filesystem? Is it the same
> behaviour for NTFS, FAT32, network shared directories, etc.?

That was tested using NTFS, but the same would apply to FAT32, exFAT,
and UDF since they all use Unicode [1]. CreateFile[A|W] wraps the
NtCreateFile system call. The NT executive is Unicode, so the system
call receives the filename using a Unicode-only OBJECT_ATTRIBUTES [2]
record. I can't say what an arbitrary non-Microsoft filesystem will do
with the U+30FB character when it processes the IRP_MJ_CREATE. I was
only concerned with ANSI<=>Unicode conversion that's implemented in
the ntdll.dll runtime library.
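
To make the lossy roundtrip concrete without a Japanese-locale Windows box, here is a rough, portable sketch. It is only a model of the substitution described in this thread, not the actual ntdll.dll routine: the helper names (is_lead_byte, simulate_ansi_decode) are made up for illustration, and it relies on Python's own cp932 codec to decide which byte pairs are invalid, which is an assumption and may not match Windows in every corner case.

```python
# Hypothetical model of the ANSI->Unicode substitution discussed above.
# This is NOT the real Windows conversion; it only mimics the observed result.

DEFAULT_CHAR = "\u30fb"  # katakana middle dot, the cp932 default character


def is_lead_byte(b: int) -> bool:
    """Lead-byte ranges for codepage 932 as given in the thread."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def simulate_ansi_decode(raw: bytes) -> str:
    """Decode cp932 bytes, replacing each invalid sequence with DEFAULT_CHAR."""
    out = []
    i = 0
    while i < len(raw):
        # A lead byte claims the following byte as its trail byte.
        width = 2 if is_lead_byte(raw[i]) else 1
        chunk = raw[i:i + width]
        try:
            out.append(chunk.decode("cp932"))
        except UnicodeDecodeError:
            # Invalid or truncated sequence: substitute the default character.
            out.append(DEFAULT_CHAR)
        i += width
    return "".join(out)


# 0xE0 is a lead byte and 0x35 ('5') is below 0x40, so the pair is invalid.
name = simulate_ansi_decode(b"\xe05")
roundtrip = name.encode("cp932")
print(ascii(name), roundtrip)
```

In this model the name created from b'\xe05' comes back as b'\x81E', so opening the file again with the original bytes path would look for a different name.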

[1]: https://msdn.microsoft.com/en-us/library/ee681827
[2]: https://msdn.microsoft.com/en-us/library/ff557749