A function that uses ctypes to call the underlying wide-character (Unicode) Windows APIs for getting command line arguments, and returns them as UTF-8 encoded strings.
import sys

def win32_utf8_argv():
    """Uses shell32.CommandLineToArgvW to get sys.argv as a list of UTF-8
    strings.

    Versions 2.5 and older of Python don't support Unicode in sys.argv on
    Windows, with the underlying Windows API instead replacing multi-byte
    characters with '?'.

    Returns None on failure.

    Example usage:

    >>> def main(argv=None):
    ...    if argv is None:
    ...        argv = win32_utf8_argv() or sys.argv
    ...
    """

    try:
        from ctypes import POINTER, byref, cdll, c_int, windll
        from ctypes.wintypes import LPCWSTR, LPWSTR

        # GetCommandLineW takes no arguments and returns the raw command
        # line as a wide-character string.
        GetCommandLineW = cdll.kernel32.GetCommandLineW
        GetCommandLineW.argtypes = []
        GetCommandLineW.restype = LPCWSTR

        # CommandLineToArgvW splits that string into an argv-style array.
        CommandLineToArgvW = windll.shell32.CommandLineToArgvW
        CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
        CommandLineToArgvW.restype = POINTER(LPWSTR)

        cmd = GetCommandLineW()
        argc = c_int(0)
        argv = CommandLineToArgvW(cmd, byref(argc))
        if argc.value > 0:
            # Remove Python executable if present
            if argc.value - len(sys.argv) == 1:
                start = 1
            else:
                start = 0
            return [argv[i].encode('utf-8') for i in
                    xrange(start, argc.value)]
    except Exception:
        pass
As of this writing, Python versions 2.5 and older (and perhaps newer versions) use the non-wchar (ANSI) APIs for populating sys.argv on win32, which inconveniently replace any characters that cannot be represented in the system code page with question marks (similar to unicode.encode(..., 'replace')).
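To make the question-mark behaviour concrete, here is a small sketch that imitates the lossy conversion with an explicit encode() call. It does not call any Windows API; the sample file name and the choice of cp1252 as the code page are assumptions made purely for illustration.

```python
# Illustration only: imitate the lossy conversion with an explicit encode().
# The sample name and the cp1252 code page are assumptions for this sketch.
original = u'caf\xe9 \u65e5\u672c.txt'

# Pure ASCII target: every non-ASCII character becomes '?'.
as_ascii = original.encode('ascii', 'replace')

# A Western European code page keeps the accented letter but still loses
# the two CJK characters.
as_cp1252 = original.encode('cp1252', 'replace')

# Decoding afterwards cannot bring the lost characters back.
round_trip = as_cp1252.decode('cp1252')

print(repr(as_ascii))
print(repr(as_cp1252))
print(round_trip == original)
```

Once a character has been replaced, the original text is gone, which is why the function goes back to the wide-character API instead of trying to repair sys.argv after the fact.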
For non-win32 systems and systems without ctypes, this simply returns None. If you're trying to debug this function, you may want to replace the "pass" in "except Exception:" with "raise".
This hasn't been tested on win64.
Python 2.5 file operations will use the wchar API correctly if you pass them a unicode object as the filename instead of a byte string. File operations that list directories behave similarly, e.g. if you pass them u'.' as the directory name you get unicode filenames, and if you pass '.' you get bytestrings.
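A quick way to see the directory-listing behaviour described here is to list the same directory twice, once with a text path and once with a byte-string path. This is only a sketch: it uses the b'' literal so that it runs on Python 2.6+ and Python 3 alike, and on Python 2 outside Windows a name that cannot be decoded may still come back as a byte string.

```python
import os

# A text (unicode) path asks for text file names back.
text_names = os.listdir(u'.')

# A byte-string path asks for byte-string file names back.
byte_names = os.listdir(b'.')

text_type = type(u'')  # unicode on Python 2, str on Python 3
all_text = all(isinstance(name, text_type) for name in text_names)
all_bytes = all(isinstance(name, bytes) for name in byte_names)

print(all_text, all_bytes)
```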
It should be possible to get the correct unicode string in a platform-independent way with filename = sys.argv[1].decode(sys.getfilesystemencoding()).
OK, I didn't read carefully enough. So you probably can't decode the bytestring arguments. Still, this function should return unicode objects rather than encoding the strings as UTF-8. Otherwise the results can't be used with file operations like open().
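One way to act on this suggestion would be a variant that skips the final encode step and hands back the wide strings as they come from ctypes. The sketch below is not the recipe author's code: the name win32_unicode_argv is hypothetical, the explicit platform check is an addition, and it has only been reasoned through rather than tested on every Windows and Python combination.

```python
import sys

def win32_unicode_argv():
    """Hypothetical variant: return the arguments as unicode strings.

    Returns None on non-Windows systems or on any failure.
    """
    if sys.platform != 'win32':
        return None
    try:
        from ctypes import POINTER, byref, c_int, windll
        from ctypes.wintypes import LPCWSTR, LPWSTR

        GetCommandLineW = windll.kernel32.GetCommandLineW
        GetCommandLineW.argtypes = []
        GetCommandLineW.restype = LPCWSTR

        CommandLineToArgvW = windll.shell32.CommandLineToArgvW
        CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
        CommandLineToArgvW.restype = POINTER(LPWSTR)

        argc = c_int(0)
        argv = CommandLineToArgvW(GetCommandLineW(), byref(argc))
        if argc.value > 0:
            # Same heuristic as the recipe: skip the interpreter if present.
            if argc.value - len(sys.argv) == 1:
                start = 1
            else:
                start = 0
            # ctypes already converts LPWSTR items to unicode strings,
            # so no encode() call is needed here.
            return [argv[i] for i in range(start, argc.value)]
    except Exception:
        pass
    return None

args = win32_unicode_argv() or sys.argv
```

The resulting strings could then be passed directly to open() or os.listdir(), which use the wchar API when given unicode objects.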