
A function that employs ctypes to call the underlying wide-character (Unicode) Windows APIs for getting command line arguments, returning them as UTF-8 encoded strings.

Python
import sys


def win32_utf8_argv():
    """Uses shell32.CommandLineToArgvW to get sys.argv as a list of UTF-8
    strings.

    Versions 2.5 and older of Python don't support Unicode in sys.argv on
    Windows: the underlying ANSI API replaces characters that the active
    code page cannot represent with '?'.

    Returns None on failure.

    Example usage:

    >>> def main(argv=None):
    ...    if argv is None:
    ...        argv = win32_utf8_argv() or sys.argv
    ...
    """

    try:
        from ctypes import POINTER, byref, c_int, windll
        from ctypes.wintypes import LPCWSTR, LPWSTR

        # Win32 API functions use the stdcall convention, so load both
        # DLLs through windll rather than cdll.
        GetCommandLineW = windll.kernel32.GetCommandLineW
        GetCommandLineW.argtypes = []
        GetCommandLineW.restype = LPCWSTR

        CommandLineToArgvW = windll.shell32.CommandLineToArgvW
        CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
        CommandLineToArgvW.restype = POINTER(LPWSTR)

        cmd = GetCommandLineW()
        argc = c_int(0)
        argv = CommandLineToArgvW(cmd, byref(argc))
        if argc.value > 0:
            # Skip the Python executable when the raw command line has
            # exactly one more entry than sys.argv (e.g. "python script.py").
            # Interpreter options such as -u are not accounted for.
            if argc.value - len(sys.argv) == 1:
                start = 1
            else:
                start = 0
            return [argv[i].encode('utf-8') for i in
                    range(start, argc.value)]
    except Exception:
        # Non-Windows platform, ctypes unavailable, or the API call failed.
        pass
    return None

At the time of writing, Python versions 2.5 and older (and perhaps newer versions) populate sys.argv on win32 through the non-wchar (ANSI) APIs, which inconveniently replace any character the active code page cannot represent with a question mark (similar to str.encode(..., 'replace')).
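To make the question-mark behaviour concrete, here is a small illustration of the same kind of lossy conversion done with str.encode(..., 'replace'). It does not call any Windows API. The file name and the choice of cp1252 as the code page are assumptions made only for the example.

```python
# -*- coding: utf-8 -*-
# Illustration only: mimics the lossy conversion, it does not touch sys.argv.

# Hypothetical argument: "caf\u00e9_" followed by two Japanese characters.
original = u'caf\u00e9_\u65e5\u672c.txt'

# cp1252 stands in for the active ANSI code page (an assumption).
# Characters it can represent survive; the others become '?'.
lossy = original.encode('cp1252', 'replace')

# The UTF-8 encoding used by win32_utf8_argv() keeps every character.
lossless = original.encode('utf-8')

print(repr(lossy))
print(lossless.decode('utf-8') == original)
```

Once a character has been replaced by '?' there is no way to recover it from the byte string, which is why the recipe goes back to the wide-character command line instead of trying to repair sys.argv.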

For non-win32 systems and systems without ctypes, this simply returns None. If you're trying to debug this function, you may want to replace the "pass" in "except Exception:" with "raise".

This hasn't been tested on win64.

2 comments

Martin Renold 10 years, 9 months ago

Python 2.5 file operations will use the wchar API correctly if you pass them a unicode object as the filename instead of a byte string. File operations that list directories behave similarly, e.g. if you pass them u'.' as the directory name you get unicode filenames, and if you pass '.' you get bytestrings.

It should be possible to get the correct unicode string in a platform-independent way with filename = sys.argv[1].decode(sys.getfilesystemencoding()).

Martin Renold 10 years, 9 months ago

OK, I didn't read carefully enough. So you probably can't decode the bytestring arguments. Still, this function should return a unicode object rather than encode the string as UTF-8. Otherwise it can't be used with file operations like open().
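Following up on that suggestion, here is a minimal sketch of a variant that returns unicode objects instead of UTF-8 byte strings. The name win32_unicode_argv is hypothetical and not part of the original recipe. Like the original it is untested on win64, and it keeps the same heuristic for skipping the Python executable.

```python
import sys


def win32_unicode_argv():
    """Hypothetical variant of win32_utf8_argv(): returns the command line
    arguments as unicode strings, or None on failure."""
    try:
        from ctypes import POINTER, byref, c_int, windll
        from ctypes.wintypes import LPCWSTR, LPWSTR

        GetCommandLineW = windll.kernel32.GetCommandLineW
        GetCommandLineW.argtypes = []
        GetCommandLineW.restype = LPCWSTR

        CommandLineToArgvW = windll.shell32.CommandLineToArgvW
        CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
        CommandLineToArgvW.restype = POINTER(LPWSTR)

        argc = c_int(0)
        argv = CommandLineToArgvW(GetCommandLineW(), byref(argc))
        if argc.value > 0:
            # Same heuristic as the recipe: drop the Python executable when
            # the raw command line has exactly one more entry than sys.argv.
            if argc.value - len(sys.argv) == 1:
                start = 1
            else:
                start = 0
            # No .encode() here: ctypes already hands back unicode strings
            # for LPWSTR values.
            return [argv[i] for i in range(start, argc.value)]
    except Exception:
        # Non-Windows platform, ctypes unavailable, or the API call failed.
        pass
    return None


if __name__ == '__main__':
    # Fall back to sys.argv where the wide-character API is unavailable.
    args = win32_unicode_argv() or sys.argv
    print(repr(args))
```

The returned strings can be passed straight to open() or os.listdir(), which use the wide-character file APIs when given unicode objects. Note that on Python 2 the sys.argv fallback still contains byte strings, so callers may see either type.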