File: mergeall-products/unzipped/test/test-path-normalization-3.3/prototype-recoding-oct22/py-split-join.py
r"""
=========================================================================
Mergeall demo script: prototype for path normalization.

Scan arbitrary paths one component at a time, on Windows and Unix.
This allows each component's NFC/NFD Unicode representation to be
normalized to match its counterpart in the same path in a target tree.

This coding is convoluted mostly to support wacky Windows paths and
Python's uneven handling of them.  os.path.join(), for example, doesn't
do Windows absolute (C:\xxx) and drive-relative (C:xxx) very well; both
split off a drive (C:), but a post-split join drops the root \ of the
absolute form, so both come back drive-relative (os.path is ntpath on
Windows and posixpath on Unix):

    >>> import ntpath as nt
    >>> nt.join(*r'C:\aaa\bbb'.split(nt.sep))
    'C:aaa\\bbb'
    >>>
    >>> nt.join(*r'C:aaa\bbb'.split(nt.sep))
    'C:aaa\\bbb'

A similar dissonance arises for the leading / in Unix absolute and
relative paths:

    >>> import posixpath as pt
    >>> pt.join(*'/aaa/bbb'.split(pt.sep))
    'aaa/bbb'
    >>> pt.join(*'aaa/bbb'.split(pt.sep))
    'aaa/bbb'

For all the gory background details on Windows path syntax, try this:
https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats

(Why the mess?  In general, Windows allows network storage to be
accessed by arguably ad-hoc and convoluted syntax rather than Unix-style
mount points in a uniform tree model, though network storage can
optionally and similarly be associated with Windows drive letters too.)

This coding handles almost everything, _except_ exceedingly rare Windows
"\\.\UNC\" and "\\?\UNC\" paths, which are officially not supported here
barring a user request.  These paths will generate a warning and skip
normalizing to the target device's format.

The only real alternatives are to never normalize paths, or to normalize
the entire path (not each component) and assume all components' Unicode
encoding flavors will be the same.  That is unsound, given that content
may pass through a mix of hosts and apps.

Context:

This path-walker procedure is used when about to delete a file whose
content-relative path is listed in the __added__.txt file of a deltas
or backup set.  The listed path is partial, and relative to the content
root folder in both the FROM and TO trees.  Because this path is not
absolute, it cannot contain drive or network specifiers; hence, we're
interested only in normalizing its component folder and file names to
match those in the destination-device tree.

Importantly, though, the NFC/NFD Unicode variants in path names in TO
may differ arbitrarily from those in __added__.txt, because the TO and
FROM trees may reside on different platforms, and may have been
processed by programs with arbitrary Unicode policies over their
lifespans.

Also note:

- The __added__.txt partial path has already had its separators changed
  for the target device, to make it portable.

- This test script doesn't care about too-long Windows paths, but the
  live code will; it wraps each path in FWP() to test existence.  Paths
  here have not yet had FWP() prefixing applied.

- The "TO" root path, which is prefixed to the __added__.txt path to
  yield the destination path we walk, may come in here as relative,
  absolute, or other; it originates in a command line, and hasn't been
  made absolute prior to this procedure.

Though complex, it's crucial to get this right, because the resulting
normalized path will be deleted in TO.  That said, most (if not all)
__added__.txt paths exist without normalization, and will simply bypass
this procedure entirely.  At least until they do not.
=========================================================================
"""

trace = lambda *a: None    # print or lambda *a: None


def walkparts(path, mod):
    """
    mod would be os.path in portable code
    it's ntpath or posixpath here to test both
    """
    print()
    print('==>', '"%s"' % path)
    trace('===', mod.abspath(path))       # to see what it really means
    trace('-->', path.split(mod.sep))     # preview the parts split list
    trace('~~~', mod.splitdrive(path))    # required for Windows shenanigans

    drive, rest = mod.splitdrive(path)
    if drive and rest.startswith(mod.sep):    # drive + rooted rest: abs or unc
        sofar = drive                         # rest starts with sep iff abs
        parts = rest.split(mod.sep)
        trace(':::', parts)
    else:
        sofar = ''                            # no drive, or drive-relative
        parts = path.split(mod.sep)           # drive stays in first component

    if parts[0] == '':                        # empty for abs path/rest, win+ux
        sofar += mod.sep                      # make join() work, skip the empty
        parts = parts[1:]

    while parts:
        next, *parts = parts
        newpath = mod.join(sofar, next)

        # test/mod 'next' extension to 'sofar' here
        print('...',
              '"%s" =' % newpath,
              '(%s) + [%s]' % (sofar, next),
              '<%s>' % mod.exists(newpath))
        sofar = newpath
    return sofar


def testWindowsPaths():
    print('\n\ntesting windows paths'.upper() + '=' * 40)
    import ntpath as mod

    paths = [
        r'C:\Users\lutz\file.txt',                      # absolute
        r'C:Users\lutz\file.txt',                       # relative to drive's cwd
        r'\Users\lutz\file.txt',                        # relative to current drive's root
        r'Users\lutz\file.txt',                         # relative to process cwd

        r'\\Server\Share\folder\file.ext',              # unc network shares, generally
        r'\\readyshare\USB_Storage\Temp\file.txt',      # a live samba server path

        r'\\.\C:\Users\lutz\file.txt',                  # device paths
        r'\\?\C:\Users\lutz\file.txt',                  # enable long paths

        r'\\127.0.0.1\c$\Users\lutz\file.txt',          # don't ask
        r'\\LOCALHOST\c$\Users\lutz\file.txt',

        r'\\.\UNC\127.0.0.1\c$\Users\lutz\file.txt',    # FAILS
        r'\\.\UNC\LOCALHOST\c$\Users\lutz\file.txt',    # FAILS
        r'\\?\UNC\LOCALHOST\c$\Users\lutz\file.txt',    # FAILS

        r'C:\Users\lutz\Desktop\..\file.txt',           # parent reference
        'C:\\',                                         # drive root
        'C:',                                           # drive relative
        '.',                                            # process cwd
        r'.\Users\lutz',                                # process-cwd relative
        r'c:\users\Lutz\fILE.txt',                      # case insensitive
        ''                                              # empty - cannot happen, but okay
    ]

    for path in paths:
        result = walkparts(path, mod)
        if path == '':
            print('~~~', result)        # '' yields '\' but impossible
        else:
            assert result == path       # all others should be as passed


def testUnixPaths():
    print('\n\ntesting unix paths'.upper() + '=' * 40)
    import posixpath as mod

    paths = [
        '/Users/lutz/file.txt',               # absolute
        'Users/lutz/file.txt',                # relative to process cwd
        '/Users/lutz/Desktop/../file.txt',    # parent reference
        '/',                                  # filesystem root
        '.',                                  # process cwd
        './Users/lutz',                       # process-cwd relative
        '/users/Lutz/fILE.txt',               # case sensitive (when run on Unix)
        ''                                    # empty - cannot happen, but okay
    ]

    for path in paths:
        result = walkparts(path, mod)
        if path == '':
            print('~~~', result)        # '' yields '/' but impossible
        else:
            assert result == path       # all others should be as passed


if __name__ == '__main__':

    # Go to the relative-paths root (or not)
    import os, sys
    if sys.platform.startswith('win'):
        os.chdir('C:\\')
    else:
        os.chdir('/')

    """
    ---------------------------------------------------------------------
    Test Windows and Posix paths on either Windows or Unix, by directly
    using ntpath and posixpath modules (os imports one as path, by host).

    Both modules do path parsing on either platform, but file existence
    will vary: most Windows paths won't exist on Unix, Unix paths will
    be treated as drive- or cwd-relative on Windows (where "/" == "\"),
    and case may matter on Unix but not Windows (subject to filesystems).
    User name may vary too; on Unix, did "su; ln -s me lutz" to equate.
    ---------------------------------------------------------------------
    """

    testWindowsPaths()
    testUnixPaths()
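
The walker above only marks the place where each component would be tested or changed ("test/mod 'next' extension to 'sofar' here"). As a rough sketch of what that step might look like, the code below matches each component of a content-relative path against the names actually present in the target folder, comparing both sides in NFC form. The names match_component and normalize_path, and the injectable listdir parameter, are hypothetical and exist only for illustration; this is not Mergeall's live code, and it assumes the simple relative-path case described in the docstring (no drive or network prefixes, no FWP() wrapping).

```python
import os
import unicodedata


def match_component(parent, name, listdir=os.listdir):
    """
    Hypothetical helper: return the entry of folder 'parent' that equals
    'name' exactly, or equals it after Unicode normalization; return None
    if the folder can't be listed or has no such entry.
    """
    try:
        entries = listdir(parent or os.curdir)
    except OSError:
        return None
    if name in entries:                        # common case: no change needed
        return name
    wanted = unicodedata.normalize('NFC', name)
    for entry in entries:                      # compare both sides as NFC
        if unicodedata.normalize('NFC', entry) == wanted:
            return entry                       # keep the target tree's spelling
    return None


def normalize_path(root, relpath, listdir=os.listdir, sep=os.sep):
    """
    Hypothetical walker: extend 'root' one component of 'relpath' at a
    time, swapping in the target tree's own spelling of each component.
    Returns the rebuilt path, or None if any component has no match.
    """
    sofar = root
    for part in relpath.split(sep):
        found = match_component(sofar, part, listdir)
        if found is None:
            return None
        sofar = os.path.join(sofar, found)
    return sofar
```

Returning the entry as it is spelled in the target folder, rather than a freshly normalized string, is the point of going component by component: each folder in TO may use a different Unicode form. The listdir parameter is there only so the logic can be exercised against a fake tree, since some filesystems normalize names on their own.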