File: mergeall-products/unzipped/docetc/miscnotes/pre-symlinks-code-jan2617/cpall.py

#!/usr/bin/python
# Python 3.X is recommended for trees with Unicode filenames
"""
################################################################################
Usage:
    [py[thon]] cpall.py dirFrom dirTo [-skipcruft] [-v] [-vv]
    
Recursive copy of a directory tree.  Works like a "cp -rp dirFrom/* dirTo"
Unix command, and assumes that dirFrom and dirTo are both directories.
Was written to get around fatal error messages under Windows drag-and-drop
copies (the first bad file ends the entire copy operation immediately),
but also allows for coding more customized copy operations in Python.
The "-skipcruft" option ignores (does not copy) dirFrom cruft files, as
defined by patterns in mergeall_configs.py.  "-v" and "-vv" change the copy's
verbose level to 1 (dirs) and 2 (dirs+files), from its default 0 (neither).

--------------------------------------------------------------------------------

CHANGE LOG:

FOR MERGEALL 2.0:

Copy stat info too
    Add shutil.copystat option to copyfile, to copy over the original's
    modtime (and other metadata) in addition to content.  This replaces an
    older money-patching approach.  Also for 2.0, add explicit file.close()
    calls, for use outside CPython.

FOR MERGEALL 3.0:

Windows long paths:
    Use fixlongpaths.OPEN() for long Windows file pathnames.  This avoids
    exceptions and skips during updates. 

Exception propagation:
    Allow exceptions to be propagated to caller, instead of printing error
    messages and continuing.  Required for properly cancelling a corresponding
    update when a backup copy fails.

Cruft files
    Filter out (and hence do not copy) files with names matching system cruft
    files defined in mergeall_configs.py if mode "-skipcruft" is used.  This is 
    required here for files in unique FROM dirs whose content is not inspected
    by mergeall itself before a bulk (atomic) copy to TO.  Tree copies for
    backups do not need to filter this way: they only copy data that was already
    in the TO tree.   This copytree() mode can be used when called from other
    programs too, though the cruft definition file is somewhat mergeall-specific
    as coded.

Cruft command-line arg
    Since it's already supported in call mode, added the "-skipcruft"
    command-line option to this script too, for use when run standalone.
    When used, this ignores (does not copy) metadata files like some other
    cut/paste and drag-and-drop copiers, but it's a switchable option here.
    Also added "-v" and "-vv" arguments for verbosity control.

Any error flag
    Set a global boolean to indicate that errors were reported at any point
    during a mergeall run, for a mergeall summary line.  mergeall handles its
    own errors, but tree copies gobble them here.

Mac lib error workaround
    Ignore EINVAL error num 22 ("Invalid argument") if it is raised by
    Python's shutil.copystat().  On Mac OS X, shutil.copystat() can fail
    this way due to an error raised by Mac libraries when trying to copy
    extended attributes with chflags() from a file on a Mac filesystem drive
    (e.g, HFS+) to a file on a non-Mac filesystem drive (e.g., FAT32, exFAT).
    
    This error occurs after all content has been copied, and then only in the
    final copystat() step after it has copied times correctly, so it's safe
    to ignore here in this isolated context.  We could check to ensure that
    modtimes match too, but that seems overkill, and requires ranges for FAT.
    
    Python's shutil should probably ignore this error too, though it may be a
    Mac bug (it also occurs at the shell for a "cp -p" command, which seems
    to create the attribute nonetheless). For more details and examples,
    see docetc/miscnotes/mac-chflags-error22.txt.  This arose because Mac's
    TextEdit adds an extended attribute for encoding type to .txt files...

Unix symlinks: copy, don't follow (and FIFOs okay)
    For Unix symbolic links to both files and dirs: copy the link itself,
    instead of following it.  Else, archives with intra-archive links will
    wind up with multiple copies of the linked data.  Assumes symlinks are
    relative (or else they may not work on a different machine) and uses
    py 3.3+'s follow_links to copy extra stat info from/to links themselves.

    Windows symlinks are complex and remain untested and TBD; they should
    work with the code here as is if the link's target exists.  Also note
    that FIFO files are False for both isfile() and isdir() (and the analogous
    calls in os.stat/lstat()), so they won't be copied unintentionally here.
    For background details, see docetc/miscnotes/demo-3.0-unix-symlinks.txt.
    
################################################################################
"""

from __future__ import print_function      # Added 2.X compatibility

import os, sys, shutil, errno
from fixlongpaths import OPEN              # [3.0] or 'as open', but too obscure 
from skipcruft import filterCruftNames     # [3.0] filter out metadata files

anyErrorsReported = False                  # [3.0] for summary-report indicator

maxfileload = 1000000                      # default file-copy size parameters
blksize = 1024 * 500



def copyinfo(pathFrom, pathTo):
    """
    Copy extra metadata (e.g., modtime) from pathFrom to pathTo, in
    addition to the data itself.  Most of the action here happens in
    Python's shutil module, but must allow a spurious EINVL err #22
    in copystat() to pass for Mac OS X; see [3.0] updates above.

    Also use follow_symlinks to process links themselves, when both
    from and to are links (instead of fetching and setting info from
    and to the link targets); this arg is available in py 3.3+ only.
    """
    if float(sys.version[:3]) < 3.3:                   # [3.0] don't follow
        follow = {}                                    # not available 3.2-
    else:
        follow = dict(follow_symlinks=False)

    try:
        shutil.copystat(pathFrom, pathTo, **follow)    # [2.0] copy modtime, etc
    except OSError as why:
        if why.errno != errno.EINVAL:       # [3.0] ignore err 22 on Macs: moot
            raise                           # propagate all other errnos/excs  



def copylink(pathFrom, pathTo, copystat=True):
    """
    Copy a symbolic link instead of following it.  For links to both files
    and dirs, this copies the symlink itself (the pathname of its link) to
    a new symlink, instead of copying the data that the symlink refers to.
    See [3.0] updates above for more on this extension and its purpose.
    """
    # caller handles all exceptions
    assert os.path.islink(pathFrom)

    if os.path.exists(pathTo):               # else os.symlink() will fail
        os.remove(pathTo)                    # e.g., if modtime has changed
        
    linkPath = os.readlink(pathFrom)         # from link's pathname str
    os.symlink(linkPath, pathTo)             # store pathname as new link
    if copystat:
        copyinfo(pathFrom, PathTo)           # copy extras after content



def copyfile(pathFrom, pathTo, maxfileload=maxfileload, copystat=True):
    """
    Copy one file pathFrom to pathTo, byte for byte.
    Uses binary file modes to supress Unicode decode and endline transform.
    [2.0] Add copystat() call as default, to copy original's metadata too
    [2.0] Recode for explicit close(); old: open(wb).write(open(rb).read())
    [3.0] Use extended OPEN() to support long file pathnames on Windows
    [3.0] Allow EINVL err #22 in copystat() to pass on Macs (see above)
    [3.0] Don't check to see if pathFrom is a real file here: assumes the
          caller won't pass symlinks if they are to be copied, not followed.
    """
    fileFrom = OPEN(pathFrom, 'rb')                  # need 'b' mode for both
    fileTo   = OPEN(pathTo,   'wb')                  # [2.0] open for explicit close
    try:
        if os.path.getsize(pathFrom) <= maxfileload:
            bytesFrom = fileFrom.read()              # read small files all at once
            fileTo.write(bytesFrom)
        else:                                        # read big files in chunks
            while True:
                bytesFrom = fileFrom.read(blksize)   # get one block, less at end
                if not bytesFrom: break              # empty after last chunk
                fileTo.write(bytesFrom)
    finally:
        fileTo.close()                               # [2.0] explicit for non-CPython
        fileFrom.close()                             # except or not (or with: eibti)

    if copystat:
        copyinfo(pathFrom, PathTo)                   # copy extras after content



def copytree(dirFrom, dirTo, verbose=0, strict=False, skipcruft=False):
    """
    Copy contents of dirFrom and below to dirTo, return (files, dirs) counts.
    verbose: 1=print directories, 2=also print files, 0=print neither.
    May need to use bytes for dirnames if undecodable on other platforms.
    May need to do more file-type checking on Unix: skip links, fifos, etc.
    Py 3.5+ os.scandir() may help here, but time is dominated by file copies.

    [3.0] If strict, reraise and exit all recursive levels immediately on
    on any first exception here.  mergeall backup copies pass True to cancel
    the update or delete on a backup copy failure.  mergeall non-backup
    callers instead allow this to print a message and continue the copy.
    
    [3.0] If skipcruft, skip cruft files in dirFrom.  This was added for
    mergeall bulk copies of folders to the TO drive, but can also be used
    in other programs, and when run from a command line with "-skipcruft".

    [3.0] Copy, instead of following, any symbolic links to files and dirs
    along the way.  Else may wind up copying files and dirs more than once.
    Also recode logic to rule out FIFOs, which are neither isfile nor isdir.
    """
    fcount = dcount = 0
    itemsHere = os.listdir(dirFrom)
    if skipcruft:
        itemsHere = filterCruftNames(itemsHere)             # [3.0] ignore cruft

    for filename in itemsHere:                              # for files/dirs here
        pathFrom = os.path.join(dirFrom, filename)
        pathTo   = os.path.join(dirTo,   filename)          # extend both paths
        if not os.path.isdir(pathFrom):                     # copy simple files
            try:
                if verbose > 1: print('copying file', pathFrom, 'to', pathTo)
                copyfile(pathFrom, pathTo)
                fcount += 1
            except:
                print('**Error copying', pathFrom, 'to', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
                anyErrorsReported = True   # [3.0] flag for summary line
                if strict: raise           # [3.0] reraise, else continue
        else:
            if verbose: print('copying dir ', pathFrom, 'to', pathTo)
            try:
                os.mkdir(pathTo)                            # make new subdir
                below = copytree(                           # recur into subdirs
                            pathFrom, pathTo,               # propagate excs up
                            verbose, strict, skipcruft)     
                fcount += below[0]                          # add subdir counts
                dcount += below[1]
                dcount += 1
            except:
                print('**Error creating', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
                anyErrorsReported = True   # [3.0] flag for summary line
                if strict: raise           # [3.0] reraise, else continue
    return (fcount, dcount)



def getargs():
    """
    Get and verify directory name arguments, returns default None on errors.
    """
    try:
        dirFrom, dirTo = sys.argv[1], sys.argv[2]
        assert all(arg in ['-skipcruft', '-v', '-vv'] for arg in sys.argv[3:])
    except:
        print('Usage error: '
              '[py[thon]] cpall.py dirFrom dirTo [-skipcruft] [-v] [-vv]')
    else:
        skipcruft = '-skipcruft' in sys.argv
        verbose = 2 if '-vv' in sys.argv else (1 if '-v' in sys.argv else 0)
        if not os.path.isdir(dirFrom):
            print('Error: dirFrom is not a directory')
        elif not os.path.exists(dirTo):
            os.mkdir(dirTo)
            print('Note: dirTo was created')
            return (dirFrom, dirTo, skipcruft, verbose)
        else:
            print('Warning: dirTo already exists')
            if hasattr(os.path, 'samefile'):
                same = os.path.samefile(dirFrom, dirTo)
            else:
                same = os.path.abspath(dirFrom) == os.path.abspath(dirTo)
            if same:
                print('Error: dirFrom same as dirTo')
            else:
                return (dirFrom, dirTo, skipcruft, verbose)



if __name__ == '__main__':
    """
    Stand-alone/command-line mode.
    cpall is useful both standalone and as callable functions;
    see mergeall's use of the latter to compare files and trees;
    """
    # [oct16] python/platform-specific current time (secs)
    import time
    gettime = (time.perf_counter if hasattr(time, 'perf_counter') else
              (time.clock if sys.platform.startswith('win') else time.time)) 

    # parse args, run copy
    argstuple = getargs()
    if argstuple:
        dirFrom, dirTo, skipcruft, verbose = argstuple
        print('Copying...')
        
        starttime = gettime()
        fcount, dcount = copytree(dirFrom, dirTo,
                                  skipcruft=skipcruft, verbose=verbose)
        tottime = gettime() - starttime

        dcount += 1  # for the root
        print('Copied', fcount, 'files,', dcount, 'directories', end=' ')
        print('in', tottime, 'seconds')



[Home page] Books Code Blog Python Author Train Find ©M.Lutz