File: mergeall-products/unzipped/docetc/miscnotes/pre-longpaths-code-mar0617/cpall.py

#!/usr/bin/python
# Python 3.X is recommended for trees with Unicode filenames and symlinks
"""
################################################################################
Usage:
    [py[thon]] cpall.py dirFrom dirTo [-skipcruft] [-v] [-vv]
    
Recursive copy of a directory tree.  Works like a "cp -rp dirFrom/* dirTo"
Unix command, and assumes that dirFrom and dirTo are both directories.
Was written to get around fatal error messages under Windows drag-and-drop
copies (the first bad file ends the entire copy operation immediately),
but also allows for coding more customized copy operations in Python.

The "-skipcruft" option ignores (does not copy) dirFrom cruft files, as
defined by patterns in mergeall_configs.py.  "-v" and "-vv" change the copy's
verbose level to 1 (dirs) and 2 (dirs+files), from its default 0 (neither).
Symlinks are always copied, not followed, to avoid redundant data copies.
Fifos and any other exotic non-file/dir types are unsupported and skipped.

--------------------------------------------------------------------------------
CHANGE LOG

FOR MERGEALL 2.0:

Copy stat info too
    Add shutil.copystat option to copyfile, to copy over the original's
    modtime (and other metadata) in addition to content.  This replaces an
    older money-patching approach.  Also for 2.0, add explicit file.close()
    calls, for use outside CPython.

FOR MERGEALL 3.0:

Windows long paths:
    Use fixlongpaths.OPEN() for long Windows file pathnames.  This avoids
    exceptions and skips during updates. 

Exception propagation:
    Allow exceptions to be propagated to caller, instead of printing error
    messages and continuing.  Required for properly cancelling a corresponding
    update when a backup copy fails.

Cruft files
    Filter out (and hence do not copy) files with names matching system cruft
    files defined in mergeall_configs.py if mode "-skipcruft" is used.  This is 
    required here for files in unique FROM dirs whose content is not inspected
    by mergeall itself before a bulk (atomic) copy to TO.  Tree copies for
    backups do not need to filter this way: they only copy data that was already
    in the TO tree.   This copytree() mode can be used when called from other
    programs too, though the cruft definition file is somewhat mergeall-specific
    as coded.

Cruft command-line arg
    Since it's already supported in call mode, added the "-skipcruft"
    command-line option to this script too, for use when run standalone.
    When used, this ignores (does not copy) metadata files like some other
    cut/paste and drag-and-drop copiers, but it's a switchable option here.
    Also added "-v" and "-vv" arguments for verbosity control.

Any error flag
    Set a global boolean to indicate that errors were reported at any point
    during a mergeall run, for a mergeall summary line.  mergeall handles its
    own errors, but tree copies gobble them here.

Mac lib error workaround
    Ignore EINVAL error num 22 ("Invalid argument") if it is raised by
    Python's shutil.copystat().  On Mac OS X, shutil.copystat() can fail
    this way due to an error raised by Mac libraries when trying to copy
    extended attributes with chflags() from a file on a Mac filesystem drive
    (e.g, HFS+) to a file on a non-Mac filesystem drive (e.g., FAT32, exFAT).
    
    This error occurs after all content has been copied, and then only in the
    final copystat() step after it has copied times correctly, so it's safe
    to ignore here in this isolated context.  We could check to ensure that
    modtimes match too, but that seems overkill, and requires ranges for FAT.
    
    Python's shutil should probably ignore this error too, though it may be a
    Mac bug (it also occurs at the shell for a "cp -p" command, which seems
    to create the attribute nonetheless). For more details and examples,
    see docetc/miscnotes/mac-chflags-error22.txt.  This arose because Mac's
    TextEdit adds an extended attribute for encoding type to .txt files...

Unix symlinks: copy, don't follow (and FIFOs are skipped)
    For Unix symbolic links to both files and dirs, always copy the link
    itself, instead of following it (i.e., copy the link's path, not the item
    it refers to).  Otherwise, archives with intra-archive links will wind up
    with multiple copies of the linked data, for both mergeall copies and
    backups.  This policy assumes symlinks are both relative and intra-archive,
    else they may not work on a different machine.

    In mergeall, the symlinks extension was coded as pretests to minimize
    impacts to existing code, and relies implictly on the fact that cpfile()
    and cptree() here were also augmented to check for and copy links up-front,
    before attempting to copy actual items.  The code here also uses py 3.3+'s
    follow_links, if present, to copy extra stat info from/to links themselves.

    Windows symlinks work with this code too, but require admin permission,
    and the portability of symlink paths between Windows and Unix is poor.
    Also note that FIFO files are False for _both_ isfile() and isdir() (and
    similar os.stat/lstat tools), so they won't be copied here unintentionally.
    For more background details, see these session logs:
    
      docetc/miscnotes/demo-3.0-unix-symlinks.txt
      docetc/miscnotes/demo-3.0-windows-symlinks.txt.
    
################################################################################
"""

from __future__ import print_function      # Added 2.X compatibility

import os, sys, shutil, errno
from fixlongpaths import OPEN              # [3.0] or 'as open', but too obscure 
from skipcruft import filterCruftNames     # [3.0] filter out metadata files

anyErrorsReported = False                  # [3.0] for summary-report indicator

maxfileload = 1000000                      # default file-copy size parameters
blksize = 1024 * 500



def copyinfo(pathFrom, pathTo):
    """
    ---------------------------------------------------------------------------
    Copy extra metadata (e.g., modtime) from pathFrom to pathTo, in
    addition to the data itself.  Most of the action here happens in
    Python's shutil module, but must allow a spurious EINVL err #22
    in copystat() to pass for Mac OS X; see [3.0] updates above.

    Also use "follow_symlinks" to process links themselves, when both
    from and to are links (instead of fetching and setting info from
    and to link targets).  In shutil, this arg is ignored for non-link
    items, and is available and used in Py 3.3 and later only.  Windows'
    os.utime() used by shutil.copystat() doesn't support this arg either,
    but shutil simply makes utime a no-op that ignores the arg and does
    not copy link modtimes (which is irrelevant for mergeall compares). 
    ---------------------------------------------------------------------------
    """
    # links, not their targets
    if float(sys.version[:3]) < 3.3:           # [3.0] don't follow links
        follow = {}                            # not available in py 3.2-
    else:                                      # ignored for nonlinks
        follow = dict(follow_symlinks=False)

    # copy modtime, etc.
    try:
        shutil.copystat(pathFrom, pathTo, **follow)
    except OSError as why:
        if why.errno != errno.EINVAL:       # [3.0] ignore err 22 on Macs: moot
            raise                           # propagate all other errnos/excs  



def copylink(pathFrom, pathTo, copystat=True, verbose=1):
    """
    ---------------------------------------------------------------------------
    Copy a symbolic link instead of following it.  For links to both files
    and dirs, this copies the symlink itself (the pathname of its link) to
    a new symlink, instead of copying the data that the symlink refers to.
    See [3.0] updates above for more on this extension and its purpose.

    Removes item at target if it's a link, else symlink() fails when target
    exists - unlike file open().write().  All other existing target types
    must be removed by the caller (e.g., mergeall removes dirs and files,
    and only ever calls this with an existing pathTo for link+link diffs).
    
    On Windows, links are type-specific.  os.symlink() gets type from the
    target if it exists (in TO, not FROM), else type defaults to file link
    unless target_is_directory=True is passed.  We need to pass this here,
    because there are multiple ways we may copy the link _before_ the dir
    when resolving a folder in mergeall.  This argument reflects the target
    in FROM, is ignored on Unix as of Py 3.3, and isn't present in Py 2.X.
    ---------------------------------------------------------------------------
    """
    # caller handles all exceptions
    assert os.path.islink(pathFrom)
    if verbose > 0:
        print('propagating symlink', pathFrom)

    # windows dir-link arg
    if (os.path.isdir(pathFrom) and          # not suported in 2.X
        sys.platform.startswith('win') and   # not okay on unix till 3.3
        int(sys.version[0]) >= 3):
        dirarg = dict(target_is_directory=True)
    else:
        dirarg ={}
                      
    # remove current link                    # lexists: link, not its target
    if os.path.lexists(pathTo):              # else os.symlink() will fail
        os.remove(pathTo)                    # e.g., if modtime has changed

    # copy linkpath over 
    linkPath = os.readlink(pathFrom)         # the from link's pathname str
    os.symlink(linkPath, pathTo, **dirarg)   # store pathname as new link
    if copystat:
        copyinfo(pathFrom, pathTo)           # copy extras after content



def copyfile(pathFrom, pathTo, maxfileload=maxfileload, copystat=True):
    """
    ---------------------------------------------------------------------------
    Copy one file pathFrom to pathTo, byte for byte.
    Uses binary file modes to supress Unicode decode and endline transform.
    [2.0] Add copystat() call as default, to copy original's metadata too.
    [2.0] Recode for explicit close(); old: open(wb).write(open(rb).read()).
    [3.0] Use extended OPEN() to support long file pathnames on Windows.
    [3.0] Allow EINVL err #22 in copystat() to pass on Macs (see above).
    [3.0] For symlinks to files or dirs, copy the link instead of following it.
    ---------------------------------------------------------------------------
    """
    if os.path.islink(pathFrom):                     # [3.0]: link to file (or dir)
        copylink(pathFrom, pathTo, copystat)         # copy link, don't follow it
        return                                       # minimize nesting
    
    fileFrom = OPEN(pathFrom, 'rb')                  # need 'b' mode for both
    fileTo   = OPEN(pathTo,   'wb')                  # [2.0] open for explicit close
    try:
        if os.path.getsize(pathFrom) <= maxfileload:
            bytesFrom = fileFrom.read()              # read small files all at once
            fileTo.write(bytesFrom)
        else:                                        # read big files in chunks
            while True:
                bytesFrom = fileFrom.read(blksize)   # get one block, less at end
                if not bytesFrom: break              # empty after last chunk
                fileTo.write(bytesFrom)
    finally:
        fileTo.close()                               # [2.0] explicit for non-CPython
        fileFrom.close()                             # except or not (or with: eibti)

    if copystat:
        copyinfo(pathFrom, pathTo)                   # copy extras after content



def copytree(dirFrom, dirTo, verbose=0, strict=False, skipcruft=False):
    """
    ---------------------------------------------------------------------------
    Copy contents of dirFrom and below to dirTo, return (files, dirs) counts.
    verbose: 1=print directories, 2=also print files, 0=print neither.
    May need to use bytes for dirnames if undecodable on other platforms.
    May need to do more file-type checking on Unix: skip links, fifos, etc.
    Py 3.5+ os.scandir() may help here, but time is dominated by file copies.

    [3.0] If strict, reraise and exit all recursive levels immediately on
    on any first exception here.  mergeall backup copies pass True to cancel
    the update or delete on a backup copy failure.  mergeall non-backup
    callers instead allow this to print a message and continue the copy.
    
    [3.0] If skipcruft, skip cruft files in dirFrom.  This was added for
    mergeall bulk copies of folders to the TO drive, but can also be used
    in other programs, and when run from a command line with "-skipcruft".

    [3.0] For symlinks to files or dirs, copy the link instead of following
    it.  The pretest here runs only at the top-level; nested links to dirs
    are grouped with simple files during the recursive traversal to avoid
    os.mkdir.  os.path.isfile()/isdir() both return True for real items and
    links to them.  Also recode logic to rule out FIFOs, which are neither
    isfile() nor isdir(); these are not counted as errors here - ok? (TBD).
    ---------------------------------------------------------------------------
    """
    if os.path.islink(dirFrom):           # [3.0]: link to dir (or file)
        copylink(dirFrom, dirTo)          # copy link, don't follow it
        return                            # minimize nesting

    fcount = dcount = 0
    itemsHere = os.listdir(dirFrom)
    if skipcruft:
        itemsHere = filterCruftNames(itemsHere)             # [3.0] ignore cruft

    for filename in itemsHere:                              # for files/dirs here
        pathFrom = os.path.join(dirFrom, filename)
        pathTo   = os.path.join(dirTo,   filename)          # extend both paths
        
        if os.path.isfile(pathFrom) or os.path.islink(pathFrom):
            # copy simple files, and links to files and dirs
            if verbose > 1: print('copying file', pathFrom, 'to', pathTo)
            try:
                copyfile(pathFrom, pathTo)                  # [3.0] file or link
                fcount += 1
            except:
                print('**Error copying', pathFrom, 'to', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
                anyErrorsReported = True   # [3.0] flag for summary line
                if strict: raise           # [3.0] reraise, else continue

        elif os.path.isdir(pathFrom):
            # copy entire folders: actual dirs, not links to them
            if verbose: print('copying dir ', pathFrom, 'to', pathTo)
            try:
                os.mkdir(pathTo)                            # make new subdir
                below = copytree(                           # recur into subdirs
                            pathFrom, pathTo,               # propagate excs up
                            verbose, strict, skipcruft)     
                fcount += below[0]                          # add subdir counts
                dcount += below[1]
                dcount += 1
            except:
                print('**Error creating', pathTo, '--skipped')
                print(sys.exc_info()[0], sys.exc_info()[1])
                anyErrorsReported = True   # [3.0] flag for summary line
                if strict: raise           # [3.0] reraise, else continue

        else:
            # fifo, or other non-file item: punt
            print('**Unsupported file type not copied:', pathFrom)
            
    return (fcount, dcount)



def getargs():
    """
    ---------------------------------------------------------------------------
    Get and verify directory name arguments, returns default None on errors.
    ---------------------------------------------------------------------------
    """
    try:
        dirFrom, dirTo = sys.argv[1], sys.argv[2]
        assert all(arg in ['-skipcruft', '-v', '-vv'] for arg in sys.argv[3:])
    except:
        print('Usage error: '
              '[py[thon]] cpall.py dirFrom dirTo [-skipcruft] [-v] [-vv]')
    else:
        skipcruft = '-skipcruft' in sys.argv
        verbose = 2 if '-vv' in sys.argv else (1 if '-v' in sys.argv else 0)
        if not os.path.isdir(dirFrom):
            print('Error: dirFrom is not a directory')
        elif not os.path.exists(dirTo):
            os.mkdir(dirTo)
            print('Note: dirTo was created')
            return (dirFrom, dirTo, skipcruft, verbose)
        else:
            print('Warning: dirTo already exists')
            if hasattr(os.path, 'samefile'):
                same = os.path.samefile(dirFrom, dirTo)
            else:
                same = os.path.abspath(dirFrom) == os.path.abspath(dirTo)
            if same:
                print('Error: dirFrom same as dirTo')
            else:
                return (dirFrom, dirTo, skipcruft, verbose)



if __name__ == '__main__':
    """
    ---------------------------------------------------------------------------
    Stand-alone/command-line mode.
    cpall is useful both standalone and as callable functions;
    see mergeall's use of the latter to compare files and trees;
    ---------------------------------------------------------------------------
    """
    # [oct16] python/platform-specific current time (secs)
    import time
    gettime = (time.perf_counter if hasattr(time, 'perf_counter') else
              (time.clock if sys.platform.startswith('win') else time.time)) 

    # parse args, run copy
    argstuple = getargs()
    if argstuple:
        dirFrom, dirTo, skipcruft, verbose = argstuple
        print('Copying...')
        
        starttime = gettime()
        fcount, dcount = copytree(dirFrom, dirTo,
                                  skipcruft=skipcruft, verbose=verbose)
        tottime = gettime() - starttime

        dcount += 1  # for the root
        print('Copied', fcount, 'files,', dcount, 'directories', end=' ')
        print('in', tottime, 'seconds')



[Home page] Books Code Blog Python Author Train Find ©M.Lutz