File: mergeall-products/unzipped/docetc/miscnotes/pre-longpaths-code-mar0617/backup.py

"""
==========================================================================================
backup.py:
  automatic backup/restore subsystem (part of the mergeall system [2.0])

Summary:

Make backup copies of all files and directories in the TO directory that will
be destructively replaced or deleted in-place during a mergeall run.  These 
items' prior versions in the TO tree are saved in the automatically-created 
__bkp__ folder at the top of the TO archive, with their full directory paths.
Backups are not synchronized across trees, but are automatically pruned by age 
when their number exceeds a limit.  This option makes mergeall generally safer,
as unwanted or failed changes can be later undone by restoring backup copies.

2.1 extends this model to support complete rollback of a prior run, by merging
a backup subfolder to an archive root.  It augments backups to also record files
added in a __bkp__\subfolder\__added__.txt file, and runs a special merge mode
which does not delete unique items in TO, but does delete all items in TO that
are listed in the backup's __added__.txt file (which is itself removed from TO
after the merge).  The net effect restores the TO tree's prior state, as long
as the restore run is made before any additional changes in TO.

Details:

Because a given item name may appear at multiple places in the archive tree,
replaced and deleted items are backed up in folders of this form (the mergeall
log gives a more linear list of changed items):

    TO-archive-root\__bkp__\date-time\full-archive-path-less-root\filename

Items in the __bkp__ folder are local to the TO archive copy, and save items changed in
that copy only.  Backup folders are not propagated to other archive copies by mergeall:
unlike normal archive items, they are not synchronized across trees.  Hence any changes
in __bkp__ folders are not themselves backed up.  Backup subfolders in __bkp__ are,
however, automatically pruned by age when their number exceeds a limit (changeable
in the code below).  This allows changes from any of multiple mergeall runs to be
selected for restores, if needed.

Any errors while writing a backup copy of an item causes the replacement or deletion
of that item to be skipped, as the operation is unsafe.  The item will register as
a difference on the next mergeall run, if not resolved manually.

Note that users can make arbitrary changes to their __bkp__ folders (including deleting
them altogether), as they are not synchronized and are created automatically whenever
needed.  For instance, if the default pruning limit results in too much space being
taken by __bkp__, you may delete portions of it freely.

Also note that __bkp__ folders may register differences in diffall.py runs.  As each
run has its own uniquely-named subfolders, these will usually be few, and can generally
be ignored.  Per-run folders' content differences may be useful, however, to compare and
analyze backup copies from different runs; run diffall on the subfolders themselves.

Purpose:

Though this option makes mergeall runs safer in general, it was primarily intended as an
automatic guard against propagating corrupted files across multiple archive copies, and
an better alternative to manual backup copies or multiple archive stores cycled by date
manually.  Merges generally works without issue if all but one archive copy are treated
as read-only; however, unwanted or incorrect changes may spread though archive copies
if changes are made in an archive copy and then synched back to the main copy.  In such
cases, the original's versions of files may be quickly lost, if not backed up.

By backing up items to be changed in each TO copy automatically and retaining multiple
runs' backups, bad copies can generally be undone easily by taking the most recent
backup's versions, without falling back to potentially much older full archive copies.
Backup folders also serve as a record of the most recent mergeall runs against a tree.

------------------------------------------------------------------------------------------

More about __bkp__ folders:

Stores __bkp__ folder at top of TO archive so it doesn't become disjoint from it, but
does not include __bkp__ in the mergeall synchronization -- its contents are local to
the TO archive, and not copied to/from other copies automatically.  This is so, because:

1) Propagating backups between copies in round-robin trips between three (or more)
devices seems potentially very confusing, and device source would be unclear.

2) Synchronizing would delete copes on TO not also on FROM, thereby limiting TO
from having more than one backup, in one-way propagation use cases.

3) Automatic backup pruning wouldn't be possible, as it violates the principle that
changes in the archive must all be disjoint during a single mergeall run.  If __bkp__
was included in the synchronization process, it's not impossible that one of its folders
on the TO archive would be both pruned by the backup system here and deleted due to an
absence in the FROM archive -- regardless of the timing, the second deletion attempt
would fail with an exception.

More about disjoint changes:

More generally, mergeall works at all only because its updates are mutually exclusive,
and disjoint (a given name can appear in just one change category).  Its changes are
limited to these items, in this order:

  a) Same-named files (replaced in TO)
  b) Unique files and directories in TO (deleted from TO)
  c) Unique files and directories in FROM (copied to TO)
  d) Same-named file/directory type mismatches between TO and FROM (replaced in TO)

Once any item is classified and changed in one of these categories, it is not
further inspected or altered.  Hence, each change in the TO tree is limited to
its TO tree location, and cannot be impacted by, or appear as part of, any other
changes category, regardless of the timing of changes made.  Order does matter for
renames on case-insensitive machines like Windows, but are sound as long as deletes
occur before adds (see mergeall.py's mergetrees() docstring for more details).

Backups rely on the fact that the change sets are disjoint: a file can be saved
for only one category (a, b, or d), and new copies (c) are never saved and thus
cannot overwrite saves for deletions, even for mixed-case differences on Windows.

Synchronizing the __bkp__ tree between archives would violate this disjointedness rule,
however, because automatic backup subfolder prunes may delete an item also scheduled for
deletion in category (b).  This is true even though all differences are detected before
the changes phase begins, and only removing __bkp__ items from the set of backed-up
items does not circumvent this; synchronization changes may still intersect with prunes.

Therefore, top-level __bkp__ folders are removed from the set of synchronized items
in the archives, and an archive copy retains only backups from mergeall runs where
it served the role of TO archive.  Backups can be manually copied into normal archive
folders to propagate them to other archive copies if desired, but their management
then becomes a user task; only items in the special-cased __bkp__ folders are
automatically pruned when their items count limit is exceeded.

------------------------------------------------------------------------------------------

General implementation notes:

Caveat: backup paths may be too long in some cases on some platforms.  The original file's
full archive path is recreated and required, because the same name may appear in more
than one place in the tree (and filenames created by concatenating path parts seem more
likely to exceed platform length limits).  

Does not store a single '.bkp' alongside original: would need to special-case to avoid
.bkp.bkp... accumulation, and a '.bkp' may be a valid to-be-archived user file name.

Uses copy instead of move (rename): need to retain original if code here fails, and also
want to retain the original's mod times in the copy.  os.rename also proved unreliable
on Windows, especially across devices.

All errors return with an exception: caller handles. In mergeall, the backup exception
here causes the removal or deletion to be skipped, as the operation is then unsafe.

Backs up files replaced or deleted, but does not backup new files added, as these
would be just redundant copies of new data.

Backup folder are excluded from the synchronization process by removing only top-level
__bkp__ items from os.listdir results in the comparisons procedure.  This proved more
reliable than os.rename moves to/from a temporary folder (see unused code ahead).

TBD: should backups be stored outside the archive itself?  Originally placed in __bkp__
at root of archive, and synchronized so that prior versions are not stored on just one
device only (which may fail).  The downside of this is that, when this option is used,
backups can accumulate quickly, and deletes are removed from the archive's actual folders
but linger on the device taking up space.  Could backup replaced files only, but that's
less secure.

     RESOLVED: later opted to store __bkp__ in the TO archive for association, but
     not synchronize it across other archive copies.  This was required, because
     prunes violated the disjoint updates rule which is the logical basis of mergeall.

TBD: should backups be pruned?  Currently, this is automatic, lest old backups
accumulate.  The downside of this is that the user may not want old backups removed.
As an alternative, the end user could be expected to manage backups, and could also be
expected to select the backups folder.  This would be flexible, but complicates the
command line and GUI with another directory choice, and deleting backup folders manually
seems a substantial extra admin task.

    RESOLVED: backups are stored in the automatically-created __bkp__ at the top of the
    TO archives, and are automatically pruned by age after N (default 10) copies have
    accumulated.  Given that __bkp__ is now local to each archive copy, this policy
    allows ample backup retention -- any of the last N runs can be unwound if needed.
    Users may also manually delete large backup folders before they are pruned for space.

TBD: __bkp__ folders will register differences in diffall.py runs for their top-level
per-run folders.  These can be simply ignored in reports.  diffall could skip __bkp__,
like mergeall, but this seems too tight a dependency between the two programs to enforce.

==========================================================================================
"""

from __future__ import print_function   # Py 2.X
import os, time, glob, shutil, sys, time, errno, stat  

import  mergeall    # for copyfile, copytree (from won't work here: recursive)
indent1 = '....'    # to distinguish messages here from main mergeall logic
indent2 = '____'    # rmdir retries: not just for pruning (also for uniqueto)


# [2.1] from configs file, unless absent or errors
try:
    from mergeall_configs import MAXBACKUPS 
except:
    MAXBACKUPS = 10   # keep up to this many backup folders in each archive copy's __bkp__

# [2.3] use UTF8 (or other) Unicode encoding for __added__.txt restore files in __bkp__,
# not platform default; these files contain filenames which may have arbitrary characters;
# 2.X's codecs.open files are always binary mode: must specialize line-ends here too;
ADDENC = 'utf-8'
if sys.version[0] == '3':
    unicode_open = open                         # what 3.X probably should have done?
    unicode_linesep = '\n'                      # 3.X text mode files expand \n as needed
else:
    import codecs
    unicode_open = codecs.open                  # 2.X compatibility (or use from...as...)
    unicode_linesep = os.linesep                # 2.X codecs always binary: no \n expansion

# make just 1 subfolder per run, on first file baked up;
# else may make 1 new subfolder for each second in run!
# now also used by noteadditions, to get run's subfolder:
# either backupitem or noteaddition may be called first;
datetimestamp = None

# try to prune just once per run, on first file backed up
pruned = False

# if -quiet, print backups message on first backup only [2.4]
firstbkpmsg = True



def makedirs_ifneeded(dirpath):
    """
    --------------------------------------------------------------------------------------
    [2.3] Run an os.makedirs() call portably on Python 3.X or 2.X to create any and all
    parts of a directory path as needed.  Only 3.X has the exists_ok flag to avoid an
    exception if a part already exists, and 2.X doesn't have 3.X's detailed exceptions.
    Alternative: could also call os.path.exists(dirpath) in an "if", but slightly slower.

    Note: Python's os.makedirs() in Lib\os.py is recursive, but probably doesn't need
    to be.  Because it scans a linear directory path, a simple loop should suffice, and
    yield simpler code (alas, "batteries included" means you get what's shipped in the
    box).  Recursion is really required only for arbitrary shapes, such as folder trees,
    and even then can be replaced with explicit stacks (see Learning Python 5E, p555-561).
    --------------------------------------------------------------------------------------
    """
    if sys.version[0] == '3':
        # python 3.X
        os.makedirs(dirpath, exist_ok=True)               # 'recursive' mkdir, as needed
    else:
        # python 2.X
        try:
            os.makedirs(dirpath)                          # 2.X has no exists_ok
        except OSError as E:                              # 2.X requires errno test
            import errno                                  # only need here on exc
            if E.errno != errno.EEXIST: raise             # reraise all others



def backupitem(pathto, toroot, dobackup, quiet):
    """
    ----------------------------------------------------------------------------------------
    If enabled by command-line (or by proxy via GUI toggle or console reply), make a backup
    copy of items (files and directories) in the TO tree before they are destructively
    replaced or deleted in-place.  This includes a:

    1) File or link in the TO tree
           that is about to be replaced by a newer same-named file in the FROM tree
    2) File or directory in the TO tree
           that is about to be deleted due to absence in the FROM tree.
    3) File, directory, or link in TO
           that is about to be replaced due to a dir/file/link type mismatch in the FROM tree

    Unique files in FROM copied to TO are not backed up, as this is not a destructive action,
    though they are listed by noteaddition in the backup's __added__.txt file for the run.
    
    "pathto" is where the item to be destroyed resides; "toroot" is the top of the archive in
    the TO tree, where the backup will be stored in a __bkp__ subfolder under pathto's tail.
    Does not backup if not "dobackup": backups not enabled per user or rollback.py script.
    Prints only one general backup log message if "quiet", to mimimize logfile clutter.
    
    Either this or noteaddition may be called first: prune + timestamp if not yet done.
    No longer needed, because __bkp__ not synched: "if pathto.startswith(bkproot): return".

    [3.0] Use newly-added "strict: arg to copytree() to force it to propagate its first
    file exception to here, instead of printing an error message and continuing.  We need
    to pass the eception to this function's caller, so the item's update is cancelled if
    its backup fails.  Else, we might delete a TO folder without backing up parts of it!
    
    [3.0] Don't use copytree()'s new 'skipcruft' here: we're backing up data already
    present in the TO tree.

    [3.0] Handle symlinks (to both files or dirs) by passing them to copyfile() just
    like normal/simple files; copyfile() tests for links up-front and copies specially.
    Symlinks are copied, not followed: backup the link itself, not what it refers to!
    ----------------------------------------------------------------------------------------
    """
    global firstbkpmsg

    # do nothing if "-backup" not passed by command, launcher, or rollback.py
    if not dobackup:
        return  # avoid nesting

    assert pathto.startswith(toroot)       # sanity check: changed file must be in TO tree
    todir, tofile = '?', '?'               # initialize for early exceptions
    
    try:    
        # prune old backups the first time through here (or noteaddition)
        try:
            prunebkpdirs(toroot)
        except:
            print(indent1 + 'prune failed, but backups and mergeall continued')
            print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

        # make run's subfolder timestamp first time here (or noteaddition)
        datetimestamp = makeruntimestamp()
        
        # verify or create the backup-to path in the TO copy
        bkproot = os.path.join(toroot, '__bkp__')                   # toroot is cmdargs.dirto
        todir, tofile = os.path.split(pathto)
        archtail = todir[(len(toroot) + len(os.sep)):]              # remove prefix=cmdargs.dirto
        bkppath = os.path.join(bkproot, datetimestamp, archtail)
        makedirs_ifneeded(bkppath)                                  # 'recursive' mkdir, as needed

        # copy file or dir over to backup copy 
        copytopath = os.path.join(bkppath, tofile)
        if not quiet:
            print(indent1 + 'backing up %s to %s' % (tofile, copytopath))
        elif firstbkpmsg:
            # [2.4] suppress per-file messages (superfluous?), but indicate backups mode once
            allbkpsroot = os.path.join(bkproot, datetimestamp)
            print(indent1 + 'backing up all items to %s' % allbkpsroot)    # [3.0] not bkppath!
            firstbkpmsg = False
        
        if os.path.isfile(pathto) or os.path.islink(pathto):      # [3.0] treat links like files
            mergeall.copyfile(pathto, copytopath)                 # this never catches excs
            
        elif os.path.isdir(pathto):
            os.mkdir(copytopath)
            mergeall.copytree(pathto, copytopath, strict=True)    # [3.0] fail on any except

        else:
            assert False, ('unknown file type: ' + pathto)        # e.g. fifos: punt            

    except:
        print(indent1 + '**Error backing up %s in %s' % (tofile, todir))
        raise   # reraise: handle in caller - cancel update, as it would be destructive



def makeruntimestamp():
    """
    ---------------------------------------------------------------------------------------
    Make the run's unique timestamp, to be used for its subfolder name in the TO tree's
    __bkp__ backups folder.  Factored to here, as this may be triggered by either backitem
    or noteaddition, either of which may be called first during a mergeall run.
    ---------------------------------------------------------------------------------------
    """
    global datetimestamp
    
    if not datetimestamp:
        datetimestamp = time.strftime('date%y%m%d-time%H%M%S')  # backup's unique top dir name
    return datetimestamp



def prunebkpdirs(toroot, maxbackups=MAXBACKUPS):
    """
    ---------------------------------------------------------------------------------------
    On first backup in session, auto-delete the oldest backup dir(s) in the TO archive
    if needed, keeping just the most recent N.  Most of this was adapted from the frigcal
    GUI's backup system.  TBD: this could be left to user, but seems likely to accumulate.
    Caller handles any exceptions here: this pre-merge step shouldn't be fatal - proceed.

    [3.0] Recoded to skip a failed directory and continue the prune to process other
    folders, instead of ending the prune at the first failure.  Else, this may miss
    folders in the unlikely event that a failure of a more recent backup folder (e.g.,
    permissions) prevents the prune from reaching earlier backups later on the list...
    which can only ever happen if the #backups has been reduced in the configs file.
    Users must address the failure to allow the failing folder to be pruned eventually.
    Callers are still notified with an exception, but details will be displayed here.
    ---------------------------------------------------------------------------------------
    """
    global pruned
    
    if pruned:
        return
    else:
        pruned = True
        bkproot = os.path.join(toroot, '__bkp__')
        if os.path.exists(bkproot):
            
            backuppatt  = 'date*-time*'
            currbackups = glob.glob(os.path.join(bkproot, backuppatt))
            currbackups.sort(reverse=True)
            prunes = currbackups[(maxbackups - 1):]              # earliest last, via names sort

            anyfailed = False
            for prunee in prunes:                                # globs have full paths
                print(indent1 + 'pruning', prunee)               # normally 0 or 1, unless failed
                try:
                    shutil.rmtree(prunee, onerror=rmtreeworkaround)
                except Exception:
                    anyfailed = True
                    print(indent1 + 'this prunee failed, but pruning continued')
                    print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))                    

            assert not anyfailed, 'Some prunes had errors'       # [3.0] notify caller of excs



def noteaddition(pathto, toroot, dobackup):
    """
    --------------------------------------------------------------------------------------
    [2.1] Log unique FROM items (files and dirs) added to the TO tree in a text file,
    with 1 file path (relative to the TO tree's root) per line.  The logged items are
    stored in file: toroot\__bkp__\subfolder\__added__.txt.  This allows additions to be
    automatically removed by "-restore" as part of a later run to perform a complete
    rollback.  The adds file also serves as run documentation, in addition to logfiles.
    Either this or backupitem may be called first: prune + timestamp if not yet done.
    --------------------------------------------------------------------------------------
    """
    if not dobackup:
        return  # avoid nesting

    try:
        # prune old backups the first time through here (or backupitem)
        try:
            prunebkpdirs(toroot)
        except:
            print(indent1 + 'prune failed, but note-add and mergeall continued')
            print(indent1 + '%s %s' % (sys.exc_info()[0], sys.exc_info()[1]))

        # make run's subfolder timestamp first time here (or backupitem)
        datetimestamp = makeruntimestamp()

        # build and make the adds file's path 
        addsname = '__added__.txt'
        bkproot = os.path.join(toroot, '__bkp__')                    # toroot is cmdargs.dirto
        runroot = os.path.join(bkproot, datetimestamp)               # bkp subfolder for this run
        addspath = os.path.join(runroot, addsname)                   # adds file in run's subfolder
        makedirs_ifneeded(runroot)                                   # 'recursive' mkdir, as needed

        # write copied item's relative path to adds file
        # [2.3] use utf8 for filenames, not platform default
        archtail = pathto[(len(toroot) + len(os.sep)):]              # remove prefix=cmdargs.dirto
        addsfile = unicode_open(addspath, encoding=ADDENC, mode='a') # add to end of file
        try:                                                         # OLD: use default encoding in 3.X
            addsfile.write(archtail + unicode_linesep)               # 3.X expands \n, 2.X does not 
        finally:
            addsfile.close()

    except:
        # do NOT reraise: no need to cancel the update, as adds are non-destructive
        print('**Error noting add of %s' % pathto)
        
    else:
        # don't issue a trace mesage here, as it seems gratuitous in the logs (?)
        # print(indent1 + 'noted addition in TO of', archtail)
        pass



def removeprioradds(fromroot, toroot):
    """
    --------------------------------------------------------------------------------------
    [2.1] Remove items listed in fromroot's __added__.txt file from toroot tree, if
    __added__.txt and the listed files are present, as part of a complete rollback from
    backups.  This is a pre-merge step: order matters for renames on Windows (must delete
    and then add, else delete may remove different cased name).

    Assumes fromroot is a __bkp__ subfolder (or at least has an __added__.txt),
    but does not fail if not -- in all cases, ignore exceptions here.  The user may have
    deleted __added__.txt to back out removals+replacements only (not adds), and may have
    created a custom __added__.txt elsewhere in another tree to be merged.

    We need to care about closing the file on exceptions; this is now a pre-merge step.
    The __bkp__ folder's __added__.txt will be copied over to the TO root by the normal
    merge; it's deleted manually later, rather than skipping the name during the merge.

    CAVEAT: because noteaddition() records added items using the path syntax of the
    platform on which the prior mergeall ran, it's not possible to remove prior
    additions on a different platform having incompatible path syntax without editing
    either the additions file or the code here.  A file added on Windows, for example,
    will be noted with "\" in its path, which likely won't work in a restore on Linux.
    This could be addressed by always using "/" in additions file paths and running
    os.path.normpath() to convert to "\" on Windows only, but this seems a rare use case. 
    As is, restores with additions should be run on the same platform as the prior merge.

      [3.0] UPDATE: this former caveat has been lifted, by converting paths in the
      backup's __added__.txt file for the hosting platform's separators.  Thus,
      backups made on Windows can now be restored on Windows, and vice versa.

    [3.0] handle symlinks (to both files or dirs) by passing them to os.remove() just
    like normal/simple files; shutil.rmtree() has issues with symlinks at the top;
    symlinks are removed, not followed: delete the link itself, not what it refers to!

    [3.0] changed arguably-confusing message format here from "restore removed file:"
    to the more consistent "removed added file:" -- it was removed, not restored.
    --------------------------------------------------------------------------------------
    """
    addsname = '__added__.txt'
    addspath = os.path.join(fromroot, addsname)
    numfilesdel = numdirsdel = 0
    if os.path.exists(addspath) and os.path.isfile(addspath):
        # [2.3] use utf8, not platform default
        addsfile = unicode_open(addspath, encoding=ADDENC)  # propagate open() exceptions: cancel merge
        try:                                                # OLD: adds file uses default encoding in 3.X
            while True:                                     # decodes can fail - catch via while, not for
                try:
                    line = addsfile.readline()
                except:
                    print('**Error: restore cannot read added file name: file retained')
                    print(sys.exc_info()[0], sys.exc_info()[1])
                    continue
                else:
                    if not line: break  # eof

                # [3.0] make restore paths portable
                delpath = line.rstrip()
                delpath = delpath.replace('/', os.sep).replace('\\', os.sep)                    
                delpath = os.path.join(toroot, delpath)
                
                if os.path.isfile(delpath) or os.path.islink(delpath):    # [3.0] +symlinks
                    try:
                        os.remove(delpath)
                    except:
                        print('**Error: restore cannot delete file, retained:', delpath)
                        print(sys.exc_info()[0], sys.exc_info()[1])
                    else:
                        numfilesdel += 1
                        print(indent1 + 'removed added file:', delpath)

                elif os.path.isdir(delpath):
                    try:
                        shutil.rmtree(delpath, onerror=rmtreeworkaround)
                    except:
                        print('**Error: restore cannot delete dir, retained:', delpath)
                        print(sys.exc_info()[0], sys.exc_info()[1])
                    else:
                        numdirsdel += 1
                        print(indent1 + 'removed added dir:', delpath)

                else:
                    print('**Error: restore skipped missing or unknown type file:', delpath)
        except:
            print('**Error during prior adds removal')    # others? don't reraise: do merge
            print(sys.exc_info()[0], sys.exc_info()[1])
        finally:
            addsfile.close()                              # close, except or not (non-CPython)
    return (numfilesdel, numdirsdel)                      # sums: add to merge's delete counts



def dropaddsfile(toroot):
    """
    -------------------------------------------------------------------------------
    In "-restore" mode, as a post-merge step get rid of the __added__.txt that
    the normal merge may have copied over to TO's root.  This is a special
    case, but it's quicker to drop it forcibly from the root here than to check
    for it as a skipped filename at each tree level during the merge (though 
    the merge's code supports skipping __bkp__ at top, __added__.txt otherwise).
    -------------------------------------------------------------------------------
    """
    addsname = '__added__.txt'
    mergedaddspath = os.path.join(toroot, addsname)
    if os.path.exists(mergedaddspath):
        os.remove(mergedaddspath)
        return True
    else:
        return False   # don't adjust merge's counters or print message



def rmtreeworkaround(function, fullpath, exc_info):
    """
    ---------------------------------------------------------------------------------------
    Catch and try to recover from failures in Python's shutil.rmtree() folder
    removal tool, which calls this function automatically on all system errors.
    ---------------------------------------------------------------------------------------
    [2.0] PENDING DELETION FILURES:
    
    On Windows, deletes may be marked as pending, and not finalized atomically,
    leaving an item in place after the delete call returns.  This can cause
    rmtree (shutil or custom) operations to fail with a directory-not-empty
    error in rare cases, subject to devices and other activity on the machine.

    This seems a shortcoming (bug?) in shutil.rmtree for Windows, and may be
    improved in the future.  In fact, Python's own test system uses a custom
    rmtree with wait loops to avoid the issue.  Here, update failures are mostly
    harmless (leaving a difference to be resolved on the next run), and rare
    (seen on only 1 machine in 1 year's usage), but errors are better avoided.

    Short of low-level C API possibilities, the two solutions seem to be to move
    (os.rename) to a temp file and delete from there, or fall into a brief wait loop
    to watch for the file removal to be finalized.  The former is subject to some
    os.rename oddness (see ahead), and the latter is used in Python's own test system
    for rmtree calls.  Adopt the latter here -- this function is a callback on errors
    in shutil.rmtree(), which retries the rmdir in a wait but bounded loop.

    Note that this applies to, and is used by, _both_ backup folder removals here
    and general archive tree removals in mergeall.py for unique dirs in the TO
    tree; it's here because it was first observed during backup folder removals.
    The fix here copies Python's test system's wait timing technique of exponentially
    increasing delay times up to half a second.  Usually 0 or 1, but at most 10,
    retries are run, with delays from .001 to .512 seconds (to see how this is
    computed, run code [x = 0.001, while x < 1.0: print(x); x *= 2]).
    
    Caveats: Deletes that are only pending seem a curious property for a filesystem,
    and this fix feels hackish.  But this is harmless (it kicks in only on os.rmdir
    failures, adding a minor delay), and there's no budget for further research...
    Could watch for not empty (ENOTEMPTY) only, but other errors are not inconceivable.

    Related threads (though something more authoritative from Microsoft would be nice):
    http://stackoverflow.com/questions/3764072/c-win32-how-to-wait-for-a-pending-delete-to-complete
    http://bugs.python.org/issue19811

    Note: shutil.rmtree could also be replaced with the following (sans some Unix cruft):
        for (root, dirs, files) in os.walk(top, topdown=False):
            for name in files:
                os.remove(os.path.join(root, name))
            for name in dirs:
                os.rmdir(os.path.join(root, name))
    but this doesn't fix the Windows pending-deletes issue, and may be less robust
    and portable than shutil's time-honed alternative.
    ---------------------------------------------------------------------------------------   
    [3.0] READ-ONLY PERMISSION FAILURES:
    
    As a different issue, rmtree operations can also fail due to read-only files in
    the tree.  To work around this, an onerror handler like the following can be used,
    which is portable to both Unix-en and Windows, and works like a Unix "rm -rf":

    import stat
    def onerror(func, path, exc_info):           # this is portable code 
        if not os.access(path, os.W_OK):         # read-only permission?
            os.chmod(path, stat.S_IWRITE)        # change to allow writes
            func(path)                           # and retry operation 
        else:
            raise

    This workaround was also incorporated into the general onerror handler below as
    a first step, before attempting the retry loop described above.  However, this
    code is currently DISABLED (via the False), because it makes no sense to override
    read-only permissions in this single context only (what about simple file deletes?),
    and the user may have marked an item read-only on purpose to protect it.  Instead,
    users are expected to fix their read-only permissions and run mergeall again.

    It can be argued that permission changes may be okay in this context, because
    users understand that mergeall intends to remove items, and read-only items will
    leave trash behind if not fixed.  Moreover, files on Windows are sometimes marked
    read-only without any user action (e.g., camera card copies), causing rmtree to
    fail unexpectedly.  This argument was rejected in the end, in favor of mergeall's
    overarching policy that your data is your property; read-only files should never
    be deleted without user intervention, even if this incurs extra manual steps.
    ---------------------------------------------------------------------------------------
    """

    # try to fix read-only errors first [3.0]
    if False:
        # PUNT: why in this context only?
        if (not os.access(fullpath, os.W_OK)
            and function in [os.rmdir, os.remove, os.unlink]):
            try:
                print(indent2 + 'fixing read-only on', fullpath)
                os.chmod(fullpath, stat.S_IWRITE)
                function(fullpath)
            except:
                pass    # fail: try other workaround below (or not? TBD)
            else:
                return  # okay: this fix worked, proceed with rmtree

    # then try this: Windows only, directory deletes only [2.0]
    if sys.platform.startswith('win') and function == os.rmdir:
        timeout = 0.001
        while timeout < 1.0:                     # 10 tries only, increasing delays
            print(indent2 + 'retrying rmdir')    # set off, but not just for pruning
            try:
                os.rmdir(fullpath)               # rerun the failed delete
            except os.error as exc:
                if exc.errno == errno.ENOENT:    # no such file (not empty=ENOTEMPTY) 
                    return                       # it's now gone: proceed with rmtree
                else:
                    time.sleep(timeout)          # wait for a fraction of second (.001=1 msec)
                    timeout *= 2                 # and try again, with longer delay
            else:
                return                           # it's now gone: proceed with rmtree

    raise  # all other cases, or wait loop end: reraise exception to kill rmtree caller




'''
==============================================================================================
THE FOLLOWING FUNCTIONS ARE NO LONGER USED (but retained as examples and lessons)

Instead of the following, restructured mergeall's recursive comparison algorithm to
skip '__bkp__' items in the top-level os.listdir result only.

Testing in Python 3.X showed os.rename to be unreliable.  On Windows, it fails when
the directories are on different devices (e.g., a USB stick and C:, possibly due to
differing file systems).  It also generated unexplainable permission errors on one
Windows test machine, even when the source and destination were on the same file
system.  As recursive copies and deletes are slow, recoded comparisons to skip
the folders in pure Python code instead.  The shutil.move call tries os.rename and
falls back on copy+delete too, but it's prone to the same issues seen here.
==============================================================================================

import tempfile, stat


def excludebkpdirs(toroot, fromroot):
    """
    ---------------------------------------------------------------------------------------
    Remove both archive's __bkp__ dirs from consideration, before diffs detection begins.
    To avoid complicating and slowing (or rewriting) change detection, simply move
    (rename) these out to a temp dir, and restore them after change detection finishes.
    They will not register changes, and so won't be propagated to any other archives.
    
    This and restorebkpdirs are coded fairly defensively, as this requires system calls;
    os.rename has been seen to fail for a true temp dir on Windows due to permissions
    (for no readily apparent reason...), so resort to program's cwd as a fallback option.
    
    On Windows, the destination of os.rename cannot exist, even for dirs; use new subdirs.
    mkdtemp adds random 6-character sequences to dir name till unique; add pid to be sure.
    Caveat: this may run up against directory path-length limits on some platforms?
    ---------------------------------------------------------------------------------------
    """
    global tempdir, temptobkp, tempfrombkp
    tempdir = temptobkp = tempfrombkp = None
    exists, join = os.path.exists, os.path.join
    try:
        tobkp   = join(toroot, '__bkp__')
        frombkp = join(fromroot, '__bkp__')
        if exists(tobkp) or exists(frombkp):          
            try:
                tempdir = tempfile.mkdtemp(prefix='mergeall-', suffix=str(os.getpid()))
                if sys.platform.startswith('win'):
                    os.chmod(tempdir, stat.S_IWRITE)   # may require force writeable?
                open('temp.txt', 'w').write('try rename\n')
                try:
                    os.rename('temp.txt', join(tempdir, 'temp.txt'))
                except:
                    os.remove('temp.txt')
                    raise  # reraise
                else:
                    os.remove(join(tempdir, 'temp.txt'))
            except:
                print('using cwd as temp dir fallback')         # show sys.exc_info()[0,1]?
                print(sys.exc_info()[0], sys.exc_info()[1])
                tempdir = os.getcwd()                           # temp unusable; or os.curdir

            if exists(tobkp):
                print('excluding', tobkp)
                temptobkp = join(tempdir, 'to.__bkp__')
                os.rename(tobkp, temptobkp)                     # quick move, not copy

            if exists(frombkp):
                print('excluding', frombkp)
                tempfrombkp = join(tempdir, 'from.__bkp__')
                os.rename(frombkp, tempfrombkp)                 # either can exist or not
    except:
        print('Cannot move __bkp__ to temp: rerun after manually moving out of archive')
        assert False, 'mergeall changes cancelled'



def restorebkpdirs(toroot, fromroot):
    """
    ---------------------------------------------------------------------------------------
    Restore __bkp__ folders from temp dir, after diffs detection, and before changes begin.
    See excludebkpdirs above for more details.
    ---------------------------------------------------------------------------------------
    """
    global tempdir, temptobkp, tempfrombkp
    join = os.path.join
    try:
        if temptobkp:
            tobkp = join(toroot, '__bkp__')
            print('restoring', tobkp)
            os.rename(temptobkp, tobkp)
            
        if tempfrombkp:
            frombkp = join(fromroot, '__bkp__')
            print('restoring', frombkp)
            os.rename(tempfrombkp, frombkp)
    except:
        print('Cannot restore __bkp__ from temp, changes cancelled: restore from %s' % tempdir)
        assert False, 'mergeall changes cancelled'
    else:
        if tempdir != None and tempdir != os.getcwd():    # not if fallback to cwd!
            os.rmdir(tempdir)
    
    
    
def isbkpdir(path, archroot):
    """
    ---------------------------------------------------------------------------------------
    Original idea: Call this from mergeall to skip a TO or FROM __bkp__ path during diffs
    detection phase.  Because these are skipped, they won't trigger any changes in the
    changes phase.  This is required to avoid including __bkp__ in synchronization (see top
    docstring).  Later replaced with os.rename moves, which was later replaced with
    comparison recoding.
    ---------------------------------------------------------------------------------------
    """
    bkproot = os.path.join(archroot, '__bkp__')
    return os.path.normpath(path).startswith(os.path.normpath(bkproot))  # equate / and \; case?

    # that is...
    #return path[(len(archroot) + len(os.sep)):] == '__bkp__'
'''

    



[Home page] Books Code Blog Python Author Train Find ©M.Lutz