[icon]

tagpix — Combine Your Photos for Easy Access

This is the tagpix user guide. It includes an overview, usage instructions, and version changes. Whether you consider yourself a programmer or end user, you'll find resources to help get you started organizing photos with tagpix here.

Because tagpix by default moves and renames photos, users are encouraged to read this guide first—especially its caution—before running tagpix on valued photo collections. For this program's license and author, see its main script. For screenshots, click the image above. For code and examples, see the install folder. To download this program, visit its web page.

Overview

tagpix is a photo organizer that merges and labels your photos for convenient access. It collects, renames, and sorts them into a normalized folder structure, resolving duplicate content and filenames automatically in the process. This section introduces the basics of its roles and operation.

Why tagpix?

If your digital photo collection has become scattered over many folders; uses filenames that are not unique because of their origin on multiple cameras; hosts modification dates that reflect retouches instead of events; or contains arbitrary duplicates, tagpix may be the photo-organizing tool you've been looking for. Running it on your photo folders transforms them into a simple, uniform format that's ideal for both viewing and archiving.

Just as importantly, tagpix is an open-source program that makes hidden agendas impossible, and its merged result is as private as the device on which it is stored. With tagpix, access to a folder, and a few simple commands, control of your photo archives remains with you, not a proprietary, closed program or device.

What tagpix Does

tagpix transfers all the files in an entire folder tree to a flat folder, without changing their content. Along the way, it adds date-of-origin to the front of the names of files transferred to make them unique and sortable; skips any truly duplicate content, and adds a unique serial number to the end of any remaining duplicate filenames; isolates movies and other non-photo files in folders of their own; and groups all transferred items into by-year subfolders on request.

The net effect is useful for organizing the contents of disparate photo collections holding pictures and movies shot on multiple cameras over many years. By running tagpix, all the items of each media type are merged on your local computer into a single flat folder, or a set of flat by-year subfolders, for fast, convenient, and private access.

In more detail, the following list summarizes the main assets that tagpix brings to your photo-normalization jobs:

Collecting and transferring content
tagpix walks all the folders and subfolders in the source tree to find content to be collected and combined in the destination folder. Depending on user configurations, items located can be transferred in one of three modes: moved, copied and then deleted from the source (copy-and-delete), or copied only.

Because it's fastest, moves are the default. Copy-and-delete mode has the same effect as moves, but allows items to be moved between different devices and drives (albeit, more slowly than direct moves on the same device). Copy-only mode leaves items in the source tree and works across devices, but may require manual steps to avoid reprocessing prior content on later runs.
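The tree walk described above can be sketched in Python roughly as follows. This is a minimal illustration, not tagpix's own code: the function name find_items and its skip_dirs parameter are hypothetical, and the only skip rules assumed are the ones this guide mentions (hidden .* files and user-selected subfolders).

```python
import fnmatch
import os

def find_items(source_root, skip_dirs=()):
    """Yield paths of all non-hidden files in the tree rooted at source_root.

    skip_dirs is a hypothetical sequence of folder-name patterns to prune.
    """
    for folder, subfolders, files in os.walk(source_root):
        # Prune matching subfolders in place so os.walk never descends into them
        subfolders[:] = [
            name for name in subfolders
            if not any(fnmatch.fnmatch(name, pattern) for pattern in skip_dirs)
        ]
        for name in sorted(files):
            if not name.startswith('.'):        # skip Unix hidden files
                yield os.path.join(folder, name)
```

Pruning the subfolder list in place is what keeps skipped folders from being visited at all, rather than filtering their files afterward.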

Resolving name conflicts across cameras
tagpix resolves same-name conflicts between different cameras' content by adding a date-of-origin prefix to all transferred filenames (e.g., xxx.jpg becomes 2017-10-14__xxx.jpg in the destination, only). For photos, the prefix uses date taken, extracted from standard photo-file Exif metadata tags when available. For photos with no Exif date-taken tag, and for other types of files, the prefix instead uses either the date-taken string embedded in Android photo filenames, or else the date-modified value of the file itself.

As an example, when tagpix encounters the first of the following in a source folder, the file's name is expanded to the second form to make it unique across multiple cameras that may produce the same filename for different photos shot on different dates:

DSC03249.JPG
2018-02-05__DSC03249.JPG

Whether the added date comes from an Exif photo tag, Android filename, or the file itself, the net effect makes the names of photos shot on different dates unique in the result's flat merged folders. When date taken is available in Exif tags or Android filename, the expanded name also reflects the date of the scene capture, not the most recent retouch.
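The three-step date lookup above might be sketched like this. It is an illustration under stated assumptions, not tagpix's implementation: the function names are hypothetical, the Exif value is assumed to have been fetched by the caller (e.g., with Pillow) in the conventional 'YYYY:MM:DD HH:MM:SS' form, and the Android filename pattern shown (YYYYMMDD_HHMMSS) is a guess at one common naming scheme.

```python
import os
import re
import time

# Assumed Android-style name: YYYYMMDD_HHMMSS, optionally after a non-digit prefix
ANDROID_DATE = re.compile(r'(?:^|[^0-9])(\d{4})(\d{2})(\d{2})_\d{6}')

def date_of_origin(path, exif_date_taken=None):
    """Return 'YYYY-MM-DD' for path, trying date sources in priority order."""
    if exif_date_taken:                                    # 1) Exif date-taken tag
        return exif_date_taken[:10].replace(':', '-')
    match = ANDROID_DATE.search(os.path.basename(path))    # 2) Android filename
    if match:
        return '-'.join(match.groups())
    mtime = os.path.getmtime(path)                         # 3) file modification time
    return time.strftime('%Y-%m-%d', time.localtime(mtime))

def dated_name(path, exif_date_taken=None):
    """Prefix the file's base name with its date of origin and '__'."""
    return date_of_origin(path, exif_date_taken) + '__' + os.path.basename(path)
```

For the example above, dated_name('DSC03249.JPG', '2018:02:05 10:20:30') yields 2018-02-05__DSC03249.JPG.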

Resolving duplicate content and names
tagpix automatically detects and resolves true duplicates in the tree. When image files have the same name after adding their date prefix, tagpix first runs a full byte-by-byte comparison of their content. If the files' content is exactly the same, the redundant copy in the source tree is skipped and not added to the result. If their content differs, the new copy's filename is extended with a unique serial-number suffix (e.g., date__xxx__N.jpg) and added to the result.

As another example, if the preceding example's file has already been processed by tagpix, and a new same-named and same-dated file like the first of the following is encountered in the source folder, the new image will be either discarded if its content is the same as the file already processed, or expanded to the second form to make its name unique if its content differs:

DSC03249.JPG
2018-02-05__DSC03249__1.JPG

This means your merged folders will keep just one copy of true duplicates, but all versions of same-named and same-dated content that differs—a rare scenario across different cameras, but possible and even normal if you've retouched or resized a photo and saved it with the same filename in a different folder, and the same date of origin per its Exif tags, Android filename, or file-modification date.

As of version 2.1, tagpix also skips even rarer duplicates of duplicates, which may arise if modified copies are copied unmodified to multiple folders (just one is retained). Regardless of their source, tagpix keeps true duplicates out of the merged result automatically, and renames files with the same name and date of origin but differing content to make them unique.
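As a rough sketch of the duplicate handling described in this item, the following uses Python's standard filecmp module for the byte-by-byte comparison. The function name unique_target is hypothetical and the logic is an approximation of the behavior this guide describes, not tagpix's actual code.

```python
import filecmp
import os

def unique_target(source_path, dest_dir, dated_name):
    """Return the destination path to use for source_path, or None to skip it.

    None means a same-named file (or one of its serial-numbered variants)
    already in dest_dir has byte-identical content.
    """
    base, ext = os.path.splitext(dated_name)
    candidate = os.path.join(dest_dir, dated_name)
    serial = 0
    while os.path.exists(candidate):
        # shallow=False forces a full content comparison, not just stat data
        if filecmp.cmp(source_path, candidate, shallow=False):
            return None                         # true duplicate: skip it
        serial += 1
        candidate = os.path.join(dest_dir, '%s__%d%s' % (base, serial, ext))
    return candidate
```

Because the loop compares against every serial-numbered variant already present, a second unmodified copy of a retouched file is also recognized as a duplicate.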

Grouping by type and year
tagpix always groups merged items by content type, creating separate folders for photos, movies, and others. Photos from cameras are usually JPEG files, but are recognized by both MIME type (which keys off of filename extension) and Exif tag use. This means that both JPEGs and TIFFs using any related filename extension are treated as photos by the program; other images are considered other content. Movies are similarly classified per MIME types and segregated from photos and other content for direct access. Technically speaking, PNG and WebP images may have Exif tags too, but their tags are rarer, libraries may or may not handle their emerging tag formats, and they are not clearly photos in any event.

As an option, items in all three file-type folders can also be grouped by year of origin. If this option is selected, each content type's folder will be grouped into by-year subfolders instead of a flat list of items. Either way, the duplicate-resolution steps of the preceding two items are applied to all three content-type folders. For instance, duplicate copies of movies in the source tree are skipped too.
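The by-type grouping can be approximated with Python's standard mimetypes module, which keys off filename extensions as described above. The function name and the PHOTO_TYPES tuple are assumptions made for illustration; tagpix's own classification may differ in detail.

```python
import mimetypes

# Assumed: the MIME types treated as photos (JPEG and TIFF, per this guide)
PHOTO_TYPES = ('image/jpeg', 'image/tiff')

def content_folder(filename):
    """Return 'PHOTOS', 'MOVIES', or 'OTHERS' for filename, by MIME type."""
    mimetype, _encoding = mimetypes.guess_type(filename)   # uses the extension
    if mimetype in PHOTO_TYPES:
        return 'PHOTOS'
    if mimetype and mimetype.startswith('video/'):
        return 'MOVIES'
    return 'OTHERS'                  # includes PNGs, GIFs, and unknown types
```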

And (of course) more
In addition, tagpix strips prior runs' date prefixes so you can rerun it any number of times on prior results; discards redundant filename dates in photos shot on Android devices; comes with a list-only mode that allows you to preview its intentions without making any changes (a recommended first step); and generates a report that describes all the updates it performs and any files it skips.
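To illustrate the rerun support mentioned above, here is a hedged sketch of stripping a prior run's date prefix. The names are hypothetical, and the rule of removing a prefix only when it matches the date about to be added follows this guide's later description of reruns rather than an inspection of tagpix's code.

```python
import re

# A prior run's prefix as shown in this guide: YYYY-MM-DD followed by '__'
PRIOR_PREFIX = re.compile(r'^(\d{4}-\d{2}-\d{2})__')

def strip_prior_prefix(filename, new_date):
    """Drop a leading date prefix only if it equals the date about to be added."""
    match = PRIOR_PREFIX.match(filename)
    if match and match.group(1) == new_date:
        return filename[match.end():]
    return filename      # no prefix, or a differing one that may be user text
```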

Read on to learn how to use tagpix to organize your photos.

Usage Details

This section describes tagpix install requirements, inputs and results, usage modes, and other operational details.

Installs and Platforms

tagpix is a Python program that runs on all major platforms, and is provided in source-code form. To install the program itself, download its zipfile from the following web page's Download section and unzip it on your computer:

learning-python.com/tagpix.html

tagpix also requires an install of either Python 3.X or 2.X to run its source code, plus the third-party Pillow (a.k.a. PIL) image library for the installed Python to access photo tags. Fetch and install these items if needed from the following sites, respectively (or search the web for other links):

www.python.org/downloads/
pypi.python.org/pypi/Pillow

tagpix will work on any platform that runs Python and Pillow, and has the required folder and file access permissions. For example, the program has been verified on Windows, Mac OS, Linux, and Android, and may work on iOS (there's more about running tagpix on mobile devices in the notebox below).

For pointers on Pillow installs, see this page. A note for developers: the exif.py tags-processing alternative to Pillow failed for some files when tested in 2013 for tagpix version 1.0, though your results may vary, and there are other Exif alternatives in the open-source domain.

tagpix on mobile: per this shot and log, tagpix works on Android devices in apps that support Python and command lines. For instance, it can be used in Termux, after running both this command (sans its "-y" if you want to be asked about changes) and pip install Pillow; as well as in Pydroid 3, after running the same pip command in its Terminal or using its Pip. This makes tagpix ideal for organizing photos on Android, though keyboards can boost usability. Also note that Android imposes proprietary access rules which limit the folders accessible to your Python app—and hence tagpix; for more on its rules which are beyond this guide's scope, see this doc. tagpix may work on iOS too (e.g., the Pythonista app bundles a version of Pillow), but this is untested, and iOS's access rules have historically been tighter than Android's.

Input Prompts

To launch, run script tagpix.py with no command-line arguments. It can be run from a console (e.g., Terminal on Unix and Command Prompt on Windows) and most Python IDEs (e.g., PyEdit as captured here, or Python's own IDLE), though IDEs may not support output-report routing described ahead. A basic run from a console looks like this:

$ python3 tagpix.py
...input run parameters at prompts...

All run parameters are requested by the following prompts at the program's console:

  1. tagpix renames and moves photos to a merged folder; proceed?
  2. Source - pathname of folder with photos to be moved?
  3. Destination - pathname of folder to move items to?
  4. Group items into by-year subfolders?
  5. List only: show new names, but do not rename or move?
  6. Delete all prior-run outputs in "output folder name"?

For all prompts except #2 and #3, type y for yes, and type n or simply press Enter (a.k.a. return) for no. Some of these prompts are self-explanatory, but here are a few details to help you get started, with the most important first:

For #2 (the source):
This is where unmerged images reside—the tree to be scanned for images to move (or copy). You can either enter an explicit folder, or press Enter to accept the default:
  • To use an explicit folder, enter the pathname of the root folder containing all the photo subfolders you wish to combine. For example, you might give the root folder just above those where you store photos from your camera cards, copies, or imports.
  • If you prefer to use the default, it is the SOURCE folder in the current working directory (e.g., in the script's own directory, if run from the same). Move or copy all your camera folders and images to there before running this script.
Whether the source folder is explicit or default, all its content and subfolders will be scanned to collect all items in the entire source-folder tree. Per transfer-modes coverage ahead, the source folder will either be emptied or left intact after a tagpix run, according to your configurations.
For #3 (the destination):
This is where images are moved (or copied) to: the folder containing the result's MERGED folder (described in more detail ahead). You can either enter an explicit folder, or press Enter to accept the default. Whether the destination folder is explicit or default, its MERGED subfolder will hold all your combined source-tree items after the tagpix run. Per usage-modes coverage ahead, if you enter a prior run's folder at this prompt, it will be extended; if you enter a new folder, it will be generated.

Among the other prompts: #4 allows you to bunch items by their year of creation (there's more on its effect ahead); and #6 may appear up to three times for non-empty photo, movie, and other folders, and is important when rerunning tagpix (see here and here for its roles, as well as its verifications added in version 2.1; you'll want to reply n (no) unless erasing an existing archive).

Finally, to end the script immediately without making any changes, reply no to prompt #1, or enter control+C (or otherwise kill the program) at any other prompt. You can also preview changes before applying them, by replying yes to prompt #5; this enables a list-only mode that analyzes content and shows planned updates, but does not perform any.

For more comprehensive tagpix command-line usage examples, browse the examples folder included in its install package. There, you'll find console logs that demonstrate a variety of options on a variety of platforms. Perhaps the most typical use case is captured in this example.

Automated Inputs

For simplicity, all tagpix inputs are provided as console replies instead of command-line arguments, but it's still possible to automate tagpix runs by providing canned inputs for the run command. This requires a bit of shell-programming skill and can vary per both platform and shell, but it's straightforward to provide inputs with one of two general techniques. First, and most portably, you can redirect stdin (the stream from which input is read) to a file that contains one reply per line:

$ python3 tagpix.py < inputs.txt

Second, and perhaps more conveniently, you can use a shell 'here' document to provide inputs in the run script itself. The exact syntax of this can vary, but here's a simple example coded as a Unix Bash script named runtagpix.sh; it provides canned inputs as tabbed lines between EOF markers, and suppresses spurious input prompts by routing stderr to an output sink with 2> (nit: the latter may also discard some error messages, including those of uncaught Python exceptions):

#!/bin/bash

python3 tagpix.py 2> /dev/null <<-EOF
	y
	New-unmerged
	.
	y
	y
	EOF

You wouldn't type all this at the console, of course (it's just as easy to reply to the prompts), but placing it in a script means you can run tagpix with a single command and no input replies. You won't be able to vary inputs this way, but it's noticeably simpler than typing up to eight responses on each typical updates run:

$ bash runtagpix.sh     # or just 'runtagpix.sh' if you make it executable with chmod

For complete examples of precoded scripts that automate inputs this way for both list-only and full-merge tagpix runs, study the included Bash scripts here and here.
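If you would rather drive tagpix from Python than from a shell, the same canned-input idea can be sketched with the standard subprocess module. The helper name run_with_replies is hypothetical; the replies shown in the comment mirror the Bash script above.

```python
import subprocess
import sys

def run_with_replies(command, replies):
    """Run command, feeding one reply per line on its stdin; return the result."""
    canned = '\n'.join(replies) + '\n'
    return subprocess.run(command, input=canned, text=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Example call (not run here): proceed, source, destination, by-year, list-only
#   result = run_with_replies([sys.executable, 'tagpix.py'],
#                             ['y', 'New-unmerged', '.', 'y', 'y'])
#   print(result.stdout)     # the report; prompts are captured in result.stderr
```

Capturing stderr separately plays the same role as the 2> redirection in the shell version, and carries the same caveat about hiding error messages unless result.stderr is inspected.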

A Brief Primer on Pathnames

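In all usage modes, the pathnames you input at prompts #2 and #3 can be either relative (interpreted from the current working directory) or absolute (a complete path from the root of a drive or filesystem).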

To minimize the lengths of the paths you'll input, it's often helpful to first run a cd command in your console to go to the folder containing your MERGED destination folder and/or source folder, and then run tagpix there, giving folder paths relative to where you are working.

To illustrate, the following kicks off a tagpix updates run on Unix after changing to the folder containing both the MERGED results tree and a New-unmerged folder holding the new photos to add to MERGED. Both folders are in the current directory (a.k.a. .) after the cd command (relative), and the tagpix script itself is elsewhere (absolute). User-entered commands follow the $ shell prompt, replies follow each tagpix prompt's question mark, and ~ is your user folder on Unix:

~$ cd ~/MY-STUFF/Camera/Digital-cameras-merged
~/MY-STUFF/Camera/Digital-cameras-merged$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? New-unmerged
Destination - pathname of folder to move items to? .
Group items into by-year subfolders? y
List only: show target names, but do not rename or move? n
Delete all prior-run outputs in "./MERGED/PHOTOS"? n
Delete all prior-run outputs in "./MERGED/MOVIES"? n
Delete all prior-run outputs in "./MERGED/OTHERS"? n
...report messages show up here...

Absolute paths are generally required when running tagpix from an IDE such as PyEdit, because the IDE's current directory may not be related to your image folders, and may not be useful for relative paths; see your file explorer's copy-path option to paste a folder's absolute path at tagpix prompts easily. As usual, the tagpix.py script's path in command lines can be relative or absolute too depending on where commands are run, and is not required if the script is open and run from an IDE.

Results Report

This script's initial prompts are printed to the stderr stream, and its report is printed to stdout (see the intro to streams here). Both go to the console by default, but this two-stream model allows you to save the tagpix report to a file for later inspection—especially handy for larger runs.
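The two-stream model can be illustrated with a small yes/no prompt helper that writes its question to stderr, leaving stdout free for the report. This is a sketch with hypothetical names, and it assumes that only a reply of y counts as yes, as the prompt instructions earlier suggest.

```python
import sys

def ask_yes_no(question, infile=None, promptfile=None):
    """Show question on the prompt stream; return True only for a 'y' reply."""
    if infile is None:
        infile = sys.stdin
    if promptfile is None:
        promptfile = sys.stderr          # keeps prompts out of a saved report
    promptfile.write(question + ' ')
    promptfile.flush()
    reply = infile.readline().strip().lower()
    return reply == 'y'                  # 'n' or a bare Enter both mean no
```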

To start tagpix and save just its report to a file, use a console command line like this to route stdout to a file (> shell syntax will not work when running tagpix from most IDEs):

$ python tagpix.py > report.txt

This technique works with any command-line form, and can be combined with the automated inputs we met earlier. Any special message lines in the report all begin with ***; search for this in the saved report text after a tagpix run (more on this ahead).
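Searching a saved report for its special message lines is easy to script. The following sketch assumes only what is stated above, that such lines begin with ***; the function name is hypothetical.

```python
def special_lines(report_path):
    """Return the report's special message lines: those that begin with '***'."""
    with open(report_path, encoding='utf8', errors='replace') as report:
        return [line.rstrip('\n') for line in report if line.startswith('***')]
```

For example, special_lines('report.txt') returns a list you can print or count after a large run.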

For a sample of report content, see the demo logs in the example-runs folder; the report is all the text following the last input prompt. For a comprehensive report example from a tagpix run on a very large photo collection, including duplicates, locked-file errors, prior-run dates, and more, see this file.

Results Tree

The script's results show up in the MERGED folder nested in the destination folder (prompt #3), split into PHOTOS, MOVIES, and OTHERS subfolders that each contain merged and uniquely named content files. If you reply yes to prompt #4, these three subfolders further group their content into year subfolders. Specifically, the results are organized into a shallow tree as follows:

Destination or ./
    MERGED/
        PHOTOS/
            flat content, or year subfolders with flat content
        MOVIES/
            flat content, or year subfolders with flat content
        OTHERS/
            flat content, or year subfolders with flat content

As described earlier, all filenames at the bottom levels of the results tree include date prefixes added to make them unique (e.g., 2017-10-14__file.jpg). The dates added reflect either date-taken Exif tag values (for most shot photos), date-taken date in Android filenames (for Android photos with no Exif date), or date-modified file attributes (for all others).
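The layout of the results tree can be expressed as a small path-building function. This is an illustrative sketch: the function name is hypothetical, and it assumes the by-year subfolder is named for the year in the item's date prefix.

```python
import os

def result_path(destination, kind, dated_name, by_year=False):
    """Build the path of an item in the results tree.

    kind is 'PHOTOS', 'MOVIES', or 'OTHERS'; dated_name starts with YYYY-MM-DD__.
    """
    parts = [destination, 'MERGED', kind]
    if by_year:
        parts.append(dated_name[:4])     # assumed: year comes from the prefix
    return os.path.join(*parts, dated_name)
```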

For photo files, date taken is always used if present, because it both ensures that names are unique (different cameras may reuse the same names), and reflects the recorded event's date (modification date may instead be a latest-retouch date after edits, but a date-taken tag is likely to survive). Although date taken may not apply to photo scans, for most photos shot on digital cameras the expanded names chronologically identify both the photos themselves and the scenes they capture.

Items not recognized as movies or tagged photos are moved (or copied) to OTHERS. After a tagpix run, you may wish to manually remove items from OTHERS that reflect camera-specific cruft. For example, some cameras create .THM or .CTG files which are irrelevant to your content in PHOTOS and MOVIES. tagpix does not omit these automatically, because it prefers to err on the side of caution (only well-known .* hidden files and user-selected subfolders are skipped, per the next section). Be sure to delete only cruft: the OTHERS result folder may contain non-camera images like PNGs and GIFs too.

For a more graphical look at results trees, see the examples folder's screenshots of both flat and group-by-year modes.

Resolving Skips

Following a run, you should check the report's final Missed section to see if any files were skipped due to:

Normally skipped name patterns
These are not errors, and include both Unix .* hidden items, and items in subfolders matching a configurable skip pattern. They are also noted in Skipping message lines at the top of the report.
Duplicate content
These are not errors, but are skipped by design as described in Overview above. They are also noted in ***Duplicate message lines earlier in the report.
File-transfer errors
These are genuine errors, but do not stop the program: other files are processed after the error is encountered. They are also noted in ***Error message lines earlier in the report.

All items skipped are left intact in the source tree, and listed in the Missed section.

If the Missed line shows 0 skips, or if you are okay with the items skipped, delete the contents of your source folder after the run if desired; if there were no skips, it's just empty directories (but see also the mode variations note ahead).

If the Missed line's skip count is not 0 and valid items were skipped due to errors, resolve their issues (e.g., fix locks or permissions, or use a shorter destination path on Windows) and rerun tagpix to transfer them. For the rerun, use the same source and destination folders as the original run, and do not delete the prior run's results (at prompts #2, #3, and #6).

Mode variations: most of the above pertains to file-move and copy-and-delete transfer modes only. When tagpix is run in copy-only mode, added in version 2.1, it does not produce a Missed line or section in the results report, because no files are removed from the source tree. Instead, the report in this mode concludes with the message "Nothing was removed from the source tree." To analyze skips in copy-only mode, search for messages earlier in the report, as described for the three skip categories listed above.

Usage Modes

Depending on the replies you provide to input prompts, you can use this script to either extend an existing archive or make one anew, and can do both with the aid of another program:

  A. To extend an archive (e.g., for viewing, or full optical-disc burn), for prompt #3 give the same destination-folder path as a prior run (i.e., the path to the folder containing a prior run's MERGED result folder), and answer no to #6 prompts; new source items will be moved (or copied) to the prior run's folders.
  B. To make a new archive (e.g., for an initial or incremental optical-disc burn), for prompt #3 give a new destination-folder path, perhaps with the run date in its name; source items will be moved (or copied) to the new archive's folders.
  C. To add new items to both an incremental archive for burning and an existing archive for viewing, use the preceding mode B first, and then merge the new archive's contents into an existing archive with another tool (a GUI cut/paste or drag-and-drop will generally suffice).

For an example of usage mode A, see the logs here and here. For an example of mode B, see the log here. For additional usage-mode examples, see the full examples folder. For alternative file transfer modes, see version 2.1 release notes.

Other Usage Notes

This section collects smaller usage notes and tips. Some summarize earlier coverage.

Result path lengths
The combination of folder names and date-of-origin prefixes created by tagpix can be 31 characters long, not counting photo base names (e.g., MERGED/PHOTOS/2018/2010-12-03__). If merged results exceed pathname limits on your platform, try using a shorter destination path (i.e., a folder higher on your drive).
Preventing changes
tagpix makes no changes if the source folder does not exist; the user cancels the run verification or requests a list-only run (via prompts #1 or #5); or the script is killed while waiting for any input (e.g., control+C in a console, or a kill request in an IDE). As of version 2.1, you can also prevent source-tree changes by enabling copy-only file transfer mode.
Reruns on prior results
It's safe to rerun tagpix on items and folders it created in the past, because it automatically detects and discards any extra date prefixes (the YYYY-MM-DD parts) added to filenames by prior tagpix runs. It also ensures the new and prior dates match, to avoid stripping any user-added text in the process.
Duplicates are handled automatically
Per the overview above, it's safe to run tagpix to combine trees with duplicate item copies: they are automatically skipped (for duplicate content) or renamed (for duplicate filenames).
Redundant Android dates are dropped automatically
Per the release notes ahead, tagpix discards dates added to filenames by Android cameras that are redundant with dates added by tagpix itself. This keeps your image filenames shorter and is generally what you'll want. For more control, you can also disable and customize this feature with configurations.
Rerunning after errors
It's safe to rerun the script if it exits early, or skips items due to file-transfer errors described earlier. The next run will simply rename and transfer all the items left in the source folder (but be careful not to delete the prior run's results when asked and verified by prompt #6!).
Source-folder content
tagpix always skips both hidden files whose names begin with a . (e.g., Mac OS .DS_Store files), as well as all items in subfolders whose names match the user-configurable skips pattern added in version 2.1 (described ahead). All other items in the source tree are transferred to the destination's folders. See also Resolving Skips above.
Choosing folders to merge
As a rule of thumb, files that are not movies or photos with date-taken tags may be better left out of the tree that tagpix will merge. This includes both scanned photos, whose dates will all reflect scan date instead of event date, and images such as PNGs and GIFs that have no date-taken information. You can merge these too, but scans will be renamed with their scan date (which probably won't be useful alongside photos' date-taken), and images of untagged types will wind up in the OTHERS folder instead of PHOTOS (which merits a separate note, up next).
Moving OTHERS images to PHOTOS
Speaking of the OTHERS results folder: by design, tagpix recognizes photos as images with MIME types that imply Exif tags (as described earlier), and always moves other image types to the OTHERS folder, not PHOTOS. This means that PHOTOS gets all JPEGs and TIFFs (Exif tags or not), but non-photo image types like PNGs, GIFs, and BMPs are routed to OTHERS. If you'd rather see the latter bunch in PHOTOS too, simply move them across manually after a tagpix run; because items in OTHERS are also labeled with dates, they'll work well in PHOTOS alongside your camera JPEGs.

Request for comments: if you think that combining all image types as described here should be automated with a new tagpix option, please send feedback via the Input link in this guide's bottom toolbar. To date, no user (including tagpix's creator) has asserted a need for this, and software growth sans use case is a Generally Bad Thing.

Dates, not times
Time is not included in filename prefixes, because it would make names longer, and camera-added sequence numbers will normally suffice to identify and order photos taken on the same day. Dates are more crucial, as different cameras may use the same sequence numbers. Note that Android cameras may already have a time in their filenames; tagpix retains it, and it makes names as unique as sequence numbers do.
Modification dates, not creation dates
When picking a date-of-origin prefix, tagpix uses a file's modification date (via Python's os.path.getmtime()) as a last resort, after trying photo Exif tags and then Android filename date (per this). Modification date reflects either the file's creation date (if it has not been edited), or its latest modification (if it has); for unretouched photos, this is normally the true date of origin.

It's worth noting that tagpix by design does not try to use a file's creation date—a datum dependent on both operating system and filesystem. Specifically, file creation date is generally available on Windows only (not on Unix, where it is weakly supported on Mac OS and no better than modification time on Linux), and even where available can sometimes be irrelevant when content changes. For background, try this discussion thread, this filesystems comparison, and Python's os.path.getctime() and os.stat(). Because tagpix works in the woefully unstandardized filesystems realm, it must use modification dates in the name of portability, interoperability, and results that are the same across all supported platforms.

Run other tools on destination folders, not source
Because tagpix's default transfer mode separates images from other content in source folders, it may impact the results of other tools that store data alongside images. For example, tagpix will destroy a thumbspage gallery in a source folder, by separating its index page, thumbnails subfolder, and images. The PyPhoto viewer may be similarly neutered, because its thumbnails-cache files and images are moved to different destinations.

This cannot be remedied (merging metadata of arbitrary tools is impossible), but you can avoid the issue altogether by applying such tools to tagpix destination folders only, not source folders. That is, run other tools on merged results, not unmerged input. Because merged destination folders are only ever extended, their content is never scattered by tagpix. Source folders are generally best used for staging photos to be later moved by tagpix, per the recommendations ahead.

Modes update: though it comes with some tradeoffs, version 2.1's new copy-only mode can now be used to extract images from a source tree without destroying it. See 2.1's release notes ahead. The preceding still applies to both the original and default file-move mode, as well as the new copy-and-delete mode.

Moves across drives and devices
tagpix uses Python's os.rename() to move files from source to destination, which is normally correct, fast, and atomic. File moves can be problematic, though, when run between different devices or filesystems. If a run's moves all fail due to differing devices, make sure your source and destination folders reside on the same writable device—copy the source folder to the same hard drive or SSD as your destination folder, before the tagpix run. This is a minor inconvenience, but makes all tagpix runs quicker, and copying new source images to a temporary staging folder is recommended practice anyhow; merging from a camera or camera card directly leaves no backup copy if anything goes wrong.

Developer notes: Python's os.replace() doesn't help here, because it still raises an exception across different drives and devices on Windows, Mac OS, and Linux (this call just avoids Windows exceptions if the target file exists on the same device). The only alternative to moves is to copy and delete, which can be much slower for large photo archives, and cross-device moves seem too rare and dangerous to justify the slowdown for all use cases, especially when a manual pre-run copy of the source folder takes roughly the same amount of time.

Modes update: though they come with some tradeoffs, version 2.1's new copy-only and copy-and-delete modes can now be used to merge across different drives and devices directly. See 2.1's release notes ahead. The preceding still applies to the original and default file-move mode.
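The three transfer modes discussed in this item can be sketched with standard library calls. The mode names and function below are hypothetical stand-ins for tagpix's actual configuration settings; the sketch shows only why copies work across devices where os.rename() does not.

```python
import os
import shutil

def transfer(source_path, target_path, mode='move'):
    """Transfer one file using a hypothetical mode name.

    'move' renames in place (same device only); 'copy-delete' and 'copy'
    work across devices, and 'copy' leaves the source file intact.
    """
    if mode == 'move':
        os.rename(source_path, target_path)       # fast, but fails across devices
    elif mode in ('copy-delete', 'copy'):
        shutil.copy2(source_path, target_path)    # copies content and timestamps
        if mode == 'copy-delete':
            os.remove(source_path)                # same net effect as a move
    else:
        raise ValueError('unknown transfer mode: %r' % mode)
```

shutil.copy2 is used here because it attempts to preserve modification times, which matter when a file's date-modified value serves as its date of origin.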

Recent Changes

This section describes changes made in recent tagpix versions. It is meant primarily for developers and prior-version users, though additional usage-level details and context are presented along the way. tagpix is occasionally repackaged with minor documentation-only changes (e.g., to this doc and its demos), but code and functionality changes occur only in the versions listed here.

Version 2.3: Silence Pillow DOS Warning

tagpix was patched and rereleased on September 29, 2020 with two upgrades. The first was a minor UI improvement: at input prompts, typing control+C to exit now yields a user-friendly message instead of a Python exception traceback, and source-file existence is checked ASAP. For example:

~/Desktop/camera$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? ^C
Script not run: no changes made.

~/Desktop/camera$ python3 ~/MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? Spam
Script not run: source folder does not exist, no changes made.
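The behavior shown in these two sessions can be approximated with a small prompt helper. The following is an illustrative sketch only, not tagpix's actual code: the function names ask and ask_source_folder are hypothetical, and the handling of an empty reply (which tagpix treats as a default folder) is left out for brevity. Prompts go to stderr, in keeping with the stderr/stdout split this guide describes.

```python
import os
import sys

def ask(prompt):
    """Write a prompt to stderr and return one reply line read from stdin."""
    sys.stderr.write(prompt + ' ')
    sys.stderr.flush()
    try:
        reply = sys.stdin.readline()
    except KeyboardInterrupt:
        # control+C at a prompt: exit with a message, not a traceback
        sys.exit('\nScript not run: no changes made.')
    return reply.rstrip('\n')

def ask_source_folder():
    """Ask for the source folder, and verify that it exists before continuing."""
    path = ask('Source - pathname of folder with photos to be moved?')
    if not os.path.isdir(path):
        sys.exit('Script not run: source folder does not exist, no changes made.')
    return path
```

Because replies are read from stdin, the same helper also works when input is redirected from a file of precoded replies.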

The second upgrade was more urgent: code was added to silence a bogus DecompressionBombWarning message now issued senselessly by the underlying Pillow library for all large images. Specifically, when running tagpix on images larger than 89MP, the Pillow library by default prints a single DOS (denial of service) warning message in program output that looks like this (with line-breaks added here for marginal readability):

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/PIL/Image.py:2797:
DecompressionBombWarning: Image size (108000000 pixels) exceeds limit of 89478485 pixels, 
could be decompression bomb DOS attack.
  warnings.warn(

This baseless warning is completely harmless, and does not impact tagpix results (large images work either way). But it's also stupidly excessive, and needlessly confuses users of this and many other Pillow-based programs.

It was first seen for perfectly valid 108MP images shot on a Galaxy Note20 Ultra smartphone in 2020, and will crop up for large images created on many other devices and tools in widespread use. Obviously, these are not "attacks," despite the warning's language. Users who see this, however, may assume it reflects bugs or viruses.

To see the changes applied to silence the message, search for Sep-2020 in the source; the fix was trivial, but the cost of rereleasing this and other programs impacted at tagpix's host site was not. Such is life when "batteries included" meets open-source agendas.

Postscript: though scantly documented, it turns out that Pillow later turned the warning described here into a full error for images larger than twice the warning's size limit. This error takes the form of an exception that will cause client programs to fail or terminate. Despite this, its only mention seems to be in an obscure release note. To avoid kills, tagpix's warning-silencing code has been updated to use a new and broader fix—which will suffice only until Pillow tightens the screws again. This check should clearly be opt in for programs that need to care.
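For developers of other Pillow-based programs, the two levels of fix can be sketched as follows. This is a common approach based on Pillow's documented Image.MAX_IMAGE_PIXELS setting and DecompressionBombWarning category, and is not necessarily the exact code used in tagpix (search for Sep-2020 in the source for that). The import guard is an addition here so the snippet runs even where Pillow is not installed.

```python
import warnings

try:
    from PIL import Image      # third-party Pillow library
except ImportError:
    Image = None               # Pillow not installed: nothing to configure

if Image is not None:
    # Narrow fix: suppress only the large-image warning category
    warnings.simplefilter('ignore', Image.DecompressionBombWarning)
    # Broader fix: remove the pixel limit, which disables both the
    # warning and the error raised at twice the limit
    Image.MAX_IMAGE_PIXELS = None
```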

Version 2.2: Use and Drop Android Dates

Version 2.2, finalized in December 2018, was a minor release that addressed just one specific issue. Specifically, it was enhanced to automatically process origin dates added to photo filenames by Android cameras: it utilizes these dates if no Exif date-taken tag is present, and discards these dates (but not times) to avoid redundancy with tagpix-added dates. A utility script was also coded to drop Android filename dates on demand for users of prior tagpix releases.

This change applies only to tagpix users who have shot photos on Android devices, or may do so in the future. Given the potential magnitude of this subset, though, the rest of this section provides complete coverage. For a brief look at this change's results, see this log and shot. For the full story, read on.

The Issue

Most digital cameras assign filenames to images using a simple format that accommodates the basic but portable FAT filesystem's 8.3 naming convention. For instance, a DSC or IMG prefix followed by a sequence number suffices to identify images on a given camera, though not across different cameras—one of the main limitations tagpix solves, by expanding the first of the following forms to the second, with a date-of-origin prefix:

DSC03249.JPG
2018-02-05__DSC03249.JPG

By contrast, cameras on some Android devices (and perhaps others) add a date in photo filenames which, combined with an added time, identifies images by their moment of creation, but is redundant with that added by tagpix's own renaming logic. For example, such images' filenames are initially expanded by tagpix from the first of the following to the second:

20180205_154910.jpg
2018-02-05__20180205_154910.jpg

While the Android-added date and time (separated by _ in the first name above) might be a good idea in a world begun anew, they bifurcate the digital-photos world that is. This is a unique and nonstandard naming scheme, that stamps files with a date that makes tagpix filenames longer unnecessarily, and in most cases is fully redundant with both standard in-file Exif creation-date tags (when present and unchanged), and the date-of-origin prefix added to all photos by tagpix (when its source agrees with the Android stamp).

Using Android Dates

That said, blindly deleting the Android date in filenames is too extreme, because it may be the only record of creation date in some scenarios. For example, Android photos edited in tools that discard Exif tags won't have a date-taken tag, but will retain a creation date in their filenames that normally differs from the file's modification date (which is generally a last-edit date).

More subtly, some recent Samsung Android devices never record Exif date-taken tags for front—a.k.a. "selfie"—cameras. This is a known issue that you can explore on the web here and here. It may be a temporary bug that Samsung will fix in an update, and back cameras on these devices do record Exif dates correctly. But discarding the Android filename date of photos shot on such devices' front cameras would also drop valuable metadata found nowhere else.

Because the filename date is potentially useful in such cases, tagpix 2.2 has generalized the way it chooses a date of origin to be used for the prefix it adds to filenames. Formally, it always now tries three sources in turn, until a date is selected:

  1. Use the Exif date-taken tag, if present
  2. Else use the Android filename date, if present
  3. Else use the file's modification date as a last resort
The net effect selects the best date of origin possible for filename prefixes—a crucial part of tagpix's organizational role.

The first step above is applied to photos only (other content types don't have Exif tags). The other steps are run for all types of content in source trees, including photos without usable Exif tags.

The second step above is new, requires heuristics to detect dates, and applies only to a subset of users and images, but is necessary to accommodate metadata recorded outside the Exif model by a handful of devices and manufacturers. A special case to be sure, but exceptions seem as much the norm in the digital camera domain as the computer field at large!

Because step two is partly heuristic—it looks for matching strings and checks their content for valid dates—it can also be disabled by setting UseAndroidFilenameDates in the user configs file. This switch is preset to True to cover the norm; set it to False in the unusual event that filenames in your source tree appear to embed Android dates just by coincidence.
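The three-step selection above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not tagpix's implementation: the regular expression, the function names, and the use_android_dates parameter (standing in for the UseAndroidFilenameDates switch) are hypothetical, and the Exif date is passed in as an already-extracted string because reading it requires a third-party library.

```python
import os
import re
import time
from datetime import datetime

# Assumed Android pattern: 8 date digits, underscore, 6 time digits
ANDROID_NAME = re.compile(r'(?<!\d)(\d{8})_(\d{6})(?!\d)')

def android_filename_date(filename):
    """Return 'YYYY-MM-DD' if the name embeds a valid date and time, else None."""
    match = ANDROID_NAME.search(filename)
    if match is None:
        return None
    date, clock = match.groups()
    try:
        # The datetime constructor rejects impossible values (month 13, hour 99)
        stamp = datetime(int(date[:4]), int(date[4:6]), int(date[6:]),
                         int(clock[:2]), int(clock[2:4]), int(clock[4:]))
    except ValueError:
        return None
    return stamp.strftime('%Y-%m-%d')

def choose_origin_date(path, exif_date=None, use_android_dates=True):
    """Pick a date of origin: Exif tag, else filename date, else modtime."""
    if exif_date:
        return exif_date
    if use_android_dates:
        found = android_filename_date(os.path.basename(path))
        if found:
            return found
    modtime = os.path.getmtime(path)
    return time.strftime('%Y-%m-%d', time.localtime(modtime))
```

Validating the matched digits as a real date and time is what keeps the heuristic from firing on most coincidental digit runs, though it cannot rule them out entirely, which is why the step can be switched off.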

Dropping Android Dates

After the tagpix date has been selected per the prior section, tagpix 2.2 addresses the redundancy of Android filename dates with a new renaming step, run before duplicates detection and file move or copy. If enabled by setting DropAndroidFilenameDates to True in the user configs file, the tagpix.py main script now automatically renames merged photo files to drop the superfluous Android date and keep only the tagpix date (along with the Android-added time, which helps identify the photo). For instance, it shortens from the first of the following to the second:

2018-02-05__20180205_154910.jpg
2018-02-05__154910.jpg

This step is enabled by default, because it yields shorter names, and normally has no impact on duplicates processing or content access—the shorter form is no less unique or meaningful than the longer. The tagpix date is usually the same as the Android date, whether it is taken from Exif tags or filename.

As a special case, though, this new renaming step can also be specialized with switch KeepDifferingAndroidFilenameDates to drop only Android dates that are the same as the tagpix date. Though unlikely, the two dates may differ if a photo's Exif-tag date is not the same as its Android-filename date—which is generally possible only after manual changes to either, given tagpix's date-selection algorithm. In such rare cases, the tagpix and Android dates may disagree, as in the following inconsistently changed photo:

2018-08-03__20180408_073757.jpg

Set the keep switch to True in the user configs file if you wish to retain the Android date when it differs this way. This switch defaults to True to be cautious, because an auto-shortened filename carries less information in this case only. Still, this case seems too unlikely to apply to most, if any, users (and if it does apply to you, you probably understand both the perils of manual metadata changes, and the need for such an obscure switch!).
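The renaming rule, including the keep-differing special case, can be illustrated as follows. This is a sketch based only on the filename examples above: the pattern, the function name, and the keep_differing parameter (standing in for the KeepDifferingAndroidFilenameDates switch) are hypothetical, and no date validation is attempted here.

```python
import re

# tagpix date prefix, then Android date and time, as in the examples above
MERGED_NAME = re.compile(r'^(\d{4})-(\d{2})-(\d{2})__(\d{8})_(\d{6})(.*)$')

def drop_android_date(filename, keep_differing=True):
    """Return filename without its Android date, or unchanged if not applicable."""
    match = MERGED_NAME.match(filename)
    if match is None:
        return filename                     # not an Android-style name
    year, month, day, android_date, android_time, rest = match.groups()
    if keep_differing and android_date != year + month + day:
        return filename                     # dates disagree: keep both
    return '%s-%s-%s__%s%s' % (year, month, day, android_time, rest)
```

With the names used in this section, drop_android_date('2018-02-05__20180205_154910.jpg') gives '2018-02-05__154910.jpg', while '2018-08-03__20180408_073757.jpg' is returned unchanged unless keep_differing is False.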

For an example of 2.2's automatic handling of Android filename dates, see the console log here, and the screenshot of its results folder here. In the end, the combination of using and dropping such dates shortens filenames of all photos shot on Android cameras, without sacrificing filename metadata when useful.

On-Demand Renaming

For more specialized roles, 2.2 also adds a new utility script _drop-redundant-dates.py, which can be run on demand to drop all Android dates in images already processed by a former version of tagpix (or a later version run with auto-renaming disabled).

This utility script is never required for users of tagpix 2.2+ if auto-renaming is enabled, and usually must be run just once by pre-2.2 users who have upgraded. It is also somewhat naive: it makes no attempt to determine if the Android date dropped differs from that of the tagpix date formerly added. Be sure to use its list-only mode to preview changes before running it to update photos; because prior versions of tagpix didn't use filename dates in the absence of Exif dates, some formerly-merged Android photos may be labeled with file-modification date instead.

One special case here: as described in the new utility script's docstring, if you're using a tool that relies on the names of images, you may need to rerun the tool after running the utility script, to pick up the new names. This requirement naturally varies per tool. For instance, the HTML viewer pages generated by the thumbspage gallery builder hardcode image filenames, which can be invalidated by later renames. On the other hand, this isn't a concern for the PyPhoto GUI viewer, which updates its thumbnails cache automatically on image changes.

This special case is also completely irrelevant when using the 2.2 automatic renaming of tagpix.py, because its renaming occurs before other tools can be run on its merged results. Where possible, use automatic renaming instead of the on-demand utility script.

Request for comments: there undoubtedly are additional device-specific photo-naming conventions beyond the Android camera pattern addressed here (e.g., some Windows screenshot names may redundantly embed date/time information too). If you'd like to see other filenames accommodated by tagpix, please send feedback via the Input link in this doc's bottom toolbar. As it stands, device manufacturers seem to be climbing over each other to come up with proprietary naming conventions with no interest in standardization or interoperability, and supporting all the constantly changing variants in this context would be akin to herding cats.

Version 2.1: Multiple Enhancements

Version 2.1, finalized in October 2018, was a major update, which generalized source-tree subfolder skips; added a simple but crucial deletion verification; improved duplicates detection; introduced new file-transfer modes that copy instead of move; and cleaned up a few dark but rare corners.

Code refactoring, user configs file
Some code was refactored to remove redundancy (including three same-work loops merged into one: see moveall()). This had no impact on program operation or results, but makes future changes easier. A new file was also added for user configurations, user_configs.py. This has only a small number of settings at present, but better supports future customizations.
Subfolder skips enhancements
Version 2.1 generalizes the code that skips source-tree subfolders to use a regular expression pattern that can be more easily modified by users to skip additional folders. To extend or customize the set of subfolders skipped, modify the setting for variable IgnoreFoldersPattern in the user-configurations file user_configs.py. This pattern's new preset skips .* hidden folders; thumbs thumbnail folders created by some tools (including older versions of PyPhoto that predate its single-file caches); and _thumbspage thumbnail/viewer-page folders created by the latest thumbspage image-gallery builder.

For a demo of 2.1 subfolder skipping, see this example. Note that this matters only for subfolders having irrelevant images (e.g., thumbnails); applies only to folders in your source tree (the destination tree is not scanned for images to add to the collection); and is not required if your source folders to be skipped are named with a leading . (the pattern preset already skips all such folders, though some zip and backup tools may skip them too). The code now also correctly skips multiple matching folders when present.
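Pattern-based folder skipping of this kind is typically done by pruning os.walk()'s subfolder list in place. The sketch below shows the general technique; its pattern is a hypothetical stand-in written from the description above, not the actual IgnoreFoldersPattern preset in user_configs.py, and the function name is likewise an assumption.

```python
import os
import re

# Hypothetical stand-in for the preset: hidden, thumbs, and _thumbspage folders
IGNORE_FOLDERS = re.compile(r'^(\..*|thumbs|_thumbspage)$')

def walk_source(root):
    """Yield paths of all files under root, skipping matching subfolders."""
    for folder, subfolders, files in os.walk(root):
        # Prune in place so os.walk never descends into skipped folders;
        # every matching name is removed, not just the first
        subfolders[:] = [name for name in subfolders
                         if not IGNORE_FOLDERS.match(name)]
        for name in files:
            yield os.path.join(folder, name)
```

Extending the skip set is then a matter of adding another alternative to the pattern, such as a folder name used by your own tools.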

Prior-output deletion verifications
Version 2.1 now verifies deletion of prior-run outputs with an extra input after each prompt #6, because the deletion is immediate (and if unintended might be catastrophic!). Reply with an n or simply press the Enter/return key to cancel the delete (a control+C at any prompt works to kill the program in general, but may be too late for weary users to apply):
Delete all prior-run outputs in "./MERGED/PHOTOS"? y
....About to delete: ARE YOU SURE? n
Delete all prior-run outputs in "./MERGED/OTHERS"? y
....About to delete: ARE YOU SURE? 
Duplicate ID numbers per file, not category
Version 2.1 now assigns unique ID sequence numbers per individual file, not across an entire content category. These numbers are used to create unique filenames, for files of the same name but different content. The original tagpix used a single per-run counter; 2.0 used 3 per-category counters; and 2.1 now counts up from 1 for each file with duplicates. This makes duplicate filenames more coherent (they are numbered strictly 1..N), but is also crucial for detecting duplicate content across all of a filename's variants, as required by the next item; when IDs were unique within a category only, a prior run's IDs might be arbitrarily higher for a given filename than those of a later run, making same-duplicate detection difficult.
Improved handling of rare duplicate cases
Version 2.1 repairs a minor defect that was never observed in 5 years of practice, and seems about as likely to occur as lightning striking the machine running the script. But: if there were three source image files with the same filename; and two of these files' content differed from the first moved; and the two duplicate files were merged to the first-moved's destination folder by two different tagpix runs; and the numeric-ID suffix added to the two duplicate files' names happened to be the same on each of the different runs; then the filename generated for the two duplicates might be the same—causing an exception on Windows, and overwrites on Unix.

The simple fix, in moveone() of the script, is to increment the numeric-ID suffix in a loop, until the resulting filename either does not exist in the destination folder or matches an existing same-named file there by content (as formerly done in the related music-file program flatten-itunes). This avoids file overwrites in all contexts (the former defect), but also correctly skips all same-content images for a given filename—whether they match the first instance of the filename moved to the destination (as before), or any differing-content duplicate added later with a uniquely suffixed ID (new behavior).

For a short demo of the new duplicates-resolution logic in action, see this example. The new behavior—skipping duplicates having content the same as another duplicate—addresses the unlikely event of modified copies being copied to multiple folders unmodified. This works well and as it should, but is also the tagpix equivalent of a second lightning strike...
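The suffix-incrementing loop described above can be sketched as follows. This is an illustration of the idea rather than the code in moveone(): the function name and the '__N' suffix format are hypothetical, and filecmp is used here as one convenient way to compare content byte for byte.

```python
import filecmp
import os

def resolve_destination(source_path, dest_dir, filename):
    """Return (dest_path, is_duplicate) for a file about to be merged."""
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(dest_dir, filename)
    serial = 0
    while os.path.exists(candidate):
        if filecmp.cmp(source_path, candidate, shallow=False):
            return candidate, True       # same content already merged: skip
        serial += 1                      # counts 1..N per filename
        candidate = os.path.join(dest_dir, '%s__%d%s' % (base, serial, ext))
    return candidate, False              # unused name: safe to transfer
```

Because the loop stops only at an unused name or a same-content match, it can neither overwrite an existing file nor add a second copy of content already present under any of a filename's variants.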

New copy-only and copy-and-delete mode options
Version 2.1 adds both copy-only and copy-and-delete file transfer modes, enabled by settings in user_configs.py. These are alternatives to the original and default file-move mode, which always removes files from the source tree by definition. The two new modes copy source files byte-for-byte to the destination, instead of directly moving them. This makes the new modes run slower, but in some roles can make manual source-content copies unnecessary, and lets you use tagpix in additional contexts:

Copy-and-delete mode
Allows tagpix to work with source and destination folders on different devices. For instance, this mode can be used to run merges between a camera card or USB flashdrive, and a PC's internal drive. Direct moves fail when source and destination folders are on different drives.
Copy-only mode
Allows tagpix to extract images from a source tree without changing the tree's contents in any way. For example, this mode can be used to collect images from gallery or viewer folders, while leaving those folders intact. Direct moves may separate, and thereby destroy, the content of such folders. Like copy-and-delete, copy-only mode can also be used when folders reside on different drives.
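In code terms, the three transfer modes differ only in how a single file gets from source to destination. The following sketch shows the distinction; the mode names and the transfer function are hypothetical (the actual switches live in user_configs.py), and shutil.copy2 is an assumed choice for the byte-for-byte copy.

```python
import os
import shutil

def transfer(source_path, dest_path, mode='move'):
    """Transfer one file using a move, copy-only, or copy-and-delete mode."""
    if mode == 'move':
        os.rename(source_path, dest_path)       # fast, but fails across devices
    elif mode in ('copy', 'copy-and-delete'):
        shutil.copy2(source_path, dest_path)    # content, plus timestamps where possible
        if mode == 'copy-and-delete':
            os.remove(source_path)              # delete only after the copy succeeds
    else:
        raise ValueError('unknown transfer mode: %r' % mode)
```

Deleting only after a successful copy means an error during the copy leaves the source item in place, consistent with the error behavior described in this guide's caution.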

In short, these two new modes provide extra utility, as captured in this example. Nevertheless, the original file-move mode is still the tagpix preset default, both because moves always run faster than copies, and because this mode promotes better practice. In terms of practice:

Hence, as both general rule and recommended usage: copy your initial or new source images to a temporary staging folder to be used as the tagpix source tree, and use the default file-move mode. Unless your use case is more custom, this is still the best and safest way to use tagpix.

Version 2.0: Numerous Upgrades

Version 2.0, finalized in October 2017, was a major step up from the former, simplistic script, as summarized below.

Changes Made

Among version 2.0's foremost improvements, it now:

Parameters
Gets all run parameters as console inputs (not code variables). Command-line arguments are not used, because they are cryptic; to provide input programmatically, redirect stdin to a file of precoded replies—or a shell in-script 'here' document, as described above and later here. Per earlier, also sends prompts to stderr so stdout report text can be saved for easier review.
List-only mode
Adds an option to list planned changes only, making no changes. Use this to inspect and verify proposed changes without applying them.
Year subfolders
Adds an option to group the resulting flat folders into by-year subfolders automatically (for photos, movies, and others).
TIFFs and mimetypes
Handles non-JPEG images by using Python's mimetypes module, so other images may be treated as photos too. Still, because Exif tags are apparently used only by JPEG and TIFF images and WAV audio (PNG and WebP images may have metadata too, but their standards and support are evolving), only JPEG and TIFF mime types are treated as 'photos' here; others go to the OTHERS folder: as images, but not photos. For more details, try this page or a web search. 2.0 also uses mimetypes for movie detection, adding newer video types in case some platforms do not.
Source folder
Allows the source folder to be separate from this script's own folder. Moving huge photo archives to a temp folder can be expensive (one subject folder was 75G). To use the prior model, copy images to ./SOURCE (in the current working directory (CWD), which is the script's own folder if it's run from there), and press Enter when asked for the source folder's path.
Destination folder
Allows the results folder to be separate from this script's own folder (i.e., CWD). This in turn allows the program to extend a prior run's results when desired, instead of always making a new archive folder (see Usage Modes). To use the prior model, press Enter when asked for the destination folder's path, and copy results from ./MERGED.
Movies folder
Moves all video mime-type files to a new MOVIES subfolder, instead of lumping them in with OTHERS as before (or PHOTOS).
Additional changes
Addresses additional issues cut short here for space; see the code for more details.
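As a separate illustration of the "TIFFs and mimetypes" and "Movies folder" items in the list above, the classification they describe can be sketched with Python's mimetypes module. The function name and the single add_type() call are assumptions for illustration; which video types tagpix actually registers is not stated here, so see the code for the real list.

```python
import mimetypes

# Example of registering a video type in case a platform's table lacks it
mimetypes.add_type('video/mp4', '.mp4')

PHOTO_TYPES = {'image/jpeg', 'image/tiff'}      # types treated as photos

def classify(filename):
    """Return the destination category name for a file, by MIME type."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    if mimetype in PHOTO_TYPES:
        return 'PHOTOS'
    if mimetype is not None and mimetype.startswith('video/'):
        return 'MOVIES'
    return 'OTHERS'                             # includes PNG and other images
```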

Open Issues

Despite its upgrades, version 2.0 left the following issues on the table (see also the later changes in 2.1 and 2.2):

Report location
This release allows its output to be routed to a file with its stderr/stdout split model, but it could instead always save the report in the MERGED root folder of the results, with an appended date/time suffix. This was not implemented because the reports might become unwelcome trash after many runs, but that rationale is open to debate.
Windows path lengths
tagpix could support too-long pathnames on Windows with the \\?\ pathname-prefix trick (like Mergeall and ziptools). But this case is rare, it can be addressed by using a shorter (higher) destination-folder path, and users may not be able to view the results in Explorer anyhow. Punt in this release, but revisit if feedback warrants (see Input in the toolbar below).
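For reference, the pathname-prefix trick mentioned in the last item amounts to a simple string transformation on absolute Windows paths. This sketch is not part of tagpix (which punts on the issue) and is not taken from Mergeall or ziptools; the function name is hypothetical, and the path passed in is assumed to be absolute and backslash-separated already.

```python
def extended_length_path(absolute_path):
    """Add the Windows extended-length prefix to an absolute pathname."""
    prefix = '\\\\?\\'                          # the four characters \\?\
    if absolute_path.startswith(prefix):
        return absolute_path                    # already prefixed
    if absolute_path.startswith('\\\\'):
        # Network (UNC) paths take the form \\?\UNC\server\share\...
        return prefix + 'UNC\\' + absolute_path[2:]
    return prefix + absolute_path
```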

Prior to version 2.0, tagpix was a basic, tactical script that was neither robust nor customizable. And then it was used.

Usage Caution

tagpix has been tested extensively and used successfully on extremely large photo collections, including all those of its creator, and it will likely perform well on yours too. It is provided freely because it can help you organize your photo libraries. Especially given the many ways that computers can fail, however, a word of caution is in order:

By design, this script's default operation moves and renames all photos and other files in an entire source folder tree. No automated method for undoing the changes it makes is provided, and no warranty is included with this program. Please read all usage details in this document carefully before running tagpix on your photos. It is strongly recommended to preview changes with list-only mode before applying them; and either run tagpix on a temporary copy of your source folder tree, or enable its copy-only transfer mode in file user_configs.py to avoid source-tree changes.

Lest that sound too dire, keep in mind that tagpix never changes photo content (it transfers and renames them only), and errors simply leave items in their original location in all transfer modes (a rerun can propagate them to the destination). Moreover, if you always copy/paste new images from your camera's storage to a tagpix staging folder (per the preceding notebox's recommendation), the camera's storage will automatically serve as a backup copy, regardless of this program's operation.

Still, the importance of your photos merits a complete understanding of any tool that modifies them—this one included.



[Python Logo] Top Code Page News Blog Apps Input ©M.Lutz