tagpix — Combine Your Photos for Easy Access
This is the tagpix user guide. It includes an overview, usage instructions, and version changes. Whether you consider yourself a programmer or end user, you'll find resources to help get you started organizing photos with tagpix here.
Because tagpix by default moves and renames photos, users are encouraged to read this guide first—especially its caution—before running tagpix on valued photo collections. For this program's license, see its main script. For screenshots, click the image above. For code and examples, see the install folder. To download this program, visit its web page.
tagpix is a photo organizer that merges and labels your photos for convenient access. It collects, renames, and sorts them into a normalized folder structure, resolving duplicate content and filenames automatically in the process. This section introduces the basics of its roles and operation.
If your digital photo collection has become scattered over many folders; uses filenames that are not unique because of their origin on multiple cameras; hosts modification dates that reflect retouches instead of events; or contains arbitrary duplicates, tagpix may be the photo-organizing tool you've been looking for. Running it on your photo folders transforms them into a simple, uniform format that's ideal for both viewing and archiving.
Just as importantly, tagpix is an open-source program that makes hidden agendas impossible, and its merged result is as private as the device on which it is stored. With tagpix, access to a folder, and a few simple commands, control of your photo archives remains with you, not a proprietary, closed program or device.
tagpix transfers all the files in an entire folder tree to a flat folder, without changing their content. Along the way, it adds date-of-origin to the front of the names of files transferred to make them unique and sortable; skips any truly duplicate content, and adds a unique serial number to the end of any remaining duplicate filenames; isolates movies and other non-photo files in folders of their own; and groups all transferred items into by-year subfolders on request.
The net effect is useful for organizing the contents of disparate photo collections holding pictures and movies shot on multiple cameras over many years. By running tagpix, all the items of each media type are merged on your local computer into a single flat folder, or a set of flat by-year subfolders, for fast, convenient, and private access.
In more detail, the following summarize the main assets that tagpix brings to your photo-normalization jobs.
tagpix walks all the folders and subfolders in the source tree to find content to be collected and combined in the destination folder. Depending on user configurations, items located and transferred can be either:
Because it's fastest, moves are the default. Copy-and-delete mode has the same effect as moves, but allows items to be moved between different devices and drives (albeit, more slowly than direct moves on the same device). Copy-only mode leaves items in the source tree and works across devices, but may require manual steps to avoid reprocessing prior content on later runs.
tagpix resolves same-name conflicts between different cameras' content by adding a date-of-origin prefix to all transferred filenames (e.g., "xxx.jpg" becomes "2017-10-14__xxx.jpg" in the destination, only). For photos, the prefix uses date taken, extracted from standard photo-file Exif metadata tags when available. For photos with no Exif date-taken tag, and for other types of files, the prefix instead uses either the date-taken string embedded in Android photo filenames, or else the date-modified value of the file itself.
As an example, when tagpix encounters the first of the following in a source folder, the file's name is expanded to the second form to make it unique across multiple cameras that may produce the same filename for different photos shot on different dates:
Whether the added date comes from Exif photo tags, Android filename, or the file itself, the net effect makes the names of photos shot on different dates unique in the result's flat merged folders. When date taken is available in Exif tags or Android filename, the expanded name also reflects the date of the scene capture, not the most recent retouch.
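The renaming described above can be sketched in a few lines of Python. This is an illustration only, not tagpix's own code: the function names add_date_prefix and modtime_date are hypothetical, and only the last-resort date source (file modification time) is shown.

```python
import os
import time

def add_date_prefix(filename, date_string):
    """Return filename with a 'YYYY-MM-DD__' prefix, as in the guide's example."""
    return "%s__%s" % (date_string, filename)

def modtime_date(path):
    """Format a file's modification time as YYYY-MM-DD (the fallback date source)."""
    return time.strftime("%Y-%m-%d", time.localtime(os.path.getmtime(path)))

print(add_date_prefix("xxx.jpg", "2017-10-14"))   # 2017-10-14__xxx.jpg
print(modtime_date("."))                          # date of the current folder's last change
```

Because the prefix uses year-month-day order, a plain alphabetical sort of the merged folder is also a chronological sort.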
tagpix automatically detects and resolves true duplicates in the tree. When image files have the same name after adding their date prefix, it first runs a full byte-by-byte comparison of their content. If the files' content is exactly the same, the redundant copy in the source tree is skipped and not added to the result. If their content differs, the new copy's filename is extended with a unique serial-number suffix (e.g., "date__xxx__N.jpg") and added to the result.
As another example, if the preceding example's file has already been processed by tagpix, and a new same-named and same-dated file like the first of the following is encountered in the source folder, the new image will be either discarded if its content is the same as the file already processed, or expanded to the second form to make its name unique if its content differs:
This means your merged folders will keep just one copy of true duplicates, but all versions of same-named and same-dated content that differs—a rare scenario across different cameras, but possible and even normal if you've retouched or resized a photo and saved it with the same filename in a different folder, and the same date of origin per its Exif tags, Android filename, or file-modification date.
As of version 2.1, tagpix also skips even rarer duplicates of duplicates, which may arise if modified copies are copied to multiple folders unmodified. Regardless of their source, tagpix keeps true duplicates out of the merged result automatically, and renames files with the same name and date of origin but differing content to make them unique.
tagpix always groups merged items by content type, creating separate folders for photos, movies, and others. Photos from cameras are usually JPEG files, but are recognized by both MIME type (which keys off of filename extension) and Exif tag use. This means that both JPEGs and TIFFs using any related filename extension are treated as photos by the program (other images are considered other content). Movies are similarly classified per MIME types and segregated from photos and other content for direct access.
As an option, items in all three file-type folders can also be grouped by year of origin. If this option is selected, each content type's folder will be grouped into by-year subfolders instead of a flat list of items. Either way, the duplicate-resolution steps of the preceding two items are applied to all three content-type folders. For instance, duplicate copies of movies in the source tree are skipped too.
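As a rough sketch of the grouping just described, the following classifies files by MIME type alone and builds the matching result folder. It is a simplification under stated assumptions: the names PHOTO_TYPES, classify, and result_folder are hypothetical, and the Exif-tag check that tagpix also applies to photos is omitted here.

```python
import mimetypes
import os

# Assumption: JPEG and TIFF are the Exif-capable types treated as photos.
PHOTO_TYPES = {"image/jpeg", "image/tiff"}

def classify(filename):
    """Map a filename to PHOTOS, MOVIES, or OTHERS using its extension's MIME type."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    if mimetype in PHOTO_TYPES:
        return "PHOTOS"
    if mimetype is not None and mimetype.startswith("video/"):
        return "MOVIES"
    return "OTHERS"          # includes other images, such as PNGs and GIFs

def result_folder(destination, filename, date, by_year=False):
    """Build the folder an item lands in; date is a 'YYYY-MM-DD' string."""
    parts = [destination, "MERGED", classify(filename)]
    if by_year:
        parts.append(date[:4])   # year subfolder, when grouping is selected
    return os.path.join(*parts)

for name in ("a.jpg", "b.TIF", "c.mp4", "d.png"):
    print(name, "->", result_folder(".", name, "2017-10-14", by_year=True))
```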
In addition, tagpix strips prior runs' date prefixes so you can rerun it any number of times on prior results; discards redundant filename dates in photos shot on Android devices; comes with a list-only mode that allows you to preview its intentions without making any changes (a recommended first step); and generates a report that describes all the updates it performs and any files it skips.
Read on to learn how to use tagpix to organize your photos.
This section describes tagpix install requirements, inputs and results, usage modes, and other operational details.
tagpix is a Python program that runs on all major platforms, and is provided in source-code form. To install the program itself, download its zipfile from the following web page and unzip it on your computer:
https://learning-python.com/tagpix.html
tagpix also requires an install of either Python 3.X or 2.X to run its source code, plus the third-party Pillow (a.k.a. PIL) image library for the installed Python to access photo tags. Fetch and install these items if needed from the following sites, respectively (or search the web for other links):
tagpix will work on any platform that runs Python and Pillow, and has the required folder and file access permissions. For example, the program has been verified on Windows, Mac OS, and Linux (Android may impose extra access requirements beyond this guide's scope).
For pointers on Pillow installs, see this page. A note for developers: the exif.py tags-processing alternative to Pillow failed for some files when tested in 2013 for tagpix version 1.0, though your results may vary, and there are other Exif alternatives in the open-source domain.
To launch, run script tagpix.py with no command-line arguments. It can be run from a console (e.g., Terminal on Unix and Command Prompt on Windows) and most Python IDEs (e.g., PyEdit as captured here, or Python's own IDLE), though IDEs may not support report routing described ahead.
All run parameters are requested by the following prompts at the program's console:
tagpix renames and moves photos to a merged folder; proceed?
Source - pathname of folder with photos to be moved?
Destination - pathname of folder to move items to?
Group items into by-year subfolders?
List only: show new names, but do not rename or move?
Delete all prior-run outputs in "<output folder name>"?
For all prompts except #2 and #3, type "y" for yes, and type "n" or simply press Enter (return) for no.
To end the script immediately without making any changes, reply no to prompt #1, or enter control+C (or otherwise kill the program) at any other prompt. List-only mode (replying yes to #5) analyzes content and shows planned changes but does not perform them; use this to preview and verify the script's updates. Prompt #6 is important when rerunning tagpix; see ahead here and here for its roles, as well as its verifications added in version 2.1 (you'll generally want to reply "n" (no) unless erasing an existing archive).
For more comprehensive tagpix command-line usage examples, browse the examples folder included in its install package. There, you'll find console logs that demonstrate a variety of options on a variety of platforms. Perhaps the most typical use case is captured in this example.
In all usage modes, the paths you input at prompts #2 and #3 can be either relative to your current location in a console (e.g., "." for the current folder), or absolute (e.g., "/Users/you/photos" on Unix, "C:\My-Photos\unmerged" on Windows). For instance, when running tagpix via command lines, you can "cd" to the folder containing your MERGED destination folder and/or source folder, and give folder paths relative to where you are working.
To illustrate, the following kicks off a tagpix update run on Unix after changing to the folder containing both the MERGED result tree and a "New-unmerged" folder holding the new photos to add to MERGED. Both folders are in the current directory (a.k.a. ".") after the "cd" command (relative), and the tagpix script itself is elsewhere (absolute). User-entered commands and replies follow the shell and tagpix prompts:
~$ cd /MY-STUFF/Camera/Digital-cameras-merged
/MY-STUFF/Camera/Digital-cameras-merged$ python3 /MY-STUFF/Code/tagpix/tagpix.py
tagpix renames and moves photos to a merged folder; proceed? y
Source - pathname of folder with photos to be moved? New-unmerged
Destination - pathname of folder to move items to? .
Group items into by-year subfolders? y
List only: show target names, but do not rename or move? n
Delete all prior-run outputs in "./MERGED/PHOTOS"? n
Delete all prior-run outputs in "./MERGED/MOVIES"? n
Delete all prior-run outputs in "./MERGED/OTHERS"? n
...report messages show up here...
Absolute paths are generally required when running tagpix from an IDE such as PyEdit, because the IDE's current directory may not be related to your image folders, and may not be useful for relative paths; see your file explorer's copy-path option to paste a folder's absolute path at tagpix prompts easily. As usual, the tagpix.py script's path in command lines can be relative or absolute too depending on where commands are run, and is not required if the script is open and run from an IDE.
This script's initial prompts are printed to the stderr stream, and its report is printed to stdout. Both go to the console by default, but this two-stream model allows you to save the tagpix report to a file for later inspection—especially handy for larger runs. To start tagpix and save just its report to a file, use a console command line like this to route stdout to a file (">" shell syntax will not work when running tagpix from most IDEs):
python tagpix.py > report.txt
Any special message lines in the report all begin with "***"; search for this in the saved report text after a tagpix run.
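For larger reports, a short Python script can pull out the special lines for you. The sketch below is illustrative only: special_messages is a hypothetical helper, and the sample text is invented for the demo rather than copied from a real tagpix report.

```python
def special_messages(report_text):
    """Return the report lines that begin with '***'."""
    return [line for line in report_text.splitlines() if line.startswith("***")]

# Invented sample text, for illustration only; real report messages will differ.
sample = "first ordinary line\n*** example special message\nsecond ordinary line\n"
for line in special_messages(sample):
    print(line)

# To scan a saved report instead (assuming it was routed to report.txt):
#     with open("report.txt") as report:
#         print("\n".join(special_messages(report.read())))
```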
For a sample of report content, see the demo logs in the example runs folder; the report text is everything that follows the last input prompt. For a comprehensive report example from a tagpix run on a very large photo collection, including duplicates, locked-file errors, prior-run dates, and more, see this file.
The script's results show up in the "MERGED" folder nested in the destination folder (prompt #3), split into "PHOTOS," "MOVIES," and "OTHERS" subfolders that each contain merged and uniquely named content files. If you reply yes to prompt #4, these three subfolders further group their content into year subfolders. Specifically, the results are organized into a shallow tree as follows:
Destination or ./
    MERGED/
        PHOTOS/
            flat content, or year subfolders with flat content
        MOVIES/
            flat content, or year subfolders with flat content
        OTHERS/
            flat content, or year subfolders with flat content
As described earlier, all filenames at the bottom levels of the results tree include date prefixes added to make them unique (e.g., "2017-10-14__file.jpg"). The dates added reflect either date-taken Exif tag values (for most shot photos), date-taken date in Android filenames (for Android photos with no Exif date), or date-modified file attributes (for all others).
For photo files, date taken is always used if present, because it both ensures that names are unique (different cameras may reuse the same names), and reflects the recorded event's date (modification date may instead be a latest-retouch date after edits, but a date-taken tag is likely to survive). Although date taken may not apply to photo scans, for most photos shot on digital cameras the expanded names chronologically identify both the photos themselves and the scenes they capture.
Items not recognized as movies or tagged photos are moved (or copied) to OTHERS. After a tagpix run, you may wish to manually remove items from OTHERS that reflect camera-specific cruft. For example, some cameras create ".THM" or ".CTG" files which are irrelevant to your content in PHOTOS and MOVIES. tagpix does not omit these automatically, because it prefers to err on the side of caution (only well-known ".*" hidden files and user-selected subfolders are skipped, per the next section). Be sure to delete only cruft: the OTHERS result folder may contain non-camera images like PNGs and GIFs too.
For a more graphical look at results trees, see the examples folder's screenshots of both flat and group-by-year modes.
Following a run, you should check the report's final "Missed" section to see if any files were skipped due to:
All items skipped are left intact in the source tree, and listed in the "Missed" section.
If the "Missed" line shows "0" skips, or if you are okay with the items skipped, delete the contents of your source folder after the run if desired; if there were no skips, only empty directories remain there.
If the "Missed" line's skip count is not "0" and valid items were skipped due to errors, resolve their issues (e.g., fix locks or permissions, or use a shorter destination path on Windows) and rerun tagpix to transfer them. For the rerun, use the same source and destination folders, and do not delete the prior run's results (for prompts #2, #3, and #6).
Note: the above pertains to file-move and copy-and-delete transfer modes only. When tagpix is run in copy-only mode, added in version 2.1, it does not produce a "Missed" line or section in the results report, because no files are removed from the source tree. Instead, the end of the report in this mode concludes with a message "Nothing was removed from the source tree." To analyze skips in copy-only mode, search for messages earlier in the report, as described for the three categories listed above.
Depending on the replies you provide to input prompts, you can use this script to either extend an existing archive or make one anew, and can do both with the aid of another program:
For an example of usage mode A, see the examples here and here. For additional usage-mode examples, see the full examples folder. For alternative file transfer modes, see version 2.1 release notes.
This section collects smaller usage notes and tips. Some summarize earlier coverage.
It's worth noting that tagpix by design does not try to use a file's creation date—a datum dependent on both operating system and filesystem. Specifically, file creation date is generally available on Windows only (not on Unix, where it is weakly supported on Mac OS and no better than modification time on Linux), and even where available can sometimes be irrelevant when content changes. For background, try this discussion thread, this filesystems comparison, and Python's os.path.getctime() and os.stat(). Because tagpix works in the woefully unstandardized filesystems realm, it must use modification dates in the name of portability, interoperability, and results that are the same across all supported platforms.
Because tagpix's default transfer mode separates images from other content in source folders, it may impact the results of other tools that store data alongside images. For example, tagpix will destroy a thumbspage gallery in a source folder, by separating its index page, thumbnails subfolder, and images. The PyPhoto viewer may be similarly neutered, because its thumbnails-cache files and images are moved to different destinations.
This cannot be remedied (merging metadata of arbitrary tools is impossible), but you can avoid the issue altogether by applying such tools to tagpix destination folders only, not source folders. That is, run other tools on merged results, not unmerged input. Because merged destination folders are only ever extended, their content is never scattered by tagpix. Source folders are generally best used for staging photos to be later moved by tagpix, per the recommendations ahead.
tagpix uses Python's os.rename() to move files from source to destination, which is normally correct, fast, and atomic. File moves can be problematic, though, when run between different devices or filesystems. If a run's moves all fail due to differing devices, make sure your source and destination folders reside on the same writable device—copy the source folder to the same hard drive or SSD as your destination folder, before the tagpix run. This is a minor inconvenience, but makes all tagpix runs quicker, and copying new source images to a temporary staging folder is recommended practice anyhow; merging from a camera or camera card directly leaves no backup copy if anything goes wrong.
Developer notes: Python's os.replace() doesn't help here, because it still raises an exception across different drives and devices on Windows, Mac OS, and Linux (this call just avoids Windows exceptions if the target file exists on the same device). The only alternative to moves is to copy and delete, which can be much slower for large photo archives, and cross-device moves seem too rare and dangerous to justify the slowdown for all use cases, especially when a manual pre-run copy of the source folder takes roughly the same amount of time.
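For the curious, here is a minimal sketch of the copy-and-delete alternative mentioned above, assuming Python 3. It is not tagpix's own code: move_file is a hypothetical helper that tries os.rename() first and falls back to a copy plus delete only when the failure is a cross-device error (errno.EXDEV).

```python
import errno
import os
import shutil
import tempfile

def move_file(src, dst):
    """Move src to dst, copying and deleting only if a direct rename cannot cross devices."""
    try:
        os.rename(src, dst)              # fast path: same device
    except OSError as exc:
        if exc.errno != errno.EXDEV:     # any other failure is a real error
            raise
        shutil.copy2(src, dst)           # slower path: copy content and timestamps
        os.remove(src)                   # then remove the original

# Demo in a temporary folder (same device, so the fast path is taken).
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "a.jpg")
    dst = os.path.join(folder, "b.jpg")
    with open(src, "wb") as f:
        f.write(b"demo")
    move_file(src, dst)
    print(os.path.exists(src), os.path.exists(dst))   # False True
```

Python's standard shutil.move() applies a similar fallback on its own; the explicit version here just makes the two paths, and their very different costs, visible.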
This section describes changes made in recent tagpix versions. It is meant primarily for developers and prior-version users, though additional usage-level details and context are presented along the way.
Version 2.2 (December 2018) was a minor release that addressed just one specific issue. Specifically, it was enhanced to automatically process origin dates added to photo filenames by Android cameras: it utilizes these dates if no Exif date-taken tag is present, and discards these dates (but not times) to avoid redundancy with tagpix-added dates. A utility script was also coded to drop Android filename dates on demand for users of prior tagpix releases.
This change applies only to tagpix users who have shot photos on Android devices, or may do so in the future. Given the potential magnitude of this subset, though, the rest of this section provides complete coverage. For a brief look at this change's results, see this log and shot. For the full story, read on.
Most digital cameras assign filenames to images using a simple format that accommodates the basic but portable FAT filesystem's 8.3 naming convention. For instance, a "DSC" or "IMG" prefix followed by a sequence number suffices to identify images on a given camera, though not across different cameras—one of the main limitations tagpix solves, by expanding the first of the following forms to the second, with a date-of-origin prefix:
By contrast, cameras on some Android devices (and perhaps others) add a date in photo filenames which, combined with an added time, identifies images by their moment of creation, but is redundant with that added by tagpix's own renaming logic. For example, such images' filenames are initially expanded by tagpix from the first of the following to the second:
While the Android-added date and time (separated by "_" in the first name above) might be a good idea in a world begun anew, they bifurcate the digital-photos world that is. This is a unique and nonstandard naming scheme that stamps files with a date that makes tagpix filenames unnecessarily longer, and that in most cases is fully redundant with both in-file Exif creation-date tags (when present and unchanged), and the date-of-origin prefix added to all photos by tagpix (when its source agrees with the Android stamp).
That said, blindly deleting the Android date in filenames is too extreme, because it may be the only record of creation date in some scenarios. For example, Android photos edited in tools that discard Exif tags won't have a date-taken tag, but will retain a creation date in their filenames that normally differs from the file's modification date (which is generally a last-edit date).
More subtly, some recent Samsung Android devices never record Exif date-taken tags for front—a.k.a. "selfie"—cameras. This is a known issue that you can explore on the web here and here. It may be a temporary bug that Samsung will fix in an update, and back cameras on these devices do record Exif dates correctly. But discarding the Android filename date of photos shot on such devices' front cameras would also drop valuable metadata found nowhere else.
Because the filename date is potentially useful in such cases, tagpix 2.2 has generalized the way it chooses a date of origin to be used for the prefix it adds to filenames. Formally, it always now tries three sources in turn, until a date is selected:
The first step above is applied to photos only (other content types don't have Exif tags). The other steps are run for all types of content in source trees, including photos without usable Exif tags.
The second step above is new, requires heuristics to detect dates, and applies only to a subset of users and images, but is necessary to accommodate metadata recorded outside the Exif model by a handful of devices and manufacturers. A special case to be sure, but exceptions seem as much the norm in the digital camera domain as the computer field at large!
Because step two is partly heuristic—it looks for matching strings and checks their content for valid dates—it can also be disabled by setting UseAndroidFilenameDates to False in the user configs file. This switch is preset to True to cover the norm; set it to False in the unusual event that filenames in your source tree appear to embed Android dates just by coincidence.
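The three-source selection described in this section (Exif date taken, then Android filename date, then file modification date) might be sketched as follows. Several parts are assumptions, not tagpix's implementation: the YYYYMMDD_HHMMSS filename pattern, the helper names, and the exif_date parameter, which stands in for a date-taken value already read from Exif tags (for example, with Pillow). Only the UseAndroidFilenameDates switch name comes from this guide.

```python
import re
import time
from datetime import datetime

UseAndroidFilenameDates = True    # switch named in this guide; its use here is a sketch

# Assumption: Android names embed a YYYYMMDD_HHMMSS stamp, e.g. "20181125_101530.jpg".
ANDROID_STAMP = re.compile(r"(\d{8})_(\d{6})")

def android_filename_date(filename):
    """Return 'YYYY-MM-DD' if the name holds a stamp with a valid date, else None."""
    match = ANDROID_STAMP.search(filename)
    if not match:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%Y%m%d")   # rejects e.g. month 13
    except ValueError:
        return None
    return parsed.strftime("%Y-%m-%d")

def choose_date(filename, modtime, exif_date=None):
    """Try Exif date, then filename date, then modification time, in that order."""
    if exif_date:                                   # step 1: photos with Exif tags
        return exif_date
    if UseAndroidFilenameDates:                     # step 2: heuristic, can be disabled
        found = android_filename_date(filename)
        if found:
            return found
    return time.strftime("%Y-%m-%d", time.localtime(modtime))   # step 3: last resort

print(choose_date("20181125_101530.jpg", 0))                    # 2018-11-25
print(choose_date("DSC001.jpg", 0, exif_date="2017-10-14"))     # 2017-10-14
```

Validating the matched digits as a real date is what keeps coincidental digit runs in filenames from being mistaken for dates most of the time, though not always; hence the switch.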
After the tagpix date has been selected per the prior section, tagpix 2.2 addresses the redundancy of Android filename dates with a new renaming step, run before duplicates detection and file move or copy. If enabled by setting DropAndroidFilenameDates to True in the user configs file, the tagpix.py main script now automatically renames merged photo files to drop the superfluous Android date and keep only the tagpix date (along with the Android-added time, which helps identify the photo). For instance, it shortens from the first of the following to the second:
This step is enabled by default, because it yields shorter names, and normally has no impact on duplicates processing or content access—the shorter form is no less unique or meaningful than the longer. The tagpix date is usually the same as the Android date, whether it is taken from Exif tags or filename.
As a special case, though, this new renaming step can also be specialized with switch KeepDifferingAndroidFilenameDates to drop only Android dates that are the same as the tagpix date. Though unlikely, the two dates may differ if a photo's Exif-tag date is not the same as its Android-filename date—which is generally possible only after manual changes to either, given tagpix's date-selection algorithm. In such rare cases, the tagpix and Android dates may disagree, as in the following inconsistently changed photo:
Set the keep switch to True in the user configs file if you wish to retain the Android date when it differs this way. This switch defaults to True to be cautious, because an auto-shortened filename carries less information in this case only. Still, this case seems too unlikely to apply to most, if any, users (and if it does apply to you, you probably understand both the perils of manual metadata changes, and the need for such an obscure switch!).
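The following sketch shows one way the drop and keep switches could interact. The switch names come from this guide, but the rest is assumed for illustration: the Android stamp format (YYYYMMDD_HHMMSS), the regular expression, and the drop_android_date helper are hypothetical and are not taken from tagpix's source.

```python
import re

DropAndroidFilenameDates = True             # switch names are from this guide;
KeepDifferingAndroidFilenameDates = True    # the logic below is only a sketch

# Assumption: a prefixed name looks like "<YYYY-MM-DD>__<lead><YYYYMMDD>_<HHMMSS><rest>".
PREFIXED = re.compile(r"^(\d{4}-\d{2}-\d{2})__(.*?)(\d{8})_(\d{6}.*)$")

def drop_android_date(prefixed_name):
    """Remove the Android date, keeping the tagpix date prefix and the Android time."""
    if not DropAndroidFilenameDates:
        return prefixed_name
    match = PREFIXED.match(prefixed_name)
    if not match:
        return prefixed_name                         # no Android-style stamp found
    tagpix_date, lead, android_date, rest = match.groups()
    differs = android_date != tagpix_date.replace("-", "")
    if differs and KeepDifferingAndroidFilenameDates:
        return prefixed_name                         # keep the extra information
    return "%s__%s%s" % (tagpix_date, lead, rest)

print(drop_android_date("2018-11-25__20181125_101530.jpg"))   # 2018-11-25__101530.jpg
print(drop_android_date("2018-11-26__20181125_101530.jpg"))   # unchanged: dates differ
```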
For an example of 2.2's automatic handling of Android filename dates, see the console log here, and the screenshot of its results folder here. In the end, the combination of using and dropping such dates shortens filenames of all photos shot on Android cameras, without sacrificing filename metadata when useful.
For more specialized roles, 2.2 also adds a new utility script _drop-redundant-dates.py, which can be run on demand to drop all Android dates in images already processed by a former version of tagpix (or a later version run with auto-renaming disabled).
This utility script is never required for users of tagpix 2.2+ if auto-renaming is enabled, and usually must be run just once by pre-2.2 users who have upgraded. It is also somewhat naive: it makes no attempt to determine if the Android date dropped differs from that of the tagpix date formerly added. Be sure to use its list-only mode to preview changes before running it to update photos; because prior versions of tagpix didn't use filename dates in the absence of Exif dates, some formerly-merged Android photos may be labeled with file-modification date instead.
One special case here: as described in the new utility script's docstring, if you're using a tool that relies on the names of images, you may need to rerun the tool after running the utility script, to pick up the new names. This requirement naturally varies per tool. For instance, the HTML viewer pages generated by the thumbspage gallery builder hardcode image filenames, which can be invalidated by later renames. On the other hand, this isn't a concern for the PyPhoto GUI viewer, which updates its thumbnails cache automatically on image changes.
This special case is also completely irrelevant when using the 2.2 automatic renaming of tagpix.py, because its renaming occurs before other tools can be run on its merged results. Where possible, use automatic renaming instead of the on-demand utility script.
Version 2.1 (October 2018) was a major update, which generalized source-tree subfolder skips; added a simple but crucial deletion verification; improved duplicates detection; introduced new file-transfer modes that copy instead of move; and cleaned up a few dark but rare corners.
For a demo of 2.1 subfolder skipping, see this example. Note that this matters only for subfolders having irrelevant images (e.g., thumbnails); applies only to folders in your source tree (the destination tree is not scanned for images to add to the collection); and is not required if your source folders to be skipped are named with a leading "." (the pattern preset already skips all such folders, though some zip and backup tools may skip them too). The code now also correctly skips multiple matching folders when present.
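In Python terms, skipping subfolders by name pattern during a tree walk can be sketched as below, assuming Python 3. The names SKIP_PATTERNS and walk_files are hypothetical, and the single ".*" pattern only mirrors the leading-"." preset described above; tagpix's actual configuration may differ.

```python
import fnmatch
import os
import tempfile

SKIP_PATTERNS = (".*",)    # hypothetical setting: skip folders named with a leading "."

def walk_files(root, skip_patterns=SKIP_PATTERNS):
    """Yield file paths under root, pruning every subfolder that matches a pattern."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] in place tells os.walk not to descend into pruned folders.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, p) for p in skip_patterns)]
        for name in filenames:
            yield os.path.join(dirpath, name)

# Demo: one ordinary folder and one hidden thumbnails folder.
with tempfile.TemporaryDirectory() as root:
    for sub in ("trip", ".thumbs"):
        os.mkdir(os.path.join(root, sub))
        with open(os.path.join(root, sub, "pic.jpg"), "w") as f:
            f.write("x")
    print(sorted(os.path.relpath(p, root) for p in walk_files(root)))
```

Only the file under the "trip" folder is listed. Because the pruning applies to every matching name at every level, multiple matching folders are all skipped.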
Delete all prior-run outputs in "./MERGED/PHOTOS"? y
....About to delete: ARE YOU SURE? n
Delete all prior-run outputs in "./MERGED/OTHERS"? y
....About to delete: ARE YOU SURE?
The simple fix, in moveone() of the script, is to increment the numeric-id suffix in a loop, until the resulting filename either does not exist in the destination folder or matches an existing same-named file there by content (as formerly done in the related music-file program flatten-itunes). This avoids file overwrites in all contexts (the former defect), but also correctly skips all same-content images for a given filename—whether they match the first instance of the filename moved to the destination (as before), or any differing-content duplicate added later with a uniquely suffixed id (new behavior).
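The loop just described can be sketched as follows, assuming Python 3. This is an illustration rather than the code of moveone(): resolve_name and write are hypothetical helpers, and starting the numeric id at 2 is an assumption made for the demo.

```python
import filecmp
import os
import tempfile

def resolve_name(src_path, dest_dir, name):
    """
    Pick a destination path for src_path, given its date-prefixed name.
    Returns (path, is_duplicate): is_duplicate is True when a file with the same
    content already exists under this name or one of its suffixed forms.
    """
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(dest_dir, name)
    serial = 1
    while os.path.exists(candidate):
        if filecmp.cmp(src_path, candidate, shallow=False):   # full content comparison
            return candidate, True                            # true duplicate: skip it
        serial += 1                                           # assumption: ids start at 2
        candidate = os.path.join(dest_dir, "%s__%d%s" % (stem, serial, ext))
    return candidate, False                                   # unused name: safe to add

def write(path, data):
    with open(path, "wb") as f:
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "dest")
    os.mkdir(dest)
    name = "2017-10-14__xxx.jpg"
    write(os.path.join(dest, name), b"original")
    new = os.path.join(tmp, "xxx.jpg")
    write(new, b"retouched")
    path, duplicate = resolve_name(new, dest, name)
    print(os.path.basename(path), duplicate)    # 2017-10-14__xxx__2.jpg False
```

Because each candidate is compared by content before the id is incremented, an incoming file is matched against every same-named variant already in the destination, and an existing file is never overwritten.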
For a short demo of the new duplicates-resolution logic in action, see this example. The new behavior—skipping duplicates having content the same as another duplicate—addresses the unlikely event of modified copies being copied to multiple folders unmodified. This works well and as it should, but is also the tagpix equivalent of a second lightning strike...
In short, these two new modes provide extra utility, as captured in this example. Nevertheless, the original file-move mode is still the tagpix preset default, both because moves always run faster than copies, and because this mode promotes better practice. In terms of practice:
Hence, as both general rule and recommended usage: copy your initial or new source images to a temporary staging folder to be used as the tagpix source tree, and use the default file-move mode. Unless your use case is more custom, this is still the best and safest way to use tagpix.
Version 2.0 (October 2017) was a major step up from the former, simplistic script, as summarized below.
Among version 2.0's foremost improvements, it now:
Despite its upgrades, version 2.0 left the following issues on the table (see also the later changes in 2.1 and 2.2):
tagpix has been tested extensively and used successfully on extremely large photo collections, and will likely perform well on yours too. It is provided freely because it can help you organize your photo libraries. Especially given the many ways that computers can fail, however, a word of caution is in order:
By design, this script's default operation moves and renames all photos and other files in an entire source folder tree. No automated method for undoing the changes it makes is provided, and no warranty is included with this program. Please read all usage details in this document carefully before running tagpix on your photos. It is strongly recommended to preview changes with list-only mode before applying them; and either run tagpix on a temporary copy of your source folder tree, or enable its copy-only transfer mode in file user_configs.py to avoid source-tree changes.
Lest that sound too dire, keep in mind that tagpix never changes photo content (it transfers and renames them only), and errors simply leave items in their original location in all transfer modes (a rerun can propagate them to the destination). Moreover, if you always copy/paste new images from your camera's storage to a tagpix staging folder (per the preceding notebox's recommendation), the camera's storage will automatically serve as a backup copy, regardless of this program's operation.
Still, the importance of your photos merits a complete understanding of any tool that modifies them—this one included.