File: pymailgui-products/unzipped/PyMailGui-PP4E/fixTkBMP.py
"""
================================================================================
[4.0] Sanitize text for GUI display, replacing characters outside Tk's BMP
code-point range with the standard Unicode replacement character.  Without
this, GUIs may be left half-drawn or hung.  This code has also been deployed
in frigcal (for calendar content); mergeall (for filenames in scrolled
messages); and pyedit (for both standalone use and embedded roles here).

DISCUSSION:

At least through Tk 8.6, Tk cannot display Unicode characters outside the
U+0000..U+FFFF BMP (UCS-2) code-point range.  This issue has been popping up
increasingly in PyMailGUI, as people have begun sending newer emojis in:

-Main message text (displayed in view windows and texteditor popups)
-Header line text (displayed in view and possibly list windows)
-Text attachments (displayed in texteditor popups)
-Attachment filenames (displayed in view window part buttons)
-HTML part text (displayed in view windows and texteditor popups)
-Info message boxes (when filenames are included in the display)
-Open dialogs (when tkinter saves a prior choice having emojis)

Any one of these display contexts can disable a Tk-based GUI if unhandled.
Specifically, an uncaught exception is raised by Python's tkinter module,
which is displayed on the console (if one exists) and causes the GUI's
currently-running code to exit and return to the GUI event loop:

    _tkinter.TclError: character U+1f60a is above the
        range (U+0000-U+FFFF) allowed by Tcl

Some of the above contexts were addressed individually in the past with 'try'
statements, but an emoji in an important attachment's filename that rendered
it unreadable finally escalated this issue to global status.  This can also
impact other programs, including mergeall (filenames in scrolled output) and
frigcal (calendar content from other programs).

To address, call this function to sanitize all text passed to the GUI for
display.
It replaces any non-BMP characters with the standard Unicode replacement
character U+FFFD, which Tk displays as a highlighted question mark diamond on
Windows (and the same or similar elsewhere).

This is not ideal and slows and clutters code, but email providers seem intent
on rushing to proprietary characters not supported by other clients written
just a few years ago, and replacements are better than exceptions.

That is, emojis kill programs!  They impact potentially every text display
program ever written.  Were Unicode jack-o-lanterns really that important?
And wouldn't embedded images in HTML mails have achieved the very same goal?
Alas, those who show up for standards meetings set the standards...

Note: this workaround assumes Tk 8.7 will lift the BMP restriction in 2017 or
later, per a dev rumor; if not, the code below should be updated (TBD).

ABOUT INVALID EMAILS:

Caveat: at least one email source has been seen sending UTF-16 header text
having embedded UTF-16 surrogate-pair bytes for emojis, as raw unmarked bytes
without the required MIME encoding.  Such invalid text is not and cannot be
decoded from bytes to Unicode characters.  Its UTF-16 surrogate bytes are
properly interpreted as ASCII here, and display as odd fraction character
symbols (the glyph of the 16-bit value used to mark surrogate pairs when
encoded per UTF-16) instead of the standard Unicode replacement character.
Mail clients cannot "guess" that ASCII text isn't ASCII.

On the other hand, such invalid text will not crash the GUI, and PyMailGUI
cannot fix broken mailers (this same mailer has sent UTF-16 text incorrectly
encoded as quoted-printable UTF-8: decoding per MIME and Unicode yields only
raw UTF-16 bytes!).  When properly MIME-encoded, UTF-16 surrogate pairs that
encode Unicode emoji characters will be correctly decoded to their Unicode
code points here, and be accurately detected as outside Tk's BMP range.
================================================================================
"""

from tkinter import TkVersion


def fixTkBMP(text):
    """
    Change characters outside Tk's BMP range to the Unicode replacement char.
    Used to sanitize all text for display in the GUI, else tkinter fails.
    """
    if TkVersion <= 8.6:
        text = ''.join((ch if ord(ch) <= 0xFFFF else '\uFFFD') for ch in text)
    return text


def isNonBMP(text):
    """
    Return true if any character (codepoint) in text is outside Tk's BMP range.
    Used by Open dialogs to force initialfile=None when True for prior choice.
    """
    if TkVersion <= 8.6:
        return any(ord(ch) > 0xFFFF for ch in text)
    else:
        return False     # and assume Tk 8.7 will make this better...
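
# Usage sketch (not part of the original module): a minimal, standalone
# illustration of how the two functions above might be applied before text
# reaches a widget or an Open dialog.  The names sanitize_for_tk, has_non_bmp,
# and safe_initialfile are hypothetical, and the Tk version is passed in as a
# parameter (an assumption made here so the sketch runs without tkinter); the
# real functions read tkinter.TkVersion instead.  Like the module, it assumes
# that versions after 8.6 need no replacement.

```python
REPLACEMENT_CHAR = '\uFFFD'    # standard Unicode replacement character
BMP_MAX = 0xFFFF               # highest code point Tk 8.6 and earlier accept


def sanitize_for_tk(text, tk_version=8.6):
    """Hypothetical stand-in for fixTkBMP: replace non-BMP characters."""
    if tk_version <= 8.6:
        return ''.join(ch if ord(ch) <= BMP_MAX else REPLACEMENT_CHAR
                       for ch in text)
    return text


def has_non_bmp(text, tk_version=8.6):
    """Hypothetical stand-in for isNonBMP: detect any non-BMP character."""
    if tk_version <= 8.6:
        return any(ord(ch) > BMP_MAX for ch in text)
    return False


def safe_initialfile(prior_choice, tk_version=8.6):
    """Hypothetical helper: value to pass as an Open dialog's initialfile."""
    return None if has_non_bmp(prior_choice, tk_version) else prior_choice


if __name__ == '__main__':
    subject = 'Happy Halloween \U0001F383!'        # contains U+1F383
    print(ascii(sanitize_for_tk(subject)))          # emoji becomes U+FFFD
    print(safe_initialfile('report \U0001F60A.txt'))
    print(safe_initialfile('report.txt'))
```

# In a GUI, the sanitized string (not the original) would be what is handed
# to label, button, and text widgets; the original text stays untouched for
# saving or forwarding.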