File: pyedit-products/unzipped/docetc/examples/Non-BMP-Emojis/README.txt

Demo the behavior of non-BMP Unicode symbols (code points above U+FFFF,
a.k.a. emojis) in both file content and name.  The Tk GUI library used by
PyEdit doesn't support emojis through version 8.6, though it does correctly
support display, edits, and saves of BMP Unicode symbols.

Tk 8.7 may add emoji support, but it does not yet exist, won't appear
for some time, and probably won't be supported by Python until later 
still (python.org's Mac Python is still stuck at Tk 8.5).

Various workarounds are also tried here to retain emojis in 
file content on saves, including Python's surrogateescape encoding
error handler and binary byte files, but all come up short:
surrogates can retain emojis in content, but every attempt results
in improper Tk display and editing of both emojis and BMP symbols
that otherwise work well.
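
As a rough illustration of the surrogateescape tradeoff described above,
here is a minimal sketch.  It assumes the file bytes are decoded as ASCII
with the surrogateescape handler; the attempts in this folder may have used
a different codec or scheme, and the sample text is made up for the demo.

```python
# Hypothetical sample: one BMP symbol (U+263A) and one non-BMP emoji (U+1F600).
original = 'smile \u263A grin \U0001F600'
raw = original.encode('utf-8')

# Decoding as ASCII with surrogateescape maps every non-ASCII byte to a
# lone surrogate in U+DC80..U+DCFF instead of raising UnicodeDecodeError.
escaped = raw.decode('ascii', errors='surrogateescape')

# Re-encoding with the same handler restores the original bytes exactly,
# so the emoji survives a load/save cycle.
restored = escaped.encode('ascii', errors='surrogateescape')

# The escaped text holds no non-BMP characters, but the BMP symbol was
# turned into surrogates as well, so it is no longer present as itself.
has_non_bmp = any(ord(ch) > 0xFFFF for ch in escaped)
bmp_symbol_survives = '\u263A' in escaped
```

In this sketch the bytes round-trip, but the text handed to the GUI is a
run of lone surrogates in place of both the emoji and the BMP symbol, which
is in line with the display and edit problems noted above.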

As a result, PyEdit opts to replace non-BMP emojis with the Unicode 
replacement character in all contexts.  This results in emoji loss
on file writes, and a popup warns users about this.  On the upside,
though, this allows users to view files containing emojis (they would 
fail with errors without the replacements), and Unicode symbols in 
the BMP's U+0000..U+FFFF range can still be used and saved correctly.
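
The replacement policy above can be sketched as follows.  This is not
PyEdit's actual code: the names NON_BMP and replace_non_bmp are hypothetical,
and the sketch assumes Python 3.3 or later, where strings are sequences of
full code points.

```python
import re

# Matches any code point outside the BMP (U+10000..U+10FFFF).
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def replace_non_bmp(text, marker='\uFFFD'):
    """Return (new_text, count), with each non-BMP character replaced by marker."""
    return NON_BMP.subn(marker, text)

# Hypothetical sample: a BMP check mark (U+2713) and a non-BMP emoji (U+1F600).
sample = 'check \u2713 grin \U0001F600'
cleaned, lost = replace_non_bmp(sample)
```

A nonzero count is the sort of signal a program could use to decide
whether to warn the user that a later save will drop the emojis.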

Full emoji support must await a new version of the Tk GUI library,
but see bmp-symbols-supported.* here for symbols that work today.

Update: also on the upside, PyEdit does better in these tests than
either Notepad or WordPad on Windows.  See the viewed-x-* captures: 
PyEdit can't display an emoji but does display other Unicode symbols,
while WordPad and Notepad botch both.  Also, Tk's file-open function
call fails on Windows for filenames having emojis, but works on Macs
(the Mac's Finder also displays emojis; Windows 7's Explorer cannot).


