What's new in Python 2.3?
ActivePython 2.4 documentation

10 PEP 293: Codec Error Handling Callbacks

When encoding a Unicode string into a byte string, unencodable characters may be encountered. Until now, Python has allowed specifying the error processing as either "strict" (raising UnicodeError), "ignore" (skipping the character), or "replace" (using a question mark in the output string), with "strict" being the default behavior. It may be desirable to specify alternative processing of such errors, such as inserting an XML character reference or HTML entity reference into the converted string.
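The three built-in strategies can be compared on a short string. The following sketch uses modern Python 3 syntax; in Python 2.3 itself the string would be written as a u"..." literal. The sample text is an arbitrary choice for illustration.

```python
# Sample text containing two characters that ASCII cannot represent:
# "é" (U+00E9) and "€" (U+20AC).
text = "caf\u00e9 \u20ac5"

# "ignore" silently drops unencodable characters.
ignored = text.encode("ascii", "ignore")

# "replace" substitutes a question mark for each unencodable character.
replaced = text.encode("ascii", "replace")

# "strict" (the default) raises a UnicodeError subclass instead.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeError:
    strict_failed = True

print(ignored)        # b'caf 5'
print(replaced)       # b'caf? ?5'
print(strict_failed)  # True
```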

Python now has a flexible framework for adding different processing strategies. New error handlers can be registered with codecs.register_error, and codecs can then look up a handler with codecs.lookup_error. An equivalent C API has been added for codecs written in C. The error handler receives an exception object carrying the necessary state information, such as the string being converted, the position in the string where the error was detected, and the target encoding. The handler can then either raise an exception or return a replacement string together with the position at which conversion should resume.
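As an illustration of the HTML entity case mentioned earlier, here is a minimal sketch of a custom handler, written in Python 3 syntax. The handler name "htmlentityreplace" and the function html_entity_replace are hypothetical, chosen only for this example; codecs.register_error, codecs.lookup_error and the exception attributes object, start and end are the real API. The handler returns a (replacement, resume_position) tuple.

```python
import codecs
from html.entities import codepoint2name


def html_entity_replace(exc):
    """Hypothetical handler: replace unencodable characters with HTML entities."""
    # This sketch only handles encoding errors; anything else is re-raised.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    # exc.object is the string being encoded; exc.start:exc.end is the bad run.
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&%s;" % name)     # named entity, e.g. &eacute;
        else:
            pieces.append("&#%d;" % ord(ch))  # numeric fallback
    # Return the replacement text and the position where encoding resumes.
    return ("".join(pieces), exc.end)


# Register the handler under a (hypothetical) name so codecs can find it.
codecs.register_error("htmlentityreplace", html_entity_replace)

encoded = "caf\u00e9 \u20ac5".encode("ascii", "htmlentityreplace")
print(encoded)  # b'caf&eacute; &euro;5'
```

Once registered, the handler can be retrieved with codecs.lookup_error("htmlentityreplace"), which returns the same function object.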

Two additional error handlers have been implemented using this framework: "backslashreplace" uses Python backslash quoting to represent unencodable characters and "xmlcharrefreplace" emits XML character references.
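The two new handlers can be tried on the same sample string. This sketch again uses Python 3 syntax; the sample text is arbitrary.

```python
text = "caf\u00e9 \u20ac5"  # "café €5"

# "backslashreplace" writes unencodable characters as Python escape sequences.
backslashed = text.encode("ascii", "backslashreplace")

# "xmlcharrefreplace" writes them as decimal XML character references.
charrefs = text.encode("ascii", "xmlcharrefreplace")

print(backslashed)  # b'caf\\xe9 \\u20ac5'
print(charrefs)     # b'caf&#233; &#8364;5'
```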

See Also:

PEP 293, Codec Error Handling Callbacks
Written and implemented by Walter Dörwald.

© ActiveState 2004 All rights reserved