
Re: [Python-Dev] surrogatepass - she's a witch, burn 'er! [was: Cleaning up ...]

From: M.-A. Lemburg <m...@egenix.com>
Fri, 29 Aug 2014 14:18:34 +0200
On 29.08.2014 13:22, Isaac Morland wrote:
> On Fri, 29 Aug 2014, M.-A. Lemburg wrote:
>
>> On 29.08.2014 02:41, Stephen J. Turnbull wrote:
>> Since Python allows working with lone surrogates in Unicode (they
>> are valid code points) and we're using UTF-8 for marshal, we needed
>> a way to make sure that Python 3 also optionally supports working
>> with lone surrogates in such UTF-8 streams (nowadays called CESU-8:
>> http://en.wikipedia.org/wiki/CESU-8).
>
> If I want that wouldn't I specify "cesu-8" as the encoding?
>
> i.e., instead of .decode ('utf-8') I would use .decode ('cesu-8').  Right now, trying this I get
> that cesu-8 is an unknown encoding but that could be changed without affecting the behaviour of the
> utf-8 codec.

Why write a new codec that's almost identical to the utf-8 codec,
when you can get the same functionality by explicitly using a
special error handler?

From a maintenance POV that does not sound like a good approach.
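
For illustration, a minimal sketch of the error-handler approach in Python 3. It uses only the built-in utf-8 codec and the standard "surrogatepass" error handler; the variable names are made up for the example.

```python
# A lone surrogate: a valid code point, but not encodable in strict UTF-8.
lone = "\ud800"

# The strict utf-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# With the "surrogatepass" error handler the same codec lets it through...
raw = lone.encode("utf-8", "surrogatepass")

# ...and decodes the resulting bytes back to the original string.
roundtrip = raw.decode("utf-8", "surrogatepass")

print(strict_encode_ok, raw, roundtrip == lone)
```

The codec stays the same; only the explicitly passed error handler changes what is accepted.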

> It seems to me that .decode ('utf-8') should decode exactly and only valid utf-8, including the
> non-use of surrogate pairs as an intermediate encoding step.

It does in Python 3.
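
A quick way to check this, as a sketch: the byte strings below are hand-built examples, and the helper name strict_utf8_accepts is hypothetical. The second byte string is U+10000 written CESU-8 style, i.e. as the surrogate pair U+D800 U+DC00 with each surrogate encoded in three bytes.

```python
def strict_utf8_accepts(data: bytes) -> bool:
    """Return True if the default (strict) utf-8 codec decodes the data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

valid = b"\xf0\x90\x80\x80"         # U+10000 as proper four-byte UTF-8
pair = b"\xed\xa0\x80\xed\xb0\x80"  # U+D800 U+DC00, three bytes each

print(strict_utf8_accepts(valid))
print(strict_utf8_accepts(pair))

# Only the explicit error handler accepts the surrogate bytes, and it
# yields two separate surrogate code points, not U+10000.
decoded = pair.decode("utf-8", "surrogatepass")
print(len(decoded))
```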

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 29 2014)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2014-08-27: Released eGenix PyRun 2.0.1 ...       http://egenix.com/go62
2014-09-19: PyCon UK 2014, Coventry, UK ...                21 days to go
2014-09-27: PyDDF Sprint 2014 ...                          29 days to go

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/
