Nick Coghlan wrote:
> Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Please file a bug report for this. f.readlines() (or rather
>>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
>>> for detecting line break characters.
>>
>> Actually, no. It has been designed from the start to only recognize the
>> "standard" line break representations found in common formats/protocols (CR, LF
>> and CR+LF).
>> People wanting to split on arbitrary unicode line breaks should use
>> str.splitlines().
>
> The fairly long-standing RFE relating to an arbitrarily selectable
> newline separator seems relevant here:
> http://bugs.python.org/issue1152248
>
> As with the discussion there, the problem with using str.splitlines is
> that it prevents pipelining approaches that avoid reading a whole file
> into memory.
>
> While removing the validity check from readlines() completely is
> questionable (the readrecords() approach mentioned in the tracker issue
> would still be better there), loosening the validity check to be based
> on Py_UNICODE_IS_LINEBREAK seems a bit more feasible. (I'd still call it
> a feature requests rather than a bug though).

I've had a look at the io implementation: this appears to be
based on the universal newline support idea which addresses
only a fixed set of "new line" character combinations and is
not as straightforward to extend to support all Unicode
line break characters as I thought.
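
To make the difference concrete, here is a minimal sketch, assuming Python 3 and using io.StringIO as a stand-in for a text file opened with universal newlines. The sample string is made up for illustration; U+2028 (LINE SEPARATOR) and U+0085 (NEXT LINE) are two of the Unicode line breaks that str.splitlines() recognizes.

```python
import io

# Hypothetical sample text: "standard" newlines mixed with two
# Unicode line breaks, U+2028 LINE SEPARATOR and U+0085 NEXT LINE.
text = "one\ntwo\u2028three\x85four\r\nfive"

# The io layer with universal newlines (newline=None) only treats
# \n, \r and \r\n as line endings.
io_lines = io.StringIO(text, newline=None).readlines()

# str.splitlines() additionally splits on the other Unicode
# line boundaries.
split_lines = text.splitlines()

print(len(io_lines), io_lines)
print(len(split_lines), split_lines)
```

The io layer should report fewer lines than str.splitlines() here, because U+2028 and U+0085 stay embedded in the middle of a single line.
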
What I don't understand is why the io layer tries to reinvent
the wheel here instead of just using the codec's .readline()
method - which *does* use .splitlines() and has full support
for all Unicode line break characters (including the CRLF
combination).
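
A small sketch of what I mean, assuming Python 3's codecs module: codecs.getreader() returns a StreamReader class whose readline() splits on Unicode line breaks. The byte string below is made-up sample data, and io.BytesIO stands in for a real binary file.

```python
import codecs
import io

# Hypothetical sample data containing LF, U+2028 LINE SEPARATOR and CRLF.
data = "one\ntwo\u2028three\r\nfour".encode("utf-8")

# Wrap a binary stream in the codec's StreamReader.
reader = codecs.getreader("utf-8")(io.BytesIO(data))

# Iterating over the reader calls readline() repeatedly; line endings
# are kept, and CRLF is treated as a single line break.
lines = list(reader)

for line in lines:
    print(repr(line))
```

Because the reader pulls data from the underlying stream in chunks, lines can be consumed one at a time without first loading the whole file into a single string.
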
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Aug 06 2009)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
[Python-Dev] PEP 385: the eol-type issue