
Re: [perl #122853] Guarantee 0-9, A-Z, a-z character classes

From: Aristotle Pagaltzis <paga...@gmx.de>
Thu, 30 Oct 2014 09:13:21 +0100
* Karl Williamson <pub...@khwilliamson.com> [2014-10-30 05:45]:
> You may very well be right about my cultural bias about what's in A-Z.
> I've tried to imagine what I would think if my first language had had
> other characters, but I can't really.
>
> But your idealized solution effectively says to people on EBCDIC that
> they have to use a foreign character set, and that is just as
> chauvinistic as my A-Z bias.

This is conflating two arguments.

It’s cultural bias to give special rules to ranges in the Latin alphabet
but nothing else. You could simply remove the special case if you wanted
to be egalitarian.

Of course that would make the meaning of Perl programs more ambiguous
than it is already. The reason the special case was added is so that
Perl programs don’t mean one thing on ASCII/Unicode machines and another
completely different one on EBCDIC machines. But they do mean different
things – the special case just papers over the most glaring symptom. But
to make Perl programs mean one thing, universally, you inherently have
to pick one charset over every other as their character model. Unicode
is only the obvious choice. (Heck, z/OS has capitulated (re wrapper lib
for porting Unicode-based programs); pretty much anything that comes in
contact with the internet will have to capitulate eventually.)
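
To make the "mean different things" point concrete, here is a rough sketch
of what a native-code-point A-Z range would cover on each kind of machine.
It is an illustration only, written in Python rather than Perl, and it
assumes that Python's built-in "cp037" codec is a fair stand-in for an
EBCDIC platform. The helper names are made up for this example.

```python
import string

def native_codes(chars, encoding):
    """Native code of each character under a single-byte encoding."""
    return [ch.encode(encoding)[0] for ch in chars]

def is_contiguous(codes):
    """True if every code is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))

def native_range_members(first, last, encoding):
    """Every character whose native code lies between first and last.

    This mimics a range with no special case for letters: it is
    interpreted purely by native code point.
    """
    lo = first.encode(encoding)[0]
    hi = last.encode(encoding)[0]
    return bytes(range(lo, hi + 1)).decode(encoding)

ascii_upper = native_codes(string.ascii_uppercase, "ascii")
ebcdic_upper = native_codes(string.ascii_uppercase, "cp037")  # assumption: cp037 as the EBCDIC stand-in

ascii_members = native_range_members("A", "Z", "ascii")
ebcdic_members = native_range_members("A", "Z", "cp037")

print(is_contiguous(ascii_upper), len(ascii_members))    # contiguous, 26 members
print(is_contiguous(ebcdic_upper), len(ebcdic_members))  # gaps, 41 members
```

Under cp037 the letters sit in three separate blocks (A-I, J-R, S-Z), so
a purely native range from A to Z also picks up characters such as "}"
and "\". That is the symptom the special case hides.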

But those two parts of the argument are separate points.

> There are people who code solely on and for EBCDIC, and Perl should
> accommodate their native way of thinking. So \x04 has to mean the
> character whose code point is natively 4 on whatever platform the code
> is being run on.

I’d say “all’s fair if you predeclare”, as the Perl 6 folks do, except,
well, encoding.pm tried to offer that and it ended in tears. There would
have to be a reason to expect it to turn out differently in this case.

> So, by specifying a range in Unicode terminology, one could get the
> portability Yves wants.

Sounds good.

Absent the existing special case, this would not suffice; I cannot
imagine a lot of people would spell A-Z as \N{U+0041}-\N{U+005A} – not to
mention that if clarity is your aim, this is not the way to achieve it.
And the clear way, \N{LATIN CAPITAL LETTER A}-\N{LATIN CAPITAL LETTER Z},
err, well…

But since people can write the most common ranges portably anyway (even
if only due to a culturally biased rule), this would only be needed for
the harder-to-understand cases, where it would at worst be no worse than
the existing situation.

So, given where we are, it makes sense.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
