Re: [perl #57040] pos() function doesn't handle unicode well
by Moritz Lenz
Jul 17 2008 2:20PM
Marcela Maslanova wrote:
> # New Ticket Created by Marcela Maslanova
> # Please include the string: [perl #57040]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=57040 >
>
>
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> Function pos() doesn't return correct values for unicode strings.
> For example:
> perl -e '$string = "Ä?Å¡ÄÅ?žýáÃéÅ?";while ($string =~ /Å¡/gi) {printf "Found
> Å¡ at %d\n", pos($string)-1;}';
I don't see the bug here. pos() returns byte offsets if you use the
string with byte semantics (for example, a string that has not been
upgraded to UTF-8), and codepoint offsets if you use it with text
semantics (here, under 'use utf8;'). In both cases substr() works with
the same semantics, so it'll do the right thing.
I don't see how that principle is violated in your example above.
pos() and length() agree that "Ä?Å¡" is four bytes long:
$ perl -wle 'print length "Ä?Å¡"'
4
Or am I missing a subtle off-by-one error?
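To make the byte/text distinction concrete, here is a minimal sketch (not taken from the ticket). It stores the same two characters once as raw UTF-8 bytes and once as decoded text, then compares length() and pos(). The helper name len_and_pos and the sample characters U+0161 and U+017E are my own choices for illustration.

```perl
use strict;
use warnings;

# The same two characters, once as raw UTF-8 bytes and once as decoded text.
my $bytes = "\xC5\xA1\xC5\xBE";    # UTF-8 encoding of U+0161 U+017E
my $text  = $bytes;
utf8::decode($text);               # decode the copy in place to two characters

# Hypothetical helper: match $needle once with /g, then return
# length() and pos() of the string it was matched against.
sub len_and_pos {
    my ($str, $needle) = @_;
    $str =~ /\Q$needle\E/g or return;
    return (length($str), pos($str));
}

my @byte_view = len_and_pos($bytes, "\xC5\xBE");    # second character, as bytes
my @text_view = len_and_pos($text,  "\x{17E}");     # second character, as text

print "bytes: length=$byte_view[0] pos=$byte_view[1]\n";    # length=4 pos=4
print "text:  length=$text_view[0] pos=$text_view[1]\n";    # length=2 pos=2
```

In each view pos() and length() use the same unit, which is why substr() stays consistent with them.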
> In this case it could be solved 'use utf8'. But the problem is still in
> other functions, which are
> using pos(). For example expand from Text::Tabs:
> perl -e'chop($ustr="\taa\t..\t\x{100}");for my
> $s("\t\x{010a}\x{010a}\t..\t","\taa\t..\t",$ustr){
> $_=$s;s/\t/print(pos(),$");"\t"/ge; print "\n"}'
> Here should be all numbers the same.
As a non-golfed version:
for my $s ( "\t\x{010a}\x{010a}\t..\t", "\taa\t..\t" ) {
    $_ = $s;
    s/\t/print(pos(),$");"\t"/ge;
    print "\n"
}
Output:
0 2 4
0 3 6
This looks a bit weird indeed. At least to me ;-)
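As a cross-check of where the tabs actually are, here is a small sketch that avoids pos() inside s///e and instead reads the match start from $-[0] in a plain m//g loop. The helper name tab_offsets is made up for this example, and this is only a comparison, not a fix for the behaviour above.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the character offset of every tab in a string.
sub tab_offsets {
    my ($str) = @_;
    my @offsets;
    while ($str =~ /\t/g) {
        push @offsets, $-[0];    # $-[0] is the offset where the last match started
    }
    return @offsets;
}

for my $s ( "\t\x{010a}\x{010a}\t..\t", "\taa\t..\t" ) {
    print join(' ', tab_offsets($s)), "\n";
}
```

Both strings have their tabs at character offsets 0, 3 and 6, so this should print "0 3 6" twice, which matches the second line of the output above but not the first.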
> [Please do not change anything below this line]
> -----------------------------------------------------------------
> ---
> Flags:
> category=core
> severity=medium
> ---
> This perlbug was built using Perl 5.10.0 in the Fedora build system.
> It is being executed now by Perl 5.10.0 - Wed Jul 2 05:13:09 EDT 2008.
>
> Site configuration information for perl 5.10.0:
>
> Configured by Red Hat, Inc. at Wed Jul 2 05:13:09 EDT 2008.
>
> Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
> Platform:
> osname=linux, osvers=2.6.18-92.1.6.el5, archname=i386-linux-thread-multi
> uname='linux x86-6 2.6.18-92.1.6.el5 #1 smp fri jun 20 02:36:06 edt
> 2008 i686 i686 i386 gnulinux '
> config_args='-des -Doptimize=-O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
> -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -Dversion=5.10.0
> -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red
> Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr
> -Dprivlib=/usr/lib/perl5/5.10.0
> -Dsitelib=/usr/local/lib/perl5/site_perl/5.10.0
> -Dvendorlib=/usr/lib/perl5/vendor_perl/5.10.0
> -Darchlib=/usr/lib/perl5/5.10.0/i386-linux-thread-multi
> -Dsitearch=/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
> -Dvendorarch=/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
> -Darchname=i386-linux-thread-multi
> -Dotherlibdirs=/usr/lib/perl5/site_perl/5.10.0 -Dvendorprefix=/usr
> -Dsiteprefix=/usr/local -Duseshrplib -Dusethreads -Duseithreads
> -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm
> -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n
> -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr
> -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto
> -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto
> -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
> hint=recommended, useposix=true, d_sigaction=define
> useithreads=define, usemultiplicity=define
> useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
> use64bitint=undef, use64bitall=undef, uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
> -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
> optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386
> -mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
> -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
> ccversion='', gccversion='4.3.0 20080428 (Red Hat 4.3.0-8)',
> gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='gcc', ldflags =' -L/usr/local/lib'
> libpth=/usr/local/lib /lib /usr/lib
> libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
> perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
> libc=/lib/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
> gnulibc_version='2.8'
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
> -Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
> cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
> -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -L/usr/local/lib'
>
> Locally applied patches:
>
>
> ---
> @INC for perl 5.10.0:
> /usr/lib/perl5/5.10.0/i386-linux-thread-multi
> /usr/lib/perl5/5.10.0
> /usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
> /usr/local/lib/perl5/site_perl/5.10.0
> /usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.10.0
> /usr/lib/perl5/vendor_perl
> /usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.10.0
> .
>
> ---
> Environment for perl 5.10.0:
> HOME=/home/marca
> LANG=en_US.UTF-8
> LANGUAGE=
> LD_LIBRARY_PATH (unset)
> LOGDIR (unset)
>
> PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/home/marca/bin
> PERL_BADLANG (unset)
> SHELL=/bin/bash
>