[perl #57040] pos() function doesn't handle unicode well
by Marcela Maslanova other posts by this author
Jul 17 2008 6:49AM messages near this date
Re: [perl #57042] regression with $^R in regex in perl 5.10 from 5.8.8
|
Re: [perl #57040] pos() function doesn't handle unicode well
# New Ticket Created by Marcela Maslanova
# Please include the string: [perl #57040]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=57040 >
generated with the help of perlbug 1.36 running under perl 5.10.0.
-----------------------------------------------------------------
[Please enter your report here]
Function pos() doesn't return correct values for unicode strings.
For example:
perl -e '$string = "Ä?Å¡ÄÅ?žýáÃéÅ?";while ($string =~ /Å¡/gi) {printf "Found
Å¡ at %d\n", pos($string)-1;}';
In this case it could be solved 'use utf8'. But the problem is still in
other functions, which are
using pos(). For example expand from Text::Tabs:
perl -e'chop($ustr="\taa\t..\t\x{100}");for my
$s("\t\x{010a}\x{010a}\t..\t","\taa\t..\t",$ustr){
$_=$s;s/\t/print(pos(),$");"\t"/ge; print "\n"}'
Here should be all numbers the same.
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
This perlbug was built using Perl 5.10.0 in the Fedora build system.
It is being executed now by Perl 5.10.0 - Wed Jul 2 05:13:09 EDT 2008.
Site configuration information for perl 5.10.0:
Configured by Red Hat, Inc. at Wed Jul 2 05:13:09 EDT 2008.
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=linux, osvers=2.6.18-92.1.6.el5, archname=i386-linux-thread-multi
uname='linux x86-6 2.6.18-92.1.6.el5 #1 smp fri jun 20 02:36:06 edt
2008 i686 i686 i386 gnulinux '
config_args='-des -Doptimize=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -Dversion=5.10.0
-Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red
Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr
-Dprivlib=/usr/lib/perl5/5.10.0
-Dsitelib=/usr/local/lib/perl5/site_perl/5.10.0
-Dvendorlib=/usr/lib/perl5/vendor_perl/5.10.0
-Darchlib=/usr/lib/perl5/5.10.0/i386-linux-thread-multi
-Dsitearch=/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
-Dvendorarch=/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
-Darchname=i386-linux-thread-multi
-Dotherlibdirs=/usr/lib/perl5/site_perl/5.10.0 -Dvendorprefix=/usr
-Dsiteprefix=/usr/local -Duseshrplib -Dusethreads -Duseithreads
-Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm
-Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n
-Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr
-Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto
-Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto
-Ud_setservent_r_proto -Dscriptdir=/usr/bin'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386
-mtune=generic -fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING
-fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
ccversion='', gccversion='4.3.0 20080428 (Red Hat 4.3.0-8)',
gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
libc=/lib/libc-2.8.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.8'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E
-Wl,-rpath,/usr/lib/perl5/5.10.0/i386-linux-thread-multi/CORE'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i386 -mtune=generic
-fasynchronous-unwind-tables -DPERL_USE_SAFE_PUTENV -L/usr/local/lib'
Locally applied patches:
---
@INC for perl 5.10.0:
/usr/lib/perl5/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/5.10.0
/usr/local/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/local/lib/perl5/site_perl/5.10.0
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.10.0
/usr/lib/perl5/vendor_perl
/usr/lib/perl5/site_perl/5.10.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.10.0
.
---
Environment for perl 5.10.0:
HOME=/home/marca
LANG=en_US.UTF-8
LANGUAGE=
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/home/marca/bin
PERL_BADLANG (unset)
SHELL=/bin/bash
Thread:
Marcela Maslanova
Eric Brine
Moritz Lenz
|