perl58delta - what is new for perl v5.8.0


NAME

perl58delta - what is new for perl v5.8.0


DESCRIPTION

This document describes differences between the 5.6.0 release and the 5.8.0 release.

Many of the bug fixes in 5.8.0 were already seen in the 5.6.1 maintenance release since the two releases were kept closely coordinated (while 5.8.0 was still called 5.7.something).

Changes that were integrated into the 5.6.1 release are marked [561]. Many of these changes have been further developed since 5.6.1 was released; those are marked [561+].

You can see the list of changes in the 5.6.1 release (both from the 5.005_03 release and the 5.6.0 release) by reading the perl561delta manpage.


Highlights In 5.8.0

  • Better Unicode support

  • New IO Implementation

  • New Thread Implementation

  • Better Numeric Accuracy

  • Safe Signals

  • Many New Modules

  • More Extensive Regression Testing


Incompatible Changes

Binary Incompatibility

Perl 5.8 is not binary compatible with earlier releases of Perl.

You have to recompile your XS modules.

(Pure Perl modules should continue to work.)

The major reason for the discontinuity is the new IO architecture called PerlIO. PerlIO is the default configuration because without it many new features of Perl 5.8 cannot be used. In other words: you just have to recompile your modules containing XS code, sorry about that.

In future releases of Perl, non-PerlIO aware XS modules may become completely unsupported. This shouldn't be too difficult for module authors, however: PerlIO has been designed as a drop-in replacement (at the source code level) for the stdio interface.

Depending on your platform, there are also other reasons why we decided to break binary compatibility; please read on.

64-bit platforms and malloc

If your pointers are 64 bits wide, the Perl malloc is no longer being used because it does not work well with 8-byte pointers. Also, usually the system mallocs on such platforms are much better optimized for such large memory models than the Perl malloc. Some memory-hungry Perl applications like the PDL don't work well with Perl's malloc. Finally, other applications than Perl (such as mod_perl) tend to prefer the system malloc. Such platforms include Alpha and 64-bit HPPA, MIPS, PPC, and Sparc.

AIX Dynaloading

In AIX releases 4.3 and newer, dynaloading now uses the native dlopen interface of AIX instead of the old emulated interface. This change will probably break backward compatibility with compiled modules. The change was made to make Perl more compliant with other applications like mod_perl which are using the AIX native interface.

Attributes for my variables now handled at run-time

The my EXPR : ATTRS syntax now applies variable attributes at run-time. (Subroutine and our variables still get attributes applied at compile-time.) See the attributes manpage for additional details. In particular, however, this allows variable attributes to be useful for tie interfaces, which was a deficiency of earlier releases. Note that the new semantics doesn't work with the Attribute::Handlers module (as of version 0.76).

Socket Extension Dynamic in VMS

The Socket extension is now dynamically loaded instead of being statically built in. This may or may not be a problem with ancient TCP/IP stacks of VMS: we do not know since we weren't able to test Perl in such configurations.

IEEE-format Floating Point Default on OpenVMS Alpha

Perl now uses IEEE format (T_FLOAT) as the default internal floating point format on OpenVMS Alpha, potentially breaking binary compatibility with external libraries or existing data. G_FLOAT is still available as a configuration option. The default on VAX (D_FLOAT) has not changed.

New Unicode Semantics (no more use utf8, almost)

Previously in Perl 5.6 to use Unicode one would say "use utf8" and then the operations (like string concatenation) were Unicode-aware in that lexical scope.

This was found to be an inconvenient interface, and in Perl 5.8 the Unicode model has completely changed: now the "Unicodeness" is bound to the data itself, and for most of the time "use utf8" is not needed at all. The only remaining use of "use utf8" is when the Perl script itself has been written in the UTF-8 encoding of Unicode. (UTF-8 has not been made the default since there are many Perl scripts out there that are using various national eight-bit character sets, which would be illegal in UTF-8.)

See the perluniintro manpage for the explanation of the current model, and the utf8 manpage for the current use of the utf8 pragma.

New Unicode Properties

Unicode scripts are now supported. Scripts are similar to (and superior to) Unicode blocks. The difference between scripts and blocks is that scripts are the glyphs used by a language or a group of languages, while the blocks are more artificial groupings of (mostly) 256 characters based on the Unicode numbering.

In general, scripts are more inclusive, but not universally so. For example, while the script Latin includes all the Latin characters and their various diacritic-adorned versions, it does not include the various punctuation or digits (since they are not solely Latin).

A number of other properties are now supported, including \p{L&}, \p{Any}, \p{Assigned}, \p{Unassigned}, \p{Blank} [561] and \p{SpacePerl} [561] (along with their \P{...} versions, of course). See the perlunicode manpage for details, and more additions.

The In or Is prefixes to names used with \p{...} and \P{...} are now almost always optional. The only exception is that an In prefix is required to signify a Unicode block when a block name conflicts with a script name. For example, \p{Tibetan} refers to the script, while \p{InTibetan} refers to the block. When there is no name conflict, you can omit the In from the block name (e.g. \p{BraillePatterns}), but to be safe, it's probably best to always use the In.
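The script-versus-punctuation distinction described above can be checked directly. The following is only a small sketch: the helper matching_properties() and the choice of sample characters are made up for illustration, and the results assume a Perl whose Unicode tables classify these characters as current Unicode versions do.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of a few properties match a character.
sub matching_properties {
    my ($char) = @_;
    my %tests = (
        'Latin'   => qr/\p{Latin}/,     # script
        'Tibetan' => qr/\p{Tibetan}/,   # script
        'Blank'   => qr/\p{Blank}/,     # horizontal whitespace only
        'L&'      => qr/\p{L&}/,        # cased letters
    );
    my @matched = grep { $char =~ $tests{$_} } sort keys %tests;
    return @matched;
}

# Sample characters, given as code points so nothing wide is printed.
for my $code (0x41, 0xE9, 0x31, 0x2C, 0x09, 0x0A, 0x0F40) {
    my @props = matching_properties(chr $code);
    printf "U+%04X: %s\n", $code, @props ? join(', ', @props) : '(none)';
}
```

The digit (U+0031) and the comma (U+002C) match none of the listed properties, which is the "not solely Latin" case mentioned above.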

REF(...) Instead Of SCALAR(...)

A reference to a reference now stringifies as "REF(0x81485ec)" instead of "SCALAR(0x81485ec)" in order to be more consistent with the return value of ref().
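A short sketch of the difference, with variable names chosen only for illustration (the hexadecimal addresses will of course vary from run to run):

```perl
use strict;
use warnings;

my $value      = 42;
my $ref        = \$value;   # reference to a scalar
my $ref_to_ref = \$ref;     # reference to a reference

print "$ref\n";                   # SCALAR(0x...)
print "$ref_to_ref\n";            # REF(0x...)
print ref($ref_to_ref), "\n";     # REF, consistent with the stringified form
```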

pack/unpack D/F recycled

The undocumented pack/unpack template letters D/F have been recycled for better use: now they stand for long double (if supported by the platform) and NV (Perl internal floating point type). (They used to be aliases for d/f, but you never knew that.)
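As a rough illustration of the new meaning of F, the sketch below round-trips a value through pack and unpack and compares the packed length with the platform's NV size. It also uses the j template letter (a Perl IV) described later in this document. D is left out because it is only usable when the platform supports long doubles. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Config;

# F packs an NV, Perl's internal floating point type, in native format.
my $nv        = 0.1;
my $packed_nv = pack 'F', $nv;
my ($nv_back) = unpack 'F', $packed_nv;

# j packs an IV, Perl's internal signed integer type.
my $iv        = -42;
my $packed_iv = pack 'j', $iv;
my ($iv_back) = unpack 'j', $packed_iv;

printf "F uses %d bytes (nvsize is %d)\n", length($packed_nv), $Config{nvsize};
printf "j uses %d bytes (ivsize is %d)\n", length($packed_iv), $Config{ivsize};
print "NV round trip is exact\n" if $nv_back == $nv;
print "IV round trip is exact\n" if $iv_back == $iv;
```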

glob() now returns filenames in alphabetical order

The list of filenames from glob() (or <...>) is now by default sorted alphabetically to be csh-compliant (which is what happened before in most UNIX platforms). (bsd_glob() does still sort platform natively, ASCII or EBCDIC, unless GLOB_ALPHASORT is specified.) [561]

Deprecations

  • The semantics of bless(REF, REF) were unclear and until someone proves it to make some sense, it is forbidden.

  • The obsolete chat2 library that should never have been allowed to escape the laboratory has been decommissioned.

  • Using chdir("") or chdir(undef) instead of explicit chdir() is doubtful. A failure (think chdir(some_function())) can lead into unintended chdir() to the home directory, therefore this behaviour is deprecated.

  • The builtin dump() function has probably outlived most of its usefulness. The core-dumping functionality will remain available in future as an explicit call to CORE::dump(), but in future releases the behaviour of an unqualified dump() call may change.

  • The very dusty examples in the eg/ directory have been removed. Suggestions for new shiny examples welcome but the main issue is that the examples need to be documented, tested and (most importantly) maintained.

  • The (bogus) escape sequences \8 and \9 now give an optional warning ("Unrecognized escape passed through"). There is no need to \-escape any \w character.

  • The *glob{FILEHANDLE} is deprecated, use *glob{IO} instead.

  • The package; syntax (package without an argument) has been deprecated. Its semantics were never that clear and its implementation even less so. If you have used that feature to disallow all but fully qualified variables, use strict; instead.

  • The unimplemented POSIX regex features [[.cc.]] and [[=c=]] are still recognised but now cause fatal errors. The previous behaviour of ignoring them by default and warning if requested was unacceptable since it, in a way, falsely promised that the features could be used.

  • In future releases, non-PerlIO aware XS modules may become completely unsupported. Since PerlIO is a drop-in replacement for stdio at the source code level, this shouldn't be that drastic a change.

  • Previous versions of perl and some readings of some sections of Camel III implied that the :raw "discipline" was the inverse of :crlf. Turning off "crlfness" is no longer enough to make a stream truly binary. So the PerlIO :raw layer (or "discipline", to use the Camel book's older terminology) is now formally defined as being equivalent to binmode(FH) - which is in turn defined as doing whatever is necessary to pass each byte as-is without any translation. In particular binmode(FH) - and hence :raw - will now turn off both CRLF and UTF-8 translation and remove other layers (e.g. :encoding()) which would modify the byte stream.

  • The current user-visible implementation of pseudo-hashes (the weird use of the first array element) is deprecated starting from Perl 5.8.0 and will be removed in Perl 5.10.0, and the feature will be implemented differently. Not only is the current interface rather ugly, but the current implementation slows down normal array and hash use quite noticeably. The fields pragma interface will remain available. The restricted hashes interface is expected to be the replacement interface (see the Hash::Util manpage). If your existing programs depend on the underlying implementation, consider using the Class::PseudoHash manpage from CPAN.

  • The syntaxes @a->[...] and %h->{...} have now been deprecated.

  • After years of trying, suidperl is considered to be too complex to ever be considered truly secure. The suidperl functionality is likely to be removed in a future release.

  • The 5.005 threads model (module Thread) is deprecated and expected to be removed in Perl 5.10. Multithreaded code should be migrated to the new ithreads model (see the threads manpage, the threads::shared manpage and the perlthrtut manpage).

  • The long deprecated uppercase aliases for the string comparison operators (EQ, NE, LT, LE, GE, GT) have now been removed.

  • The tr///C and tr///U features have been removed and will not return; the interface was a mistake. Sorry about that. For similar functionality, see pack('U0', ...) and pack('C0', ...). [561]

  • Earlier Perls treated "sub foo (@bar)" as equivalent to "sub foo (@)". The prototypes are now checked better at compile-time for invalid syntax. An optional warning is generated ("Illegal character in prototype...") but this may be upgraded to a fatal error in a future release.

  • The exec LIST and system LIST operations now produce warnings on tainted data and in some future release they will produce fatal errors.

  • The existing behaviour when localising tied arrays and hashes is wrong, and will be changed in a future release, so do not rely on the existing behaviour. See Localising Tied Arrays and Hashes Is Broken.


Core Enhancements

Unicode Overhaul

Unicode in general should be now much more usable than in Perl 5.6.0 (or even in 5.6.1). Unicode can be used in hash keys, Unicode in regular expressions should work now, Unicode in tr/// should work now, Unicode in I/O should work now. See the perluniintro manpage for introduction and the perlunicode manpage for details.

  • The Unicode Character Database coming with Perl has been upgraded to Unicode 3.2.0. For more information, see http://www.unicode.org/ . [561+] (5.6.1 has UCD 3.0.1.)

  • For developers interested in enhancing Perl's Unicode capabilities: almost all the UCD files are included with the Perl distribution in the lib/unicore subdirectory. The most notable omission, for space considerations, is the Unihan database.

  • The properties \p{Blank} and \p{SpacePerl} have been added. "Blank" is like C isblank(), that is, it contains only "horizontal whitespace" (the space character is, the newline isn't), and the "SpacePerl" is the Unicode equivalent of \s (\p{Space} isn't, since that includes the vertical tabulator character, whereas \s doesn't.)

    See "New Unicode Properties" earlier in this document for additional information on changes with Unicode properties.
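A small sketch of Unicode data used as hash keys, in tr/// and in a regular expression. The strings and variable names are arbitrary examples; characters are written as \x{...} escapes so that the source itself stays plain ASCII and needs no "use utf8".

```perl
use strict;
use warnings;

# GREEK SMALL LETTER ALPHA followed by GREEK SMALL LETTER BETA
my $alpha_beta = "\x{3B1}\x{3B2}";

# Unicode strings as hash keys
my %name_of = (
    $alpha_beta => 'alpha beta',
    "\x{263A}"  => 'white smiling face',
);

# Unicode ranges in tr///: map Greek lowercase onto Greek uppercase
(my $upper = $alpha_beta) =~ tr/\x{3B1}-\x{3C9}/\x{391}-\x{3A9}/;

# Unicode properties in a regular expression
my $greek_count = () = $alpha_beta =~ /\p{Greek}/g;

print "key found: $name_of{\"\x{3B1}\x{3B2}\"}\n";
printf "uppercased: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $upper;
print "Greek characters matched: $greek_count\n";
```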

PerlIO is Now The Default

  • IO is now by default done via PerlIO rather than system's "stdio". PerlIO allows "layers" to be "pushed" onto a file handle to alter the handle's behaviour. Layers can be specified at open time via 3-arg form of open:

       open($fh,'>:crlf :utf8', $path) || ...

    or on already opened handles via extended binmode:

       binmode($fh,':encoding(iso-8859-7)');
    

    The built-in layers are: unix (low level read/write), stdio (as in previous Perls), perlio (re-implementation of stdio buffering in a portable manner), crlf (does CRLF <=> "\n" translation as on Win32, but available on any platform). A mmap layer may be available if platform supports it (mostly UNIXes).

    Layers to be applied by default may be specified via the 'open' pragma.

    See Installation and Configuration Improvements for the effects of PerlIO on your architecture name.

  • If your platform supports fork(), you can use the list form of open for pipes. For example:

        open KID_PS, "-|", "ps", "aux" or die $!;
    

    forks the ps(1) command (without spawning a shell, as there are more than three arguments to open()), and reads its standard output via the KID_PS filehandle. See the perlipc manpage.

  • File handles can be marked as accepting Perl's internal encoding of Unicode (UTF-8 or UTF-EBCDIC depending on platform) by a pseudo layer ":utf8" :

       open($fh,">:utf8","Uni.txt");
    

    Note for EBCDIC users: the pseudo layer ":utf8" is erroneously named for you, since what you will be getting is not UTF-8 but UTF-EBCDIC. See the perlunicode manpage, the utf8 manpage, and http://www.unicode.org/unicode/reports/tr16/ for more information. In future releases this naming may change. See the perluniintro manpage for more information about UTF-8.

  • If your environment variables (LC_ALL, LC_CTYPE, LANG) look like you want to use UTF-8 (any of the variables match /utf-?8/i), your STDIN, STDOUT, STDERR handles and the default open layer (see the open manpage) are marked as UTF-8. (This feature, like other new features that combine Unicode and I/O, works only if you are using PerlIO, but that's the default.)

    Note that after this Perl really does assume that everything is UTF-8: for example if some input handle is not, Perl will probably very soon complain about the input data like this "Malformed UTF-8 ..." since any old eight-bit data is not legal UTF-8.

    Note for code authors: if you want to enable your users to use UTF-8 as their default encoding but in your code still have eight-bit I/O streams (such as images or zip files), you need to explicitly open() or binmode() with :bytes (see open in the perlfunc manpage and binmode in the perlfunc manpage), or you can just use binmode(FH) (nice for pre-5.8.0 backward compatibility).

  • File handles can translate character encodings from/to Perl's internal Unicode form on read/write via the ":encoding()" layer.

  • File handles can be opened to "in memory" files held in Perl scalars via:

       open($fh,'>', \$variable) || ...

  • Anonymous temporary files are available without need to 'use FileHandle' or other module via

       open($fh,"+>", undef) || ...

    That is a literal undef, not an undefined value.
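Several of the PerlIO features listed above can be combined in one short script. This is a sketch only: the variable names are invented for illustration, and the reading side uses the :encoding(UTF-8) layer as one possible choice.

```perl
use strict;
use warnings;

# 1. Write to an "in memory" file through the :utf8 layer.
my $buffer = '';
open(my $out, '>:utf8', \$buffer) or die "Cannot open in-memory file: $!";
print {$out} "caf\x{E9}\n";          # four characters plus a newline
close($out) or die "Cannot close in-memory file: $!";

# $buffer now holds the encoded bytes; U+00E9 takes two bytes in UTF-8.
printf "bytes written: %d\n", length $buffer;

# 2. Read the same scalar back, pushing a layer with extended binmode.
open(my $in, '<', \$buffer) or die "Cannot reopen in-memory file: $!";
binmode($in, ':encoding(UTF-8)');
my $line = <$in>;
close($in);
chomp $line;
printf "characters read: %d\n", length $line;

# 3. Anonymous temporary file: the third argument is a literal undef.
open(my $tmp, '+>', undef) or die "Cannot create temporary file: $!";
print {$tmp} "scratch data\n";
seek($tmp, 0, 0) or die "Cannot rewind temporary file: $!";
my $scratch = <$tmp>;
close($tmp);
print "temporary file held: $scratch";
```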

ithreads

The new interpreter threads ("ithreads" for short) implementation of multithreading, by Arthur Bergman, replaces the old "5.005 threads" implementation. In the ithreads model any data sharing between threads must be explicit, as opposed to the old model where data sharing was implicit. See the threads manpage, the threads::shared manpage, and the perlthrtut manpage.

As a part of the ithreads implementation Perl will also use any necessary and detectable reentrant libc interfaces.

Restricted Hashes

A restricted hash is restricted to a certain set of keys: no keys outside the set can be added. Individual keys can also be restricted so that the key cannot be deleted and the value cannot be changed. No new syntax is involved: the Hash::Util module is the interface.
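A minimal sketch using lock_keys() and lock_value() from Hash::Util. The %config hash and its contents are made up for the example, and eval is used only to catch the exceptions that restricted hashes throw.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys lock_value);

my %config = (host => 'localhost', port => 8080);

# Restrict the hash to its current set of keys.
lock_keys(%config);

# Adding a key outside the set dies.
my $added     = eval { $config{user} = 'root'; 1 };
my $add_error = $@;

# Existing values can still be changed...
$config{port} = 9090;

# ...unless the individual value is locked as well.
lock_value(%config, 'host');
my $changed = eval { $config{host} = 'example.org'; 1 };

print "adding a key failed: $add_error" unless $added;
print "host is still $config{host}\n"   unless $changed;
print "port is now $config{port}\n";
```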

Safe Signals

Perl used to be fragile in that signals arriving at inopportune moments could corrupt Perl's internal state. Now Perl postpones handling of signals until it's safe (between opcodes).

This change may have surprising side effects because signals no longer interrupt Perl instantly. Perl will now first finish whatever it was doing, like finishing an internal operation (like sort()) or an external operation (like an I/O operation), and only then look at any arrived signals (and before starting the next operation). No more corrupt internal state since the current operation is always finished first, but the signal may take more time to get heard. Note that breaking out from potentially blocking operations should still work, though.
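The deferred delivery can be observed with a handler that only counts. The sketch below assumes a POSIX-like platform where a process can send itself SIGUSR1; the counter and the bounded wait loop are illustrative and not part of any Perl interface.

```perl
use strict;
use warnings;

my $handled = 0;

# The handler runs at a safe point between opcodes, not inside one.
local $SIG{USR1} = sub { $handled++ };

# Send the signal to ourselves (assumes SIGUSR1 exists on this platform).
kill 'USR1', $$;

# Give Perl a bounded number of safe points at which to run the handler.
my $tries = 0;
until ($handled or $tries++ > 100) {
    select(undef, undef, undef, 0.01);   # sleep for 10 milliseconds
}

print "handler ran $handled time(s)\n";
```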

Understanding of Numbers

In general a lot of fixing has happened in the area of Perl's understanding of numbers, both integer and floating point. Since in many systems the standard number parsing functions like strtoul() and atof() seem to have bugs, Perl tries to work around their deficiencies. This should result in more accurate numbers.

Perl now tries internally to use integer values in numeric conversions and basic arithmetics (+ - * /) if the arguments are integers, and tries also to keep the results stored internally as integers. This change leads to often slightly faster and always less lossy arithmetics. (Previously Perl always preferred floating point numbers in its math.)
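One visible effect can be sketched as follows, assuming a Perl built with 64-bit integers ($Config{ivsize} of 8). The value 2**53 is chosen because it is the point past which an IEEE double can no longer represent every integer. The variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

my $limit = 9_007_199_254_740_992;   # 2**53
my $next  = $limit + 1;              # stays an integer if IVs are 64 bits wide

if ($Config{ivsize} >= 8) {
    # With integer arithmetic the two values are distinct; with pure
    # double arithmetic, 2**53 + 1 would round back to 2**53.
    print "next integer: $next\n";
    print "distinct from 2**53\n" if $next != $limit;
}
else {
    print "this perl has ", $Config{ivsize} * 8, "-bit integers; results will differ\n";
}
```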

Arrays now always interpolate into double-quoted strings [561]

In double-quoted strings, arrays now interpolate, no matter what. The behavior in earlier versions of perl 5 was that arrays would interpolate into strings if the array had been mentioned before the string was compiled, and otherwise Perl would raise a fatal compile-time error. In versions 5.000 through 5.003, the error was

        Literal @example now requires backslash

In versions 5.004_01 through 5.6.0, the error was

        In string, @example now must be written as \@example

The idea here was to get people into the habit of writing "fred\@example.com" when they wanted a literal @ sign, just as they have always written "Give me back my \$5" when they wanted a literal $ sign.

Starting with 5.6.1, when Perl sees an @ sign in a double-quoted string, it always attempts to interpolate an array, regardless of whether or not the array has been used or declared already. The fatal error has been downgraded to an optional warning:

        Possible unintended interpolation of @example in string

This warns you that "fred@example.com" is going to turn into fred.com if you don't backslash the @. See http://www.plover.com/~mjd/perl/at-error.html for more details about the history here.
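The behaviour can be reproduced with a string eval, so that the compile-time warning is raised at run time and can be captured. This is a sketch only: strict 'vars' is switched off inside the eval because the undeclared @example would otherwise be a compile error under "use strict", and the variable names are invented.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Unescaped @: Perl interpolates the (empty) array @example.
my $unescaped = eval q{ no strict 'vars'; "fred@example.com" };

# Escaped @: a literal at sign.
my $escaped = "fred\@example.com";

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";
print "warning:   $_" for @warnings;
```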

Miscellaneous Changes

  • AUTOLOAD is now lvaluable, meaning that you can add the :lvalue attribute to AUTOLOAD subroutines and you can assign to the AUTOLOAD return value.

  • The $Config{byteorder} (and corresponding BYTEORDER in config.h) was previously wrong on platforms where sizeof(long) was 4 but sizeof(IV) was 8. The byteorder was only sizeof(long) bytes long (1234 or 4321), but now it is correctly sizeof(IV) bytes long (12345678 or 87654321). (This problem didn't affect Windows platforms.)

    Also, $Config{byteorder} is now computed dynamically--this is more robust with "fat binaries" where an executable image contains binaries for more than one binary platform, and when cross-compiling.

  • perl -d:Module=arg,arg,arg now works (previously one couldn't pass in multiple arguments.)

  • do followed by a bareword now ensures that this bareword isn't a keyword (to avoid a bug where do q(foo.pl) tried to call a subroutine called q). This means that for example instead of do format() you must write do &format().

  • The builtin dump() now gives an optional warning "dump() better written as CORE::dump()", meaning that by default dump(...) is resolved as the builtin dump() which dumps core and aborts, not as (possibly) user-defined sub dump. To call the latter, qualify the call as &dump(...). (The whole dump() feature is to be considered deprecated, and possibly removed/changed in future releases.)

  • chomp() and chop() are now overridable. Note, however, that their prototype (as given by prototype("CORE::chomp")) is undefined, because it cannot be expressed and therefore one cannot really write replacements to override these builtins.

  • END blocks are now run even if you exit/die in a BEGIN block. Internally, the execution of END blocks is now controlled by PL_exit_flags & PERL_EXIT_DESTRUCT_END. This enables the new behaviour for Perl embedders. This will be the default in 5.10. See the perlembed manpage.

  • Formats now support zero-padded decimal fields.

  • Although "you shouldn't do that", it was possible to write code that depends on Perl's hashed key order (Data::Dumper does this). The new algorithm "One-at-a-Time" produces a different hashed key order. More details are in Performance Enhancements.

  • lstat(FILEHANDLE) now gives a warning because the operation makes no sense. In future releases this may become a fatal error.

  • Spurious syntax errors generated in certain situations, when glob() caused File::Glob to be loaded for the first time, have been fixed. [561]

  • Lvalue subroutines can now return undef in list context. However, the lvalue subroutine feature still remains experimental. [561+]

  • A lost warning "Can't declare ... dereference in my" has been restored (Perl had it earlier but it became lost in later releases.)

  • A new special regular expression variable has been introduced: $^N, which contains the most-recently closed group (submatch).

  • no Module; does not produce an error even if Module does not have an unimport() method. This parallels the behavior of use vis-a-vis import. [561]

  • The numerical comparison operators return undef if either operand is a NaN. Previously the behaviour was unspecified.

  • our can now have an experimental optional attribute unique that affects how global variables are shared among multiple interpreters; see our in the perlfunc manpage.

  • The following builtin functions are now overridable: each(), keys(), pop(), push(), shift(), splice(), unshift(). [561]

  • pack() / unpack() can now group template letters with () and then apply repetition/count modifiers on the groups.

  • pack() / unpack() can now process the Perl internal numeric types: IVs, UVs, NVs-- and also long doubles, if supported by the platform. The template letters are j, J, F, and D.

  • pack('U0a*', ...) can now be used to force a string to UTF-8.

  • my __PACKAGE__ $obj now works. [561]

  • POSIX::sleep() now returns the number of unslept seconds (as the POSIX standard says), as opposed to CORE::sleep() which returns the number of slept seconds.

  • printf() and sprintf() now support parameter reordering using the %\d+\$ and *\d+\$ syntaxes. For example

        printf "%2\$s %1\$s\n", "foo", "bar";
    

    will print "bar foo\n". This feature helps in writing internationalised software, and in general when the order of the parameters can vary.

  • The (\&) prototype now works properly. [561]

  • prototype(\[$@%&]) is now available to implicitly create references (useful for example if you want to emulate the tie() interface).

  • A new command-line option, -t, is available. It is the little brother of -T: instead of dying on taint violations, lexical warnings are given. This is only meant as a temporary debugging aid while securing the code of old legacy applications.
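Three of the smaller additions in the list above ($^N, grouping in pack/unpack templates, and the \[...] prototype) can be tried out as follows. This is a sketch: the subroutine kind_of() and the sample data are hypothetical, chosen only to make the behaviour visible.

```perl
use strict;
use warnings;

# $^N holds the text of the most recently closed group.
my $last_group;
if ("2002-07-18" =~ /(\d+)-(\d+)-(\d+)/) {
    $last_group = $^N;                 # the third group closed last
}

# Groups in pack/unpack templates, with a repeat count on the group.
my $record = pack '(A3)2', 'ab', 'cd';       # two space-padded 3-byte fields
my @fields = unpack '(A3)*', $record;        # A strips the trailing padding

# The \[$@%] prototype passes a reference to whichever kind of
# variable the caller supplied.
sub kind_of (\[$@%]) { return ref $_[0] }

my @list   = (1, 2, 3);
my %map    = (a => 1);
my $scalar = 'x';
my @kinds  = (kind_of(@list), kind_of(%map), kind_of($scalar));

print "last group: $last_group\n";
print "record: [$record]\n";
print "fields: @fields\n";
print "kinds:  @kinds\n";
```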