perl58delta - what is new for perl v5.8.0
This document describes differences between the 5.6.0 release and
the 5.8.0 release.
Many of the bug fixes in 5.8.0 were already seen in the 5.6.1
maintenance release since the two releases were kept closely
coordinated (while 5.8.0 was still called 5.7.something).
Changes that were integrated into the 5.6.1 release are marked [561].
Many of these changes have been further developed since 5.6.1 was released;
those are marked [561+].
You can see the list of changes in the 5.6.1 release (both from the
5.005_03 release and the 5.6.0 release) by reading the perl561delta manpage.
Perl 5.8 is not binary compatible with earlier releases of Perl.
You have to recompile your XS modules.
(Pure Perl modules should continue to work.)
The major reason for the discontinuity is the new IO architecture
called PerlIO. PerlIO is the default configuration because without
it many new features of Perl 5.8 cannot be used. In other words:
you just have to recompile your modules containing XS code, sorry
about that.
In future releases of Perl, non-PerlIO aware XS modules may become
completely unsupported. This shouldn't be too difficult for module
authors, however: PerlIO has been designed as a drop-in replacement
(at the source code level) for the stdio interface.
Depending on your platform, there are also other reasons why
we decided to break binary compatibility; please read on.
If your pointers are 64 bits wide, the Perl malloc is no longer being
used because it does not work well with 8-byte pointers. Also,
usually the system mallocs on such platforms are much better optimized
for such large memory models than the Perl malloc. Some memory-hungry
Perl applications like the PDL don't work well with Perl's malloc.
Finally, applications other than Perl (such as mod_perl) tend to prefer
the system malloc. Such platforms include Alpha and 64-bit HPPA,
MIPS, PPC, and Sparc.
In AIX releases 4.3 and newer, dynamic loading now uses the native
dlopen interface of AIX instead of the old emulated interface. This
change will probably break backward compatibility with compiled
modules. The change was made to make Perl more compliant with other
applications like mod_perl which are using the AIX native interface.
Attributes for my variables now handled at run-time
The my EXPR : ATTRS syntax now applies variable attributes at
run-time. (Subroutine and our variables still get attributes applied
at compile-time.) See the attributes manpage for additional details. In particular,
however, this allows variable attributes to be useful for tie interfaces,
which was a deficiency of earlier releases. Note that the new semantics
doesn't work with the Attribute::Handlers module (as of version 0.76).
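The run-time behaviour can be observed with a small sketch. It assumes a made-up attribute name, Tracked, and a hand-written MODIFY_SCALAR_ATTRIBUTES hook in package main; neither is part of Perl itself. Because the attribute is applied each time the declaration executes, the hook should run once per loop iteration.

```perl
use strict;
use warnings;

# Records each attribute application. "Tracked" is a hypothetical
# attribute name used only for this illustration.
my @applied;

# Perl looks up this hook in the variable's package when a scalar is
# declared with attributes. It returns the attributes it does not
# recognise; an empty list means all of them were accepted.
sub MODIFY_SCALAR_ATTRIBUTES {
    my ($package, $ref, @attrs) = @_;
    push @applied, grep { $_ eq 'Tracked' } @attrs;
    return grep { $_ ne 'Tracked' } @attrs;
}

# The declaration is executed three times, so the attribute is
# applied three times, each time to a fresh variable.
for my $pass (1 .. 3) {
    my $value : Tracked = $pass;
}

my $calls = scalar @applied;
print "attribute applied $calls times\n";
```

A tie-based interface could call tie() on $ref inside the hook; that part is left out here to keep the sketch short.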
The Socket extension is now dynamically loaded instead of being
statically built in. This may or may not be a problem with ancient
TCP/IP stacks of VMS: we do not know since we weren't able to test
Perl in such configurations.
Perl now uses IEEE format (T_FLOAT) as the default internal floating
point format on OpenVMS Alpha, potentially breaking binary compatibility
with external libraries or existing data. G_FLOAT is still available as
a configuration option. The default on VAX (D_FLOAT) has not changed.
Previously in Perl 5.6 to use Unicode one would say "use utf8" and
then the operations (like string concatenation) were Unicode-aware
in that lexical scope.
This was found to be an inconvenient interface, and in Perl 5.8 the
Unicode model has completely changed: now the "Unicodeness" is bound
to the data itself, and most of the time "use utf8" is not needed
at all. The only remaining use of "use utf8" is when the Perl script
itself has been written in the UTF-8 encoding of Unicode. (UTF-8 has
not been made the default since there are many Perl scripts out there
that are using various national eight-bit character sets, which would
be illegal in UTF-8.)
See the perluniintro manpage for the explanation of the current model,
and the utf8 manpage for the current use of the utf8 pragma.
Unicode scripts are now supported. Scripts are similar to (and superior
to) Unicode blocks. The difference between scripts and blocks is that
scripts are the glyphs used by a language or a group of languages, while
the blocks are more artificial groupings of (mostly) 256 characters based
on the Unicode numbering.
In general, scripts are more inclusive, but not universally so. For
example, while the script Latin includes all the Latin characters and
their various diacritic-adorned versions, it does not include the various
punctuation or digits (since they are not solely Latin).
A number of other properties are now supported, including \p{L&},
\p{Any}, \p{Assigned}, \p{Unassigned}, \p{Blank} [561] and
\p{SpacePerl} [561] (along with their \P{...} versions, of course).
See the perlunicode manpage for details, and more additions.
The In and Is prefixes on names used with \p{...} and \P{...}
are now almost always optional. The only exception is that an In prefix
is required to signify a Unicode block when a block name conflicts with a
script name. For example, \p{Tibetan} refers to the script, while
\p{InTibetan} refers to the block. When there is no name conflict, you
can omit the In from the block name (e.g. \p{BraillePatterns}), but
to be safe, it's probably best to always use the In.
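As a rough illustration of the script and block properties described above, the following sketch tests a few sample characters. The characters are chosen only for the example, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Sample characters, picked only for illustration.
my $e_acute    = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE
my $digit      = "1";          # DIGIT ONE, shared by many scripts
my $tibetan_ka = "\x{0F40}";   # TIBETAN LETTER KA

# The Latin script includes accented Latin letters but not digits.
my $latin_letter = ($e_acute =~ /\p{Latin}/) ? 1 : 0;
my $latin_digit  = ($digit   =~ /\p{Latin}/) ? 1 : 0;

# \P{...} is the complement of \p{...}.
my $digit_not_latin = ($digit =~ /\P{Latin}/) ? 1 : 0;

# A Tibetan letter matches both the script and the block of that name.
my $in_script = ($tibetan_ka =~ /\p{Tibetan}/)   ? 1 : 0;
my $in_block  = ($tibetan_ka =~ /\p{InTibetan}/) ? 1 : 0;

print "Latin letter: $latin_letter, Latin digit: $latin_digit\n";
print "Tibetan script: $in_script, Tibetan block: $in_block\n";
```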
A reference to a reference now stringifies as "REF(0x81485ec)" instead
of "SCALAR(0x81485ec)" in order to be more consistent with the return
value of ref().
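A minimal sketch of the new stringification follows. The address printed will differ from run to run, so only its general shape is shown.

```perl
use strict;
use warnings;

my $value         = 42;
my $ref_to_scalar = \$value;           # reference to a scalar
my $ref_to_ref    = \$ref_to_scalar;   # reference to a reference

# ref() and stringification now agree on the type name.
my $kind        = ref($ref_to_ref);    # "REF"
my $ref_text    = "$ref_to_ref";       # looks like REF(0x...)
my $scalar_text = "$ref_to_scalar";    # looks like SCALAR(0x...)

print "$kind: $ref_text\n";
print "plain scalar reference: $scalar_text\n";
```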
The undocumented pack/unpack template letters D/F have been recycled
for better use: now they stand for long double (if supported by the
platform) and NV (Perl internal floating point type). (They used
to be aliases for d/f, but you never knew that.)
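The sketch below round-trips a number through the F template and probes D inside an eval, on the assumption that D may be unavailable on some platforms or builds. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Config;

my $number = 0.1;

# F packs an NV, Perl's internal floating point type, so a round trip
# through pack/unpack should give back exactly the same value.
my $packed_nv  = pack('F', $number);
my ($restored) = unpack('F', $packed_nv);
my $nv_bytes   = length $packed_nv;

# D packs a long double. It is wrapped in eval here because support
# depends on the platform and on how Perl was built.
my $packed_ld = eval { pack('D', $number) };
my $ld_status = defined $packed_ld
    ? length($packed_ld) . " bytes"
    : "not supported here";

print "NV uses $nv_bytes bytes; long double: $ld_status\n";
```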
glob() now returns filenames in alphabetical order
The list of filenames from glob() (or <...>) is now by default sorted
alphabetically to be csh-compliant (which is what already happened
on most UNIX platforms). (bsd_glob() still sorts in the platform's
native order, ASCII or EBCDIC, unless GLOB_ALPHASORT is specified.) [561]
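To see the ordering, the following sketch creates three files out of order in a temporary directory and globs them. The file names are made up for the example; lowercase names are used so that case handling does not affect the result.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "cannot chdir to $dir: $!";

# Create the files in non-alphabetical order.
for my $name (qw(cherry.txt apple.txt banana.txt)) {
    open(my $fh, '>', $name) or die "cannot create $name: $!";
    close $fh or die "cannot close $name: $!";
}

# glob() returns the matching names sorted alphabetically.
my @found = glob("*.txt");

# Leave the temporary directory so that it can be cleaned up.
chdir $start or die "cannot chdir back to $start: $!";

print "@found\n";
```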
-
The semantics of bless(REF, REF) were unclear and until someone proves
it to make some sense, it is forbidden.
-
The obsolete chat2 library that should never have been allowed
to escape the laboratory has been decommissioned.
-
Using chdir("") or chdir(undef) instead of explicit chdir() is
doubtful. A failure (think chdir(some_function())) can lead to an
unintended chdir() to the home directory; therefore this behaviour
is deprecated.
-
The builtin dump() function has probably outlived most of its
usefulness. The core-dumping functionality will remain available
in the future as an explicit call to CORE::dump(), but in future
releases the behaviour of an unqualified dump() call may change.
-
The very dusty examples in the eg/ directory have been removed.
Suggestions for new shiny examples welcome but the main issue is that
the examples need to be documented, tested and (most importantly)
maintained.
-
The (bogus) escape sequences \8 and \9 now give an optional warning
("Unrecognized escape passed through"). There is no need to \-escape
any \w character.
-
The *glob{FILEHANDLE} is deprecated, use *glob{IO} instead.
-
The package; syntax (package without an argument) has been
deprecated. Its semantics were never that clear and its
implementation even less so. If you have used that feature to
disallow all but fully qualified variables, use strict; instead.
-
The unimplemented POSIX regex features [[.cc.]] and [[=c=]] are still
recognised but now cause fatal errors. The previous behaviour of
ignoring them by default and warning if requested was unacceptable
since it, in a way, falsely promised that the features could be used.
-
In future releases, non-PerlIO aware XS modules may become completely
unsupported. Since PerlIO is a drop-in replacement for stdio at the
source code level, this shouldn't be that drastic a change.
-
Previous versions of perl and some readings of some sections of Camel
III implied that the :raw "discipline" was the inverse of :crlf.
Turning off "crlfness" is no longer enough to make a stream truly
binary. So the PerlIO :raw layer (or "discipline", to use the Camel
book's older terminology) is now formally defined as being equivalent
to binmode(FH) - which is in turn defined as doing whatever is
necessary to pass each byte as-is without any translation. In
particular binmode(FH) - and hence :raw - will now turn off both
CRLF and UTF-8 translation and remove other layers (e.g. :encoding())
which would modify the byte stream.
-
The current user-visible implementation of pseudo-hashes (the weird
use of the first array element) is deprecated starting from Perl 5.8.0
and will be removed in Perl 5.10.0, and the feature will be
implemented differently. Not only is the current interface rather
ugly, but the current implementation slows down normal array and hash
use quite noticeably. The fields pragma interface will remain
available. The restricted hashes interface is expected to
be the replacement interface (see the Hash::Util manpage). If your existing
programs depend on the underlying implementation, consider using
the Class::PseudoHash manpage from CPAN.
-
The syntaxes @a->[...] and %h->{...} have now been deprecated.
-
After years of trying, suidperl is considered to be too complex to
ever be considered truly secure. The suidperl functionality is likely
to be removed in a future release.
-
The 5.005 threads model (module Thread) is deprecated and expected
to be removed in Perl 5.10. Multithreaded code should be migrated to
the new ithreads model (see the threads manpage, the threads::shared manpage and
the perlthrtut manpage).
-
The long deprecated uppercase aliases for the string comparison
operators (EQ, NE, LT, LE, GE, GT) have now been removed.
-
The tr///C and tr///U features have been removed and will not return;
the interface was a mistake. Sorry about that. For similar
functionality, see pack('U0', ...) and pack('C0', ...). [561]
-
Earlier Perls treated "sub foo (@bar)" as equivalent to "sub foo (@)".
The prototypes are now checked better at compile-time for invalid
syntax. An optional warning is generated ("Illegal character in
prototype...") but this may be upgraded to a fatal error in a future
release.
-
The exec LIST and system LIST operations now produce warnings on
tainted data and in some future release they will produce fatal errors.
-
The existing behaviour when localising tied arrays and hashes is wrong,
and will be changed in a future release, so do not rely on the existing
behaviour. See Localising Tied Arrays and Hashes Is Broken.
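Several items in the list above name a replacement for a deprecated feature. The sketch below shows three of them under stated assumptions: a restricted hash built with lock_keys() from Hash::Util in place of a pseudo-hash, *glob{IO} in place of *glob{FILEHANDLE}, and a guard around chdir(). The helper safe_chdir() and the %point hash are hypothetical names invented for the example.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

# Restricted hash: after lock_keys() the set of keys is fixed.
my %point = (x => 1, y => 2);
lock_keys(%point);
$point{x} = 10;                          # existing keys stay writable
my $stored = eval { $point{z} = 3; 1 };  # an unknown key dies
my $error  = $stored ? '' : $@;

# *glob{IO} is the supported way to reach the IO object of a glob.
my $io = *STDOUT{IO};

# Hypothetical helper: refuse undef and the empty string instead of
# letting chdir() silently fall back to the home directory.
sub safe_chdir {
    my ($dir) = @_;
    return 0 unless defined $dir && length $dir;
    return chdir($dir) ? 1 : 0;
}

my $refused_undef = safe_chdir(undef);
my $refused_empty = safe_chdir("");

print "restricted hash said: $error";
print "undef refused\n" unless $refused_undef;
print "empty string refused\n" unless $refused_empty;
```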
Unicode in general should now be much more usable than in Perl 5.6.0
(or even in 5.6.1). Unicode can be used in hash keys, Unicode in
regular expressions should work now, Unicode in tr/// should work now,
Unicode in I/O should work now. See the perluniintro manpage for introduction
and the perlunicode manpage for details.
-
The Unicode Character Database coming with Perl has been upgraded
to Unicode 3.2.0. For more information, see http://www.unicode.org/ .
[561+] (5.6.1 has UCD 3.0.1.)
-
For developers interested in enhancing Perl's Unicode capabilities:
almost all the UCD files are included with the Perl distribution in
the lib/unicore subdirectory. The most notable omission, for space
considerations, is the Unihan database.
-
The properties \p{Blank} and \p{SpacePerl} have been added. "Blank" is like
C isblank(), that is, it contains only "horizontal whitespace" (the space
character is, the newline isn't), and the "SpacePerl" is the Unicode
equivalent of \s (\p{Space} isn't, since that includes the vertical
tabulator character, whereas \s doesn't.)
See "New Unicode Properties" earlier in this document for additional
information on changes with Unicode properties.
-
IO is now by default done via PerlIO rather than the system's "stdio".
PerlIO allows "layers" to be "pushed" onto a file handle to alter the
handle's behaviour. Layers can be specified at open time via the 3-arg
form of open:
open($fh,'>:crlf :utf8', $path) || ...
or on already opened handles via extended binmode:
binmode($fh,':encoding(iso-8859-7)');
The built-in layers are: unix (low level read/write), stdio (as in
previous Perls), perlio (re-implementation of stdio buffering in a
portable manner), crlf (does CRLF <=> "\n" translation as on Win32,
but available on any platform). A mmap layer may be available if
the platform supports it (mostly UNIXes).
Layers to be applied by default may be specified via the 'open' pragma.
See Installation and Configuration Improvements for the effects
of PerlIO on your architecture name.
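The effect of a layer is easiest to see by writing through it and reading the bytes back untranslated. The following sketch does this for :crlf at open time and for :encoding(iso-8859-7) via binmode. The helper slurp_raw() and the temporary files are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file with no translation.
sub slurp_raw {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "cannot open $path: $!";
    local $/;
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Layer given at open time: "\n" is written as CR LF.
my ($tmp1, $crlf_path) = tempfile(UNLINK => 1);
close $tmp1;
open(my $out, '>:crlf', $crlf_path) or die "cannot open $crlf_path: $!";
print {$out} "line one\n";
close $out or die "cannot close $crlf_path: $!";
my $crlf_bytes = slurp_raw($crlf_path);    # "line one\r\n"

# Layer pushed onto an already opened handle with binmode.
my ($tmp2, $greek_path) = tempfile(UNLINK => 1);
close $tmp2;
open(my $greek, '>', $greek_path) or die "cannot open $greek_path: $!";
binmode($greek, ':encoding(iso-8859-7)') or die "cannot set layer: $!";
print {$greek} "\x{3B1}";                  # GREEK SMALL LETTER ALPHA
close $greek or die "cannot close $greek_path: $!";
my $greek_bytes = slurp_raw($greek_path);  # the single byte 0xE1

printf "crlf file: %d bytes, greek file: %d byte(s)\n",
    length $crlf_bytes, length $greek_bytes;
```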
-
If your platform supports fork(), you can use the list form of open
for pipes. For example:
open KID_PS, "-|", "ps", "aux" or die $!;
forks the ps(1) command (without spawning a shell, as there are more
than three arguments to open()), and reads its standard output via the
KID_PS filehandle. See the perlipc manpage.
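A portable variation on the example above, as a sketch: it runs the current Perl interpreter ($^X) as the child instead of ps(1), so that the output is predictable. The one-line child program is made up for the illustration. Because the command is given as a list, its quotes and spaces reach the child without being interpreted by a shell.

```perl
use strict;
use warnings;

# Command and arguments as a list: no shell is involved.
my @command = ($^X, '-e', 'print "hello from the child\n"');

open(my $kid, '-|', @command) or die "cannot start child: $!";
my $output = do { local $/; <$kid> };   # read everything the child prints
close $kid or die "child failed with status $?";

print "child said: $output";
```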
-
File handles can be marked as accepting Perl's internal encoding of Unicode
(UTF-8 or UTF-EBCDIC depending on platform) by a pseudo layer ":utf8" :
open($fh,">:utf8","Uni.txt");
Note for EBCDIC users: the pseudo layer ":utf8" is erroneously named
for you, since what you will be getting is not UTF-8 but
UTF-EBCDIC. See the perlunicode manpage, the utf8 manpage, and
http://www.unicode.org/unicode/reports/tr16/ for more information.
In future releases this naming may change. See the perluniintro manpage
for more information about UTF-8.
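A small round trip shows what the ":utf8" pseudo layer does. This sketch assumes an ASCII-based platform, where the internal encoding is UTF-8; the temporary file and variable names are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write one character above 0xFF through the :utf8 layer.
open(my $out, '>:utf8', $path) or die "cannot open $path: $!";
print {$out} "\x{263A}";                 # WHITE SMILING FACE
close $out or die "cannot close $path: $!";

# Read back without any layer: three UTF-8 encoded bytes.
open(my $raw, '<:raw', $path) or die "cannot open $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read back through :utf8: one character again.
open(my $in, '<:utf8', $path) or die "cannot open $path: $!";
my $chars = do { local $/; <$in> };
close $in;

printf "%d bytes on disk, %d character in memory\n",
    length $bytes, length $chars;
```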
-
If your environment variables (LC_ALL, LC_CTYPE, LANG) look like you
want to use UTF-8 (any of the variables match /utf-?8/i), your
STDIN, STDOUT, STDERR handles and the default open layer (see the open manpage)
are marked as UTF-8. (This feature, like other new features that
combine Unicode and I/O, works only if you are using PerlIO, but that's
the default.)
Note that after this Perl really does assume that everything is UTF-8:
for example if some input handle is not, Perl will probably very soon
complain about the input data like this "Malformed UTF-8 ..." since
any old eight-bit data is not legal UTF-8.
Note for code authors: if you want to enable your users to use UTF-8
as their default encoding but in your code still have eight-bit I/O streams
(such as images or zip files), you need to explicitly open() or binmode()
with :bytes (see open in the perlfunc manpage and binmode in the perlfunc manpage), or you
can just use |