perldelta - what is new for perl 5.10.0
This document describes the differences between the 5.8.8 release and
the 5.10.0 release.
Many of the bug fixes in 5.10.0 were already seen in the 5.8.X maintenance
releases; they are not duplicated here and are documented in the set of
man pages named perl58[1-8]?delta.
The feature pragma is used to enable new syntax that would break Perl's
backwards-compatibility with older releases of the language. It's a lexical
pragma, like strict or warnings.
Currently the following new features are available: switch (adds a
switch statement), say (adds a say built-in function), and state
(adds a state keyword for declaring "static" variables). Those
features are described in their own sections of this document.
The feature pragma is also implicitly loaded when you require a minimum
perl version (with the use VERSION construct) greater than, or equal
to, 5.9.5. See the feature manpage for details.
-E is equivalent to -e, but it implicitly enables all
optional features (like use feature ":5.10").
A new operator // (defined-or) has been implemented.
The following expression:
$a // $b
is merely equivalent to
defined $a ? $a : $b
and the statement
$c //= $d;
can now be used instead of
$c = $d unless defined $c;
The // operator has the same precedence and associativity as ||.
Special care has been taken to ensure that this operator does what you
mean while not breaking old code, but some edge cases involving the empty
regular expression may now parse differently. See the perlop manpage for
details.
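The difference from || only shows up for values that are defined but false, such as 0 or the empty string. A minimal sketch, in which the %config hash and its keys are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical settings: "retries" is deliberately set to 0,
# "timeout" is not set at all.
my %config = (retries => 0);

# || tests truth, so a legitimate 0 is replaced by the fallback.
my $retries_or  = $config{retries} || 3;

# // tests definedness, so the 0 survives.
my $retries_dor = $config{retries} // 3;

# //= assigns only when the left-hand side is undefined.
my $timeout = $config{timeout};
$timeout //= 30;
```

Here $retries_or ends up as 3, $retries_dor stays 0, and $timeout becomes 30.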
Perl 5 now has a switch statement. It's available when use feature
'switch' is in effect. This feature introduces three new keywords,
given, when, and default:
given ($foo) {
when (/^abc/) { $abc = 1; }
when (/^def/) { $def = 1; }
when (/^xyz/) { $xyz = 1; }
default { $nothing = 1; }
}
A more complete description of how Perl matches the switch variable
against the when conditions is given in Switch statements in the perlsyn manpage.
This kind of match is called smart match, and it's also possible to use
it outside of switch statements, via the new ~~ operator. See
Smart matching in detail in the perlsyn manpage.
This feature was contributed by Robin Houston.
- Recursive Patterns
-
It is now possible to write recursive patterns without using the (??{})
construct. This new way is more efficient, and in many cases easier to
read.
-
Each capturing parenthesis can now be treated as an independent pattern
that can be entered by using the (?PARNO) syntax (PARNO standing for
"parenthesis number"). For example, the following pattern will match
nested balanced angle brackets:
-
/
^ # start of line
( # start capture buffer 1
< # match an opening angle bracket
(?: # match one of:
(?> # don't backtrack over the inside of this group
[^<>]+ # one or more non angle brackets
) # end non backtracking group
| # ... or ...
(?1) # recurse to bracket 1 and try it again
)* # 0 or more times.
> # match a closing angle bracket
) # end capture buffer one
$ # end of line
/x
-
PCRE users should note that Perl's recursive regex feature allows
backtracking into a recursed pattern, whereas in PCRE the recursion is
atomic or "possessive" in nature. As in the example above, you can
add (?>) to control this selectively. (Yves Orton)
- Named Capture Buffers
-
It is now possible to name capturing parentheses in a pattern and refer to
the captured contents by name. The naming syntax is (?<NAME>...).
It's possible to backreference to a named buffer with the \k<NAME>
syntax. In code, the new magical hashes %+ and %- can be used to
access the contents of the capture buffers.
-
Thus, to replace all doubled chars with a single copy, one could write
-
s/(?<letter>.)\k<letter>/$+{letter}/g
-
Only buffers with defined contents will be "visible" in the %+ hash, so
it's possible to do something like
-
foreach my $name (keys %+) {
print "content of buffer '$name' is $+{$name}\n";
}
-
The %- hash is a bit more complete, since it will contain array refs
holding values from all capture buffers similarly named, if there should
be many of them.
-
%+ and %- are implemented as tied hashes through the new module
Tie::Hash::NamedCapture.
-
Users exposed to the .NET regex engine will find that the perl
implementation differs in that the numerical ordering of the buffers
is sequential, and not "unnamed first, then named". Thus in the pattern
-
/(A)(?<B>B)(C)(?<D>D)/
-
$1 will be 'A', $2 will be 'B', $3 will be 'C' and $4 will be 'D', and not
$1 'A', $2 'C', $3 'B' and $4 'D' as a .NET programmer would expect.
This is considered a feature. :-) (Yves Orton)
- Possessive Quantifiers
-
Perl now supports the "possessive quantifier" syntax, a shorthand for the
"atomic match" pattern (?>...). A possessive quantifier matches as much as
it can and never gives any back, so it can be used to control backtracking.
The syntax is similar to non-greedy matching, except that '+' is used as the
modifier instead of '?'. Thus ?+, *+, ++, {min,max}+ are now legal
quantifiers. (Yves Orton)
- Backtracking control verbs
-
The regex engine now supports a number of special-purpose backtrack
control verbs: (*THEN), (*PRUNE), (*MARK), (*SKIP), (*COMMIT), (*FAIL)
and (*ACCEPT). See the perlre manpage for their descriptions. (Yves Orton)
- Relative backreferences
-
A new syntax \g{N} or \gN where "N" is a decimal integer allows a
safer form of back-reference notation as well as allowing relative
backreferences. This should make it easier to generate and embed patterns
that contain backreferences. See Capture buffers in the perlre manpage. (Yves Orton)
- \K escape
-
The functionality of Jeff Pinyan's module Regexp::Keep has been added to
the core. In regular expressions you can now use the special escape \K
as a way to do something like floating length positive lookbehind. It is
also useful in substitutions like:
-
s/(foo)bar/$1/g
-
that can now be converted to
-
s/foo\Kbar//g
-
which is much more efficient. (Yves Orton)
- Vertical and horizontal whitespace, and linebreak
-
Regular expressions now recognize the \v and \h escapes that match
vertical and horizontal whitespace, respectively. \V and \H
logically match their complements.
-
\R matches a generic linebreak, that is, vertical whitespace, plus
the multi-character sequence "\x0D\x0A".
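Several of the regular expression additions listed above can be tried side by side. The following is only a sketch: the sample strings and variable names are invented for illustration, and the patterns are kept deliberately small.

```perl
use strict;
use warnings;

# Recursive pattern: (?1) re-enters capture group 1, so nesting is handled.
my $balanced = qr/ ^ ( < (?: (?> [^<>]+ ) | (?1) )* > ) $ /x;
my @nested_ok = grep { $_ =~ $balanced } ('<a<b>c>', '<a<b>', '<>');

# Named capture plus \k<NAME> backreference: collapse doubled characters.
my $deduped = 'aabbccd';
$deduped =~ s/(?<letter>.)\k<letter>/$+{letter}/g;

# %- collects every group that shares a name, in pattern order.
my @digits;
@digits = @{ $-{digit} } if '42' =~ /(?<digit>\d)(?<digit>\d)/;

# Possessive quantifier: a++ keeps every "a", so the trailing "a" cannot match.
my $greedy_matches     = ('aaa' =~ /^a+a$/)  ? 1 : 0;
my $possessive_matches = ('aaa' =~ /^a++a$/) ? 1 : 0;

# Relative backreference: \g{-1} still means "the previous group" after the
# pattern is embedded behind another capture group.
my $doubled = qr/(\w)\g{-1}/;
my ($prefix, $repeated) = 'xletter' =~ /(x).*?$doubled/;

# \K keeps "foo" out of the text that gets replaced.
my $kept = 'foobar foobaz';
$kept =~ s/foo\Kbar//g;

# \R treats "\x0D\x0A" as one linebreak; \h+ matches spaces and tabs.
my @lines  = split /\R/,  "one\r\ntwo\nthree";
my @fields = split /\h+/, "a \t b";
```

Only the first and third sample strings are balanced, so @nested_ok holds two entries. The possessive match fails where the greedy one succeeds, because a++ never hands back the final "a".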
say() is a new built-in, only available when use feature 'say' is in
effect, that is similar to print(), but that implicitly appends a newline
to the printed string. See say in the perlfunc manpage. (Robin Houston)
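As a small illustration, the sketch below sends say() output to an in-memory filehandle so that the appended newlines can be inspected. The in-memory handle is just a convenience for the example and is not required by say() itself.

```perl
use strict;
use warnings;
use feature 'say';

# Write into a scalar instead of STDOUT so the result can be examined.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";

say {$out} 'hello';     # one newline is appended
say {$out} 'a', 'b';    # one newline after the whole list, not after each item

close $out;
```

After this runs, $buffer contains "hello\nab\n".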
The default variable $_ can now be lexicalized, by declaring it like
any other lexical variable, with a simple
my $_;
The operations that default on $_ will use the lexically-scoped
version of $_ when it exists, instead of the global $_.
In a map or a grep block, if $_ was previously my'ed, then the
$_ inside the block is lexical as well (and scoped to the block).
In a scope where $_ has been lexicalized, you can still have access to
the global version of $_ by using $::_, or, more simply, by
overriding the lexical declaration with our $_. (Rafael Garcia-Suarez)
A new prototype character has been added. _ is equivalent to $ but
defaults to $_ if the corresponding argument isn't supplied. (Both $
and _ denote a scalar.) Due to the optional nature of the argument, you
can only use it at the end of a prototype, or before a semicolon.
This has a small incompatible consequence: the prototype() function has
been adjusted to return _ for some built-ins in appropriate cases (for
example, prototype('CORE::rmdir')). (Rafael Garcia-Suarez)
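A short sketch of the new prototype character follows. The shout() helper is hypothetical and exists only to show the defaulting behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: the _ prototype makes the single scalar argument
# optional, falling back to $_ when the caller supplies nothing.
sub shout(_) {
    return uc $_[0];
}

my $explicit = shout('abc');    # argument given explicitly

my @implicit;
for ('x', 'y') {
    push @implicit, shout();    # no argument: the current $_ is used
}

# Some built-ins now report _ in their prototype as well.
my $rmdir_proto = prototype('CORE::rmdir');
```

The sub has to be declared before the calls are compiled, as with any prototype.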
UNITCHECK, a new special code block, has been introduced, in addition to
BEGIN, CHECK, INIT and END.
CHECK and INIT blocks, while useful for some specialized purposes,
are always executed at the transition between the compilation and the
execution of the main program, and thus are useless whenever code is
loaded at runtime. On the other hand, UNITCHECK blocks are executed
just after the unit which defined them has been compiled. See the perlmod manpage
for more information. (Alex Gough)
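The ordering can be observed with a string eval, which compiles code at run time. This is a sketch only; the @order array and its labels are invented for the example.

```perl
use strict;
use warnings;

our @order;    # records the sequence in which the blocks run

BEGIN     { push @order, 'BEGIN' }
UNITCHECK { push @order, 'UNITCHECK (main file)' }

# A string eval is compiled at run time. Its UNITCHECK block runs as soon
# as that compilation finishes, before the eval's own statements execute.
my $ok = eval q{
    UNITCHECK { push @order, 'UNITCHECK (eval)' }
    push @order, 'eval body';
    1;
};

push @order, 'main body';
```

The recorded order is BEGIN, UNITCHECK (main file), UNITCHECK (eval), eval body, main body.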
A new pragma, mro (for Method Resolution Order), has been added. It
lets you switch, on a per-class basis, the algorithm that perl uses to
find inherited methods in case of a multiple inheritance hierarchy. The
default MRO hasn't changed (DFS, for Depth First Search). Another MRO is
available: the C3 algorithm. See the mro manpage for more information.
(Brandon Black)
Note that, due to changes in the implementation of class hierarchy search,
code that used to undef the *ISA glob will most probably break. Anyway,
undef'ing *ISA had the side-effect of removing the magic on the @ISA
array and should not have been done in the first place. Also, the
cache *::ISA::CACHE:: no longer exists; to force reset the @ISA cache,
you now need to use the mro API, or more simply to assign to @ISA
(e.g. with @ISA = @ISA).
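The effect of the two orders is easiest to see on a diamond-shaped hierarchy. In the sketch below all class names are hypothetical; only use mro 'c3' and mro::get_linearization() come from the mro module.

```perl
use strict;
use warnings;

# A diamond: Left and Right both inherit from Base, and only Right
# overrides hello().
{ package Base;  sub hello { return 'Base' } }
{ package Left;  our @ISA = ('Base'); }
{ package Right; our @ISA = ('Base'); sub hello { return 'Right' } }

# Default DFS order: DiamondDFS, Left, Base, Right.
{ package DiamondDFS; our @ISA = ('Left', 'Right'); }

# C3 order: DiamondC3, Left, Right, Base.
{ package DiamondC3; use mro 'c3'; our @ISA = ('Left', 'Right'); }

my $dfs_result = DiamondDFS->hello;
my $c3_result  = DiamondC3->hello;
my @c3_order   = @{ mro::get_linearization('DiamondC3') };
```

Under DFS, Base is reached through Left before Right is considered, so Base::hello wins. Under C3, Right is searched before the shared Base, so Right::hello wins.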
readdir() may return a "short filename" on Windows
The readdir() function may return a "short filename" when the long
filename contains characters outside the ANSI codepage. Similarly
Cwd::cwd() may return a short directory name, and glob() may return short
names as well. On the NTFS file system these short names can always be
represented in the ANSI codepage. This will not be true for all other file
system drivers; e.g. the FAT filesystem stores short filenames in the OEM
codepage, so some files on FAT volumes remain inaccessible through the
ANSI APIs.
Similarly, $^X, @INC, and $ENV{PATH} are preprocessed at startup to make
sure all paths are valid in the ANSI codepage (if possible).
The Win32::GetLongPathName() function now returns the correct long file
name, UTF-8 encoded, instead of using replacement characters to force
the name into the ANSI codepage. The new Win32::GetANSIPathName()
function can be used to turn a long pathname into a short one only if the
long one cannot be represented in the ANSI codepage.
Many other functions in the Win32 module have been improved to accept
UTF-8 encoded arguments. Please see the Win32 manpage for details.
The built-in function readpipe() is now overridable. Overriding it also
overrides its operator counterpart, qx// (a.k.a. ``).
Moreover, it now defaults to $_ if no argument is provided. (Rafael
Garcia-Suarez)
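One way to use this is to stub out external commands, for instance in tests. The sketch below installs a global override through CORE::GLOBAL; the replacement sub, the @commands_seen array and the command strings are all made up for illustration, and no external program is actually run.

```perl
use strict;
use warnings;

our @commands_seen;    # every command string handed to the override

BEGIN {
    no warnings 'once';
    # Install the override at compile time so that the backticks and qx{}
    # compiled further down are routed through it.
    *CORE::GLOBAL::readpipe = sub {
        my ($command) = @_;
        push @commands_seen, $command;
        return "stubbed output\n";
    };
}

my $via_backticks = `not-a-real-command --flag`;
my $via_qx        = qx{another fake command};
```

Both forms return the stubbed string, and @commands_seen records the two command lines in order.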
readline() now defaults to *ARGV if no argument is provided. (Rafael
Garcia-Suarez)
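In practice this makes a bare readline() behave like the <> operator. A minimal sketch, which uses File::Temp to create a throwaway input file purely for the sake of the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file to stand in for a command-line argument.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh or die "close failed: $!";

# With no argument, readline() reads from *ARGV, which iterates over the
# files named in @ARGV.
local @ARGV = ($path);
my $first  = readline();
my $second = <>;          # the familiar spelling reads from the same handle
```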
A new class of variables has been introduced. State variables are similar
to my variables, but are declared with the state keyword in place of
my. They're visible only in their lexical scope, but their value is
persistent: unlike my variables, they're not undefined at scope entry,
but retain their previous value. (Rafael Garcia-Suarez, Nicholas Clark)
To use state variables, one needs to enable them by using
use feature 'state';
or by using the -E command-line switch in one-liners.
See Persistent variables via state() in the perlsub manpage.
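A typical use is a counter that is private to one subroutine. In this sketch the next_id() sub is hypothetical:

```perl
use strict;
use warnings;
use feature 'state';

# Hypothetical ID generator: $counter is initialized once, on the first
# call, and keeps its value between calls.
sub next_id {
    state $counter = 0;
    return ++$counter;
}

my @ids = map { next_id() } 1 .. 3;
```

@ids ends up as (1, 2, 3). With my in place of state, every call would start again from 0 and return 1.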
As a new form of syntactic sugar, it's now possible to stack up filetest