Description:
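This regular expression extracts URLs from HTML anchor tags. It is not very accurate (it only matches a quoted HREF value that appears as the first attribute of an A tag), but it's fast and adequate for many situations.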
Usage:
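print "$2\n" while m{
    < \s*                      # start of tag
    A \s+ HREF \s* = \s*       # A tag with HREF as its first attribute
    (["']) (.*?) \1            # quoted URL, closed by the same quote character
    \s* >                      # end of tag
}gsix;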
The license for this recipe is available here.
Discussion:
A more complete URL extractor (by Tom Christiansen) is available on CPAN at http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz
It's slower but more accurate, and still faster than a full HTML parser such as HTML::Parser.
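As a rough illustration of the accuracy trade-off, the sketch below runs the same pattern against a few anchor tags. The sample HTML and the @urls array are made up for this example and are not part of the original recipe. It collects the matches in an array rather than printing them straight from $_.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for illustration only.
my $html = <<'HTML';
<a href="http://www.perl.com/">Perl</a>
<A HREF='/docs/index.html'>Docs</A>
<a class="nav" href="/missed.html">href is not the first attribute</a>
<a href=/unquoted.html>unquoted value</a>
HTML

# Collect the second capture group (the URL) from every match.
my @urls;
while ($html =~ m{
    < \s*
    A \s+ HREF \s* = \s* (["']) (.*?) \1
    \s* >
}gsix) {
    push @urls, $2;
}

print "$_\n" for @urls;
# prints:
# http://www.perl.com/
# /docs/index.html
```

The last two links are skipped: one has another attribute before HREF, and the other has an unquoted value. Cases like these are where a more thorough extractor or a real HTML parser is worth the extra cost.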