ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Extracting HTML URL Links
Submitter: Ken Simpson
Last Updated: 2001/05/24
Version no: 1.0
Category: Networking

 


Description:

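This regular expression is fast and adequate for many situations, but it is not very accurate. It only recognizes A tags whose first attribute is a quoted HREF, so it misses unquoted values, and it can capture too much when other attributes follow the URL.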

Usage:

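#!/usr/bin/perl -n00

# qxurl - tchrist@perl.com
# -n loops over the input; -00 reads it one paragraph at a time.

# Print the quoted HREF value of each <A HREF=...> tag in the paragraph.
print "$2\n" while m{
    < \s*
      A \s+ HREF \s* = \s* (["']) (.*?) \1   # opening quote, URL, same quote
    \s* >
}gsix;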
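
To see what the pattern does and does not pick up, here is a minimal sketch that applies the same regular expression to an in-memory string instead of standard input. The sub name extract_links and the sample HTML are made up for illustration and are not part of the original recipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as the recipe, wrapped in a sub.
# extract_links is a hypothetical name chosen for this sketch.
sub extract_links {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix) {
        push @urls, $2;    # second capture group holds the URL
    }
    return @urls;
}

# Made-up sample input: two links the pattern matches, two it skips.
my $html = <<'HTML';
<p><a href="http://www.perl.com/">Perl</a>
<A HREF='http://www.cpan.org/'>CPAN</A>
<a href=unquoted.html>skipped: value is not quoted</a>
<a name="top" href="late.html">skipped: HREF is not the first attribute</a></p>
HTML

print "$_\n" for extract_links($html);
```

Run on this sample, the script should print only http://www.perl.com/ and http://www.cpan.org/. The other two links are silently skipped, which is the kind of inaccuracy the description warns about.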


Discussion:

A more complete URL extractor (by Tom Christiansen) is available on CPAN at http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz

It is slower than the regular expression above but more accurate, and it is still faster than a full HTML parser such as HTML::Parser.










© 2006 ActiveState Software Inc. All rights reserved.