ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos

Title: Breaking down a URI into its component parts
Submitter: Ken Simpson
Last Updated: 2001/05/28
Version no: 1.0
Category: Networking

 


Description:

Appendix B of IETF RFC 2396 provides this regular expression, which breaks down
a Uniform Resource Identifier (URI) into its component parts.

Usage:

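use strict;
use warnings;
my $uri = "http://www.ics.uci.edu/pub/ietf/uri/#Related";
# Unmatched groups ($6 and $7 here) are undef, so print a placeholder for them.
print join(", ", map { defined $_ ? $_ : "<undefined>" } $1, $2, $3, $4, $5, $6, $7, $8, $9), "\n" if
  $uri =~ m{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};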


Discussion:

If the match is successful, a URL such as

http://www.ics.uci.edu/pub/ietf/uri/#Related

will be broken down into the following group match variables:

$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related

In general, this regular expression breaks a URI down into the following parts,
as defined in the RFC:

scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
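
The mapping above can be wrapped in a small helper. This is only a sketch, not part of the original recipe: the name split_uri and the choice to return a hash are assumptions made for illustration. It relies on a match in list context returning the nine captures in order, with undef for any group that did not take part in the match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): returns the five RFC 2396
# components as a hash. Components absent from the URI come back as undef.
sub split_uri {
    my ($uri) = @_;

    # In list context the match returns ($1 .. $9).
    my @m = $uri =~ m{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

    return (
        scheme    => $m[1],    # $2
        authority => $m[3],    # $4
        path      => $m[4],    # $5
        query     => $m[6],    # $7
        fragment  => $m[8],    # $9
    );
}

my %part = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related");
for my $name (qw(scheme authority path query fragment)) {
    my $value = defined $part{$name} ? $part{$name} : "<undefined>";
    print "$name = $value\n";
}
```

For the example URI this should report http, www.ics.uci.edu, /pub/ietf/uri/ and Related, with the query left undefined because the URI has no "?" part.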




Number of comments: 1

My mistake, John Liu, 2002/03/14
I mistook this for a regular expression that finds URLs in arbitrary text. For that purpose it does quite a poor job, because it matches everything that fits ^[^#?]

Upon re-reading the description (use this to break a URI down into its component parts), I realized my error. Use another regular expression to find URLs in text.
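
To illustrate the point of this comment, here is a short sketch (the sample strings are made up for the illustration). Every group in the pattern is optional, so it matches text that is not a URI at all, and even the empty string. That is why it is suited to splitting a string already known to be a URI, not to finding URLs.

```perl
use strict;
use warnings;

my $re = qr{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

# Neither sample is a URI, yet both match because every group is optional.
for my $text ('not a URL at all', '') {
    if ($text =~ $re) {
        # With no scheme or authority, the whole string ends up in the path ($5).
        print "'$text' matches; path = '$5'\n";
    }
    else {
        print "'$text' does not match\n";
    }
}
```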
