ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
Title: Remove any HTML
Submitter: Robert Dettmann
Last Updated: 2001/08/02
Version no: 1.0
Category: Networking

 

3 stars 9 vote(s)


Description:

This regular expression removes HTML tags and encoded character entities (such as &amp;), leaving only the text content.

Usage:

# Strip tags (<...>) and entities (&...;) in place; $OnlyContent holds the HTML.
$OnlyContent =~ s{<[^>]+>|&[^;]+;}{}g;
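
To make the effect of the substitution concrete, here is a minimal sketch that applies the same pattern, spelled out with /x comments, to a made-up fragment. The sub name strip_html and the sample string are assumptions for illustration only; they are not part of the recipe.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the recipe's substitution.
sub strip_html {
    my ($text) = @_;
    $text =~ s{
        < [^>]+ >      # a tag: "<", anything up to the next ">", then ">"
      |                # or
        & [^;]+ ;      # an entity: "&", anything up to the next ";", then ";"
    }{}gx;
    return $text;
}

# Made-up sample input.
my $html = '<p>Fish &amp; <b>chips</b></p>';
print strip_html($html), "\n";
```

Two things are worth keeping in mind. Entities are deleted rather than decoded, so the sample above comes out as "Fish  chips" with the ampersand gone. And because the entity branch accepts anything between "&" and the next ";", a bare ampersand in running text can swallow what follows it: "AT&T rocks; really" becomes "AT really".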

The license for this recipe is available here.

Discussion:

My own web authoring tool stores filtered HTML from MS Word 2002 in a SQL Server 2000 database. For the listings in my search engines, I need a short plain-text description taken from the beginning of the HTML content.
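
The use case described here, a short text description taken from the start of an HTML document, could be sketched roughly as follows. This is only one possible approach, not the submitter's actual code: the sub name html_summary, the default length of 80, and the sample input are all assumptions. Unlike the recipe, this variant replaces each tag or entity with a space so that words from adjacent elements do not run together.

```perl
use strict;
use warnings;

# Hypothetical helper: plain-text summary of at most roughly $max characters.
sub html_summary {
    my ($html, $max) = @_;
    $max = 80 unless defined $max;                 # assumed default length

    (my $text = $html) =~ s{<[^>]+>|&[^;]+;}{ }g;  # tags/entities -> one space
    $text =~ s/\s+/ /g;                            # collapse whitespace runs
    $text =~ s/^ | $//g;                           # trim both ends

    return $text if length($text) <= $max;

    $text = substr($text, 0, $max);                # hard cut at $max characters
    $text =~ s/\s+\S*$//;                          # drop the last (possibly partial) word
    return "$text...";
}

# Made-up sample input.
print html_summary('<h1>News</h1><p>Perl 5 &amp; regexes are <i>fun</i> to use.</p>', 20), "\n";
```

Cutting at a word boundary keeps the listing from ending in half a word, at the cost of sometimes dropping a complete final word.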




Number of comments: 1

Yeah, I've got the power!, Alberto Adrian Schiano, 2003/05/12
REGEXES are amazing, don't you think? I hope they are not going to change too much with Perl 6. I like them like they are now :) .
© 2006 ActiveState Software Inc. All rights reserved.