Description:
This regular expression removes HTML tags and HTML entities (such as &amp; or &nbsp;) from a string, leaving only the text content.
Usage:
# Strip tags (<...>) and named or numeric entities (&amp; &#160; &#x1F;) from $OnlyContent
$OnlyContent =~ s{<[^>]+>|&#?\w+;}{}g;
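A minimal runnable sketch of the substitution, applied to a hypothetical fragment of filtered Word HTML (the sample string is an assumption, not taken from the original database). Its entity branch, `&#?\w+;`, matches only named and numeric references such as `&amp;` or `&#160;`, so a bare ampersand in running text ("Tom & Jerry; ...") is left alone instead of swallowing everything up to the next semicolon.

```perl
use strict;
use warnings;

# Hypothetical fragment resembling filtered MS-Word HTML
my $OnlyContent = '<p class="MsoNormal">Fish &amp; Chips<br/>are <b>tasty</b>&nbsp;today</p>';

# Delete every tag (<...>) and every named or numeric entity (&amp; &#160; &#x1F;)
$OnlyContent =~ s{<[^>]+>|&#?\w+;}{}g;

print "$OnlyContent\n";   # prints "Fish  Chipsare tastytoday"
```

Because matches are deleted rather than replaced, words separated only by markup run together (`Chipsare`, `tastytoday`), and the spaces around `&amp;` end up doubled.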
Discussion:
My own web authoring tool stores filtered HTML from MS-Word 2002 in a SQL Server 2000 database. For listings in my search engines, I need a short plain-text description taken from the beginning of the HTML content.
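For the search-listing case described above, one way to build the short description is to replace each tag or entity with a space (so neighbouring words stay apart), collapse whitespace, and cut at the last whole word within a length limit. This is a sketch under assumptions: the helper name `short_description`, the 20-character limit, and the trailing `...` are illustrative choices, not part of the original recipe.

```perl
use strict;
use warnings;

# Hypothetical helper: plain-text teaser of at most $max characters
# (plus "...") taken from the start of an HTML fragment.
sub short_description {
    my ($html, $max) = @_;

    # Replace tags and entities with a space so adjacent words stay apart
    (my $text = $html) =~ s{<[^>]+>|&#?\w+;}{ }g;

    # Collapse whitespace runs (including newlines) and trim both ends
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;

    return $text if length($text) <= $max;

    # Longest prefix of at most $max characters that is followed by a space,
    # i.e. one that ends on a whole word
    my ($cut) = $text =~ /^(.{0,$max}) /;

    # Fall back to a hard cut if the first word alone is longer than $max
    $cut = substr($text, 0, $max) unless defined $cut && length $cut;

    return "$cut...";
}

my $html = '<p class="MsoNormal">Fish &amp; Chips<br/>are <b>tasty</b>&nbsp;today</p>';
print short_description($html, 20), "\n";   # prints "Fish Chips are tasty..."
```

Like any regex-based stripper, this is approximate: HTML comments, `<script>` bodies, and attribute values containing `>` are not handled, so a real HTML parser is the safer choice for arbitrary input.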