xsl-list
[xsl] Stylesheet Optimization -- How to Make It Faster
by Jeff Sese
Nov 27 2006 5:40PM
I have a stylesheet that adds markup to text nodes that match an 
abbreviation in a reference XML file. It's working nicely, but the 
processing time is very slow... I'm guessing because it's processing text 
nodes. An 800 KB file takes me about 25 minutes to process, and I have around 
800 files to process (varying file sizes; some are relatively small and 
some are fairly large). Is there any way to optimize my stylesheet so 
that it can process the files faster?

Here is my stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  <xsl:variable name="abbreviations" as="element()+"
      select="document('publishers_data.xml')/root/publisher/abbrev"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:variable name="str" as="xs:string" select="."/>
    <xsl:choose>
      <xsl:when test="exists($abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))])">
        <xsl:variable name="search-str" as="xs:string+"
            select="$abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))]"/>
        <xsl:variable name="replace" as="element()*">
          <xsl:for-each select="$search-str">
            <xsl:variable name="abbr" as="xs:string" select="."/>
            <abbr type="title"
                expand="{$abbreviations[.=$abbr]/following-sibling::expanded}">
              <xsl:value-of select="$abbr"/>
            </abbr>
          </xsl:for-each>
        </xsl:variable>
        <xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$str"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="@*|element()|comment()|processing-instruction()"
      mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:function name="ati:replace-with-nodes" as="node()+">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="words-to-replace" as="xs:string*"/>
    <xsl:param name="replacement" as="node()*"/>
    <xsl:variable name="regex"
        select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'),'|')"/>
    <xsl:analyze-string select="$input" regex="{$regex}">
      <xsl:matching-substring>
        <xsl:variable name="i" as="xs:integer"
            select="(1 to count($words-to-replace))[regex-group(.)]"/>
        <xsl:sequence select="$replacement[$i]"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>
  <xsl:function name="ati:escape">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
  </xsl:function>
</xsl:stylesheet>

Here's a short version of publishers_data.xml:

<root> 
<publisher> 
<abbrev> Inschriften von Priene</abbrev>
<expanded> Inschriften von Priene</expanded>
</publisher> 
<publisher> 
<abbrev> P. Mil. Congr. XVIII</abbrev>
<expanded> Papiri documentari dell'Università Cattolica di Milano</expanded>
</publisher> 
<publisher> 
<abbrev> P. Jud. Des. Misc.</abbrev>
<expanded> Discoveries in the Judean Desert XXXVIII</expanded>
</publisher> 
<!-- more publishers here --> 
</root> 

Here's a snippet of the source XML:

<!-- preceding::node() of ab --> 
<ab lang="grk" n="1"> 
<foreign lang="grk"> · γέγονε κατὰ τοὺς Δαρείου</foreign>
<note place="margin"> a c</note>
<lb n="5"/>
<foreign lang="grk"> χρόνους τοῦ μετὰ Καμβύσην βασιλεύσαντος, ὅτε καὶ 
Διονύσιος ἦν ὁ Μιλήσιος</foreign>
<lb/> (III), <foreign lang="grk">ἐπὶ τῆς ξ¯ε¯ ὀλυμπιάδος</foreign> 
(520/16)<foreign lang="grk"> · ἱστοριογράφος. ῾Ηρόδοτος δὲ ὁ 
῾Αλι-</foreign>
<note place="margin"> v</note>
<lb/>
<foreign lang="grk"> καρνασεὺς ὠφέληται τούτου, νεώτερος ὤν. καὶ ἦν 
ἀκουστὴς Πρωταγόρου</foreign>
<note id="n7" n="7" lang="ger">
<foreign lang="grk"> ὤν· γέγονε γὰρ μετ᾽ αὐτόν</foreign> A</note>
<lb/>
<foreign lang="grk"> ὁ ῾Εκαταῖος. πρῶτος δὲ ἱστορίαν πεζῶς ἐξήνεγκε, 
συγγραφὴν δὲ Φερεκύδης</foreign>
<note id="n8–9" n="8–9" lang="ger">
<foreign lang="grk"> πρῶτος–νοθεύεται</foreign> wiederholt s. <foreign 
lang="grk"> ὶστορῆσαι</foreign>, s. <foreign 
lang="grk"> συγγραφεῖς</foreign>.</note>
<lb/> (I 3). <foreign lang="grk">τὰ γὰρ ᾽Ακουσιλάου</foreign> (<link 
type="boj" targets="a002" n="BOJTEXT002_T_7"> 2 T 7</link>) <foreign 
lang="grk"> νοθεύεται.</foreign>
<note id="n9" n="9" lang="ger">
<foreign lang="grk"> ᾽Ακουσιλάου</foreign> Vossius <foreign 
lang="grk"> ᾽Αγησιλάου</foreign> Suid</note>
</ab> 
<!-- following::node() of ab --> 

All ab nodes appear at the same level (same depth) throughout.
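
For the sake of discussion, here is a rough, untested sketch of one direction that might reduce the work per text node. The stylesheet above builds and evaluates one regular expression per abbreviation for every text node (twice, once in the test and once in the variable). The sketch instead builds a single alternation regex once, in a global variable, and looks up each expansion through a key. The names `publishers`, `publisher-by-abbrev` and `abbrev-regex` are made up for this illustration. It assumes every abbrev is non-empty and that publishers_data.xml has the structure shown above. It is not quite equivalent to the original: only occurrences delimited by non-word characters are marked up, and because the delimiter is consumed by the match, two abbreviations separated by a single character would not both be matched. No timings have been measured.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <!-- Hypothetical name: the reference document, loaded once. -->
  <xsl:variable name="publishers" select="document('publishers_data.xml')"/>

  <!-- Hypothetical key: find a publisher by the exact text of its abbrev. -->
  <xsl:key name="publisher-by-abbrev" match="publisher" use="abbrev"/>

  <!-- Hypothetical name: one regex for all abbreviations, built once per
       transformation. Longer abbreviations are listed first so that they
       are tried before shorter ones that are prefixes of them. -->
  <xsl:variable name="abbrev-regex" as="xs:string">
    <xsl:variable name="escaped" as="xs:string*">
      <xsl:for-each select="$publishers/root/publisher/abbrev">
        <xsl:sort select="string-length(.)" order="descending"/>
        <xsl:sequence select="ati:escape(.)"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence
        select="concat('(^|\W)(', string-join($escaped, '|'), ')($|\W)')"/>
  </xsl:variable>

  <!-- Same match condition as the original template. -->
  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:analyze-string select="." regex="{$abbrev-regex}">
      <xsl:matching-substring>
        <!-- Group 1 and group 3 are the delimiters; group 2 is the abbreviation. -->
        <xsl:value-of select="regex-group(1)"/>
        <abbr type="title"
            expand="{key('publisher-by-abbrev', regex-group(2), $publishers)/expanded}">
          <xsl:value-of select="regex-group(2)"/>
        </abbr>
        <xsl:value-of select="regex-group(3)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Identity template, unchanged from the original. -->
  <xsl:template match="@*|element()|comment()|processing-instruction()"
      mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Escape function, unchanged from the original. -->
  <xsl:function name="ati:escape">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
  </xsl:function>
</xsl:stylesheet>
```

The three-argument form of key() is used because there is no context node inside xsl:matching-substring, so the document to search has to be named explicitly.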

Any suggestions are welcome.

Thanks,
--
Jeff
