[xsl] Stylesheet Optimization -- How to Make It Faster
by Jeff Sese
Nov 27 2006 5:40PM
I have a stylesheet that adds markup to text nodes that match an
abbreviation listed in a reference XML file. It works, but processing
is very slow, I'm guessing because it processes every text node. An
800 KB file takes about 25 minutes to process, and I have around 800
files to process (of varying sizes: some are relatively small and some
are fairly large). Is there any way to optimize my stylesheet so that
it processes the files faster?
Here is my stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <!-- All <abbrev> elements from the reference file -->
  <xsl:variable name="abbreviations" as="element()+"
      select="document('publishers_data.xml')/root/publisher/abbrev"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text inside <ab>, except inside notes that have @id, @n and @lang -->
  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:variable name="str" as="xs:string" select="."/>
    <xsl:choose>
      <!-- Does any abbreviation occur as a whole word in this text node? -->
      <xsl:when test="exists($abbreviations[matches($str, concat('(^|\W)(', ati:escape(.), ')($|\W)'))])">
        <!-- The same test again, this time keeping the matching abbreviations -->
        <xsl:variable name="search-str" as="xs:string+"
            select="$abbreviations[matches($str, concat('(^|\W)(', ati:escape(.), ')($|\W)'))]"/>
        <!-- One <abbr> element per matching abbreviation -->
        <xsl:variable name="replace" as="element()*">
          <xsl:for-each select="$search-str">
            <xsl:variable name="abbr" as="xs:string" select="."/>
            <abbr type="title"
                expand="{$abbreviations[. = $abbr]/following-sibling::expanded}">
              <xsl:value-of select="$abbr"/>
            </abbr>
          </xsl:for-each>
        </xsl:variable>
        <xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$str"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Identity copy for everything else -->
  <xsl:template match="@*|element()|comment()|processing-instruction()"
      mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replaces each occurrence of $words-to-replace[i] in $input with $replacement[i] -->
  <xsl:function name="ati:replace-with-nodes" as="node()+">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="words-to-replace" as="xs:string*"/>
    <xsl:param name="replacement" as="node()*"/>
    <!-- (w1)|(w2)|...: regex group i matches word i -->
    <xsl:variable name="regex"
        select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'), '|')"/>
    <xsl:analyze-string select="$input" regex="{$regex}">
      <xsl:matching-substring>
        <!-- Index of the group that matched -->
        <xsl:variable name="i" as="xs:integer"
            select="(1 to count($words-to-replace))[regex-group(.)]"/>
        <xsl:sequence select="$replacement[$i]"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <!-- Escapes regex metacharacters in $s -->
  <xsl:function name="ati:escape">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace($s, '[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]', '\\$0')"/>
  </xsl:function>
</xsl:stylesheet>
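For each text node, the template above calls matches() once per abbreviation, and it does this twice: once in the xsl:when test and once more for $search-str. Each of those calls uses a regex assembled at run time. One common way to cut this work is to join all abbreviations into a single alternation regex, build it once in a global variable, and scan each text node with one xsl:analyze-string.

What follows is a minimal sketch rather than a drop-in replacement. It assumes an XSLT 2.0 processor such as Saxon and the root/publisher/abbrev + expanded layout of publishers_data.xml. The following are choices made for this sketch only: the variable name `$abbr-regex`, the use of normalize-space() on the abbreviations, and the longest-first ordering.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Non-empty abbreviations (assumed layout: root/publisher/abbrev, then expanded) -->
  <xsl:variable name="abbreviations" as="element()+"
      select="document('publishers_data.xml')/root/publisher/abbrev[normalize-space()]"/>

  <!-- ONE regex for all abbreviations, built once per transformation:
       (^|\W)(A|B|...)($|\W). Longest first, so that "P. Mil. Congr. XVIII"
       is preferred over a shorter abbreviation that is a prefix of it. -->
  <xsl:variable name="abbr-regex" as="xs:string">
    <xsl:variable name="alternatives" as="xs:string+">
      <xsl:for-each select="$abbreviations">
        <xsl:sort select="string-length(normalize-space(.))"
            data-type="number" order="descending"/>
        <xsl:sequence select="ati:escape(normalize-space(.))"/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:sequence
        select="concat('(^|\W)(', string-join($alternatives, '|'), ')($|\W)')"/>
  </xsl:variable>

  <!-- Same match pattern as the original; a single scan per text node -->
  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:analyze-string select="." regex="{$abbr-regex}">
      <xsl:matching-substring>
        <xsl:variable name="hit" as="xs:string" select="regex-group(2)"/>
        <!-- Boundary characters (groups 1 and 3) are kept as plain text -->
        <xsl:value-of select="regex-group(1)"/>
        <abbr type="title"
            expand="{normalize-space($abbreviations[normalize-space(.) = $hit][1]/following-sibling::expanded[1])}">
          <xsl:value-of select="$hit"/>
        </abbr>
        <xsl:value-of select="regex-group(3)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Identity copy for everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Escapes regex metacharacters (same as the original ati:escape) -->
  <xsl:function name="ati:escape" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, '[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]', '\\$0')"/>
  </xsl:function>
</xsl:stylesheet>
```

Because `$abbr-regex` is now a constant string, the processor has a chance to compile it only once. Whether it actually caches the compiled regex depends on the processor.

One limitation of this sketch is that the boundary characters are consumed by each match. As a result, two abbreviations separated by a single non-word character (for example, a single space) are not both tagged.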
Here's a short version of publishers_data.xml:
<root>
  <publisher>
    <abbrev> Inschriften von Priene</abbrev>
    <expanded> Inschriften von Priene</expanded>
  </publisher>
  <publisher>
    <abbrev> P. Mil. Congr. XVIII</abbrev>
    <expanded> Papiri documentari dell'Università Cattolica di Milano</expanded>
  </publisher>
  <publisher>
    <abbrev> P. Jud. Des. Misc.</abbrev>
    <expanded> Discoveries in the Judean Desert XXXVIII</expanded>
  </publisher>
  <!-- more publishers here -->
</root>
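Each match is currently resolved with `$abbreviations[.=$abbr]/following-sibling::expanded`, which scans the whole abbreviation list every time. With a reference file laid out like this, an xsl:key can index the expansions by abbreviation instead. The three-argument form of key() in XSLT 2.0 lets the lookup run against the separately loaded document.

Below is a minimal sketch of a module that could be included. It assumes one abbrev per publisher. The key name `expansion-by-abbrev` and the function `ati:expansion` are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <!-- The reference document, loaded once -->
  <xsl:variable name="publishers" as="document-node()"
      select="document('publishers_data.xml')"/>

  <!-- Index each <expanded> by the whitespace-normalized text of its sibling <abbrev> -->
  <xsl:key name="expansion-by-abbrev" match="publisher/expanded"
      use="../abbrev/normalize-space(.)"/>

  <!-- Hypothetical helper: the expansion of $abbr, or () if it is unknown -->
  <xsl:function name="ati:expansion" as="xs:string?">
    <xsl:param name="abbr" as="xs:string"/>
    <xsl:sequence
        select="key('expansion-by-abbrev', normalize-space($abbr), $publishers)[1]/normalize-space(.)"/>
  </xsl:function>

  <!-- Possible use inside the tagging code:
       <abbr type="title" expand="{ati:expansion($abbr)}">...</abbr> -->
</xsl:stylesheet>
```

The key is built once per document and then reused. For a few hundred abbreviations the saving per lookup is modest, but it no longer grows with the size of the list.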
Here's a snippet of the source XML:
<!-- preceding::node() of ab -->
<ab lang="grk" n="1">
  <foreign lang="grk"> · γέγονε κατὰ τοὺς Δαρείου</foreign>
  <note place="margin"> a c</note>
  <lb n="5"/>
  <foreign lang="grk"> χρόνους τοῦ μετὰ Καμβύσην βασιλεύσαντος, ὅτε καὶ
    Διονύσιος ἦν ὁ Μιλήσιος</foreign>
  <lb/> (III), <foreign lang="grk">ἐπὶ τῆς ξ¯ε¯ Ὀλυμπιάδος</foreign>
  (520/16)<foreign lang="grk"> · ἱστοριογράφος. ῾Ηρόδοτος δὲ ὁ ῾Αλι-</foreign>
  <note place="margin"> v</note>
  <lb/>
  <foreign lang="grk"> καρνασεὺς ὠφέληται τούτου, νεώτερος ὤν. καὶ ἦν
    ἀκουστὴς Πρωταγόρου</foreign>
  <note id="n7" n="7" lang="ger">
    <foreign lang="grk"> ὤν· γέγονε γὰρ μετ᾽ αὐτόν</foreign> A</note>
  <lb/>
  <foreign lang="grk"> ὠ῾Εκαταῖος. πρῶτος δὲ ἱστορίαν πεζῶς ἐξήνεγκε,
    συγγραφὴν δὲ Φερεκύδης</foreign>
  <note id="n8–9" n="8–9" lang="ger">
    <foreign lang="grk"> πρῶτος–νοθεύεται</foreign> wiederholt s. <foreign
    lang="grk"> ὶστορῆσαι</foreign>, s. <foreign
    lang="grk"> συγγραφεῖς</foreign>.</note>
  <lb/> (I 3). <foreign lang="grk">τὰ γὰρ ᾽Ακουσιλάου</foreign> (<link
    type="boj" targets="a002" n="BOJTEXT002_T_7"> 2 T 7</link>) <foreign
    lang="grk"> νοθεύεται.</foreign>
  <note id="n9" n="9" lang="ger">
    <foreign lang="grk"> ᾽Ακουσιλάου</foreign> Vossius <foreign
    lang="grk"> ᾽Αγησιλάου</foreign> Suid</note>
</ab>
<!-- following::node() of ab -->
Note: all ab nodes appear at the same level (the same depth) throughout the document.
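Since the ab elements are easy to reach directly, another option is to switch into a dedicated mode when an ab is entered. With that structure, text nodes elsewhere in the document are never tested against the `ancestor::ab` and `ancestor::note[...]` predicates. This is probably a smaller gain than reducing the regex work, because those ancestor checks are cheap compared with the matches() calls.

Below is a minimal sketch of the structure only. The mode name `tag` is an arbitrary choice. The text() template in mode `tag` is a placeholder that just copies its text, and it is where the abbreviation-tagging logic would go.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default mode: plain identity copy; text outside <ab> is never inspected -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Entering an <ab>: copy it and process its contents in mode "tag" -->
  <xsl:template match="ab">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="tag"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inside <ab>: copy attributes, elements, comments and PIs, staying in mode "tag" -->
  <xsl:template match="@*|*|comment()|processing-instruction()" mode="tag">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="tag"/>
    </xsl:copy>
  </xsl:template>

  <!-- Notes carrying @id, @n and @lang are excluded: copy them unchanged -->
  <xsl:template match="note[@id and @n and @lang]" mode="tag">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Text inside <ab> but outside excluded notes.
       Placeholder: copies the text; the abbreviation-tagging code would replace this body -->
  <xsl:template match="text()" mode="tag">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

An explicit `@*` template is included in mode `tag` for a reason. In any mode, the built-in template for attributes outputs the attribute's value as text rather than copying the attribute.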
Any suggestions are welcome.
Thanks,
--
Jeff
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--