RE: [xsl] Stylesheet Optimization -- How to Make It Faster
by Michael Kay
Nov 28 2006 1:14AM
(a) It would be a nice courtesy if you could lay out the code so that we can read it.
(b) What XSLT processor are you using?
(c) The most obvious inefficiency is here:
expand="{$abbreviations[.=$abbr]/following-sibling::expanded}"
This would benefit from use of keys.
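
For illustration only, here is a minimal sketch of how a key could replace that predicate. It assumes the abbrev and expanded elements are laid out as in the publishers_data.xml sample quoted below. The key name abbrev-by-text, the variable publishers and the named template make-abbr are made up for this sketch and are not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Index every abbrev element by its string value, so a lookup
       no longer has to scan the whole $abbreviations sequence.
       The key name is hypothetical. -->
  <xsl:key name="abbrev-by-text" match="abbrev" use="."/>

  <!-- Hold the reference document itself: the three-argument key()
       needs a node from the tree it should search. -->
  <xsl:variable name="publishers" as="document-node()"
      select="document('publishers_data.xml')"/>

  <!-- Hypothetical named template that builds one abbr element. -->
  <xsl:template name="make-abbr">
    <xsl:param name="abbr" as="xs:string"/>
    <abbr type="title"
        expand="{key('abbrev-by-text', $abbr, $publishers)/following-sibling::expanded}">
      <xsl:value-of select="$abbr"/>
    </abbr>
  </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() is used because the abbreviations live in a separate document, not in the source tree being transformed. This only changes the lookup of the expanded form; the regex tests against every abbreviation for every text node are untouched by it.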
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Jeff Sese [mailto:jsese@[...].com]
> Sent: 28 November 2006 01:41
> To: Xsl-List
> Subject: [xsl] Stylesheet Optimization -- How to Make It Faster
>
> I have a stylesheet that adds markup to text nodes that
> match an abbreviation in a reference XML file. It's working
> nicely, but processing is very slow; I'm guessing that is
> because it's processing text nodes. An 800 KB file takes
> about 25 minutes to process, and I have around 800 files to
> process (varying file sizes; some are relatively small and
> some are fairly large). Is there any way to optimize my
> stylesheet so that it can process the files faster?
>
> Here is my stylesheet:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="2.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:ati="http://www.asiatype.com/xslt-functions"
>     exclude-result-prefixes="xs ati">
>
>   <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
>
>   <xsl:variable name="abbreviations" as="element()+"
>       select="document('publishers_data.xml')/root/publisher/abbrev"/>
>
>   <xsl:template match="/">
>     <xsl:apply-templates/>
>   </xsl:template>
>
>   <xsl:template match="text()[ancestor::ab and
>       not(ancestor::note[@id and @n and @lang])]">
>     <xsl:variable name="str" as="xs:string" select="."/>
>     <xsl:choose>
>       <xsl:when test="exists($abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))])">
>         <xsl:variable name="search-str" as="xs:string+"
>             select="$abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))]"/>
>         <xsl:variable name="replace" as="element()*">
>           <xsl:for-each select="$search-str">
>             <xsl:variable name="abbr" as="xs:string" select="."/>
>             <abbr type="title"
>                 expand="{$abbreviations[.=$abbr]/following-sibling::expanded}">
>               <xsl:value-of select="$abbr"/>
>             </abbr>
>           </xsl:for-each>
>         </xsl:variable>
>         <xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:value-of select="$str"/>
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:template>
>
>   <xsl:template match="@*|element()|comment()|processing-instruction()"
>       mode="#all">
>     <xsl:copy>
>       <xsl:apply-templates select="@*|node()"/>
>     </xsl:copy>
>   </xsl:template>
>
>   <xsl:function name="ati:replace-with-nodes" as="node()+">
>     <xsl:param name="input" as="xs:string"/>
>     <xsl:param name="words-to-replace" as="xs:string*"/>
>     <xsl:param name="replacement" as="node()*"/>
>     <xsl:variable name="regex"
>         select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'),'|')"/>
>     <xsl:analyze-string select="$input" regex="{$regex}">
>       <xsl:matching-substring>
>         <xsl:variable name="i" as="xs:integer"
>             select="(1 to count($words-to-replace))[regex-group(.)]"/>
>         <xsl:sequence select="$replacement[$i]"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         <xsl:value-of select="."/>
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>   </xsl:function>
>
>   <xsl:function name="ati:escape">
>     <xsl:param name="s" as="xs:string"/>
>     <xsl:sequence
>         select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
>   </xsl:function>
>
> </xsl:stylesheet>
>
> Here's a short version of publishers_data.xml:
>
> <root>
>   <publisher>
>     <abbrev>Inschriften von Priene</abbrev>
>     <expanded>Inschriften von Priene</expanded>
>   </publisher>
>   <publisher>
>     <abbrev>P. Mil. Congr. XVIII</abbrev>
>     <expanded>Papiri documentari dell'Università Cattolica di Milano</expanded>
>   </publisher>
>   <publisher>
>     <abbrev>P. Jud. Des. Misc.</abbrev>
>     <expanded>Discoveries in the Judean Desert XXXVIII</expanded>
>   </publisher>
>   <!-- more publishers here -->
> </root>
>
> Here's a snippet of the source XML:
>
> <!-- preceding::node() of ab -->
> <ab lang="grk" n="1">
> <foreign lang="grk">Î? γÎγονε καÏ?á½° Ï?οὺÏ? Î?αÏείοÏ?</foreign>
> <note place="margin">a c</note> <lb n="5"/> <foreign
> lang="grk">Ï?ÏόνοÏ?Ï? Ï?οῦ μεÏ?á½° Î?αμβύÏ?ην βαÏ?ιλεύÏ?ανÏ?οÏ?, á½?Ï?ε καὶ
> Î?ιονύÏ?ιοÏ? ἦν á½ Î?ιλήÏ?ιοÏ?</foreign> <lb/>(III), <foreign
> lang="grk">á¼Ï?á½¶ Ï?á¿?Ï? ξ¯ε¯ á½?λÏ?μÏ?ιάδοÏ?</foreign> (520/16)<foreign
> lang="grk">Î? á¼±Ï?Ï?οÏιογÏá½±Ï?οÏ?. ῾Î?ÏόδοÏ?οÏ? δὲ ὠ῾Î?λι-</foreign>
> <note place="margin">v</note> <lb/> <foreign
> lang="grk">καÏναÏ?εὺÏ? á½ Ï?ÎληÏ?αι Ï?ούÏ?οÏ?, νεώÏ?εÏοÏ? ὤν. Îºαὶ ἦν
> á¼?κοÏ?Ï?Ï?á½´Ï? Î ÏÏ?Ï?αγόÏοÏ?</foreign> <note id="n7" n="7" lang="ger">
> <foreign lang="grk">ὤνÎ? γÎγονε Î³á½°Ï Î¼ÎµÏ?á¾½ αá½Ï?όν</foreign>
> A</note> <lb/> <foreign lang="grk">ὠ῾Î?καÏ?αá¿?οÏ?. Ï?Ïá¿¶Ï?οÏ? δὲ
> á¼±Ï?Ï?οÏίαν Ï?εζῶÏ? á¼Î¾á½µÎ½ÎµÎ³ÎºÎµ, Ï?Ï?γγÏαÏ?ὴν δὲ ΦεÏεκύδηÏ?</foreign>
> <note id="n8â??9" n="8â??9" lang="ger"> <foreign
> lang="grk">Ï?Ïá¿¶Ï?οÏ?â??νοθεύεÏ?αι</foreign> wiederholt s. <foreign
> lang="grk">á½¶Ï?Ï?οÏá¿?Ï?αι</foreign>, s. <foreign
> lang="grk">Ï?Ï?γγÏαÏ?εá¿?Ï?</foreign>.</note>
> <lb/>(I 3). <foreign lang="grk">Ï?á½° Î³á½°Ï á¾½Î?κοÏ?Ï?ιλάοÏ?</foreign>
> (<link type="boj" targets="a002" n="BOJTEXT002_T_7">2 T
> 7</link>) <foreign lang="grk">νοθεύεÏ?αι.</foreign> <note
> id="n9" n="9" lang="ger"> <foreign
> lang="grk">á¾½Î?κοÏ?Ï?ιλάοÏ?</foreign> Vossius <foreign
> lang="grk">á¾½Î?γηÏ?ιλάοÏ?</foreign> Suid</note> </ab>
> <!-- following::node() of ab -->
>
> All ab nodes appear at the same level (same depth) throughout.
>
> Any suggestions are welcome.
>
> Thanks,
> --
> Jeff
>
> --~------------------------------------------------------------------
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
> or e-mail: <mailto:xsl-list-unsubscribe@[...].com>
> --~--
>