XSLT: how do I parse a document to retain only interesting nodes, along with all parents, at any depth?


Given the document below:

<root>
    <a>
        <b>
            <c>sometext</c>
        </b>
        <b>
            <d/>
        </b>
        <b>
            <e>
                <f>some interesting__text more</f>
            </e>
        </b>
    </a>
    <h>
        <g>another piece of very_interesting__text</g>
    </h>
</root>

I would like the following output:

<root>
    <a>
        <b>
            <e>
                <f>some interesting__text more</f>
            </e>
        </b>
    </a>
    <h>
        <g>another piece of very_interesting__text</g>
    </h>
    <interesting>interesting__text</interesting>
    <interesting>very_interesting__text</interesting>
</root>

Essentially, I need to output all parent nodes of any node that contains interesting text, which can be matched using the regex \w+__\w+. As a bonus, the interesting pieces should be added somewhere at the end of the document.

The nodes that can contain interesting pieces can be named anything, so dependencies on specific node names cannot be part of the solution.

I'm thinking XSLT is the way to go at this, but I'm having trouble putting the stylesheet together. Obviously I could do this in code, but I'd prefer a stylesheet, since I'm already using others in this script and it would simplify things some.

Thanks in advance.

Edit: there was an error in the sample XML; a comment asked why one tag was being transformed into a different tag. This is corrected above.

Here is an XSLT 2.0 suggestion:

<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:param name="pattern" select="'\w+__\w+'"/>

<xsl:output indent="yes"/>

<!-- every text node that contains a match for the pattern -->
<xsl:variable name="text" select="//text()[matches(., $pattern)]"/>
<!-- those text nodes plus all of their ancestors -->
<xsl:variable name="nodes" select="$text/ancestor-or-self::node()"/>

<!-- root element: copy the kept children, then append the interesting pieces -->
<xsl:template match="/*">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()[$nodes intersect .]"/>
    <xsl:apply-templates select="$text" mode="interesting"/>
  </xsl:copy>
</xsl:template>

<!-- any other node: copy it and descend only into children that are in $nodes -->
<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="@* , node()[$nodes intersect .]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="interesting">
  <interesting><xsl:value-of select="."/></interesting>
</xsl:template>

</xsl:stylesheet>

Using Saxon 9.5, it transforms

<root>
    <a>
        <b>
            <c>sometext</c>
        </b>
        <b>
            <d/>
        </b>
        <b>
            <e>
                <f>some interesting__text more</f>
            </e>
        </b>
    </a>
    <f>
        <g>another piece of very_interesting__text</g>
    </f>
</root>

into

<root>
   <a>
      <b>
         <e>
            <f>some interesting__text more</f>
         </e>
      </b>
   </a>
   <f>
      <g>another piece of very_interesting__text</g>
   </f>
   <interesting>some interesting__text more</interesting>
   <interesting>another piece of very_interesting__text</interesting>
</root>
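The two global variables are not the only way to express this selection. As a sketch only (it is not part of the original answer and has not been run against Saxon), the same test can be written inline on each child, assuming the `pattern` parameter from the stylesheet above. The template for `/*` would need the same change to its first `apply-templates`.

```xml
<!-- Alternative to "$nodes intersect .": keep a child only if it is,
     or contains, a text node matching the pattern. -->
<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates
        select="@* , node()[descendant-or-self::text()[matches(., $pattern)]]"/>
  </xsl:copy>
</xsl:template>
```

This form re-evaluates matches() at every level while descending, so the precomputed `$nodes` variable is likely the better choice for large documents.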

The sample nodes don't have attributes; if they can have them in the real XML, add the template

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>

To filter the text nodes in the collection of interesting elements, you can use analyze-string, changing the template for text() to

<xsl:template match="text()" mode="interesting">
  <interesting>
    <xsl:analyze-string select="." regex="{$pattern}">
      <xsl:matching-substring>
        <xsl:value-of select="."/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </interesting>
</xsl:template>

The result then changes to

<root>
   <a>
      <b>
         <e>
            <f>some interesting__text more</f>
         </e>
      </b>
   </a>
   <f>
      <g>another piece of very_interesting__text</g>
   </f>
   <interesting>interesting__text</interesting>
   <interesting>interesting__text</interesting>
</root>
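The analyze-string template writes one interesting element per text node, so several matches inside the same text node would end up concatenated in a single element. If that can happen in the real data, one possible variation (a sketch that is not part of the original answer) is to create the element inside matching-substring instead:

```xml
<xsl:template match="text()" mode="interesting">
  <xsl:analyze-string select="." regex="{$pattern}">
    <xsl:matching-substring>
      <!-- one <interesting> element per regex match, not one per text node -->
      <interesting><xsl:value-of select="."/></interesting>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:template>
```

For the sample document the output should be unchanged, since each interesting text node there holds exactly one match.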

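You might need to change or adapt the pattern if the very_ substring should be extracted as well. In XPath 2.0 regular expressions, \w excludes punctuation, and the underscore counts as punctuation, so \w+ stops at the single underscore and only interesting__text is matched.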
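One possible adaptation, given here as an untested sketch, is to allow underscores explicitly in the part before the double underscore. The character class `[\w_]` is my assumption about what should count as part of an interesting piece, so adjust it to the real data.

```xml
<!-- Hypothetical replacement for the pattern parameter: the leading part
     may now contain underscores, so "very_" becomes part of the match. -->
<xsl:param name="pattern" select="'[\w_]+__\w+'"/>
```

With this pattern the second interesting element should contain very_interesting__text, as in the output the question asks for.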

