- The so-called "identity transformation" you get from
TransformerFactory.newTransformer()
is anything but the identity when applied to XHTML, unless you apply some non-default configuration. Specifically, you have to set all of these output properties:
xfmEng.setOutputProperty( OutputKeys.DOCTYPE_SYSTEM, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" );
xfmEng.setOutputProperty( OutputKeys.DOCTYPE_PUBLIC, "-//W3C//DTD XHTML 1.0 Transitional//EN" );
xfmEng.setOutputProperty( OutputKeys.METHOD, "html" );
xfmEng.setOutputProperty( OutputKeys.OMIT_XML_DECLARATION, "yes" );
or you get tons of <!-- ... --> garbage before the real document. The garbage seems to come from the XHTML DTD files at w3.org.
- Even with all that, you still end up with the very non-identity transformation of input like
<script src=...></script>
becoming <script src=.../>. Many browsers treat the latter as malformed, since HTML parsers don't honor a self-closing script tag. Forget newTransformer() and use an XSLT-based transformation like
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
- Speaking of the w3.org DTD files, there's still some nasty stuff going on behind the scenes: when any transformer created via
TransformerFactory.newTransformer()
or Templates.newTransformer()
starts processing XHTML, it actually goes and fetches those extremely well-known DTDs off the web from their URIs at w3.org. Every document, even with the same transformer, triggers a new set of GETs to w3.org. Pretty ridiculous. Here's how to get around that:
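package MyPackage;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
// JDK-internal copy of Xerces; these classes are not part of the public API.
import com.sun.org.apache.xerces.internal.impl.Constants;
import com.sun.org.apache.xerces.internal.parsers.SAXParser;

public class MyTransform {
    // ...

    // A SAX parser that neither validates nor loads external DTDs.
    public static class MySAXParser extends SAXParser {
        public MySAXParser() {
            super();
            try {
                setFeature( Constants.SAX_FEATURE_PREFIX + Constants.VALIDATION_FEATURE, false );
                setFeature( Constants.XERCES_FEATURE_PREFIX + Constants.LOAD_EXTERNAL_DTD_FEATURE, false );
            } catch ( SAXNotRecognizedException sne ) {
                // feature unknown to this parser; keep its defaults
            } catch ( SAXNotSupportedException sse ) {
                // feature cannot be changed on this parser; keep its defaults
            }
        }
    }

    // in the code that uses the transformer:
    public static void transform( Source stmIn, Result stmOut ) throws TransformerException {
        // Tell the SAX driver lookup to instantiate MySAXParser instead of the default reader.
        System.setProperty( "org.xml.sax.driver", "MyPackage.MyTransform$MySAXParser" );
        TransformerFactory.newInstance().newTransformer().transform( stmIn, stmOut );
    }
    // ...
}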
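If subclassing a com.sun-internal parser feels too fragile, a rough alternative is to stay inside the public SAX/JAXP API: hand the transformer a SAXSource whose XMLReader has an EntityResolver that answers every external entity request, the DTD included, with an empty document. This is only a sketch, not what I actually ran above. The class name OfflineIdentity and the method copy are made up for illustration, and it assumes the input uses no named entities such as &nbsp;, since those are declared in the very DTD being skipped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper (not from the post): runs the identity transformer
// over an XHTML string without letting the parser fetch any DTD.
class OfflineIdentity {
    static String copy(String xhtml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Answer every external entity lookup with an empty stream, so the
        // DOCTYPE's system identifier is never dereferenced over the network.
        reader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });

        StringWriter out = new StringWriter();
        // A SAXSource carrying its own XMLReader makes the transformer
        // parse with that reader instead of creating a default one.
        TransformerFactory.newInstance().newTransformer().transform(
            new SAXSource(reader, new InputSource(new StringReader(xhtml))),
            new StreamResult(out));
        return out.toString();
    }
}
```

Calling OfflineIdentity.copy(...) on a document with the XHTML 1.0 Transitional DOCTYPE should complete with no network access at all. The resolver is set per reader, so unlike the org.xml.sax.driver system property it does not affect other XML parsing in the same JVM.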
Thursday, May 15, 2008
Identity transformation, my butt
Some lovely trivia I have recently discovered about the default XSLT implementation in JDK 1.5:
1 comment:
Yes: it is much better to use Saxon instead. :-)