Whether or not you think "mixed" content in XML is ever a good idea, you may need to handle it using JAXB one day. Recall that for JAXB to parse a mixed content XML element to a class C, you use an @XmlMixed annotation on a field of C of type List< Serializable >, combined with either @XmlAnyElement or @XmlElementRef/@XmlElementRefs. In each case, the resulting list will contain Strings representing the text nodes and objects representing the element nodes, in the same order as they appear in the XML text. Thus

    <thing>stuff<nested/>entities<alsoNested/></thing>

maps to an instance of

    @XmlRootElement
    class Thing {
        @XmlMixed
        @XmlAnyElement
        List< Serializable > lserComponents;
    }

which looks like

    { lserComponents : [ "stuff", { localName : "nested" }, "entities", { localName: "alsoNested" } ] }

Unfortunately, if the only content other than nested elements happens to be white space, as in

    <thing><nested/> <alsoNested/></thing>

you get the odd bound object

    { lserComponents : [ { localName : "nested" }, { localName: "alsoNested" }, "" ] }

If you care about white space, and who doesn't these days in the throes of late-stage Reaganomics, you need a trick when you actually go to parse the XML.
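
Before reaching for the trick, it can help to check what the parser itself reports for the whitespace-only example above. The following is a minimal sketch using only the JDK's built-in SAX parser; the class name SaxWhitespaceProbe and the events helper are made up for illustration and are not part of JAXB. It logs element starts and the text accumulated between them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxWhitespaceProbe {

    /** Returns a flat log: "<name>" for each element start, a quoted string for each run of text. */
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            // Emit any text collected so far as a single quoted entry.
            private void flush() {
                if (text.length() > 0) {
                    log.add("\"" + text + "\"");
                    text.setLength(0);
                }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                flush();
                log.add("<" + qName + ">");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flush();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may split one text node over several calls, so accumulate.
                text.append(ch, start, length);
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<thing><nested/> <alsoNested/></thing>"));
    }
}
```

The log contains a " " entry between <nested> and <alsoNested>, so the space does reach the ContentHandler as ordinary character data. That suggests the text is dropped later, in the binding layer, which is why intercepting the SAX events is a workable place to intervene.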
First, we create a SAX 2.0 ContentHandler implementation that delegates all events to a JAXB UnmarshallerHandler, but rewrites every all-whitespace character block before passing it on:

    import java.util.Arrays;
    import javax.xml.bind.UnmarshallerHandler;
    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.Locator;
    import org.xml.sax.SAXException;

    class WhitespaceAwareUnmarshallerHandler implements ContentHandler {
        private final UnmarshallerHandler uh;

        public WhitespaceAwareUnmarshallerHandler( UnmarshallerHandler uh ) {
            this.uh = uh;
        }

        /**
         * Replace all-whitespace character blocks with the character '\u000B',
         * which satisfies the following properties:
         *
         * 1. "\u000B".matches( "\\s" ) == true
         * 2. when parsing XmlMixed content, JAXB does not suppress the whitespace
         **/
        public void characters( char[] ch, int start, int length ) throws SAXException {
            // If any character is not whitespace, pass the block through untouched.
            for ( int i = start + length - 1; i >= start; --i )
                if ( !Character.isWhitespace( ch[ i ] ) ) {
                    uh.characters( ch, start, length );
                    return;
                }
            // All whitespace: overwrite the block in place, keeping its length.
            Arrays.fill( ch, start, start + length, '\u000B' );
            uh.characters( ch, start, length );
        }

        /* what follows is just blind delegation monkey code */

        // Note: forwarded as ordinary character data rather than to uh.ignorableWhitespace.
        public void ignorableWhitespace( char[] ch, int start, int length ) throws SAXException {
            uh.characters( ch, start, length );
        }

        public void endDocument() throws SAXException {
            uh.endDocument();
        }

        public void endElement( String uri, String localName, String name ) throws SAXException {
            uh.endElement( uri, localName, name );
        }

        public void endPrefixMapping( String prefix ) throws SAXException {
            uh.endPrefixMapping( prefix );
        }

        public void processingInstruction( String target, String data ) throws SAXException {
            uh.processingInstruction( target, data );
        }

        public void setDocumentLocator( Locator locator ) {
            uh.setDocumentLocator( locator );
        }

        public void skippedEntity( String name ) throws SAXException {
            uh.skippedEntity( name );
        }

        public void startDocument() throws SAXException {
            uh.startDocument();
        }

        public void startElement( String uri, String localName, String name, Attributes atts ) throws SAXException {
            uh.startElement( uri, localName, name, atts );
        }

        public void startPrefixMapping( String prefix, String uri ) throws SAXException {
            uh.startPrefixMapping( prefix, uri );
        }
    }

Then at parse time, instead of the usual
ctx.createUnmarshaller().unmarshal( new StringReader( strData ) ), we substitute our special handler to do the parsing:
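
    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.UnmarshallerHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import com.ctc.wstx.sax.WstxSAXParser;

    public class JAXBUtil {
        @SuppressWarnings( "unchecked" )
        public static < T > T unmarshal( JAXBContext ctx, String strData, boolean flgWhitespaceAware ) throws Exception {
            UnmarshallerHandler uh = ctx.createUnmarshaller().getUnmarshallerHandler();
            XMLReader xr = new WstxSAXParser(); // use your favorite SAX 2.0 parser
            // Route SAX events through the wrapper only when whitespace must be preserved.
            xr.setContentHandler( flgWhitespaceAware ? new WhitespaceAwareUnmarshallerHandler( uh ) : uh );
            xr.parse( new InputSource( new StringReader( strData ) ) );
            return ( T )uh.getResult();
        }
    }
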
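
If the delegation boilerplate feels error-prone, one possible variation is to extend org.xml.sax.helpers.XMLFilterImpl, which already forwards every ContentHandler event to whatever handler is set on it, and override only characters. The sketch below is an assumption-laden alternative rather than the handler above: the class name WhitespaceMaskingFilter and the textSeenDownstream helper are hypothetical, it uses the JDK parser and a recording handler in place of JAXB so that it runs standalone, and it copies the masked block instead of overwriting the parser's buffer.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WhitespaceMaskingFilter extends XMLFilterImpl {

    /** target would be the JAXB UnmarshallerHandler in real use. */
    public WhitespaceMaskingFilter(ContentHandler target) {
        setContentHandler(target);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                // Some non-whitespace present: forward the block unchanged.
                super.characters(ch, start, length);
                return;
            }
        }
        // All whitespace: forward a same-length block of '\u000B' in a fresh array.
        char[] masked = new char[length];
        Arrays.fill(masked, '\u000B');
        super.characters(masked, 0, length);
    }

    /** Demo helper: parse xml through the filter and collect the text seen downstream. */
    static String textSeenDownstream(String xml) throws Exception {
        StringBuilder seen = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader xr = factory.newSAXParser().getXMLReader();
        xr.setContentHandler(new WhitespaceMaskingFilter(recorder));
        xr.parse(new InputSource(new StringReader(xml)));
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        String seen = textSeenDownstream("<thing><nested/> <alsoNested/></thing>");
        System.out.println(seen.equals("\u000B")); // prints true
    }
}
```

In JAXBUtil this would amount to passing new WhitespaceMaskingFilter( uh ) to xr.setContentHandler. One caveat applies to both versions: a SAX parser may deliver a single text node in several characters calls, so a chunk that happens to be all whitespace can be masked even when it belongs to a longer run of text.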
3 comments:
Good reading, still. But please write some more! :-)
Exactly the problem I have now. If I do it the way you describe, I get an NPE at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.expectText(UnmarshallingContext.java:512)
at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.characters(SAXConnector.java:150)
at de.sxp.parsertest.WhitespaceAwareUnmarshallerHandler.characters(WhitespaceAwareUnmarshallerHandler.java:66)
Never mind ... I forgot the remaining method bodies ...