Whether or not you think "mixed" content in XML is ever a good idea, you may need to handle it using JAXB one day. Recall that for JAXB to parse a mixed-content XML element into a class C, you put an @XmlMixed annotation on a field of C of type List< Serializable >, combined with either @XmlAnyElement or @XmlElementRef(s). In each case, the resulting list will contain Strings representing the text nodes and objects representing the element nodes, in the same order as they appear in the XML text. Thus
<thing>stuff<nested/>entities<alsoNested/></thing>
maps to an instance of
import java.io.Serializable;
import java.util.List;
import javax.xml.bind.annotation.*;
@XmlRootElement
class Thing {
    @XmlMixed @XmlAnyElement
    List< Serializable > lserComponents;
}
which looks like
{ lserComponents : [ "stuff", { localName : "nested" }, "entities", { localName: "alsoNested" } ] }
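If the interleaving is hard to picture, plain DOM exposes the same ordering. The sketch below uses only the JDK (no JAXB), and the class MixedContentDump and its children method are hypothetical names made up for illustration. It lists the root element's children in document order, quoting text nodes and bracketing element nodes, which is roughly the shape lserComponents takes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/* Hypothetical helper: lists the root element's children in document order,
   quoting text nodes and bracketing element nodes. Plain DOM, no JAXB. */
class MixedContentDump {
    static List< String > children( String xml ) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware( true ); // needed for getLocalName() to be non-null
        Node root = dbf.newDocumentBuilder()
            .parse( new InputSource( new StringReader( xml ) ) )
            .getDocumentElement();
        List< String > out = new ArrayList<>();
        NodeList kids = root.getChildNodes();
        for ( int i = 0; i < kids.getLength(); ++i ) {
            Node n = kids.item( i );
            if ( n.getNodeType() == Node.TEXT_NODE )
                out.add( "\"" + n.getNodeValue() + "\"" );
            else if ( n.getNodeType() == Node.ELEMENT_NODE )
                out.add( "{" + n.getLocalName() + "}" );
        }
        return out;
    }
}
```

For the first document above, MixedContentDump.children returns ["stuff", {nested}, "entities", {alsoNested}].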
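Unfortunately, if the only content other than nested elements happens to be white space, as in
<thing><nested/> <alsoNested/></thing>
you get the odd bound object below, in which the space between the two elements has vanished and an empty string shows up at the end instead: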
{ lserComponents : [ { localName : "nested" }, { localName: "alsoNested" }, "" ] }
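As far as the JDK's built-in SAX parser is concerned, the space is still there. One way to check, sketched below with that parser rather than Woodstox, is to record the events a ContentHandler such as UnmarshallerHandler would receive; SaxTrace is a hypothetical helper, not part of any library. For the document above, the trace contains a characters event holding the single space between "end nested" and "start alsoNested", which suggests the white space is being dropped on the JAXB side.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/* Hypothetical tracer: records the SAX events a ContentHandler (such as
   JAXB's UnmarshallerHandler) would receive while a document is parsed. */
class SaxTrace extends DefaultHandler {
    final List< String > events = new ArrayList<>();

    @Override
    public void startElement( String uri, String localName, String qName, Attributes atts ) {
        events.add( "start " + localName );
    }

    @Override
    public void endElement( String uri, String localName, String qName ) {
        events.add( "end " + localName );
    }

    @Override
    public void characters( char[] ch, int start, int length ) {
        events.add( "chars [" + new String( ch, start, length ) + "]" );
    }

    static List< String > trace( String xml ) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware( true ); // so localName is populated
        XMLReader xr = spf.newSAXParser().getXMLReader();
        SaxTrace handler = new SaxTrace();
        xr.setContentHandler( handler );
        xr.parse( new InputSource( new StringReader( xml ) ) );
        return handler.events;
    }
}
```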
If you care about white space, and who doesn't these days in the throes of late-stage Reaganomics, you need a trick when you actually go to parse the XML.
First, we create a SAX 2.0 ContentHandler implementation that delegates all events to a JAXB UnmarshallerHandler, but replaces every run of characters that consists entirely of white space with the same number of vertical tabs (U+000B):
import java.util.Arrays;
import javax.xml.bind.UnmarshallerHandler;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
class WhitespaceAwareUnmarshallerHandler implements ContentHandler {
    private final UnmarshallerHandler uh;
    public WhitespaceAwareUnmarshallerHandler( UnmarshallerHandler uh ) {
        this.uh = uh;
    }
    /**
     * Replace all-whitespace character blocks with the character U+000B
     * (vertical tab), which satisfies the following properties:
     *
     * 1. "\u000B".matches( "\\s" ) == true
     * 2. when parsing XmlMixed content, JAXB does not suppress it
     *
     * Caveat: SAX may split one text node across several calls to this
     * method, and each call only inspects its own chunk.
     **/
    public void characters(
        char[] ch, int start, int length
    ) throws SAXException {
        for ( int i = start + length - 1; i >= start; --i )
            if ( !isXmlWhitespace( ch[ i ] ) ) {
                uh.characters( ch, start, length );
                return;
            }
        // write into a fresh array instead of overwriting the parser's buffer
        char[] vt = new char[ length ];
        Arrays.fill( vt, '\u000B' );
        uh.characters( vt, 0, length );
    }
    /* XML white space only (space, tab, CR, LF); Character.isWhitespace would also
       match legal XML characters such as U+3000 and wrongly replace them */
    private static boolean isXmlWhitespace( char c ) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }
    /* what follows is just blind delegation monkey code */
    public void ignorableWhitespace( char[] ch, int start, int length ) throws SAXException { uh.characters( ch, start, length ); }
    public void endDocument() throws SAXException { uh.endDocument(); }
    public void endElement( String uri, String localName, String name ) throws SAXException { uh.endElement( uri, localName, name ); }
    public void endPrefixMapping( String prefix ) throws SAXException { uh.endPrefixMapping( prefix ); }
    public void processingInstruction( String target, String data ) throws SAXException { uh.processingInstruction( target, data ); }
    public void setDocumentLocator( Locator locator ) { uh.setDocumentLocator( locator ); }
    public void skippedEntity( String name ) throws SAXException { uh.skippedEntity( name ); }
    public void startDocument() throws SAXException { uh.startDocument(); }
    public void startElement( String uri, String localName, String name, Attributes atts ) throws SAXException { uh.startElement( uri, localName, name, atts ); }
    public void startPrefixMapping( String prefix, String uri ) throws SAXException { uh.startPrefixMapping( prefix, uri ); }
}
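One caveat with the handler above: SAX allows a parser to deliver a single text node through several characters() calls, so a per-call test can see a whitespace-only chunk that is really the start of, say, "  foo". A possible workaround, sketched here as the hypothetical WhitespaceBufferingFilter built on the JDK's org.xml.sax.helpers.XMLFilterImpl (not part of JAXB), buffers text until the next element, prefix-mapping, processing-instruction, or end-of-document event and only then applies the U+000B substitution to the whole run. Its isXmlWhitespace test uses XML's own four white-space characters, on the assumption that JAXB's whitespace check does the same, which would explain why U+000B survives.

```java
import java.util.Arrays;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/* Hypothetical variant of the whitespace trick as a SAX filter: consecutive
   characters() chunks are buffered so the all-whitespace test sees the whole
   text run, which is then forwarded downstream in a single call. */
class WhitespaceBufferingFilter extends XMLFilterImpl {
    static final char VT = 0x0B; // U+000B, vertical tab
    private final StringBuilder pending = new StringBuilder();

    /* XML 1.0 white space (the S production): space, tab, CR, LF; VT is not in it. */
    static boolean isXmlWhitespace( char c ) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    @Override
    public void characters( char[] ch, int start, int length ) {
        pending.append( ch, start, length ); // defer until the run is complete
    }

    /* Forward the buffered run, replaced by VTs if it is entirely white space. */
    private void flush() throws SAXException {
        if ( pending.length() == 0 ) return;
        char[] text = pending.toString().toCharArray();
        pending.setLength( 0 );
        boolean allWhitespace = true;
        for ( char c : text )
            if ( !isXmlWhitespace( c ) ) { allWhitespace = false; break; }
        if ( allWhitespace ) Arrays.fill( text, VT );
        super.characters( text, 0, text.length );
    }

    @Override
    public void startPrefixMapping( String prefix, String uri ) throws SAXException {
        flush();
        super.startPrefixMapping( prefix, uri );
    }

    @Override
    public void startElement( String uri, String localName, String qName, Attributes atts ) throws SAXException {
        flush();
        super.startElement( uri, localName, qName, atts );
    }

    @Override
    public void endElement( String uri, String localName, String qName ) throws SAXException {
        flush();
        super.endElement( uri, localName, qName );
    }

    @Override
    public void processingInstruction( String target, String data ) throws SAXException {
        flush();
        super.processingInstruction( target, data );
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }
}
```

To try it with JAXB, you would presumably call setParent( xr ) and setContentHandler( uh ) on the filter and then call parse on the filter instead of on xr; XMLFilterImpl installs itself as the parent reader's content handler when parsing starts.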
Then at parse time, instead of the usual ctx.createUnmarshaller().unmarshal( new StringReader( strData ) ), we drive a SAX parser ourselves and route its events through our special handler:
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.UnmarshallerHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import com.ctc.wstx.sax.WstxSAXParser;
public class JAXBUtil {
    @SuppressWarnings( "unchecked" )
    public static < T > T unmarshal(
        JAXBContext ctx, String strData, boolean flgWhitespaceAware
    ) throws Exception {
        UnmarshallerHandler uh = ctx.createUnmarshaller().getUnmarshallerHandler();
        XMLReader xr = new WstxSAXParser(); // use your favorite namespace-aware SAX 2.0 parser
        xr.setContentHandler( flgWhitespaceAware ? new WhitespaceAwareUnmarshallerHandler( uh ) : uh );
        xr.parse( new InputSource( new StringReader( strData ) ) );
        return ( T )uh.getResult();
    }
}
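JAXBUtil hard-wires Woodstox's WstxSAXParser. If you would rather stay within the JDK, a JAXP SAXParserFactory can supply the XMLReader, but JAXP factories are not namespace-aware by default, and JAXB needs namespace URIs and local names to match elements. The helper below is one way to set that up; JdkSax and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/* Hypothetical helper: a JDK-only stand-in for "new WstxSAXParser()". */
class JdkSax {
    static XMLReader newNamespaceAwareReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // JAXP factories default to non-namespace-aware parsing; JAXB needs
        // namespace URIs and local names in startElement/endElement.
        spf.setNamespaceAware( true );
        return spf.newSAXParser().getXMLReader();
    }

    /* Parse a string, sending events to the given handler (for example an
       UnmarshallerHandler or the whitespace-aware wrapper around one). */
    static void parse( String xml, ContentHandler handler ) throws Exception {
        XMLReader xr = newNamespaceAwareReader();
        xr.setContentHandler( handler );
        xr.parse( new InputSource( new StringReader( xml ) ) );
    }
}
```

Inside JAXBUtil.unmarshal, the line XMLReader xr = new WstxSAXParser(); would then become XMLReader xr = JdkSax.newNamespaceAwareReader();.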