Package org.apache.xerces.dom
Class DOMNormalizer
java.lang.Object
org.apache.xerces.dom.DOMNormalizer
- All Implemented Interfaces:
org.apache.xerces.xni.XMLDocumentHandler
This class adds an implementation of the normalizeDocument method.
It acts as if the document were going through a save and load cycle, putting
the document in a "normal" form. The actual result depends on the features being set,
which govern what operations actually take place. See setNormalizationFeature for details.
Notably, this method normalizes Text nodes, makes the document "namespace well-formed"
(according to the algorithm described below in pseudo code) by adding missing namespace
declaration attributes and adding or changing namespace prefixes, updates the replacement
tree of EntityReference nodes, normalizes attribute values, etc.
Mutation events, when supported, are generated to reflect the changes occurring on the
document.
See Namespace normalization for details on how namespace declaration attributes and prefixes
are normalized.
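The behavior described above can be observed through the standard org.w3c.dom API. The following is a minimal sketch, not taken from this class: it assumes the DOM implementation on the classpath performs Text node merging and namespace fixup when Document.normalizeDocument() is called. The class name NormalizeDocumentSketch, the namespace URI urn:example and the prefix ex are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDocumentSketch {

    // Creates an empty document with whatever DOM implementation JAXP selects.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <ex:root> bound to urn:example WITHOUT an xmlns:ex attribute,
    // adds two adjacent Text nodes, then normalizes the whole document.
    static Element buildAndNormalize() {
        Document doc = newDocument();
        Element root = doc.createElementNS("urn:example", "ex:root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createTextNode("world"));

        // "namespaces" is a standard DOMConfiguration parameter; it is set
        // explicitly here only to make the intent visible.
        doc.getDomConfig().setParameter("namespaces", Boolean.TRUE);
        doc.normalizeDocument();
        return root;
    }

    public static void main(String[] args) {
        Element root = buildAndNormalize();
        System.out.println("child nodes: " + root.getChildNodes().getLength());
        System.out.println("xmlns:ex = " + root.getAttribute("xmlns:ex"));
    }
}
```

After normalization the two Text nodes should be merged into one, and the missing namespace declaration should appear as an xmlns:ex attribute on the root element.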
NOTE: There is initial support for DOM revalidation with XML Schema as the grammar.
The tree might not be validated correctly if EntityReference nodes or CDATA sections are
present in the tree. The PSVI information is not exposed, and normalized data (including element
default content) is not available.
EXPERIMENTAL:
- This class should not be considered stable. It is likely to be altered or replaced in the future.
- Version:
- $Id: DOMNormalizer.java 1710695 2015-10-26 20:48:54Z mrglavas $
- Author:
- Elena Litani, IBM, Neeraj Bajaj, Sun Microsystems, inc.
-
Nested Class Summary
Nested Classes -
Field Summary
Fields (Modifier and Type, Field - Description)
static final RuntimeException abort - If the user stops the process, this exception will be thrown.
protected static final boolean DEBUG - Debug namespace fix up algorithm
protected static final boolean DEBUG_EVENTS - Debug document handler events
protected static final boolean DEBUG_ND - Debug normalize document
static final org.apache.xerces.xni.XMLString EMPTY_STRING - Empty string to pass to the validator.
protected final ArrayList fAttributeList - list of attributes
protected final DOMNormalizer.XMLAttributesProxy fAttrProxy
protected DOMConfigurationImpl fConfiguration
protected Node fCurrentNode - for setting the PSVI
protected CoreDocumentImpl fDocument
protected DOMErrorHandler fErrorHandler - error handler.
protected final org.apache.xerces.xni.NamespaceContext fLocalNSBinder - Stores all namespace bindings on the current element
protected final DOMLocatorImpl fLocator - DOM Locator - for namespace fixup algorithm
protected final org.apache.xerces.xni.NamespaceContext fNamespaceContext - The namespace context of this document: stores namespaces in scope
protected boolean fNamespaceValidation
protected boolean fPSVI
protected final org.apache.xerces.xni.QName fQName
protected SymbolTable fSymbolTable - symbol table
protected RevalidationHandler fValidationHandler - Validation handler represents validator instance.
protected static final String PREFIX - prefix added by namespace fixup algorithm should follow a pattern "NS" + index
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method - Description)
protected final void addNamespaceDecl(String prefix, String uri, ElementImpl element) - Adds a namespace attribute or replaces the value of existing namespace attribute with the given prefix and value for URI.
void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) - Character content.
void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) - A comment.
void doctypeDecl(String rootElement, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs) - Notifies of the presence of the DOCTYPE line in the document.
void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) - An empty element.
void endCDATA(org.apache.xerces.xni.Augmentations augs) - The end of a CDATA section.
void endDocument(org.apache.xerces.xni.Augmentations augs) - The end of the document.
void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) - The end of an element.
void endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) - This method notifies the end of a general entity.
protected final void expandEntityRef(Node parent, Node reference)
org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource() - Returns the document source.
void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) - Ignorable whitespace.
static final void isAttrValueWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, NamedNodeMap attributes, Attr a, String value, boolean xml11Version) - NON-DOM: check if attribute value is well-formed
static final void isCDataWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version) - Check if CDATA section is well-formed
static final void isCommentWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version) - NON-DOM: check if value of the comment is well-formed
static final void isXMLCharWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version) - NON-DOM: check for valid XML characters as per the XML version
protected final void namespaceFixUp(ElementImpl element, AttributeMap attributes)
protected void normalizeDocument(CoreDocumentImpl document, DOMConfigurationImpl config) - Normalizes document.
protected Node normalizeNode(Node node) - This method acts as if the document was going through a save and load cycle, putting the document in a "normal" form.
void processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) - A processing instruction.
static final void reportDOMError(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String message, short severity, String type) - Reports a DOM error to the user handler.
void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source) - Sets the document source.
void startCDATA(org.apache.xerces.xni.Augmentations augs) - The start of a CDATA section.
void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext namespaceContext, org.apache.xerces.xni.Augmentations augs) - The start of the document.
void startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) - The start of an element.
void startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, String encoding, org.apache.xerces.xni.Augmentations augs) - This method notifies the start of a general entity.
void textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs) - Notifies of the presence of a TextDecl line in an entity.
protected final void updateQName(Node node, org.apache.xerces.xni.QName qname)
void xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs) - Notifies of the presence of an XMLDecl line in the document.
-
Field Details
-
DEBUG_ND
protected static final boolean DEBUG_ND
Debug normalize document
-
DEBUG
protected static final boolean DEBUG
Debug namespace fix up algorithm
-
DEBUG_EVENTS
protected static final boolean DEBUG_EVENTS
Debug document handler events
-
PREFIX
protected static final String PREFIX
Prefix added by the namespace fixup algorithm; it should follow the pattern "NS" + index.
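As an illustration of the "NS" + index pattern, here is a minimal sketch of how a fresh prefix could be chosen. It is not the code of this class: the helper name nextFreePrefix, the use of a Map for the bindings in scope, and the starting index of 1 are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class GeneratedPrefixSketch {

    static final String PREFIX = "NS";

    // Returns the first "NS" + index that is not already a key in the given
    // prefix-to-URI bindings. Starting at index 1 is an assumption.
    static String nextFreePrefix(Map<String, String> bindings) {
        int index = 1;
        String candidate = PREFIX + index;
        while (bindings.containsKey(candidate)) {
            index++;
            candidate = PREFIX + index;
        }
        return candidate;
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new HashMap<>();
        bindings.put("NS1", "urn:a");
        bindings.put("NS2", "urn:b");
        System.out.println(nextFreePrefix(bindings));
    }
}
```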
-
fConfiguration
protected DOMConfigurationImpl fConfiguration
-
fDocument
protected CoreDocumentImpl fDocument
-
fAttrProxy
protected final DOMNormalizer.XMLAttributesProxy fAttrProxy
-
fQName
protected final org.apache.xerces.xni.QName fQName
-
fValidationHandler
protected RevalidationHandler fValidationHandler
Validation handler represents validator instance.
-
fSymbolTable
protected SymbolTable fSymbolTable
symbol table
-
fErrorHandler
protected DOMErrorHandler fErrorHandler
error handler. may be null.
-
fNamespaceValidation
protected boolean fNamespaceValidation
-
fPSVI
protected boolean fPSVI
-
fNamespaceContext
protected final org.apache.xerces.xni.NamespaceContext fNamespaceContext
The namespace context of this document: stores namespaces in scope
-
fLocalNSBinder
protected final org.apache.xerces.xni.NamespaceContext fLocalNSBinder
Stores all namespace bindings on the current element
-
fAttributeList
protected final ArrayList fAttributeList
list of attributes
-
fLocator
protected final DOMLocatorImpl fLocator
DOM Locator - for namespace fixup algorithm
-
fCurrentNode
protected Node fCurrentNode
for setting the PSVI
-
abort
public static final RuntimeException abort
If the user stops the process, this exception will be thrown.
-
EMPTY_STRING
public static final org.apache.xerces.xni.XMLString EMPTY_STRING
Empty string to pass to the validator.
-
-
Constructor Details
-
DOMNormalizer
public DOMNormalizer()
-
-
Method Details
-
normalizeDocument
protected void normalizeDocument(CoreDocumentImpl document, DOMConfigurationImpl config)
Normalizes document. Note: reset() must be called before this method.
-
normalizeNode
protected Node normalizeNode(Node node)
This method acts as if the document were going through a save and load cycle, putting the document in a "normal" form. The actual result depends on the features being set, which govern what operations actually take place. See setNormalizationFeature for details. Notably, this method normalizes Text nodes, makes the document "namespace well-formed" (according to the algorithm described below in pseudo code) by adding missing namespace declaration attributes and adding or changing namespace prefixes, updates the replacement tree of EntityReference nodes, normalizes attribute values, etc.
- Parameters:
node - Modified node or null. If node is returned, we need to normalize again starting on the node returned.
- Returns:
the normalized Node
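The "normalize again starting on the node returned" contract can be sketched with a small traversal loop. This is an illustrative sketch under stated assumptions, not the implementation of this class: the stand-in normalizeNode below only merges adjacent Text siblings, does not recurse into child elements, and the names RenormalizeLoopSketch, normalizeChildren and buildSample are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class RenormalizeLoopSketch {

    // Hypothetical stand-in: merges a Text node with a directly following
    // Text sibling and returns the node so the caller visits it again.
    // Returns null when nothing needs to be revisited.
    static Node normalizeNode(Node node) {
        Node next = node.getNextSibling();
        if (node.getNodeType() == Node.TEXT_NODE
                && next != null && next.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue() + next.getNodeValue());
            node.getParentNode().removeChild(next);
            return node;
        }
        return null;
    }

    // Walks the children of parent; when normalizeNode returns a node,
    // the walk resumes from that node instead of the next sibling.
    static void normalizeChildren(Node parent) {
        Node next;
        for (Node kid = parent.getFirstChild(); kid != null; kid = next) {
            next = kid.getNextSibling();
            Node again = normalizeNode(kid);
            if (again != null) {
                next = again;
            }
        }
    }

    // Builds <root> with three adjacent Text nodes: "a", "b", "c".
    static Element buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("a"));
            root.appendChild(doc.createTextNode("b"));
            root.appendChild(doc.createTextNode("c"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = buildSample();
        normalizeChildren(root);
        System.out.println(root.getChildNodes().getLength() + " " + root.getTextContent());
    }
}
```

Revisiting the returned node is what lets a chain of three or more adjacent Text nodes collapse into one in a single pass.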
-
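expandEntityRef
protected final void expandEntityRef(Node parent, Node reference)
-
namespaceFixUp
protected final void namespaceFixUp(ElementImpl element, AttributeMap attributes)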
-
addNamespaceDecl
protected final void addNamespaceDecl(String prefix, String uri, ElementImpl element)
Adds a namespace attribute, or replaces the value of an existing namespace attribute, with the given prefix and URI value. If the prefix is empty, the default namespace declaration is added or updated.
- Parameters:
prefix -
uri -
- Throws:
IOException
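The add-or-replace behavior described for addNamespaceDecl can be approximated with the standard DOM API. The sketch below is an assumption-laden illustration, not this method's source: it works on a plain org.w3c.dom.Element instead of ElementImpl, and the class name NamespaceDeclSketch and helper newRoot are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceDeclSketch {

    // Namespace URI that all xmlns attributes belong to.
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

    // Adds xmlns:prefix="uri", or xmlns="uri" when the prefix is empty.
    // setAttributeNS replaces the value if the attribute already exists.
    static void addNamespaceDecl(String prefix, String uri, Element element) {
        String attrName = prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix;
        element.setAttributeNS(XMLNS_URI, attrName, uri);
    }

    // Creates a document containing a single <root> element.
    static Element newRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, "root");
            doc.appendChild(root);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = newRoot();
        addNamespaceDecl("p", "urn:a", root);
        addNamespaceDecl("p", "urn:b", root); // replaces the previous value
        addNamespaceDecl("", "urn:c", root);  // default namespace declaration
        System.out.println(root.getAttribute("xmlns:p") + " " + root.getAttribute("xmlns"));
    }
}
```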
-
isCDataWF
public static final void isCDataWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
Check if CDATA section is well-formed
- Parameters:
datavalue -
isXML11Version - true if XML 1.1
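One rule a CDATA well-formedness check has to cover comes from the XML specification: the content of a CDATA section must not contain the terminator "]]>". The following sketch shows only that rule; it is not the logic of isCDataWF, and the class and method names are hypothetical.

```java
class CDataCheckSketch {

    // Returns the index of the first "]]>" in the data, or -1 if the data
    // could be written inside a single CDATA section without being cut short.
    static int firstTerminatorIndex(String data) {
        return data.indexOf("]]>");
    }

    public static void main(String[] args) {
        System.out.println(firstTerminatorIndex("if (a[b[0]]>c) {}"));
        System.out.println(firstTerminatorIndex("plain text"));
    }
}
```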
-
isXMLCharWF
public static final void isXMLCharWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
NON-DOM: check for valid XML characters as per the XML version
- Parameters:
datavalue -
isXML11Version - true if XML 1.1
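To show why the XML version matters for a character check, here is a minimal sketch based on the Char productions of the XML 1.0 and XML 1.1 specifications. It is not the code behind isXMLCharWF and it reports an index instead of calling an error handler; the class and method names are hypothetical.

```java
class XmlCharCheckSketch {

    // Char production of XML 1.0.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Char production of XML 1.1 (control characters other than NUL allowed).
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first invalid code point, or -1 if all
    // code points are valid for the chosen XML version.
    static int firstInvalidIndex(String data, boolean isXML11Version) {
        int i = 0;
        while (i < data.length()) {
            int cp = data.codePointAt(i);
            boolean valid = isXML11Version ? isXml11Char(cp) : isXml10Char(cp);
            if (!valid) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String data = "a" + (char) 0x1 + "b"; // U+0001 is legal only in XML 1.1
        System.out.println(firstInvalidIndex(data, false));
        System.out.println(firstInvalidIndex(data, true));
    }
}
```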
-
isCommentWF
public static final void isCommentWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String datavalue, boolean isXML11Version)
NON-DOM: check if value of the comment is well-formed
- Parameters:
datavalue -
isXML11Version - true if XML 1.1
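For comments, the Comment production of the XML specification forbids the sequence "--" inside the comment text and forbids a trailing "-". The sketch below covers only those two rules (character validity is left out); it is an illustration with hypothetical names, not the body of isCommentWF.

```java
class CommentCheckSketch {

    // True when the text can be serialized between "<!--" and "-->"
    // without violating the Comment production.
    static boolean isCommentDataWellFormed(String data) {
        return !data.contains("--") && !data.endsWith("-");
    }

    public static void main(String[] args) {
        System.out.println(isCommentDataWellFormed("a - b"));
        System.out.println(isCommentDataWellFormed("a -- b"));
    }
}
```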
-
isAttrValueWF
public static final void isAttrValueWF(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, NamedNodeMap attributes, Attr a, String value, boolean xml11Version)
NON-DOM: check if attribute value is well-formed
- Parameters:
attributes -
a -
value -
-
reportDOMError
public static final void reportDOMError(DOMErrorHandler errorHandler, DOMErrorImpl error, DOMLocatorImpl locator, String message, short severity, String type)
Reports a DOM error to the user handler. If the error is fatal, processing is always aborted.
-
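From the application side, the "user handler" that reportDOMError talks to is a DOMErrorHandler registered through the standard "error-handler" DOMConfiguration parameter. The sketch below shows one way to register such a handler before normalizing; the class names ErrorHandlerSketch and CollectingHandler are hypothetical, and which errors (if any) are reported depends on the document and on the DOM implementation in use.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;

class ErrorHandlerSketch {

    // Records every reported error as "severity:type".
    static final class CollectingHandler implements DOMErrorHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public boolean handleError(DOMError error) {
            seen.add(error.getSeverity() + ":" + error.getType());
            // Ask to continue after warnings and errors; per the text above,
            // fatal errors abort processing regardless of this return value.
            return error.getSeverity() != DOMError.SEVERITY_FATAL_ERROR;
        }
    }

    // Creates a document with a single namespace-aware <root> element.
    static Document newDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElementNS(null, "root"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Registers a collecting handler, normalizes, and returns the handler.
    static CollectingHandler normalizeWithHandler(Document doc) {
        CollectingHandler handler = new CollectingHandler();
        doc.getDomConfig().setParameter("error-handler", handler);
        doc.normalizeDocument();
        return handler;
    }

    public static void main(String[] args) {
        CollectingHandler handler = normalizeWithHandler(newDocument());
        System.out.println("reported errors: " + handler.seen);
    }
}
```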
updateQName
protected final void updateQName(Node node, org.apache.xerces.xni.QName qname)
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext namespaceContext, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The start of the document.
- Specified by:
startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
locator - The document locator, or null if the document location cannot be reported during the parsing of this document. However, it is strongly recommended that a locator be supplied that can at least report the system identifier of the document.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
namespaceContext - The namespace context in effect at the start of this document. This object represents the current context. Implementors of this class are responsible for copying the namespace bindings from the current context (and its parent contexts) if that information is important.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
xmlDecl
public void xmlDecl(String version, String encoding, String standalone, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Notifies of the presence of an XMLDecl line in the document. If present, this method will be called immediately following the startDocument call.
- Specified by:
xmlDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
version - The XML version.
encoding - The IANA encoding name of the document, or null if not specified.
standalone - The standalone value, or null if not specified.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
doctypeDecl
public void doctypeDecl(String rootElement, String publicId, String systemId, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Notifies of the presence of the DOCTYPE line in the document.
- Specified by:
doctypeDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
rootElement - The name of the root element.
publicId - The public identifier if an external DTD or null if the external DTD is specified using SYSTEM.
systemId - The system identifier if an external DTD, null otherwise.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
comment
public void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
A comment.
- Specified by:
comment in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
text - The text in the comment.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by application to signal an error.
-
processingInstruction
public void processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
A processing instruction. Processing instructions consist of a target name and, optionally, text data. The data is only meaningful to the application.
Typically, a processing instruction's data will contain a series of pseudo-attributes. These pseudo-attributes follow the form of element attributes but are not parsed or presented to the application as anything other than text. The application is responsible for parsing the data.
- Specified by:
processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
target - The target.
data - The data or null if none specified.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
startElement
public void startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The start of an element.
- Specified by:
startElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
element - The name of the element.
attributes - The element attributes.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
An empty element.
- Specified by:
emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
element - The name of the element.
attributes - The element attributes.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
startGeneralEntity
public void startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier identifier, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
This method notifies the start of a general entity.
Note: This method is not called for entity references appearing as part of attribute values.
- Specified by:
startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
name - The name of the general entity.
identifier - The resource identifier.
encoding - The auto-detected IANA encoding name of the entity stream. This value will be null in those situations where the entity encoding is not auto-detected (e.g. internal entities or a document entity that is parsed from a java.io.Reader).
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
textDecl
public void textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Notifies of the presence of a TextDecl line in an entity. If present, this method will be called immediately following the startEntity call.
Note: This method will never be called for the document entity; it is only called for external general entities referenced in document content.
Note: This method is not called for entity references appearing as part of attribute values.
- Specified by:
textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
version - The XML version, or null if not specified.
encoding - The IANA encoding name of the entity.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
endGeneralEntity
public void endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
This method notifies the end of a general entity.
Note: This method is not called for entity references appearing as part of attribute values.
- Specified by:
endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
name - The name of the entity.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
characters
public void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Character content.
- Specified by:
characters in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
text - The content.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Ignorable whitespace. For this method to be called, the document source must have some way of determining that the text containing only whitespace characters should be considered ignorable. For example, the validator can determine if a length of whitespace characters in the document are ignorable based on the element content model.
- Specified by:
ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
text - The ignorable whitespace.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
endElement
public void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The end of an element.
- Specified by:
endElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
element - The name of the element.
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The start of a CDATA section.
- Specified by:
startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The end of a CDATA section.
- Specified by:
endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
endDocument
public void endDocument(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
The end of the document.
- Specified by:
endDocument in interface org.apache.xerces.xni.XMLDocumentHandler
- Parameters:
augs - Additional information that may include infoset augmentations
- Throws:
org.apache.xerces.xni.XNIException - Thrown by handler to signal an error.
-
setDocumentSource
public void setDocumentSource(org.apache.xerces.xni.parser.XMLDocumentSource source)
Sets the document source.
- Specified by:
setDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-
getDocumentSource
public org.apache.xerces.xni.parser.XMLDocumentSource getDocumentSource()
Returns the document source.
- Specified by:
getDocumentSource in interface org.apache.xerces.xni.XMLDocumentHandler
-