com.g11ntoolkit.tokfile
Class LTXMLTFParser

java.lang.Object
  |
  +--org.xml.sax.helpers.DefaultHandler
        |
        +--com.g11ntoolkit.tokfile.LTXMLTFParser
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class LTXMLTFParser
extends org.xml.sax.helpers.DefaultHandler

Parses a Tokenized File in XML file format.

Uses SAX and processes the callbacks in the parsing lifecycle.

A G11NToolKit object is created or modified for each element in the file. When the parse is complete, a TokFile object containing all of the appropriate objects is held in memory, ready to be written out or processed further.
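
The following is a minimal sketch of how a DefaultHandler subclass such as this one is typically driven by a JAXP SAX parser. ElementCountingHandler is a hypothetical stand-in for LTXMLTFParser so that the example runs on its own, and the element names in the sample XML are invented for illustration. With the real class, one would presumably pass an LTXMLTFParser instance to the parser and call getTokFile() once the parse returns.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for LTXMLTFParser: it only counts elements.
class ElementCountingHandler extends DefaultHandler {
    private int count;

    @Override
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes attrs) {
        count++;
    }

    int getCount() {
        return count;
    }
}

class SaxDriverSketch {
    // Parses the given XML text and returns the number of elements seen.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        ElementCountingHandler handler = new ElementCountingHandler();
        // The parser invokes the handler's callbacks as it reads the input.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getCount();
    }

    public static void main(String[] args) throws Exception {
        // Sample document; the element names are assumptions.
        String xml = "<TokFile><StrBlock><String>Hello</String></StrBlock></TokFile>";
        System.out.println(countElements(xml));
    }
}
```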

Version:
2005/06/30
Author:
Bill Rich, Wilandra Consulting LLC
Copyright © 2002-2005, Wilandra Consulting LLC. All rights reserved.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See License Agreement.


Field Summary
(package private)  TokContext context
          Contains the context for each of the string blocks.
(package private)  boolean inCon
          Indicates that a con entry in the context table is being processed.
(package private)  boolean inContextTable
          Indicates that the context table section of the file is being processed.
(package private)  boolean inSource
          Indicates that the source section of the file is being processed.
(package private)  boolean inTokFile
          Indicates that a string file is being processed.
private static java.util.logging.Logger log
          The log used for all messages from this class.
protected static java.util.ResourceBundle mrb
          Messages used by the tools and classes.
(package private)  java.io.OutputStreamWriter osw
          The output writer to use when writing out the file, if needed.
(package private)  java.lang.String sourceString
          Contains a string of characters from the source file that were not tokenized.
(package private)  Token token
          Contains the tokens for each of the string blocks.
(package private)  TokFile tokfile
          Contains the object representing the String File being processed.
protected static java.util.ResourceBundle vrb
          Constants and variables used by the tools and classes.
private static java.util.ResourceBundle xrb
          Constants, messages, and variables used by the tools and classes for XML processing.
 
Constructor Summary
LTXMLTFParser()
           
 
Method Summary
 void characters(char[] ch, int start, int length)
          Processes character data (within an element).
 void endDocument()
          Processes the end of a Document parse.
 void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String rawName)
          Indicates the end of an element.
 void endPrefixMapping(java.lang.String prefix)
          Processes the end of a prefix mapping.
 TokFile getTokFile()
          Returns the TokFile object.
 void ignorableWhitespace(char[] ch, int start, int length)
          Processes whitespace that can be ignored in the originating document.
 void processingInstruction(java.lang.String target, java.lang.String data)
          Processes a processing instruction (other than the XML declaration) when it is encountered.
 void setOSW(java.io.OutputStreamWriter anOSW)
          Sets the output writer.
 void setTokFile(TokFile aTokFile)
          Sets the TokFile object to the specified TokFile.
protected  org.xml.sax.Attributes sortAttributes(org.xml.sax.Attributes attrs)
          Returns a sorted list of attributes.
 void startDocument()
          Processes the start of a Document parse.
 void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String rawName, org.xml.sax.Attributes attrs)
          Processes the occurrence of an actual element.
 void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
          Processes the beginning of an XML Namespace prefix mapping.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, setDocumentLocator, skippedEntity, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

private static java.util.logging.Logger log
The log used for all messages from this class.


mrb

protected static java.util.ResourceBundle mrb
Messages used by the tools and classes.


vrb

protected static java.util.ResourceBundle vrb
Constants and variables used by the tools and classes.


xrb

private static java.util.ResourceBundle xrb
Constants, messages, and variables used by the tools and classes for XML processing.


tokfile

TokFile tokfile
Contains the object representing the String File being processed.


token

Token token
Contains the tokens for each of the string blocks.


sourceString

java.lang.String sourceString
Contains a string of characters from the source file that were not tokenized.


context

TokContext context
Contains the context for each of the string blocks.


inTokFile

boolean inTokFile
Indicates that a string file is being processed.


inSource

boolean inSource
Indicates that the source section of the file is being processed.


inContextTable

boolean inContextTable
Indicates that the context table section of the file is being processed.


inCon

boolean inCon
Indicates that a con entry in the context table is being processed.


osw

java.io.OutputStreamWriter osw
The output writer to use when writing out the file, if needed.

The output writer can be set when it is needed. If the resulting object structure will not be written to a file then this never needs to be set.

Constructor Detail

LTXMLTFParser

public LTXMLTFParser()
Method Detail

setOSW

public void setOSW(java.io.OutputStreamWriter anOSW)
Sets the output writer.

Parameters:
anOSW - an OutputStreamWriter specifying the output writer to use

setTokFile

public void setTokFile(TokFile aTokFile)
Sets the TokFile object to the specified TokFile.

Parameters:
aTokFile - a TokFile specifying the TokFile object to use as the target
See Also:
getTokFile()

getTokFile

public TokFile getTokFile()
Returns the TokFile object.

Returns:
a TokFile object containing the parsed information
See Also:
setTokFile(com.g11ntoolkit.tokfile.TokFile)

processingInstruction

public void processingInstruction(java.lang.String target,
                                  java.lang.String data)
                           throws org.xml.sax.SAXException
Processes a processing instruction (other than the XML declaration) when it is encountered.

Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
Overrides:
processingInstruction in class org.xml.sax.helpers.DefaultHandler
Parameters:
target - a String specifying the target of the PI
data - a String containing all data sent to the PI.

This typically looks like one or more attribute-value pairs.

Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

startPrefixMapping

public void startPrefixMapping(java.lang.String prefix,
                               java.lang.String uri)
                        throws org.xml.sax.SAXException
Processes the beginning of an XML Namespace prefix mapping.

Although this typically occurs within the root element of an XML document, it can occur at any point within the document. Note that a prefix mapping on an element triggers this callback before the callback for the actual element itself (startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)) occurs.

Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
startPrefixMapping in class org.xml.sax.helpers.DefaultHandler
Parameters:
prefix - a String specifying the prefix used for the namespace being reported
uri - a String specifying the URI for the namespace being reported
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event
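
The callback ordering described above for startPrefixMapping can be observed with a small recording handler. This is a sketch only: PrefixOrderSketch, the prefix "g", and the URI "urn:example" are invented for illustration and are not part of this class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PrefixOrderSketch {
    // Parses the XML text and returns the callbacks in the order received.
    static List<String> events(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("startPrefixMapping " + prefix);
            }

            @Override
            public void startElement(String namespaceURI, String localName,
                                     String rawName, Attributes attrs) {
                events.add("startElement " + rawName);
            }

            @Override
            public void endElement(String namespaceURI, String localName,
                                   String rawName) {
                events.add("endElement " + rawName);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                events.add("endPrefixMapping " + prefix);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Prefix mapping callbacks are only reported with namespace processing on.
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events("<g:root xmlns:g=\"urn:example\"/>")) {
            System.out.println(event);
        }
    }
}
```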

endPrefixMapping

public void endPrefixMapping(java.lang.String prefix)
                      throws org.xml.sax.SAXException
Processes the end of a prefix mapping.

This is when the namespace reported in a startPrefixMapping(java.lang.String, java.lang.String) callback is no longer available.

Specified by:
endPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
endPrefixMapping in class org.xml.sax.helpers.DefaultHandler
Parameters:
prefix - a String specifying the prefix of the namespace being reported
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Processes the start of a Document parse.

Called once when the document is first opened. Precedes all other callbacks in all SAX handlers, except setDocumentLocator.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
Processes the end of a Document parse.

Called once when the document is closed. This occurs after all other callbacks in all SAX handlers.

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String localName,
                         java.lang.String rawName,
                         org.xml.sax.Attributes attrs)
                  throws org.xml.sax.SAXException
Processes the occurrence of an actual element.

Includes the element's attributes, with the exception of XML vocabulary specific attributes, such as xmlns:[namespace prefix] and xsi:schemaLocation.

Code is added to this method for each element that has specific processing needs when it starts. Other code to handle the end of an element is in the endElement(java.lang.String, java.lang.String, java.lang.String) method.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - a String specifying the namespace URI this element is associated with, or an empty String
localName - a String specifying the name of the element (with no namespace prefix, if one is present)
rawName - a String specifying the XML 1.0 version of element name: [namespace prefix]:[localName]
attrs - an Attributes object containing a list of attributes for this element
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String localName,
                       java.lang.String rawName)
                throws org.xml.sax.SAXException
Indicates the end of an element.

Shows that the </[element name]> tag has been reached. Note that the parser does not distinguish between empty elements and non-empty elements, so this occurs uniformly.

Code is added to this method for each element that has specific processing needs when it ends. Other code to handle the start of an element is in the startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes) method.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - a String specifying the namespace URI this element is associated with, or an empty String
localName - a String specifying the name of the element (with no namespace prefix, if one is present)
rawName - a String specifying the XML 1.0 version of element name: [namespace prefix]:[localName]
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event
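
The note above about empty and non-empty elements can be checked with a short sketch that records element callbacks. EmptyElementSketch and the element name "a" are hypothetical and used only for illustration; both forms of the element are expected to produce the same start/end pair.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EmptyElementSketch {
    // Returns the element callbacks received while parsing the XML text.
    static List<String> elementEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String namespaceURI, String localName,
                                     String rawName, Attributes attrs) {
                events.add("start " + rawName);
            }

            @Override
            public void endElement(String namespaceURI, String localName,
                                   String rawName) {
                events.add("end " + rawName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        // Compare the empty-element form with the explicit start/end tag form.
        System.out.println(elementEvents("<a/>").equals(elementEvents("<a></a>")));
    }
}
```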

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Processes character data (within an element).

Character data is processed only when it belongs to a String or Comment element that is contained in a StrBlock element.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - a char[] specifying a character array that contains the character data
start - an int specifying the index in the array where the data starts
length - an int specifying the length of the string
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event
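
One way the filtering described for characters could be implemented is sketched below. It is not the actual implementation of this class: StrBlockTextSketch, its fields, and the exact spelling of the element names in the XML file are assumptions. Because a SAX parser may deliver the text of one element in several characters calls, the sketch accumulates text in a buffer and reads it only when the element ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps text of String/Comment elements inside StrBlock.
class StrBlockTextSketch extends DefaultHandler {
    private boolean inStrBlock;
    private boolean inText;
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    private static boolean isTextElement(String name) {
        return "String".equals(name) || "Comment".equals(name);
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes attrs) {
        if ("StrBlock".equals(rawName)) {
            inStrBlock = true;
        } else if (inStrBlock && isTextElement(rawName)) {
            inText = true;
            buffer.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) {
            buffer.append(ch, start, length); // may be called more than once
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String rawName) {
        if ("StrBlock".equals(rawName)) {
            inStrBlock = false;
        } else if (inText && isTextElement(rawName)) {
            texts.add(buffer.toString());
            inText = false;
        }
    }

    // Parses the XML text and returns the collected String/Comment contents.
    static List<String> collect(String xml) throws Exception {
        StrBlockTextSketch handler = new StrBlockTextSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```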

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws org.xml.sax.SAXException
Processes whitespace that can be ignored in the originating document.

Typically invoked only when validation is occurring in the parsing process.

Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - a char[] specifying a character array that contains the character data
start - an int specifying the index in the array where the data starts
length - an int specifying the length of the ignorable white space
Throws:
org.xml.sax.SAXException - if an error occurs while handling this event

sortAttributes

protected org.xml.sax.Attributes sortAttributes(org.xml.sax.Attributes attrs)
Returns a sorted list of attributes.

Parameters:
attrs - an Attributes object specifying the attributes to sort
Returns:
an LTSAXAttributes object containing the sorted attributes
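
As an illustration of what sorting an attribute list can look like, the sketch below copies the attributes into a new list ordered by qualified name. It rests on assumptions: the sort key used by sortAttributes is not documented here, and the standard org.xml.sax.helpers.AttributesImpl class is used in place of LTSAXAttributes, whose API is not shown. SortAttributesSketch and sortByQName are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

class SortAttributesSketch {
    // Returns a copy of attrs ordered by qualified name (assumed sort key).
    static Attributes sortByQName(final Attributes attrs) {
        // Sort the attribute indexes rather than the attributes themselves.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparing((Integer i) -> attrs.getQName(i)));

        // Copy each attribute, in sorted order, into a new list.
        AttributesImpl sorted = new AttributesImpl();
        for (int i : order) {
            sorted.addAttribute(attrs.getURI(i), attrs.getLocalName(i),
                    attrs.getQName(i), attrs.getType(i), attrs.getValue(i));
        }
        return sorted;
    }

    public static void main(String[] args) {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "type", "type", "CDATA", "text");
        attrs.addAttribute("", "id", "id", "CDATA", "42");
        Attributes sorted = sortByQName(attrs);
        for (int i = 0; i < sorted.getLength(); i++) {
            System.out.println(sorted.getQName(i) + "=" + sorted.getValue(i));
        }
    }
}
```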