Package org.apache.uima.tools.components
Class XmlDetagger
java.lang.Object
org.apache.uima.analysis_component.AnalysisComponent_ImplBase
org.apache.uima.analysis_component.Annotator_ImplBase
org.apache.uima.analysis_component.CasAnnotator_ImplBase
org.apache.uima.tools.components.XmlDetagger
- All Implemented Interfaces:
AnalysisComponent
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named
"xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a
remote file. The XML is parsed using the JVM's default parser, and the plain-text content is
written to a new sofa called "plainTextDocument".
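The detagging step described above can be illustrated without any UIMA classes. The following is a minimal sketch, not the XmlDetagger implementation: it assumes the XML is already available as a String, uses the JVM's default SAX parser, and keeps only character data. The class name DetagSketch and the method detag are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of UIMA.
public class DetagSketch {

    /** Returns the character data of an XML string with all tags removed. */
    public static String detag(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            // newInstance() selects the JVM's default SAX implementation.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Element content arrives here; the tags themselves never do.
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // prints: Hello world
        System.out.println(detag("<doc><title>Hello</title> <body>world</body></doc>"));
    }
}
```

In the annotator itself, the resulting text would be written to the "plainTextDocument" sofa rather than returned to a caller.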
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type | Field | Description
private String | mXmlTagContainingText |
static final String | PARAM_TEXT_TAG | Name of optional configuration parameter that contains the name of an XML tag that appears in the input file.
private SAXParserFactory | parserFactory |
private Type | sourceDocInfoType |
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static AnalysisEngineDescription | getDescription() | Parses and returns the descriptor for this Analysis Engine.
static URL | getDescriptorURL() |
void | initialize(UimaContext aContext) | Performs any startup tasks required by this component.
void | process(CAS aCAS) | Inputs a CAS to the AnalysisComponent.
void | typeSystemInit(TypeSystem aTypeSystem) | Informs this annotator that the CAS TypeSystem has changed.
Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase
getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getLogger, getResultSpecification, reconfigure, setResultSpecification
-
Field Details
-
PARAM_TEXT_TAG
Name of optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag will be considered part of the "document" that is added to the CAS by this CAS Initializer. If not specified, the entire file will be considered the document.
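The effect of this parameter can be sketched with a plain SAX handler. This is an illustrative approximation under stated assumptions, not the handler XmlDetagger actually uses: the class TextTagSketch, the method textWithin, and the depth counter for nested occurrences of the tag are all hypothetical. Passing null for the tag name stands in for the "not specified" case, in which all text is kept.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of UIMA.
public class TextTagSketch {

    /**
     * Returns only the text inside elements named textTag.
     * If textTag is null, the text of the whole document is returned.
     */
    public static String textWithin(String xml, final String textTag) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // how many open textTag elements enclose the current position

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(textTag)) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(textTag)) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Keep text only when no tag was configured or we are inside the tag.
                if (textTag == null || depth > 0) {
                    text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return text.toString();
    }
}
```

For example, with the input `<doc><meta>skip</meta><TEXT>keep <b>this</b></TEXT></doc>` and the tag name `TEXT`, the sketch returns `keep this`; with a null tag name it returns `skipkeep this`.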
-
parserFactory
-
sourceDocInfoType
-
mXmlTagContainingText
-
-
Constructor Details
-
XmlDetagger
public XmlDetagger()
-
-
Method Details
-
initialize
Description copied from interface: AnalysisComponent
Performs any startup tasks required by this component. The framework calls this method only once, just after the AnalysisComponent has been instantiated.
The framework supplies this AnalysisComponent with a reference to the UimaContext that it will use, for example to access configuration settings or resources. This AnalysisComponent should store a reference to its UimaContext for later use.
- Specified by:
initialize in interface AnalysisComponent
- Overrides:
initialize in class AnalysisComponent_ImplBase
- Parameters:
aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
- Throws:
ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
-
typeSystemInit
Description copied from class: CasAnnotator_ImplBase
Informs this annotator that the CAS TypeSystem has changed. The Analysis Engine calls this from PrimitiveAnalysisEngine_impl, which calls CasAnnotator_ImplBase.process, which calls checkTypeSystemChange.
In this method, the Annotator should use the TypeSystem to resolve the names of Types and Features to the actual Type and Feature objects, which can then be used during processing.
- Overrides:
typeSystemInit in class CasAnnotator_ImplBase
- Parameters:
aTypeSystem - the new type system to use as input to your initialization
- Throws:
AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
-
process
Description copied from class: CasAnnotator_ImplBase
Inputs a CAS to the AnalysisComponent. This method should be overridden by subclasses to perform analysis of the CAS.
- Specified by:
process in class CasAnnotator_ImplBase
- Parameters:
aCAS - A CAS that this AnalysisComponent should process.
- Throws:
AnalysisEngineProcessException - if a problem occurs during processing
-
getDescription
Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
- Returns:
an object containing all of the information parsed from the descriptor.
- Throws:
InvalidXMLException - if the descriptor is invalid or missing
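Locating a file through the ClassLoader, as described above, generally follows the pattern below. This is a sketch using only standard Java: DescriptorLookupSketch, locate, and the resource name "MyDescriptor.xml" are hypothetical, and the actual descriptor name used by XmlDetagger is not given on this page. A missing resource yields null, which a caller would typically convert into an exception.

```java
import java.net.URL;

// Hypothetical helper for illustration; not part of UIMA.
public class DescriptorLookupSketch {

    /** Resolves a resource relative to the given class, or returns null if it is absent. */
    public static URL locate(Class<?> anchor, String resourceName) {
        // Class.getResource delegates the search to the class's ClassLoader (or module).
        return anchor.getResource(resourceName);
    }

    public static void main(String[] args) {
        // "MyDescriptor.xml" is an assumed name used only for this example.
        URL url = locate(DescriptorLookupSketch.class, "MyDescriptor.xml");
        System.out.println(url == null ? "descriptor not found" : url.toString());
    }
}
```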
-
getDescriptorURL
-