Class ArtifactProducer
java.lang.Object
org.apache.uima.collection.impl.cpm.engine.ArtifactProducer
- All Implemented Interfaces:
Runnable
Component responsible for continuously filling a work queue with bundles containing Cas'es. The
queue is shared with a Processing Pipeline that consumes bundles of Cas. As soon as the the
bundle is removed from the queue, this component fetches data from configured Collection Reader
and enques it onto the queue. This component facilitates asynchronous reading and processing of
CAS by seperate threads running in the CPE.
When end of processing is reached due to CPM shutdown or max number of entities are processed a
special token, called EOFToken is placed onto a queue. It marks end of processing for Processing
Units. No more data is expected to be placed on the work queue. The Processing Threads upon
seeing the EOFToken are expected to complete processing and do necessary cleanup.
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate ArrayListThe callback listeners.private CAS[]The cas list.private CPECasPoolThe cas pool.private BaseCollectionReaderThe collection reader.private CPMEngineThe cpm.private MapThe cpm stat table.private longThe entity count.private ProcessTraceThe global shared process trace.private booleanThe is running.private String[]The last doc id.private longThe max to process.private intThe reader fetch size.intThe thread state.private HashtableThe timedout docs.private UimaTimerThe timer.private longThe total fetch time.private BoundedWorkQueueThe work queue. -
Constructor Summary
ConstructorsConstructorDescriptionArtifactProducer(CPMEngine acpm) Instantiates and initializes this instance.ArtifactProducer(CPMEngine acpm, CPECasPool aPool) Construct instance of this class with a reference to the cpe engine and a pool of cas'es. -
Method Summary
Modifier and TypeMethodDescriptionvoidcleanup()Null out fields of this object.private booleanDetermines if the CPM has processed configured number of entities.voidFills the queue up to capacity.longReturns total time spent when fetching entities from a CollectionReader.Gets the last doc id.voidinvalidate(CAS[] aCasList) Invalidate.booleanChecks if is running.private voidnotifyListeners(CAS aCas, Exception anException) Notify registered callback listeners of a given exception.private voidPlace terminating EOFToken into a Work Queue.private Object[]readNext(int fetchSize) Reads next set of entities from the CollectionReader.voidrun()Runs this thread until the CPM halts or the CollectionReader has no more entities.voidsetCollectionReader(BaseCollectionReader aCollectionReader) Assign CollectionReader to be used for reading.voidsetCPMStatTable(Map aStatTable) Add table that will contain statistics gathered while reading entities from a Collection This table is used for non-uima reports.voidsetNumEntitiesToProcess(long aNumToProcess) Assign total number of entities to process.voidsetProcessTrace(ProcessTrace aProcTrace) Sets the process trace.voidsetUimaTimer(UimaTimer aTimer) Plug in Custom Timer to time events.voidsetWorkQueue(BoundedWorkQueue aQueue) Assigns a queue where the artifacts produced by this component will be deposited.
-
Field Details
-
threadState
public int threadStateThe thread state. -
casPool
The cas pool. -
workQueue
The work queue. -
collectionReader
The collection reader. -
readerFetchSize
private int readerFetchSizeThe reader fetch size. -
casList
The cas list. -
entityCount
private long entityCountThe entity count. -
maxToProcess
private long maxToProcessThe max to process. -
cpm
The cpm. -
cpmStatTable
The cpm stat table. -
lastDocId
The last doc id. -
totalFetchTime
private long totalFetchTimeThe total fetch time. -
timer
The timer. -
callbackListeners
The callback listeners. -
timedoutDocs
The timedout docs. -
isRunning
private boolean isRunningThe is running.
-
-
Constructor Details
-
ArtifactProducer
Instantiates and initializes this instance.- Parameters:
acpm- the acpm
-
ArtifactProducer
Construct instance of this class with a reference to the cpe engine and a pool of cas'es.- Parameters:
acpm- - reference to the cpeaPool- - pool of cases
-
-
Method Details
-
isRunning
public boolean isRunning()Checks if is running.- Returns:
- true, if is running
-
setUimaTimer
Plug in Custom Timer to time events.- Parameters:
aTimer- - custom timer
-
setProcessTrace
Sets the process trace.- Parameters:
aProcTrace- the new process trace
-
getCollectionReaderTotalFetchTime
public long getCollectionReaderTotalFetchTime()Returns total time spent when fetching entities from a CollectionReader. This provides a way of gauging throughput of a particular CR.- Returns:
- total time spent when fetching entities. -1 when the fetch time is unknown.
-
cleanup
public void cleanup()Null out fields of this object. Call this only when this object is no longer needed. -
setNumEntitiesToProcess
public void setNumEntitiesToProcess(long aNumToProcess) Assign total number of entities to process.- Parameters:
aNumToProcess- - number of entities to read from the Collection Reader
-
setCollectionReader
Assign CollectionReader to be used for reading.- Parameters:
aCollectionReader- - collection reader as source of data
-
setWorkQueue
Assigns a queue where the artifacts produced by this component will be deposited.- Parameters:
aQueue- - queue for the artifacts this class is producing
-
setCPMStatTable
Add table that will contain statistics gathered while reading entities from a Collection This table is used for non-uima reports.- Parameters:
aStatTable- the new CPM stat table
-
endOfProcessingReached
private boolean endOfProcessingReached()Determines if the CPM has processed configured number of entities. Called after each fetch from the Collection Reader.- Returns:
- true - all configurted entities processed, false otherwise
-
fillQueue
Fills the queue up to capacity. This is called before activating ProcessingPipeline as means of optimizing processing. When pipelines start up there are already entities in the work queue to process.- Throws:
Exception- the exception
-
readNext
Reads next set of entities from the CollectionReader. This method may return more than one Cas at a time.- Parameters:
fetchSize- the fetch size- Returns:
- - The Object returned from the method depends on the type of the CollectionReader. Either CASData[] or CASObject[] initialized with document metadata and content is returned. If the CollectionReader has no more entities (EOF), null is returned.
- Throws:
IOException- - error while reading corpusCollectionException- -
-
run
public void run()Runs this thread until the CPM halts or the CollectionReader has no more entities. It continuously fills the work queue with entities returned by the CollectionReader. -
notifyListeners
Notify registered callback listeners of a given exception.- Parameters:
aCas- the a casanException- - exception to propagate to callback listeners
-
placeEOFToken
private void placeEOFToken()Place terminating EOFToken into a Work Queue. Any thread reading this token from the queue is responsible for terminating itself. -
getLastDocId
Gets the last doc id.- Returns:
- the last doc id
-
invalidate
Invalidate.- Parameters:
aCasList- the a cas list
-