Package org.apache.uima.cas.impl
Class BinaryCasSerDes4.Serializer
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes4.Serializer
- Enclosing class:
BinaryCasSerDes4
Class instantiated once per serialization. Multiple serializations can run in parallel, each using its own instance of this class.
-
Field Summary
Fields (Modifier and Type, Field, Description):
private final ByteArrayOutputStream[] baosZipSources
private final CASImpl baseCas
private final BinaryCasSerDes bcsd
private final DataOutputStream byte_dos
private final BinaryCasSerDes4.CompressLevel compressLevel
private final BinaryCasSerDes4.CompressStrat compressStrategy
private final DataOutputStream control_dos
private final CommonSerDesSequential csds
private final boolean doMeasurement
private final DataOutputStream[] dosZipSources
private final DataOutputStream double_Exponent_dos
private final DataOutputStream double_Mantissa_Sign_dos
private final DataOutputStream float_Exponent_dos
private final DataOutputStream float_Mantissa_Sign_dos
private final Obj2IntIdentityHashMap<TOP> fs2seq - Converts between FSs and "sequential" numbers.
private final DataOutputStream fsIndexes_dos
private int heapEnd - End of the heap, in V2 pseudo-address coordinates (address of the last FS + length of the last FS).
private int heapStart - Start of the heap, in V2 pseudo-address coordinates.
private final boolean isDelta
private final boolean isTsi
private final MarkerImpl mark
private boolean only1CommonString
private final OptimizeStrings os
private TOP prevFs
private final TOP[] prevFsByType - For differencing when reading and writing.
private final DataOutputStream serializedOut
private final SerializationMeasures sm
private final DataOutputStream strLength_dos
private final DataOutputStream strOffset_dos
private final DataOutputStream strSeg_dos
private final DataOutputStream typeCode_dos
private PositiveIntSet uimaSerializableSavedToCas - Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
private Serializer(CASImpl cas, DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
-
Method Summary
Methods (Modifier and Type, Method, Description):
private void collectAndZip - Method: write with deflation into a single byte array stream.
private int compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds)
private int encodeIntSign(int v)
private void extractStrings(TOP fs) - Adds strings to the OptimizeStrings object.
private void extractStringsFromModifications - For delta, for each fsChange element, extracts any strings.
private int fs2seq
private int getPrevArray0HeapRef()
private int getPrevArray0Int()
private boolean isNoPrevArrayValue(CommonArrayFS prevCommonArray)
private void serialize - Form 4 serialization is tied to the layout of V2 Feature Structures in heaps.
private void serializeArray(TOP fs)
private int serializeArrayLength
private void serializeByKind(TOP fs, FeatureImpl feat)
private void serializeIndexedFeatureStructures
private void writeDiff(int kind, int v, int prev) - Encoding: bit 6 = sign (1 = negative); bit 7 = delta (1 = delta).
private void writeDouble(long raw)
private void writeFloat(int raw) - Needs to support the NaN bit patterns (0x7fc...) and infinities.
private void writeFs
private void writeLong(long v, long prev)
private void writeString - String encoding: a length value, plus an offset for non-empty strings.
private void writeStringInfo - Write the compressed string table(s).
private void writeUnsignedByte(DataOutputStream s, int v)
private void writeVnumber(int kind, int v)
private void writeVnumber(int kind, long v)
private void writeVnumber(DataOutputStream s, int v)
private void writeVnumber(DataOutputStream s, long v)
-
Field Details
-
serializedOut
-
baseCas
-
bcsd
-
mark
-
sm
-
baosZipSources
-
dosZipSources
-
heapStart
private int heapStart
Start of the heap, in V2 pseudo-address coordinates. -
heapEnd
private int heapEnd
End of the heap, in V2 pseudo-address coordinates: the address of the last FS plus the length of the last FS. -
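A minimal sketch of the heapEnd arithmetic, assuming FSs are laid out back to back starting at heapStart and that each FS's V2 length (in heap slots) is already known. PseudoAddrSketch and layout are hypothetical names; this does not reproduce how the serializer computes V2 lengths.

```java
final class PseudoAddrSketch {
  /**
   * Lays FSs out back to back starting at heapStart. Returns one address per FS,
   * followed by heapEnd (= address of the last FS + length of the last FS).
   */
  static int[] layout(int heapStart, int[] lengths) {
    int[] result = new int[lengths.length + 1];
    int addr = heapStart;
    for (int i = 0; i < lengths.length; i++) {
      result[i] = addr;    // this FS starts where the previous one ended
      addr += lengths[i];  // advance past this FS's slots
    }
    result[lengths.length] = addr; // heapEnd
    return result;
  }
}
```

For example, with heapStart = 1 and lengths 3, 2 and 5, the FSs land at 1, 4 and 6, and heapEnd is 6 + 5 = 11.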
isDelta
private final boolean isDelta -
isTsi
private final boolean isTsi -
doMeasurement
private final boolean doMeasurement -
os
-
compressLevel
-
compressStrategy
-
prevFsByType
For differencing when reading and writing. Also used for arrays to difference the 0th element. -
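A minimal sketch of per-type differencing, assuming each FS is modeled as a type code plus a fixed number of int slots (a hypothetical Fs class, not UIMA's TOP). Each value is differenced against the same slot of the previous FS of the same type, which tends to yield small numbers when FSs of one type carry similar values. Array handling (differencing the 0th element) is not shown.

```java
final class PrevByTypeSketch {
  /** Hypothetical stand-in for a non-array FS: a type code plus its int-valued slots. */
  static final class Fs {
    final int typeCode;
    final int[] slots;

    Fs(int typeCode, int... slots) {
      this.typeCode = typeCode;
      this.slots = slots;
    }
  }

  /**
   * For each FS, returns every slot value minus the same slot of the previous FS of the
   * same type (or minus 0 for the first FS of its type). int subtraction wraps on overflow.
   */
  static int[][] diffs(Fs[] fss, int numTypes) {
    Fs[] prevFsByType = new Fs[numTypes]; // indexed by type code
    int[][] out = new int[fss.length][];
    for (int i = 0; i < fss.length; i++) {
      Fs fs = fss[i];
      Fs prev = prevFsByType[fs.typeCode];
      out[i] = new int[fs.slots.length];
      for (int s = 0; s < fs.slots.length; s++) {
        int prevValue = (prev == null) ? 0 : prev.slots[s];
        out[i][s] = fs.slots[s] - prevValue;
      }
      prevFsByType[fs.typeCode] = fs; // this FS becomes the reference for its type
    }
    return out;
  }
}
```

Tracking the previous FS per type, rather than a single previous FS, keeps an interleaved FS of another type from spoiling the differences.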
prevFs
-
only1CommonString
private boolean only1CommonString -
byte_dos
-
typeCode_dos
-
strOffset_dos
-
strLength_dos
-
float_Mantissa_Sign_dos
-
float_Exponent_dos
-
double_Mantissa_Sign_dos
-
double_Exponent_dos
-
fsIndexes_dos
-
control_dos
-
strSeg_dos
-
csds
-
fs2seq
Converts between FSs and "sequential" numbers. This is done for compression efficiency, and is also needed for backwards compatibility with V2 serialization forms, where index information was written using "sequential" numbers. Note: the mapping may be the identity (seq number equal to the FS id), but not in V3 when some FSs have been GC'd. Contrast with fs2addr and addr2fs in csds, which use the pseudo V2 addresses as the int. -
uimaSerializableSavedToCas
Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
-
Constructor Details
-
Serializer
private Serializer(CASImpl cas, DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
- Parameters:
cas, serializedOut, mark, sm, compressLevel, compressStrategy
-
-
-
Method Details
-
serialize
Form 4 serialization is tied to the layout of V2 Feature Structures in heaps. It does not walk the indexes to serialize just those FSs that are reachable. For V3, it scans the CASImpl.id2fs information and serializes those FSs (except the ones that have been GC'd). The seq numbers of the target, which increment sequentially, will differ from the source ids if some FSs were GC'd. To determine for delta what new strings and new
- Throws:
IOException
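A minimal sketch of the sequential numbering, assuming id2fs is modeled as an array indexed by FS id in which GC'd entries are null, and that numbering starts at 1. SeqNumberingSketch and assignSeq are hypothetical; java.util.IdentityHashMap stands in for UIMA's Obj2IntIdentityHashMap.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class SeqNumberingSketch {
  /**
   * Scans id2fs in id order and gives each surviving FS the next sequential number,
   * starting at 1. GC'd slots (null) get no number, so seq numbers and ids drift apart.
   */
  static Map<Object, Integer> assignSeq(Object[] id2fs) {
    Map<Object, Integer> fs2seq = new IdentityHashMap<>(); // keyed by object identity, not equals()
    int seq = 0;
    for (Object fs : id2fs) {
      if (fs == null) {
        continue; // GC'd FS: skipped
      }
      fs2seq.put(fs, ++seq);
    }
    return fs2seq;
  }
}
```

With ids 0 to 4 holding a, null, b, null, c, the seq numbers are 1, 2 and 3, so they diverge from the ids as soon as any FS is skipped. The sketch keys on identity so that two distinct FSs that happen to be equal() still get distinct numbers.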
-
writeStringInfo
Write the compressed string table(s).
- Throws:
IOException
-
writeFs
- Throws:
IOException
-
serializeIndexedFeatureStructures
- Throws:
IOException
-
compressFsxPart
private int compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds) throws IOException - Throws:
IOException
-
serializeArray
- Throws:
IOException
-
getPrevArray0HeapRef
private int getPrevArray0HeapRef() -
getPrevArray0Int
private int getPrevArray0Int() -
isNoPrevArrayValue
-
serializeByKind
- Throws:
IOException
-
serializeArrayLength
- Throws:
IOException
-
collectAndZip
Method: write with deflation into a single byte array stream.
1. Skip any stream that is not worth deflating, and skip the Slot_Control stream itself.
2. In the Slot_Control stream, record for each deflated stream: the Slot index, the number of compressed bytes, and the number of uncompressed bytes.
3. Add to the header: the number of compressed entries, the Slot_Control stream size, the Slot_Control stream, and all the zipped streams.
- Throws:
IOException - passthru
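A minimal sketch of this layout using java.util.zip.Deflater. It assumes each slot's bytes are already collected in a byte[], treats an empty slot as "not worth deflating", and writes every count as a 4-byte int; the real method's skip criterion and size encoding may differ. CollectAndZipSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;

final class CollectAndZipSketch {
  /**
   * Output: [nbr of compressed entries][control size][control stream][all zipped streams],
   * where the control stream holds (slot index, compressed bytes, uncompressed bytes) per entry.
   */
  static byte[] collectAndZip(byte[][] slots, int level) {
    try {
      ByteArrayOutputStream zipped = new ByteArrayOutputStream();
      ByteArrayOutputStream controlBytes = new ByteArrayOutputStream();
      DataOutputStream control = new DataOutputStream(controlBytes);
      byte[] buf = new byte[4096];
      int entries = 0;
      for (int slot = 0; slot < slots.length; slot++) {
        byte[] raw = slots[slot];
        if (raw == null || raw.length == 0) {
          continue; // assumption: an empty stream is not worth deflating
        }
        Deflater deflater = new Deflater(level);
        try {
          deflater.setInput(raw);
          deflater.finish();
          int before = zipped.size();
          while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            zipped.write(buf, 0, n);
          }
          control.writeInt(slot);                   // the slot index
          control.writeInt(zipped.size() - before); // number of compressed bytes
          control.writeInt(raw.length);             // number of uncompressed bytes
          entries++;
        } finally {
          deflater.end(); // release native zlib resources
        }
      }
      control.flush();
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      DataOutputStream header = new DataOutputStream(result);
      header.writeInt(entries);             // nbr of compressed entries
      header.writeInt(controlBytes.size()); // control stream size
      controlBytes.writeTo(header);         // the control stream itself, not deflated
      zipped.writeTo(header);               // all zipped streams, back to back
      header.flush();
      return result.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected with in-memory streams
    }
  }
}
```

A reader can walk the control entries in order and inflate each compressed run with java.util.zip.Inflater, since the zipped streams follow the control stream back to back.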
-
writeLong
- Throws:
IOException
-
writeString
String encoding:
Length = 0: used for null; no offset is written.
Length = 1: used for ""; no offset is written.
Length > 0: subtract 1 to get the actual string length.
Length < 0: use (-length) as the slot index (the minimum is 1; slot 0 is NULL).
For a non-empty string (length > 1), the offset is also written.
- Throws:
IOException - passthru
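A minimal sketch of these length rules, returning the value for the length field and an optional offset instead of writing to streams. The slotIndex and offset inputs and the Encoded holder are hypothetical, and how negative lengths are packed into the output (for example via encodeIntSign) is not shown.

```java
final class StringLengthSketch {
  /** Hypothetical holder: the length-field value, plus an offset (null if none is written). */
  static final class Encoded {
    final int length;
    final Integer offset;

    Encoded(int length, Integer offset) {
      this.length = length;
      this.offset = offset;
    }
  }

  /**
   * slotIndex: hypothetical slot (>= 1) holding a shared copy of s, or 0 if none.
   * offset: hypothetical start of s in the string table.
   */
  static Encoded encode(String s, int slotIndex, int offset) {
    if (s == null) {
      return new Encoded(0, null);              // Length = 0: null, no offset
    }
    if (s.isEmpty()) {
      return new Encoded(1, null);              // Length = 1: "", no offset
    }
    if (slotIndex >= 1) {
      return new Encoded(-slotIndex, null);     // Length < 0: -length is the slot index
    }
    return new Encoded(s.length() + 1, offset); // Length > 1: actual length + 1, plus offset
  }

  /** Inverse of encode, given a slot table (slot 0 unused) and the string table. */
  static String decode(Encoded e, String[] slotTable, String stringTable) {
    if (e.length == 0) {
      return null;
    }
    if (e.length == 1) {
      return "";
    }
    if (e.length < 0) {
      return slotTable[-e.length];
    }
    return stringTable.substring(e.offset, e.offset + e.length - 1);
  }
}
```

Reserving 0 and 1 for null and "" lets the two most degenerate cases skip the offset entirely.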
-
writeFloat
Needs to support the special bit patterns: 0x7fc... for NaN, 0xff8... for negative infinity, and 0x7f8... for positive infinity. Because 0 occurs frequently, an exponent of 0 is reserved for the value 0.
- Parameters:
raw - the number to write
- Throws:
IOException
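A minimal sketch of splitting float bits so that the value 0 costs only an exponent of 0, assuming the sign is stored in the low bit of the mantissa value (suggested by the float_Mantissa_Sign_dos field name, not confirmed). Because the raw bits are kept as they are, NaN and infinity patterns survive bit for bit. FloatSplitSketch is hypothetical, and the real method may arrange the mantissa bits differently.

```java
final class FloatSplitSketch {
  /** Splits raw float bits into {exponent, mantissaAndSign}; exponent 0 is reserved for raw == 0. */
  static int[] split(int raw) {
    if (raw == 0) {
      return new int[] {0, 0};                // +0.0f: exponent 0, nothing else needed
    }
    int exponent = ((raw >>> 23) & 0xff) + 1; // shift the biased exponent up by 1 to free 0
    int mantissa = raw & 0x007fffff;          // 23-bit fraction, NaN payload kept as is
    int sign = raw >>> 31;                    // 1 = negative
    return new int[] {exponent, (mantissa << 1) | sign};
  }

  /** Rebuilds the raw bits from the two parts. */
  static int join(int exponent, int mantissaAndSign) {
    if (exponent == 0) {
      return 0;
    }
    int sign = mantissaAndSign & 1;
    int mantissa = mantissaAndSign >>> 1;
    return (sign << 31) | ((exponent - 1) << 23) | mantissa;
  }
}
```

In this sketch the shifted exponent reaches 256 for NaN and the infinities, one more than an unsigned byte holds.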
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeUnsignedByte
- Throws:
IOException
-
writeDouble
- Throws:
IOException
-
encodeIntSign
private int encodeIntSign(int v) -
writeDiff
Encoding: bit 6 = sign (1 = negative); bit 7 = delta (1 = delta).
- Parameters:
kind - the kind of slot
i - runs from iHeap + 3 to end of array
- Throws:
IOException - passthru
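A minimal sketch of sign-plus-delta encoding, assuming the writer picks whichever of v and (v - prev) has the smaller magnitude. The Javadoc places the sign in bit 6 and the delta flag in bit 7; this sketch instead puts the delta flag in bit 0 and the sign in bit 1 of a long, so the magnitude is not limited to a fixed width. DiffSketch is hypothetical.

```java
final class DiffSketch {
  /** Encodes v as an absolute value or as a delta from prev, whichever is smaller in magnitude. */
  static long encode(int v, int prev) {
    long absV = Math.abs((long) v);   // widen first so Integer.MIN_VALUE is safe
    long diff = (long) v - prev;      // cannot overflow in long
    long absDiff = Math.abs(diff);
    if (absV <= absDiff) {
      return (absV << 2) | (v < 0 ? 2L : 0L);           // absolute value, delta flag clear
    }
    return (absDiff << 2) | (diff < 0 ? 2L : 0L) | 1L;  // delta from prev, delta flag set
  }

  /** Reverses encode, given the same prev value. */
  static int decode(long encoded, int prev) {
    long magnitude = encoded >>> 2;
    long value = (encoded & 2L) != 0 ? -magnitude : magnitude;
    if ((encoded & 1L) != 0) {
      value += prev;
    }
    return (int) value;
  }
}
```

For example, encode(103, 100) yields 13 (magnitude 3, delta flag set), while encode(5, 0) yields 20 (magnitude 5, absolute).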
-
extractStrings
Adds strings to the OptimizeStrings object. If delta, only processes FSs that are new; modified string values are picked up when scanning FsChange items.
- Parameters:
fs - feature structure
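A minimal sketch of the delta filter, assuming an FS counts as new when its id is at or above a hypothetical firstNewId taken from the mark, and that null strings need no table entry. The Fs holder and a plain List sink stand in for TOP and the OptimizeStrings object.

```java
import java.util.List;

final class ExtractStringsSketch {
  /** Hypothetical stand-in for an FS: an id plus its string-valued feature values. */
  static final class Fs {
    final int id;
    final String[] strings;

    Fs(int id, String... strings) {
      this.id = id;
      this.strings = strings;
    }
  }

  /** Adds fs's non-null strings to sink; in a delta, FSs older than the mark are skipped. */
  static void extractStrings(Fs fs, boolean isDelta, int firstNewId, List<String> sink) {
    if (isDelta && fs.id < firstNewId) {
      return; // pre-existing FS: its modified strings come from the FsChange scan
    }
    for (String s : fs.strings) {
      if (s != null) {
        sink.add(s);
      }
    }
  }
}
```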
-
extractStringsFromModifications
For delta, for each fsChange element, extracts any strings.
- Parameters:
fsChange
-
fs2seq
-