Packages

package vertica

Package Members

  1. package file

Type Members

  1. class ExternalOffsetVerticaFileBatcher[R] extends RecordBatcher[ExternalOffsetVerticaFileRecordBatch] with Logging

    A file-based Vertica record batcher that, before starting each new batch, generates a new file ID from a given ID sequence, then formats records using a given formatter and writes them to files.

    A SEQUENCE is required for generating the _file_id foreign key values. Create it as follows:

    CREATE SEQUENCE file_id_sequence;

    R

    Type of records written to files.
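
    The sequence mentioned above can be queried by hand to check that it hands out distinct IDs. This is a minimal sketch that assumes the sequence name from the CREATE SEQUENCE statement; whether the batcher issues exactly this query is an assumption, not something this page states.

    ```sql
    -- Each call returns the next unused value of the sequence.
    SELECT NEXTVAL('file_id_sequence');
    ```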

  2. case class ExternalOffsetVerticaFileRecordBatch(file: File, fileId: Long, recordRanges: Seq[StreamRange], copyStatementTemplate: String) extends FileRecordBatch with VerticaRecordBatch with Product with Serializable

    A file-based Vertica record batch with an extra pre-generated file ID that is used as a foreign key when storing the records and offsets in separate tables.

  3. class ExternalOffsetVerticaFileStorage extends InDataOffsetBatchStorage[ExternalOffsetVerticaFileRecordBatch] with Logging

    A Vertica file storage implementation that loads data to some table and commits offsets to a separate dedicated offset table. The commit happens in a single transaction, and offsets and data can be joined using a file ID that is stored in both the data table and the offset table. The offset table contains only the ranges of offsets from each topic partition contained in the file. Its structure should look as follows (all names can be customized):

    CREATE TABLE file_offsets (
      _file_id INT NOT NULL,
      _consumer_group VARCHAR(128) NOT NULL,
      _topic VARCHAR(128) NOT NULL,
      _partition INT NOT NULL,
      _start_offset INT NOT NULL,
      _start_watermark TIMESTAMP NOT NULL,
      _end_offset INT NOT NULL,
      _end_watermark TIMESTAMP NOT NULL
    );

    Compared to InRowOffsetVerticaFileStorage, this implementation does not preserve individual row offsets. However, it is much less expensive in terms of data usage under the licensing scheme, because the license is calculated from the size the data occupies when converted to strings, ignoring compression and encoding. Thus, while physically storing the topic name, partition and offset next to each row costs little (this data compresses very well), it can be significant when auditing data usage for licensing.
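
    The queries below sketch how the file_offsets table described above might be used. The data table name `events`, the consumer group, the topic name and the offset value are hypothetical placeholders; only the file_offsets columns come from the structure shown above.

    ```sql
    -- Last committed end offset per topic partition for one consumer group.
    SELECT _topic, _partition, MAX(_end_offset) AS last_end_offset
    FROM file_offsets
    WHERE _consumer_group = 'my-consumer-group'
    GROUP BY _topic, _partition;

    -- Data rows loaded from files that contain offsets of partition 0 of
    -- 'my-topic' at or beyond 1000. The match is per file, not per row:
    -- rows from other partitions stored in the same file are returned too.
    SELECT e.*
    FROM events e
    JOIN file_offsets o ON o._file_id = e._file_id
    WHERE o._consumer_group = 'my-consumer-group'
      AND o._topic = 'my-topic'
      AND o._partition = 0
      AND o._end_offset >= 1000;
    ```

    The second query illustrates the trade-off: with only ranges stored per file, a row can be traced back to a file and its offset ranges, but not to its own offset.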

  4. case class InRowOffsetVerticaFileRecordBatch(file: File, recordRanges: Seq[StreamRange], copyStatementTemplate: String) extends FileRecordBatch with VerticaRecordBatch with Product with Serializable
  5. class InRowOffsetVerticaFileRecordBatcher[R] extends FileRecordBatcher[R, InRowOffsetVerticaFileRecordBatch, VerticaFileBuilder[R]]

    A record batcher that passes records through a custom record formatter and forms batches by writing the resulting records to files using a provided file builder.

    R

    Type of records being written to files.

  6. class InRowOffsetVerticaFileStorage extends InDataOffsetBatchStorage[InRowOffsetVerticaFileRecordBatch] with Logging

    A Vertica storage implementation that stores offsets in rows of data. It queries Vertica upon initialization to retrieve committed stream positions.

    Users should keep in mind that data usage in the licensing audit is calculated by treating everything as a string, so storing the topic, partition and offset next to each row might be very expensive licensing-wise. For a cheaper alternative, see ExternalOffsetVerticaFileStorage.
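
    To illustrate what storing offsets in rows of data can look like, here is a sketch of a data table with per-row offset columns and one possible way to look up the committed position per topic partition. The table name, the column names and the lookup query are all assumptions made for illustration; this page does not specify the actual schema or the query the storage runs.

    ```sql
    -- Hypothetical data table: every row carries its own stream position.
    CREATE TABLE events_with_offsets (
      _topic VARCHAR(128) NOT NULL,
      _partition INT NOT NULL,
      _offset INT NOT NULL,
      _watermark TIMESTAMP NOT NULL,
      payload VARCHAR(1024)
    );

    -- One possible lookup of the last stored offset per topic partition.
    SELECT _topic, _partition, MAX(_offset) AS last_offset
    FROM events_with_offsets
    GROUP BY _topic, _partition;
    ```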

  7. sealed trait VerticaLoadMethod extends AnyRef
  8. trait VerticaRecordBatch extends RecordBatch

    A record batch that can be loaded to Vertica. Implementers must define a COPY statement generator method.
