Packages

  • package root
  • package com
  • package adform
  • package streamloader

    The entry point of the stream loader library is the StreamLoader class, which requires a KafkaSource and a Sink. Once started it subscribes to the provided topics and starts polling and sinking records. The sink has to be able to persist records and to look up committed offsets (technically this is optional, but without it there would be no way to provide any delivery guarantees). A large class of sinks are batch based, implemented as RecordBatchingSink. This sink accumulates batches of records using some RecordBatcher and, once a batch is ready, stores it to some underlying RecordBatchStorage. A common type of batch is file based, i.e. a batcher might write records to a temporary file and, once the file is full, the sink commits the file to some underlying storage, such as a database or a distributed file system like HDFS.

    [Diagram: sketch of the class hierarchy illustrating the main classes and interfaces.]
    For concrete storage implementations see the clickhouse, hadoop, s3 and vertica packages. They also contain additional file builder implementations beyond the CsvFileBuilder included in the core library. A minimal wiring sketch is included after this package listing.

  • package sink
  • package batch
  • package storage
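
To make the above concrete, the following is a minimal wiring sketch of a loader built from a KafkaSource and a RecordBatchingSink. It is a sketch only: the import paths and builder method names are assumptions based on typical usage rather than verbatim API, and the record batcher and batch storage are left as placeholders.

  import java.time.Duration
  import java.util.Properties

  import com.adform.streamloader.StreamLoader
  import com.adform.streamloader.source.KafkaSource            // assumed import path
  import com.adform.streamloader.sink.batch.RecordBatchingSink // assumed import path

  // NOTE: a sketch only; the builder method names below are assumptions, not verbatim API.
  object LoaderSketch {
    def main(args: Array[String]): Unit = {
      val consumerProps = new Properties()
      consumerProps.put("bootstrap.servers", "localhost:9092")
      consumerProps.put("group.id", "stream-loader-example")

      val source = KafkaSource
        .builder()
        .consumerProperties(consumerProps)
        .pollTimeout(Duration.ofSeconds(1))
        .topics(Seq("events"))
        .build()

      // Placeholders: concrete implementations come from the core library or
      // from the clickhouse, hadoop, s3 and vertica packages.
      val batcher = ??? // some RecordBatcher, e.g. one building CSV files
      val storage = ??? // some RecordBatchStorage, e.g. a two-phase commit storage

      val sink = RecordBatchingSink
        .builder()
        .recordBatcher(batcher)
        .batchStorage(storage)
        .build()

      val loader = new StreamLoader(source, sink)
      sys.addShutdownHook(loader.stop())
      loader.start() // subscribes to the provided topics and starts polling and sinking records
    }
  }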

package storage


Type Members

  1. abstract class InDataOffsetBatchStorage[-B <: RecordBatch] extends RecordBatchStorage[B] with Logging

    Record batch storage that commits offsets atomically together with the data and also stores them to Kafka on a best-effort basis. On lookup, offsets are retrieved from the storage; the offsets in Kafka are not used. No recovery is needed in this case, as batch storing is assumed to be atomic. A sketch of this approach is included after the type member listing below.

  2. trait RecordBatchStorage[-B <: RecordBatch] extends AnyRef

    A record batch storage that stores batches to some underlying persistent storage and commits offsets, preferably atomically, if exactly-once guarantees are required.

    B

    Type of record batches that the storage accepts.

  3. case class StagedOffsetCommit[S](staging: S, start: StreamPosition, end: StreamPosition)(implicit evidence$1: JsonSerializer[S]) extends Product with Serializable

    Two-phase commit staging information, part of the Kafka offset commit metadata. The staging information contains the staged record range start and end positions and some batch dependent storage information (e.g. a staged temporary file path), which must be JSON serializable.

  4. abstract class TwoPhaseCommitBatchStorage[-B <: RecordBatch, S] extends RecordBatchStorage[B] with Logging with Metrics

    An abstract batch storage that stores batches and commits offsets to Kafka in a two-phase transaction. Committed stream positions are looked up in Kafka. The batch commit algorithm proceeds in phases as follows:

    1. stage the batch to storage (e.g. upload the file to a temporary path),
    2. stage offsets to Kafka by performing an offset commit that leaves the actual offset unchanged, instead saving the new offset and the staged batch information (e.g. the path of the temporarily uploaded file), serialized as compressed and base64 encoded JSON, to the offset commit metadata field,
    3. store the staged batch (e.g. move the temporary file to its final destination),
    4. commit the new offsets to Kafka and clear the staging information from the offset metadata.

    If committing fails during the first two phases, recovery will revert it; if it fails afterwards, recovery will complete the transaction.

    Implementers need to define the batch staging and storing; a sketch of this flow is included after the type member listing below.

    B

    Type of record batches.

    S

    Type of the batch staging information, must be JSON serializable.

  5. case class TwoPhaseCommitMetadata[S](watermark: Timestamp, stagedOffsetCommit: Option[StagedOffsetCommit[S]])(implicit evidence$2: JsonSerializer[S]) extends Product with Serializable

    Kafka commit metadata used in the two-phase commit implementation, stored for each topic partition separately. It consists of the currently stored watermark and an optional staging part, present if a commit is in progress.
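
To illustrate how an InDataOffsetBatchStorage style storage commits offsets atomically together with data, below is a standalone sketch that writes a batch of rows and the stream positions it covers in a single database transaction. It does not extend the library class; the table names, SQL (PostgreSQL flavoured) and method names are hypothetical.

  import java.sql.{Connection, DriverManager}

  // Standalone sketch: data and offsets are written in one transaction, so they
  // commit (or roll back) atomically. Tables, SQL and method names are hypothetical.
  final case class PartitionOffset(topic: String, partition: Int, offset: Long)

  class InDataOffsetsSketch(jdbcUrl: String) {

    def storeBatchWithOffsets(rows: Seq[String], offsets: Seq[PartitionOffset]): Unit = {
      val conn: Connection = DriverManager.getConnection(jdbcUrl)
      conn.setAutoCommit(false)
      try {
        val insertRow = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")
        rows.foreach { r => insertRow.setString(1, r); insertRow.addBatch() }
        insertRow.executeBatch()

        // Offsets live in the same storage and the same transaction as the data.
        val upsertOffset = conn.prepareStatement(
          "INSERT INTO loader_offsets(topic, partition, last_offset) VALUES (?, ?, ?) " +
            "ON CONFLICT (topic, partition) DO UPDATE SET last_offset = EXCLUDED.last_offset")
        offsets.foreach { o =>
          upsertOffset.setString(1, o.topic)
          upsertOffset.setInt(2, o.partition)
          upsertOffset.setLong(3, o.offset)
          upsertOffset.addBatch()
        }
        upsertOffset.executeBatch()

        conn.commit() // data and offsets become visible atomically
      } catch {
        case e: Exception => conn.rollback(); throw e
      } finally {
        conn.close()
      }
    }

    // Committed positions are looked up from the same storage on start-up
    // (e.g. SELECT topic, partition, last_offset FROM loader_offsets);
    // no recovery is needed since storing a batch is atomic.
  }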
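
The two-phase commit flow of TwoPhaseCommitBatchStorage can be sketched for a file based batch as follows. This is a standalone illustration, not the library implementation: the OffsetStore trait, the metadata string format and the method names are invented stand-ins (the actual storage serializes TwoPhaseCommitMetadata as compressed, base64 encoded JSON in the Kafka offset commit metadata).

  import java.nio.file.{Files, Path, StandardCopyOption}

  // Stand-in for committing a Kafka offset together with a metadata string.
  trait OffsetStore {
    def commit(offset: Long, metadata: String): Unit
  }

  class TwoPhaseFileStorageSketch(stagingDir: Path, finalDir: Path, offsets: OffsetStore) {

    /** Commits a finished batch file covering offsets (committedOffset, newOffset]. */
    def commitBatch(batchFile: Path, committedOffset: Long, newOffset: Long): Unit = {
      // Phase 1: stage the batch to storage (e.g. upload the file to a temporary path).
      val staged = stagingDir.resolve(batchFile.getFileName)
      Files.copy(batchFile, staged, StandardCopyOption.REPLACE_EXISTING)

      // Phase 2: stage offsets to Kafka: commit the unchanged offset, recording the
      // new offset and the staged file path in the commit metadata field.
      offsets.commit(committedOffset, s"""{"staged":"$staged","newOffset":$newOffset}""")

      // Phase 3: store the staged batch (e.g. move the file to its final destination).
      Files.move(staged, finalDir.resolve(batchFile.getFileName), StandardCopyOption.REPLACE_EXISTING)

      // Phase 4: commit the new offset to Kafka and clear the staging metadata.
      offsets.commit(newOffset, metadata = "")
    }

    // Recovery (run on start-up) reads back the last committed metadata:
    //  - staging info present, final file missing: the failure happened before
    //    phase 3, so delete the staged file and keep the old offset (revert);
    //  - staging info present, final file exists: finish phases 3 and 4 (complete).
  }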

Value Members

  1. object TwoPhaseCommitMetadata extends Logging with Serializable
