package file


Type Members

  1. abstract class BaseFileBuilder[-R] extends FileBuilder[R]

    Base file builder implementation that provides record counting and basic clean up.

    R

    type of the records being added.

  2. sealed trait Compression extends AnyRef

    Common compression type enumeration.

  3. class CountingOutputStream extends OutputStream

    An OutputStream wrapper that also counts the number of bytes written.
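
A byte-counting wrapper of this kind can be sketched as follows. This is a minimal illustration, not the library's implementation: the class name `ByteCountingStream` and the accessor `bytesWritten` are hypothetical, and only standard `java.io` calls are used.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

// Hypothetical wrapper for illustration; not the library's CountingOutputStream.
class ByteCountingStream(underlying: OutputStream) extends OutputStream {
  private var count = 0L

  // Number of bytes passed on to the underlying stream so far.
  def bytesWritten: Long = count

  override def write(b: Int): Unit = {
    underlying.write(b)
    count += 1
  }

  // write(Array[Byte]) in OutputStream delegates here, so it is counted too.
  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    underlying.write(b, off, len)
    count += len
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}

val sink = new ByteArrayOutputStream()
val out = new ByteCountingStream(sink)
out.write("hello".getBytes("UTF-8")) // 5 bytes
out.write(33)                        // 1 byte
out.flush()
```

Counting at the wrapper level means the size of the file being built is known without querying the file system.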

  4. trait FileBuilder[-R] extends AnyRef

    A data file builder that keeps adding records and returns the resulting file after flushing to disk.

    R

    type of the records being added.
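
The add-then-build lifecycle described above might look roughly like the sketch below. The trait `SimpleFileBuilder`, its method names, and `LineFileBuilder` are assumptions made for illustration; the actual `FileBuilder` members are not listed on this page and may differ.

```scala
import java.io.{BufferedWriter, File, FileWriter}

// Hypothetical interface for illustration; the real FileBuilder methods may differ.
trait SimpleFileBuilder[-R] {
  def write(record: R): Unit
  def recordCount: Long
  def build(): File   // flushes, closes and returns the finished file
  def discard(): Unit // closes the writer and deletes the file
}

// Writes one record per line and counts the records added.
class LineFileBuilder(file: File) extends SimpleFileBuilder[String] {
  private val writer = new BufferedWriter(new FileWriter(file))
  private var count = 0L

  def write(record: String): Unit = {
    writer.write(record)
    writer.newLine()
    count += 1
  }

  def recordCount: Long = count

  def build(): File = {
    writer.close() // close() flushes buffered data to disk first
    file
  }

  def discard(): Unit = {
    writer.close()
    file.delete()
    ()
  }
}

val builder = new LineFileBuilder(File.createTempFile("records-", ".txt"))
Seq("a", "b", "c").foreach(builder.write)
val written = builder.recordCount
val result = builder.build()
```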

  5. trait FileBuilderFactory[-R, FB <: FileBuilder[R]] extends AnyRef

    A FileBuilder instance producer.

    R

    type of the records written to files being built.

    FB

    type of FileBuilder instances produced.

  6. trait FileCommitStrategy extends AnyRef

    A strategy for determining when files should be closed and committed to storage.
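
One plausible shape for such a strategy is a set of thresholds over the three quantities that `FileStats` carries (open duration, file size, records written). The sketch below is an assumption-laden illustration: `CommitRule`, `ThresholdRule`, and the `shouldCommit` signature are hypothetical names, not the library's API.

```scala
import java.time.Duration

// Hypothetical names for illustration; the library's FileCommitStrategy may differ.
trait CommitRule {
  def shouldCommit(openDuration: Duration, fileSize: Long, recordsWritten: Long): Boolean
}

// Commits as soon as any configured threshold is reached.
final case class ThresholdRule(
    maxOpen: Option[Duration] = None,
    maxSize: Option[Long] = None,
    maxRecords: Option[Long] = None
) extends CommitRule {
  def shouldCommit(openDuration: Duration, fileSize: Long, recordsWritten: Long): Boolean =
    maxOpen.exists(limit => openDuration.compareTo(limit) >= 0) ||
      maxSize.exists(limit => fileSize >= limit) ||
      maxRecords.exists(limit => recordsWritten >= limit)
}

val rule = ThresholdRule(maxOpen = Some(Duration.ofMinutes(5)), maxRecords = Some(1000L))
// No threshold reached yet.
val early = rule.shouldCommit(Duration.ofMinutes(1), 1024L, 10L)
// Record limit reached.
val full = rule.shouldCommit(Duration.ofMinutes(1), 1024L, 1000L)
```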

  7. trait FilePathFormatter[-P] extends AnyRef

    Base trait used to construct file paths when storing files to persistent storage.

  8. trait FileRecordBatch extends RecordBatch

    Base trait for file-based record batches.

  9. abstract class FileRecordBatcher[R, +B <: FileRecordBatch, FB <: FileBuilder[R]] extends RecordBatcher[B]

    A record batcher that passes records through a custom record formatter and forms batches by writing the formatted records to files using a provided file builder.

    R

    Type of records being written to files.

    B

    Type of record batches being built.

    FB

    Type of file builder being used.

  10. case class FileStats(fileOpenDuration: Duration, fileSize: Long, recordsWritten: Long) extends Product with Serializable
  11. trait MultiFileCommitStrategy extends AnyRef

    Trait for defining a strategy for completing a multi-file batch.

  12. case class PartitionedFileRecordBatch[P, +B <: FileRecordBatch](partitionBatches: Map[P, B], recordRanges: Seq[StreamRange]) extends RecordBatch with Product with Serializable

    A record batch that is partitioned by some value, e.g. by date.

    P

    Type of the partitioning information, e.g. a date or a client/country tuple.

    B

    Type of the file record batches.

    partitionBatches

    Mapping from each partition to its file record batch.

    recordRanges

    The overall record ranges contained in the batch.

  13. class PartitioningFileRecordBatcher[P, R] extends RecordBatcher[PartitionedFileRecordBatch[P, SingleFileRecordBatch]]

    A record batcher that distributes records into user-defined partitions using a given partitioner and writes them to a separate file per partition.

    P

    Type of the partition values.

    R

    Type of formatted records.
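
The partition-then-write idea can be illustrated with a small self-contained sketch. The function `writePartitioned` and its parameters are hypothetical and do not reflect the constructor or methods of `PartitioningFileRecordBatcher`; it simply groups records with a partitioner function and writes each group to its own temporary file.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper for illustration; not part of the library's API.
// Groups records by partition and writes each group to a separate file, one record per line.
def writePartitioned[P, R](
    records: Seq[R],
    partitioner: R => P,
    format: R => String
): Map[P, Path] = {
  val dir = Files.createTempDirectory("partitions")
  records.groupBy(partitioner).map { case (partition, group) =>
    val file = Files.createTempFile(dir, "part-", ".txt")
    val content = group.map(format).mkString("", "\n", "\n")
    Files.write(file, content.getBytes("UTF-8"))
    partition -> file
  }
}

// Records are (country, payload) pairs, partitioned by country.
val records = Seq(("LT", "a"), ("US", "b"), ("LT", "c"))
val files = writePartitioned[String, (String, String)](records, _._1, _._2)
```

A real batcher would keep the files open and append as records arrive; grouping a finished sequence is used here only to keep the example short.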

  14. case class SingleFileRecordBatch(file: File, recordRanges: Seq[StreamRange]) extends FileRecordBatch with Product with Serializable
  15. class StreamFileBuilder[-R] extends BaseFileBuilder[R] with Logging

    A file builder based on FileOutputStream.

    R

    type of the records written to files being built.

  16. class TimePartitioningFilePathFormatter[P] extends FilePathFormatter[P]

    Formats file paths by placing them into time-based directories, constructed by formatting the partition with a given time formatter pattern, e.g. "/dt=2020-06-01/". The filename itself is a UUID based on the hash of the ranges contained, for reproducibility.
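
The path scheme described above can be approximated with standard JDK classes. In this sketch the `OffsetRange` case class is a hypothetical stand-in for the library's stream range type, `formatPath` is an illustrative name, and the name-based (MD5) UUID from `UUID.nameUUIDFromBytes` is an assumption about how a hash-derived UUID could be produced; the library's actual hashing may differ.

```scala
import java.nio.charset.StandardCharsets
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.UUID

// Hypothetical stand-in for a stream range: topic, partition, start and end offsets.
final case class OffsetRange(topic: String, partition: Int, start: Long, end: Long)

// Builds "<formatted date>/<uuid>" where the UUID is derived from the ranges.
def formatPath(pattern: String, date: LocalDate, ranges: Seq[OffsetRange]): String = {
  val dir = DateTimeFormatter.ofPattern(pattern).format(date)
  val key = ranges.map(r => s"${r.topic}:${r.partition}:${r.start}-${r.end}").mkString(";")
  // Name-based UUID: identical ranges always yield the identical file name.
  val name = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8))
  s"$dir/$name"
}

val ranges = Seq(OffsetRange("events", 0, 100L, 250L))
val path = formatPath("'dt='yyyy-MM-dd", LocalDate.of(2020, 6, 1), ranges)
```

Because the file name depends only on the ranges, re-running a batch over the same ranges overwrites the same path instead of creating a duplicate file.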
