sparrow-ipc 0.2.0
sparrow_ipc::chunk_serializer Class Reference

A serializer that writes record batches to chunked memory streams. More...

#include <chunk_memory_serializer.hpp>

Public Member Functions

 chunk_serializer (chunked_memory_output_stream< std::vector< std::vector< uint8_t > > > &stream, std::optional< CompressionType > compression=std::nullopt)
 Constructs a chunk serializer with a reference to a chunked memory output stream.
 
void write (const sparrow::record_batch &rb)
 Writes a single record batch to the chunked stream.
 
template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
void write (const R &record_batches)
 Writes a range of record batches to the chunked stream.
 
chunk_serializer & operator<< (const sparrow::record_batch &rb)
 
template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
chunk_serializer & operator<< (const R &record_batches)
 
void end ()
 Finalizes the chunk serialization by writing an end-of-stream marker.
 

Detailed Description

A serializer that writes record batches to chunked memory streams.

The chunk_serializer class serializes Apache Arrow record batches into separate memory chunks. The schema and each record batch are written as independent chunks of the output stream, which suits scenarios where data is processed or transmitted in discrete units.

The serializer maintains schema consistency across all record batches:

  • The schema is written once as the first chunk when the first record batch is processed
  • All subsequent record batches must have the same schema
  • Each record batch is serialized into its own independent memory chunk
Note
Once end() is called, no further record batches can be written to this serializer.
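The three schema rules above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the sparrow_ipc implementation: `toy_batch`, `toy_chunk_serializer` and `chunk_list` are hypothetical names, a batch's schema is reduced to a string, and the "schema chunk" is just that string's bytes rather than an Arrow IPC schema message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for sparrow::record_batch: a schema description
// plus an already-encoded body.
struct toy_batch {
    std::string schema;
    std::vector<std::uint8_t> body;
};

// Same container shape the real output stream is templated on.
using chunk_list = std::vector<std::vector<std::uint8_t>>;

// Toy model of the three rules above; not the sparrow_ipc implementation.
class toy_chunk_serializer {
public:
    explicit toy_chunk_serializer(chunk_list& out) : out_(out) {}

    void write(const toy_batch& rb) {
        if (!schema_) {
            // Rule 1: the first batch triggers a one-off schema chunk.
            schema_ = rb.schema;
            out_.emplace_back(rb.schema.begin(), rb.schema.end());
        } else if (rb.schema != *schema_) {
            // Rule 2: later batches must carry the same schema.
            throw std::invalid_argument("record batch schema mismatch");
        }
        // Rule 3: every batch becomes its own chunk.
        out_.push_back(rb.body);
    }

private:
    chunk_list& out_;
    std::optional<std::string> schema_;
};

// Two batches sharing one schema produce three chunks: schema, batch, batch.
inline std::size_t demo_layout() {
    chunk_list chunks;
    toy_chunk_serializer s(chunks);
    s.write({"x:int32", {1, 2}});
    s.write({"x:int32", {3, 4}});
    assert(chunks.size() == 3);
    return chunks.size();
}
```

In this model a rejected batch leaves the stream untouched, because the schema check runs before anything is appended.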

Definition at line 36 of file chunk_memory_serializer.hpp.

Constructor & Destructor Documentation

◆ chunk_serializer()

sparrow_ipc::chunk_serializer::chunk_serializer ( chunked_memory_output_stream< std::vector< std::vector< uint8_t > > > & stream,
std::optional< CompressionType > compression = std::nullopt )

Constructs a chunk serializer with a reference to a chunked memory output stream.

Parameters
stream: Reference to a chunked memory output stream that will receive the serialized chunks
compression: Optional compression type to use for record batch bodies; defaults to std::nullopt (no compression).
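The constructor's shape (a non-owning reference to the stream plus an optional codec) can be mirrored in a toy sketch. The enumerators of `sparrow_ipc::CompressionType` are not listed on this page, so `toy_compression` is a hypothetical enum loosely modeled on the LZ4_FRAME and ZSTD body codecs defined by the Arrow IPC format; `toy_chunk_serializer` and `chunk_list` are likewise hypothetical. Because the serializer stores a reference, the caller's stream must outlive it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical codec enum; the real CompressionType enumerators are assumed.
enum class toy_compression { lz4_frame, zstd };

using chunk_list = std::vector<std::vector<std::uint8_t>>;

// Mirrors the documented constructor: a reference to the output stream
// plus an optional codec that defaults to "no compression".
class toy_chunk_serializer {
public:
    explicit toy_chunk_serializer(chunk_list& stream,
                                  std::optional<toy_compression> compression = std::nullopt)
        : stream_(stream), compression_(compression) {}

    const std::optional<toy_compression>& compression() const { return compression_; }
    const chunk_list& stream() const { return stream_; }

private:
    chunk_list& stream_;  // not owned: the caller's stream must outlive *this
    std::optional<toy_compression> compression_;
};

inline void demo_construct() {
    chunk_list chunks;                                       // owned by the caller
    toy_chunk_serializer plain(chunks);                      // uncompressed bodies
    toy_chunk_serializer packed(chunks, toy_compression::zstd);
    assert(!plain.compression().has_value());
    assert(packed.compression() == toy_compression::zstd);
    assert(&plain.stream() == &chunks);                      // same object, not a copy
}
```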

Member Function Documentation

◆ end()

void sparrow_ipc::chunk_serializer::end ( )

Finalizes the chunk serialization by writing an end-of-stream marker.

This method signals the end of the serialization process. After calling this method, no further record batches can be written to this serializer.

Exceptions
std::runtime_error if attempting to write after this method has been called
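A toy sketch of the end() contract: it appends an end-of-stream chunk, and later writes throw std::runtime_error. The marker bytes follow the Arrow IPC streaming format (the 0xFFFFFFFF continuation token followed by a zero 32-bit length); that sparrow_ipc emits exactly these bytes as a chunk of their own is an assumption. The names `toy_chunk_serializer` and `chunk_list` are hypothetical, and batches are reduced to raw byte bodies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using chunk_list = std::vector<std::vector<std::uint8_t>>;

// Toy model of end(): append an end-of-stream chunk, then reject writes.
class toy_chunk_serializer {
public:
    explicit toy_chunk_serializer(chunk_list& out) : out_(out) {}

    void write(const std::vector<std::uint8_t>& body) {
        if (ended_) {
            throw std::runtime_error("write() after end()");
        }
        out_.push_back(body);
    }

    void end() {
        if (ended_) return;  // toy choice: a second end() is a no-op
        // Arrow IPC end-of-stream: continuation token 0xFFFFFFFF, length 0.
        out_.push_back({0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00});
        ended_ = true;
    }

private:
    chunk_list& out_;
    bool ended_ = false;
};

// Writes one body, ends the stream, and returns the chunk count (body + marker).
inline std::size_t demo_end() {
    chunk_list chunks;
    toy_chunk_serializer s(chunks);
    s.write({1, 2, 3});
    s.end();
    assert(chunks.back().size() == 8);
    return chunks.size();
}
```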

◆ operator<<() [1/2]

template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
chunk_serializer & sparrow_ipc::chunk_serializer::operator<< ( const R & record_batches)

Definition at line 182 of file chunk_memory_serializer.hpp.


◆ operator<<() [2/2]

chunk_serializer & sparrow_ipc::chunk_serializer::operator<< ( const sparrow::record_batch & rb)
inline

Definition at line 174 of file chunk_memory_serializer.hpp.


◆ write() [1/2]

template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
void sparrow_ipc::chunk_serializer::write ( const R & record_batches)

Writes a range of record batches to the chunked stream.

This template method serializes every record batch in the range to the chunked output stream. If this is the first write operation, the schema is serialized first as a separate chunk; each record batch is then serialized into its own independent chunk.

Template Parameters
R: The range type containing record batches (must satisfy std::ranges::input_range, and its value type must be sparrow::record_batch)
Parameters
record_batches: A range of record batches to serialize
Exceptions
std::runtime_error if the serializer has been ended via end()
std::invalid_argument if any record batch schema does not match previously written batches

Definition at line 139 of file chunk_memory_serializer.hpp.


◆ write() [2/2]

void sparrow_ipc::chunk_serializer::write ( const sparrow::record_batch & rb)

Writes a single record batch to the chunked stream.

This method serializes a record batch into the chunked output stream. If this is the first record batch written, the schema is automatically serialized first as a separate chunk.

Parameters
rb: The record batch to serialize
Exceptions
std::runtime_error if the serializer has been ended via end()
std::invalid_argument if the record batch schema does not match previously written batches

The documentation for this class was generated from the following file: chunk_memory_serializer.hpp