sparrow-ipc 0.2.0
sparrow_ipc::stream_file_serializer Class Reference

A class for serializing Apache Arrow record batches to the IPC file format.

#include <stream_file_serializer.hpp>

Public Member Functions

template<writable_stream TStream>
 stream_file_serializer (TStream &stream, std::optional< CompressionType > compression=std::nullopt)
 Constructs a stream_file_serializer object with a reference to a stream.
 
 ~stream_file_serializer ()
 Destructor for the stream_file_serializer.
 
void write (const sparrow::record_batch &rb)
 Writes a single record batch to the file.
 
template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
void write (const R &record_batches)
 Writes a collection of record batches to the file.
 
stream_file_serializer & operator<< (const sparrow::record_batch &rb)
 
template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
stream_file_serializer & operator<< (const R &record_batches)
 
stream_file_serializer & operator<< (stream_file_serializer &(*manip)(stream_file_serializer &))
 
void end ()
 Finalizes the file serialization by writing footer and trailing magic bytes.
 

Public Attributes

bool m_header_written {false}
 
bool m_schema_received {false}
 
std::optional< sparrow::record_batch > m_first_record_batch
 
std::vector< sparrow::data_type > m_dtypes
 
any_output_stream m_stream
 
bool m_ended {false}
 
std::optional< CompressionType > m_compression
 
std::vector< record_batch_block > m_record_batch_blocks
 

Detailed Description

A class for serializing Apache Arrow record batches to the IPC file format.

The stream_file_serializer class provides functionality to serialize single or multiple record batches into the Arrow IPC file format suitable for storage. It ensures schema consistency across multiple record batches and optimizes memory allocation by pre-calculating required buffer sizes.

The stream_file_serializer follows the Arrow IPC file format specification:

  • File header magic bytes (ARROW1 followed by two bytes of padding to an 8-byte boundary)
  • Stream format data (schema + record batches + end-of-stream marker)
  • Footer (FlatBuffer containing schema and empty record batch blocks)
  • Footer size (little-endian int32)
  • Trailing magic bytes (ARROW1)

The class validates that all record batches have consistent schemas and throws std::invalid_argument if inconsistencies are detected.
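A minimal sketch of the schema check, assuming (as the m_dtypes member suggests) that consistency is judged by comparing per-column data types against those of the first batch. `data_type`, `fake_batch` and `schema_guard` are hypothetical stand-ins, not sparrow types, and the real check may compare more than data types.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for sparrow::data_type and sparrow::record_batch.
enum class data_type { int32, float64, string };
struct fake_batch { std::vector<data_type> column_types; };

// Remembers the column types of the first batch and rejects later
// batches whose column types differ (assumed check, modeled on m_dtypes).
class schema_guard
{
public:
    void check(const fake_batch& batch)
    {
        if (!m_schema_received)
        {
            m_dtypes = batch.column_types;
            m_schema_received = true;
            return;
        }
        if (batch.column_types != m_dtypes)
            throw std::invalid_argument("record batch schema does not match the first batch");
    }

private:
    bool m_schema_received = false;
    std::vector<data_type> m_dtypes;
};
```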

Note
Unlike the stream serializer, the file serializer always produces a complete file: the header is written with the first record batch, and the footer and trailing magic bytes are written when end() is called or, if end() was never called explicitly, when the destructor runs.

Definition at line 91 of file stream_file_serializer.hpp.

Constructor & Destructor Documentation

◆ stream_file_serializer()

template<writable_stream TStream>
sparrow_ipc::stream_file_serializer::stream_file_serializer ( TStream & stream,
std::optional< CompressionType > compression = std::nullopt )
inline

Constructs a stream_file_serializer object with a reference to a stream.

Template Parameters
TStream: The type of the stream to be used for serialization.
Parameters
stream: Reference to the stream object that will be used for serialization operations. The serializer stores a pointer to this stream, so the stream must outlive the serializer.
compression: Optional compression type to apply to record batch bodies.

Definition at line 104 of file stream_file_serializer.hpp.

◆ ~stream_file_serializer()

sparrow_ipc::stream_file_serializer::~stream_file_serializer ( )

Destructor for the stream_file_serializer.

Ensures proper cleanup by calling end() if the serializer has not been explicitly ended. This guarantees that the complete file format (including footer and trailing magic bytes) is written before the object is destroyed.
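The finalize-on-destruction behavior is the RAII pattern combined with an idempotent end(). In this hypothetical sketch, `finalizing_writer` logs strings instead of writing bytes. Because destructors are implicitly noexcept, it swallows exceptions from end() in the destructor; the reference above does not say how sparrow-ipc's destructor handles the std::runtime_error that end() throws when no batches were written.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the finalize-on-destruction pattern; not sparrow-ipc code.
class finalizing_writer
{
public:
    explicit finalizing_writer(std::vector<std::string>& log) : m_log(log) {}

    ~finalizing_writer()
    {
        // An exception escaping a destructor would call std::terminate,
        // so this sketch swallows any error raised by end().
        try { end(); } catch (...) {}
    }

    void write(const std::string& batch)
    {
        if (m_ended)
            throw std::runtime_error("serializer already ended");
        m_log.push_back("batch:" + batch);
        ++m_batches;
    }

    void end()
    {
        if (m_ended)
            return;  // idempotent: a second call does nothing
        if (m_batches == 0)
            throw std::runtime_error("no record batches written");
        m_log.push_back("footer");
        m_ended = true;
    }

private:
    std::vector<std::string>& m_log;
    std::size_t m_batches = 0;
    bool m_ended = false;
};
```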

Member Function Documentation

◆ end()

void sparrow_ipc::stream_file_serializer::end ( )

Finalizes the file serialization by writing footer and trailing magic bytes.

This method completes the Arrow IPC file format by:

  1. Writing the end-of-stream marker
  2. Writing the footer (FlatBuffer containing schema)
  3. Writing the footer size (int32)
  4. Writing the trailing magic bytes (ARROW1)
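The four steps map onto fixed byte sequences defined by the Arrow IPC format: the end-of-stream marker is the continuation token 0xFFFFFFFF followed by a zero int32 length, and the trailing magic is the six bytes ARROW1 without padding. This sketch appends them to a byte vector. Building the FlatBuffer footer is out of scope, so `footer` is passed in as opaque bytes, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Appends a 32-bit value in little-endian byte order.
void append_le32(std::vector<std::uint8_t>& out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Hypothetical sketch of the bytes end() appends; `footer` stands in
// for the serialized FlatBuffer footer.
void append_file_trailer(std::vector<std::uint8_t>& out,
                         const std::vector<std::uint8_t>& footer)
{
    // 1. End-of-stream marker: continuation token, then a zero length.
    append_le32(out, 0xFFFFFFFFu);
    append_le32(out, 0);
    // 2. Footer bytes.
    out.insert(out.end(), footer.begin(), footer.end());
    // 3. Footer size as a little-endian int32.
    append_le32(out, static_cast<std::uint32_t>(footer.size()));
    // 4. Trailing magic, without padding.
    constexpr std::string_view magic = "ARROW1";
    out.insert(out.end(), magic.begin(), magic.end());
}
```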

Because the serializer tracks whether the file has already been finalized, end() may be called explicitly and then again by the destructor without writing a second footer.

Note
This method is idempotent: calling it more than once has no additional effect.
Postcondition
After calling this method, m_ended will be set to true.
Exceptions
std::runtime_error: if no record batches have been written
Examples
include/sparrow_ipc/stream_file_serializer.hpp.

◆ operator<<() [1/3]

template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< ( const R & record_batches)
inline

Definition at line 254 of file stream_file_serializer.hpp.

◆ operator<<() [2/3]

stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< ( const sparrow::record_batch & rb)
inline

Definition at line 228 of file stream_file_serializer.hpp.

◆ operator<<() [3/3]

stream_file_serializer & sparrow_ipc::stream_file_serializer::operator<< ( stream_file_serializer &(* manip )(stream_file_serializer &))
inline

Definition at line 274 of file stream_file_serializer.hpp.

◆ write() [1/2]

template<std::ranges::input_range R>
requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch>
void sparrow_ipc::stream_file_serializer::write ( const R & record_batches)
inline

Writes a collection of record batches to the file.

Appends multiple record batches in a single call. It first computes the total serialized size of the batches and reserves that much space in the stream, which minimizes reallocations during the subsequent appends.

Template Parameters
R: The type of the record batch collection (an input range whose value type is sparrow::record_batch)
Parameters
record_batches: A collection of record batches to append to the file
Exceptions
std::runtime_error: if the serializer has been ended
std::invalid_argument: if any record batch schema doesn't match the established schema

The method performs the following operations:

  1. Writes file header magic bytes (if first write)
  2. Calculates the total size needed for all record batches
  3. Reserves the required memory space in the stream
  4. Writes schema message (if first write)
  5. Iterates through each record batch and writes it to the stream

Definition at line 148 of file stream_file_serializer.hpp.

◆ write() [2/2]

void sparrow_ipc::stream_file_serializer::write ( const sparrow::record_batch & rb)

Writes a single record batch to the file.

Parameters
rb: The record batch to write to the file
Exceptions
std::runtime_error: if the serializer has been ended
std::invalid_argument: if the record batch schema doesn't match the established schema

Member Data Documentation

◆ m_compression

std::optional<CompressionType> sparrow_ipc::stream_file_serializer::m_compression

Definition at line 303 of file stream_file_serializer.hpp.

◆ m_dtypes

std::vector<sparrow::data_type> sparrow_ipc::stream_file_serializer::m_dtypes

Definition at line 300 of file stream_file_serializer.hpp.

◆ m_ended

bool sparrow_ipc::stream_file_serializer::m_ended {false}

Definition at line 302 of file stream_file_serializer.hpp.

◆ m_first_record_batch

std::optional<sparrow::record_batch> sparrow_ipc::stream_file_serializer::m_first_record_batch

Definition at line 299 of file stream_file_serializer.hpp.

◆ m_header_written

bool sparrow_ipc::stream_file_serializer::m_header_written {false}

Definition at line 297 of file stream_file_serializer.hpp.

◆ m_record_batch_blocks

std::vector<record_batch_block> sparrow_ipc::stream_file_serializer::m_record_batch_blocks

Definition at line 304 of file stream_file_serializer.hpp.
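In the Arrow IPC file format, the footer lists one Block per record batch (a 64-bit file offset, a 32-bit metadata length and a 64-bit body length) so readers can seek directly to any batch. Assuming record_batch_block mirrors that struct, the bookkeeping behind m_record_batch_blocks could look like this hypothetical sketch, which records the write position before each batch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the fields of the Arrow File.fbs `Block` struct; that
// sparrow-ipc's record_batch_block has the same layout is an assumption.
struct block_sketch
{
    std::int64_t offset;           // file position where the message starts
    std::int32_t metadata_length;  // bytes of message metadata
    std::int64_t body_length;      // bytes of the message body
};

// Hypothetical tracker: records one block per batch written.
class block_tracker
{
public:
    explicit block_tracker(std::int64_t start) : m_position(start) {}

    void on_batch_written(std::int32_t metadata_length, std::int64_t body_length)
    {
        blocks.push_back({m_position, metadata_length, body_length});
        m_position += metadata_length + body_length;  // advance past this message
    }

    std::vector<block_sketch> blocks;

private:
    std::int64_t m_position;  // current write offset in the file
};
```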

◆ m_schema_received

bool sparrow_ipc::stream_file_serializer::m_schema_received {false}

Definition at line 298 of file stream_file_serializer.hpp.

◆ m_stream

any_output_stream sparrow_ipc::stream_file_serializer::m_stream

Definition at line 301 of file stream_file_serializer.hpp.


The documentation for this class was generated from the following file: stream_file_serializer.hpp