sparrow-ipc 0.2.0
A class for serializing Apache Arrow record batches to the IPC file format.
#include <stream_file_serializer.hpp>
Public Member Functions | |
| template<writable_stream TStream> | |
| stream_file_serializer (TStream &stream, std::optional< CompressionType > compression=std::nullopt) | |
| Constructs a stream_file_serializer object with a reference to a stream. | |
| ~stream_file_serializer () | |
| Destructor for the stream_file_serializer. | |
| void | write (const sparrow::record_batch &rb) |
| Writes a single record batch to the file. | |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| void | write (const R &record_batches) |
| Writes a collection of record batches to the file. | |
| stream_file_serializer & | operator<< (const sparrow::record_batch &rb) |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| stream_file_serializer & | operator<< (const R &record_batches) |
| stream_file_serializer & | operator<< (stream_file_serializer &(*manip)(stream_file_serializer &)) |
| void | end () |
| Finalizes the file serialization by writing footer and trailing magic bytes. | |
Public Attributes | |
| bool | m_header_written {false} |
| bool | m_schema_received {false} |
| std::optional< sparrow::record_batch > | m_first_record_batch |
| std::vector< sparrow::data_type > | m_dtypes |
| any_output_stream | m_stream |
| bool | m_ended {false} |
| std::optional< CompressionType > | m_compression |
| std::vector< record_batch_block > | m_record_batch_blocks |
A class for serializing Apache Arrow record batches to the IPC file format.
The stream_file_serializer class provides functionality to serialize single or multiple record batches into the Arrow IPC file format suitable for storage. It ensures schema consistency across multiple record batches and optimizes memory allocation by pre-calculating required buffer sizes.
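The stream_file_serializer follows the Arrow IPC file format specification: leading magic bytes with padding, the stream-format data (schema and record batches), the footer, the footer size, and the trailing magic bytes.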
The class validates that all record batches have consistent schemas and throws std::invalid_argument if inconsistencies are detected.
Definition at line 91 of file stream_file_serializer.hpp.
|
inline |
Constructs a stream_file_serializer object with a reference to a stream.
| TStream | The type of the stream to be used for serialization. |
| stream | Reference to the stream object that will be used for serialization operations. The serializer stores a pointer to this stream for later use. |
| compression | Optional compression type to apply to record batch bodies. |
Definition at line 104 of file stream_file_serializer.hpp.
| sparrow_ipc::stream_file_serializer::~stream_file_serializer | ( | ) |
Destructor for the stream_file_serializer.
Ensures proper cleanup by calling end() if the serializer has not been explicitly ended. This guarantees that the complete file format (including footer and trailing magic bytes) is written before the object is destroyed.
| void sparrow_ipc::stream_file_serializer::end | ( | ) |
Finalizes the file serialization by writing footer and trailing magic bytes.
This method completes the Arrow IPC file format by writing the footer and the trailing magic bytes.
It can be called multiple times safely: it tracks whether the file has already been ended, so repeated calls do nothing.
| std::runtime_error | if no record batches have been written |
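The finalization rules described for the destructor and end() can be sketched with a small stand-in class. Everything here is hypothetical (`file_writer_sketch` and its string log are not part of sparrow-ipc); in particular, swallowing the exception in the destructor is an assumption made so the sketch never throws during destruction, and the real class may handle that case differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the serializer; only the finalization logic is modeled.
class file_writer_sketch
{
public:

    explicit file_writer_sketch(std::vector<std::string>& log)
        : m_log(&log)
    {
    }

    // Finalizes on destruction unless end() has already run.
    ~file_writer_sketch()
    {
        try
        {
            end();
        }
        catch (const std::exception&)
        {
            // Assumption: errors are swallowed so the destructor never throws.
        }
    }

    void write(const std::string& batch)
    {
        if (m_ended)
        {
            throw std::runtime_error("cannot write after end()");
        }
        m_log->push_back("batch:" + batch);
        ++m_batch_count;
    }

    void end()
    {
        if (m_ended)
        {
            return;  // repeated calls are no-ops
        }
        if (m_batch_count == 0)
        {
            throw std::runtime_error("no record batches have been written");
        }
        m_log->push_back("footer");
        m_log->push_back("magic");
        m_ended = true;
    }

private:

    std::vector<std::string>* m_log;
    std::size_t m_batch_count = 0;
    bool m_ended = false;
};
```

Calling end() explicitly is still the safer habit, because it lets the caller observe a failure that a destructor cannot report.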
|
inline |
Definition at line 254 of file stream_file_serializer.hpp.
|
inline |
Definition at line 228 of file stream_file_serializer.hpp.
|
inline |
Definition at line 274 of file stream_file_serializer.hpp.
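The operator<< overload that takes a function pointer follows the same pattern as stream manipulators such as std::endl. The sketch below shows the chaining mechanics only; `chain_sketch` and `end_file_sketch` are hypothetical names, and the manipulators actually shipped with the library are not shown in this page.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in class showing only the chaining mechanics.
class chain_sketch
{
public:

    // Writes one item and returns *this so calls can be chained.
    chain_sketch& operator<<(const std::string& batch)
    {
        m_events.push_back("write:" + batch);
        return *this;
    }

    // Accepts a function that takes and returns the serializer,
    // in the manner of std::endl for std::ostream.
    chain_sketch& operator<<(chain_sketch& (*manip)(chain_sketch&))
    {
        return manip(*this);
    }

    void end()
    {
        m_events.push_back("end");
    }

    const std::vector<std::string>& events() const
    {
        return m_events;
    }

private:

    std::vector<std::string> m_events;
};

// Hypothetical manipulator that finalizes the serializer when streamed into it.
inline chain_sketch& end_file_sketch(chain_sketch& s)
{
    s.end();
    return s;
}
```

With these definitions, an expression such as `s << std::string("a") << std::string("b") << end_file_sketch;` records two writes followed by one end.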
|
inline |
Writes a collection of record batches to the file.
This method efficiently adds multiple record batches to the serialization stream by first calculating the total required size and reserving memory space to minimize reallocations during the append operations.
| R | The type of the record batch collection (an input range whose value type is sparrow::record_batch) |
| record_batches | A collection of record batches to append to the file |
| std::runtime_error | if the serializer has been ended |
| std::invalid_argument | if any record batch schema doesn't match |
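The method performs the following operations: it calculates the total size required for all record batches, reserves that space, and then appends each record batch in turn.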
Definition at line 148 of file stream_file_serializer.hpp.
| void sparrow_ipc::stream_file_serializer::write | ( | const sparrow::record_batch & | rb | ) |
Writes a single record batch to the file.
| rb | The record batch to write to the file |
| std::runtime_error | if the serializer has been ended |
| std::invalid_argument | if the record batch schema doesn't match the established schema |
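The two error conditions listed for write() can be sketched as a small guard object. The types are hypothetical: a schema is modeled as a list of type names, and `schema_guard` stands in for the state the serializer keeps in members such as m_schema_received and m_ended. How the real class compares schemas is not shown in this page.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in: a schema is modeled as a list of column type names.
using schema_sketch = std::vector<std::string>;

class schema_guard
{
public:

    // Throws std::runtime_error after end(), and std::invalid_argument
    // when a later schema differs from the first one seen.
    void accept(const schema_sketch& schema)
    {
        if (m_ended)
        {
            throw std::runtime_error("serializer has been ended");
        }
        if (!m_schema_received)
        {
            m_schema = schema;  // the first batch establishes the schema
            m_schema_received = true;
            return;
        }
        if (schema != m_schema)
        {
            throw std::invalid_argument("record batch schema does not match");
        }
    }

    void end()
    {
        m_ended = true;
    }

private:

    schema_sketch m_schema;
    bool m_schema_received = false;
    bool m_ended = false;
};
```

Checking the ended state first means a write after end() always reports std::runtime_error, regardless of the batch's schema.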
| std::optional<CompressionType> sparrow_ipc::stream_file_serializer::m_compression |
Definition at line 303 of file stream_file_serializer.hpp.
| std::vector<sparrow::data_type> sparrow_ipc::stream_file_serializer::m_dtypes |
Definition at line 300 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_ended {false} |
Definition at line 302 of file stream_file_serializer.hpp.
| std::optional<sparrow::record_batch> sparrow_ipc::stream_file_serializer::m_first_record_batch |
Definition at line 299 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_header_written {false} |
Definition at line 297 of file stream_file_serializer.hpp.
| std::vector<record_batch_block> sparrow_ipc::stream_file_serializer::m_record_batch_blocks |
Definition at line 304 of file stream_file_serializer.hpp.
| bool sparrow_ipc::stream_file_serializer::m_schema_received {false} |
Definition at line 298 of file stream_file_serializer.hpp.
| any_output_stream sparrow_ipc::stream_file_serializer::m_stream |
Definition at line 301 of file stream_file_serializer.hpp.