sparrow-ipc 0.2.0
A serializer that writes record batches to chunked memory streams.
#include <chunk_memory_serializer.hpp>
Public Member Functions
| chunk_serializer (chunked_memory_output_stream< std::vector< std::vector< uint8_t > > > &stream, std::optional< CompressionType > compression=std::nullopt) | |
| Constructs a chunk serializer with a reference to a chunked memory output stream. | |
| void | write (const sparrow::record_batch &rb) |
| Writes a single record batch to the chunked stream. | |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| void | write (const R &record_batches) |
| Writes a range of record batches to the chunked stream. | |
| chunk_serializer & | operator<< (const sparrow::record_batch &rb) |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| chunk_serializer & | operator<< (const R &record_batches) |
| void | end () |
| Finalizes the chunk serialization by writing an end-of-stream marker. | |
The chunk_serializer class provides functionality to serialize Apache Arrow record batches into separate memory chunks. Each record batch (and the schema) is written as an independent chunk in the output stream, making it suitable for scenarios where data needs to be processed or transmitted in discrete units.
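The serializer maintains schema consistency across all record batches: the schema is serialized once, as a separate chunk, before the first record batch, and any later batch whose schema doesn't match the previously written batches is rejected with std::invalid_argument.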
Definition at line 36 of file chunk_memory_serializer.hpp.
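The chunk layout described above can be illustrated with a small self-contained model. This is only a sketch: `toy_batch`, `toy_chunk_writer` and `demo_chunk_count` are hypothetical stand-ins invented for illustration, not sparrow-ipc types, and the real serializer writes Arrow IPC messages rather than raw strings and bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are sparrow-ipc types.
using chunk_list = std::vector<std::vector<std::uint8_t>>;

struct toy_batch
{
    std::string schema;                 // stand-in for the Arrow schema
    std::vector<std::uint8_t> payload;  // stand-in for the encoded record batch
};

class toy_chunk_writer
{
public:
    explicit toy_chunk_writer(chunk_list& chunks)
        : m_chunks(chunks)
    {
    }

    void write(const toy_batch& batch)
    {
        if (!m_schema_written)
        {
            // First write: the schema goes out as its own chunk.
            m_chunks.emplace_back(batch.schema.begin(), batch.schema.end());
            m_schema_written = true;
        }
        // Every batch then becomes one independent chunk.
        m_chunks.push_back(batch.payload);
    }

private:
    chunk_list& m_chunks;
    bool m_schema_written = false;
};

// Writes two batches and returns the number of chunks produced:
// one schema chunk plus one chunk per batch.
inline std::size_t demo_chunk_count()
{
    chunk_list chunks;
    toy_chunk_writer writer(chunks);
    writer.write({"id:int32", {1, 2, 3}});
    writer.write({"id:int32", {4, 5}});
    return chunks.size();
}
```

Because each chunk is a separate `std::vector<uint8_t>`, a consumer of such a layout can forward or process chunks one at a time without splitting a contiguous buffer.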
| sparrow_ipc::chunk_serializer::chunk_serializer | ( | chunked_memory_output_stream< std::vector< std::vector< uint8_t > > > & | stream, |
| std::optional< CompressionType > | compression = std::nullopt ) |
Constructs a chunk serializer with a reference to a chunked memory output stream.
Parameters
| stream | Reference to a chunked memory output stream that will receive the serialized chunks |
| compression | Optional compression type to use for record batch bodies. Defaults to std::nullopt. |
| void sparrow_ipc::chunk_serializer::end | ( | ) |
Finalizes the chunk serialization by writing an end-of-stream marker.
This method signals the end of the serialization process. After calling this method, no further record batches can be written to this serializer.
Exceptions
| std::runtime_error | if attempting to write after this method has been called |
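The ended state can be modelled as a flag that every write checks. The sketch below is a toy model, not the library's implementation: `toy_ending_writer` and `write_after_end_throws` are hypothetical names, and emitting the marker as its own chunk and treating a repeated `end()` as a no-op are assumptions. The marker bytes follow the Arrow IPC streaming format, where end-of-stream is the 4-byte continuation token 0xFFFFFFFF followed by a 4-byte zero length.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <vector>

using chunk_list = std::vector<std::vector<std::uint8_t>>;

// Arrow IPC end-of-stream marker: continuation token 0xFFFFFFFF, then a zero length.
constexpr std::array<std::uint8_t, 8> end_of_stream = {0xFF, 0xFF, 0xFF, 0xFF, 0, 0, 0, 0};

// Hypothetical toy writer; not a sparrow-ipc type.
class toy_ending_writer
{
public:
    explicit toy_ending_writer(chunk_list& chunks)
        : m_chunks(chunks)
    {
    }

    void write(const std::vector<std::uint8_t>& payload)
    {
        if (m_ended)
        {
            throw std::runtime_error("write after end()");
        }
        m_chunks.push_back(payload);
    }

    void end()
    {
        if (m_ended)
        {
            return;  // assumption of this sketch: a repeated end() does nothing
        }
        // Assumption of this sketch: the marker is emitted as its own chunk.
        m_chunks.emplace_back(end_of_stream.begin(), end_of_stream.end());
        m_ended = true;
    }

private:
    chunk_list& m_chunks;
    bool m_ended = false;
};

// Returns true when a write attempted after end() raises std::runtime_error.
inline bool write_after_end_throws()
{
    chunk_list chunks;
    toy_ending_writer writer(chunks);
    writer.write({1, 2, 3});
    writer.end();
    try
    {
        writer.write({4});
    }
    catch (const std::runtime_error&)
    {
        return true;
    }
    return false;
}
```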
| chunk_serializer & sparrow_ipc::chunk_serializer::operator<< | ( | const R & | record_batches | ) |
Definition at line 182 of file chunk_memory_serializer.hpp.
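| chunk_serializer & sparrow_ipc::chunk_serializer::operator<< | ( | const sparrow::record_batch & | rb | ) | inline |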
Definition at line 174 of file chunk_memory_serializer.hpp.
| void sparrow_ipc::chunk_serializer::write | ( | const R & | record_batches | ) |
Writes a range of record batches to the chunked stream.
This template method efficiently serializes multiple record batches to the chunked output stream. If this is the first write operation, the schema is automatically serialized first as a separate chunk. Each record batch is then serialized into its own independent chunk.
Template Parameters
| R | The range type containing record batches (must satisfy std::ranges::input_range) |
Parameters
| record_batches | A range of record batches to serialize |
Exceptions
| std::runtime_error | if the serializer has been ended via end() |
| std::invalid_argument | if any record batch schema doesn't match previously written batches |
Definition at line 139 of file chunk_memory_serializer.hpp.
| void sparrow_ipc::chunk_serializer::write | ( | const sparrow::record_batch & | rb | ) |
Writes a single record batch to the chunked stream.
This method serializes a record batch into the chunked output stream. If this is the first record batch written, the schema is automatically serialized first as a separate chunk.
Parameters
| rb | The record batch to serialize |
Exceptions
| std::runtime_error | if the serializer has been ended via end() |
| std::invalid_argument | if the record batch schema doesn't match previously written batches |
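The schema check behind the std::invalid_argument case can be pictured as remembering the first schema and comparing every later one against it. The following is a minimal sketch of that idea, not the library's code: `toy_tagged_batch`, `toy_schema_guard` and `mismatch_throws` are hypothetical, and a plain string stands in for a real Arrow schema comparison.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: a batch that carries only a schema description.
struct toy_tagged_batch
{
    std::string schema;
};

// Hypothetical helper; not a sparrow-ipc type.
class toy_schema_guard
{
public:
    void check(const toy_tagged_batch& batch)
    {
        if (!m_schema)
        {
            // The first batch fixes the schema for everything that follows.
            m_schema = batch.schema;
            return;
        }
        if (*m_schema != batch.schema)
        {
            throw std::invalid_argument("record batch schema mismatch");
        }
    }

private:
    std::optional<std::string> m_schema;
};

// Returns true when a batch with a different schema is rejected.
inline bool mismatch_throws()
{
    toy_schema_guard guard;
    guard.check({"id:int32"});
    guard.check({"id:int32"});  // same schema: accepted
    try
    {
        guard.check({"name:utf8"});  // different schema: rejected
    }
    catch (const std::invalid_argument&)
    {
        return true;
    }
    return false;
}
```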