This page describes how to serialize and deserialize record batches using sparrow-ipc.
Overview
sparrow-ipc provides two main approaches for both serialization and deserialization:
- Function API: Simple one-shot operations for serializing/deserializing complete data
- Class API: Streaming-oriented classes (serializer and deserializer) for incremental operations
Serialization
Serialize record batches to a memory stream
The simplest way to serialize record batches is to use the serializer class with a memory_output_stream:
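// Function name is illustrative. Assumption: sp_ipc aliases the sparrow_ipc namespace.
std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches)
{
    std::cout << "\n2. Serializing record batches to stream...\n";
    std::vector<uint8_t> stream_data;
    // Assumption: memory_output_stream appends to the vector it wraps,
    // and end_stream terminates the stream.
    sp_ipc::memory_output_stream stream(stream_data);
    sp_ipc::serializer serializer(stream);
    serializer << batches << sp_ipc::end_stream;
    std::cout << " Serialized stream size: " << stream_data.size() << " bytes\n";
    return stream_data;
}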
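The pattern described above relies on the output stream writing into a vector that the caller owns, so the bytes are available in that vector once serialization finishes. The sketch below illustrates that ownership model with the standard library only. `toy_memory_sink`, `toy_write`, and `toy_serialize` are hypothetical names invented for this illustration; they are not part of sparrow-ipc and make no claim about how memory_output_stream is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a memory-backed output stream: it does not own
// the buffer, it only appends to a vector that the caller keeps.
class toy_memory_sink
{
public:

    explicit toy_memory_sink(std::vector<std::uint8_t>& buffer)
        : m_buffer(&buffer)
    {
    }

    // Append `size` bytes to the caller's buffer.
    void write(const std::uint8_t* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        m_buffer->insert(m_buffer->end(), data, data + size);
    }

private:

    std::vector<std::uint8_t>* m_buffer;
};

// Hypothetical writer that pushes the raw bytes of a string through the sink.
inline void toy_write(toy_memory_sink& sink, const std::string& payload)
{
    sink.write(reinterpret_cast<const std::uint8_t*>(payload.data()), payload.size());
}

// Same shape as the documented pattern: create the buffer, wrap it, write
// through the wrapper, then return the buffer itself.
inline std::vector<std::uint8_t> toy_serialize(const std::vector<std::string>& payloads)
{
    std::vector<std::uint8_t> stream_data;
    toy_memory_sink sink(stream_data);
    for (const auto& payload : payloads)
    {
        toy_write(sink, payload);
    }
    return stream_data;
}
```

Because the sink holds only a pointer to the vector, the vector must outlive the sink; returning the vector by value after the last write is safe.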
Serialize individual record batches
You can also serialize record batches one at a time:
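// Function name is illustrative. Assumption: sp_ipc aliases the sparrow_ipc namespace.
void demonstrate_individual_vs_batch_serialization(
    const std::vector<sp::record_batch>& batches,
    const std::vector<uint8_t>& batch_stream_data
)
{
    std::cout << "\n6. Demonstrating individual vs batch serialization...\n";
    std::vector<uint8_t> individual_stream_data;
    // Assumption: the serializer writes through a memory_output_stream bound to the vector.
    sp_ipc::memory_output_stream individual_stream(individual_stream_data);
    sp_ipc::serializer individual_serializer(individual_stream);
    // Serialize the batches one at a time.
    for (const auto& batch : batches)
    {
        individual_serializer << batch;
    }
    // Assumption: end_stream terminates the stream.
    individual_serializer << sp_ipc::end_stream;
    std::cout << " Individual serialization size: " << individual_stream_data.size() << " bytes\n";
    std::cout << " Batch serialization size: " << batch_stream_data.size() << " bytes\n";
    // Assumption: deserialize_stream returns the decoded batches as a container.
    const auto individual_deserialized = sp_ipc::deserialize_stream(
        std::span<const uint8_t>(individual_stream_data)
    );
    if (individual_deserialized.size() == batches.size())
    {
        std::cout << " ✓ Individual and batch serialization produce equivalent results\n";
    }
    else
    {
        std::cerr << " ✗ Individual and batch serialization mismatch!\n";
    }
}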
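One way to see why the two approaches can be equivalent is a writer whose collection overload simply forwards to its single-item overload. The sketch below shows that design using the standard library only. Everything in it is hypothetical: `toy_serializer` treats a "batch" as a plain string and frames it with a 4-byte little-endian length, which is not the Arrow IPC wire format and is not taken from the sparrow-ipc sources.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a serializer; a "batch" here is just a string.
class toy_serializer
{
public:

    explicit toy_serializer(std::vector<std::uint8_t>& out)
        : m_out(&out)
    {
    }

    // Write one item as a 4-byte little-endian length followed by its bytes.
    toy_serializer& operator<<(const std::string& item)
    {
        assert(item.size() <= 0xFFFFFFFFu);
        const auto size = static_cast<std::uint32_t>(item.size());
        for (int shift = 0; shift < 32; shift += 8)
        {
            m_out->push_back(static_cast<std::uint8_t>((size >> shift) & 0xFFu));
        }
        m_out->insert(m_out->end(), item.begin(), item.end());
        return *this;
    }

    // Writing a whole collection forwards to the single-item overload,
    // so both paths emit exactly the same bytes.
    toy_serializer& operator<<(const std::vector<std::string>& items)
    {
        for (const auto& item : items)
        {
            *this << item;
        }
        return *this;
    }

private:

    std::vector<std::uint8_t>* m_out;
};

// Serialize the whole collection with one streaming call.
inline std::vector<std::uint8_t> toy_serialize_all(const std::vector<std::string>& items)
{
    std::vector<std::uint8_t> bytes;
    toy_serializer writer(bytes);
    writer << items;
    return bytes;
}

// Serialize the same collection one item at a time.
inline std::vector<std::uint8_t> toy_serialize_each(const std::vector<std::string>& items)
{
    std::vector<std::uint8_t> bytes;
    toy_serializer writer(bytes);
    for (const auto& item : items)
    {
        writer << item;
    }
    return bytes;
}
```

In this toy the two byte buffers are identical, which is a stronger property than the documented check (equal batch counts after deserialization).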
Deserialization
Using the function API
The simplest way to deserialize a complete Arrow IPC stream is to pass the full byte buffer to deserialize_stream, which handles the whole stream in a single call.
Using the deserializer class
The deserializer class provides more control over deserialization and is useful when you want to:
- Accumulate batches into an existing container
- Deserialize data incrementally as it arrives
- Process multiple streams into a single container
Basic usage
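// Function name is illustrative. Assumption: sp_ipc aliases the sparrow_ipc namespace.
void deserializer_basic_example(const std::vector<uint8_t>& stream_data)
{
    // Decoded batches are accumulated into this caller-owned container.
    std::vector<sp::record_batch> batches;
    // Assumption: the deserializer is constructed from the container it appends to.
    sp_ipc::deserializer deser(batches);
    deser.deserialize(std::span<const uint8_t>(stream_data));
    for (const auto& batch : batches)
    {
        std::cout << "Batch with " << batch.nb_rows() << " rows and " << batch.nb_columns() << " columns\n";
    }
}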
Incremental deserialization
The deserializer class is particularly useful for streaming scenarios where data arrives in chunks:
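// Function name is illustrative. Assumption: sp_ipc aliases the sparrow_ipc namespace.
void deserializer_incremental_example(const std::vector<std::vector<uint8_t>>& stream_chunks)
{
    std::vector<sp::record_batch> batches;
    // Assumption: the deserializer is constructed from the container it appends to.
    sp_ipc::deserializer deser(batches);
    // Feed each chunk as it arrives; batches accumulate across iterations.
    for (const auto& chunk : stream_chunks)
    {
        deser << std::span<const uint8_t>(chunk);
        std::cout << "After chunk: " << batches.size() << " batches accumulated\n";
    }
    std::cout << "Total batches deserialized: " << batches.size() << "\n";
}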
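The accumulation behaviour described above can be sketched without the library: a deserializer holds a reference to a caller-owned container and appends to it every time it is fed a chunk. The code below is a toy under stated assumptions, not sparrow-ipc code. `toy_deserializer` and `toy_frames` are hypothetical names, a "batch" is a plain string, the framing is a 4-byte little-endian length prefix rather than the Arrow IPC format, and chunks are taken as `std::vector` instead of `std::span` so the sketch compiles as C++17. The toy also assumes that every chunk ends on a frame boundary; how sparrow-ipc treats a message split across chunks is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a deserializer. A "batch" is a string framed as
// a 4-byte little-endian length followed by that many bytes.
class toy_deserializer
{
public:

    // The container is owned by the caller; decoded items are appended to it.
    explicit toy_deserializer(std::vector<std::string>& out)
        : m_out(&out)
    {
    }

    // Decode every complete frame in `chunk` and append the results.
    // A truncated trailing frame is ignored (toy simplification).
    toy_deserializer& operator<<(const std::vector<std::uint8_t>& chunk)
    {
        std::size_t pos = 0;
        while (chunk.size() - pos >= 4)
        {
            std::uint32_t size = 0;
            for (std::size_t i = 0; i < 4; ++i)
            {
                size |= static_cast<std::uint32_t>(chunk[pos + i]) << (8 * i);
            }
            if (chunk.size() - pos - 4 < size)
            {
                break;  // incomplete frame: stop without appending
            }
            const auto first = chunk.begin() + static_cast<std::ptrdiff_t>(pos + 4);
            m_out->emplace_back(first, first + static_cast<std::ptrdiff_t>(size));
            pos += 4 + static_cast<std::size_t>(size);
        }
        return *this;
    }

private:

    std::vector<std::string>* m_out;
};

// Helper for trying the sketch out: encode payloads as consecutive frames.
inline std::vector<std::uint8_t> toy_frames(const std::vector<std::string>& payloads)
{
    std::vector<std::uint8_t> bytes;
    for (const auto& payload : payloads)
    {
        assert(payload.size() <= 0xFFFFFFFFu);
        const auto size = static_cast<std::uint32_t>(payload.size());
        for (int shift = 0; shift < 32; shift += 8)
        {
            bytes.push_back(static_cast<std::uint8_t>((size >> shift) & 0xFFu));
        }
        bytes.insert(bytes.end(), payload.begin(), payload.end());
    }
    return bytes;
}
```

Feeding the toy two chunks in a row leaves the results of both in the same container, which is the property the incremental loop above depends on.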
Chaining deserializations
The streaming operator can be chained for fluent API usage:
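// Function name is illustrative. Assumption: sp_ipc aliases the sparrow_ipc namespace.
void deserializer_chaining_example(
    const std::vector<uint8_t>& chunk1,
    const std::vector<uint8_t>& chunk2,
    const std::vector<uint8_t>& chunk3
)
{
    std::vector<sp::record_batch> batches;
    // Assumption: the deserializer is constructed from the container it appends to.
    sp_ipc::deserializer deser(batches);
    // Chunks are consumed left to right.
    deser << std::span<const uint8_t>(chunk1) << std::span<const uint8_t>(chunk2)
          << std::span<const uint8_t>(chunk3);
    std::cout << "Deserialized " << batches.size() << " batches from 3 chunks\n";
}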
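Chaining works in C++ whenever `operator<<` returns a reference to the object it was called on. The following sketch isolates that mechanism. `toy_chunk_log` and `toy_feed_three` are hypothetical names made up for this illustration, and the class only records chunk sizes; it says nothing about how the sparrow-ipc deserializer is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical accumulator: records the size of every chunk it is fed,
// in the order the chunks are applied.
class toy_chunk_log
{
public:

    explicit toy_chunk_log(std::vector<std::size_t>& sizes)
        : m_sizes(&sizes)
    {
    }

    // Returning *this by reference is what allows `log << a << b << c`.
    toy_chunk_log& operator<<(const std::vector<std::uint8_t>& chunk)
    {
        m_sizes->push_back(chunk.size());
        return *this;
    }

private:

    std::vector<std::size_t>* m_sizes;
};

// Feed three chunks in a single chained expression and return the log.
inline std::vector<std::size_t> toy_feed_three(
    const std::vector<std::uint8_t>& chunk1,
    const std::vector<std::uint8_t>& chunk2,
    const std::vector<std::uint8_t>& chunk3
)
{
    std::vector<std::size_t> sizes;
    toy_chunk_log log(sizes);
    // operator<< is left-associative: ((log << chunk1) << chunk2) << chunk3,
    // so the chunks are applied in the order they are written.
    log << chunk1 << chunk2 << chunk3;
    assert(sizes.size() == 3);
    return sizes;
}
```

Since each call returns the same object, a chained expression is equivalent to three separate statements applied in the same order.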