sparrow-ipc 0.2.0
Serialization and Deserialization

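This page describes how to serialize and deserialize record batches using sparrow-ipc. The snippets appear to use sp and sp_ipc as namespace aliases for sparrow and sparrow_ipc; some snippets spell out sparrow_ipc in full.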

Overview

sparrow-ipc provides two main approaches for both serialization and deserialization:

  • Function API: Simple one-shot operations for serializing/deserializing complete data
  • Class API: Streaming-oriented classes (serializer and deserializer) for incremental operations

Serialization

Serialize record batches to a memory stream

The simplest way to serialize record batches is to use the serializer class with a memory_output_stream:

std::vector<uint8_t> serialize_batches_to_stream(const std::vector<sp::record_batch>& batches)
{
    std::cout << "\n2. Serializing record batches to stream...\n";
    std::vector<uint8_t> stream_data;
    sparrow_ipc::memory_output_stream stream(stream_data);
    sparrow_ipc::serializer serializer(stream);
    // Serialize all batches using the streaming operator
    serializer << batches << sparrow_ipc::end_stream;
    std::cout << " Serialized stream size: " << stream_data.size() << " bytes\n";
    return stream_data;
}

Serialize individual record batches

You can also serialize record batches one at a time:

    const std::vector<sp::record_batch>& batches,
    const std::vector<uint8_t>& batch_stream_data
)
{
    std::cout << "\n6. Demonstrating individual vs batch serialization...\n";
    // Serialize individual batches one by one
    std::vector<uint8_t> individual_stream_data;
    sparrow_ipc::memory_output_stream individual_stream(individual_stream_data);
    sparrow_ipc::serializer individual_serializer(individual_stream);
    for (const auto& batch : batches)
    {
        individual_serializer << batch;
    }
    individual_serializer << sparrow_ipc::end_stream;
    std::cout << " Individual serialization size: " << individual_stream_data.size() << " bytes\n";
    std::cout << " Batch serialization size: " << batch_stream_data.size() << " bytes\n";
    // Both streams should deserialize to the same number of batches
    // (this check compares batch counts only, not batch contents)
    auto individual_deserialized = sparrow_ipc::deserialize_stream(individual_stream_data);
    if (individual_deserialized.size() == batches.size())
    {
        std::cout << " ✓ Individual and batch serialization produce equivalent results\n";
    }
    else
    {
        std::cerr << " ✗ Individual and batch serialization mismatch!\n";
    }
}
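The example above compares only the number of deserialized batches. A stricter check compares the two serialized buffers byte by byte. The following is a minimal sketch using only the standard library; first_difference is a hypothetical helper, and whether the two serialization paths really produce byte-identical output is an assumption that this page does not confirm.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not part of sparrow-ipc): returns the offset of the
// first byte at which the two buffers differ, or std::nullopt when they are
// identical. If one buffer is a strict prefix of the other, the offset is
// the length of the shorter buffer.
std::optional<std::size_t> first_difference(
    const std::vector<std::uint8_t>& lhs,
    const std::vector<std::uint8_t>& rhs
)
{
    const auto [l, r] = std::mismatch(lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
    if (l == lhs.end() && r == rhs.end())
    {
        return std::nullopt;
    }
    return static_cast<std::size_t>(l - lhs.begin());
}
```

Calling first_difference(individual_stream_data, batch_stream_data) would then report where the two streams diverge, if they do at all.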

Deserialization

Using the function API

The simplest way to deserialize a complete Arrow IPC stream is to call deserialize_stream:

std::vector<sp::record_batch> deserialize_stream_example(const std::vector<uint8_t>& stream_data)
{
    // Deserialize the entire stream at once
    auto batches = sp_ipc::deserialize_stream(stream_data);
    return batches;
}

Using the deserializer class

The deserializer class provides more control over deserialization and is useful when you want to:

  • Accumulate batches into an existing container
  • Deserialize data incrementally as it arrives
  • Process multiple streams into a single container

Basic usage

void deserializer_basic_example(const std::vector<uint8_t>& stream_data)
{
    // Create a container to hold the deserialized batches
    std::vector<sp::record_batch> batches;
    // Create a deserializer that will append to our container
    sp_ipc::deserializer deser(batches);
    // Deserialize the stream data
    deser.deserialize(std::span<const uint8_t>(stream_data));
    // Process the accumulated batches
    for (const auto& batch : batches)
    {
        std::cout << "Batch with " << batch.nb_rows() << " rows and " << batch.nb_columns() << " columns\n";
    }
}

Incremental deserialization

The deserializer class is particularly useful for streaming scenarios where data arrives in chunks:

void deserializer_incremental_example(const std::vector<std::vector<uint8_t>>& stream_chunks)
{
    // Container to accumulate all deserialized batches
    std::vector<sp::record_batch> batches;
    // Create a deserializer
    sp_ipc::deserializer deser(batches);
    // Deserialize chunks as they arrive using the streaming operator
    for (const auto& chunk : stream_chunks)
    {
        deser << std::span<const uint8_t>(chunk);
        std::cout << "After chunk: " << batches.size() << " batches accumulated\n";
    }
    // All batches are now available in the container
    std::cout << "Total batches deserialized: " << batches.size() << "\n";
}

Chaining deserializations

Calls to the streaming operator can be chained in a single expression:

    const std::vector<uint8_t>& chunk1,
    const std::vector<uint8_t>& chunk2,
    const std::vector<uint8_t>& chunk3
)
{
    std::vector<sp::record_batch> batches;
    sp_ipc::deserializer deser(batches);
    // Chain multiple deserializations in a single expression
    deser << std::span<const uint8_t>(chunk1) << std::span<const uint8_t>(chunk2)
          << std::span<const uint8_t>(chunk3);
    std::cout << "Deserialized " << batches.size() << " batches from 3 chunks\n";
}