sparrow-ipc 0.2.0
Namespaces | |
| namespace | details |
| namespace | utils |
Classes | |
| class | any_output_stream |
| Type-erased wrapper for any stream-like object. More... | |
| class | arrow_array_private_data |
| class | chunk_serializer |
| A serializer that writes record batches to chunked memory streams. More... | |
| class | chunked_memory_output_stream |
| An output stream that writes data into separate memory chunks. More... | |
| class | CompressionCache |
| class | deserializer |
| class | encapsulated_message |
| class | memory_output_stream |
| An output stream that writes data to a contiguous memory buffer. More... | |
| class | non_owning_arrow_schema_private_data |
| struct | record_batch_block |
| Represents a block entry in the Arrow IPC file footer. More... | |
| struct | serialized_record_batch_info |
| Information about a serialized record batch block. More... | |
| class | serializer |
| A class for serializing Apache Arrow record batches to an output stream. More... | |
| class | stream_file_serializer |
| A class for serializing Apache Arrow record batches to the IPC file format. More... | |
Concepts | |
| concept | writable_stream |
| Concept for stream-like types that support write operations. | |
| concept | ArrowPrivateData |
Enumerations | |
| enum class | CompressionType : std::uint8_t { LZ4_FRAME , ZSTD } |
Functions | |
| SPARROW_IPC_API void | release_arrow_array_children_and_dictionary (ArrowArray *array) |
| template<ArrowPrivateData T> | |
| void | arrow_array_release (ArrowArray *array) |
| template<ArrowPrivateData T, typename Arg> | |
| void | fill_arrow_array (ArrowArray &array, int64_t length, int64_t null_count, int64_t offset, size_t children_count, ArrowArray **children, ArrowArray *dictionary, Arg &&private_data_arg) |
| template<ArrowPrivateData T, typename Arg> | |
| ArrowArray | make_arrow_array (int64_t length, int64_t null_count, int64_t offset, size_t children_count, ArrowArray **children, ArrowArray *dictionary, Arg &&private_data_arg) |
| template<class T> requires std::same_as<T, ArrowArray> || std::same_as<T, ArrowSchema> | |
| void | release_common_non_owning_arrow (T &t) |
Release the children and dictionary of an ArrowArray or ArrowSchema. | |
| SPARROW_IPC_API void | release_non_owning_arrow_schema (ArrowSchema *schema) |
| template<sparrow::input_metadata_container M = std::vector<sparrow::metadata_pair>> | |
| void | fill_non_owning_arrow_schema (ArrowSchema &schema, std::string_view format, const char *name, std::optional< M > metadata, std::optional< std::unordered_set< sparrow::ArrowFlag > > flags, size_t children_count, ArrowSchema **children, ArrowSchema *dictionary) |
| template<sparrow::input_metadata_container M = std::vector<sparrow::metadata_pair>> | |
| ArrowSchema | make_non_owning_arrow_schema (std::string_view format, const char *name, std::optional< M > metadata, std::optional< std::unordered_set< sparrow::ArrowFlag > > flags, size_t children_count, ArrowSchema **children, ArrowSchema *dictionary) |
| SPARROW_IPC_API std::span< const std::uint8_t > | compress (const CompressionType compression_type, const std::span< const std::uint8_t > &data, CompressionCache &cache) |
| SPARROW_IPC_API size_t | get_compressed_size (const CompressionType compression_type, const std::span< const std::uint8_t > &data, CompressionCache &cache) |
| SPARROW_IPC_API std::variant< std::vector< std::uint8_t >, std::span< const std::uint8_t > > | decompress (const CompressionType compression_type, std::span< const std::uint8_t > data) |
| SPARROW_IPC_API std::vector< sparrow::record_batch > | deserialize_stream (std::span< const uint8_t > data) |
| Deserializes an Arrow IPC stream from binary data into a vector of record batches. | |
| sparrow::fixed_width_binary_array | deserialize_non_owning_fixedwidthbinary (const org::apache::arrow::flatbuf::RecordBatch &record_batch, std::span< const uint8_t > body, std::string_view name, const std::optional< std::vector< sparrow::metadata_pair > > &metadata, bool nullable, size_t &buffer_index, int32_t byte_width) |
| template<typename T> | |
| sparrow::primitive_array< T > | deserialize_non_owning_primitive_array (const org::apache::arrow::flatbuf::RecordBatch &record_batch, std::span< const uint8_t > body, std::string_view name, const std::optional< std::vector< sparrow::metadata_pair > > &metadata, bool nullable, size_t &buffer_index) |
| template<typename T> | |
| T | deserialize_non_owning_variable_size_binary (const org::apache::arrow::flatbuf::RecordBatch &record_batch, std::span< const uint8_t > body, std::string_view name, const std::optional< std::vector< sparrow::metadata_pair > > &metadata, bool nullable, size_t &buffer_index) |
| std::pair< encapsulated_message, std::span< const uint8_t > > | extract_encapsulated_message (std::span< const uint8_t > buf_ptr) |
| std::pair< org::apache::arrow::flatbuf::Type, flatbuffers::Offset< void > > | get_flatbuffer_decimal_type (flatbuffers::FlatBufferBuilder &builder, std::string_view format_str, const int32_t bitWidth) |
| std::pair< org::apache::arrow::flatbuf::Type, flatbuffers::Offset< void > > | get_flatbuffer_type (flatbuffers::FlatBufferBuilder &builder, std::string_view format_str) |
| flatbuffers::Offset< flatbuffers::Vector< flatbuffers::Offset< org::apache::arrow::flatbuf::KeyValue > > > | create_metadata (flatbuffers::FlatBufferBuilder &builder, const ArrowSchema &arrow_schema) |
| Creates a FlatBuffers vector of KeyValue pairs from ArrowSchema metadata. | |
| ::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > | create_field (flatbuffers::FlatBufferBuilder &builder, const ArrowSchema &arrow_schema, std::optional< std::string_view > name_override=std::nullopt) |
| Creates a FlatBuffer Field object from an ArrowSchema. | |
| ::flatbuffers::Offset< ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > > > | create_children (flatbuffers::FlatBufferBuilder &builder, const sparrow::record_batch &record_batch) |
| Creates a FlatBuffers vector of Field objects from a record batch. | |
| ::flatbuffers::Offset< ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > > > | create_children (flatbuffers::FlatBufferBuilder &builder, const ArrowSchema &arrow_schema) |
| Creates a FlatBuffers vector of Field objects from an ArrowSchema's children. | |
| flatbuffers::FlatBufferBuilder | get_schema_message_builder (const sparrow::record_batch &record_batch) |
| Creates a FlatBuffer builder containing a serialized Arrow schema message. | |
| void | fill_fieldnodes (const sparrow::arrow_proxy &arrow_proxy, std::vector< org::apache::arrow::flatbuf::FieldNode > &nodes) |
| Recursively fills a vector of FieldNode objects from an arrow_proxy and its children. | |
| std::vector< org::apache::arrow::flatbuf::FieldNode > | create_fieldnodes (const sparrow::record_batch &record_batch) |
| Creates a vector of Apache Arrow FieldNode objects from a record batch. | |
| void | fill_buffers (const sparrow::arrow_proxy &arrow_proxy, std::vector< org::apache::arrow::flatbuf::Buffer > &flatbuf_buffers, int64_t &offset) |
| Recursively fills a vector of FlatBuffer Buffer objects with buffer information from an Arrow proxy. | |
| std::vector< org::apache::arrow::flatbuf::Buffer > | get_buffers (const sparrow::record_batch &record_batch) |
| Extracts buffer information from a record batch for serialization. | |
| void | fill_compressed_buffers (const sparrow::arrow_proxy &arrow_proxy, std::vector< org::apache::arrow::flatbuf::Buffer > &flatbuf_compressed_buffers, int64_t &offset, const CompressionType compression_type, CompressionCache &cache) |
| Recursively populates a vector with compressed buffer metadata from an Arrow proxy. | |
| std::vector< org::apache::arrow::flatbuf::Buffer > | get_compressed_buffers (const sparrow::record_batch &record_batch, const CompressionType compression_type, CompressionCache &cache) |
| Retrieves metadata describing the layout of compressed buffers within a record batch. | |
| int64_t | calculate_body_size (const sparrow::arrow_proxy &arrow_proxy, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Calculates the total aligned size in bytes of all buffers in an Arrow array structure. | |
| int64_t | calculate_body_size (const sparrow::record_batch &record_batch, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Calculates the total body size of a record batch by summing the body sizes of all its columns. | |
| flatbuffers::FlatBufferBuilder | get_record_batch_message_builder (const sparrow::record_batch &record_batch, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Creates a FlatBuffer message containing a serialized Apache Arrow RecordBatch. | |
| SPARROW_IPC_API const org::apache::arrow::flatbuf::Footer * | get_footer_from_file_data (std::span< const uint8_t > file_data) |
| template<std::ranges::input_range R> | |
| bool | is_continuation (const R &buf) |
| template<std::ranges::input_range R> | |
| bool | is_end_of_stream (const R &buf) |
| template<std::ranges::input_range R> | |
| bool | is_arrow_file_magic (const R &buf) |
| std::vector< sparrow::metadata_pair > | to_sparrow_metadata (const ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::KeyValue > > &metadata) |
| Converts FlatBuffers metadata to Sparrow metadata format. | |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| void | serialize_record_batches_to_ipc_stream (const R &record_batches, any_output_stream &stream, std::optional< CompressionType > compression, std::optional< std::reference_wrapper< CompressionCache > > cache) |
| Serializes a collection of record batches into a binary format. | |
| SPARROW_IPC_API serialized_record_batch_info | serialize_record_batch (const sparrow::record_batch &record_batch, any_output_stream &stream, std::optional< CompressionType > compression, std::optional< std::reference_wrapper< CompressionCache > > cache) |
| Serializes a record batch into a binary format following the Arrow IPC specification. | |
| SPARROW_IPC_API void | serialize_schema_message (const sparrow::record_batch &record_batch, any_output_stream &stream) |
| Serializes a schema message for a record batch into a byte buffer. | |
| SPARROW_IPC_API std::size_t | calculate_schema_message_size (const sparrow::record_batch &record_batch) |
| Calculates the total serialized size of a schema message. | |
| SPARROW_IPC_API std::size_t | calculate_record_batch_message_size (const sparrow::record_batch &record_batch, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Calculates the total serialized size of a record batch message. | |
| template<std::ranges::input_range R> requires std::same_as<std::ranges::range_value_t<R>, sparrow::record_batch> | |
| std::size_t | calculate_total_serialized_size (const R &record_batches, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Calculates the total serialized size for a collection of record batches. | |
| SPARROW_IPC_API void | fill_body (const sparrow::arrow_proxy &arrow_proxy, any_output_stream &stream, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Writes the serialized data of an arrow proxy and its children to the output stream. | |
| SPARROW_IPC_API void | generate_body (const sparrow::record_batch &record_batch, any_output_stream &stream, std::optional< CompressionType > compression=std::nullopt, std::optional< std::reference_wrapper< CompressionCache > > cache=std::nullopt) |
| Generates a serialized body from a record batch. | |
| SPARROW_IPC_API std::vector< sparrow::data_type > | get_column_dtypes (const sparrow::record_batch &rb) |
| serializer & | end_stream (serializer &serializer) |
| SPARROW_IPC_API size_t | write_footer (const sparrow::record_batch &record_batch, const std::vector< record_batch_block > &record_batch_blocks, any_output_stream &stream) |
| Writes the Arrow IPC file footer. | |
| SPARROW_IPC_API std::vector< sparrow::record_batch > | deserialize_file (std::span< const uint8_t > data) |
| Deserializes Arrow IPC file format into a vector of record batches. | |
| stream_file_serializer & | end_file (stream_file_serializer &serializer) |
Variables | |
| constexpr int | SPARROW_IPC_VERSION_MAJOR = 0 |
| constexpr int | SPARROW_IPC_VERSION_MINOR = 2 |
| constexpr int | SPARROW_IPC_VERSION_PATCH = 0 |
| constexpr int | SPARROW_IPC_BINARY_CURRENT = 2 |
| constexpr int | SPARROW_IPC_BINARY_REVISION = 0 |
| constexpr int | SPARROW_IPC_BINARY_AGE = 1 |
| constexpr std::array< std::uint8_t, 4 > | continuation = {0xFF, 0xFF, 0xFF, 0xFF} |
| Continuation value defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#encapsulated-message-format. | |
| constexpr std::array< std::uint8_t, 8 > | end_of_stream = {0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00} |
| End-of-stream marker defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format. | |
| constexpr std::array< std::uint8_t, 6 > | arrow_file_magic = {'A', 'R', 'R', 'O', 'W', '1'} |
| Magic bytes for Arrow file format defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format. The magic string is "ARROW1" (6 bytes) followed by 2 padding bytes to reach 8-byte alignment. | |
| constexpr std::size_t | arrow_file_magic_size = arrow_file_magic.size() |
| constexpr std::array< std::uint8_t, 8 > | arrow_file_header_magic = {'A', 'R', 'R', 'O', 'W', '1', 0x00, 0x00} |
| Magic bytes with padding for file header (8 bytes total for alignment) | |
| enum class sparrow_ipc::CompressionType : std::uint8_t |
| Enumerator | |
|---|---|
| LZ4_FRAME | |
| ZSTD | |
Definition at line 14 of file compression.hpp.
| void sparrow_ipc::arrow_array_release | ( | ArrowArray * | array | ) |
Definition at line 16 of file arrow_array.hpp.
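| [[nodiscard]] int64_t sparrow_ipc::calculate_body_size | ( | const sparrow::arrow_proxy & | arrow_proxy, |
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |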
Calculates the total aligned size in bytes of all buffers in an Arrow array structure.
This function recursively computes the total size needed for all buffers in an Arrow array structure, including buffers from child arrays. Each buffer size is aligned to 8-byte boundaries as required by the Arrow format.
| arrow_proxy | The Arrow array proxy containing buffers and child arrays. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache to store and retrieve compressed buffer sizes, avoiding recompression. If compression is given, cache should be set as well. |
| std::invalid_argument | if compression is given but not cache. |
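The 8-byte alignment rule described above can be illustrated with a small, self-contained sketch. It does not use sparrow-ipc types: `array_node`, `align_to_8`, and `body_size` are hypothetical names introduced here, and only the uncompressed case is covered.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an Arrow array: raw buffer sizes plus child arrays.
struct array_node
{
    std::vector<std::int64_t> buffer_sizes;
    std::vector<array_node> children;
};

// Round a size up to the next multiple of 8 bytes.
constexpr std::int64_t align_to_8(std::int64_t size)
{
    return (size + 7) & ~std::int64_t{7};
}

// Sum the aligned sizes of all buffers, recursing into child arrays.
inline std::int64_t body_size(const array_node& node)
{
    std::int64_t total = 0;
    for (const std::int64_t size : node.buffer_sizes)
    {
        total += align_to_8(size);
    }
    for (const array_node& child : node.children)
    {
        total += body_size(child);
    }
    return total;
}
```

With buffers of 3 and 16 bytes and one child holding a 5-byte buffer, this sketch yields 8 + 16 + 8 bytes.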
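| [[nodiscard]] int64_t sparrow_ipc::calculate_body_size | ( | const sparrow::record_batch & | record_batch, |
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |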
Calculates the total body size of a record batch by summing the body sizes of all its columns.
This function iterates through all columns in the given record batch and accumulates the body size of each column's underlying Arrow array proxy. The body size represents the total memory required for the serialized data content of the record batch.
| record_batch | The sparrow record batch containing columns to calculate size for. |
| compression | Optional: The compression type to use when serializing. If not provided, sizes are for uncompressed buffers. |
| cache | Optional: A cache to store and retrieve compressed buffer sizes, avoiding recompression. If compression is given, cache should be set as well. |
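| [[nodiscard]] SPARROW_IPC_API std::size_t sparrow_ipc::calculate_record_batch_message_size | ( | const sparrow::record_batch & | record_batch, |
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |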
Calculates the total serialized size of a record batch message.
This function computes the complete size that would be produced by serialize_record_batch(), including:
| record_batch | The record batch to be measured. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache to store and retrieve compressed buffer sizes, avoiding recompression. If compression is given, cache should be set as well. |
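| [[nodiscard]] SPARROW_IPC_API std::size_t sparrow_ipc::calculate_schema_message_size | ( | const sparrow::record_batch & | record_batch | ) |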
Calculates the total serialized size of a schema message.
This function computes the complete size that would be produced by serialize_schema_message(), including:
| record_batch | The record batch containing the schema to be measured |
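| [[nodiscard]] std::size_t sparrow_ipc::calculate_total_serialized_size | ( | const R & | record_batches, |
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |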
Calculates the total serialized size for a collection of record batches.
This function computes the complete size that would be produced by serializing a schema message followed by all record batch messages in the collection.
| R | Range type containing sparrow::record_batch objects. |
| record_batches | Collection of record batches to be measured. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache to store and retrieve compressed buffer sizes, avoiding recompression. If compression is given, cache should be set as well. |
| std::invalid_argument | if record batches have inconsistent schemas. |
Definition at line 82 of file serialize_utils.hpp.
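| [[nodiscard]] SPARROW_IPC_API std::span< const std::uint8_t > sparrow_ipc::compress | ( | const CompressionType | compression_type, |
| const std::span< const std::uint8_t > & | data, | ||
| CompressionCache & | cache ) |
| [[nodiscard]] ::flatbuffers::Offset< ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > > > sparrow_ipc::create_children | ( | flatbuffers::FlatBufferBuilder & | builder, |
| const ArrowSchema & | arrow_schema ) |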
Creates a FlatBuffers vector of Field objects from an ArrowSchema's children.
This function iterates through all children of the given ArrowSchema and converts each child to a FlatBuffers Field object. The resulting fields are collected into a FlatBuffers vector.
| builder | Reference to the FlatBufferBuilder used for creating FlatBuffers objects |
| arrow_schema | The ArrowSchema containing the children to convert |
| std::invalid_argument | If any child pointer in the ArrowSchema is null |
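| [[nodiscard]] ::flatbuffers::Offset< ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > > > sparrow_ipc::create_children | ( | flatbuffers::FlatBufferBuilder & | builder, |
| const sparrow::record_batch & | record_batch ) |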
Creates a FlatBuffers vector of Field objects from a record batch.
This function extracts column information from a record batch and converts each column into a FlatBuffers Field object. It uses both the column's Arrow schema and the record batch's column names to create properly named fields. The resulting fields are collected into a FlatBuffers vector.
| builder | Reference to the FlatBuffers builder used for creating the vector |
| record_batch | The record batch containing columns and their associated names |
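| [[nodiscard]] ::flatbuffers::Offset< org::apache::arrow::flatbuf::Field > sparrow_ipc::create_field | ( | flatbuffers::FlatBufferBuilder & | builder, |
| const ArrowSchema & | arrow_schema, | ||
| std::optional< std::string_view > | name_override = std::nullopt ) |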
Creates a FlatBuffer Field object from an ArrowSchema.
This function converts an ArrowSchema structure into a FlatBuffer Field representation suitable for Apache Arrow IPC serialization. It handles the creation of all necessary components including field name, type information, metadata, children, and nullable flag.
| builder | Reference to the FlatBufferBuilder used for creating FlatBuffer objects |
| arrow_schema | The ArrowSchema structure containing the field definition to convert |
| name_override | Optional field name to use instead of the name from arrow_schema. If provided, this name will be used regardless of arrow_schema.name. If not provided, falls back to arrow_schema.name (or empty if null) |
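The name precedence described for `name_override` can be sketched in isolation. The helper `resolve_field_name` below is hypothetical and only mirrors the documented fallback order (override, then the schema name, then an empty string). It is not the library's implementation.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: pick the field name following the documented precedence.
inline std::string resolve_field_name(
    const char* schema_name,
    std::optional<std::string_view> name_override = std::nullopt
)
{
    if (name_override.has_value())
    {
        // An explicit override wins regardless of the schema name.
        return std::string(*name_override);
    }
    // Fall back to the schema name, or to an empty string when it is null.
    return schema_name != nullptr ? std::string(schema_name) : std::string();
}
```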
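| [[nodiscard]] std::vector< org::apache::arrow::flatbuf::FieldNode > sparrow_ipc::create_fieldnodes | ( | const sparrow::record_batch & | record_batch | ) |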
Creates a vector of Apache Arrow FieldNode objects from a record batch.
This function iterates through all columns in the provided record batch and generates corresponding FieldNode flatbuffer objects. Each column's arrow proxy is used to populate the field nodes vector through the fill_fieldnodes function.
| record_batch | The sparrow record batch containing columns to process |
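| [[nodiscard]] flatbuffers::Offset< flatbuffers::Vector< flatbuffers::Offset< org::apache::arrow::flatbuf::KeyValue > > > sparrow_ipc::create_metadata | ( | flatbuffers::FlatBufferBuilder & | builder, |
| const ArrowSchema & | arrow_schema ) |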
Creates a FlatBuffers vector of KeyValue pairs from ArrowSchema metadata.
This function converts metadata from an ArrowSchema into a FlatBuffers representation suitable for serialization. It processes key-value pairs from the schema's metadata and creates corresponding FlatBuffers KeyValue objects.
| builder | Reference to the FlatBufferBuilder used for creating FlatBuffers objects |
| arrow_schema | The ArrowSchema containing metadata to be serialized |
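| [[nodiscard]] SPARROW_IPC_API std::variant< std::vector< std::uint8_t >, std::span< const std::uint8_t > > sparrow_ipc::decompress | ( | const CompressionType | compression_type, |
| std::span< const std::uint8_t > | data ) |
| [[nodiscard]] SPARROW_IPC_API std::vector< sparrow::record_batch > sparrow_ipc::deserialize_file | ( | std::span< const uint8_t > | data | ) |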
Deserializes Arrow IPC file format into a vector of record batches.
Reads an Arrow IPC file format which consists of:
| data | A span of bytes containing the serialized Arrow IPC file data |
| std::runtime_error | If:
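| [[nodiscard]] sparrow::fixed_width_binary_array sparrow_ipc::deserialize_non_owning_fixedwidthbinary | ( | const org::apache::arrow::flatbuf::RecordBatch & | record_batch, |
| std::span< const uint8_t > | body, | ||
| std::string_view | name, | ||
| const std::optional< std::vector< sparrow::metadata_pair > > & | metadata, | ||
| bool | nullable, | ||
| size_t & | buffer_index, | ||
| int32_t | byte_width ) |
| [[nodiscard]] sparrow::primitive_array< T > sparrow_ipc::deserialize_non_owning_primitive_array | ( | const org::apache::arrow::flatbuf::RecordBatch & | record_batch, |
| std::span< const uint8_t > | body, | ||
| std::string_view | name, | ||
| const std::optional< std::vector< sparrow::metadata_pair > > & | metadata, | ||
| bool | nullable, | ||
| size_t & | buffer_index ) |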
Definition at line 18 of file deserialize_primitive_array.hpp.
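| [[nodiscard]] T sparrow_ipc::deserialize_non_owning_variable_size_binary | ( | const org::apache::arrow::flatbuf::RecordBatch & | record_batch, |
| std::span< const uint8_t > | body, | ||
| std::string_view | name, | ||
| const std::optional< std::vector< sparrow::metadata_pair > > & | metadata, | ||
| bool | nullable, | ||
| size_t & | buffer_index ) |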
Definition at line 17 of file deserialize_variable_size_binary_array.hpp.
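| [[nodiscard]] SPARROW_IPC_API std::vector< sparrow::record_batch > sparrow_ipc::deserialize_stream | ( | std::span< const uint8_t > | data | ) |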
Deserializes an Arrow IPC stream from binary data into a vector of record batches.
This function processes an Arrow IPC stream format, extracting schema information and record batch data. It handles encapsulated messages sequentially, first expecting a Schema message followed by one or more RecordBatch messages.
| data | A span of bytes containing the serialized Arrow IPC stream data |
| std::runtime_error | If:
| inline stream_file_serializer & sparrow_ipc::end_file | ( | stream_file_serializer & | serializer | ) |
Definition at line 320 of file stream_file_serializer.hpp.
| inline serializer & sparrow_ipc::end_stream | ( | serializer & | serializer | ) |
Definition at line 223 of file serializer.hpp.
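| [[nodiscard]] std::pair< encapsulated_message, std::span< const uint8_t > > sparrow_ipc::extract_encapsulated_message | ( | std::span< const uint8_t > | buf_ptr | ) |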
| void sparrow_ipc::fill_arrow_array | ( | ArrowArray & | array, |
| int64_t | length, | ||
| int64_t | null_count, | ||
| int64_t | offset, | ||
| size_t | children_count, | ||
| ArrowArray ** | children, | ||
| ArrowArray * | dictionary, | ||
| Arg && | private_data_arg ) |
Definition at line 32 of file arrow_array.hpp.
| SPARROW_IPC_API void sparrow_ipc::fill_body | ( | const sparrow::arrow_proxy & | arrow_proxy, |
| any_output_stream & | stream, | ||
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |
Writes the serialized data of an arrow proxy and its children to the output stream.
This function recursively processes an arrow proxy by:
The function ensures proper memory alignment by padding each buffer's data to the next 8-byte boundary, which is typically required for efficient memory access and Arrow format compliance.
| arrow_proxy | The arrow proxy containing buffers and potential child proxies to serialize. |
| stream | The output stream where the serialized body data will be written. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache for compressed buffers to avoid recompression if compression is enabled. If compression is given, cache should be set as well. |
| std::invalid_argument | if compression is given but not cache. |
| void sparrow_ipc::fill_buffers | ( | const sparrow::arrow_proxy & | arrow_proxy, |
| std::vector< org::apache::arrow::flatbuf::Buffer > & | flatbuf_buffers, | ||
| int64_t & | offset ) |
Recursively fills a vector of FlatBuffer Buffer objects with buffer information from an Arrow proxy.
This function traverses an Arrow proxy structure and creates FlatBuffer Buffer entries for each buffer found in the proxy and its children. The buffers are processed in a depth-first manner, first handling the buffers of the current proxy, then recursively processing all child proxies.
| arrow_proxy | The Arrow proxy object containing buffers and potential child proxies to process |
| flatbuf_buffers | Vector of FlatBuffer Buffer objects to be populated with buffer information |
| offset | Reference to the current byte offset, updated as buffers are processed and aligned to 8-byte boundaries |
| void sparrow_ipc::fill_compressed_buffers | ( | const sparrow::arrow_proxy & | arrow_proxy, |
| std::vector< org::apache::arrow::flatbuf::Buffer > & | flatbuf_compressed_buffers, | ||
| int64_t & | offset, | ||
| const CompressionType | compression_type, | ||
| CompressionCache & | cache ) |
Recursively populates a vector with compressed buffer metadata from an Arrow proxy.
This function traverses the Arrow proxy and its children, compressing each buffer and recording its metadata (offset and size) in the provided vector. The offset is updated to ensure proper alignment for each subsequent buffer.
| arrow_proxy | The Arrow proxy containing the buffers to be compressed. |
| flatbuf_compressed_buffers | A vector to store the resulting compressed buffer metadata. |
| offset | The current offset in the buffer layout, which will be updated by the function. |
| compression_type | The compression algorithm to use. |
| cache | A cache to store compressed buffers and avoid recompression. |
| void sparrow_ipc::fill_fieldnodes | ( | const sparrow::arrow_proxy & | arrow_proxy, |
| std::vector< org::apache::arrow::flatbuf::FieldNode > & | nodes ) |
Recursively fills a vector of FieldNode objects from an arrow_proxy and its children.
This function creates FieldNode objects containing length and null count information from the given arrow_proxy and recursively processes all its children, appending them to the provided nodes vector in depth-first order.
| arrow_proxy | The arrow proxy object containing array metadata (length, null_count) and potential child arrays |
| nodes | Reference to a vector that will be populated with FieldNode objects. Each FieldNode contains the length and null count of the corresponding array. |
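The depth-first traversal described above can be sketched with plain structs. `array_view`, `field_node`, and `collect_field_nodes` are hypothetical stand-ins for `sparrow::arrow_proxy`, the FlatBuffers `FieldNode`, and `fill_fieldnodes`. The sketch assumes pre-order: a parent's node is appended before those of its children.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a FlatBuffers FieldNode.
struct field_node
{
    std::int64_t length;
    std::int64_t null_count;
};

// Hypothetical stand-in for an arrow proxy: array metadata plus child arrays.
struct array_view
{
    std::int64_t length;
    std::int64_t null_count;
    std::vector<array_view> children;
};

// Append one node for the array itself, then recurse into each child in order.
inline void collect_field_nodes(const array_view& array, std::vector<field_node>& nodes)
{
    nodes.push_back({array.length, array.null_count});
    for (const array_view& child : array.children)
    {
        collect_field_nodes(child, nodes);
    }
}
```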
| void sparrow_ipc::fill_non_owning_arrow_schema | ( | ArrowSchema & | schema, |
| std::string_view | format, | ||
| const char * | name, | ||
| std::optional< M > | metadata, | ||
| std::optional< std::unordered_set< sparrow::ArrowFlag > > | flags, | ||
| size_t | children_count, | ||
| ArrowSchema ** | children, | ||
| ArrowSchema * | dictionary ) |
Definition at line 18 of file arrow_schema.hpp.
| SPARROW_IPC_API void sparrow_ipc::generate_body | ( | const sparrow::record_batch & | record_batch, |
| any_output_stream & | stream, | ||
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |
Generates a serialized body from a record batch.
This function iterates through all columns in the provided record batch, extracts their Arrow proxy representations, and writes their serialized buffers to the output stream, forming the body of the serialized message.
| record_batch | The record batch containing columns to be serialized. |
| stream | The output stream where the serialized body will be written. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache for compressed buffers to avoid recompression if compression is enabled. If compression is given, cache should be set as well. |
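| [[nodiscard]] std::vector< org::apache::arrow::flatbuf::Buffer > sparrow_ipc::get_buffers | ( | const sparrow::record_batch & | record_batch | ) |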
Extracts buffer information from a record batch for serialization.
This function iterates through all columns in the provided record batch and collects their buffer information into a vector of Arrow FlatBuffer Buffer objects. The buffers are processed sequentially with cumulative offset tracking.
| record_batch | The sparrow record batch containing columns to extract buffers from |
| SPARROW_IPC_API std::vector< sparrow::data_type > sparrow_ipc::get_column_dtypes | ( | const sparrow::record_batch & | rb | ) |
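| [[nodiscard]] std::vector< org::apache::arrow::flatbuf::Buffer > sparrow_ipc::get_compressed_buffers | ( | const sparrow::record_batch & | record_batch, |
| const CompressionType | compression_type, | ||
| CompressionCache & | cache ) |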
Retrieves metadata describing the layout of compressed buffers within a record batch.
This function processes a record batch to determine the metadata (offset and size) for each of its buffers, assuming they are compressed using the specified algorithm. This metadata accounts for each compressed buffer being prefixed by its 8-byte uncompressed size and padded to ensure 8-byte alignment.
| record_batch | The record batch whose buffers' compressed metadata is to be retrieved. |
| compression_type | The compression algorithm that would be applied (e.g., LZ4_FRAME, ZSTD). |
| cache | A cache to store compressed buffers and avoid recompression. |
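The layout rule stated above (an 8-byte uncompressed-size prefix per buffer, then padding to 8-byte alignment) can be sketched as follows. `buffer_entry`, `round_up_to_8`, and `layout_compressed` are hypothetical names. The sketch assumes the recorded length includes the prefix and excludes the padding, which should be checked against the library source.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a FlatBuffers Buffer entry (offset and length in bytes).
struct buffer_entry
{
    std::int64_t offset;
    std::int64_t length;
};

constexpr std::int64_t round_up_to_8(std::int64_t size)
{
    return (size + 7) & ~std::int64_t{7};
}

// compressed_sizes: the size of each buffer after compression (illustrative input).
inline std::vector<buffer_entry> layout_compressed(const std::vector<std::int64_t>& compressed_sizes)
{
    constexpr std::int64_t prefix_size = 8;  // int64 holding the uncompressed size
    std::vector<buffer_entry> entries;
    std::int64_t offset = 0;
    for (const std::int64_t size : compressed_sizes)
    {
        const std::int64_t length = prefix_size + size;
        entries.push_back({offset, length});
        // The next buffer starts at the next 8-byte boundary.
        offset += round_up_to_8(length);
    }
    return entries;
}
```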
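| [[nodiscard]] SPARROW_IPC_API size_t sparrow_ipc::get_compressed_size | ( | const CompressionType | compression_type, |
| const std::span< const std::uint8_t > & | data, | ||
| CompressionCache & | cache ) |
| [[nodiscard]] std::pair< org::apache::arrow::flatbuf::Type, flatbuffers::Offset< void > > sparrow_ipc::get_flatbuffer_decimal_type | ( | flatbuffers::FlatBufferBuilder & | builder, |
| std::string_view | format_str, | ||
| const int32_t | bitWidth ) |
| [[nodiscard]] std::pair< org::apache::arrow::flatbuf::Type, flatbuffers::Offset< void > > sparrow_ipc::get_flatbuffer_type | ( | flatbuffers::FlatBufferBuilder & | builder, |
| std::string_view | format_str ) |
| [[nodiscard]] SPARROW_IPC_API const org::apache::arrow::flatbuf::Footer * sparrow_ipc::get_footer_from_file_data | ( | std::span< const uint8_t > | file_data | ) |
| [[nodiscard]] flatbuffers::FlatBufferBuilder sparrow_ipc::get_record_batch_message_builder | ( | const sparrow::record_batch & | record_batch, |
| std::optional< CompressionType > | compression = std::nullopt, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache = std::nullopt ) |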
Creates a FlatBuffer message containing a serialized Apache Arrow RecordBatch.
This function builds a complete Arrow IPC message by serializing a record batch along with its metadata (field nodes and buffer information) into a FlatBuffer format that conforms to the Arrow IPC specification.
| record_batch | The source record batch containing the data to be serialized. |
| compression | Optional: The compression algorithm to be used for the message body. |
| cache | Optional: A cache for compressed buffers to avoid recompression if compression is enabled. If compression is given, cache should be set as well. |
| std::invalid_argument | if compression is given but not cache. |
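The precondition documented above (a cache must accompany a compression type) can be expressed as a small check. `compression_kind`, `cache_stub`, and `check_compression_args` are hypothetical stand-ins for `CompressionType`, `CompressionCache`, and whatever validation the library performs internally.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

enum class compression_kind { lz4_frame, zstd };  // hypothetical stand-in for CompressionType
struct cache_stub {};                             // hypothetical stand-in for CompressionCache

// Throw if compression is requested but no cache was supplied.
inline void check_compression_args(
    std::optional<compression_kind> compression,
    std::optional<std::reference_wrapper<cache_stub>> cache
)
{
    if (compression.has_value() && !cache.has_value())
    {
        throw std::invalid_argument("compression requested without a cache");
    }
}
```

A caller would pass the cache as `std::ref(cache)` so that it converts to `std::optional<std::reference_wrapper<...>>`.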
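| [[nodiscard]] flatbuffers::FlatBufferBuilder sparrow_ipc::get_schema_message_builder | ( | const sparrow::record_batch & | record_batch | ) |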
Creates a FlatBuffer builder containing a serialized Arrow schema message.
This function constructs an Arrow IPC schema message from a record batch by:
| record_batch | The source record batch containing column definitions |
| [[nodiscard]] bool sparrow_ipc::is_arrow_file_magic | ( | const R & | buf | ) |
Definition at line 49 of file magic_values.hpp.
| [[nodiscard]] bool sparrow_ipc::is_continuation | ( | const R & | buf | ) |
Definition at line 37 of file magic_values.hpp.
| [[nodiscard]] bool sparrow_ipc::is_end_of_stream | ( | const R & | buf | ) |
Definition at line 43 of file magic_values.hpp.
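| [[nodiscard]] ArrowArray sparrow_ipc::make_arrow_array | ( | int64_t | length, |
| int64_t | null_count, | ||
| int64_t | offset, | ||
| size_t | children_count, | ||
| ArrowArray ** | children, | ||
| ArrowArray * | dictionary, | ||
| Arg && | private_data_arg ) |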
Definition at line 63 of file arrow_array.hpp.
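| [[nodiscard]] ArrowSchema sparrow_ipc::make_non_owning_arrow_schema | ( | std::string_view | format, |
| const char * | name, | ||
| std::optional< M > | metadata, | ||
| std::optional< std::unordered_set< sparrow::ArrowFlag > > | flags, | ||
| size_t | children_count, | ||
| ArrowSchema ** | children, | ||
| ArrowSchema * | dictionary ) |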
Definition at line 57 of file arrow_schema.hpp.
| SPARROW_IPC_API void sparrow_ipc::release_arrow_array_children_and_dictionary | ( | ArrowArray * | array | ) |
| void sparrow_ipc::release_common_non_owning_arrow | ( | T & | t | ) |
Release the children and dictionary of an ArrowArray or ArrowSchema.
| T | ArrowArray or ArrowSchema |
| t | The ArrowArray or ArrowSchema to release. |
Definition at line 20 of file arrow_array_schema_common_release.hpp.
| SPARROW_IPC_API void sparrow_ipc::release_non_owning_arrow_schema | ( | ArrowSchema * | schema | ) |
| SPARROW_IPC_API serialized_record_batch_info sparrow_ipc::serialize_record_batch | ( | const sparrow::record_batch & | record_batch, |
| any_output_stream & | stream, | ||
| std::optional< CompressionType > | compression, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache ) |
Serializes a record batch into a binary format following the Arrow IPC specification.
This function converts a sparrow record batch into a serialized byte vector that includes:
| record_batch | The sparrow record batch to serialize |
| stream | The output stream where the serialized record batch will be written |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache to store and retrieve compressed buffers, avoiding recompression. |
| void sparrow_ipc::serialize_record_batches_to_ipc_stream | ( | const R & | record_batches, |
| any_output_stream & | stream, | ||
| std::optional< CompressionType > | compression, | ||
| std::optional< std::reference_wrapper< CompressionCache > > | cache ) |
Serializes a collection of record batches into a binary format.
This function takes a collection of record batches and serializes them into a single binary representation following the Arrow IPC format. The serialization includes:
| R | Container type that holds record batches (must support empty(), operator[], begin(), end()) |
| record_batches | Collection of record batches to serialize. All batches must have identical schemas. |
| stream | The output stream where the serialized data will be written. |
| compression | Optional: The compression type to use when serializing. |
| cache | Optional: A cache to store and retrieve compressed buffers, avoiding recompression. If compression is given, cache should be set as well. |
| std::invalid_argument | If record batches have inconsistent schemas or if the collection contains batches that cannot be serialized together. |
Definition at line 50 of file serialize.hpp.
| SPARROW_IPC_API void sparrow_ipc::serialize_schema_message | ( | const sparrow::record_batch & | record_batch, |
| any_output_stream & | stream ) |
Serializes the schema message for a record batch and writes it to an output stream.
This function creates a serialized schema message by combining continuation bytes, a length prefix, the FlatBuffer schema data, and padding to ensure 8-byte alignment. The resulting format follows the Arrow IPC specification for schema messages.
| record_batch | The record batch containing the schema to serialize |
| stream | The output stream where the serialized schema message will be written |
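The framing described for the schema message (continuation bytes, length prefix, FlatBuffer data, padding) can be sketched without FlatBuffers by treating the metadata as opaque bytes. `frame_message` is a hypothetical helper. It assumes a 32-bit little-endian length prefix that stores the padded metadata size, as the Arrow IPC specification describes for encapsulated messages. Whether sparrow-ipc computes the prefix exactly this way is not confirmed here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Wrap opaque metadata bytes in an encapsulated-message frame.
inline std::vector<std::uint8_t> frame_message(const std::vector<std::uint8_t>& metadata)
{
    constexpr std::array<std::uint8_t, 4> continuation = {0xFF, 0xFF, 0xFF, 0xFF};
    // Metadata is padded so the whole frame ends on an 8-byte boundary.
    const std::size_t padded = (metadata.size() + 7) & ~std::size_t{7};
    const auto length = static_cast<std::uint32_t>(padded);

    std::vector<std::uint8_t> out(continuation.begin(), continuation.end());
    // Length prefix, least significant byte first.
    for (int shift = 0; shift < 32; shift += 8)
    {
        out.push_back(static_cast<std::uint8_t>((length >> shift) & 0xFF));
    }
    out.insert(out.end(), metadata.begin(), metadata.end());
    // Zero padding up to the aligned size (8 bytes of header + padded metadata).
    out.resize(8 + padded, 0);
    return out;
}
```

For three metadata bytes this produces a 16-byte frame: 4 continuation bytes, a length prefix of 8, the 3 payload bytes, and 5 zero bytes of padding.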
| std::vector< sparrow::metadata_pair > sparrow_ipc::to_sparrow_metadata | ( | const ::flatbuffers::Vector<::flatbuffers::Offset< org::apache::arrow::flatbuf::KeyValue > > & | metadata | ) |
Converts FlatBuffers metadata to Sparrow metadata format.
This function takes a FlatBuffers vector containing key-value pairs from Apache Arrow format and converts them into a vector of Sparrow metadata pairs. Each key-value pair from the FlatBuffers structure is extracted and stored as a sparrow::metadata_pair.
| metadata | A FlatBuffers vector containing KeyValue pairs from Apache Arrow format |
| SPARROW_IPC_API size_t sparrow_ipc::write_footer | ( | const sparrow::record_batch & | record_batch, |
| const std::vector< record_batch_block > & | record_batch_blocks, | ||
| any_output_stream & | stream ) |
Writes the Arrow IPC file footer.
| record_batch | A record batch containing the schema for the footer |
| record_batch_blocks | Vector of block information for each record batch |
| stream | The output stream to write the footer to |
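| inline constexpr std::array< std::uint8_t, 8 > sparrow_ipc::arrow_file_header_magic = {'A', 'R', 'R', 'O', 'W', '1', 0x00, 0x00} |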
Magic bytes with padding for file header (8 bytes total for alignment)
Definition at line 34 of file magic_values.hpp.
| inline constexpr std::array< std::uint8_t, 6 > sparrow_ipc::arrow_file_magic = {'A', 'R', 'R', 'O', 'W', '1'} |
Magic bytes for Arrow file format defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format. The magic string is "ARROW1" (6 bytes) followed by 2 padding bytes to reach 8-byte alignment.
Definition at line 28 of file magic_values.hpp.
| inline constexpr std::size_t sparrow_ipc::arrow_file_magic_size = arrow_file_magic.size() |
Definition at line 29 of file magic_values.hpp.
| inline constexpr std::array< std::uint8_t, 4 > sparrow_ipc::continuation = {0xFF, 0xFF, 0xFF, 0xFF} |
Continuation value defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#encapsulated-message-format.
Definition at line 15 of file magic_values.hpp.
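| inline constexpr std::array< std::uint8_t, 8 > sparrow_ipc::end_of_stream = {0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00} |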
End-of-stream marker defined in the Arrow IPC specification: https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format.
Definition at line 21 of file magic_values.hpp.
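The marker values documented above are typically compared against the leading bytes of a buffer. The sketch below copies the three constants into a local `sketch` namespace and adds a hypothetical `starts_with` helper. It approximates what `is_continuation`, `is_end_of_stream`, and `is_arrow_file_magic` check, without claiming to match their implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch
{
    // Values copied from the variable documentation above.
    constexpr std::array<std::uint8_t, 4> continuation = {0xFF, 0xFF, 0xFF, 0xFF};
    constexpr std::array<std::uint8_t, 8> end_of_stream = {0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00};
    constexpr std::array<std::uint8_t, 6> arrow_file_magic = {'A', 'R', 'R', 'O', 'W', '1'};

    // Hypothetical helper: true when buf begins with the given marker bytes.
    template <std::size_t N>
    bool starts_with(const std::vector<std::uint8_t>& buf, const std::array<std::uint8_t, N>& marker)
    {
        return buf.size() >= N && std::equal(marker.begin(), marker.end(), buf.begin());
    }
}
```

Note that an end-of-stream marker also begins with the continuation bytes, so a reader checking both would normally test for end-of-stream first.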
| constexpr int sparrow_ipc::SPARROW_IPC_BINARY_AGE = 1 |
Definition at line 11 of file sparrow_ipc_version.hpp.
| constexpr int sparrow_ipc::SPARROW_IPC_BINARY_CURRENT = 2 |
Definition at line 9 of file sparrow_ipc_version.hpp.
| constexpr int sparrow_ipc::SPARROW_IPC_BINARY_REVISION = 0 |
Definition at line 10 of file sparrow_ipc_version.hpp.
| constexpr int sparrow_ipc::SPARROW_IPC_VERSION_MAJOR = 0 |
Definition at line 5 of file sparrow_ipc_version.hpp.
| constexpr int sparrow_ipc::SPARROW_IPC_VERSION_MINOR = 2 |
Definition at line 6 of file sparrow_ipc_version.hpp.
| constexpr int sparrow_ipc::SPARROW_IPC_VERSION_PATCH = 0 |
Definition at line 7 of file sparrow_ipc_version.hpp.