sparrow-extensions 0.1.0
Extension types for the sparrow library
The JSON array is an Arrow-compatible array for storing JSON-encoded data according to the Apache Arrow canonical extension specification for JSON.
Each element is stored as a UTF-8 encoded string containing valid JSON data.
The JSON extension type is defined as:
Extension name: arrow.json
Storage type: String (Utf8), LargeString (LargeUtf8), or StringView (Utf8View)
Three variants are provided to accommodate different use cases:
| Type | Storage Type | Description |
|---|---|---|
| json_array | String (32-bit offsets) | Standard choice for most JSON datasets |
| big_json_array | LargeString (64-bit offsets) | For datasets exceeding 2GB cumulative string length |
| json_view_array | StringView | View-based storage for optimized performance |
Use json_array for most JSON datasets, where the cumulative length of all strings is less than 2GB.
Use big_json_array for very large JSON datasets, where the cumulative length may exceed 2GB.
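The choice between 32-bit and 64-bit offsets comes down to whether the summed byte length of all strings fits in a signed 32-bit integer. The following is a minimal, library-independent sketch of that check. The helper names `cumulative_length` and `fits_32bit_offsets` are hypothetical and are not part of sparrow-extensions.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not a sparrow-extensions function): sums the byte
// lengths of all JSON strings that would share one data buffer.
inline std::uint64_t cumulative_length(const std::vector<std::string>& values)
{
    std::uint64_t total = 0;
    for (const auto& value : values)
    {
        total += value.size();
    }
    return total;
}

// Returns true when the largest offset still fits in a signed 32-bit
// integer (2^31-1), i.e. the 32-bit-offset variant is sufficient.
// Otherwise the 64-bit-offset variant is needed.
inline bool fits_32bit_offsets(std::uint64_t total_bytes)
{
    constexpr auto limit =
        static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
    return total_bytes <= limit;
}
```

If `fits_32bit_offsets(cumulative_length(values))` holds, json_array is a reasonable default. If it does not, big_json_array avoids offset overflow.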
Use json_view_array for optimized performance with the Binary View layout, which stores short values inline.
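To make "short" concrete: in the Arrow Binary View layout each view occupies 16 bytes, 4 of which hold the length, so values of up to 12 bytes can be stored inline. The sketch below only illustrates that size rule. The constant and function names are hypothetical, and it does not reproduce how sparrow-extensions builds its views.

```cpp
#include <cstddef>
#include <string_view>

// Sizes from the Arrow Binary View layout: a view is 16 bytes, of which
// 4 bytes store the length, leaving 12 bytes for inline data.
constexpr std::size_t view_size = 16;
constexpr std::size_t length_field_size = 4;
constexpr std::size_t max_inline_size = view_size - length_field_size;

// Hypothetical helper (not a sparrow-extensions function): reports whether
// a JSON value is short enough to be stored inline in its view.
constexpr bool stored_inline(std::string_view json)
{
    return json.size() <= max_inline_size;
}
```

Small scalars such as `null`, `true`, or short numbers fall under the inline limit, while larger objects are referenced through a separate data buffer.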
All JSON array variants automatically set the following Arrow extension metadata:
ARROW:extension:name: "arrow.json"
ARROW:extension:metadata: ""
This metadata is added to the Arrow schema, allowing other Arrow implementations to recognize the array as containing JSON data.
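As an illustration of how a consumer might recognize the extension, the sketch below models schema metadata as a plain key/value map and checks the extension name. `metadata_map`, `make_json_extension_metadata`, and `is_json_extension` are hypothetical names chosen for this example. The real library attaches the metadata to the Arrow schema itself.

```cpp
#include <map>
#include <string>

// Simplified stand-in for the key/value metadata of an Arrow schema.
using metadata_map = std::map<std::string, std::string>;

// Builds the two entries that the JSON array variants set.
inline metadata_map make_json_extension_metadata()
{
    return {
        {"ARROW:extension:name", "arrow.json"},
        {"ARROW:extension:metadata", ""},
    };
}

// Hypothetical consumer-side check: the data is treated as JSON when the
// extension name entry is present and equals "arrow.json".
inline bool is_json_extension(const metadata_map& metadata)
{
    const auto it = metadata.find("ARROW:extension:name");
    return it != metadata.end() && it->second == "arrow.json";
}
```

An implementation that does not know the extension can ignore these keys and still read the data through its storage type as ordinary UTF-8 strings.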
The json_extension type alias is defined as simple_extension<"arrow.json">, providing the extension type implementation for all JSON array variants.
json_array properties:
| Feature | Description |
|---|---|
| Storage type | String (Utf8) |
| Offset type | 32-bit (int32_t) |
| Max cumulative size | 2^31-1 bytes (~2GB) |
| Extension name | "arrow.json" |
big_json_array properties:
| Feature | Description |
|---|---|
| Storage type | LargeString (LargeUtf8) |
| Offset type | 64-bit (int64_t) |
| Max cumulative size | 2^63-1 bytes |
| Extension name | "arrow.json" |
json_view_array properties:
| Feature | Description |
|---|---|
| Storage type | StringView (Utf8View) |
| Layout | Binary View (inline short strings) |
| Extension name | "arrow.json" |