MINOR: [Docs] Clarify inlined strings in VariableLengthStringView is padded with `0` (#40512)

### Rationale for this change
While implementing `Variable-size Binary View Layout` (thanks @ariesdevil!) in apache/arrow-rs#5481, it was not 100% clear whether the inlined string was zero-padded.

@bkietz noted that:

> The spec does say "padded with zero" https://github.com/apache/arrow/blob/main/docs/source/format/Columnar.rst?plain=1#L384 but it could be repeated in the surrounding paragraph. In any case, padded with zero is definitely the intent

```
    * Short strings, length <= 12
      | Bytes 0-3  | Bytes 4-15                            |
      |------------|---------------------------------------|
      | length     | data (padded with 0)                  |
```
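To make the diagram concrete, here is a minimal Python sketch of packing a short value into a 16-byte view. It is an illustration only, not code from Arrow or arrow-rs: the helper names `encode_short_view` and `decode_short_view` are hypothetical, and the length is assumed to be a little-endian signed 32-bit integer.

```python
import struct

INLINE_CAPACITY = 12  # bytes 4-15 of the 16-byte view hold the inlined data


def encode_short_view(data: bytes) -> bytes:
    """Pack a short value (length <= 12) into a 16-byte view (hypothetical helper)."""
    if len(data) > INLINE_CAPACITY:
        raise ValueError("long values use a buffer index and offset instead")
    # Assumption: the length is a little-endian signed 32-bit integer ("<i").
    # The "12s" format pads values shorter than 12 bytes with zero bytes.
    return struct.pack("<i12s", len(data), data)


def decode_short_view(view: bytes) -> bytes:
    """Recover the inlined value by trimming the padding using the stored length."""
    length, inlined = struct.unpack("<i12s", view)
    return inlined[:length]


view = encode_short_view(b"arrow")
```

Because the unused bytes are always zero, two views that hold the same short value are byte-for-byte identical, so a reader could compare them without first trimming to the stored length.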
### What changes are included in this PR?

Add a sentence to the surrounding text to make it clear that inlined string values are zero-padded.

Note that I do not think this is a specification change (and therefore it doesn't need a vote on the mailing list), as the spec already specifies that the padding is zero (in the diagram). This simply clarifies the text to emphasize this point for ease of understanding.

### Are these changes tested?

### Are there any user-facing changes?

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
alamb authored Mar 13, 2024
1 parent 1dd0d45 commit bd3fab4
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/source/format/Columnar.rst
@@ -393,7 +393,8 @@ length of the string and can be used to determine how the rest of the view
should be interpreted.

In the short string case the string's bytes are inlined — stored inside the
-view itself, in the twelve bytes which follow the length.
+view itself, in the twelve bytes which follow the length. Any remaining bytes
+after the string itself are padded with `0`.

In the long string case, a buffer index indicates which data buffer
stores the data bytes and an offset indicates where in that buffer the