feat: byte slice JSON parser #1415
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files — Coverage Diff (master vs. #1415):

| | master | #1415 |
| --- | --- | --- |
| Coverage | 47.54% | 47.54% |
| Files | 388 | 388 |
| Lines | 61279 | 61279 |
| Hits | 29133 | 29133 |
| Misses | 29707 | 29707 |
| Partials | 2439 | 2439 |

☔ View full report in Codecov by Sentry.
Nice :) Have you considered porting over buger/jsonparser? It is an existing reflection-free JSON implementation.
Yes, I'm currently working with jsonparser as a reference.
## Description
While I was working on the [JSON parser](gnolang#1415), @harry-hov requested an update to the package list. After checking, I noticed that multiple packages were absent from the list, so I included them. However, I omitted the testing package, as it appeared to be managed independently.
Thank you for this phenomenal effort 🙏
The description helped me a lot with reading through and understanding the changes.
I've left minor comments, nothing major. We should be good to merge 🚀
# Description
- Add the `unicode/utf16` package, transferred directly from Go without any changes.
- Register it in the `stdlibWhitelist` in transpiler.go.

In an earlier JSON PR #1415, I included `unicode/utf16` to handle unescaping and other byte slice operations, but realized that I wasn't using it in that package, leading me to submit a separate PR specifically for it.
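For context, this is the kind of surrogate-pair handling `unicode/utf16` provides, which is what JSON `\uXXXX` unescaping ultimately needs. A small usage sketch against Go's standard package (which the transferred code mirrors):

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

func main() {
	// The JSON escape "\uD83D\uDE00" encodes 😀 (U+1F600) as a UTF-16
	// surrogate pair; DecodeRune combines the pair back into one rune.
	hi, lo := rune(0xD83D), rune(0xDE00)
	if utf16.IsSurrogate(hi) && utf16.IsSurrogate(lo) {
		r := utf16.DecodeRune(hi, lo)
		fmt.Printf("%c U+%04X\n", r, r) // 😀 U+1F600
	}
}
```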
Description
I implemented a JSON parser using byte slices.
State Machine For JSON
Each state transitions to the next when specific conditions are met, representing the process of parsing a JSON structure sequentially. State transitions are defined based on the conditions and actions that occur during the parsing process.
The diagram essentially includes the following states (all states are defined in the `internal.gno` file):

- An error state (`__`), which returns an error if an unexpected token is encountered.
- States for string (`ST`), number (`MI`, `ZE`, `IN`), boolean (`T1`, `F1`), and null (`N1`) values.
- States for opening objects (`co`) and arrays (`bo`), and for closing them (`ec`, `cc`, `bc`).
- States for keys (`KE`) and values (`VA`), and for handling commas (`cm`) and colons (`cl`).

Each state deals with various scenarios that can occur during JSON parsing, with transitions to the next state determined by the current token and context. Below is a graph depicting how the states transition:
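Alongside that diagram, here is a minimal Go-style sketch of a table-driven transition lookup. The state and class names below are illustrative and heavily simplified assumptions, not the package's actual `GO`/`OK`/`ST`/... constants or tables:

```go
package main

import "fmt"

// Illustrative parser states and character classes (simplified).
type state int
type class int

const (
	stError state = iota // unexpected token
	stStart              // expecting the start of a value
	stString             // inside a string
	stNumber             // inside a number
	stDone               // value finished
)

const (
	clQuote class = iota // '"'
	clDigit              // '0'-'9'
	clOther              // anything else
)

// transitions maps (current state, character class) -> next state,
// mirroring the table-driven approach described above.
var transitions = map[state]map[class]state{
	stStart:  {clQuote: stString, clDigit: stNumber, clOther: stError},
	stString: {clQuote: stDone, clDigit: stString, clOther: stString},
	stNumber: {clDigit: stNumber, clQuote: stError, clOther: stDone},
}

func classify(c byte) class {
	switch {
	case c == '"':
		return clQuote
	case c >= '0' && c <= '9':
		return clDigit
	default:
		return clOther
	}
}

func main() {
	input := []byte(`"hi"`)
	st := stStart
	for _, c := range input {
		st = transitions[st][classify(c)]
		if st == stError {
			fmt.Println("unexpected token:", string(c))
			return
		}
	}
	fmt.Println("final state:", st) // 4, i.e. stDone
}
```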
Walkthrough The JSON Machine
Gno is not completely compatible with Go, which means that many functions within the standard library are not fully implemented yet. Therefore, some files not directly related to JSON have been added because they are necessary for the implementation.
Float Value Handler
The `strconv` package currently provided by gno has functions injected for parsing basic `int` and `uint` types, but does not have an implementation for parsing floating-point numbers with `ParseFloat`. Therefore, I have brought over the implementation of the `eisel-lemire` algorithm from Go's strconv package (`./p/demo/json/eisel_lemire`).[^1]

Additionally, since the `FormatFloat` function is not implemented yet either, I imported the `ryu64` algorithm[^2] to implement this feature (`./p/demo/json/ryu`).

Anyway, I plan to add this code to the strconv package if possible, so that the necessary functionality can be written completely in gno.
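For reference, this is the behaviour those two ports supply, shown here with Go's standard `strconv`; the gno functions under `./p/demo/json/eisel_lemire` and `./p/demo/json/ryu` may expose different names:

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	// Parsing a float from a string — the role of the eisel-lemire port.
	f, err := strconv.ParseFloat("3.14159", 64)
	if err != nil {
		panic(err)
	}

	// Formatting a float back to its shortest round-tripping string —
	// the role of the ryu port ('g' format, precision -1).
	s := strconv.FormatFloat(f, 'g', -1, 64)

	fmt.Println(f, s) // 3.14159 3.14159
}
```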
Buffer
`buffer.gno` handles internal buffer management and interaction with the state machine for JSON parsing. The buffer processes the JSON string sequentially, interpreting the meaning of each character and deciding the next action through the state machine.

Here, I'll describe the key functions and how they interact with the state machine. The `/` next to a number is a notation borrowed from Elixir to indicate the number of parameters:

- `newBuffer`: Creates a new buffer instance containing the given data. The initial state is set to `GO`, signifying the start of parsing and serving as the state machine's initial state for the subsequent parsing stages.
- `first`: Finds the first meaningful (non-whitespace) character. Although the state machine is not yet activated at this stage, the result of this function plays a crucial role in determining the first step of parsing.
- `current`, `next`, `step`: These functions manage the current position within the buffer, reading characters or moving to the next one. `current` returns the character at the current index, `next` returns the next character, and `step` only moves to the next position. These movement functions are necessary to decide what input should be processed when the state machine transitions to the next state.
- `getState`: Determines the next state based on the character at the current buffer position. This function evaluates the class (type of character) of the current character and uses a state transition table to decide the next state. This process is central to how the state machine interprets the JSON structure.
- `numeric/1`, `string/2`, `word/1`: These functions parse numbers, strings, and specified word tokens, respectively. During parsing, the state machine transitions to the next state based on the current character's type and context, which is essential for accurately interpreting the structure and meaning of JSON data.
- `skip`, `skipAny/1`: Functions for skipping characters that meet certain conditions, such as moving the buffer index until a specific character or set of tokens is encountered. These functions are mainly used to manage the current state of the state machine while parsing structural elements (e.g., the end of an object or array).

These functions closely interact with the state machine to recognize and interpret the various data types and structures within the JSON string. The current state of the state machine changes based on each character or string the buffer processes, dynamically controlling the parsing process.
Unescape
These functions are designed to process JSON strings, specifically by managing internal buffer interactions and unescaping characters as per the JSON standard. This involves translating escape sequences like `\uXXXX` for unicode characters, as well as simpler escapes like `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, and `\t`.

Here are some key functions in this file:

- `Unescape/2`: This is the primary function. It takes an input byte slice (representing a JSON string with escape sequences) and an output byte slice to which the unescaped version of the input is written. It processes each escape sequence encountered in the input slice and translates it into the corresponding UTF-8 character in the output slice.
- `Unquote/2`: This function is designed to remove the surrounding quotes from a JSON string and unescape its contents. It's useful for converting JSON string values to their literal representations.
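To illustrate the kind of translation involved, here is a small stand-in unescaper in Go. It is not the package's `Unescape/2` (which, as described above, writes into a caller-provided output slice and handles more escapes, including surrogate pairs); it only shows the basic mechanism for a few escapes:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// simpleUnescape translates a handful of JSON escape sequences into UTF-8 bytes.
func simpleUnescape(in []byte) ([]byte, error) {
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); i++ {
		c := in[i]
		if c != '\\' {
			out = append(out, c)
			continue
		}
		i++
		if i >= len(in) {
			return nil, fmt.Errorf("truncated escape")
		}
		switch in[i] {
		case 'n':
			out = append(out, '\n')
		case 't':
			out = append(out, '\t')
		case '\\', '"':
			out = append(out, in[i])
		case 'u': // \uXXXX: four hex digits naming a BMP code point
			if i+4 >= len(in) {
				return nil, fmt.Errorf("truncated \\u escape")
			}
			v, err := strconv.ParseUint(string(in[i+1:i+5]), 16, 32)
			if err != nil {
				return nil, err
			}
			out = utf8.AppendRune(out, rune(v))
			i += 4
		default:
			return nil, fmt.Errorf("unsupported escape: \\%c", in[i])
		}
	}
	return out, nil
}

func main() {
	s, err := simpleUnescape([]byte(`line1\nline2 \u0041`))
	fmt.Printf("%q %v\n", s, err) // "line1\nline2 A" <nil>
}
```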
Node
When a JSON string is decoded, the package converts the data into a `Node` type. This node type allows you to fetch and manipulate specific values from the JSON. For example, you can use the `GetKey/1` function to retrieve the value stored at a specific key, and you can use `Delete` to remove a node, enabling you to process JSON data.
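A hypothetical usage sketch of this flow; the import path, the `Unmarshal` entry point, and the exact `GetKey`/`Delete` signatures are assumptions and may differ from the package's actual API:

```go
package main

import (
	"fmt"

	"gno.land/p/demo/json" // import path assumed from the PR's directory layout
)

func main() {
	// Assumed entry point: decode a JSON document into the package's Node type.
	node, err := json.Unmarshal([]byte(`{"name":"gno","version":1}`))
	if err != nil {
		panic(err)
	}

	// GetKey/1: retrieve the node stored at a specific key (signature assumed).
	name, err := node.GetKey("name")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)

	// Delete: remove this node from its parent (signature assumed).
	if err := name.Delete(); err != nil {
		panic(err)
	}
}
```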
[^1]: The Eisel-Lemire algorithm provides a fast way to parse floating-point numbers from strings. The core idea of this algorithm is to minimize potential errors during the conversion from strings to numbers while performing the conversion as quickly as possible. Eisel-Lemire is particularly useful when dealing with large amounts of numerical data, providing much faster and more accurate results than traditional parsing methods.

[^2]: The Ryu algorithm focuses on converting floating-point numbers to strings. Ryu generally converts floating-point numbers to the shortest possible string representation accurately, with excellent performance and precision. A key advantage of the Ryu algorithm is that the converted string maintains the minimum length while precisely representing the original number. This helps save storage space and reduces data transmission times over networks.