This adds an `order` kwarg to all encoders for configuring how unordered collections/objects are encoded. Options are:

- `None`: the default. All objects are encoded in the most efficient manner corresponding to their in-memory representation.
- `'deterministic'`: Unordered collections (sets, dicts) are sorted before encoding. This ensures a consistent output between runs, which may be useful when comparing/hashing the encoded binary representation.
- `'sorted'`: same as `'deterministic'`, but *all* object-like objects will have their fields encoded in alphabetical order by name. This is more expensive than `'deterministic'`, but may be useful for making the output more human readable.

The `'deterministic'` output has been heavily optimized. Given the work required to accomplish this feature, I wouldn't expect we can speed up this operation much more.

The `'sorted'` option has not been fully optimized (the assumption being that human-readable output is rarely perf sensitive). If needed, there are some rather simple optimizations we can add here to speed this up further.

In general, `msgspec.json.encode(obj, order="deterministic")` should be as fast as or faster than `orjson.dumps(obj, option=orjson.OPT_SORT_KEYS)`. For common small object sizes we average a ~25% speedup over `orjson` for key sorting.