[RFC] Unify model format customize string or Json #4887
Comments
@jameslamb, @shiyu1994, @StrikerRUS, @guolinke, @hzy46 what is your opinion on this? Thanks!
I'm for JSON too! However, I guess we should compare the performance of these two formats to make a thoughtful choice. Note, XGBoost is also using JSON as their main serialization format.
Linking original issue: #2604.
I remember there is a multi-threading speed-up for loading model files in the string format.
@guolinke 's comments make sense to me, that it would be harder to take advantage of parallelism when parsing JSON than when parsing the existing text format. But I think it's worth trying and benchmarking! I'll also note here that @lemire, author of |
@jameslamb We would be there to help if you choose to adopt simdjson. At this point, it is a mature library, with extensive tests and documentation. We support two APIs: our On Demand front-end (high perf. default) and a conventional DOM approach.
Our friends at XGBoost are trying to adopt the Universal Binary JSON format as a binary serialization format: dmlc/xgboost#7545.
Summary
Currently we have two model formats: a custom string format and JSON. Maintaining two serialize/deserialize paths for the model is hard, especially for components with complex parameters such as category encoders (almost 1/3 of the code is about serialization/deserialization, and it is hard to extend). We had better support only one model format.
Motivation
JSON seems the more reasonable choice to me: it is a standard format and handles hierarchical structure easily. Backward compatibility will be a big problem. I would suggest that for new features we make DumpToString() == DumpToJson(), switch old components to JSON as the default format, and keep the custom string format as a deprecated method.
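To make the DumpToString() == DumpToJson() idea concrete, here is a minimal sketch of what a new component could look like. The class name CategoryEncoderSketch and its fields are hypothetical and are not existing LightGBM code; the JSON is built by hand only to keep the example self-contained, and it assumes the name contains no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical new component, used only for illustration.
// It is not an actual LightGBM class.
class CategoryEncoderSketch {
 public:
  CategoryEncoderSketch(std::string name, std::vector<int> categories)
      : name_(std::move(name)), categories_(std::move(categories)) {}

  // JSON is the single source of truth for the serialized form.
  // Assumption: name_ needs no JSON escaping.
  std::string DumpToJson() const {
    std::ostringstream out;
    out << "{\"name\":\"" << name_ << "\",\"categories\":[";
    for (std::size_t i = 0; i < categories_.size(); ++i) {
      if (i > 0) out << ",";
      out << categories_[i];
    }
    out << "]}";
    return out.str();
  }

  // For new features the string dump simply forwards to the JSON dump,
  // so there is only one serialization path to maintain.
  std::string DumpToString() const { return DumpToJson(); }

 private:
  std::string name_;
  std::vector<int> categories_;
};
```

With this shape, a caller that still asks for the string form of a new component gets the JSON text, while older components could keep their existing string dump until they are migrated.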