choose different serialization scheme for storing configuration #2329

vladak · 2018-08-31T07:34:41Z

The XML encoder used for configuration serialization is not very robust (e.g. when the class hierarchy changes or configuration options are removed) and has some quirks (#2002). We should consider using something else (YAML? JSON?).

Also, this serialization is used not only for configuration but also elsewhere (IndexAnalysisSettings).


tulinkry · 2018-08-31T07:50:02Z

Yes, finally.

tulinkry · 2019-02-04T07:50:13Z

Looks like YAML would be the way to go.

vladak · 2019-04-12T08:24:49Z

Also, the configuration should be treated as data, not as serialized objects, to avoid the security vulnerabilities that can arise when deserializing XML into Java objects.
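To illustrate the "configuration as data" idea, here is a minimal sketch, not OpenGrok code. It assumes a hypothetical, trimmed-down `Config` record with three made-up options and uses `java.util.Properties` as a dependency-free stand-in for a YAML/TOML parser. The input is plain key/value text, so no class names are resolved and nothing is invoked reflectively. Unknown keys (e.g. a removed option) are reported and skipped, and missing keys fall back to hypothetical defaults instead of failing the load. Requires Java 16+ for records.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class ConfigAsData {
    // Hypothetical, trimmed-down configuration; the real Configuration class has far more options.
    record Config(String sourceRoot, boolean historyEnabled, int maxThreads) {}

    // Options this sketch knows about; anything else is reported and skipped, never instantiated.
    static final Set<String> KNOWN_KEYS = Set.of("sourceRoot", "historyEnabled", "maxThreads");

    // Parses plain key=value text: no class names, no reflection, nothing executed.
    static Config parse(String text, List<String> warnings) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            if (!KNOWN_KEYS.contains(key)) {
                warnings.add("ignoring unknown option: " + key);
            }
        }
        // Missing options fall back to (hypothetical) defaults instead of failing the whole load.
        return new Config(
                props.getProperty("sourceRoot", "/var/opengrok/src"),
                Boolean.parseBoolean(props.getProperty("historyEnabled", "true")),
                Integer.parseInt(props.getProperty("maxThreads", "4")));
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        Config c = parse("sourceRoot=/src\nremovedOption=true\n", warnings);
        System.out.println(c);        // Config[sourceRoot=/src, historyEnabled=true, maxThreads=4]
        System.out.println(warnings); // [ignoring unknown option: removedOption]
    }
}
```

The point of the design is that the file format only ever yields strings and numbers; mapping them onto typed fields is explicit code, so a stale or malicious entry cannot name a class to instantiate.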

vladak · 2022-03-28T12:51:00Z

The other reason for using something else is performance. I recently realized that XMLEncoder does not scale when the configuration is retrieved via the RESTful API. When running a multithreaded program where each thread just retrieves the configuration in a loop, with the number of threads matching the number of CPUs, the retrieval times shoot up to almost 2 seconds, compared to 0.4 seconds for a single-threaded program. The XML file with the configuration is about 1.38 MB. A jstack snapshot revealed that many of the XMLEncoder processing threads (about 25 of the 32 threads I was using) were waiting on an internal synchronization object, with the top of the stack looking like this:

"http-nio-8080-exec-1427" #29360 daemon prio=5 os_prio=64 cpu=38052.59ms elapsed=2859934.54s tid=0x000000000531c000 nid=0x7981 waiting for monitor entry  [0x00007fff808fa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.sun.beans.util.Cache.get(java.desktop@11.0.7-internal/Cache.java:119)
        - waiting to lock <0x00007ff387d7f320> (a java.lang.ref.ReferenceQueue)
        at com.sun.beans.finder.MethodFinder.findMethod(java.desktop@11.0.7-internal/MethodFinder.java:81)
        at java.beans.Statement.getMethod(java.desktop@11.0.7-internal/Statement.java:369)
        at java.beans.Statement.invokeInternal(java.desktop@11.0.7-internal/Statement.java:273)
        at java.beans.Statement$2.run(java.desktop@11.0.7-internal/Statement.java:187)
        at java.security.AccessController.doPrivileged(java.base@11.0.7-internal/Native Method)
        at java.beans.Statement.invoke(java.desktop@11.0.7-internal/Statement.java:184)
        at java.beans.Expression.getValue(java.desktop@11.0.7-internal/Expression.java:155)
        at java.beans.Encoder.getValue(java.desktop@11.0.7-internal/Encoder.java:105)
        at java.beans.Encoder.get(java.desktop@11.0.7-internal/Encoder.java:252)
        at java.beans.PersistenceDelegate.writeObject(java.desktop@11.0.7-internal/PersistenceDelegate.java:112)
        at java.beans.Encoder.writeObject(java.desktop@11.0.7-internal/Encoder.java:74)
        at java.beans.XMLEncoder.writeObject(java.desktop@11.0.7-internal/XMLEncoder.java:326)

I did this exercise to simulate the read timeout problems that occur right after running an all-project sync with the sync.py command. This command runs a number of reindex_project.py programs in parallel, and each reindex_project.py retrieves the configuration from the web app at startup. Increasing the --api_timeout value for the Python tools works as a workaround; however, I'd expect this to scale.
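A rough way to reproduce the single-threaded vs. one-thread-per-CPU comparison outside the web app is sketched below. This is an assumption-laden harness, not the program used for the numbers above: a hypothetical `Settings` bean with a list of 1000 project names stands in for the real configuration, each call creates a fresh `XMLEncoder` (as a REST request would), and timings will vary by JDK and machine, so it may or may not show the same contention in `com.sun.beans.util.Cache`.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EncoderContention {
    // Hypothetical stand-in for the configuration bean. XMLEncoder needs a public class
    // with a public no-arg constructor and getter/setter pairs.
    public static class Settings {
        private List<String> projects = new ArrayList<>();
        public Settings() {}
        public List<String> getProjects() { return projects; }
        public void setProjects(List<String> projects) { this.projects = projects; }
    }

    // One fresh encoder per call, roughly what each REST request would do.
    static byte[] encode(Settings s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder enc = new XMLEncoder(out)) {
            enc.writeObject(s);
        }
        return out.toByteArray();
    }

    // Runs `threads` workers, each encoding the same bean `iterations` times; returns wall time in ms.
    static long run(Settings s, int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < iterations; i++) {
                        encode(s);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Settings s = new Settings();
        for (int i = 0; i < 1000; i++) {
            s.getProjects().add("project-" + i);
        }
        int cpus = Runtime.getRuntime().availableProcessors();
        // Every thread does the same amount of work, so with perfect scaling both wall times
        // would be about equal; a large gap points at shared-state contention.
        System.out.println("1 thread:   " + run(s, 1, 20) + " ms");
        System.out.println(cpus + " threads: " + run(s, cpus, 20) + " ms");
    }
}
```

Taking a jstack snapshot while the multithreaded run is in progress should show whether the threads block in the same place as in the trace above.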

vladak · 2022-10-19T15:58:52Z

Another feature a new serialization scheme could bring is wildcards. For instance, I'd like to be able to set project properties for a set of projects specified with wildcards (or even regular expressions), similarly to what is done in the opengrok-mirror configuration:

projects:
  apache-httpd-.*:
     proxy: true
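One possible way to resolve such pattern-keyed entries against concrete project names is sketched below. The names (`ProjectWildcards`, `ProjectProps`, `Rule`) are hypothetical, only the `proxy` flag from the example is modeled, and two semantics are assumptions rather than anything decided here or taken from opengrok-mirror: a pattern must match the whole project name (`Matcher.matches()`), and the first matching rule wins. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

public class ProjectWildcards {
    // Hypothetical per-project properties; only the "proxy" flag from the example is modeled.
    record ProjectProps(boolean proxy) {}

    // A compiled name pattern paired with the properties it applies.
    record Rule(Pattern pattern, ProjectProps props) {}

    private final List<Rule> rules = new ArrayList<>();

    void addRule(String regex, ProjectProps props) {
        rules.add(new Rule(Pattern.compile(regex), props));
    }

    // Assumption: the pattern must match the whole project name, and the first matching rule wins.
    Optional<ProjectProps> lookup(String projectName) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(projectName).matches()) {
                return Optional.of(rule.props());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ProjectWildcards w = new ProjectWildcards();
        w.addRule("apache-httpd-.*", new ProjectProps(true));
        System.out.println(w.lookup("apache-httpd-2.4")); // Optional[ProjectProps[proxy=true]]
        System.out.println(w.lookup("linux"));            // Optional.empty
    }
}
```

Whether later rules should override or merge with earlier ones (and how explicit per-project entries rank against patterns) would need to be decided as part of the new format.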

vladak · 2022-12-01T13:30:16Z

YAML is probably not that great either, so something like TOML might be a better idea; however, we still need to address serialization of objects like Project and RepositoryInfo. It seems that some TOML Java implementations support serialization.

vladak added the enhancement label Aug 31, 2018

vladak added the help wanted label Aug 31, 2018

vladak mentioned this issue Apr 12, 2019

multiple vulnerabilities in input data handling #2749 (Open)

vladak mentioned this issue May 15, 2019

Please can you publish a schema for the configuration.xml file #2768 (Closed)

This was referenced Apr 7, 2021

Prevent XMLDecoder from loading other than whitelisted classes #3526 (Merged)

different serialization scheme for history #3539 (Closed)

vladak mentioned this issue Aug 10, 2021

ERROR: java.lang.NoSuchMethodException: <unbound>=Configuration.getAllowInsecureTokens(); #3693 (Closed)

vladak removed the help wanted label Mar 28, 2022

vladak mentioned this issue Oct 27, 2023

testDeserializationOfNotWhiteListedClassThrowsError sometimes fails #4441 (Open)

vladak mentioned this issue Nov 29, 2023

help config provides suggester config in every element #4488 (Open)

vladak mentioned this issue Jul 22, 2024

convert LDAP authorization plugin configuration to YAML #4599 (Merged)
