-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NJsonDocUtils.writeField() converts single-element-valued collection to String #29
Comments
@jordanpadams @tloubrieu-jpl gonna need either permission to fix this in all cases, or some context for when it's desirable to convert a single-element string array to a string, to prevent this from potentially breaking a whole bunch of stuff (i.e. if end-users are reading such fields programmatically) |
Hi @alexdunnjpl , good catch but I have no idea yet why this case has been coded in the first place. As you said I suspect there is a good reason for that although that does not sound like good thing to have in the code (even less without a comment). |
Conversion originally implemented in this commit The commit message is unrelated/opaque. |
Oh, looks like it's a reimplementation of existing functionality that was unintentionally deleted in the previous commit. Original source is here and/or here |
@alexdunnjpl unless @tloubrieu-jpl has an objection, let's march forward with a PR fix, and we can evaluate if that is something we do not want to do |
I cannot be 100% sure, but it is slowly emerging from my memories that without this we would have all the properties values show as arrays even if there is a single value. See screenshot: Instead of scalar values we would have arrays with length 1. I remember I asked that change to Eugene when I started the development of the API service. But I agree this code does not look good and I would prefer if we have an option not to have arrays values in the first place for single values. That might be related to the XML parser which is used ? @alexdunnjpl could you have a look at that ? |
@tloubrieu-jpl I'll dig in to make sure I'm not misremembering something, but it sounds like the core issue is that there's a generic "properties" flow which processes fields with either collection-like or string-like values (depending on the field), but currently has no way of differentiating between the two cases. My gut feeling is that we should know prima facie what properties are collection-like, and should therefore be able to detect "collection-like strings" and "everything else" and process them accordingly? This could/should probably be done during xml parsing, I agree. I'm not certain that's a valid assumption though, and I'd probably want help enumerating the collection-likes unless they're trivially-obvious to track down. |
Taking a closer look,
Unfortunately, all values for |
The imprecisely-typed metadata construction occurs in At a glance, it looks pretty fraught to retroactively introduce any concept of typing here. It seems like the idea that metadata field values will be extracted into collections is pretty baked in. If that's the case, it may be least-worst to fall back to Eugene's original approach (i.e. the JSON writer is responsible for "typing" the fields) and extract the "convert to a single value" conditional to a function that encapsulates the business logic for making the differentiation - instead of @tloubrieu-jpl @jordanpadams thoughts? |
@jordanpadams @jshughes could we use the PDS information model (schema, schematron) to tell which element should be treated as an array, which one should not ? |
Per stand-up meeting, solution is to remove scalar conversion - all fields will be written as arrays |
Since the API is showing these values as arrays anyway (see https://pds.nasa.gov/api/search/1.0/products) we will remove the condition in registry-common to always save collections as arrays in OpenSearch. |
🐛 Describe the bug
When encountering a collection-valued field with size 1, writeField() explicitly writes a String instead of an Array with a single String element
This is resulting in undesirable behaviour in some cases.
This must be fixed, but given that someone wrote it intentionally I'm loathe to unilaterally change it in case there are some cases where it's desirable - from what I've seen this will result in numerous fields being "corrected", not just
ref_lid_collection
.📜 To Reproduce
, into a registry instance
curl -k -u admin:admin https://localhost:9200/registry/_doc/urn:nasa:pds:epoxi_mri::1.0?_source=ref_lidvid_collection,ref_lid_collection | json_pp
)ref_lid_collection
🕵️ Expected behavior
Unclear. Collection-like fields should be written as Arrays, but if there are cases where conversion is desired/required, these must be accounted for and not affected by the fix.
📚 Version of Software Used
registry-common==1.4.0
The text was updated successfully, but these errors were encountered: