Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reproducer.. wt=phps incompatibility with /admin/luke still in doubts. .. #2114

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mkhludnev
Copy link
Member

Expected :a:1:{s:3:"map";a:1:{s:10:"lucene doc";a:3:{s:2:"id";s:1:"2";s:4:"cost";s:1:"5";s:4:"name";a:1:{s:7:"foo bar";}}}}
Actual   :a:1:{s:3:"map";a:1:{s:10:"lucene doc";i:0;a:3:{s:2:"id";s:1:"2";s:4:"cost";s:1:"5";s:4:"name";a:1:{i:0;s:7:"foo bar";}}}}

@mkhludnev mkhludnev changed the title reproducer.. still in doubts. .. reproducer.. wt=phps incompatibility with /admin/luke still in doubts. .. Dec 4, 2023
@thomascorthals
Copy link

Looks like there's a small mistake in the expected value. If name is multivalued, the PHPS representation must contain an index and a value.

a:1:{s:3:"map";a:1:{s:10:"lucene doc";a:3:{s:2:"id";s:1:"2";s:4:"cost";s:1:"5";s:4:"name";a:1:{i:0;s:7:"foo bar";}}}}

You can run some PHP code online to check the output: https://onlinephp.io?s=tY5NCsIwEEb3hd4hzCYVXPRHESaCuCi4EOwNSkxHGmib0DYiine3gYAX0NkM8wa-7-0PtrVxFEekWsN4-bCkZmqQr1l1qurychZxdJdj3bjeJm6YaNSy009KuMQMXxMWCL20IMKZpQidUzQQa4zyuPA4R9ANiOWPkPu9QVBmmgPaBjTInkKUxnRhO4SbMewqRxDvZfhqJb66RzU72f1O1nf-RfgD&v=8.2.10

@@ -84,9 +87,37 @@ public void writeResponse() throws IOException {

@Override
public void writeNamedList(String name, NamedList<?> val) throws IOException {
/*SimpleOrderedMap<Object> copy = new SimpleOrderedMap<>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove me. wrong way

@dsmiley
Copy link
Contributor

dsmiley commented Dec 4, 2023

A side: I recall some doubt as to whether Solr ought to even have a "php" writer in the first place. Maybe it should be deprecated. Surely JSON is the standard? WDYT
CC @epugh

@thomascorthals
Copy link

As someone on the PHP side of things, there's a good reason to look into PHPS with the /luke handler in particular.

curl 'http://localhost:8983/solr/techproducts/admin/luke?id=VDBDB1A16&json.nl=flat'

Snippet of the output:

      "cat":{
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"electronics",
        "internal":"electronics",
        "docFreq":15},
      "cat":{
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"memory",
        "internal":"memory",
        "docFreq":3},

The output for a SimpleOrderedMap can contain duplicate keys and this can't be controlled with json.nl. Which means PHP's json_decode() loses data. PHPS solves this by mangling the keys to cat, cat 1, cat 2

This is a (admittedly very rare) case where dealing with JSON output in PHP is cumbersome.

@epugh
Copy link
Contributor

epugh commented Dec 5, 2023

A side: I recall some doubt as to whether Solr ought to even have a "php" writer in the first place. Maybe it should be deprecated. Surely JSON is the standard? WDYT CC @epugh

I think it's a valid question.. That applies to all the language specific writer types... Especially if we start supporting code generated API libraries that some of these wrappers for Solr could think about adopting? @thomascorthals, maybe a conversation for mailing list, but I'd love you thoughts on if we generated a PHP library, would that make your life easier?

@epugh
Copy link
Contributor

epugh commented Dec 5, 2023

So... I think we are faced here with a different issue than how PHP is handling the output.... When I run the curl command that was shared, I also see the cat key come back mulitple times... And that is a problem with the JSON output.. indeed, running

curl 'http://localhost:8983/solr/techproducts/admin/luke?id=VDBDB1A16&json.nl=flat'  | jq .

and we drop the extra cat clauses from the original. It seems like we actually have a bug???

@epugh
Copy link
Contributor

epugh commented Dec 5, 2023

Okay, so I see that cat is duplicated because it's a multivalued field, so we get:

      "cat":{
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"electronics",
        "internal":"electronics",
        "docFreq":12
      },
      "cat":{
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"memory",
        "internal":"memory",
        "docFreq":3
      },

Shouldn't this instead be instead an array of values for the cat key? I know this changes things...:

      "cat":[{
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"electronics",
        "internal":"electronics",
        "docFreq":12
      },
      {
        "type":"string",
        "schema":"I-S-UM----OF-----l",
        "flags":"ITS-------OF------",
        "value":"memory",
        "internal":"memory",
        "docFreq":3
      }],

@mkhludnev
Copy link
Member Author

I hardly get common hesitation over duplicated JSON keys. Why it is a problem? Github come up with 480 json PHP parsers. Isn't there a library, which don't feel so unique about keys?

@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2023

Remember json.nl parameter options, which exist to balance convenience of the response with rigor over retrieving duplicated keys. But that only applies to JSON. https://solr.apache.org/guide/solr/latest/query-guide/response-writers.html#json-nl

@epugh
Copy link
Contributor

epugh commented Dec 5, 2023

I guess my point is that the actual API response, in JSON, is outputting invalid JSON. That the fix is to change the way we handle multivalued fields to nest them as an array under the top level key....

@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2023

I guess my point is that the actual API response, in JSON, is outputting invalid JSON. That the fix is to change the way we handle multivalued fields to nest them as an array under the top level key....

It's invalid but the user is in control of supplying a different value to json.nl which avoids the invalidity. Use json.nl=arrmap for example. No change in Solr is necessary.

But this work item is about php.

@thomascorthals
Copy link

@thomascorthals, maybe a conversation for mailing list, but I'd love you thoughts on if we generated a PHP library, would that make your life easier?

I had the same thought when I saw the generated Python client. Solarium aims at accurately modelling Solr by providing a fluent interface to build queries and parsing the results to PHP objects. The implementation for API requests is currently generic and just gets you the output from Solr as an array.

Isn't there a library, which don't feel so unique about keys?

There is, and I'm already using it to parse that particular piece of JSON. There are two things at play for me:

  • Solarium supports JSON and PHPS output since before I was using it. I'm trying not to break that. (If Solr deprecates PHPS, that's a different story of course.)
  • I was curious how parsing it would compare performance wise.

Remember json.nl parameter options

That is for NamedLists, the "offending" data structure is a SimpleOrderedMap and those aren't affected by json.nl.

the actual API response, in JSON, is outputting invalid JSON.

Technically, it isn't. ECMA-404 doesn't say anything about it. RFC 8259 says names SHOULD be unique, not MUST. But you can see how this becomes a nuisance when decoding to a language that has no internal data structure that allows duplicate keys.

@dsmiley
Copy link
Contributor

dsmiley commented Dec 5, 2023

Sorry; I wasn't paying enough attention. We shouldn't be using a SimpleOrderedMap in any scenario where keys can be duplicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants