Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search: Invalid ElasticSearch request on pentester submissions #688

Open
dd32 opened this issue May 9, 2024 · 12 comments · May be fixed by #689
Open

Search: Invalid ElasticSearch request on pentester submissions #688

dd32 opened this issue May 9, 2024 · 12 comments · May be fixed by #689
Labels
[Type] Bug Something isn't working

Comments

@dd32
Copy link
Member

dd32 commented May 9, 2024

Describe the bug
A pentester has hit the wporg-patterns api in such a way that the pattern directory generates an invalid ElasticSearch request. Causing a 400 Bad Request warning.

I can't tell if this is supposed to work, as the code-branch doesn't currently work. I suspect this is just unexpected input to the API endpoint.

For example:

GET patterns/wp-json/wp/v2/wporg-pattern?per_page=6&curation=core&search=block

This code:

$filter['bool']['must'][] = [
'terms' => [ "taxonomy.$taxonomy.term_id" => $term['terms'] ],
];

resulted in this ES query, and response:

Query piece: {"terms":{"taxonomy.wporg-pattern-keyword.term_id":"core"}}

Error: [terms] query does not support [taxonomy.wporg-pattern-keyword.term_id]

terms valid input would be an array, and well core is never going to match a term_id.. which is what leads me to think that the endpoint is not expecting a query-by-slugs.

To Reproduce
Steps to reproduce the behavior:

  1. Query via patterns/wp-json/wp/v2/wporg-pattern?per_page=6&curation=core&search=block
  2. Get a 400 error.
  3. To get the underlying ES error, you need to be an automattician with a WordPress.com sandbox so you can get the underlying queries.

Expected behavior
Either the API should throw a error immediately if it gets invalid input OR the fields should be validated prior to querying ES.

E_USER_WARNING: jetpack_search_abort - no_search_results_array - {"errors":{"invalid_search_api_response":["Invalid response from API - 400"]},"error_data":[]} in wp-content/plugins/pattern-directory/includes/search.php:186
@dd32 dd32 added the [Type] Bug Something isn't working label May 9, 2024
@pkevan
Copy link
Contributor

pkevan commented May 9, 2024

terms valid input would be an array, and well core is never going to match a term_id.. which is what leads me to think that the endpoint is not expecting a query-by-slugs.

I wonder if the switch to the new theme is the cause, since we are using the slug here: https://github.com/WordPress/pattern-directory/blob/trunk/public_html/wp-content/themes/wporg-pattern-directory-2024/functions.php#L167-L188 - and a quick look in the previous theme doesn't appear to have the same functionality.

@pkevan
Copy link
Contributor

pkevan commented May 9, 2024

Actually, I wonder if the taxonomy doesn't exist on the ES side, if it was added? I'll investigate that too.

@pkevan
Copy link
Contributor

pkevan commented May 9, 2024

Actually, I wonder if the taxonomy doesn't exist on the ES side, if it was added? I'll investigate that too.

It does exist, so it's not this 😄

@pkevan pkevan linked a pull request May 9, 2024 that will close this issue
@ryelle
Copy link
Contributor

ryelle commented May 9, 2024

The curation param should be a string, that is expected — it was added almost a year ago #583

But the core request could also create a query with a term_id (using pattern-keywords) when fetching these remote patterns. For example: /patterns/wp-json/wp/v2/wporg-pattern?page=1&per_page=100&order=desc&orderby=date&locale=en_US&wp-version=6.6-alpha-58124&pattern-keywords=11

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

@dd32
Copy link
Member Author

dd32 commented May 9, 2024

Ah! it's pattern-keywords that I was thinking of having term_id's. That threw me for a spin when I initially looked at this, and I was second guessing if it was expected or junk.

Changing to term as per #689 seems straight forward, but I would probably suggest it should be terms and just casted to an array, to account for multiple-term-queries, in conjunction with checking the field for term_id or slug.. or just assuming that all numeric-inputs would be a term_id, and otherwise by slug.

@pkevan
Copy link
Contributor

pkevan commented May 10, 2024

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

Thanks for the extra details @ryelle - i'll work up the PR to account for that, as well as the potential @dd32 mentioned with multiple terms.

@pkevan
Copy link
Contributor

pkevan commented May 10, 2024

Changing to term as per #689 seems straight forward, but I would probably suggest it should be terms and just casted to an array, to account for multiple-term-queries, in conjunction with checking the field for term_id or slug.. or just assuming that all numeric-inputs would be a term_id, and otherwise by slug.

For curation, it seems like we only allow one variant:

curation is not one of all, core, and community. is the message when you add multiple.

But the core request could also create a query with a term_id (using pattern-keywords) when fetching these remote patterns. For example: /patterns/wp-json/wp/v2/wporg-pattern?page=1&per_page=100&order=desc&orderby=date&locale=en_US&wp-version=6.6-alpha-58124&pattern-keywords=11

That would generate a tax_query with term_ids. So both are possible. Maybe the code in search.php should check for the field in the tax query in addition to checking the tax name?

The change in #689 doesn't appear to change the output when querying pattern-keywords from what I can see - can you try testing it to double check i'm not testing it incorrectly @ryelle 😄

@ryelle
Copy link
Contributor

ryelle commented May 10, 2024

With #689, the curation=core requests no longer error, but I don't think it's returning correctly. These two queries should return the same set of results, since curation=core is the same as pattern-keywords=11, but using curation returns an empty set. pattern-keywords=11 returns 3 results (on prod).

Also, on the PR, I get the jetpack_search_abort - no_search_results_array error when running a search with the pattern-keywords value (pattern-keywords is just the query name for the wporg-pattern-keyword tax). Try it with the second URL above.

One thing that was throwing me off is that https://wordpress.org/patterns/?s=dark currently works, but it sounds like it shouldn't, since that uses curation=core. Turns out, the frontend of the site isn't using ES anymore, since it only applies to REST requests. Sure enough, if I disable the filter, it breaks (even with #689, but I don't know exactly how to debug the ES query to continue investigating).

@pkevan
Copy link
Contributor

pkevan commented May 13, 2024

https://wordpress.org/patterns/wp-json/wp/v2/wporg-pattern?curation=core&search=dark
https://wordpress.org/patterns/wp-json/wp/v2/wporg-pattern?pattern-keywords=11&search=dark

These return different results on prod, so suspect there is something else afoot - it looks like the search keyword isn't being in the same way - in the curation one it's just via pattern_content whereas pattern-keyword is using several fields to query on.

Turns out, the frontend of the site isn't using ES anymore, since it only applies to REST requests. Sure enough, if I disable the filter, it breaks (even with #689, but I don't know exactly how to debug the ES query to continue investigating).

Hmmm, I guess it used to work?

@ryelle
Copy link
Contributor

ryelle commented May 13, 2024

Hmmm, I guess it used to work?

Yeah, the frontend used to be JS-powered, so it used the API too. Now it's not, so it doesn't used ES at all.

@pkevan
Copy link
Contributor

pkevan commented May 13, 2024

What's the preference here? Use ES or not, i'm not sure I understand how the site needs to function enough to know.

@ryelle
Copy link
Contributor

ryelle commented May 13, 2024

Yes, it should still use ES on the frontend too (not just the API).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants