Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Thinking Sphinx configuration file: add batch_size #3162

Merged
merged 1 commit into from
Jan 18, 2023

Conversation

mayorova
Copy link
Contributor

What this PR does / why we need it:

Updating the thinking_sphinx.yml configuration, by:

  • organize the different settings according to what they configure (thinking sphinx, searchd, etc.)
  • remove the environments development, test, preview that are not needed in the container image.
  • add batch_size configuration. It is required for reindexing in SaaS, as with the default value 1000 reindexing with rake ts:rt:index fails with the following error:
    ThinkingSphinx::QueryLengthError: The supplied SphinxQL statement is 10266420 characters long. The maximum allowed length is 8388603.
    
    Probably that's because of indexing CMS contents that can be quite long.

Which issue(s) this PR fixes

No JIRA, that's part of SaaS migration activity.

Verification steps

Special notes for your reviewer:

This file will also be applied for on-premises, but should not change the behavior of the on-premises sphinx.

Copy link
Contributor

@akostadinov akostadinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could you add changes description to commit message so that we don't need github to see why it was done?

config/docker/thinking_sphinx.yml Show resolved Hide resolved
# see https://gist.github.com/pat/a7d73376dd657b4457092efc9e9c418a
# ----------------------------------
mysql41: 9306
address: <%= ENV['THINKING_SPHINX_ADDRESS'] || '0.0.0.0' %>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like daemon address. 0.0.0.0 means bind to all IPs of host.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this is so, it used to be like that.
The 3scale operator is overriding this anyway, by settings to 0.0.0.0 for the sphinx pod, and system-sphinx for other system-app pods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it makes sense to be 0.0.0.0. Because we don't know IP in advance. My point was about your grouping, that perhaps it should go under daemon settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, OK.
Re: grouping, I used the following rules:

  1. If it appears in https://gist.github.com/pat/a7d73376dd657b4457092efc9e9c418a, it's "thinking sphinx" config (address specifically is used not only for the daemon, but also for the client)
  2. If it's not "thinking sphinx", then it belongs to the group that the config appears in in the Sphinx documentation: https://sphinxsearch.com/docs/current.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with me, apparently it can't be completely unambiguous

So I only request info to go to commit message, other than that LGT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

M

big_document_ids: true

configuration_file: <%= ENV.fetch('THINKING_SPHINX_CONFIGURATION_FILE') { Rails.root.join("config/#{Rails.env}.sphinx.conf")} %>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems also like daemon config, but I guess needed also to generate it in the first place...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thinking sphinx will generate the .conf file and place it to the location specified here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was about grouping of settings. No problem with the value :)

- add batch_size configuration (see below)
- remove the environments development, test, preview that are not needed in the container image
- group the different settings according to what they configure (thinking sphinx, searchd, etc.)

batch_size setting is required for reindexing in SaaS, as with the default value 1000 reindexing with `rake ts:rt:index` fails with the following error:
```
ThinkingSphinx::QueryLengthError: The supplied SphinxQL statement is 10266420 characters long. The maximum allowed length is 8388603.
```
Probably that's because of indexing CMS contents that can be quite long.
@mayorova mayorova force-pushed the update-thinking-sphinx-configuration branch from 4f22b39 to 317cb83 Compare January 17, 2023 16:08
@mayorova mayorova merged commit 94f7ce0 into master Jan 18, 2023
@mayorova mayorova deleted the update-thinking-sphinx-configuration branch January 18, 2023 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants