Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DISCUSS] Add back preference for searching _primaries or _replicas #6046

Closed
mattweber opened this issue Jan 27, 2023 · 24 comments · Fixed by #7375
Closed

[DISCUSS] Add back preference for searching _primaries or _replicas #6046

mattweber opened this issue Jan 27, 2023 · 24 comments · Fixed by #7375
Assignees
Labels
discuss Issues intended to help drive brainstorming and decision making distributed framework enhancement Enhancement or improvement to existing feature or request

Comments

@mattweber
Copy link
Contributor

With segment replication and the potential lag of replicas, it would be nice to support search routing for _primaries and _replicas. With this preference we can keep traffic off our indexers and/or only on primaries to ensure the most up-to-date data. This was previously supported in Elaslticsearch <= 6x but removed for 7x+. I imagine it wouldn't be too hard to revert that PR (it is pre-license change) to re-enable this support. What do you think?

@mattweber
Copy link
Contributor Author

cc @reta @dblock I can work on this if you would consider merging it

@andrross
Copy link
Member

@mch2 What do you think about this idea?

@andrross
Copy link
Member

We have some longer term plans regarding separating readers and writers, but this seems like low hanging fruit for a useful capability today.

@mch2
Copy link
Member

mch2 commented Jan 27, 2023

Love this idea, @mattweber Thanks for raising this! We had been considering the possibility of a priority/urgent request param to fwd requests only to primaries given the added replication lag, but having an option to control routing to both primaries & replicas is even better. This would be a great addition.

@anasalkouz
Copy link
Member

@mattweber Thanks for raising this idea. Feel free to pick up this and submit PR. I will assigned it to you.

@anasalkouz anasalkouz added distributed framework enhancement Enhancement or improvement to existing feature or request labels Jan 27, 2023
@nknize
Copy link
Collaborator

nknize commented Jan 27, 2023

@mattweber I remember the conversation to remove that feature in 7x+ to avoid trappy hot spots. It was a controversial change for sure.

I'm all for reverting as a simple mechanism, but maybe just enable it for indexes using segrep?

Another idea we had been toying w/ as @mch2 mentioned was to add something like priority={CRITICAL | HIGH | MEDIUM | LOW} as an API mechanism to help triage query routing and guide ARS. CRITICAL queries will only be executed on the primary and pre-empt other lower priority queries, lower priority queries can run on primaries or replicas but they're ordered and pre-empted by priority. Perhaps that could be a follow on enhancement.

@andrross
Copy link
Member

With this preference we can keep traffic off our indexers

@mattweber Just curious how far this feature will get you towards this goal, given that the default behavior is to allocate primary and replica shards evenly across all nodes. Do you have any mechanism to change the shard allocation?

@mattweber
Copy link
Contributor Author

@andrross you are right by default and where nodes have more than one shard this still won't 100% solve things. I main use-case is currently querying the primary for most up-to-date data to validate document counts after a load.

@mattweber
Copy link
Contributor Author

@nknize sure I can do that if you prefer. IMO, I guess a trappy hot spot is still possible even with segrep so it might be better to just add it back and document the potential host spot?

@nknize
Copy link
Collaborator

nknize commented Jan 27, 2023

main use-case is currently querying the primary for most up-to-date data to validate document counts after a load.

This is the purpose of the priority query idea mentioned by @mch2. We haven't concretely pursued this yet because we haven't quantified (average) consistency metrics yet to determine if it's a mechanism worth introducing or if it will just exacerbate hot spots. (@mch2 maybe you have more info here). In essence, mechanisms like preference are really just "hacks" to push the system closer to "strongly consistent" than it's "eventually consistent" nature.

@nknize
Copy link
Collaborator

nknize commented Jan 27, 2023

... add it back and document the potential host spot?

I lean towards this as well. Some folks are less accommodating, though, and strongly oppose trappy leniency so maybe we open a PR to do this (revert the change) and give sufficient feedback time for strong viewpoints to veto and offer an alternative?

@nknize nknize added the discuss Issues intended to help drive brainstorming and decision making label Jan 27, 2023
@nknize nknize changed the title Add preference for searching _primaries or _replicas [DISCUSS] Add back preference for searching _primaries or _replicas Jan 27, 2023
@mattweber
Copy link
Contributor Author

@nknize sure I will work on the revert and open the PR for others to review.

I believe we should not handicap people because others might do something bad with it. We can have sane defaults and documentation about potential issues. When someone decides not to follow those and change the value, it is really on them if they run into the issue we told them could happen. This is one of the things that has turned me off of Elasticsearch, too much hand holding and blocking advanced settings/features because someone might do something dumb with it.

@nknize
Copy link
Collaborator

nknize commented Jan 27, 2023

This is one of the things that has turned me off of Elasticsearch, too much hand holding and blocking advanced settings/features because someone might do something dumb with it.

💯 * 1000

@mch2
Copy link
Member

mch2 commented Jan 27, 2023

We haven't concretely pursued this yet because we haven't quantified (average) consistency metrics yet to determine if it's a mechanism worth introducing or if it will just exacerbate hot spots. (@mch2 maybe you have more info here).

These metrics are proportional to the replication delay that will vary based on network performance, shard size, refresh times and shard/replica/node counts. With that said we've tested with configurations and seen ms/second delays and others in minutes. Will document all this to help guide, but the feature in general is going to require considering all these factors to come up with appropriate settings and if options like this search routing make sense to use.

Related, am also working on #4478 to provide an optional backpressure guardrail to prevent replicas from falling too far behind.

@mattweber
Copy link
Contributor Author

@mch2 FYI, just used it on a 2.8TB primary index with 155M docs with amazing results. Very happy with it so far.

@mattweber
Copy link
Contributor Author

I have the revert done but get a failing test with REPLICA_FIRST preference tests so I need to dig into that a bit more before I open the PR. Hopefully I will have something ready later this week.

@dblock
Copy link
Member

dblock commented Jan 30, 2023

@mch2 FYI, just used it on a 2.8TB primary index with 155M docs with amazing results. Very happy with it so far.

but does it work for large indexes? :)

@anasalkouz
Copy link
Member

but does it work for large indexes? :)

I think there are many factors to decide if SegRep is a good option or not like # of primary shards, # of replicas, # of nodes, index size, workload ..etc, we expect this to perform even better once we integrate SegRep with Remote Storage.
@mch2 we should have a clear guide for users on when we recommend to enable SegRep.

@mch2
Copy link
Member

mch2 commented Mar 7, 2023

Hi @mattweber checking in here. I'd be happy to help out with digging into the failing test if you'd like to throw up a draft of your revert?

@nknize
Copy link
Collaborator

nknize commented Mar 16, 2023

@mattweber I'm also happy to help move this forward if you'd like. Do you have a WIP branch we could collaborate? I'd like to move forward with this sooner than later and happy to create one but I think you could knock out rev1 pretty quickly.

@shwetathareja
Copy link
Member

ping @mattweber are you planning to take it forward? else happy to help.

@kotwanikunal
Copy link
Member

@shwetathareja / @nknize / @mattweber : We have a similar use case for searchable snapshots where we would like to route requests to particular shards to maximize cache efficiency.
I am diving into this issue to see if we can achieve both the goals with a single/combined solution.
I will keep this issue up to date with my findings.

@anasalkouz / @andrross

@kotwanikunal kotwanikunal self-assigned this Apr 28, 2023
@shwetathareja
Copy link
Member

@kotwanikunal As the first step, we can bring this commit and then we can improvise. Thoughts?

@kotwanikunal
Copy link
Member

@shwetathareja Raised a PR to add the commit back in as is: #7375

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making distributed framework enhancement Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants