Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new neon_local option to spin up read-only replica. #3714

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

lubennikovaav
Copy link
Contributor

@lubennikovaav lubennikovaav commented Feb 28, 2023

UPD:

Usage example:

./target/debug/neon_local start
./target/debug/neon_local tenant create --set-default
./target/debug/neon_local pg start repl --hot-standby=true

This is a rebased version of #850
This is only a control_ctl part of the patch, that starts postgres in hot standby mode:

psql -p 55432 postgres -U cloud_admin -c 'select pg_is_in_recovery();'
 pg_is_in_recovery 
-------------------
 t
(1 row)

Some design questions:

  • Should we instead use pg start replica --timeline-id=... --hot_standby=on syntax and not depend on primary node name at all?
    @MMeent , WDYT?

  • How to pass safekeeper conninfo to primary_conninfo? And how to handle possible safekeeper failures?

@arssher , is there any service discovery that tracks state of safekeepers and reports it to control plane?
There is a broker, but IIUC, only pageserver and safekeepers use it.

pass full list of all safekeepers to primary_conninfo

@MMeent
Copy link
Contributor

MMeent commented Feb 28, 2023

Should we instead use pg start replica --timeline-id=... --hot_standby=on syntax and not depend on primary node name at all?

I think that would be quite useful, yes.

@arssher
Copy link
Contributor

arssher commented Feb 28, 2023

is there any service discovery that tracks state of safekeepers and reports it to control plane?

No, and using it would be hard, as in turn it might fail and/or have connectivity issues with some of safekeepers. I'd propose to just wire full list of all safekeepers to primary_conninfo (connection strings support multiple hosts). That would be already a good start. Later we can also add logic like 'if currently seemingly good connection doesn't provide new data for a long time, let's try another one'. Or think about declaring safekeeper itself as disconnected from the quorum.

@lubennikovaav lubennikovaav force-pushed the ro_replica_comute_ctl branch from 2571ccc to afaaeb9 Compare March 6, 2023 10:50
@lubennikovaav lubennikovaav changed the title WIP: Add new neon_local option to spin up read-only replica. Add new neon_local option to spin up read-only replica. Mar 6, 2023
@lubennikovaav lubennikovaav requested a review from MMeent March 6, 2023 11:09
@lubennikovaav lubennikovaav marked this pull request as ready for review March 6, 2023 11:09
@lubennikovaav lubennikovaav requested review from a team as code owners March 6, 2023 11:09
@lubennikovaav lubennikovaav requested review from problame and removed request for a team and problame March 6, 2023 11:10
@@ -918,6 +972,12 @@ fn cli() -> Command {
.help("Specify Lsn on the timeline to start from. By default, end of the timeline would be used.")
.required(false);

let hot_standby_arg = Arg::new("hot-standby")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment you need to specify

--hot-standby=true

What if leave just flag --hot-standby?

let hot_standby_arg = Arg::new("hot-standby")
        .long("hot-standby")
        .action(ArgAction::SetTrue)
        .help("If set, the node will be a hot replica on the specified timeline")
        .required(false);

and then instead of

let hot_standby = sub_args
                .get_one::<bool>("hot-standby")
                .copied()
                .unwrap_or(false);

you can just use

let hot_standby = sub_args.get_flag("hot-standby");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that still allow --hot-standby=false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be false if you're not providing this flag

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that doesn't answer my question. Can you make it explicit that you're not using hot standby with --hot-standby=false?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't try it, personally. But why do you need this?

for instance ls command. and when you provide ls -l it changes the representation. You don't need to write ls --list=false

Therefore I'm trying to understand the use case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that you need to template the neon_local

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we use it in k8s/containers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute_ctl is used here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we use it in k8s/containers?

Nope, neon_local is for local development only.
I think, for k8s we will just provide all these config lines from console before starting the compute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it also need the standby.signal and some other special handling?

@@ -13,7 +13,7 @@ use std::io::BufRead;
use std::str::FromStr;

/// In-memory representation of a postgresql.conf file
#[derive(Default)]
#[derive(Default, Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change still needed? I couldn't find the code that would need this.

Replication::Static(Lsn::from_str(lsn_str)?)
} else if let Some(slot_name) = conf.get("primary_slot_name") {
let slot_name = slot_name.to_string();
let prefix = format!("repl_{}_{}_", timeline_id, name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let prefix = format!("repl_{}_{}_", timeline_id, name);
let prefix = format!("repl_{}_", timeline_id);

See https://github.com/neondatabase/neon/pull/3714/files#diff-8219525a31af104cdd3e3ced077045b49cdb50819bd97f50eb4f642dc0565047R421

@MMeent MMeent mentioned this pull request Mar 10, 2023
7 tasks
MMeent added a commit that referenced this pull request Apr 27, 2023
Notes:
 - This still needs UI support from the Console
 - I've not tuned any GUCs for PostgreSQL to make this work better
 - Safekeeper has gotten a tweak in which WAL is sent and how: It now
sends zero-ed WAL data from the start of the timeline's first segment up to
the first byte of the timeline to be compatible with normal PostgreSQL
WAL streaming.
 - This includes the commits of #3714 

Fixes one part of #769

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants