Initialize windows with last up-to-WINDOW_SIZE blobs #524

Merged 3 commits on Jul 9, 2018

rob-solana
Contributor

@rob-solana rob-solana commented Jul 2, 2018

the goal of this PR is to have full windows on all nodes at all times, which should reduce "failed RequestWindowIndex" in replication during test_multi_node_dynamic_network().

@rob-solana rob-solana added the work in progress This isn't quite right yet label Jul 2, 2018
@rob-solana
Contributor Author

hoping to get feedback as I go

@garious
Contributor

garious commented Jul 2, 2018

Just a thought, how about passing the uninitialized window right into Bank::process_ledger? Let it fill the window as it goes. 99% of the entries would get pushed right back out, but maybe that's no problem. I'd think that it'd allow us to reuse the existing windowing code.

@rob-solana
Contributor Author

I did re-use the existing windowing code (via copy-paste)... the only thing I didn't copy is the "whack the previous blobs" code

@rob-solana rob-solana closed this Jul 2, 2018
@rob-solana rob-solana reopened this Jul 2, 2018
@rob-solana
Contributor Author

the only place we populate windows is in broadcast(), apparently

@rob-solana
Contributor Author

now it's actually compiling

eprintln!("processed {} ledger...", entry_height);

let window_entries = {
Contributor

How about a split_at() before process_ledger() so that you don't need to clone() it?

Contributor Author

please ignore the .collect()... deemed infeasible in Discord#development discussion. therefore, split_at() non-option

src/streamer.rs Outdated
blobs: VecDeque<SharedBlob>,
entry_height: u64,
) -> Window {
let window = Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize]));
Contributor

default_window()?

Contributor Author

will do

src/server.rs Outdated
@@ -47,6 +50,7 @@ impl Server {
pub fn new_leader<W: Write + Send + 'static>(
bank: Bank,
entry_height: u64,
window_entries: Option<Vec<Entry>>,
Contributor

How about passing in a Window? We could always wrap these functions with things like new_default_leader to make it easier on tests (and the drone).

Contributor Author

can do, but still need tail -WINDOW_SIZE from the entries iterator...

also, means passing a blob_recycler down to new_leader, because that's the one broadcaster uses...

src/server.rs Outdated
let window = streamer::default_window();

let blob_recycler = BlobRecycler::default();
let window = match window_entries {
Contributor

Copy-paste in one PR!? 👎

Contributor Author

at this point, yes.

Contributor Author

will collapse once I grok

Contributor Author

whacked. only leader needs a populated window for this PR.

@mvines mvines added noCI Suppress CI on this Pull Request and removed noCI Suppress CI on this Pull Request labels Jul 3, 2018
@rob-solana rob-solana removed the noCI Suppress CI on this Pull Request label Jul 3, 2018
@rob-solana rob-solana force-pushed the populate-initial-window branch 3 times, most recently from 8f71272 to 2acf401 Compare July 4, 2018 00:34
@rob-solana
Contributor Author

next up is initializing validators' windows from the ledger, should be a short trip from here.

comments on my implementation of "tail" in process_ledger are greatly appreciated
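For readers following the "tail" discussion: one common way to keep only the last N items of an iterator without collecting the whole ledger is a bounded `VecDeque`. The sketch below is not the code from this PR; `process_with_tail` and its parameters are hypothetical names, and a real `process_ledger` would also apply each entry to the bank.

```rust
use std::collections::VecDeque;

/// Consume `entries`, returning how many items were seen and the last
/// `window_size` of them in their original order.
/// Hypothetical helper; not the PR's actual process_ledger.
fn process_with_tail<T, I>(entries: I, window_size: usize) -> (u64, Vec<T>)
where
    I: IntoIterator<Item = T>,
{
    let mut tail: VecDeque<T> = VecDeque::with_capacity(window_size);
    let mut height = 0u64;
    for entry in entries {
        // A real implementation would process the entry here before
        // deciding whether to retain it.
        if window_size > 0 {
            if tail.len() == window_size {
                // Drop the oldest retained entry to stay within the bound.
                tail.pop_front();
            }
            tail.push_back(entry);
        }
        height += 1;
    }
    (height, tail.into_iter().collect())
}
```

Memory stays bounded by `window_size` regardless of ledger length, at the cost of one `pop_front` per entry once the deque is full.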

@rob-solana rob-solana removed the work in progress This isn't quite right yet label Jul 5, 2018
Contributor

@garious garious left a comment

I'm still concerned the system will work (even long-term) without this PR. If you disagree, can you update the PR description with a note that helps me understand?

src/bank.rs Outdated
let bank = Bank::default();
bank.process_ledger(ledger).unwrap();
let (ledger_height, tail) = bank.process_ledger(ledger).unwrap();
Contributor

Can you add a second process_ledger test for when the ledger is longer than the tail? It should probably test that the last Entry.id in the window matches bank.last_id().

src/fullnode.rs Outdated
Some(ledger_tail) => {
let mut blobs = VecDeque::new();
ledger_tail.to_blobs(&blob_recycler, &mut blobs);
streamer::initialized_window(&crdt, blobs, entry_height)
Contributor

Need a test for how the leader behaves differently on this branch.

src/streamer.rs Outdated
assert!(blobs.len() <= win.len());

// flatten deque to vec
let mut blobs: Vec<_> = blobs.into_iter().collect();
Contributor

You can push this on the caller. Maybe the author of the caller will conclude its vector should be changed to Vec.

@@ -458,6 +458,45 @@ pub fn default_window() -> Window {
Arc::new(RwLock::new(vec![None; WINDOW_SIZE as usize]))
}

/// Initialize a rebroadcast window with most recent Entry blobs

This comment was marked as resolved.
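As a rough illustration of what initializing a window from the most recent blobs could look like, the sketch below places each blob at slot `index % WINDOW_SIZE`, with the first index computed as `entry_height - blobs.len()` as in the diff under review. The slot mapping, the tiny `SKETCH_WINDOW_SIZE`, and the use of `String` in place of `SharedBlob` are all assumptions made for illustration; this is not the function that was merged.

```rust
// Hypothetical, deliberately small window size for illustration only.
const SKETCH_WINDOW_SIZE: u64 = 8;

/// Fill a fixed-size window with the most recent blobs.
/// Assumes a blob with ledger index `i` lives in slot `i % SKETCH_WINDOW_SIZE`.
fn initialized_window_sketch(blobs: Vec<String>, entry_height: u64) -> Vec<Option<String>> {
    // The caller is expected to pass at most one window's worth of blobs,
    // and no more blobs than the ledger has entries.
    assert!(blobs.len() as u64 <= SKETCH_WINDOW_SIZE);
    assert!(blobs.len() as u64 <= entry_height);

    let mut window: Vec<Option<String>> = vec![None; SKETCH_WINDOW_SIZE as usize];
    // Index of the oldest blob being placed.
    let first_index = entry_height - blobs.len() as u64;
    for (offset, blob) in blobs.into_iter().enumerate() {
        let index = first_index + offset as u64;
        window[(index % SKETCH_WINDOW_SIZE) as usize] = Some(blob);
    }
    window
}
```

With three blobs and an entry height of 10, the blobs get indices 7, 8 and 9, so they land in slots 7, 0 and 1 and the remaining slots stay empty.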

src/streamer.rs Outdated
);
// Index the blobs
let mut received = entry_height - blobs.len() as u64;
Crdt::index_blobs(crdt, &blobs, &mut received).unwrap();
Contributor

What error will be panicked on here?

Contributor Author

replicated data lock, or all kinds of blob operations... I can change to an expect()...
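To make the `unwrap()` versus `expect()` point concrete: `expect()` panics the same way, but its message names the operation that failed, which helps when several fallible calls sit close together. The sketch uses a hypothetical `index_items` stand-in, not `Crdt::index_blobs`, whose real error type is not shown in this thread.

```rust
/// Hypothetical stand-in for an indexing call that can fail.
/// Advances `next_index` by `count`, failing on overflow.
fn index_items(count: usize, next_index: &mut u64) -> Result<(), String> {
    *next_index = next_index
        .checked_add(count as u64)
        .ok_or_else(|| "blob index overflowed u64".to_string())?;
    Ok(())
}

/// Panics with a message that names the failing operation, instead of
/// the generic message produced by `unwrap()`.
fn index_or_panic(count: usize, next_index: &mut u64) {
    index_items(count, next_index).expect("index blobs for initial window");
}
```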

@rob-solana rob-solana force-pushed the populate-initial-window branch 4 times, most recently from 24cdff1 to 884d4a4 Compare July 6, 2018 19:17
@rob-solana rob-solana added the work in progress This isn't quite right yet label Jul 6, 2018
@rob-solana
Contributor Author

rob-solana commented Jul 9, 2018

fixes issue #299

@rob-solana
Contributor Author

realizing needs a test (a validator that starts with an old ledger)

@rob-solana rob-solana removed the work in progress This isn't quite right yet label Jul 9, 2018
@garious
Contributor

garious commented Jul 9, 2018

@rob-solana, fyi, https://help.github.com/articles/closing-issues-using-keywords/. There's a special syntax to get GitHub to auto-close issues, and it needs to be in the PR description or in any of the PR's commit messages. Won't work from a PR comment.

Contributor

@garious garious left a comment

This is looking good. I see it as a restartable leader feature though, not a restartable validator feature.

fn restart_leader(
exit: Option<Arc<AtomicBool>>,
leader_fullnode: Option<FullNode>,
ledger_path: String,
Contributor

You can cut down on a bunch of clone() calls by making this a &str.

Contributor Author

then I have to to_str() for InFile::Path() and OutFile::Path()?
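A small sketch of the trade-off being discussed: taking `&str` lets a helper be called repeatedly without cloning at each call site, and an owned `String` is created only where an API actually needs one. `InFileSketch` is a hypothetical stand-in for a variant that holds an owned path; the real `InFile::Path` and `OutFile::Path` signatures are not shown in this thread.

```rust
/// Hypothetical stand-in for an API that stores an owned path.
#[derive(Debug, PartialEq)]
enum InFileSketch {
    Path(String),
}

/// Borrow the path and allocate only at the point that needs ownership.
fn open_ledger(ledger_path: &str) -> InFileSketch {
    InFileSketch::Path(ledger_path.to_string())
}

/// The caller can reuse the same borrowed path without any clone().
fn restart_twice(ledger_path: &str) -> (InFileSketch, InFileSketch) {
    (open_ledger(ledger_path), open_ledger(ledger_path))
}
```

The conversion does not disappear, but it moves from every call site into the one place that needs an owned value.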


let mut client = mk_client(&validator_data);
let getbal = retry_get_balance(&mut client, &bob_pubkey, Some(leader_balance));
assert!(getbal == Some(leader_balance));
Contributor

assert_eq will offer a better error message

Contributor Author

ack
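For context on the `assert_eq` suggestion: when `assert!(a == b)` fails it reports only the expression text, while `assert_eq!(a, b)` also prints both values as `left` and `right`. A minimal sketch, with `check_balance` as a hypothetical helper rather than code from this PR:

```rust
/// Hypothetical helper mirroring the balance check in the test.
fn check_balance(actual: Option<i64>, expected: i64) {
    // On failure this prints both the observed and expected values,
    // which assert!(actual == Some(expected)) would not.
    assert_eq!(actual, Some(expected));
}
```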


// create a "stale" ledger by copying current ledger
let mut stale_ledger_path = ledger_path.clone();
stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_");
Contributor

Contributor Author

@rob-solana rob-solana Jul 9, 2018

using with_file_name() or with_extension() sends me down a rabbit hole of conversion from a PathBuf back to a String (which doesn't always work). Lots more code for this really simple test case unless I stay in String land as long as possible.
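For comparison, here is roughly what the `Path`-based variant might look like when the input is already a `&str`. `stale_path` is a hypothetical helper. The `Option` return reflects the concern above: `to_str()` can fail for paths that are not valid UTF-8, although that cannot happen for the final conversion here because both components started as `&str`.

```rust
use std::path::{Path, PathBuf};

/// Build "<dir>/stale_<file>" from a ledger path.
/// Returns None if the path has no final file name component.
fn stale_path(ledger_path: &str) -> Option<String> {
    let path = Path::new(ledger_path);
    let file_name = path.file_name()?.to_str()?;
    let stale: PathBuf = path.with_file_name(format!("stale_{}", file_name));
    stale.to_str().map(|s| s.to_string())
}
```

Unlike the `rfind("/").unwrap()` approach, this does not panic on a path with no directory separator, but it does need more code, which is the trade-off raised in the comment.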

let mut stale_ledger_path = ledger_path.clone();
stale_ledger_path.insert_str(ledger_path.rfind("/").unwrap() + 1, "stale_");

std::fs::copy(ledger_path.clone(), stale_ledger_path.clone())
Contributor

Neither clone() should be needed there.

Contributor Author

333 |     std::fs::copy(ledger_path, stale_ledger_path)
    |                   ----------- value moved here
334 |         .expect(format!("copy {} to {}", &ledger_path, &stale_ledger_path,).as_str());

Contributor

By changing from ledger_path.clone() to &ledger_path
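Spelled out, the suggestion works because `std::fs::copy` accepts any `AsRef<Path>`, and a `&String` qualifies, so borrowing avoids both the clone and the "value moved here" error. A minimal sketch with a hypothetical `copy_to_stale` helper:

```rust
use std::fs;

/// Copy the ledger to the stale path, borrowing both Strings so they
/// remain usable for the error message and by the caller afterwards.
fn copy_to_stale(ledger_path: String, stale_ledger_path: String) -> (String, String) {
    fs::copy(&ledger_path, &stale_ledger_path).unwrap_or_else(|err| {
        panic!("copy {} to {}: {}", ledger_path, stale_ledger_path, err)
    });
    (ledger_path, stale_ledger_path)
}
```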

}

#[test]
fn test_leader_restart_validator_start_from_old_ledger() {
Contributor

Can you add a comment here describing the edge case? My understanding, "Test the case where both a leader and validator are starting up at roughly the same time, but the leader has a more recent copy of the ledger. This test ensures the leader makes its most recent entries available to the validator."

@garious
Contributor

garious commented Jul 9, 2018

cc #310

@garious garious changed the title support an initial window filled with last up-to-WINDOW_SIZE blobs Initialize windows with last up-to-WINDOW_SIZE blobs Jul 9, 2018
@rob-solana
Contributor Author

what's "cc" do?

@garious
Contributor

garious commented Jul 9, 2018

That's a convention @mvines started using here. It's just to notify subscribers of that issue of this PR's existence, much like you'd CC folks in email. If you go to that issue, you'll see the cross-link.

@rob-solana rob-solana merged commit 90a4ab7 into solana-labs:master Jul 9, 2018
@rob-solana rob-solana mentioned this pull request Jul 9, 2018
3 tasks
@rob-solana rob-solana deleted the populate-initial-window branch July 12, 2018 16:29