This repository has been archived by the owner on Feb 3, 2023. It is now read-only.
Holoscape Debug View Improvements #1954
Merged
Changes from 9 commits
Commits
Show all changes
20 commits
a62086a
Add StatsSignal that gets send every 500ms over all admin interfaces
lucksus 003e956
rustfmt
lucksus 284afc5
Merge branch 'develop' into debug-view-improvements
zippy 608721f
Start signal multiplexer after instances got initialized so we have a…
lucksus 993ca00
Add parameters to conductor API call `debug/state_dump` to exlude por…
lucksus 7e42f37
rustfmt
lucksus c547c75
Merge branch 'develop' into debug-view-improvements
lucksus db39a32
Merge branch 'debug-view-improvements' of github.com:holochain/holoch…
lucksus ad51a79
Merge branch 'develop' into debug-view-improvements
zippy e56dbec
Update crates/conductor_lib/src/interface.rs
lucksus aa40468
Merge branch 'develop' into debug-view-improvements
lucksus e77acae
Merge branch 'develop' into debug-view-improvements
lucksus 3a85fcf
Update crates/conductor_lib/src/conductor/base.rs
lucksus 6e168a0
Replace unreachable! with panic with message
lucksus d934731
Pull stats signal processing into its own thread
lucksus d974339
Make SignalWrapper an enum and represent InstanceStats there.
lucksus a29afcd
serde(tag="type") to make SignalWrapper's JSON representation backwar…
lucksus 68c64d4
changelog
lucksus f27cc8c
Merge branch 'develop' into debug-view-improvements
lucksus 6eb2852
Revert thread::yield_now() back to sleep() to not eat up all free CPU…
lucksus
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,13 +55,17 @@ use crate::{ | |
static_server_impls::NickelStaticServer as StaticServer, | ||
}; | ||
use boolinator::Boolinator; | ||
use holochain_core::signal::{InstanceStats, StatsSignal}; | ||
use holochain_core_types::dna::bridges::BridgePresence; | ||
use holochain_net::{ | ||
connection::net_connection::NetHandler, | ||
ipc::spawn::{ipc_spawn, SpawnResult}, | ||
p2p_config::{BackendConfig, P2pBackendKind, P2pConfig}, | ||
p2p_network::P2pNetwork, | ||
}; | ||
use std::time::Instant; | ||
|
||
const STATS_SIGNAL_INTERVAL: Duration = Duration::from_millis(500); | ||
|
||
pub const MAX_DYNAMIC_PORT: u16 = std::u16::MAX; | ||
|
||
|
@@ -275,21 +279,23 @@ impl Conductor { | |
self.stop_signal_multiplexer(); | ||
let broadcasters = self.interface_broadcasters.clone(); | ||
let instance_signal_receivers = self.instance_signal_receivers.clone(); | ||
let instances = self.instances.clone(); | ||
let signal_tx = self.signal_tx.clone(); | ||
let config = self.config.clone(); | ||
let (kill_switch_tx, kill_switch_rx) = unbounded(); | ||
self.signal_multiplexer_kill_switch = Some(kill_switch_tx); | ||
let mut last_stats_signal_instant = Instant::now(); | ||
|
||
debug!("starting signal loop"); | ||
thread::Builder::new() | ||
.name("signal_multiplexer".to_string()) | ||
.spawn(move || loop { | ||
let broadcasters = broadcasters.read().unwrap(); | ||
{ | ||
for (instance_id, receiver) in instance_signal_receivers.read().unwrap().iter() | ||
{ | ||
if let Ok(signal) = receiver.try_recv() { | ||
signal_tx.clone().map(|s| s.send(signal.clone())); | ||
let broadcasters = broadcasters.read().unwrap(); | ||
let interfaces_with_instance: Vec<&InterfaceConfiguration> = | ||
match signal { | ||
// Send internal signals only to admin interfaces, if signals.trace is set: | ||
|
@@ -341,6 +347,8 @@ impl Conductor { | |
println!("INTERFACEs for SIGNAL: {:?}", interfaces); | ||
interfaces | ||
} | ||
|
||
Signal::Stats(_) => unreachable!(), | ||
}; | ||
|
||
for interface in interfaces_with_instance { | ||
|
@@ -356,6 +364,57 @@ impl Conductor { | |
} | ||
} | ||
} | ||
|
||
if last_stats_signal_instant.elapsed() > STATS_SIGNAL_INTERVAL { | ||
let admin_interfaces = config | ||
.interfaces | ||
.iter() | ||
.filter(|interface_config| interface_config.admin) | ||
.collect::<Vec<_>>(); | ||
|
||
if admin_interfaces.len() > 0 { | ||
|
||
// Get stats for all instances: | ||
let mut instance_stats: HashMap<String, InstanceStats> = HashMap::new(); | ||
for (id, instance) in instances.iter() { | ||
if let Err(error) = instance | ||
.read() | ||
.map_err(|_| { | ||
HolochainInstanceError::InternalFailure(HolochainError::new( | ||
"Could not get lock on instance", | ||
)) | ||
}) | ||
.and_then(|instance| instance.context()) | ||
.and_then(|context| context.get_stats().map_err(|e| e.into())) | ||
.and_then(|stats| { | ||
instance_stats.insert(id.clone(), stats); | ||
Ok(()) | ||
}) | ||
{ | ||
error!( | ||
"Could not get stats for instance '{}'. Error: {:?}", | ||
id, error | ||
); | ||
} | ||
} | ||
|
||
// Wrap stats in signal: | ||
let stats_signal = Signal::Stats(StatsSignal { instance_stats }); | ||
|
||
// Send signal over admin interfaces: | ||
for interface in admin_interfaces { | ||
if let Some(broadcaster) = broadcasters.get(&interface.id) { | ||
if let Err(error) = broadcaster.send(SignalWrapper { | ||
signal: stats_signal.clone(), | ||
instance_id: String::new(), | ||
Does this need a |
||
}) { | ||
notify(error.to_string()); | ||
} | ||
}; | ||
} | ||
} | ||
last_stats_signal_instant = Instant::now(); | ||
} | ||
|
||
if kill_switch_rx.try_recv().is_ok() { | ||
break; | ||
} | ||
|
@@ -715,7 +774,6 @@ impl Conductor { | |
self.p2p_config = Some(self.initialize_p2p_config()); | ||
} | ||
|
||
self.start_signal_multiplexer(); | ||
self.dpki_bootstrap()?; | ||
|
||
for id in self.config.instance_ids_sorted_by_bridge_dependencies()? { | ||
|
@@ -735,6 +793,8 @@ impl Conductor { | |
} | ||
} | ||
|
||
self.start_signal_multiplexer(); | ||
|
||
for ui_interface_config in self.config.ui_interfaces.clone() { | ||
notify(format!("adding ui interface {}", &ui_interface_config.id)); | ||
let bundle_config = self | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,6 +45,7 @@ use std::{ | |
time::Duration, | ||
}; | ||
|
||
use crate::signal::InstanceStats; | ||
#[cfg(test)] | ||
use test_utils::mock_signing::mock_conductor_api; | ||
|
||
|
@@ -389,6 +390,23 @@ impl Context { | |
"No public CapTokenGrant entry type in chain".into(), | ||
)) | ||
} | ||
|
||
pub fn get_stats(&self) -> HcResult<InstanceStats> { | ||
let state = self | ||
.state() | ||
.ok_or_else(|| "Couldn't get instance state".to_string())?; | ||
let dht_store = state.dht(); | ||
let holding_map = dht_store.get_holding_map().bare(); | ||
Ok(InstanceStats { | ||
number_held_entries: holding_map.keys().count(), | ||
number_held_aspects: holding_map | ||
.values() | ||
.fold(0, |acc, aspect_set| acc + aspect_set.len()), | ||
nice use of the fold here |
||
number_pending_validations: dht_store.queued_holding_workflows().len(), | ||
number_running_zome_calls: state.nucleus().running_zome_calls.len(), | ||
offline: false, | ||
}) | ||
} | ||
} | ||
|
||
pub async fn get_dna_and_agent(context: &Arc<Context>) -> HcResult<(Address, String)> { | ||
|
This doesn't seem great, any code that might happen to send a Stats signal to one of these receivers will crash the system. This creates a special case for Stats signals, where they are taboo to send outside of this thread.
Why break with the existing pattern here? This thread is for receiving signals and sending them out across the appropriate interface. There is duplicated code below to pick out the interface and generate the signal. It's not strictly wrong but it feels like it's hijacking this thread for a separate concern, and also breaking the assumption that someone can send a signal to a receiver and have it transmitted properly.
Also, the calculation of stats could potentially take a significant amount of time, and this seems like it should be a tight loop so that the other signals can get delivered in a timely fashion, especially when we get into user-defined signals. I'd propose to move the generation of signals outside of this loop, and send it in across the channel like all the other signals.
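A minimal sketch of the proposal above, assuming simplified stand-in types: a dedicated producer thread computes the stats and sends them across a channel, so the receiving loop only forwards signals. `spawn_stats_producer`, `count_stats_signals`, and the reduced `Signal` and `InstanceStats` definitions are hypothetical and are not the conductor's actual API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{Receiver, Sender};
use std::thread;
use std::time::Duration;

// Hypothetical, reduced stand-ins for the real types touched by this PR.
#[derive(Clone, Debug, PartialEq)]
pub struct InstanceStats {
    pub number_held_entries: usize,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Signal {
    User(String),
    Stats(HashMap<String, InstanceStats>),
}

/// Spawns a producer thread that builds a stats signal `rounds` times,
/// sleeping `interval` between rounds, and sends each one over `tx`.
pub fn spawn_stats_producer(
    tx: Sender<Signal>,
    interval: Duration,
    rounds: usize,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for _ in 0..rounds {
            // The potentially slow computation runs here, off the
            // multiplexer thread. The values are placeholders.
            let mut stats = HashMap::new();
            stats.insert(
                "instance-1".to_string(),
                InstanceStats { number_held_entries: 3 },
            );
            if tx.send(Signal::Stats(stats)).is_err() {
                break; // The receiver is gone, so stop producing.
            }
            thread::sleep(interval);
        }
    })
}

/// Drains the channel until every sender is dropped and returns how many
/// of the received signals were stats signals.
pub fn count_stats_signals(rx: Receiver<Signal>) -> usize {
    rx.iter().filter(|s| matches!(s, Signal::Stats(_))).count()
}
```

With this shape the receiving side treats a stats signal like any other message on the channel, and no match arm has to be declared unreachable.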
That `unreachable!` is an explicit statement about an invariant: namely that `Stats` signals are not sent by any code. If somebody added that to any code, they would hit this `unreachable!` and at least know that they are using it differently from how it was meant to be used. That, at least, is a common pattern that I use when adding explicit panics like this one; that is the semantics I attach to explicit panics.
The Why:
I want `Stats` signals to be created by the conductor and not by instances, so that we only have one signal for all instances. That is why I decided to put it here instead of in the instance.
I really wish our code would reflect how it is intended to be used, rather than us relying on tribal knowledge of special cases. I know there's no foreseeable reason why anyone would send a Stats signal from elsewhere. It's not that I'm actually concerned about that unreachable code getting reachable. I'm concerned that we're hijacking a more specific pattern to be used for something more general, and I wish we would put in the bit of extra effort to generalize code when it needs to be generalized, rather than adding the minimum to get the special case working.
In this case the special case is a new type of signal that is associated with a conductor, and you're using a SignalWrapper which is intended for associating a Signal with an instance. To achieve that, you're introducing unreachable code, and ignoring a field of a struct that normally has semantic meaning. Everyone else who reads this code now has to apply the special knowledge that the instance_id field only matters sometimes, when it's not a Stats signal, and they have to find out the hard way if for whatever reason they wanted to send a Stats signal from an instance. For instance you could have at least made instance_id of SignalWrapper Optional.
I know that realistically this probably won't cause a crash, so my concern is not about proper functioning. It's about understanding the code, which is getting harder and harder to do. I feel that these special cases are slowly degrading our ability to reason about the code. We are fortunate enough to be using a language with a rich, expressive type system which in many cases can describe exactly how one should use a piece of code, and prevent anyone from using it the wrong way. I want to challenge us to actually make use of that, as a powerful documentation and intent-sharing tool.
Since this is a minor one I won't block this PR because of it. Can you just respond to the concern about potentially taking a long time in the loop? @lucksus
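One way to express that intent in the type system is sketched below, assuming simplified stand-in types. The variant and field names are hypothetical and may differ from what was eventually merged: the wrapper becomes an enum, so only instance-bound signals carry an `instance_id`.

```rust
use std::collections::HashMap;

// Hypothetical, reduced stand-ins for the real types in the codebase.
#[derive(Clone, Debug, PartialEq)]
pub struct InstanceStats {
    pub number_held_entries: usize,
    pub number_held_aspects: usize,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Signal {
    Trace(String),
    User(String),
}

/// A wrapper in which only instance-bound signals have an instance id.
/// Conductor-level stats get their own variant, so there is no field to
/// ignore and no match arm that must be marked unreachable.
#[derive(Clone, Debug, PartialEq)]
pub enum SignalWrapper {
    InstanceSignal { signal: Signal, instance_id: String },
    Stats { instance_stats: HashMap<String, InstanceStats> },
}

/// Returns the originating instance id, if the wrapper has one.
pub fn originating_instance(wrapper: &SignalWrapper) -> Option<&str> {
    match wrapper {
        SignalWrapper::InstanceSignal { instance_id, .. } => Some(instance_id.as_str()),
        SignalWrapper::Stats { .. } => None,
    }
}
```

Because the `match` is exhaustive, the compiler forces every consumer to decide what a stats wrapper means for it, instead of relying on a convention about an empty `instance_id`.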
You're right @maackle. This can and should be implemented by using the type system to make the impossible impossible. I just did that and pushed - please have another look.
I didn't think that calling `.len()` on some collections and iterating over held entries would block that thread in a noticeable manner, and I thought that adding a new thread might be more of a problem. But since the calculation of the number of aspects is O(number of held entries), it can grow large over time. It is now done in a separate thread.
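To illustrate the cost described above, here is a small sketch that assumes the holding map can be modelled as a `HashMap` from entry address to a set of aspect addresses. `count_held` is a hypothetical helper, not a function from the codebase. Counting entries is a single `len()` call, while counting aspects has to visit every held entry.

```rust
use std::collections::{HashMap, HashSet};

/// Returns `(number_held_entries, number_held_aspects)` for a holding map
/// modelled as entry address -> set of aspect addresses.
pub fn count_held(holding_map: &HashMap<String, HashSet<String>>) -> (usize, usize) {
    // Constant time: the map already tracks its own size.
    let entries = holding_map.len();
    // Linear in the number of held entries: each aspect set is visited once.
    let aspects = holding_map
        .values()
        .fold(0, |acc, aspect_set| acc + aspect_set.len());
    (entries, aspects)
}
```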