Skip to content
This repository has been archived by the owner on Feb 3, 2023. It is now read-only.

Holoscape Debug View Improvements #1954

Merged
merged 20 commits into from
Dec 11, 2019
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
a62086a
Add StatsSignal that gets send every 500ms over all admin interfaces
lucksus Dec 6, 2019
003e956
rustfmt
lucksus Dec 6, 2019
284afc5
Merge branch 'develop' into debug-view-improvements
zippy Dec 9, 2019
608721f
Start signal multiplexer after instances got initialized so we have a…
lucksus Dec 9, 2019
993ca00
Add parameters to conductor API call `debug/state_dump` to exlude por…
lucksus Dec 9, 2019
7e42f37
rustfmt
lucksus Dec 9, 2019
c547c75
Merge branch 'develop' into debug-view-improvements
lucksus Dec 9, 2019
db39a32
Merge branch 'debug-view-improvements' of github.com:holochain/holoch…
lucksus Dec 9, 2019
ad51a79
Merge branch 'develop' into debug-view-improvements
zippy Dec 10, 2019
e56dbec
Update crates/conductor_lib/src/interface.rs
lucksus Dec 10, 2019
aa40468
Merge branch 'develop' into debug-view-improvements
lucksus Dec 10, 2019
e77acae
Merge branch 'develop' into debug-view-improvements
lucksus Dec 10, 2019
3a85fcf
Update crates/conductor_lib/src/conductor/base.rs
lucksus Dec 11, 2019
6e168a0
Replace unreachable! with panic with message
lucksus Dec 11, 2019
d934731
Pull stats signal processing into its own thread
lucksus Dec 11, 2019
d974339
Make SignalWrapper an enum and represent InstanceStats there.
lucksus Dec 11, 2019
a29afcd
serde(tag="type") to make SignalWrapper's JSON representation backwar…
lucksus Dec 11, 2019
68c64d4
changelog
lucksus Dec 11, 2019
f27cc8c
Merge branch 'develop' into debug-view-improvements
lucksus Dec 11, 2019
6eb2852
Revert thread::yield_now() back to sleep() to not eat up all free CPU…
lucksus Dec 11, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 62 additions & 2 deletions crates/conductor_lib/src/conductor/base.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,13 +55,17 @@ use crate::{
static_server_impls::NickelStaticServer as StaticServer,
};
use boolinator::Boolinator;
use holochain_core::signal::{InstanceStats, StatsSignal};
use holochain_core_types::dna::bridges::BridgePresence;
use holochain_net::{
connection::net_connection::NetHandler,
ipc::spawn::{ipc_spawn, SpawnResult},
p2p_config::{BackendConfig, P2pBackendKind, P2pConfig},
p2p_network::P2pNetwork,
};
use std::time::Instant;

const STATS_SIGNAL_INTERVAL: Duration = Duration::from_millis(500);

pub const MAX_DYNAMIC_PORT: u16 = std::u16::MAX;

Expand Down Expand Up @@ -275,21 +279,23 @@ impl Conductor {
self.stop_signal_multiplexer();
let broadcasters = self.interface_broadcasters.clone();
let instance_signal_receivers = self.instance_signal_receivers.clone();
let instances = self.instances.clone();
let signal_tx = self.signal_tx.clone();
let config = self.config.clone();
let (kill_switch_tx, kill_switch_rx) = unbounded();
self.signal_multiplexer_kill_switch = Some(kill_switch_tx);
let mut last_stats_signal_instant = Instant::now();

debug!("starting signal loop");
thread::Builder::new()
.name("signal_multiplexer".to_string())
.spawn(move || loop {
let broadcasters = broadcasters.read().unwrap();
{
for (instance_id, receiver) in instance_signal_receivers.read().unwrap().iter()
{
if let Ok(signal) = receiver.try_recv() {
signal_tx.clone().map(|s| s.send(signal.clone()));
let broadcasters = broadcasters.read().unwrap();
let interfaces_with_instance: Vec<&InterfaceConfiguration> =
match signal {
// Send internal signals only to admin interfaces, if signals.trace is set:
Expand Down Expand Up @@ -341,6 +347,8 @@ impl Conductor {
println!("INTERFACEs for SIGNAL: {:?}", interfaces);
interfaces
}

Signal::Stats(_) => unreachable!(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem great, any code that might happen to send a Stats signal to one of these receivers will crash the system. This creates a special case for Stats signals, where they are taboo to send outside of this thread.

Why break with the existing pattern here? This thread is for receiving signals and sending them out across the appropriate interface. There is duplicated code below to pick out the interface and generate the signal. It's not strictly wrong but it feels like it's hijacking this thread for a separate concern, and also breaking the assumption that someone can send a signal to a receiver and have it transmitted properly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the calculation of stats could potentially take a significant amount of time, and this seems like it should be a tight loop so that the other signals can get delivered in a timely fashion, especially when we get into user-defined signals. I'd propose to move the generation of signals outside of this loop, and send it in across the channel like all the other signals.

Copy link
Collaborator Author

@lucksus lucksus Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That unreachable! is an explicit statement about an invariant: namely that Stats signals are not send by any code. And if somebody would add that to any code, they would find this unreachable and at least know that they start using it differently to how it was meant to be used.

That at least is a common pattern that I use when adding explicit panics like this one - that is the semantic I connect to explict panics.

The Why:
I want Stats signals to be created by the conductor and not by instances so that we only have one signal for all instances. That is why I've decided to put it here instead of the instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish our code would reflect how it is intended to be used, rather than us relying on tribal knowledge of special cases. I know there's no foreseeable reason why anyone would send a Stats signal from elsewhere. It's not that I'm actually concerned about that unreachable code getting reachable. I'm concerned that we're hijacking a more specific pattern to be used for something more general, and I wish we would put in the bit of extra effort to generalize code when it needs to be generalized, rather than adding the minimum to get the special case working.

In this case the special case is a new type of signal that is associated with a conductor, and you're using a SignalWrapper which is intended for associating a Signal with an instance. To achieve that, you're introducing unreachable code, and ignoring a field of a struct that normally has semantic meaning. Everyone else who reads this code now has to apply the special knowledge that the instance_id field only matters sometimes, when it's not a Stats signal, and they have to find out the hard way if for whatever reason they wanted to send a Stats signal from an instance. For instance you could have at least made instance_id of SignalWrapper Optional.

I know that realistically this probably won't cause a crash, so my concern is not about proper functioning. It's about understanding the code, which is getting harder and harder to do. I feel that these special cases are slowly degrading our ability to reason about the code. We are fortunate enough to be using a language with a rich, expressive type system which in many cases can describe exactly how one should use a piece of code, and prevent anyone from using it the wrong way. I want to challenge us to actually make use of that, as a powerful documentation and intent-sharing tool.

Since this is a minor one I won't block this PR because of it. Can you just respond to the concern about potentially taking a long time in the loop? @lucksus

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right @maackle. This can and should be implemented by using the type system to make the impossible impossible. I just did that and pushed - please have another look.

I wasn't thinking the that calling .len() on some collections and iterating over held entries would block that thread in a noticeable manner and thought that adding a new thread might be more of a problem. But since the calculation of number of aspects is O(number of held entries) it can grow large over time. It is now done in a separate thread.

};

for interface in interfaces_with_instance {
Expand All @@ -356,6 +364,57 @@ impl Conductor {
}
}
}

if last_stats_signal_instant.elapsed() > STATS_SIGNAL_INTERVAL {
let admin_interfaces = config
.interfaces
.iter()
.filter(|interface_config| interface_config.admin)
.collect::<Vec<_>>();

if admin_interfaces.len() > 0 {
lucksus marked this conversation as resolved.
Show resolved Hide resolved
// Get stats for all instances:
let mut instance_stats: HashMap<String, InstanceStats> = HashMap::new();
for (id, instance) in instances.iter() {
if let Err(error) = instance
.read()
.map_err(|_| {
HolochainInstanceError::InternalFailure(HolochainError::new(
"Could not get lock on instance",
))
})
.and_then(|instance| instance.context())
.and_then(|context| context.get_stats().map_err(|e| e.into()))
.and_then(|stats| {
instance_stats.insert(id.clone(), stats);
Ok(())
})
{
error!(
"Could not get stats for instance '{}'. Error: {:?}",
id, error
);
}
}

// Wrap stats in signal:
let stats_signal = Signal::Stats(StatsSignal { instance_stats });

// Send signal over admin interfaces:
for interface in admin_interfaces {
if let Some(broadcaster) = broadcasters.get(&interface.id) {
if let Err(error) = broadcaster.send(SignalWrapper {
signal: stats_signal.clone(),
instance_id: String::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a // TODO?

}) {
notify(error.to_string());
}
};
}
}
last_stats_signal_instant = Instant::now();
}

if kill_switch_rx.try_recv().is_ok() {
break;
}
Expand Down Expand Up @@ -715,7 +774,6 @@ impl Conductor {
self.p2p_config = Some(self.initialize_p2p_config());
}

self.start_signal_multiplexer();
self.dpki_bootstrap()?;

for id in self.config.instance_ids_sorted_by_bridge_dependencies()? {
Expand All @@ -735,6 +793,8 @@ impl Conductor {
}
}

self.start_signal_multiplexer();

for ui_interface_config in self.config.ui_interfaces.clone() {
notify(format!("adding ui interface {}", &ui_interface_config.id));
let bundle_config = self
Expand Down
17 changes: 16 additions & 1 deletion crates/conductor_lib/src/interface.rs
Original file line number Diff line number Diff line change
Expand Up @@ -743,6 +743,9 @@ impl ConductorApiBuilder {
/// Returns a JSON object with all relevant fields of an instance's state.
/// Params:
/// - `instance_id` ID of the instance of which the state is requested
/// - `source_chain` [bool] (optional) If set to false, will exclude source chain headers
/// - `held_aspects` [bool] (optional) If set to false, will exclude the holding map entries
/// - `queued_holding_workflows` [bool] (optional If set to false, will exclude contents of the validatio queue
lucksus marked this conversation as resolved.
Show resolved Hide resolved
///
/// - `debug/fetch_cas`
/// Returns content of a given instance's CAS.
Expand All @@ -763,7 +766,19 @@ impl ConductorApiBuilder {
let params_map = Self::unwrap_params_map(params)?;
let instance_id = Self::get_as_string("instance_id", &params_map)?;

let dump = conductor_call!(|c| c.state_dump_for_instance(&instance_id))?;
let mut dump = conductor_call!(|c| c.state_dump_for_instance(&instance_id))?;

if Ok(false) == Self::get_as_bool("source_chain", &params_map) {
dump.source_chain.clear()
}

if Ok(false) == Self::get_as_bool("held_aspects", &params_map) {
dump.held_aspects.clear()
}

if Ok(false) == Self::get_as_bool("queued_holding_workflows", &params_map) {
dump.queued_holding_workflows.clear()
}

Ok(serde_json::to_value(dump)
.map_err(|e| jsonrpc_core::Error::invalid_params(e.to_string()))?)
Expand Down
18 changes: 18 additions & 0 deletions crates/core/src/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ use std::{
time::Duration,
};

use crate::signal::InstanceStats;
#[cfg(test)]
use test_utils::mock_signing::mock_conductor_api;

Expand Down Expand Up @@ -389,6 +390,23 @@ impl Context {
"No public CapTokenGrant entry type in chain".into(),
))
}

pub fn get_stats(&self) -> HcResult<InstanceStats> {
let state = self
.state()
.ok_or_else(|| "Couldn't get instance state".to_string())?;
let dht_store = state.dht();
let holding_map = dht_store.get_holding_map().bare();
Ok(InstanceStats {
number_held_entries: holding_map.keys().count(),
number_held_aspects: holding_map
.values()
.fold(0, |acc, aspect_set| acc + aspect_set.len()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice use of the fold here

number_pending_validations: dht_store.queued_holding_workflows().len(),
number_running_zome_calls: state.nucleus().running_zome_calls.len(),
offline: false,
})
}
}

pub async fn get_dna_and_agent(context: &Arc<Context>) -> HcResult<(Address, String)> {
Expand Down
17 changes: 16 additions & 1 deletion crates/core/src/signal.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ use holochain_json_api::{error::JsonError, json::JsonString};
use holochain_wasm_utils::api_serialization::emit_signal::EmitSignalArgs;
use serde::{Deserialize, Deserializer};
use snowflake::ProcessUniqueId;
use std::thread;
use std::{collections::HashMap, thread};

#[derive(Clone, Debug, Serialize, DefaultJson)]
#[serde(tag = "signal_type")]
Expand All @@ -13,6 +13,7 @@ pub enum Signal {
Trace(ActionWrapper),
Consistency(ConsistencySignal<String>),
User(UserSignal),
Stats(StatsSignal),
}

#[derive(Clone, Debug, Serialize, Deserialize, DefaultJson, PartialEq)]
Expand All @@ -30,6 +31,20 @@ impl From<EmitSignalArgs> for UserSignal {
}
}

#[derive(Clone, Debug, Serialize, Deserialize, DefaultJson, PartialEq)]
pub struct StatsSignal {
pub instance_stats: HashMap<String, InstanceStats>,
}

#[derive(Clone, Debug, Serialize, Deserialize, DefaultJson, PartialEq)]
pub struct InstanceStats {
pub number_held_entries: usize,
pub number_held_aspects: usize,
pub number_pending_validations: usize,
pub number_running_zome_calls: usize,
pub offline: bool,
}

impl<'de> Deserialize<'de> for Signal {
fn deserialize<D>(_deserializer: D) -> Result<Signal, D::Error>
where
Expand Down