Skip to content

Commit

Permalink
policy: Atomic swap for policy map
Browse files Browse the repository at this point in the history
Currently we must execute in all worker threads to be able to ACK the
update back to Cilium agent. This has become a problem as Envoy logs show
worker threads stalling for 10 seconds doing nothing, pausing the policy
update progress long enough to cause problems for FQDN policies.

Change to create a new policy map for each update, sharing it as a const
map for all worker threads using an atomic swap. Deletion of the old map
is coordinated with the worker threads in RCU fashion, waiting until each
worker thread has processed a new posted function, proving that they no
longer use the policy lookop results they may have obtained. This can be
done after ACK is already sent back to Cilium Agent.

There is a potentially significant functional change, we no longer wait
at all for SDS secrets to be fetched. Network policy update is a node
wide function, so it was questionable to stall it for a single secret
used by some endpoint’s policy to begin with.

Unless Envoy main thread is too busy, this change should eliminate
toFQDNs policy proxy wait timeouts sometimes seen in Cilium Agent.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
  • Loading branch information
jrajahalme committed Jan 16, 2025
1 parent fe62be6 commit 5aeb98c
Show file tree
Hide file tree
Showing 8 changed files with 218 additions and 296 deletions.
1 change: 1 addition & 0 deletions cilium/api/bpf_metadata.proto
Original file line number Diff line number Diff line change
Expand Up @@ -49,5 +49,6 @@ message BpfMetadata {

// policy_update_warning_limit is the time in milliseconds after which a warning is logged if
// network policy update took longer
// Deprecated, has no effect.
google.protobuf.Duration policy_update_warning_limit = 9;
}
20 changes: 5 additions & 15 deletions cilium/bpf_metadata.cc
Original file line number Diff line number Diff line change
Expand Up @@ -189,13 +189,10 @@ createHostMap(Server::Configuration::ListenerFactoryContext& context) {
}

std::shared_ptr<const Cilium::NetworkPolicyMap>
createPolicyMap(Server::Configuration::FactoryContext& context, Cilium::CtMapSharedPtr& ct,
std::chrono::milliseconds policy_update_warning_limit_ms) {
createPolicyMap(Server::Configuration::FactoryContext& context, Cilium::CtMapSharedPtr& ct) {
return context.serverFactoryContext().singletonManager().getTyped<const Cilium::NetworkPolicyMap>(
SINGLETON_MANAGER_REGISTERED_NAME(cilium_network_policy),
[&context, &ct, &policy_update_warning_limit_ms] {
auto map =
std::make_shared<Cilium::NetworkPolicyMap>(context, ct, policy_update_warning_limit_ms);
SINGLETON_MANAGER_REGISTERED_NAME(cilium_network_policy), [&context, &ct] {
auto map = std::make_shared<Cilium::NetworkPolicyMap>(context, ct);
map->startSubscription();
return map;
});
Expand All @@ -212,14 +209,7 @@ Config::Config(const ::cilium::BpfMetadata& config,
Network::Utility::parseInternetAddressNoThrow(config.ipv4_source_address())),
ipv6_source_address_(
Network::Utility::parseInternetAddressNoThrow(config.ipv6_source_address())),
enforce_policy_on_l7lb_(config.enforce_policy_on_l7lb()),
policy_update_warning_limit_ms_(std::chrono::milliseconds(100)) {

const uint64_t limit = DurationUtil::durationToMilliseconds(config.policy_update_warning_limit());
if (limit > 0) {
policy_update_warning_limit_ms_ = std::chrono::milliseconds(limit);
}

enforce_policy_on_l7lb_(config.enforce_policy_on_l7lb()) {
if (is_l7lb_ && is_ingress_) {
throw EnvoyException("cilium.bpf_metadata: is_l7lb may not be set with is_ingress");
}
Expand Down Expand Up @@ -261,7 +251,7 @@ Config::Config(const ::cilium::BpfMetadata& config,
// Get the shared policy provider, or create it if not already created.
// Note that the API config source is assumed to be the same for all filter
// instances!
npmap_ = createPolicyMap(context, ct_maps_, policy_update_warning_limit_ms_);
npmap_ = createPolicyMap(context, ct_maps_);
}
}

Expand Down
1 change: 0 additions & 1 deletion cilium/bpf_metadata.h
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,6 @@ class Config : public Cilium::PolicyResolver,
Cilium::CtMapSharedPtr ct_maps_{};
Cilium::IPCacheSharedPtr ipcache_{};
std::shared_ptr<const Cilium::PolicyHostMap> hosts_{};
std::chrono::milliseconds policy_update_warning_limit_ms_;

private:
uint32_t resolveSourceIdentity(const PolicyInstanceConstSharedPtr policy,
Expand Down
4 changes: 2 additions & 2 deletions cilium/conntrack.cc
Original file line number Diff line number Diff line change
Expand Up @@ -159,10 +159,10 @@ CtMap::openMap6(const std::string& map_name) {
return pair.first;
}

void CtMap::closeMaps(const std::shared_ptr<absl::flat_hash_set<std::string>>& to_be_closed) {
void CtMap::closeMaps(const absl::flat_hash_set<std::string>& to_be_closed) {
std::lock_guard<Thread::MutexBasicLockable> guard(maps_mutex_);

for (const auto& name : *to_be_closed) {
for (const auto& name : to_be_closed) {
auto ct4 = ct_maps4_.find(name);
if (ct4 != ct_maps4_.end()) {
ct_maps4_.erase(ct4);
Expand Down
2 changes: 1 addition & 1 deletion cilium/conntrack.h
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@ class CtMap : public Singleton::Instance, Logger::Loggable<Logger::Id::filter> {
CtMap6 ctmap6_tcp_;
CtMap6 ctmap6_any_;
};
void closeMaps(const std::shared_ptr<absl::flat_hash_set<std::string>>& to_be_closed);
void closeMaps(const absl::flat_hash_set<std::string>& to_be_closed);

private:
absl::flat_hash_map<const std::string, std::unique_ptr<CtMaps4>>::iterator
Expand Down
Loading

0 comments on commit 5aeb98c

Please sign in to comment.