Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Is there any risks that more than one NewPodTopologyCache running in one Scheduler App? #53

Open
NoicFank opened this issue Oct 20, 2023 · 3 comments

Comments

@NoicFank
Copy link

NoicFank commented Oct 20, 2023

Func NewPodTopologyCache responses to build a common cache for plugin NodeResourceTopologyMatch, and it will be called in plugin's New func as following:

func New(args runtime.Object, handle framework.Handle) (framework.Plugin, error) {
	...
	topologyMatch := &TopologyMatch{
                // here initializing the cache
		PodTopologyCache:       NewPodTopologyCache(ctx, 30*time.Minute),
		handle:                 handle,
		lister:                 lister,
		topologyAwareResources: sets.NewString(cfg.TopologyAwareResources...),
	}

	return topologyMatch, nil
}

Then, once plugin NodeResourceTopologyMatch appears in multi profiles for one scheduler app, then the plugin will be initialized many times, which means the upper func New will be triggered more than once.

Then, the most important thing is multi PodTopologyCache shows in one scheduler app. Is there any potential risks in this situation(e.g. data race)?

@Garrybest @qmhu PTAL, thanks

@NoicFank NoicFank changed the title Bug: Is there any risks that more than one NewPodTopologyCache running is one Scheduler App? Bug: Is there any risks that more than one NewPodTopologyCache running in one Scheduler App? Oct 20, 2023
@NoicFank
Copy link
Author

The nearly same thing is putting the creating SharedInformerFactory into plugin as following, Then there will be many informers for same one Resource(GVK), which means there will be many duplicate same indexer(local cache) for one Resource.

And for this case, not only 「same plugin appears in multi profiles」 will trigger this, once resource is used in other plugins will also trigger this.

It could bring many memory pressure when all the same informers to list same one resource, which may trigger OOM.

And there is chance that indexer inconsistency for one resource

func initTopologyInformer(
	ctx context.Context,
	client topologyclientset.Interface,
) (listerv1alpha1.NodeResourceTopologyLister, error) {
	topologyInformerFactory := informers.NewSharedInformerFactory(client, 0)
	nrtLister := topologyInformerFactory.Topology().V1alpha1().NodeResourceTopologies().Lister()

	klog.V(4).InfoS("Start nodeTopologyInformer")
	topologyInformerFactory.Start(ctx.Done())
	topologyInformerFactory.WaitForCacheSync(ctx.Done())
	return nrtLister, nil
}

@NoicFank
Copy link
Author

NoicFank commented Oct 20, 2023

In my opinion, Bring initialization of SharedInformerFactory and Self-Define Cache into cmd.main, which means plugin's func New() should be wrapped that SharedInformerFactory and Self-Define Cache will be injected into plugin as plugin.New() parameters.

@qmhu
Copy link
Member

qmhu commented Oct 23, 2023

Which situation will create more than one noderesourcetopology plugin?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants