Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GEN-1703] merge main #1732

Merged
merged 506 commits into from
Nov 11, 2024
Merged

Conversation

alonkeyval
Copy link
Collaborator

No description provided.

tamirdavid1 and others added 25 commits November 5, 2024 17:53
Co-authored-by: Tamir David <tamirdavid@Tamirs-MacBook-Pro.local>
small change, not sure if this was left out intentionally or not
This pull request introduces new features for describing Odigos and
sources, along with various improvements to the frontend components and
services. The most important changes include the addition of new
description drawers for Odigos and sources, updates to the API services,
and integration of these features into the existing components.

### New Features:
* Added `OdigosDescriptionDrawer` and `SourceDescriptionDrawer`
components to display detailed descriptions of Odigos and sources.
[[1]](diffhunk://#diff-4c8617cf98daa70e8ff7ee9893152742cc6daede3034814f62e66d3109ab056aR1-R248)
[[2]](diffhunk://#diff-f1476b3baa85929c2d816bc33571b7453fb4bace0b7a1f6e5c83b7c2b2e3c9d7R1-R282)
* Integrated new description drawers into the `OverviewHeader` and
`EditSourceForm` components.
[[1]](diffhunk://#diff-9d02b5dae936ee5dd59a0d8d3722fee2e78eca7acd0279c6d605048303f400cdL54-R58)
[[2]](diffhunk://#diff-dd26a25a84389833622d5c5b855c3d93d0f7f6a331d652d25863a17b81dda343R110-R114)

### API and Hooks Updates:
* Created new API functions `getOdigosDescription` and
`getSourceDescription` to fetch descriptions from the backend.
* Updated the `useDescribe` hook to manage fetching and state for Odigos
and source descriptions.
* Modified the `get` function in `api.ts` to use the `fetch` API instead
of `axios`.

### Asset and Export Updates:
* Added new SVG icons `describe.svg` and `refresh.svg` to the assets.
* Updated various index files to export the new components and hooks.
[[1]](diffhunk://#diff-348d5bfe60b97e626aa656d290b7f82a1f4ccd34d08bdcf8cd43e5ae6e1d4c95R2)
[[2]](diffhunk://#diff-4acfd7087b0688dfb3cb2e8adc9cb925048e2976396258273a7459dcdc40c666R1)
[[3]](diffhunk://#diff-3869e04212089fd401e12696b69ff9edcc4d2aa7c227987e4e00d6991205d6b4R6)
[[4]](diffhunk://#diff-ccdabc5e7350447daadecb689a4e8f7f3c747ffefbbc0bb26738df9052d3068cR12)

### Code Improvements:
* Replaced `NodeJS.Timer` with `NodeJS.Timeout` in `DataFlowContainer`.
* Simplified the response code logic in `DescribeOdigos` endpoint.
Co-authored-by: Tamir David <tamirdavid@Tamirs-MacBook-Pro.local>
…odigos-io#1703)

Co-authored-by: Tamir David <tamirdavid@Tamirs-MacBook-Pro.local>
…io#1707)

Just a small bug fix, the cli describe command was showing runtime
details of the `instrumentationConfig` instead of the value it should
show of `instrumentedApplication`
When a new pod is scheduled on a node, and the pod contains resource
request for odigos device, then kubelet will trigger the Allocate
function in odiglet.

Odiglet will try to `Get` the collectors group to calculate enabled
signals, which are forwarded to the agent creation to enable / disable
specific signals and avoid sending data from applications to a
non-existing endpoint.

We must never return an error from the `Allocate` function, even if for
some reason the collectors group had error, as it will make the pod non
schedulable with error: `UnexpectedAdmissionError`.

This PR makes it so if we have some error, we log it and start the agent
with no enable signals, so odigos will never interfere with the pod
start sequence
…odigos-io#1712)

When instrumented application updates, we only care if it's a change to
the runtime details. this is achieved with the generation field on the
object.
Using the right event filters for controllers is very important to
reduce resource utilization, avoid un-needed calls to api-server, and
reduce chance to race conditions when multiple things changes at the
same time.

We keep refining our controller event filters to achieve better
stability for odigos.

Right now, we define our custom predicates near the controller where
they are used, but this make it less easy to reuse existing common
predicates and share code. This PR move some of the common and reuseable
predicates to `k8sutils` module and refactor how they are being set up
…1713)

During the un-instrumentation of a large number of
sources`reconcileSingleWorkload` may be called concurrently by the
collectors group reconciler and the instrumented-application reconciler.

This race condition can cause the instrumentation device to be added
right after its removal as demonstrated in the following log:

```
2024-11-08T12:18:23Z	INFO	removed instrumentation device from workload	{"controller": "instrumentationdevice-instrumentedapplication", "controllerGroup": "odigos.io", "controllerKind": "InstrumentedApplication", "InstrumentedApplication": {"name":"deployment-coupon","namespace":"simple-demo18"}, "namespace": "simple-demo18", "name": "deployment-coupon", "reconcileID": "a50d0c85-fd07-4896-84a3-817472fc3953", "namespace": "simple-demo18", "kind": "&TypeMeta{Kind:Deployment,APIVersion:apps/v1,}", "name": "coupon", "reason": "NoRuntimeDetails"}
2024-11-08T12:18:23Z	DEBUG	instrumented application deleted	{"controller": "deleteinstrumentedapplication-deployment", "controllerGroup": "apps", "controllerKind": "Deployment", "Deployment": {"name":"coupon","namespace":"simple-demo18"}, "namespace": "simple-demo18", "name": "coupon", "reconcileID": "08560386-4960-4ace-aaae-343fd6fb4cc2", "namespace": "simple-demo18", "name": "coupon", "kind": "Deployment"}
2024-11-08T12:18:23Z	INFO	added instrumentation device to workload	{"controller": "instrumentationdevice-collectorsgroup", "controllerGroup": "odigos.io", "controllerKind": "CollectorsGroup", "CollectorsGroup": {"name":"odigos-data-collection","namespace":"odigos-system"}, "namespace": "odigos-system", "name": "odigos-data-collection", "reconcileID": "52989ae7-2686-4e1c-99cb-23b3252535ad", "name": "coupon", "namespace": "simple-demo18"}
```

To address this, an event filter is added to the CG reconciler, which
only forward either
1. Create events which have CG ready.
2. Update events where the CG became ready.

In addition, the reconciler is updated to get a fresh copy from the
cache of each instrumented app it handles before it reconciles it. This
should reduce the chance of handling an old instrumented application and
reconciling workloads unnecessarily.
This controller was relevant when the defaultSdks were read from the
odigos config.
Once it's removed in odigos-io#1626 , now the instrumentor read the default sdks
based on the tier on boot, and no longer consumes it from this config
map.

Notice that any change in tier will not be picked up automatically to
update the device names (was not supported before this PR).
The delete instrumented application controllers in instrumentor are
responsible to delete the instrumentated application and instrumentation
config once the workload and namespace labels are changed.

Specifically, it has a ns reconcile, and once detects labels change for
a ns and the ns is uninstrumented, it loops on all deployments, ds and
ss to delete any instrumentations for workloads that inheritaed
instrumentation from ns.

There were few bugs fixed in this PR:
- the ns reconciler would list the workloads, and then iterate them one
by one. This can take time. When UI instruments workloads, it starts by
setting the label on the ns and then iterates the workloads. This mean
that the ns reconciler might have read stale data and use it to delete
instrumentation, even if the workload has already being labeld. It is
fixed by always `Get`ing a fresh workload object when iterating the
list.
- the reconciler was attempting to delete any uninstrumented workload.
In the ns case, we should only care for workloads without label which
inherited the instrumentation from the ns (and ignore those that are
already marked with "disabled" which are handled in another controller.
This fix is to narrow the check for only those cases.
- the event filter allowed any label changes, it was narrowed to only
allow ns label changed from `enabled` to something else, making this
reconciler being called less times. For that the cache is modified to
store all ns so we can know when the labels has changed. This should not
be significant compared to other objects we store in cache.
…igos-io#1719)

Co-authored-by: Tamir David <tamirdavid@Tamirs-MacBook-Pro.local>
This is a minor refactor and improvement. instead of listing all
collectors groups, and then iterating them to find the once we want, we
simply fetch it by it's name. This also remove deprecated migration code
from `v1.0.31` which is not supported for long time
Co-authored-by: Tamir David <tamirdavid@Tamirs-MacBook-Pro.local>
This PR addresses 2 main issues in the config map reconciler in the lang
detection package inside the instrumentor.
The only goal of this reconciler is to trigger runtime detection in case
of a change in the ignored containers field in Odigos config.

1. Change the event filter predicate to only pass data updates of the
config map - and only for the Odigos config cm.
Adding a util predicate for the config map data change one.
3. Fix possible stale data being read from the workloads lists by
re-getting from the cache before deciding. similar to odigos-io#1713 and odigos-io#1716

note: the reconcile logic can in the OdigosConfig reconciler can be
improved by iterating over the instrumentationconfigs and marking them
to perform runtime detection - instead of iterating over the workloads
which is what is done today. This is left out of this PR since the
runtime detection signaling is constantly being refactored, and for
making this PR smaller and easier to review.
…gos-io#1724)

Following odigos-io#1722, use the `ConfigMapDataChangedPredicate` in conjunction
with `OdigosConfigMapPredicate` to only handle update events from our
config map in autoscaler.
At the moment, the sequence of events is:

1. user deleted a source from ui
2. the label is removed from the workload object (deployment etc).
3. the instrumentor detects the label is removed and delete instrumented
application and instrumentation config.
4. when instrumented application is deleted, the device is cleared from
workload pod spec with a relevant controller.

Device should be removed if the instrumented application is deleted, and
specifically in these cases:
- if instrumentor restarts when the instrumented application has been
deleted but the device cleanup where not applied yet.
- when 2 controllers perform the device logic at the same time, and one
is deleting the device from the workload, while the other one, which
might be using old state, will attempt to reintroduce it.


I reproduced these cases locally with old version, observed the device
is not cleaned up properly in these edge cases, and then run the changes
in this PR to make sure these are fixed.
This pull request includes updates to the `DetectedContainers` component
in the `frontend/webapp` directory. The changes enhance the user
interface and improve the handling of detected languages and agents.

UI Enhancements:

*
[`frontend/webapp/components/overview/sources/detected-containers/index.tsx`](diffhunk://#diff-37ab49243eec9831d4ec80b017423789fb6db91dc1cd32bc4a5f591b7961b8ddR25-R27):
Added a border, border-radius, and padding to the `Container` styled
component.

Functionality Improvements:

*
[`frontend/webapp/components/overview/sources/detected-containers/index.tsx`](diffhunk://#diff-37ab49243eec9831d4ec80b017423789fb6db91dc1cd32bc4a5f591b7961b8ddR13):
Updated the `DetectedContainers` component to handle a new optional
`other_agent` property in the `Language` interface, and display a
warning message if another agent is detected.
[[1]](diffhunk://#diff-37ab49243eec9831d4ec80b017423789fb6db91dc1cd32bc4a5f591b7961b8ddR13)
[[2]](diffhunk://#diff-37ab49243eec9831d4ec80b017423789fb6db91dc1cd32bc4a5f591b7961b8ddL48-R80)
Copy link

Merge main to new-ui

@alonkeyval alonkeyval merged commit b314d77 into odigos-io:new-ui Nov 11, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants