feat: add proof of concept implementation of argo image updater replacement #173

Merged
merged 29 commits into from
Aug 30, 2023
eeadd14
initial progress checkpoint
shini4i Aug 18, 2023
2d52c58
checkpoint
shini4i Aug 25, 2023
471a7bd
checkpoint
shini4i Aug 25, 2023
1a84953
potential fix for tests
shini4i Aug 25, 2023
8fb96f0
potential fix for tests pt2
shini4i Aug 25, 2023
3da27ea
potential fix for tests pt3
shini4i Aug 25, 2023
228f4f4
skip empty commits
shini4i Aug 25, 2023
1d89c75
add initial info to readme
shini4i Aug 25, 2023
fa215f7
minor logic split
shini4i Aug 26, 2023
c1e8c1c
add a simple generated test
shini4i Aug 26, 2023
7e12693
chore: add more tests
shini4i Aug 27, 2023
6df22be
replace hardcoded values with const
shini4i Aug 27, 2023
6b359bd
cleanup + env for updater
shini4i Aug 27, 2023
1dfed33
replace missed hardcoded values
shini4i Aug 27, 2023
755677f
add more tests
shini4i Aug 28, 2023
dccf17d
add trivial task validation
shini4i Aug 28, 2023
61fe284
fix(config): typo
shini4i Aug 28, 2023
6cc674e
add a simple mutex
shini4i Aug 28, 2023
5af839b
reject tasks with invalid token
shini4i Aug 29, 2023
1177a3f
update docs + make client and server use the same variable name
shini4i Aug 29, 2023
c1f3b0d
minor logic fix
shini4i Aug 29, 2023
a6c5401
add annotations issue handling
shini4i Aug 29, 2023
3c4dd47
fix and improve mutext logic
shini4i Aug 29, 2023
9b53f73
fix comment
shini4i Aug 29, 2023
6f214d2
add a simple retry for updater activities
shini4i Aug 30, 2023
69abd2a
add golangci-lint exclusion
shini4i Aug 30, 2023
23502c1
chore: bump project dependencies
shini4i Aug 30, 2023
705ded8
update pre-commit configuration
shini4i Aug 30, 2023
1ffee91
bump golang version to 1.21
shini4i Aug 30, 2023
2 changes: 1 addition & 1 deletion .github/workflows/run-tests-and-sonar-scan.yml
@@ -6,7 +6,7 @@ on:
pull_request:
types: [opened, synchronize, reopened]
env:
GOLANG_VERSION: '1.20.4'
GOLANG_VERSION: '1.21.0'

jobs:
golangci:
6 changes: 3 additions & 3 deletions .gitignore
@@ -11,8 +11,8 @@ static/
**/.envrc
**/.env

# directory for built binary files
bin/
# built binary file
argo-watcher

# swagger files
**/swagger.*
@@ -25,4 +25,4 @@ cmd/argo-watcher/docs
cmd/argo-watcher/mock

# goreleaser
dist/
dist/
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -4,8 +4,20 @@ repos:
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-yaml
- id: check-added-large-files
- id: no-commit-to-branch
args:
- --branch=main
- repo: https://github.com/hadolint/hadolint
rev: v2.12.0
hooks:
- id: hadolint
args:
- --ignore=DL3018
- repo: https://github.com/dnephin/pre-commit-golang
rev: v0.5.1
hooks:
- id: go-fmt
- id: go-mod-tidy
- id: golangci-lint
19 changes: 13 additions & 6 deletions Makefile
@@ -21,16 +21,23 @@ install-deps: ## Install dependencies
test: mocks ## Run tests
@ARGO_TIMEOUT=1 go test -v ./... -count=1 -coverprofile coverage.out `go list ./... | egrep -v '(test|mock)'`

.PHONY: ensure-dirs
ensure-dirs:
@mkdir -p bin

.PHONY: build
build: ensure-dirs docs ## Build the binaries
build: docs ## Build the binaries
@echo "===> Building [$(CYAN)${VERSION}$(RESET)] version of [$(CYAN)argo-watcher$(RESET)] binary"
@CGO_ENABLED=0 go build -ldflags="-s -w -X main.version=${VERSION}" -o bin/argo-watcher ./cmd/argo-watcher
@CGO_ENABLED=0 go build -ldflags="-s -w -X main.version=${VERSION}" -o argo-watcher ./cmd/argo-watcher
@echo "===> Done"

.PHONY: kind-upload
kind-upload:
@echo "===> Building [$(CYAN)dev$(RESET)] version of [$(CYAN)argo-watcher$(RESET)] binary"
@CGO_ENABLED=0 GOARCH=arm64 GOOS=linux go build -ldflags="-s -w -X main.version=dev" -o argo-watcher ./cmd/argo-watcher
@echo "===> Building [$(CYAN)argo-watcher$(RESET)] docker image"
@docker build -t argo-watcher:dev .
@echo "===> Loading [$(CYAN)argo-watcher$(RESET)] docker image into [$(CYAN)kind$(RESET)] cluster"
@kind load docker-image argo-watcher:dev -n disposable-cluster
@echo "===> Restarting [$(CYAN)argo-watcher$(RESET)] deployment"
@kubectl rollout restart deploy argo-watcher -n argo-watcher

.PHONY: build-goreleaser
build-goreleaser:
@echo "===> Building [$(CYAN)${VERSION}$(RESET)] version of [$(CYAN)argo-watcher$(RESET)] binary"
25 changes: 25 additions & 0 deletions README.md
@@ -45,6 +45,31 @@ The workflow for deployment might be the following

## Documentation

> Starting with version v0.6.0, Argo Watcher now offers experimental support for making direct commits to the GitOps repository.

<details>
<summary>Direct Git Integration</summary>
If you've been using Argo CD Image Updater across hundreds of applications, you might have noticed that the latency in detecting new images can sometimes slow down your deployments considerably.

To address the challenges with deployment latency, we're excited to unveil an experimental feature in Argo Watcher that allows direct commits to your GitOps repository.

We remain committed to supporting the straightforward scenario where users simply check the Application status. This ensures flexibility for those who prefer or need to use the original method.

For those looking to experiment with faster image updates, you can leverage the new direct commit capability using the following annotations.

```yaml
annotations:
argo-watcher/managed: "true"
argo-watcher/managed-images: "app=ghcr.io/shini4i/argo-watcher"
argo-watcher/app.helm.image-tag: "image.tag"
```
This configuration requires mounting an SSH key into the container. Support for this configuration is available in the Helm chart starting from version `0.4.0`.

⚠️ Important Note Regarding Direct Commit Feature:

Please be aware that when using the direct commit feature, Argo Watcher does not verify the actual availability of the image. It assumes and trusts that the tag received from the client is correct. Ensure you have processes in place to validate image tags before relying on this feature.
</details>

- Installation instructions and more information can be found in the [docs](docs/installation.md).
- Development instructions can be found in the [docs](docs/development.md).
- A short story about why this project was created can be found [here](https://medium.com/dyninno/a-journey-to-gitops-9aa445474eb6).
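The `argo-watcher/managed-images` annotation in the README snippet above carries an `alias=image` pair. As a rough illustration of how such a value could be read, here is a minimal sketch. The function name `parseManagedImages` is hypothetical, and the handling of several comma-separated pairs is an assumption made for this example; the PR does not show how the project itself parses the annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// parseManagedImages splits an annotation value such as
// "app=ghcr.io/shini4i/argo-watcher" into alias -> image pairs.
// Accepting several comma-separated pairs is an assumption of this sketch.
func parseManagedImages(value string) (map[string]string, error) {
	images := make(map[string]string)
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate trailing commas and empty values
		}
		alias, image, found := strings.Cut(pair, "=")
		if !found || alias == "" || image == "" {
			return nil, fmt.Errorf("invalid managed-images entry %q", pair)
		}
		images[alias] = image
	}
	return images, nil
}

func main() {
	images, err := parseManagedImages("app=ghcr.io/shini4i/argo-watcher")
	if err != nil {
		panic(err)
	}
	fmt.Println(images["app"])
}
```

The alias (`app` here) is what ties the image to the `argo-watcher/app.helm.image-tag` annotation, so rejecting malformed pairs early keeps a bad annotation from silently producing no update.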
8 changes: 0 additions & 8 deletions cmd/argo-watcher/argo.go
@@ -15,14 +15,6 @@ var (
argoSyncRetryDelay = 15 * time.Second
)

const (
ArgoAppSuccess = iota
ArgoAppNotSynced
ArgoAppNotAvailable
ArgoAppNotHealthy
ArgoAppFailed
)

const (
ArgoAPIErrorTemplate = "ArgoCD API Error: %s"
argoUnavailableErrorMessage = "connect: connection refused"
61 changes: 57 additions & 4 deletions cmd/argo-watcher/argo_status_updater.go
@@ -4,6 +4,7 @@ import (
"errors"
"fmt"
"strings"
"sync"
"time"

"github.com/avast/retry-go/v4"
@@ -13,10 +14,21 @@ import (

const failedToUpdateTaskStatusTemplate string = "Failed to change task status: %s"

type MutexMap struct {
m sync.Map
}

func (mm *MutexMap) Get(key string) *sync.Mutex {
log.Debug().Msgf("acquiring mutex for %s app", key)
m, _ := mm.m.LoadOrStore(key, &sync.Mutex{})
return m.(*sync.Mutex) // nolint:forcetypeassert // type assertion is guaranteed to be correct
}

type ArgoStatusUpdater struct {
argo Argo
registryProxyUrl string
retryOptions []retry.Option
mutex MutexMap
}

func (updater *ArgoStatusUpdater) Init(argo Argo, retryAttempts uint, retryDelay time.Duration, registryProxyUrl string) {
@@ -76,6 +88,48 @@ func (updater *ArgoStatusUpdater) waitForApplicationDeployment(task models.Task)
var application *models.Application
var err error

app, err := updater.argo.api.GetApplication(task.App)
if err != nil {
return nil, err
}

// This mutex is used only to avoid concurrent updates of the same application.
mutex := updater.mutex.Get(task.App)

// Locking the mutex here to unlock within the next if block without duplicating the code,
// avoiding defer to unlock before the function's end. This approach may be revised later
mutex.Lock()

if app.IsManagedByWatcher() && task.Validated {
Review comment (Collaborator):
I think the token verification should happen in the router. In case the token is invalid, fail the API call. In case it's successful - continue the execution. And for git commit deployment identification in the code, I'd add a separate parameter, for example GitCommitEnabled, so that it's explicit what the client wants.

log.Debug().Str("id", task.Id).Msg("Application managed by watcher. Initiating git repo update.")

// simplest way to deal with potential git conflicts
// need to be replaced with a more sophisticated solution after PoC
err := retry.Do(
func() error {
if err := app.UpdateGitImageTag(&task); err != nil {
return err
}
return nil
},
retry.DelayType(retry.FixedDelay),
retry.Attempts(3),
retry.OnRetry(func(n uint, err error) {
log.Warn().Str("id", task.Id).Msgf("Failed to update git repo. Error: %s, retrying...", err.Error())
}),
retry.LastErrorOnly(true),
)

mutex.Unlock()
if err != nil {
log.Error().Str("id", task.Id).Msgf("Failed to update git repo. Error: %s", err.Error())
return nil, err
}
} else {
mutex.Unlock()
log.Debug().Str("id", task.Id).Msg("Skipping git repo update: Application not managed by watcher or token is absent/invalid.")
}

// wait for application to get into deployed status or timeout
log.Debug().Str("id", task.Id).Msg("Waiting for rollout")
_ = retry.Do(func() error {
@@ -107,7 +161,7 @@ func (updater *ArgoStatusUpdater) waitForApplicationDeployment(task models.Task)
}

func (updater *ArgoStatusUpdater) handleArgoAPIFailure(task models.Task, err error) {
var apiFailureStatus string = models.StatusFailedMessage
var apiFailureStatus = models.StatusFailedMessage

// check if ArgoCD didn't have the app
if task.IsAppNotFoundError(err) {
@@ -122,8 +176,7 @@ func (updater *ArgoStatusUpdater) handleArgoAPIFailure(task models.Task, err err
reason := fmt.Sprintf(ArgoAPIErrorTemplate, err.Error())
log.Warn().Str("id", task.Id).Msgf("Deployment failed with status \"%s\". Aborting with error: %s", apiFailureStatus, reason)

errStatusChange := updater.argo.state.SetTaskStatus(task.Id, apiFailureStatus, reason)
if errStatusChange != nil {
log.Error().Str("id", task.Id).Msgf(failedToUpdateTaskStatusTemplate, errStatusChange)
if err := updater.argo.state.SetTaskStatus(task.Id, apiFailureStatus, reason); err != nil {
log.Error().Str("id", task.Id).Msgf(failedToUpdateTaskStatusTemplate, err)
}
}
46 changes: 38 additions & 8 deletions cmd/argo-watcher/argo_status_updater_test.go
@@ -2,6 +2,8 @@ package main

import (
"fmt"
"github.com/stretchr/testify/assert"
"sync"
"testing"
"time"

@@ -47,7 +49,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.Health.Status = "Healthy"

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().ResetFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusDeployedMessage, "")

@@ -136,7 +138,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.Health.Status = "Healthy"

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().ResetFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusDeployedMessage, "")

@@ -177,7 +179,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.Health.Status = "Healthy"

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().AddFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage,
"Application deployment failed. Rollout status \"not available\"\n\nList of current images (last app check):\n\ttest-registry/ghcr.io/shini4i/argo-watcher:dev\n\nList of expected images:\n\tghcr.io/shini4i/argo-watcher:dev")
@@ -265,9 +267,9 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
}

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(nil, fmt.Errorf("Unexpected failure"))
apiMock.EXPECT().GetApplication(task.App).Return(nil, fmt.Errorf("unexpected failure"))
metricsMock.EXPECT().AddFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage, "ArgoCD API Error: Unexpected failure")
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage, "ArgoCD API Error: unexpected failure")

// run the rollout
updater.WaitForRollout(task)
@@ -304,7 +306,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.Summary.Images = []string{"test-image:v0.0.1"}

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().AddFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage,
"Application deployment failed. Rollout status \"not available\"\n\nList of current images (last app check):\n\ttest-image:v0.0.1\n\nList of expected images:\n\tghcr.io/shini4i/argo-watcher:dev")
@@ -348,7 +350,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.OperationState.Message = "Not working test app"

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().AddFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage,
"Application deployment failed. Rollout status \"not synced\"\n\nApp status \"NotWorking\"\nApp message \"Not working test app\"\nResources:\n\t")
@@ -390,7 +392,7 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
application.Status.Health.Status = "NotHealthy"

// mock calls
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil)
apiMock.EXPECT().GetApplication(task.App).Return(&application, nil).Times(2)
metricsMock.EXPECT().AddFailedDeployment(task.App)
stateMock.EXPECT().SetTaskStatus(task.Id, models.StatusFailedMessage,
"Application deployment failed. Rollout status \"not healthy\"\n\nApp sync status \"Synced\"\nApp health status \"NotHealthy\"\nResources:\n\t")
@@ -399,3 +401,31 @@ func TestArgoStatusUpdaterCheck(t *testing.T) {
updater.WaitForRollout(task)
})
}

func TestMutexMapGet(t *testing.T) {
mm := &MutexMap{}

key := "testKey"
mutex1 := mm.Get(key)
assert.NotNil(t, mutex1)

// Fetch the mutex again
mutex2 := mm.Get(key)
assert.NotNil(t, mutex2)

// Ensure they're the same
assert.Equal(t, mutex1, mutex2)

// Test concurrency
wg := &sync.WaitGroup{}
const numRoutines = 50
for i := 0; i < numRoutines; i++ {
wg.Add(1)
go func() {
defer wg.Done()
m := mm.Get(key)
assert.Equal(t, mutex1, m)
}()
}
wg.Wait()
}
3 changes: 2 additions & 1 deletion cmd/argo-watcher/config/config.go
@@ -10,7 +10,7 @@ import (
)

const (
LOG_FORMAT_TEXT = "text"
LogFormatText = "text"
)

type ServerConfig struct {
@@ -33,6 +33,7 @@ type ServerConfig struct {
DbUser string `required:"false" envconfig:"DB_USER"`
DbPassword string `required:"false" envconfig:"DB_PASSWORD"`
DbMigrationsPath string `required:"false" envconfig:"DB_MIGRATIONS_PATH" default:"db/migrations"` // deprecated
DeployToken string `required:"false" envconfig:"ARGO_WATCHER_DEPLOY_TOKEN"`
}

// NewServerConfig parses the server configuration from environment variables using the envconfig package.
16 changes: 16 additions & 0 deletions cmd/argo-watcher/router.go
@@ -111,6 +111,22 @@ func (env *Env) addTask(c *gin.Context) {
return
}

// not an optimal solution, but for PoC it's fine
// need to find a better way to pass the token later
deployToken := c.GetHeader("ARGO_WATCHER_DEPLOY_TOKEN")

if deployToken != "" && deployToken == env.config.DeployToken {
log.Debug().Msgf("deploy token is validated for app %s", task.App)
task.Validated = true
} else if deployToken != "" && deployToken != env.config.DeployToken {
// if token is provided, but it's not valid we should not process the task
log.Warn().Msgf("deploy token is invalid for app %s, aborting", task.App)
c.JSON(http.StatusUnauthorized, models.TaskStatus{})
return
} else {
log.Debug().Msgf("deploy token is not provided for app %s", task.App)
}

newTask, err := env.argo.AddTask(task)
if err != nil {
log.Error().Msgf("Couldn't process new task. Got the following error: %s", err)
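The token handling added to `addTask` above distinguishes three cases: no token sent, a matching token, and a token that is present but wrong. The sketch below isolates that decision in a small pure function so the three outcomes are easy to test. The names `tokenResult` and `checkDeployToken` are hypothetical, and the constant-time comparison via `crypto/subtle` is a substitution of this sketch; the PR itself compares the strings with `==`.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// tokenResult describes the outcome of checking a deploy token.
type tokenResult int

const (
	tokenAbsent  tokenResult = iota // no token sent: plain status tracking
	tokenValid                      // token matches: the task may be marked validated
	tokenInvalid                    // token sent but wrong: the request should be rejected
)

// checkDeployToken classifies the token taken from the request header
// against the configured one. A non-empty token never matches an empty
// configured value, which mirrors the behavior of the check in the PR.
func checkDeployToken(provided, expected string) tokenResult {
	switch {
	case provided == "":
		return tokenAbsent
	case subtle.ConstantTimeCompare([]byte(provided), []byte(expected)) == 1:
		return tokenValid
	default:
		return tokenInvalid
	}
}

func main() {
	fmt.Println(checkDeployToken("", "secret") == tokenAbsent)
	fmt.Println(checkDeployToken("secret", "secret") == tokenValid)
	fmt.Println(checkDeployToken("nope", "secret") == tokenInvalid)
}
```

In the handler, `tokenInvalid` would map to the `http.StatusUnauthorized` response and `tokenValid` to setting `task.Validated`, which keeps the rejection in the router as the review comment on this PR suggests.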
2 changes: 1 addition & 1 deletion cmd/argo-watcher/server.go
@@ -15,7 +15,7 @@ import (
// If the log level string is invalid, it falls back to the default InfoLevel.
func initLogs(logLevel string, logFormat string) {
// set log format
if logFormat == config.LOG_FORMAT_TEXT {
if logFormat == config.LogFormatText {
output := zerolog.ConsoleWriter{Out: os.Stdout, TimeFormat: time.RFC3339}
log.Logger = zerolog.New(output).With().Timestamp().Logger()
}
4 changes: 2 additions & 2 deletions cmd/argo-watcher/state/postgres_state.go
@@ -36,7 +36,7 @@ func (state *PostgresState) Connect(serverConfig *config.ServerConfig) error {
// create connection
ormConfig := &gorm.Config{}
// we can leave logger enabled only for text format
if serverConfig.LogFormat != config.LOG_FORMAT_TEXT {
if serverConfig.LogFormat != config.LogFormatText {
// disable logging until we implement zerolog logger for ORM
ormConfig.Logger = logger.Default.LogMode(logger.Silent)
} else {
@@ -242,7 +242,7 @@ func (state *PostgresState) doProcessPostgresObsoleteTasks() error {

var result *gorm.DB

log.Debug().Msg("Marking app not found tasks older than 1 hour as aborted...")
log.Debug().Msg("Removing app not found tasks older than 1 hour from the database...")
result = state.orm.Where("status = ?", models.StatusAppNotFoundMessage).Where("created < now() - interval '1 hour'").Delete(&state_models.TaskModel{})
if result.Error != nil {
return result.Error