Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the monit_syncd, from docker-syncd-brcm-dnx. #8003

Merged
merged 2 commits into from
Jun 29, 2021

Conversation

judyjoseph
Copy link
Contributor

@judyjoseph judyjoseph commented Jun 29, 2021

Why I did it

Remove the file platform/broadcom/docker-syncd-brcm-dnx/base_image_files/monit_syncd, from new docker "docker-syncd-brcm-dnx" which was added recently via #7598

There was a activity for monit in PR #7676, which was not updated in the PR #7598 ( both PRs were worked upon in parallel )

How I did it

How to verify it

This was resulting in config load_minigraph to fail. Verified after removing the file /etc/monit/conf.d/monit_syncd

admin@str-s6000-acs-11:~$ sudo config load_minigraph -y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -H -m -j /etc/sonic/init_cfg.json   --write-to-db
Running command: pfcwd start_default
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Running command: config qos reload --no-dynamic-buffer
Running command: /usr/local/bin/sonic-cfggen  -d --write-to-db -t /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/Force10-S6000/buffers.json.j2,config-db -t /usr/share/sonic/device/x86_64-dell_s6000_s1220-r0/Force10-S6000/qos.json.j2,config-db -y /etc/sonic/sonic_version.yml
Resetting failed status on aaastatsd.service
Resetting failed status on bgp.service
Resetting failed status on caclmgrd.service
Resetting failed status on dhcp_relay.service
Resetting failed status on hostcfgd.service
Resetting failed status on hostname-config.service
Resetting failed status on interfaces-config.service
Resetting failed status on lldp.service
Resetting failed status on macsec.service
Resetting failed status on mgmt-framework.timer
Resetting failed status on nat.service
Resetting failed status on ntp-config.service
Resetting failed status on pmon.service
Resetting failed status on procdockerstatsd.service
Resetting failed status on radv.service
Resetting failed status on rsyslog-config.service
Resetting failed status on sflow.service
Resetting failed status on swss.service
Resetting failed status on syncd.service
Resetting failed status on teamd.service
Resetting failed status on telemetry.timer
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Please note setting loaded from minigraph will be lost after system reboot. To preserve setting, run `config save`.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

@judyjoseph judyjoseph requested a review from lguohan as a code owner June 29, 2021 00:39
yozhao101
yozhao101 previously approved these changes Jun 29, 2021
@judyjoseph judyjoseph requested a review from yxieca June 29, 2021 01:12
@judyjoseph
Copy link
Contributor Author

judyjoseph commented Jun 29, 2021

Thanks @yxieca for identifying this from nightly test logs, verified config load_minigraph works ok on a sonic device without this file monit_syncd.

@jleveque jleveque requested a review from yozhao101 June 29, 2021 17:01
@lguohan
Copy link
Collaborator

lguohan commented Jun 30, 2021

it file should be only in dnx image, why is this affecting xgs image at all?

@judyjoseph
Copy link
Contributor Author

it file should be only in dnx image, why is this affecting xgs image at all?

I did some checks on why this happened.

Both the dnx/xgs images and their respective docker-syncd & docker-syncd-dnx are build together. The docker package name for xgs/dnx is different, but the container/process names is same for both.

template/docker-syncd-base.mk:$(DOCKER_SYNCD_BASE)_CONTAINER_NAME = syncd
docker-syncd-brcm-dnx.mk:$(DOCKER_SYNCD_DNX_BASE)_CONTAINER_NAME = syncd

docker-syncd-brcm.mk:$(DOCKER_SYNCD_BASE)_PACKAGE_NAME = syncd
docker-syncd-brcm-dnx.mk:$(DOCKER_SYNCD_DNX_BASE)_PACKAGE_NAME = syncd-dnx

So keeping the same monit_syncd file in both docker-syncd-brcm and docker-syncd-brcm-dnx was ok, as the docker/process names are same.

But when we copy these files while building the image - in files/build_templates/sonic_debian_extension.j2, we copy all the monit files present to /etc/monit/conf.d/ as below

sudo cp $IMAGE_CONFIGS/monit/conf.d/* $FILESYSTEM_ROOT/etc/monit/conf.d/

Since we include both the docker-syncd-brcm.mk & docker-syncd-brcm-dnx.mk in the platform/broadcom/rules.mk, the monit_syncd got copied to both the XGS and DNX builds and we didn't differentiate monit_syncd in each of the dockers as the processes they monitor were with same name.

@lguohan
Copy link
Collaborator

lguohan commented Jun 30, 2021

Since we include both the docker-syncd-brcm.mk & docker-syncd-brcm-dnx.mk in the platform/broadcom/rules.mk, the monit_syncd got copied to both the XGS and DNX builds and we didn't differentiate monit_syncd in each of the dockers as the processes they monitor were with same name.

this seems problematic, can we fix it?

@judyjoseph judyjoseph deleted the remove_monit_dnx branch July 1, 2021 00:02
@judyjoseph
Copy link
Contributor Author

judyjoseph commented Jul 1, 2021

Since we include both the docker-syncd-brcm.mk & docker-syncd-brcm-dnx.mk in the platform/broadcom/rules.mk, the monit_syncd got copied to both the XGS and DNX builds and we didn't differentiate monit_syncd in each of the dockers as the processes they monitor were with same name.

this seems problematic, can we fix it?

Since we do build both the syncd/syncd-dnx & broadcom/broadcim-dnx images in the same make command flow -- the approach taken was

  1. if we have common files which have same contents in both platforms, ( mostly the files with _BASE_IMAGE_FILES ) we just allow them to be copied as it was done currently. It should work fine independently in XGS/DNX platforms as we keep the docker/process name same between these dockers.

  2. There is a new variable _MACHINE which was added to docker-syncd*mk files that are used to differentiate files like for eg: the start scripts like syncd.sh which also has the same name, but the contents are different.

    Here as a intermediate step in slave.mk (https://github.com/Azure/sonic-buildimage/blob/master/slave.mk#L1012) it is prefixed with the _MACHINE defined and later copied correctly in sonic_debian_extension.j2 (https://github.com/Azure/sonic-buildimage/blob/master/files/build_templates/sonic_debian_extension.j2#L730)

docker-syncd-brcm-dnx.mk:$(DOCKER_SYNCD_DNX_BASE)_MACHINE = broadcom-dnx
docker-syncd-brcm.mk:$(DOCKER_SYNCD_BASE)_MACHINE = broadcom

This approach helped to differentiate though we include docker-syncd-brcm.mk & docker-syncd-brcm-dnx.mk in the platform/broadcom/rules.mk. Let me know your thoughts - thanks.

carl-nokia pushed a commit to carl-nokia/sonic-buildimage that referenced this pull request Aug 7, 2021
Remove the references to file monit_syncd from docker-syncd-brcm-dnx, which got missed as the PR sonic-net#7598 overlapped sonic-net#7676
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants