-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove calls to update vertica #39
Conversation
spilchen
commented
Aug 23, 2021
- removed calls to update_vertica in the install and uninstall reconcile. This significantly speeds up scale-up operations.
- new package (atconf) to handle modifying admintools.conf for install/uninstall
- added a method to copy a file into a pod. It is implemented as a cat exec call.
- had to explicitly accept the EULA now that we don't call update_vertica
- had to explicitly check the existence and permissions of some directories in /opt/vertica/config (share and logrotate)
- we pass the license in during the create_db command now that we don't call update_vertica anymore
- had to change the obfuscation code to check for --password rather than -w. I added code that did a call to 'test -w file', and the obfuscation code was adding '****' for the file name.
- moved ipv6 check function to its own package so that I can use it in the atconf package
- moved name of vertica server container to names package. This was necessary so that we can use the const in the new atconf package
@@ -1,14 +1,4 @@ | |||
apiVersion: v1 | |||
kind: Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is been a while for me so could you tell me why we do not need this anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the install is so fast, we no longer need the Event for it. So all of the events for install and uninstall were removed.
rm -rf /home/dbadmin/logrotate | ||
|
||
sudo mkdir -p /opt/vertica/config/licensing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we need sudo anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were over zealous with the sudo before and so it wasn't needed. The long term goal is to have no reliance on sudo as that will play nice when we support OpenShift. Removing calls to update_vertica was a key first step for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok now I get it thanks.
pkg/atconf/file_writer.go
Outdated
} | ||
} | ||
|
||
// AddHosts will had ips to an admintools.conf. New admintools.conf, stored in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor typo: had --> add.
I'm trying to get more details: is it only adding IPs to the hosts = entry in [Cluster] section?
Edit: nvm, found it's adding IPs to both hosts= entry and creating new compact node name entries in [Nodes] section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does it check duplicate IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: nvm, upon looking at the called functions I think my questions are answered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. We handle duplicate IPs by treating it as a no-op. I will update the comment here.
return nil | ||
} | ||
|
||
// removeNodes will remove the nodes section for the given set of IPs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if this function is expected to remove only compat21 node entries, if yes, should we assert or check somewhere to make sure we do not touch regular database nodes, at least mention in the comment? I understand those regular database nodes will be removed and are supposed to be removed by db_remove_subcluster/db_remove_nodes before the uninstallation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. We only remove compat21 node entries. The AT -t db_remove_nodes will handle removing of regular database nodes. I will mention in a comment.
pkg/controllers/at.go
Outdated
if !p.isPodRunning { | ||
continue | ||
} | ||
_, _, err := pr.CopyToPod(ctx, p.name, names.ServerContainer, atConfTempFile, paths.AdminToolsConf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that the copying succeed on some Pods but fail on other Pods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is possible. We make sure that all pods are running before we begin install/uninstall, but that just shortens the timing. Do you think it is necessary to have a step in the operator to ensure admintools.conf is the same on all of the pods? And copy if they are different. The same pod is always picked first, so it should be a well known location that has the correct version of admintools.conf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is necessary to have a step in the operator to ensure admintools.conf is the same on all of the pods?
-- Having a step regularly checking admintools.conf on all Pods may be overkilling, as long as the initiator of AT commands has the latest admintools.conf it should be ok. Maybe we can keep as is at this time, if tests show any problem, then we can see what we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the details of err message, just want to mention that it might be helpful if we know on which nodes the copy succeeded and on which the copy failed, so if there's at least one copy succeeded, then that copy can be the source of latest admintools.conf, such information could be helpful to help debugging and fix admintools.conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ningdeng based on your comments here, I improved the distributeAdmintoolsConf function.
- the first pod we copy to will always be the same. It is also the pod that we use for the base for any subsequent admintools.conf changes for install/uninstall
- if the first pod copied was okay, copy to all other pods. We save the errors and do checking at the end so that we attempt to copy to all pods
- if the copy failed at one of the pods, log an event saying that admintools.conf was partially copied.
You can see these changes in the commit c424386
This looks good to me overall, thanks! |
// Copy the admintools.conf to the rest of the pods. We will do error | ||
// checking at the end so that we try to copy it to each pod. | ||
errs := []error{} | ||
for _, p := range pf.Detail { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest improvements of copying admintools.conf change looks good to me. Thanks!
Just some minor notes as potential future improvements: I see this is a sequential operation as it uses a for-loop, starting from v11, AT uses a new file copying mechanism, which improves the performance of file distribution to the entire cluster a lot. So, just in case that in the future people suggest/request updating this copying to be a parallel operation on a very large cluster, AT distribute_config_files could be used to distribute the files. At this time, using CopyToPod is better IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks. This is good to know and I will keep it in mind.