
Log4j remediation - add fixes needed to rollout OCT-2021 RU (19.13) and introduce install-ahf.sh #104

Open

jcnars wants to merge 3 commits into base: master

Conversation

@jcnars (Collaborator) commented Dec 30, 2021

This PR bundles all of the fixes needed to remediate the Log4j vulnerability on BMX hosts, namely:

  • installing AHF 21.4.0.0.0
  • rolling out the 19.13 RU

install-ahf.sh wraps @mfielding's AHF installer Ansible snippet from: https://b.corp.google.com/issues/210842382#comment5

Below is a brief description of the errors encountered while rolling out 19.13 and the fixes that were needed:

Error 1:

Command used:
/usr/log4j/bms-toolkit/install-oracle.sh --ora-swlib-bucket gs://oracle-software --instance-ssh-user ansible9 --instance-ssh-key /usr/.ssh/id_rsa_bms_toolkit --instance-ssh-extra-args '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentityAgent=no -o IdentitiesOnly=true -o ProxyCommand="ssh -W %h:%p -q user@control-node.you.com"' --backup-dest +RECO --ora-swlib-path /u01/oracle_install --ora-version 19 --ora-swlib-type gcs --ora-asm-disks mntrl_asm_node1.json --ora-data-mounts mntrl_data_mounts_config.json --cluster-type NONE --ora-data-diskgroup DATA --ora-reco-diskgroup RECO --ora-db-name orcl --ora-db-container false --instance-ip-addr 192.10.110.1 --instance-hostname at-000-somehost


Successfully created the ./inventory_files directory to save the inventory files.

getopt: unrecognized option '--instance-ssh-extra-args'
Invalid options provided: --ora-swlib-bucket gs://bmaas-testing-oracle-software --instance-ssh-user ansible9 --instance-ssh-key /usr/local/google/home/jcnarasimhan/.ssh/id_rsa_bms_toolkit --instance-ssh-extra-args '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentityAgent=no -o IdentitiesOnly=true -o ProxyCommand="ssh -W %h:%p -q jcnarasimhan@control-node.mfild.com"' --backup-dest +RECO --ora-swlib-path /u01/oracle_install --ora-version 19 --ora-swlib-type gcs --ora-asm-disks mntrl_asm_node1.json --ora-data-mounts mntrl_data_mounts_config.json --cluster-type NONE --ora-data-diskgroup DATA --ora-reco-diskgroup RECO --ora-db-name orcl --ora-db-container false --instance-ip-addr 192.10.110.1 --instance-hostname at-000-somehost

Fix 1:

Add the following to install-oracle.sh:
GETOPT_OPTIONAL="$GETOPT_OPTIONAL,instance-ssh-key:,instance-hostname:,ntp-pref:,inventory-file:,compatible-rdbms:,instance-ssh-extra-args:"
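For context, a minimal sketch of how the new option could then be consumed inside the script's option-parsing loop; the variable name INSTANCE_SSH_EXTRA_ARGS and the case-branch wiring are assumptions for illustration, not necessarily what install-oracle.sh actually contains:

# Hypothetical branch inside the existing "while true; do case ..." loop;
# the variable name is illustrative only.
--instance-ssh-extra-args)
    INSTANCE_SSH_EXTRA_ARGS="$2"
    shift 2
    ;;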

Error 2:

A vanilla run of install-oracle.sh does not install the latest RU, because a recent change prevents the value of oracle_rel from being populated. Research details in: https://docs.google.com/document/d/1uYVmeYyH2dVazcNtVaWwp6g6Mr4t10OBf8IuwUuD8bI/edit#heading=h.55g4i2azkfhl

Fix 2:

Add the following to install-sw.yml:

- name: Populate variables
  hosts: dbasm
  tasks:
  - include_role:
      name: common
      tasks_from: populate-vars.yml
  tags: populate-vars
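
To verify this fix in isolation, the tagged play can be run on its own; the inventory path below is a placeholder for the file generated under ./inventory_files, not a value taken from this PR:

ansible-playbook -i inventory_files/<generated inventory file> install-sw.yml --tags populate-vars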

Error 3:

install-oracle.sh exited when it couldn't find the avahi daemon.

Fix 3:

Added ignore_errors: yes to the corresponding task in roles/base-provision/tasks/main.yml to let the installer continue.
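
For illustration only, the shape of the change looks roughly like this; the task name and module are assumptions, and only the added ignore_errors: yes line reflects the actual fix:

- name: base-provision | Stop avahi-daemon   # task name is illustrative
  service:
    name: avahi-daemon
    state: stopped
    enabled: no
  ignore_errors: yes   # the added line: keep going if avahi isn't present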

Error 4:

This was the most painstaking error to figure out. It turns out we were not following the suggested OFA standard: the ORACLE_BASE for GI was set to the same directory as the ORACLE_BASE for the RDBMS.
Error details are listed in https://b.corp.google.com/issues/211656972#comment24, but calling them out here for visibility:

grid_base error 1: when running gridSetup.sh:
[INS-35356] Oracle recommends that you install the database on all nodes that are part of the Oracle Grid Infrastructure cluster.
grid_base error 2: when running runInstaller for DB_HOME installation
TASK [rac-db-setup : rac-db-install | Run runInstaller] ************************
fatal: [at-00-002]: FAILED! => {"changed": true, "cmd": ["/u01/app/oracle/product/19.3.0/dbhome_1/runInstaller", "-silent", "-waitforcompletion", "-responseFile", "/u01/app/oracle/product/19.3.0/dbhome_1/db_install.rsp", "-ignorePrereqFailure"], "delta": "0:00:05.111164", "end": "2021-12-24 17:53:00.404171", "failed_when_result": true, "msg": "non-zero return code", "rc": 254, "start": "2021-12-24 17:52:55.293007", "stderr": "", "stderr_lines": [], "stdout": "Launching Oracle Database Setup Wizard...

[FATAL] [INS-32012] Unable to create directory: /u01/app/oracle, on this server.
   CAUSE: Either proper permissions were not granted to create the directory or there was no space left in the volume.
   ACTION: Check your permission on the selected directory or choose another directory.", "stdout_lines": ["Launching Oracle Database Setup Wizard...", "", "[FATAL] [INS-32012] Unable to create directory: /u01/app/oracle, on this server.", "   CAUSE: Either proper permissions were not granted to create the directory or there was no space left in the volume.", "   ACTION: Check your permission on the selected directory or choose another directory."]}
grid_base error 3: when patching:
[root@at-00-somesvr002 oracle_install]# /u01/app/19.3.0/grid/crs/install/rootcrs.sh -postpatch
Using configuration parameter file: /u01/app/19.3.0/grid/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/at-00-somehost/crsconfig/crs_postpatch_apply_inplace_at-00-somehost_2021-12-24_08-45-35PM.log
2021/12/24 20:45:41 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.service'
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.evmd' on 'at-00-somehost'
...
...
CRS-2672: Attempting to start 'ora.asm' on 'at-00-somehost'
CRS-2676: Start of 'ora.asm' on 'at-00-somehost' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'at-00-somehost'
ORA-15077: could not locate ASM instance serving a required diskgroup
CRS-5055: unable to connect to an ASM instance because no ASM instance is running in the cluster
CRS-2883: Resource 'ora.storage' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
PRVG-2031 : Owner of file "/u01/app/oracle" did not match the expected value on node "at-00-somehost". [Expected = "grid(802)" ; Found = "oracle(801)"]
PRVG-2031 : Owner of file "/u01/app/oracle/audit" did not match the expected value on node "at-00-somehost". [Expected = "grid(802)" ; Found = "oracle(801)"]
PRVG-2031 : Owner of file "/u01/app/oracle/admin" did not match the expected value on node "at-00-somehost". [Expected = "grid(802)" ; Found = "oracle(801)"]
CRS-4000: Command Start failed, or completed with errors.
2021/12/24 20:56:54 CLSRSC-117: Failed to start Oracle Clusterware stack from the Grid Infrastructure home /u01/app/19.3.0/grid
Died at /u01/app/19.3.0/grid/crs/install/crspatch.pm line 1811.
The command '/u01/app/19.3.0/grid/perl/bin/perl -I/u01/app/19.3.0/grid/perl/lib -I/u01/app/19.3.0/grid/crs/install -I/u01/app/19.3.0/grid/xag /u01/app/19.3.0/grid/crs/install/rootcrs.pl -postpatch' execution failed

Fix 4:

In light of the above errors, which stem from the mixed-up permissions on the shared base directory, added a new variable called grid_base in roles/common/defaults/main.yml and correspondingly modified roles/rac-gi-setup/templates/gridsetup.rsp.19.3.0.0.0.j2, so that the GI base is kept separate from the RDBMS base.
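
A minimal sketch of the two touch points; the default value and the response-file line shown here are illustrative, and the actual values live in the files named above:

# roles/common/defaults/main.yml (assumed default, distinct from the RDBMS base)
grid_base: /u01/app/grid

# roles/rac-gi-setup/templates/gridsetup.rsp.19.3.0.0.0.j2 (render GI base from the new variable)
ORACLE_BASE={{ grid_base }}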

Error 5:

As noted in https://mikedietrichde.com/2021/04/22/oracle-19c-installation-with-19-11-0-ru-ojvm-and-some-other-fixes/ (especially the portion highlighted in https://screenshot.googleplex.com/7XbsnHPzu2aXH4g) and in SR 3-26257617871 (Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile ins_rdbms.mk), when applyOneOffs is provided in conjunction with runInstaller, the installer fails with:

Error in invoking target 'irman ioracle idrdactl idrdalsnr idrdaproc' of makefile ins_rdbms.mk

Fix 5:

This is an ongoing bug since APR-2021 that's expected to be closed in the next quarter.
This left us with 2 options for the 19.13 OCT'21 RU:
(1) Modify the existing codebase permanently to include new logic for patching OJVM separately in the RDBMS_HOME,
or
(2) Remove the applyOneOffs flag from the runInstaller command, so that OJVM patch application is handled after install-oracle.sh finishes with a single command: opatch apply.

Went with (2), given the additional logic option (1) would introduce and the temporary nature of the bug, and removed applyOneOffs. A sketch of the follow-up step is shown below.
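
For illustration, the post-install OJVM application could look like the following; ORACLE_HOME and the staging directory are placeholders based on this environment's conventions, and the OJVM patch directory name is intentionally left generic:

# Run as the oracle software owner, with instances in this home shut down.
export ORACLE_HOME=/u01/app/oracle/product/19.3.0/dbhome_1
export PATH=$ORACLE_HOME/OPatch:$PATH
cd /u01/oracle_install/<unzipped OJVM patch directory>
opatch apply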

Error 6:

Not really an error, but 19.13 patch metadata was added to the file roles/common/defaults/main.yml.
The incorrect 19.13 patch metadata in #102 can be taken out and that change resubmitted just for 12c & 18c after re-verifying them.
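
Purely as an illustration of where this metadata lives, an entry might look like the following; the list name, field names, and values are hypothetical and do not claim to match the toolkit's actual schema in roles/common/defaults/main.yml:

# roles/common/defaults/main.yml -- illustrative shape only
gi_patches:
  - release: "19.13"                  # hypothetical entry for the OCT-2021 RU
    patchnum: "<RU patch number>"
    patchfile: "<RU patch zip file name>"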

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcnars

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
