These acceptance tests are based on the NERC Operational Use Cases.

Reference for NERC OpenStack: https://nerc-project.github.io/nerc-docs/get-started/user-onboarding-on-NERC/

Acceptance testers will require access to the following applications:

  • OpenShift admin access (cluster-admins, nerc-org-admins, nerc-ops groups in OpenShift) to access the Observability dashboards and cluster logging.
  • ColdFront admin access, because most OpenShift verification steps and some of the ColdFront verification steps (delete a user) require admin access.
  • VPN access to XDMoD, to view the reports for OpenShift resources.

Managing Users

  1. Request a new account
    • As a new user, I should be able to create an account for myself in the NERC.
    • Criteria
      1. A prospective user follows the steps documented in https://nerc-project.github.io/nerc-docs/get-started/create-a-user-portal-account/ to create an account on the NERC.
    • Acceptance tests:
      1. Check that an OpenShift User exists, with access to the project allocations:
        1. Create a new user account following the How to Create a User Account documentation.
        2. The user must accept the Acceptable Use Notice during the sign-up process. The notice is shown to all users, and an account cannot be created without accepting it.
        3. For a given username of nerc-test-account for example, check that the given username is listed in the oc CLI:
          $ oc get user/nerc-test-account
          NAME               UID                                   FULL NAME  IDENTITIES
          nerc-test-account  db6324f8-e3df-4543-a40c-3fecb91b5204
        4. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef for example, check the user's RoleBindings exist:
          $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
          Name:         edit
          Labels:       <none>
          Annotations:  <none>
          Role:
            Kind:  ClusterRole
            Name:  edit
          Subjects:
            Kind  Name                       Namespace
            ----  ----                       ---------
            User  nerc-test-account
        5. Verify that the given username has a RoleBinding with Role ref of edit, for the project's namespaces.
        6. Try logging into OpenShift as the user to test the Keycloak authentication.
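The RoleBinding check in the steps above can also be scripted. The following is a minimal sketch, assuming the text layout of `oc describe rolebinding/edit` shown in this document; the helper names `describe_subjects` and `user_has_binding` are hypothetical and not part of any NERC tooling.

```python
# Sample output copied from the acceptance test above.
SAMPLE_DESCRIBE = """\
Name:         edit
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  edit
Subjects:
  Kind  Name                       Namespace
  ----  ----                       ---------
  User  nerc-test-account
"""


def describe_subjects(describe_output):
    """Return (kind, name) pairs from the Subjects table of `oc describe rolebinding`."""
    subjects = []
    in_subjects = False
    for line in describe_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Subjects:"):
            in_subjects = True
            continue
        if not in_subjects or not stripped:
            continue
        fields = stripped.split()
        # Skip the column header row and the dashed separator row.
        if fields[0] in ("Kind", "----"):
            continue
        if len(fields) >= 2:
            subjects.append((fields[0], fields[1]))
    return subjects


def user_has_binding(describe_output, username):
    """True when the given username appears as a User subject."""
    return ("User", username) in describe_subjects(describe_output)


if __name__ == "__main__":
    print(user_has_binding(SAMPLE_DESCRIBE, "nerc-test-account"))
```

Parsing the human-readable table is fragile if the `oc` output format changes; requesting `-o json` and reading the `subjects` field is usually the more robust option.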
  2. Remove a user
    • As an administrator, I should be able to remove a user from their projects and allocations.
    • Criteria
      1. A NERC admin can follow the steps documented in What is NERC's ColdFront? to remove a user from a project on the NERC. A NERC admin can also use a private runbook to disable or deactivate a user in Keycloak.
      2. The user can no longer access their project.
    • Acceptance tests:
      1. A NERC admin can follow the steps documented in What is NERC's ColdFront? to remove a user from a project on the NERC.
      2. Check that the user no longer has access to the project in OpenShift.
        1. For a given username of nerc-test-account for example, click here to check the user's RoleBindings.
        2. Verify that the given username no longer has a RoleBinding with Role ref of edit, for the project's namespaces.
  3. Add/Remove PI privilege to a user
    • For any user account, the administrator should be able to add or remove PI status associated with that account. A user may be a PI on multiple projects, but a project can have only 1 PI.
    • Criteria
      1. User creates a new account as described in section 1. See the ColdFront documentation here.
      2. User fills out the PI request form.
      3. To approve the request, a NERC admin assigns the user the PI role in Keycloak's user management.
      4. NERC admin approves the request and responds with a ticket reply.
      5. To remove the PI role, reverse step 3 and send the user an email informing them of the change.
    • Acceptance tests:
      1. Create the new user account in ColdFront.
      2. Fill out the PI request form.
      3. Assign the USER the PI role in Keycloak.
      4. Check that an OpenShift User exists, with access to the project allocations:
        1. Approve the user's request in ColdFront.
        2. Because ColdFront users and managers will have the same access of edit on the project, there is no difference in OpenShift roles between a user and a manager.
        3. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef for example, check the user's RoleBindings exist:
          $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
          Name:         edit
          Labels:       <none>
          Annotations:  <none>
          Role:
            Kind:  ClusterRole
            Name:  edit
          Subjects:
            Kind  Name                       Namespace
            ----  ----                       ---------
            User  nerc-test-account
        4. Verify that the given username has a RoleBinding with Role ref of edit, for the project's namespaces.
      5. Remove the PI role and check that the OpenShift User still exists, with access to the project allocations:
        1. See the Adding User to Manager Role documentation to also remove the PI role from a user.
        2. Click on the edit icon next to the user's name on the Project Detail page.
        3. Then toggle the "Role" from Manager to User.
        4. Approve the request to remove the PI role from the user in ColdFront.
        5. Because ColdFront users and managers will have the same access of edit on the project, there is no difference in OpenShift roles between a user and a manager.
        6. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef for example, check the user's RoleBindings exist:
          $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
          Name:         edit
          Labels:       <none>
          Annotations:  <none>
          Role:
            Kind:  ClusterRole
            Name:  edit
          Subjects:
            Kind  Name                       Namespace
            ----  ----                       ---------
            User  nerc-test-account
        7. Verify that the given username has a RoleBinding with Role ref of edit, for the project's namespaces.

Manage Projects

  1. Add a new project
    • As a user who is a PI, I should be able to create a project.
    • Criteria
      1. User has previously been set as a PI.
      2. User logs in to ColdFront and requests a project by going to Home → Projects and by clicking on Create Project. See the ColdFront documentation here.
      3. User requests a resource allocation on the OpenShift resource for the above created project.
      4. Administrator approves the request.
      5. Upon approval, a project will be created in OpenShift for this particular allocation. If the user does not already exist in OpenShift, the user will be created. The project created will have the following attributes:
        1. Project name prefixed by a random 6-character hex string
        2. Project ID / namespace as a UUID
        3. Requested quota attributes.
      6. The user will be able to authenticate using Keycloak and their institutional login.
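The naming attributes listed above can be checked mechanically. This is a rough sketch, assuming the hex prefix is lowercase (Kubernetes namespace names are lowercase) and that the namespace uses the canonical hyphenated UUID form seen in the examples in this document; the function names are hypothetical.

```python
import re
import uuid

# Assumption: the 6-character prefix uses lowercase hex digits.
HEX_PREFIX = re.compile(r"^[0-9a-f]{6}")


def has_hex_prefix(project_name):
    """True when the project name starts with six hex characters."""
    return bool(HEX_PREFIX.match(project_name))


def is_uuid_namespace(namespace):
    """True when the namespace is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(namespace)) == namespace.lower()
    except ValueError:
        return False


if __name__ == "__main__":
    print(has_hex_prefix("012345myproject"))
    print(is_uuid_namespace("01234567-89ab-cdef-0123-456789abcdef"))
```

A prefix check like this only confirms the shape of the name. It cannot prove the prefix was randomly generated, since an ordinary name may happen to start with six hex characters.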
    • Acceptance tests:
      1. Set up the user as a PI in ColdFront; see Add/Remove PI privilege to a user above.
      2. Log into ColdFront as the user and request a project. See the ColdFront documentation here.
      3. As the user, request an OpenShift resource allocation in ColdFront.
        1. The PI must accept the End User License Agreement for the resource allocation request, for each new resource allocation. (See image below and "Placeholder for EULA" text box for OpenStack. It is only displayed to the PI requesting the allocation at the moment of the request and not to other users that may be later added.)

          Resource Allocation EULA

      4. Log into ColdFront as an admin and approve the request.
      5. Validate the project was created with the requested quota:
        1. For a given project named 012345myproject for example, check that the given project is listed in the oc CLI:

          $ oc get project/012345myproject
          NAME               DISPLAY NAME  STATUS
          012345myproject                  Active
        2. For a given namespace named 01234567-89ab-cdef-0123-456789abcdef for example, check that the given namespace is listed in the oc CLI:

          $ oc get namespace/01234567-89ab-cdef-0123-456789abcdef
          NAME                                  STATUS  AGE
          01234567-89ab-cdef-0123-456789abcdef  Active  14m
        3. You can explore quotas from within the Observability dashboard. For a given project named 012345myproject, and resource limits.cpu or limits.memory for example, click here to check the value for the type=hard (max limit) and type=used (current value).

          Observability Limits CPU

      6. Check that an OpenShift User exists, with access to the project allocations:
        1. For a given username of nerc-test-account for example, check that the given username is listed in the oc CLI:
          $ oc get user/nerc-test-account
          NAME               UID                                   FULL NAME  IDENTITIES
          nerc-test-account  db6324f8-e3df-4543-a40c-3fecb91b5204
        2. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef for example, check the user's RoleBindings exist:
          $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
          Name:         edit
          Labels:       <none>
          Annotations:  <none>
          Role:
            Kind:  ClusterRole
            Name:  edit
          Subjects:
            Kind  Name                       Namespace
            ----  ----                       ---------
            User  nerc-test-account
        3. Verify that the given username has a RoleBinding with Role ref of edit, for the project's namespaces.
        4. Try logging into OpenShift as the user to test the Keycloak authentication.
  2. Deactivate a project or resource allocation
    • As an administrator, I should be able to archive any project or resource allocation and release the resources associated with it back to the pool.
    • Criteria
      1. A ColdFront admin can navigate to the project. See the ColdFront documentation here.
        1. They can archive a project and expire all associated allocations by navigating to the project and clicking archive project.
        2. They can navigate to an allocation, set the status to Denied, and update the allocation.
      2. Disabling an allocation will delete the associated OpenShift namespace, unlike OpenStack, which simply disables the project.
    • Acceptance tests:
      1. Validate that the project allocations have been removed from the project users and managers.
        1. As an admin in ColdFront, archive the project.
        2. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef for example, check the user's RoleBindings have been removed:
          1. As an admin in ColdFront, set the allocation status to Denied, and update the allocation.
          2. Check that the RoleBinding no longer exists:
            $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
            Name:         edit
            Labels:       <none>
            Annotations:  <none>
            Role:
              Kind:  ClusterRole
              Name:  edit
            Subjects:
              Kind  Name                       Namespace
              ----  ----                       ---------
      2. Validate the project was deleted, as well as the namespaces:
        1. As an admin in ColdFront, disable the allocation for the project.
        2. For a given project named 012345myproject for example, check that the given project is no longer listed in the oc CLI:
          $ oc get project/012345myproject
          NAME               DISPLAY NAME  STATUS
        3. For a given namespace named 01234567-89ab-cdef-0123-456789abcdef for example, check that the given namespace is no longer listed in the oc CLI:
          $ oc get namespace/01234567-89ab-cdef-0123-456789abcdef
          NAME                                  STATUS  AGE
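The deletion checks above can be automated by looking at the exit status of `oc get` instead of reading the printed table. The sketch below assumes that `oc`, like `kubectl`, exits non-zero and writes a `NotFound` error to stderr when a named resource does not exist; `is_not_found` and `resource_absent` are hypothetical helper names.

```python
import subprocess


def is_not_found(returncode, stderr):
    """Interpret an `oc get` result: non-zero exit plus a NotFound error."""
    return returncode != 0 and "NotFound" in stderr


def resource_absent(kind, name):
    """Run `oc get <kind>/<name>` and report whether the resource is gone.

    Requires a logged-in `oc` session; it is not called at import time.
    """
    result = subprocess.run(
        ["oc", "get", f"{kind}/{name}"],
        capture_output=True,
        text=True,
    )
    return is_not_found(result.returncode, result.stderr)


if __name__ == "__main__":
    # Example identifiers taken from this document.
    print(resource_absent("project", "012345myproject"))
    print(resource_absent("namespace", "01234567-89ab-cdef-0123-456789abcdef"))
```

Checking for the `NotFound` marker instead of any failure avoids treating an expired login or a network error as proof that the project was deleted.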
  3. Manage a project as a PI.
    • As a PI, I should be able to manage and share my project with others on the team, but no one except the administrator should be able to remove the project.
    • Criteria
      1. A PI can add Keycloak users to a ColdFront project under the users section in the given project (https://nerc-project.github.io/nerc-docs/get-started/get-an-allocation/#adding-and-removing-user-from-the-project).
      2. From here, a PI can set the user to a particular role.
        1. The manager role has an edit role to the project, and is the one that lets users create and remove allocations by delegating PI role/responsibilities in ColdFront.
        2. The user role also has an edit role to the project, but cannot create and remove allocations.
    • Acceptance tests:
      1. Because ColdFront gives the same edit role to a Manager and a User, you can expect all users and PIs in a project to share the same role. For a given namespace named 01234567-89ab-cdef-0123-456789abcdef, and a given user named nerc-test-account, and the given role edit, check that the given project contains a RoleBinding with a Role ref of edit, a Subject kind of User, and a Subject name of nerc-test-account in the oc CLI:
        1. oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
      2. After making any changes to user roles, check that the given project contains a RoleBinding with a Role ref of edit, a Subject kind of User, and a Subject name of nerc-test-account for all Users and PIs in the oc CLI:
        1. oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
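When a project has several users and PIs, checking each one by eye becomes tedious. The following sketch works on the JSON form of the same RoleBinding, for example the output of `oc -n <namespace> get rolebinding/edit -o json`, and reports which expected usernames are missing. The function name and the second username `nerc-pi-account` are hypothetical.

```python
import json

# Sample shaped like a Kubernetes RoleBinding object; values follow this document's examples.
SAMPLE_ROLEBINDING = json.dumps({
    "kind": "RoleBinding",
    "metadata": {
        "name": "edit",
        "namespace": "01234567-89ab-cdef-0123-456789abcdef",
    },
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "ClusterRole",
        "name": "edit",
    },
    "subjects": [
        {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "User",
            "name": "nerc-test-account",
        },
    ],
})


def missing_edit_subjects(rolebinding_json, usernames):
    """Return the sorted usernames that are NOT bound to the edit ClusterRole."""
    binding = json.loads(rolebinding_json)
    role_ref = binding.get("roleRef", {})
    # If the binding does not reference ClusterRole/edit, nobody is covered by it.
    if role_ref.get("kind") != "ClusterRole" or role_ref.get("name") != "edit":
        return sorted(usernames)
    bound = {
        subject.get("name")
        for subject in binding.get("subjects") or []
        if subject.get("kind") == "User"
    }
    return sorted(set(usernames) - bound)


if __name__ == "__main__":
    expected = ["nerc-test-account", "nerc-pi-account"]
    print(missing_edit_subjects(SAMPLE_ROLEBINDING, expected))
```

An empty result means every expected user and PI is a subject of the `edit` RoleBinding, which is the condition both acceptance tests above look for.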

Manage Quota

  1. Set and modify quotas for projects
    • As an administrator of the cluster, I should be able to set and modify compute, storage and object counts quotas for any project.
    • Criteria
      1. For modifying attributes, allocation change requests can be requested by navigating to the active allocation. See the ColdFront documentation here.
      2. From here, an admin can approve the request and a call to the acct-mgt service will be made.
        1. For setting attributes, adding a new allocation attribute triggers a call to the acct-mgt service endpoint /projects/{project_id}/quota.
    • Acceptance tests:
      1. As a ColdFront admin, make a request to change an allocation's attributes.
      2. As a ColdFront admin, approve the request.
        1. You can explore quotas from within the Observability dashboard. For a given project named 012345myproject, and resource limits.cpu or limits.memory for example, click here to check the value has been updated for the type=hard (max limit) and type=used (current value).

          Observability Limits CPU
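As a cross-check against the dashboard, the same hard and used values can be read from the ResourceQuota objects in the namespace, for example from the output of `oc -n <namespace> get resourcequota -o json`. This is a sketch only: the quota name `example-quota` and the sample values are made up for illustration, and the function name is hypothetical.

```python
import json

# Sample shaped like a Kubernetes List of ResourceQuota objects (illustrative values).
SAMPLE_QUOTAS = json.dumps({
    "kind": "List",
    "items": [
        {
            "kind": "ResourceQuota",
            "metadata": {"name": "example-quota"},
            "status": {
                "hard": {"limits.cpu": "2", "limits.memory": "4Gi"},
                "used": {"limits.cpu": "500m", "limits.memory": "1Gi"},
            },
        },
    ],
})


def quota_values(resourcequota_list_json, resource):
    """Map each quota name to its (hard, used) values for one resource.

    Values are returned as Kubernetes quantity strings and are not converted.
    """
    data = json.loads(resourcequota_list_json)
    results = {}
    for item in data.get("items", []):
        status = item.get("status", {})
        hard = status.get("hard", {})
        if resource in hard:
            used = status.get("used", {}).get(resource)
            results[item["metadata"]["name"]] = (hard[resource], used)
    return results


if __name__ == "__main__":
    print(quota_values(SAMPLE_QUOTAS, "limits.cpu"))
```

After an allocation change request is approved, the `hard` value for the changed attribute should match the newly requested quota.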

Managing Authorization Policy

  1. View and Manage Role bindings
    • As an administrator of the cluster, I should be able to create, view and manage role bindings for the users in the cluster.
    • Criteria
      1. After a user is added, an admin can go to the user actions tab and set their role to manager or user.
        1. https://nerc-project.github.io/nerc-docs/get-started/get-an-allocation/#adding-and-removing-user-from-the-project
    • Acceptance tests:
      1. For a given username of nerc-test-account and a namespace of 01234567-89ab-cdef-0123-456789abcdef and a role of edit for example, check the user's RoleBindings exist:
        1. As a ColdFront admin, set the user's role to manager.
        2. Check the user role bindings.
          $ oc -n 01234567-89ab-cdef-0123-456789abcdef describe rolebinding/edit
          Name:         edit
          Labels:       <none>
          Annotations:  <none>
          Role:
            Kind:  ClusterRole
            Name:  edit
          Subjects:
            Kind  Name                       Namespace
            ----  ----                       ---------
            User  nerc-test-account
        3. Because ColdFront users and managers will have the same access of edit on the project, there is no difference in OpenShift roles between a user and a manager.

Documentation

  1. Online Documentation

Hardware Management

  1. Add and track new hardware
    • As an administrator of the cluster, I should be able to add nodes to the cluster. I should also be able to view all the nodes and their status.
    • Criteria
      1. Netbox
    • Acceptance tests:
      1. Add new nodes to the cluster.
        1. Here is the spreadsheet for managing new hardware.

        2. Click here to view the ACM Observability Grafana dashboards. These dashboards provide insights into Control Plane Health, Optimization, Capacity, Utilization and more. You can change the timespan in the top right to show results in terms of minutes, hours, days, months or years.

          Observability Dashboards

  2. Track faulty hardware
    • As an administrator of the cluster, I should be able to view and track the list of faulty nodes that need to be replaced.
    • Criteria
      1. Nagios or refer to notes in Netbox
    • Acceptance tests:
      1. Track faulty nodes that need to be replaced.
        1. Here is the spreadsheet for managing faulty hardware.

        2. Click here to view the ACM Observability Grafana dashboards. These dashboards provide insights into Control Plane Health, Optimization, Capacity, Utilization and more. You can change the timespan in the top right to show results in terms of minutes, hours, days, months or years.

          Observability Dashboards

Upgrade

  1. Establish OpenShift cluster upgrade process

Monitoring

  1. Generate and share operations alerts
  2. Logging
    • As an administrator of the cluster, I should be able to track all the events in the cluster using the logging system in OpenShift.
    • Criteria
      1. Click here to visit the Multi Cluster Logging.
      2. You can easily filter by recent date, or date range in the past.
      3. You can easily filter by content, namespaces, pods, and containers.
      4. You can also filter by log levels: critical, error, warning, info, debug, trace, unknown.
      5. Click "Show Query" to add more advanced filters like cluster ID:
        1. Here are the logs for the infra cluster. You can also add the following query to the end of your log query to filter on infra cluster logs: | openshift_cluster_id="b3c6e302-f119-4adb-bc48-e04c6aa2eaa5"
        2. Here are the logs for the prod cluster. You can also add the following query to the end of your log query to filter on prod cluster logs: | openshift_cluster_id="fcb727d6-3e61-4d23-913d-756cf41c7982"
      6. NERC Admins have access to application logs.
      7. Infrastructure and audit logs have always been reserved to cluster admins in OpenShift Logging (even on the old stack with Elasticsearch). LokiStack is best configured for admin access via a group (currently we support three dedicated names: cluster-admin, dedicated-admin, and the standard group for kubeadmin). These groups require a ClusterRoleBinding to the ClusterAdmin ClusterRole.
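The cluster ID filters described above are appended to the end of an existing log query. A small sketch of that string construction follows; the cluster IDs come from this document, while the base query `{ log_type="application" }` and the helper name are assumptions for illustration.

```python
# Cluster IDs as listed in the Logging criteria above.
CLUSTER_IDS = {
    "infra": "b3c6e302-f119-4adb-bc48-e04c6aa2eaa5",
    "prod": "fcb727d6-3e61-4d23-913d-756cf41c7982",
}


def with_cluster_filter(log_query, cluster_id):
    """Append the openshift_cluster_id filter to the end of a log query."""
    return f'{log_query.rstrip()} | openshift_cluster_id="{cluster_id}"'


if __name__ == "__main__":
    # Assumed base query; replace with the query shown under "Show Query".
    base_query = '{ log_type="application" }'
    print(with_cluster_filter(base_query, CLUSTER_IDS["prod"]))
```

The resulting string can be pasted into the query box that appears after clicking "Show Query".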
    • Acceptance tests:
      1. Click here to visit the Multi Cluster Logging.

      2. Explore the logs as described and ensure you are finding the logs you are looking for.

        Application Logging

      3. Add any dashboards and alerts you wish to test.

        1. Log archiving and rollover could run the Ceph Storage out of space. Check on log storage space consumed vs. available using these OpenShift metrics:
          1. OpenShift Data Foundations Ceph Storage Total Storage

            Ceph Total Storage

          2. OpenShift Data Foundations Ceph Storage Storage Used

            Ceph Storage Used

          3. OpenShift Data Foundations Ceph Storage Percent Used

            Ceph Percent Used
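The percent-used figure is the used storage divided by the total storage. The sketch below shows that arithmetic with a simple threshold check; the 80% default threshold is an assumption for illustration, not a NERC policy, and the function names are hypothetical.

```python
def percent_used(used_bytes, total_bytes):
    """Return storage consumption as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def needs_attention(used_bytes, total_bytes, threshold_percent=80.0):
    """True when consumption has reached the (assumed) alert threshold."""
    return percent_used(used_bytes, total_bytes) >= threshold_percent


if __name__ == "__main__":
    # Illustrative numbers only: 50 units used out of 200 total.
    print(percent_used(50, 200))
    print(needs_attention(50, 200))
```

Feeding the Total Storage and Storage Used values from the dashboards into this calculation should reproduce the Percent Used panel.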

  3. Monitoring and logging for the infrastructure hardware and software that is not OpenShift (for example Grafana)

Reporting

  1. Track/report usage of the cluster
    • As an administrator of the cluster, I should be able to view daily, weekly and monthly reports of the cluster infrastructure utilization.
    • Criteria
      1. Administrator logs into the associated XDMoD instance and views reports.
    • Acceptance tests:
      1. As an administrator, check that the XDMoD utilization of OpenShift resources matches the cpu and memory reported in ACM Observability:
        1. As an admin in XDMoD, view the reports for OpenShift resources.

        2. Click here to view the ACM Observability Grafana dashboards. These dashboards provide insights into Control Plane Health, Optimization, Capacity, Utilization and more. You can change the timespan in the top right to show results in terms of minutes, hours, days, months or years.

          Observability Dashboards

  2. Track/report usage of the project

Data Management

  1. Access operational logs for at least 30 days

    • As an administrator of the cluster, I should be able to access operational logs, error messages, alerts, and other relevant data used to investigate and resolve operational issues when they are created. I should also be able to access this information in place in the operations environment for at least 30 days following its creation.
    • Criteria
      1. Administrator logs, error messages, alerts, and other relevant data used to investigate and resolve operational issues are available in the logging Grafana and Observability Grafana instances for at least 30 days.
    • Acceptance tests:
      1. See the Monitoring and Reporting sections above for information about the logs, alerts, and dashboards.
      2. We are not yet able to configure retention of Loki logs to 30 days, because the RetentionStreamSpec feature is not yet released in the latest Loki Operator. We plan to enable this retention feature when it becomes available. Here is the issue where we are tracking log retention.
  2. Access audit logs for at least 90 days

    • As an administrator of the cluster, I should be able to access operational information that is specifically useful for security audits and investigations (e.g. records of privilege escalations, certificate changes, etc.) when it is created and for 90 days thereafter.
    • Criteria
      1. Access operational information that is specifically useful for security audits and investigations when it is created and for 90 days thereafter.
    • Acceptance tests:
      1. See the Monitoring and Reporting sections above for information about the logs, alerts, and dashboards.
      2. We are not yet able to configure retention of Loki logs to 90 days, because the RetentionStreamSpec feature is not yet released in the latest Loki Operator. We plan to enable this retention feature when it becomes available. Here is the issue where we are tracking log retention.
  3. Operational data should be archived and stored securely monthly

    • All operational data should be archived and stored securely outside the operations environment monthly. This operations data will eventually be provided to researchers after appropriate procedures have been established for protecting any sensitive data and controlling researcher access to the data. Current operations use cases do not call for deleting any archived data. (Defining procedures for allowing researchers access to the archived data is outside the scope of this document.)
    • Criteria
      1. All operational data should be archived and stored securely outside the operations environment monthly.
    • Acceptance tests:
      1. The NERC team will be meeting to discuss the approach and acceptance tests for this use case.
  4. Operations data is archived and then removed

    • Once operations data is archived, it can be removed from the operations environment.
    • Criteria
      1. Once operations data is archived, it can be removed from the operations environment.
    • Acceptance tests:
      1. The NERC team will be meeting to discuss the approach and acceptance tests for this use case.