Upgrade All Trusty Nodes to Ubuntu Latest #2036

Open
8 of 12 tasks
mekarpeles opened this issue Apr 9, 2019 · 11 comments
Labels
Affects: Configuration Related to the configuration of the dev/staging/prod environments, CI, docker, etc. [managed] Affects: Operations Affects the IA DevOps folks Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] Needs: Detail Submitter needs to provide more detail for this issue to be assessed (see comments). [managed] Priority: 2 Important, as time permits. [managed] Theme: Development Issues related to the developer experience and the dev environment. [managed] Theme: Provisioning Type: Epic A feature or refactor that is big enough to require subissues. [managed]

Comments

@mekarpeles
Member

mekarpeles commented Apr 9, 2019

Related to #703 (see aspirational #680)

  • Prove provisioning a generic minimal xenial VM (e.g. of the ol-mem flavor) and add it to the ol cluster (e.g. as ol-mem4)
  • Codify the ol-mem approach using Ansible @abezella -- following Switch Provisioning to use Ansible w/ a Production & Developer playbook #680 (comment)
  • Prove provisioning of an openlibrary-specific xenial VM (e.g. ol-web3) using Docker and add it to the ol cluster (e.g. as ol-web1). This requires preserving the existing /opt/openlibrary directory as a detachable volume and attaching it to the new instance.
  • Codify the openlibrary-specific approach (using Ansible + Docker) such that new ol-web xenial nodes can be added automatically into the ol pool.

View Architecture & Provisioning docs on the Wiki

Remaining Trusty Machines

Requirements

The Trusty 14.04 release of the Ubuntu operating system will reach end of life for LTS (long-term support) at the end of 2019. After this time, our VMs may no longer receive necessary security updates. Therefore, before 2020, we are required to re-provision all ~11 of our production Open Library VMs to run Xenial.

Current Production Architecture

Today, our production service architecture consists of ~11 VMs:
[Image: production architecture diagram, https://archive.org/download/openlibrary-documentation/openlibrary-production-architecture.png]
(see: https://github.com/internetarchive/openlibrary/wiki/Production-Service-Architecture)

Current Provisioning Setup

Our current production setup process (as of 2019) for provisioning these 11 VMs is largely manual. It relies on a lot of scp'ing directories between machines by hand, as well as on a separate repository called olsystem, which contains the production configs, cron jobs, and infrastructure required to run the official openlibrary.org service.

Each of our 11 VMs is provisioned more or less identically:

  • Every VM has an /opt directory containing all the "business"
  • Within /opt there is an openlibrary/ and a petabox/ directory. It's very likely /opt/petabox is not required by all VMs, though it's not currently well understood which services may rely on it (e.g. the ol-home VM makes heavy use of olsystem which may reference petabox)
  • /opt/openlibrary contains all the business logic for the Open Library project:
/opt/
/opt/petabox
/opt/openlibrary
/opt/openlibrary/venv  -- python virtualenv
/opt/openlibrary/maxmind-geoip/  -- .dat file for anonymizing IPs
/opt/openlibrary/deploys  -- history of all deploys, hash-binned by service
/opt/openlibrary/deploys/openlibrary  -- history of openlibrary deploys
/opt/openlibrary/deploys/olsystem  -- history of olsystem deploys
/opt/openlibrary/deploys/base  -- deprecated??
/opt/openlibrary/deploys/openlibrary/openlibrary  -- active openlibrary deploy
/opt/openlibrary/deploys/olsystem/olsystem  -- active olsystem deploy
/opt/openlibrary/olsystem/  -- symlink to active olsystem: /opt/openlibrary/deploys/olsystem/olsystem
/opt/openlibrary/openlibrary -- symlink to active openlibrary: /opt/openlibrary/deploys/openlibrary/openlibrary

Minimum Proposal

At minimum, re-provisioning a VM requires:

  • setting up firewall rules and installing core packages (e.g. git, docker) by running an ansible playbook
  • scp'ing over the legacy VM's /opt directory (preferably as an external mountable /1 volume which can be moved in the future)
  • Setting up olsystem so that its files within /opt/openlibrary/olsystem/etc symlink to the right locations within /etc
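The symlink step could look roughly like the sketch below. It assumes that every regular file under /opt/openlibrary/olsystem/etc should be linked at the same relative path under /etc, which this issue does not confirm. link_etc_tree is a hypothetical helper name, not something that exists in olsystem.

```shell
#!/bin/bash
# link_etc_tree SRC DST
# Symlink every regular file under SRC to the same relative path under DST.
# Assumption: all files under SRC are meant to be linked; adjust if only some are.
link_etc_tree() {
  local src="$1" dst="$2" file rel
  find "$src" -type f | while IFS= read -r file; do
    rel="${file#"$src"/}"                 # path relative to SRC
    mkdir -p "$dst/$(dirname "$rel")"     # create the parent directory if missing
    ln -sfn "$file" "$dst/$rel"           # -f replaces whatever is already at the target
  done
}
```

On a real node this would be run as root, e.g. link_etc_tree /opt/openlibrary/olsystem/etc /etc. Because ln -f overwrites existing files, it is probably worth trying it against a scratch directory first.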

To copy /opt over from another server, you'll have to do the following:
on ol-mem2: sudo tar cpSlf /var/tmp/ol.tar --same-owner -C /opt openlibrary
scp /var/tmp/ol.tar ol-mem4:/var/tmp/ol.tar
on ol-mem4: tar xpBsf /var/tmp/ol.tar --same-owner -C /opt
(Because of the keys involved and the need to be root to read all of it, I don't think there's an easy way to just scp or rsync the directory.)
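For other host pairs, the same three commands can be generated rather than retyped. The sketch below only prints the steps; it does not run anything. print_opt_copy_steps is a hypothetical helper, and it assumes the scp is run from the source host, which the steps above imply but do not state.

```shell
#!/bin/bash
# print_opt_copy_steps SRC_HOST DST_HOST
# Print (do not execute) the tar/scp/tar steps for copying /opt/openlibrary.
# The tar flags are copied verbatim from the steps above.
print_opt_copy_steps() {
  local src="$1" dst="$2" tarball="/var/tmp/ol.tar"
  echo "on $src: sudo tar cpSlf $tarball --same-owner -C /opt openlibrary"
  echo "on $src: scp $tarball $dst:$tarball"   # assumption: run from the source host
  echo "on $dst: tar xpBsf $tarball --same-owner -C /opt"
}
```

Calling print_opt_copy_steps ol-mem2 ol-mem4 reproduces the example above with the hosts filled in.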

Ideal Proposal

An aspirational goal of this epic is to migrate Open Library VM provisioning to use a standard Ansible playbook (and possibly docker containers, a la our development environment) to support this re-provisioning.

Part of this effort includes decreasing production's dependence on the olsystem repository, a la #680. Both developer and production systems should use similar docker recipes and differ only in their ansible playbooks.

Plan

The plan is to start with ol-mem0, ol-mem1, and ol-mem2 as they don't really require any infrastructure other than:

  1. set up 3 new memcached servers: ol-mem3, ol-mem4, ol-mem5
  2. provision VMs with default ansible playbook: setup firewall rules + install docker
  3. use VM-specific ansible playbook to set up docker w/ memcached (with upstart)
  4. update /opt/openlibrary/olsystem/etc/openlibrary.yml and infobase.yml configs to reference correct new memcached servers
  5. update /etc (e.g. memcached) to symlink to the correct system configs in /opt/openlibrary/olsystem/etc/
  6. update /opt/openlibrary/olsystem/fabfile.py supervisord to update how memcached servers should be restarted (and to not deploy to ol-mem* during deploy)
  7. remove old memcached servers from the pool (one at a time)
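Step 4 (pointing the configs at the new memcached servers) might be sketched as below. This assumes the servers appear in openlibrary.yml and infobase.yml as host:port entries; the actual layout of those files is not shown in this issue. swap_memcached_host is a hypothetical helper, and the old-to-new host mapping in the usage note is only illustrative.

```shell
#!/bin/bash
# swap_memcached_host FILE OLD_HOST NEW_HOST
# Replace every "OLD_HOST:" with "NEW_HOST:" in FILE, keeping the port as-is.
# Assumption: servers are written as host:port. Hostnames are used as plain
# sed patterns, so names containing regex characters would need escaping.
swap_memcached_host() {
  local file="$1" old="$2" new="$3" tmp
  tmp="$(mktemp)" || return 1
  if sed "s/${old}:/${new}:/g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"    # write back in place, preserving file permissions
  fi
  rm -f "$tmp"
}
```

For instance, swap_memcached_host /opt/openlibrary/olsystem/etc/openlibrary.yml ol-mem0 ol-mem3 would move one server at a time, which fits removing the old servers from the pool one by one in step 7.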
@mekarpeles mekarpeles added Priority: 1 Do this week, receiving emails, time sensitive, . [managed] Theme: Provisioning Needs: Detail Submitter needs to provide more detail for this issue to be assessed (see comments). [managed] devops labels Apr 9, 2019
@mekarpeles
Member Author

mekarpeles commented Apr 11, 2019

This task needs a checklist:

  • Prove provisioning a generic minimal xenial VM (e.g. of the ol-mem flavor) and add it to the ol cluster (e.g. as ol-mem4)
  • Codify the ol-mem approach using Ansible @abezella -- following Switch Provisioning to use Ansible w/ a Production & Developer playbook #680 (comment)
  • Prove provisioning of an openlibrary-specific xenial VM (e.g. ol-web3) using Docker and add it to the ol cluster (e.g. as ol-web1). This requires preserving the existing /opt/openlibrary directory as a detachable volume and attaching it to the new instance.
  • Codify the openlibrary-specific approach (using Ansible + Docker) such that new ol-web xenial nodes can be added automatically into the ol pool.

@mekarpeles mekarpeles added Type: Epic A feature or refactor that is big enough to require subissues. [managed] Requires Sub-Task Creation labels Apr 16, 2019
@brad2014 brad2014 added Needs: Breakdown This big issue needs a checklist or subissues to describe a breakdown of work. [managed] and removed Requires Sub-Task Creation labels Apr 29, 2019
@brad2014 brad2014 added Affects: Configuration Related to the configuration of the dev/staging/prod environments, CI, docker, etc. [managed] and removed devops labels May 14, 2019
@tfmorris
Contributor

Since Ubuntu 18.04 Bionic Beaver has been out for over a year, would it make more sense to skip Xenial?

@xayhewalo xayhewalo added the State: Work In Progress This issue is being actively worked on. [managed] label Oct 20, 2019
@mekarpeles mekarpeles added Priority: 2 Important, as time permits. [managed] and removed Priority: 1 Do this week, receiving emails, time sensitive, . [managed] labels Oct 30, 2019
@mekarpeles mekarpeles removed the State: Work In Progress This issue is being actively worked on. [managed] label Nov 25, 2019
@mekarpeles mekarpeles added Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] labels Dec 17, 2019
@tfmorris
Contributor

Our Xenial Docker-based development environment is producing dire warnings about Node.js, and the bundled version of pip doesn't work. I think we should upgrade our dev environment to Bionic ASAP in preparation for a production move to Bionic.

@hornc hornc removed their assignment Mar 7, 2020
@mekarpeles mekarpeles changed the title from "2021 H1 Upgrade All Nodes to Ubuntu Focal" to "2021 Q2 Upgrade All Nodes to Ubuntu Focal" Feb 9, 2021
@cclauss
Contributor

cclauss commented Feb 15, 2021

% cat ./ubuntu_versions.sh

#!/bin/bash

# Which Ubuntu release are we running on?  Do not fail if /etc/os-release does not exist.
# cat /etc/os-release | grep VERSION= || true  # VERSION="20.04.1 LTS (Focal Fossa)"

SERVERS="ol-backup0 ol-covers0 ol-db1 ol-db2 ol-dev0 ol-dev1 ol-home ol-home0 ol-mem0 ol-mem1 ol-mem2 ol-solr0 ol-solr1 ol-web1 ol-web2 ol-www0"
parallel --quote ssh {} "hostname --short ; cat /etc/os-release | grep VERSION= ; docker --version ; docker compose version || true" ::: $SERVERS

@jimman2003
Contributor

jimman2003 commented Jun 11, 2021

Some of the PPAs might have deleted the xenial debs/packages, so the CI is failing. Should we bump this up?

@mekarpeles
Member Author

mekarpeles commented Jun 11, 2021

#2036 (comment)

@dhruvmanila, @cclauss, or @BharatKalluri -- is this one you may have a few minutes to quickly investigate? If it seems like it may be a pain, @cdrini and I can prioritize for next week. @cdrini is currently PTO and I'm getting my 2nd covid shot tomorrow and will likely be out of commission for at least some of the weekend 😬

@cclauss
Contributor

cclauss commented Jun 11, 2021

It would be important to look at upgrading both:

  1. https://github.com/internetarchive/openlibrary/blob/master/docker/Dockerfile.olbase#L1
  2. https://github.com/internetarchive/openlibrary/blob/master/.github/workflows/python_tests.yml#L13

@jimman2003
Contributor

also:

FROM ubuntu:xenial
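To catch other hard-coded references like this one, a recursive grep over a checkout is probably enough. The sketch below is only a convenience wrapper; find_release_refs is a hypothetical name and the checkout path is whatever you pass in.

```shell
#!/bin/bash
# find_release_refs DIR CODENAME
# List every text-file line under DIR that mentions CODENAME (e.g. xenial),
# printed as path:line-number:content. Exits non-zero when nothing matches.
find_release_refs() {
  local dir="$1" codename="$2"
  grep -rIn -- "$codename" "$dir"   # -r recurse, -I skip binaries, -n line numbers
}
```

Running find_release_refs . xenial from the root of the repository should list the Dockerfile and workflow lines mentioned above, along with any others.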

@cclauss
Contributor

cclauss commented Oct 21, 2021

On ol-mem0...

  • sudo apt-get install memcached && memcached --version # --> memcached 1.5.22
  • firm rules
  • do we need to git pull ol-system and openlibrary?
  • Web 1 and Web 2 will need access to which memcache servers to use.
  • Is there a "performance config"
  • Coverstore yaml, ansible hosts, ansible
  • ol-system -- grep ol-system and openlibrary for the config

https://github.com/internetarchive/olsystem/search?q=memcache
ol-mem

https://internetarchive.slack.com/archives/G019YBYM35M/p1602178331011000

https://internetarchive.slack.com/archives/G019YBYM35M/p1602179875012700?thread_ts=1602178331.011000&cid=G019YBYM35M

https://github.com/internetarchive/openlibrary/wiki/Production-Service-Architecture

@mekarpeles mekarpeles added this to the Next (proposed) milestone Feb 28, 2022
@mekarpeles mekarpeles changed the title from "2021 Q2 Upgrade All Nodes to Ubuntu Focal" to "Upgrade All Nodes to Ubuntu Focal" Feb 28, 2022
@mekarpeles
Member Author

I think we're close:
ol-db[1,2], ol-backup, ol-www0->1, and presumably ol-home disappears if we can remove stats-solr (?)
@cclauss ?? :|

@mekarpeles mekarpeles changed the title from "Upgrade All Nodes to Ubuntu Focal" to "Upgrade All Trusty Nodes to Ubuntu Latest" Feb 8, 2023
@mekarpeles mekarpeles modified the milestones: Next (proposed), 2023 Mar 21, 2023
@cclauss
Contributor

cclauss commented Mar 23, 2023

Using the script at #7676 (comment)

ol-home0% ./ubuntu_versions.sh

  • ol-db1: Permission denied (publickey). --> Try updating local environment to postgres 13 or 16 #5686
  • ol-dev1: VERSION="18.04.6 LTS (Bionic Beaver)"
  • ol-home: VERSION="14.04.1 LTS, Trusty Tahr"
  • ol-www1: VERSION="14.04.1 LTS, Trusty Tahr" --> Replace ol-www1 with docker-based ol-www0 #4252
  • ol-covers0: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-home0: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-mem0: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-mem1: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-mem2: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-solr0: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-solr1: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-web1: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-web2: VERSION="20.04.6 LTS (Focal Fossa)"
  • ol-www0: VERSION="20.04.6 LTS (Focal Fossa)"
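To pull out just the hosts that still need upgrading, the report can be filtered. The sketch below assumes the report has been saved one host per line in the "host: VERSION=..." form used in the list above, which is a hand-formatted view and not necessarily the script's raw output. list_trusty_hosts is a hypothetical helper.

```shell
#!/bin/bash
# list_trusty_hosts
# Read "host: VERSION=..." lines on stdin and print the hosts still on Trusty.
# Leading bullets or whitespace before the hostname are stripped.
list_trusty_hosts() {
  awk -F: '/Trusty/ { sub(/^[^A-Za-z]*/, "", $1); print $1 }'
}
```

Given the list above on stdin, this would print ol-home and ol-www1. Hosts that failed to report, such as ol-db1, are not included and still need checking by hand.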

@cclauss cclauss modified the milestones: 2023, Sprint 2023-04 Mar 24, 2023
@cdrini cdrini modified the milestones: Sprint 2023-04, 2023 Mar 27, 2023
@mekarpeles mekarpeles added Needs: Lead Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] and removed Lead: @cclauss Issues overseen by Chris (Python3 & Dev-ops lead 2019-2021) [managed] Needs: Lead labels Jun 1, 2024