Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry-pick series of pg_upgrade related work. #898

Merged
merged 23 commits into from
Feb 1, 2025
Merged

Conversation

reshke
Copy link
Contributor

@reshke reshke commented Jan 29, 2025

cherry-picked work from here
https://github.com/greenplum-db/gpdb-archive/commits/main/src/bin/pg_upgrade?after=482967c1b49028cf072c15935462f75bc3e4b045+69

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


Bhuvnesh Chaudhary and others added 10 commits January 29, 2025 10:22
optarg is a global and is not required to be passed to
process_greenplum_option. Also, for bool parameters pg_strdup must not
be called else it will fail with cannot duplicate null pointer (internal error)

Forward-ported from 6X_STABLE: 18224693a18db5243ec76e0d99c2ab2e3ed17ee5
- unmerged paths are deleted; code patch added to correct files
- pg_upgrade code moved from `contrib` to `src/bin/` directory in master

Co-authored-by: Gaurab Dey <gaurabd@vmware.com>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
- Need to be used along with check flag `-c`
- Indent log messages for proper output rendering

The --continue-check-on-fatal flag lets us traverse through the
pg_upgrade --check list without exiting the first error encountered. It
allows the user to get a report of the entire list of non-upgradable
objects in their cluster at one go.

Note: We can't ignore certain checks even with the flag; they may be
critical to the check process . One such example is
heck_proper_datallowconn(), which ensures that we can connect to user
databases to perform the checks themselves.  Specific checks are
multi-step, like the check to ensure that only the install user is the
sole user in the target database. If we get fatal: "could not determine
the number of users", we can't proceed with the check. Thus, in these
cases, the flag will not respect the flag.

- Example:

```
pg_upgrade \
-b /usr/local/5XSTABLE/bin/ \
-B /usr/local/6XSTABLE/bin/ \
-d /data/5XStable/demoDataDir-1 \
-D /data/6XStable/demoDataDir-1 \
-p 15432 \
-P 6000 \
-c \
--continue-check-on-fatal
```

Co-authored-by: Gaurab Dey <gaurabd@vmware.com>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
The flag allows us to skip target cluster checks performed by pg_upgrade
--check. It lets users find out incompatible objects on the source
cluster without the hassle of setting up a target cluster. For instance,
it is often non-trivial to set up GPDB specific extensions, and users
invariably run into a failure in check_loadable_libraries().

- need to be used along with check flag (-c)
- when you are using --skip-target-check, we don't need to provide:
  - new-bindir (-B)
  - new-datadir (-D)
  - new-port (-P)
- we can use it along with `--continue-check-on-fatal`

Example:
```
pg_upgrade \
-b /usr/local/5XSTABLE/bin/ \
-d /data/5XStable/demoDataDir-1 \
-p 15432 \
-c \
--continue-check-on-fatal \
--skip-target-check
```

Co-authored-by: Gaurab Dey <gaurabd@vmware.com>
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
Detailed steps that we need to follow to make sure upstream merge is
updated to support additional flags we added for pg_upgrade related to
check mode.

Co-authored-by: Gaurab Dey <gaurabd@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeepc@vmware.com>
* Add check for appendonly materialized view to pg_upgrade

A materialized view of append only mode must have invalid relfrozenxid (0).
However, some views might have valid relfrozenxid due to a known code issue.
We need to check the problematical view before upgrading.
The problem can be fixed by issuing
"REFRESH MATERIALIZED VIEW <schemaname>.<viewname>"
with latest GPDB 6 release.
See the PR for details:

https://github.com/greenplum-db/gpdb/pull/11662/
1.genfile.c:Commit b554f88 upgrade the adminpack functions to new version,
the old version function won’t be used. And postgres keep the old version functions in backend,
so add the declaration of these functions in builtins.h file to fix the compiler warning.
2.util.c:warning: function ‘gp_fatal_log’ might be a candidate for ‘gnu_printf’ format attribute
[-Wsuggest-attribute=format], add the attribute gnu_printf to function gp_fatal_log.
3.nodeShareInputScan.c: the format issue, change the whitespace to tab.
gpupgrade relies on a non-zero exit status code when there are errors in
pg_upgrade in order to correctly bubble up errors. Thus, when there is
at least one fatal error during check return a non-zero exit status.
The primary user of pg_upgrade is gpupgrade where users need to know the
full path location to the pg_upgrade check output files.
Since columnar tables in GPDB7 do not have toast tables, trying to set
the toast OID confuses pg_upgrade. Have pg_dump omit those values to
avoid the problem.
@reshke reshke added the cherry-pick cherry-pick upstream commts label Jan 29, 2025
@reshke reshke requested review from x4m and my-ship-it January 29, 2025 17:12
GPDB4 is not supported by pg_upgrade, the minimum
version is GPDB5 (Postgres 8.3)

Authored-by: Brent Doil <bdoil@vmware.com>
Authored-by: Brent Doil <bdoil@vmware.com>
Authored-by: Brent Doil <bdoil@vmware.com>
Authored-by: Brent Doil <bdoil@vmware.com>
The following situation is resolved by 2122e7a5b02, which added
partition children to the TableInfo array created by getTables.

 create table parttab (i int4, p serial) partition by range (i) (start (1) end (2));
 create table ex (i int4, p serial) ;
 alter table parttab exchange partition for (rank(1)) with table ex;

 After these commands, the sequence implictly created for ex.p
 column, 'ex_p_seq', is still Owned By the original 'ex' table,
 which is now partition of 'parttab'. But because partitions are not
 put into the list of tables, findTableByOid() will return NULL.

findTableByOid() will no longer return NULL in the above situation
and the sanity check can be reintroduced.

Authored-by: Brent Doil <bdoil@vmware.com>
Authored-by: Brent Doil <bdoil@vmware.com>
prepare_new_globals() was introduced in b3f8401
which moves handling of database properties from pg_dumpall to pg_dump.

pg_dump handles all properties attached to a single database while
pg_dumpall --globals-only --resource-groups --resource-queues will
dump role, tablespace, resource group, and resource queue related
output.

Restoring the databases is now handled in create_new_objects.

is_appendonly() can be used to determine if a relation is AO.

Authored-by: Brent Doil <bdoil@vmware.com>
adam8157 and others added 5 commits January 29, 2025 20:05
We have no plan to support that, backup and restore instead.
Disable pg_upgrade's broken parallel tablespace transfer. By doing this,
we can utilize the remaining parallelism from the --jobs flag to achieve
a performance boost.

The --jobs flag in pgupgrade allows for parallelization in two ways:

1. It enables parallel schema upgrades by running multiple instances of
   pg_dump/pg_restore to upgrade the database schema simultaneously.
2. It supports parallel data transfer of tablespaces when using link
   mode.

However, there is currently a bug in the parallel tablespace transfer
functionality that causes pg_upgrade to fail with the following error:

```
error while creating link for relation [relation_name]
      ([old_relfilenode] to [new_relfilenode]): File exists
Failure, exiting
```
Since pg_upgrade checks can take a long time add --skip-checks. This is
useful for internal testing to reduce the feedback cycle.

The skip was implemented using a label to reduce the amount of diffs for
upstream merges.

Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
Co-authored-by: Kevin Yeap <kyeap@vmware.com>
Before:
```
Checking for roles starting with "pg_"                      ok
Checking for appendonly materialized view with relfrozenxid
ok
Checking for presence of required libraries                 ok

```

After:
```
Checking for roles starting with "pg_"                      ok
Checking for appendonly materialized view with relfrozenxid ok
Checking for presence of required libraries                 ok

```
@yjhjstz
Copy link
Member

yjhjstz commented Jan 30, 2025

pls do not add src/test/regress/regression.diffs.

@reshke
Copy link
Contributor Author

reshke commented Jan 30, 2025

pls do not add src/test/regress/regression.diffs.

In GPDB 6 they used to be ignored...
But in version 7 it is not due to greenplum-db/gpdb-archive@ccf292c
well... maybe revert this?)

@yjhjstz
Copy link
Member

yjhjstz commented Jan 30, 2025

greenplum-db/gpdb-archive@ccf292c

committed on Dec 18, 2014, too old for now.

@reshke reshke merged commit 332a444 into apache:main Feb 1, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.