[#1127] YSQL: Collation Support (part 2)

Summary: Currently YSQL has very limited support for collation. Only C collation is supported. Collation is about text data sorting rules and postgres provides collation support via the underlining OS library including libc and libicu. Simply put, given two text values (aka strings) s1 and s2, collation sort comparison must be done via strcoll(s1, s2) rather than strcmp(s1, s2). Postgres supports two types of collations: deterministic collations and non-deterministic collations. Postgres 11.2 only supports deterministic collations. Postgres 12 and beyond support both deterministic collations and non-deterministic collations. Non-deterministic collation comparison of s1 and s2 directly translates to strcoll(s1, s2), and deterministic collation comparison uses strcmp(s1, s2) as a "tie-breaker" when strcoll(s1, s2) == 0. This is the second change to support deterministic collation in YSQL to bring YSQL collation support in sync with PG 11.2. With this change, --with-icu is now always on when building postgres so that ICU code is always built into postgres. The normal build are not expected to have any behavior change and YSQL will continue to report collation related errors. We'll produce a special build with ICU enabled via a new gflag: FLAGS_TEST_pg_collation_enabled=true. This gflag is used during development to enable collation support. However, once collation encoded data lands in docdb, turning off this gflag is not supported. The plan is that once we fully support deterministic collations in YSQL, we'll turn on FLAGS_TEST_pg_collation_enabled=true by default. The high level changes are: (1) In addition to type and modifier, for character data types that YSQL expects, collation related information is now also passed from postgres layer to PgGate when constructing any PgExpr. Collation info currently includes: ``` typedef struct PgCollationInfo { bool collate_is_valid_non_c; const char *sortkey; } YBCPgCollationInfo; ``` collate_is_valid_non_c indicates whether the collation is a valid non-C collation. A non-C collation requires strcoll semantics. In contrast C collation only requires strcmp semantics which is much more efficient. sortkey is the collation sortkey that will be stored in docdb and used for byte-wise comparison. It is null-terminated. (2) When building PgExpr, collate_is_valid_non_c is recorded as 'collate_is_valid_non_c_' in PgExpr. (3) When building PgConstant, sortkey is recorded as collation_sortkey_ in PgConstant. (4) For C collation PgConstant, collation_sortkey_ should be nullptr and we continue to store the string value that postgres layer has passed down as we currently do. In this way we try to minimize the performance penalty for text data that do not use collation. (5) For non-C collation PgConstant, we store a collation-encoded string value. which is composed of a fixed leading bytes, followed by a null-terminated sort key, followed by the original character string value. (6) A sort key of a string value is computed such that strcmp(sortkey(s1), sortkey(s2)) == strcoll(s1, s2). Two different s1 and s2 may have identical sort keys. For deterministic collations, we need to use strcmp(s1, s2) as a tie-breaker when strcmp(sortkey(s1), sortkey(s2)) == 0. That's why the original string value is appended after the null-terminated sort key to complete a collation-encoded string. (7) The collation-encoded string value replaces the original value and is sent to DocDB for storage. (8) On the way back from DocDB, PgGate will use the null byte that separates the sort key from the original string value to strip off the leading bytes and the sort key, only hands the original string value over to postgres layer above. (9) A new test only C constant kTestOnlyUseOSDefaultCollation is used for testing purpose. It can only be set when FLAGS_pg_collation_enabled is true (otherwise assertion fails). Once set to true, a clean build is needed which includes initdb and CreateInitialSysCatalogSnapshot. All the initial databases will have text columns default to en_US.UTF-8 collation. Therefore in regression tests all text columns in user tables will also have en_US.UTF-8 collation. Because this is a non-C collation, the new code path that stores collation-encoded strings is tested. (10) Planed optimization: store only the original value for a non-key column to save storage space, also for hash key, consider store only the original value as well if feasible. (11) Known issue: a sort key is a null-terminated byte sequence (without any embedded \0 byte). As a result the collation encoded string may contain invalid UTF-8 characters. However it is set as a QLValue string field in place of the original string value which is UTF-8. Protobuf reports invalid UTF-8 as an ERROR even though the collation-encoded string is still sent across the wire without any loss. Test Plan: 1. Do default build and run regression tests. Verify: (1.1) pg_collation has only 4 rows as before. ``` yugabyte=# select * from pg_collation; select * from pg_collation; collname | collnamespace | collowner | collprovider | collencoding | collcollate | collctype | collversion -----------+---------------+-----------+--------------+--------------+------------- +-----------+------------- default | 11 | 10 | d | -1 | | | C | 11 | 10 | c | -1 | C | C | POSIX | 11 | 10 | c | -1 | POSIX | POSIX | ucs_basic | 11 | 10 | c | 6 | C | C | (4 rows) ``` (1.2) We continue to report collation related errors. ``` yugabyte=# create collation nd (provider = 'libc', locale=''); create collation nd (provider = 'libc', locale=''); ERROR: CREATE COLLATION not supported yet LINE 1: create collation nd (provider = 'libc', locale=''); ^ HINT: Please report the issue on https://github.com/YugaByte/yugabyte-db/issues yugabyte=# create table foo(id text collate "C"); create table foo(id text collate "C"); ERROR: COLLATE not supported yet LINE 1: create table foo(id text collate "C"); ^ HINT: See #1127. Click '+' on the description to raise its priority ``` (1.3) All initial databases have collation "C" as expected. ``` yugabyte=# select datname, encoding, datcollate from pg_database; select datname, encoding, datcollate from pg_database; datname | encoding | datcollate -----------------+----------+------------ template1 | 6 | C template0 | 6 | C postgres | 6 | C yugabyte | 6 | C system_platform | 6 | C (5 rows) ``` 2. Do build with FLAGS_TEST_pg_collation_enabled=true, also in YBIsCollationEnabled set default env value of YBCIsEnvVarTrueWithDefault("FLAGS_TEST_pg_collation_enabled" to true, run regression tests. Verify: (2.1) pg_collation has many rows because it has imported many ICU collations from OS provided libicu library. ``` yugabyte=# select * from pg_collation limit 10; select * from pg_collation limit 10; collname | collnamespace | collowner | collprovider | collencoding | collcollat e | collctype | collversion -------------+---------------+-----------+--------------+--------------+----------- --+------------+------------- default | 11 | 10 | d | -1 | | | C | 11 | 10 | c | -1 | C | C | POSIX | 11 | 10 | c | -1 | POSIX | POSIX | ucs_basic | 11 | 10 | c | 6 | C | C | en_US.utf8 | 11 | 10 | c | 6 | en_US.utf8 | en_US.utf8 | en_US | 11 | 10 | c | 6 | en_US.utf8 | en_US.utf8 | und-x-icu | 11 | 10 | i | -1 | und | und | 153.14 af-x-icu | 11 | 10 | i | -1 | af | af | 153.14.37 af-NA-x-icu | 11 | 10 | i | -1 | af-NA | af-NA | 153.14.37 af-ZA-x-icu | 11 | 10 | i | -1 | af-ZA | af-ZA | 153.14.37 (10 rows) yugabyte=# select count(*) from pg_collation; select count(*) from pg_collation; count ------- 789 (1 row) ``` (2.2) All initial databases have collation "C" as expected. ``` yugabyte=# select datname, encoding, datcollate from pg_database; select datname, encoding, datcollate from pg_database; datname | encoding | datcollate -----------------+----------+------------ template1 | 6 | C template0 | 6 | C postgres | 6 | C yugabyte | 6 | C system_platform | 6 | C (5 rows) ``` (2.3) We can now create collation and also support column collation. ``` yugabyte=# create collation nd (provider = 'libc', locale=''); create collation nd (provider = 'libc', locale=''); CREATE COLLATION yugabyte=# create table foo(id text collate "C"); create table foo(id text collate "C"); CREATE TABLE yugabyte=# create table bar(id text collate "en_US.utf8"); create table bar(id text collate "en_US.utf8"); ``` (2.4) We can see collation "C" and "en_US.utf8" sort 'a' and 'A' differently which is consistent with postgres. ``` yugabyte=# insert into foo values ('a'); insert into foo values ('a'); INSERT 0 1 yugabyte=# insert into foo values ('A'); insert into foo values ('A'); INSERT 0 1 yugabyte=# select id from foo order by id; select id from foo order by id; id ---- A a (2 rows) yugabyte=# insert into bar values ('a'); insert into bar values ('a'); INSERT 0 1 yugabyte=# insert into bar values ('A'); insert into bar values ('A'); INSERT 0 1 yugabyte=# select id from bar order by id; select id from bar order by id; id ---- a A (2 rows) ``` 3. Do build with same gflags settings as 2, also set kTestOnlyUseOSDefaultCollation=true and run regression tests. Verify: (3.1) Same as (2.1) (3.2) All initial databases have collation "en_US.UTF-8". ``` yugabyte=# select datname, encoding, datcollate from pg_database; select datname, encoding, datcollate from pg_database; datname | encoding | datcollate -----------------+----------+------------- template1 | 6 | en_US.UTF-8 template0 | 6 | en_US.UTF-8 postgres | 6 | en_US.UTF-8 yugabyte | 6 | en_US.UTF-8 system_platform | 6 | en_US.UTF-8 (5 rows) ``` (3.3) Same as (2.3) (3.4) Because "en_US.UTF-8" is the default collation, a simple text column will use it implicitly and will be sorted according to collation "en_US.UTF-8". ``` yugabyte=# create table text_tab (id text); create table text_tab (id text); CREATE TABLE yugabyte=# insert into text_tab values ('a'); insert into text_tab values ('a'); INSERT 0 1 yugabyte=# insert into text_tab values ('A'); insert into text_tab values ('A'); INSERT 0 1 yugabyte=# select id from text_tab order by id; select id from text_tab order by id; id ---- a A (2 rows) ``` (3.5) Explicitly specified collation "C" will be sorted according to collation "C". ``` yugabyte=# create table text_tab2(id text collate "C"); create table text_tab2(id text collate "C"); CREATE TABLE yugabyte=# insert into text_tab2 values ('a'); insert into text_tab2 values ('a'); INSERT 0 1 yugabyte=# insert into text_tab2 values ('A'); insert into text_tab2 values ('A'); INSERT 0 1 yugabyte=# select id from text_tab2 order by id; select id from text_tab2 order by id; id ---- A a (2 rows) ``` Reviewers: mihnea, mbautin, neil, dmitry Reviewed By: dmitry Subscribers: yql Differential Revision: https://phabricator.dev.yugabyte.com/D12330
yugabyte · Sep 8, 2021 · 6c8deb0 · 6c8deb0
1 parent 39bf768
commit 6c8deb0
Show file tree

Hide file tree

Showing 40 changed files with 830 additions and 179 deletions.
diff --git a/build-support/ubsan-suppressions.txt b/build-support/ubsan-suppressions.txt
@@ -101,3 +101,7 @@ vptr:libcassandra.so
 # are such castings with overflow in a few tests. (For example: QLTestSelectedExpr.TestCastDecimal)
 # (See https://github.com/YugaByte/yugabyte-db/issues/1241 for details.)
 float-cast-overflow:yb::bfql::SetNumericResult
+
+# UBSAN issues in ICU 67.
+nonnull-attribute:libicuuc.so
+function:libicuuc.so
diff --git a/python/yb/build_postgres.py b/python/yb/build_postgres.py
@@ -442,6 +442,7 @@ def configure_postgres(self) -> None:
                 '--prefix', self.pg_prefix,
                 '--with-extra-version=-YB-' + self.get_yb_version(),
                 '--enable-depend',
+                '--with-icu',
                 '--with-ldap',
                 '--with-openssl',
                 # Options are ossp (original/old implementation), bsd (BSD) and e2fs
@@ -454,8 +455,6 @@ def configure_postgres(self) -> None:
                 '--enable-debug']
         if not get_bool_env_var('YB_NO_PG_CONFIG_CACHE'):
             configure_cmd_line.append('--config-cache')
-        if get_bool_env_var('YB_POSTGRES_WITH_ICU'):
-            configure_cmd_line.append('--with-icu')
 
         # We get readline-related errors in ASAN/TSAN, so let's disable readline there.
         if self.build_type in ['asan', 'tsan']:

diff --git a/src/postgres/src/backend/access/ybc/ybcam.c b/src/postgres/src/backend/access/ybc/ybcam.c
@@ -163,8 +163,9 @@ static Oid ybc_get_atttypid(TupleDesc bind_desc, AttrNumber attnum)
 static void ybcBindColumn(YbScanDesc ybScan, TupleDesc bind_desc, AttrNumber attnum, Datum value, bool is_null)
 {
 	Oid	atttypid = ybc_get_atttypid(bind_desc, attnum);
+	Oid	attcollation = ybc_get_attcollation(bind_desc, attnum);
 
-	YBCPgExpr ybc_expr = YBCNewConstant(ybScan->handle, atttypid, value, is_null);
+	YBCPgExpr ybc_expr = YBCNewConstant(ybScan->handle, atttypid, attcollation, value, is_null);
 
 	HandleYBStatus(YBCPgDmlBindColumn(ybScan->handle, attnum, ybc_expr));
 }
@@ -173,14 +174,17 @@ static void ybcBindColumnCondBetween(YbScanDesc ybScan, TupleDesc bind_desc, Att
                                      bool start_valid, Datum value, bool end_valid, Datum value_end)
 {
 	Oid	atttypid = ybc_get_atttypid(bind_desc, attnum);
+	Oid	attcollation = ybc_get_attcollation(bind_desc, attnum);
 
 	YBCPgExpr ybc_expr = start_valid ? YBCNewConstant(ybScan->handle,
 													  atttypid,
+													  attcollation,
 													  value,
 													  false /* isnull */)
 									 : NULL;
 	YBCPgExpr ybc_expr_end = end_valid ? YBCNewConstant(ybScan->handle,
 														atttypid,
+														attcollation,
 														value_end,
 														false /* isnull */)
 									   : NULL;
@@ -195,6 +199,7 @@ static void ybcBindColumnCondIn(YbScanDesc ybScan, TupleDesc bind_desc, AttrNumb
                                 int nvalues, Datum *values)
 {
 	Oid	atttypid = ybc_get_atttypid(bind_desc, attnum);
+	Oid	attcollation = ybc_get_attcollation(bind_desc, attnum);
 
 	YBCPgExpr ybc_exprs[nvalues]; /* VLA - scratch space */
 	for (int i = 0; i < nvalues; i++) {
@@ -203,7 +208,8 @@ static void ybcBindColumnCondIn(YbScanDesc ybScan, TupleDesc bind_desc, AttrNumb
 		 * getting here (relying on btree/lsm operators being strict).
 		 * So we can safely set is_null to false for all options left here.
 		 */
-		ybc_exprs[i] = YBCNewConstant(ybScan->handle, atttypid, values[i], false /* is_null */);
+		ybc_exprs[i] = YBCNewConstant(ybScan->handle, atttypid, attcollation,
+									  values[i], false /* is_null */);
 	}
 
 	HandleYBStatus(YBCPgDmlBindColumnCondIn(ybScan->handle, attnum, nvalues, ybc_exprs));
@@ -216,6 +222,7 @@ static void ybcAddTargetColumn(YbScanDesc ybScan, AttrNumber attnum)
 {
 	/* Regular (non-system) attribute. */
 	Oid atttypid = InvalidOid;
+	Oid attcollation = InvalidOid;
 	int32 atttypmod = 0;
 	if (attnum > 0)
 	{
@@ -225,10 +232,11 @@ static void ybcAddTargetColumn(YbScanDesc ybScan, AttrNumber attnum)
 			return;
 		atttypid = attr->atttypid;
 		atttypmod = attr->atttypmod;
+		attcollation = attr->attcollation;
 	}
 
 	YBCPgTypeAttrs type_attrs = { atttypmod };
-	YBCPgExpr expr = YBCNewColumnRef(ybScan->handle, attnum, atttypid, &type_attrs);
+	YBCPgExpr expr = YBCNewColumnRef(ybScan->handle, attnum, atttypid, attcollation, &type_attrs);
 	HandleYBStatus(YBCPgDmlAppendTarget(ybScan->handle, expr));
 }
 
@@ -1512,6 +1520,11 @@ static double ybcIndexEvalClauseSelectivity(Bitmapset *qual_cols,
 	return YBC_HASH_SCAN_SELECTIVITY;
 }
 
+Oid ybc_get_attcollation(TupleDesc desc, AttrNumber attnum)
+{
+	return attnum > 0 ? TupleDescAttr(desc, attnum - 1)->attcollation : InvalidOid;
+}
+
 void ybcIndexCostEstimate(IndexPath *path, Selectivity *selectivity,
 						  Cost *startup_cost, Cost *total_cost)
 {
@@ -1639,6 +1652,7 @@ HeapTuple YBCFetchTuple(Relation relation, Datum ybctid)
 	/* Bind ybctid to identify the current row. */
 	YBCPgExpr ybctid_expr = YBCNewConstant(ybc_stmt,
 										   BYTEAOID,
+										   InvalidOid,
 										   ybctid,
 										   false);
 	HandleYBStatus(YBCPgDmlBindColumn(ybc_stmt, YBTupleIdAttributeNumber, ybctid_expr));
@@ -1649,20 +1663,21 @@ HeapTuple YBCFetchTuple(Relation relation, Datum ybctid)
 	if (RelationGetForm(relation)->relhasoids)
 	{
 		YBCPgTypeAttrs type_attrs = { 0 };
-		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, ObjectIdAttributeNumber, InvalidOid,
-										   &type_attrs);
+		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, ObjectIdAttributeNumber,
+										   InvalidOid, InvalidOid, &type_attrs);
 		HandleYBStatus(YBCPgDmlAppendTarget(ybc_stmt, expr));
 	}
 	for (AttrNumber attnum = 1; attnum <= tupdesc->natts; attnum++)
 	{
 		Form_pg_attribute att = TupleDescAttr(tupdesc, attnum - 1);
 		YBCPgTypeAttrs type_attrs = { att->atttypmod };
-		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, attnum, att->atttypid, &type_attrs);
+		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, attnum, att->atttypid,
+										   att->attcollation, &type_attrs);
 		HandleYBStatus(YBCPgDmlAppendTarget(ybc_stmt, expr));
 	}
 	YBCPgTypeAttrs type_attrs = { 0 };
-	YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, YBTupleIdAttributeNumber, InvalidOid,
-									   &type_attrs);
+	YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, YBTupleIdAttributeNumber,
+									   InvalidOid, InvalidOid, &type_attrs);
 	HandleYBStatus(YBCPgDmlAppendTarget(ybc_stmt, expr));
 
 	/*
@@ -1737,6 +1752,7 @@ ybBeginSample(Relation rel, int targrows)
 		YBCPgExpr	expr = YBCNewColumnRef(ybSample->handle,
 										   ObjectIdAttributeNumber,
 										   InvalidOid,
+										   InvalidOid,
 										   &type_attrs);
 		HandleYBStatus(YBCPgDmlAppendTarget(ybSample->handle, expr));
 	}
@@ -1747,6 +1763,7 @@ ybBeginSample(Relation rel, int targrows)
 		YBCPgExpr   expr = YBCNewColumnRef(ybSample->handle,
 										   attnum,
 										   att->atttypid,
+										   att->attcollation,
 										   &type_attrs);
 		HandleYBStatus(YBCPgDmlAppendTarget(ybSample->handle, expr));
 	}

diff --git a/src/postgres/src/backend/catalog/ybc_catalog_version.c b/src/postgres/src/backend/catalog/ybc_catalog_version.c
@@ -93,7 +93,7 @@ bool YBCIncrementMasterCatalogVersionTableEntry(bool is_breaking_change)
 	                                  RelationGetDescr(rel));
 
 	/* Bind ybctid to identify the current row. */
-	YBCPgExpr ybctid_expr = YBCNewConstant(update_stmt, BYTEAOID, ybctid,
+	YBCPgExpr ybctid_expr = YBCNewConstant(update_stmt, BYTEAOID, InvalidOid, ybctid,
 										   false /* is_null */);
 	HandleYBStatus(YBCPgDmlBindColumn(update_stmt, YBTupleIdAttributeNumber, ybctid_expr));
 
@@ -128,7 +128,8 @@ bool YBCIncrementMasterCatalogVersionTableEntry(bool is_breaking_change)
 	                                                   (Expr *) expr,
 	                                                   attnum,
 	                                                   INT8OID,
-	                                                   0);
+	                                                   0,
+	                                                   InvalidOid);
 
 	HandleYBStatus(YBCPgDmlAssignColumn(update_stmt, attnum, ybc_expr));
 
@@ -139,10 +140,12 @@ bool YBCIncrementMasterCatalogVersionTableEntry(bool is_breaking_change)
 		params[0].attno = attnum + 1;
 		params[0].typid = INT8OID;
 		params[0].typmod = 0;
+		params[0].collid = InvalidOid;
 
 		params[1].attno = attnum;
 		params[1].typid = INT8OID;
 		params[1].typmod = 0;
+		params[1].collid = InvalidOid;
 
 		YBCPgExpr ybc_expr = YBCNewEvalExprCall(update_stmt, (Expr *) expr, params, 2);
 		HandleYBStatus(YBCPgDmlAssignColumn(update_stmt, attnum + 1, ybc_expr));
@@ -258,6 +261,7 @@ void YBCGetMasterCatalogVersionFromTable(uint64_t *version) {
 	Datum oid_datum = Int32GetDatum(TemplateDbOid);
 	YBCPgExpr pkey_expr = YBCNewConstant(ybc_stmt,
 	                                     oid_attrdesc->atttypid,
+	                                     oid_attrdesc->attcollation,
 	                                     oid_datum,
 	                                     false /* is_null */);
 
@@ -268,7 +272,8 @@ void YBCGetMasterCatalogVersionFromTable(uint64_t *version) {
 	{
 		Form_pg_attribute att = &Desc_pg_yb_catalog_version[attnum - 1];
 		YBCPgTypeAttrs type_attrs = { att->atttypmod };
-		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, attnum, att->atttypid, &type_attrs);
+		YBCPgExpr   expr = YBCNewColumnRef(ybc_stmt, attnum, att->atttypid,
+										   att->attcollation, &type_attrs);
 		HandleYBStatus(YBCPgDmlAppendTarget(ybc_stmt, expr));
 	}
 

diff --git a/src/postgres/src/backend/commands/collationcmds.c b/src/postgres/src/backend/commands/collationcmds.c
@@ -585,7 +585,7 @@ pg_import_system_collations(PG_FUNCTION_ARGS)
 			/*
 			 * For libc, Yugabyte only supports the basic locales.
 			 */
-			if (IsYugaByteEnabled() && !IsYBSupportedLibcLocale(localebuf))
+			if (IsYugaByteEnabled() && !YBIsSupportedLibcLocale(localebuf))
 				continue;
 
 			enc = pg_get_encoding_from_locale(localebuf, false);

diff --git a/src/postgres/src/backend/commands/dbcommands.c b/src/postgres/src/backend/commands/dbcommands.c
@@ -415,7 +415,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 							 "https://github.com/yugabyte/yugabyte-db/issues"),
 					 parser_errposition(pstate, dencoding->location)));
 
-		if (dcollate && dbcollate && strcmp(dbcollate, "C") != 0)
+		if (!YBIsCollationEnabled() && dcollate && dbcollate && strcmp(dbcollate, "C") != 0)
 			ereport(YBUnsupportedFeatureSignalLevel(),
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("Value other than 'C' for lc_collate "

diff --git a/src/postgres/src/backend/commands/tablecmds.c b/src/postgres/src/backend/commands/tablecmds.c
@@ -7760,7 +7760,8 @@ YBCloneRelationSetPrimaryKey(Relation* mutable_rel, IndexStmt* stmt)
 				elog(ERROR, "cache lookup failed for type %u", attr_form->atttypid);
 			Form_pg_type attr_type_form = (Form_pg_type) GETSTRUCT(tuple);
 
-			if (attr_form->attcollation != attr_type_form->typcollation)
+			if (!YBIsCollationEnabled() &&
+				attr_form->attcollation != attr_type_form->typcollation)
 				elog(ERROR, "adding primary key to a table with collated columns "
 				            "is not yet implemented");
 

diff --git a/src/postgres/src/backend/commands/ybccmds.c b/src/postgres/src/backend/commands/ybccmds.c
@@ -323,8 +323,8 @@ YBTransformPartitionSplitPoints(YBCPgStatement yb_stmt,
 				{
 					/* Given value is not null. Convert it to YugaByte format. */
 					Const *value = castNode(Const, datums[idx]->value);
-					exprs[idx] = YBCNewConstant(yb_stmt, value->consttype, value->constvalue,
-												false /* is_null */);
+					exprs[idx] = YBCNewConstant(yb_stmt, value->consttype, value->constcollid,
+												value->constvalue, false /* is_null */);
 					break;
 				}
 
@@ -333,7 +333,7 @@ YBTransformPartitionSplitPoints(YBCPgStatement yb_stmt,
 					/* Create MINVALUE in YugaByte format */
 					Form_pg_attribute attr = attrs[idx];
 					exprs[idx] = YBCNewConstantVirtual(yb_stmt, attr->atttypid,
-													   YB_YQL_DATUM_LIMIT_MAX);
+													   attr->attcollation, YB_YQL_DATUM_LIMIT_MAX);
 					break;
 				}
 
@@ -342,7 +342,7 @@ YBTransformPartitionSplitPoints(YBCPgStatement yb_stmt,
 					/* Create MINVALUE in YugaByte format */
 					Form_pg_attribute attr = attrs[idx];
 					exprs[idx] = YBCNewConstantVirtual(yb_stmt, attr->atttypid,
-													   YB_YQL_DATUM_LIMIT_MIN);
+													   attr->attcollation, YB_YQL_DATUM_LIMIT_MIN);
 					break;
 				}
 			}
@@ -351,7 +351,8 @@ YBTransformPartitionSplitPoints(YBCPgStatement yb_stmt,
 		/* Defaulted to MINVALUE for the rest of the columns that are not assigned a value */
 		for (; idx < attr_count; idx++) {
 			Form_pg_attribute attr = attrs[idx];
-			exprs[idx] = YBCNewConstantVirtual(yb_stmt, attr->atttypid, YB_YQL_DATUM_LIMIT_MIN);
+			exprs[idx] = YBCNewConstantVirtual(yb_stmt, attr->atttypid,
+											   attr->attcollation, YB_YQL_DATUM_LIMIT_MIN);
 		}
 
 		/* Add the split boundary to CREATE statement */

diff --git a/src/postgres/src/backend/executor/nodeAgg.c b/src/postgres/src/backend/executor/nodeAgg.c
@@ -1609,6 +1609,22 @@ yb_agg_pushdown_supported(AggState *aggstate)
 			aggref->aggtranstype == NUMERICOID)
 			return;
 
+		/*
+		 * The builtin functions max and min imply comparison. Character type
+		 * comparison requires postgres collation info which is not accessible
+		 * by DocDB. Because DocDB only does byte-wise comparison, it will not
+		 * be correct for any non-C collations. In order to allow min/max
+		 * pushdown for a non-C collation, we need to ensure that the argument
+		 * is a key-column with a deterministic non-C collation. In such a
+		 * case we store a collation-encoded string by concatenating the
+		 * collation sort key with the original text value so that the byte-wise
+		 * comparison result is correct.
+		 */
+		if ((strcmp(func_name, "min") == 0 || strcmp(func_name, "max") == 0) &&
+			(YBIsCollationValidNonC(aggref->aggcollid) ||
+			 YBIsCollationValidNonC(aggref->inputcollid)))
+			return;
+
 		foreach(lc_arg, aggref->args)
 		{
 			TargetEntry *tle = lfirst_node(TargetEntry, lc_arg);