Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expression: check max_allowed_packet constraint for function insert #7502

Merged
merged 6 commits into from
Aug 29, 2018

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Aug 27, 2018

What problem does this PR solve?

Return NULL and a warning when the result of function insert exceeds max_allowed_packet.

Before this PR, SELECT INSERT('abc', 1, -1, 'abcd'); would return result 'abcd' even if @@global.max_allowed_packet is set 3.

After this PR:

mysql> select @@max_allowed_packet;
+----------------------+
| @@max_allowed_packet |
+----------------------+
| 3                    |
+----------------------+
1 row in set (0.00 sec)

mysql> select insert('abc', 1, -1, 'abc');
+-----------------------------+
| insert('abc', 1, -1, 'abc') |
+-----------------------------+
| abc                         |
+-----------------------------+
1 row in set (0.00 sec)

mysql> select insert('abc', 1, -1, 'abcd');
+------------------------------+
| insert('abc', 1, -1, 'abcd') |
+------------------------------+
| NULL                         |
+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------+
| Level   | Code | Message                                                               |
+---------+------+-----------------------------------------------------------------------+
| Warning | 1301 | Result of insert() was larger than max_allowed_packet (3) - truncated |
+---------+------+-----------------------------------------------------------------------+
1 row in set (0.00 sec)

What is changed and how it works?

  • check result length in valString method of builtinInsertSig and builtinInsertBinarySig against max_allowed_packet;

Check List

Tests

  • Unit test

Related changes

  • Need to be included in the release note

@CLAassistant
Copy link

CLAassistant commented Aug 27, 2018

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ eurekaka
✅ zz-jason
❌ Kenan Yao


Kenan Yao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@eurekaka
Copy link
Contributor Author

Tests green, please take a look at this, thx @zz-jason @winoros @XuHuaiyu

@eurekaka eurekaka requested a review from zz-jason August 27, 2018 12:17
if uint64(pos-1+int64(len(newstr)))*uint64(mysql.MaxBytesOfCharacter) > b.maxAllowedPacket {
b.ctx.GetSessionVars().StmtCtx.AppendWarning(errWarnAllowedPacketOverflowed.GenByArgs("insert", b.maxAllowedPacket))
return "", true, nil
}
return str[0:pos-1] + newstr, false, nil
}
return str[0:pos-1] + newstr + str[pos+length-1:], false, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check whether len(str[0:pos-1] + newstr + str[pos+length-1:]) > b. maxAllowedPacket ?

@@ -3216,18 +3225,24 @@ func (b *builtinInsertBinarySig) evalString(row chunk.Row) (string, bool, error)
}

if length > strLength-pos+1 || length < 0 {
if uint64(pos-1+int64(len(newstr)))*uint64(mysql.MaxBytesOfCharacter) > b.maxAllowedPacket {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function len() returns the number of bytes the string contains, no need to multiple mysql.MaxBytesOfCharacter.

@zz-jason
Copy link
Member

to #7153

eurekaka and others added 2 commits August 28, 2018 16:04
- compute bytes resulted correctly;
- cover more cases of insert parameters;
@zz-jason
Copy link
Member

LGTM

@zz-jason
Copy link
Member

/run-all-tests

@zz-jason
Copy link
Member

/run-integration-common-test
/run-common-test

@zz-jason
Copy link
Member

@winoros @lamxTyler PTAL

@zz-jason zz-jason added status/LGT1 Indicates that a PR has LGTM 1. status/all tests passed labels Aug 28, 2018
length = runeLength - pos + 1
}

if uint64(runeLength-length)*uint64(mysql.MaxBytesOfCharacter)+uint64(len(newstr)) > b.maxAllowedPacket {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the number of bytes of some character is less than MaxBytesOfCharacter? Will this raise warning while it should not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this false-negative case exists, but to compute the exact bytes of these characters, we may need the charset info of the string, and then compute the length of specific characters for this charset, which incurs too much overhead, I guess that is why we use MaxBytesOfCharacter in other functions such as builtinLpadSig::evalString. @zz-jason should we compute the exact bytes for the string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can build the string first, and use len() to compute the consumed bytes, but is it possible that panic is raised in building the string because it is too large? I think the purpose of max_allowed_packet check here is to prevent this kind of panic to some extent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in MySQL, the byte count for lpad is calculated by:

byte_count = count * collation.collation->mbmaxlen;

as for the insert function, seems MySQL calculates the exact bytes for the string:

  /*
    There is one exception not handled (intentionaly) by the character set
    aggregation code. If one string is strong side and is binary, and
    another one is weak side and is a multi-byte character string,
    then we need to operate on the second string in terms on bytes when
    calling ::numchars() and ::charpos(), rather than in terms of characters.
    Lets substitute its character set to binary.
  */
  if (collation.collation == &my_charset_bin) {
    res->set_charset(&my_charset_bin);
    res2->set_charset(&my_charset_bin);
  }

  /* start and length are now sufficiently valid to pass to charpos function */
  start = res->charpos((int)start);
  length = res->charpos((int)length, (uint32)start);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why first build it? It just constitutes of three parts, so sum them would be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems there isn't a convenient and efficient way to calculate the byte count of a []rune. @tiancaiamao any idea?

Copy link
Member

@zz-jason zz-jason Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another method is to write a utf-8 code point iterator like this: https://gist.github.com/zz-jason/078110974bb931b7f8e3432775ecfd05, we can iterate on the origin []byte, find the code point located at pos and pos+length, and the count the bytes for each []rune prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you are right... updated

@eurekaka
Copy link
Contributor Author

/run-all-tests

2 similar comments
@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka
Copy link
Contributor Author

/run-all-tests

@iamxy
Copy link
Member

iamxy commented Aug 29, 2018

/rebuild

@iamxy
Copy link
Member

iamxy commented Aug 29, 2018

/run-all-tests

@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka
Copy link
Contributor Author

tests green, @zz-jason , @lamxTyler PTAL, thx

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx alivxxx merged commit c625c27 into pingcap:master Aug 29, 2018
@eurekaka eurekaka deleted the str_udf_overflow_insert branch August 29, 2018 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/expression status/LGT1 Indicates that a PR has LGTM 1. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants