setup mysql tables as utf8mb4 and convert them #3516
Conversation
But for a new instance, tables would still be created as utf8.
@lafriks this should not be the case because of the
Not yet. Sorry, that was more like a question, just without the question mark :)
Codecov Report
@@ Coverage Diff @@
## master #3516 +/- ##
==========================================
+ Coverage 20.08% 35.7% +15.61%
==========================================
Files 146 285 +139
Lines 29867 40835 +10968
==========================================
+ Hits 6000 14579 +8579
- Misses 22961 24094 +1133
- Partials 906 2162 +1256
Continue to review full report at Codecov.
So you have to check the connection string and enforce utf8mb4.
Tried to create the db from scratch locally, and I got this error:
MariaDB 10.1.29. It looks like the length of the VARCHAR fields will need to be reduced. Or maybe there's another solution, idk 🤷♂️
@thehowl strange. It works fine for me using MariaDB 10.2.13; see my xorm log file. I installed gitea 1.4.0+rc1 with my PR from scratch.
I can reproduce it with MariaDB < 10.2 and MySQL < 5.7, as their index key prefix can only be up to 767 bytes long[1]. InnoDB versions >= 5.7 (MariaDB >= 10.2, MySQL >= 5.7) handle up to 3072 bytes by default. To retain compatibility, the indexed fields must be a maximum of 191 characters (191 × 4 bytes = 764 bytes, which fits under 767).
The hardcoded varchar(255) stuff[2] is something we shouldn't change imho, because there might already be repositories in the wild with names or tags longer than 191 chars. What's your opinion?

[1] There's a workaround by using large prefixes:

set global innodb_file_format = Barracuda;
set global innodb_file_per_table = on;
set global innodb_large_prefix = 1;
alter table `foo` ROW_FORMAT = DYNAMIC; -- COMPRESSED also works

[2] vendor/github.com/go-xorm/core/type.go:264
So we have to use xorm tags to define the length. For example,
Force-pushed 3c06930 to 956d792
Ok, I changed every indexed column I could find to
Force-pushed 956d792 to 8d6b224
models/migrations/v58.go
Outdated
log.Info("%s: converting table to utf8mb4", table.Name)
if _, err := x.Exec("alter table `" + table.Name + "` convert to character set utf8mb4"); err != nil {
	return fmt.Errorf("conversion of %s failed: %v", table.Name, err)
}
Hmm. Honestly I'd prefer if this handled effectively the case of some rows being > 191 chars. I would suggest:
- adding at the beginning of the for loop a check to see if any data would be lost (select 1 from tbl where char_length(field1) > 191 [ or char_length(field2) > 191 ])
- if so, aborting the migration for that table and telling the user how to manually update, by logging the needed statements.
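The suggested pre-flight probe could be generated per table. A hedged Go sketch (`buildLengthCheckSQL` and the column list are illustrative, not the PR's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildLengthCheckSQL builds the suggested probe: a query that returns
// a row iff any value in the given columns is longer than maxLen
// characters, i.e. data would be truncated by the migration.
func buildLengthCheckSQL(table string, cols []string, maxLen int) string {
	conds := make([]string, len(cols))
	for i, c := range cols {
		conds[i] = fmt.Sprintf("char_length(`%s`) > %d", c, maxLen)
	}
	return fmt.Sprintf("select 1 from `%s` where %s limit 1",
		table, strings.Join(conds, " or "))
}

func main() {
	fmt.Println(buildLengthCheckSQL("release", []string{"name", "commit"}, 191))
}
```

If the query returns a row, the migration would skip that table and log the statements the admin has to run by hand.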
models/migrations/v58.go
Outdated
}
}
default:
	log.Info("Nothing to do")
at the top
if !setting.UseMySQL {
log.Info("Nothing to do")
return nil
}
and remove switch?
great idea! I'll provide the requested changes tomorrow.
It's funny how I'm concerned about reducing field sizes, when the GUI doesn't even allow user names (35 chars) or full names (100 chars) that long.
Mostly style comments, the logic is sound.
models/migrations/v58.go
Outdated
}
}
}
}
This level of indentation makes me a bit dizzy 😵
col := table.GetColumn(ix)
if col == nil {
	continue
}
if col.SQLType.Name != "VARCHAR" || col.Length <= maxvc {
continue
}
and so on would be better I think, except for the error handling at the end. (This is sort of my style of writing Go code, so feel free to trash my opinion if you feel I'm wrong, but my general approach is to (a) handle at the top indentation level the case we are looking for, like varchar indexes, and (b) handle in if branches the cases where something differs from the case we are looking for. Errors are often still unexpected and thus deserve to be placed in an if branch.)
you're right. Such indentations are the result of grown code. Sorry for the eye cancer.
models/migrations/v58.go
Outdated
}

const maxvc = 191
var migration_success = true
Go's naming convention inside of functions is to use camelCase. Also, := true instead of var?
models/migrations/v58.go
Outdated
return fmt.Errorf("cannot get tables: %v", err)
}
for _, table := range tables {
var ready_for_conversion = true
Same as the previous style comment
Force-pushed 4759e31 to 8e7e29b
models/migrations/v58.go
Outdated
continue
}
log.Info("reducing column %s.%s from %d to %d bytes", table.Name, ix, col.Length, maxvc)
var sqlstmt = fmt.Sprintf("alter table `%s` change column `%s` `%s` varchar(%d)", table.Name, ix, ix, maxvc)
Oops, something that slipped through the cracks: this one should be := as well.
Force-pushed 8e7e29b to 9cd9e29
Okay, I tried running it on my own instance. Creating DB from scratch works fine, the migration? Not so much.
models/migrations/v58.go
Outdated
@@ -0,0 +1,68 @@
// Copyright 2017 The Gitea Authors. All rights reserved.
oh also it's 2018 :P
models/migrations/v58.go
Outdated
continue
}
log.Info("reducing column %s.%s from %d to %d bytes", table.Name, ix, col.Length, maxvc)
sqlstmt := fmt.Sprintf("alter table `%s` change column `%s` `%s` varchar(%d)", table.Name, ix, ix, maxvc)
Here lies the assumption that column name == index name, which is incorrect. (Took me about 20 minutes to track this down...) Since we already have the column anyway, use col.Name.

But alas, the issue is deeper. GetColumn only returns the first column of the index, and there may well be more. So it's actually better if instead you iterate over table.Columns() and check whether each column has any indexes (len(col.Indexes) > 0); if it does, you proceed to reducing the col size.
You're right, thanks. Also we not only need to check for indexes but also for primary keys as they might need to be cut down, too.
Somehow I don't like that indexed text columns will now be limited to 191 characters max for all databases (for new instances). While currently this is not a problem, and even an openid URI would probably be fine with 191 characters, in the future, just to support older MySQL versions, no indexed text column for any database can be created with more than 191 characters.
@lafriks Good idea for xorm to do that smartly. Currently, it always
I do not think we should need to have > 191 char indexes. If anything, we might consider hashing the text to get e.g. a 128 char long field, but I think allowing more than 191 chars is bad.
I don't understand why a limit is being added to structures ("VARCHAR(191)"). Also I'd feel much better if you added a test for the fix, in one of the existing integration tests.
models/branches.go
Outdated
@@ -26,7 +26,7 @@ const (
 type ProtectedBranch struct {
 	ID     int64 `xorm:"pk autoincr"`
 	RepoID int64 `xorm:"UNIQUE(s)"`
-	BranchName string `xorm:"UNIQUE(s)"`
+	BranchName string `xorm:"VARCHAR(191) UNIQUE(s)"`
why is a limit being added here ?
-	Name   string `xorm:"UNIQUE(s) NOT NULL"`
-	Commit string `xorm:"UNIQUE(s) NOT NULL"`
+	Name   string `xorm:"VARCHAR(191) UNIQUE(s) NOT NULL"`
+	Commit string `xorm:"VARCHAR(191) UNIQUE(s) NOT NULL"`
why is a limit being added here ?
@@ -8,7 +8,7 @@ import "github.com/markbates/goth"

 // ExternalLoginUser makes the connecting between some existing user and additional external login sources
 type ExternalLoginUser struct {
-	ExternalID string `xorm:"pk NOT NULL"`
+	ExternalID string `xorm:"VARCHAR(191) pk NOT NULL"`
why is a limit being added here ?
@@ -18,7 +18,7 @@ import (
 // Reaction represents a reactions on issues and comments.
 type Reaction struct {
 	ID   int64 `xorm:"pk autoincr"`
-	Type string `xorm:"INDEX UNIQUE(s) NOT NULL"`
+	Type string `xorm:"VARCHAR(191) INDEX UNIQUE(s) NOT NULL"`
why is a limit being added here ?
@strk please take a look at #3516 (comment) where I explained where that limit comes from. In short, it's for compatibility with MariaDB/MySQL versions that run the InnoDB 5.6 engine, as we cannot have indexed fields that are > 767 bytes long (191 chars × 4 bytes for utf8mb4 = 764 bytes).
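The 191 figure follows directly from the prefix limit; a trivial check of the arithmetic:

```go
package main

import "fmt"

func main() {
	const prefixLimitBytes = 767 // InnoDB 5.6 index key prefix limit
	const bytesPerChar = 4       // utf8mb4 worst case: 4 bytes per character

	maxChars := prefixLimitBytes / bytesPerChar // integer division
	fmt.Println(maxChars)                       // 191
	fmt.Println(maxChars * bytesPerChar)        // 764, still under 767
	fmt.Println((maxChars + 1) * bytesPerChar)  // 768, would exceed the limit
}
```

So 191 is the largest utf8mb4 column length whose index fits in a 767-byte key prefix.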
@philfry I also think we should give meaningful lengths to those columns; not all of them need to be 191.
@lunny about reasonable lengths… let's take the branch name as an example:

$ git init ; touch foo ; git add foo ; git commit -am "."
$ for i in {255..128}; do git checkout -b $(perl -e "print 'a'x$i") >& /dev/null && { echo $i; break; }; done
250

afaik github allows branch names up to 255 chars; locally, on ext4, I'm apparently limited to 250 chars. That's still more than 191. afaict we have multiple possibilities:

create table `branch` (
  `name` varchar(255) character set latin1 not null,
  `bar` varchar(255) character set latin1 not null,
  primary key (`name`), key barkey (`bar`)
) default charset=utf8mb4;

¯\(°_o)/¯
For no. 4, instead of latin1 you can use utf8 for the character sets of the specific columns, as long as it isn't utf8mb4 (although no. 4 is my least preferred option). My vote is for no. 3, due to it being backwards compatible, and also who needs > 191 chars for a branch name? Although I'd be fine with the other options (as long as it isn't no. 4).
@philfry Since only some columns need
…ar(255), which is the default, to varchar(191) in order to deal with utf8mb4/innodb 5.6
It's too complicated to implement the charset thingy the other way round, because the default charset (utf8mb4) is, and has to be, defined within the connector. Maybe xorm can help out by
Let's close this PR as "wontfix". Whoever is interested in using 4-byte chars in gitea and is running mysql/mariadb with at least innodb 5.7: change the connstr in models/models.go and do a manual conversion of all tables.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 months. Thank you for your contributions.
This pull request has been automatically closed because of inactivity. You can re-open it if needed.
But this is not a nice solution when using the Docker image. Can we at least get something like a docker-compose environment variable to set the desired connection charset?
So is UTF8MB4 support dead for Gitea?
Closing for now, as this problem should probably be fixed differently. Please reopen or submit another PR.
I'm not a native speaker, could you please clear that up for me? Does this mean it's a low-priority problem that won't be fixed soon, or that it may be solved by another fix that's coming? Thanks for your time!
I meant that we need to find a better solution, or at least review the columns for more sane lengths.
fixes #3513

Looks like these changes in models.go are sufficient for database creation. This PR also adds a migration module for converting the mysql tables to utf8mb4.

Tested so far:
- show create table issue shows
- show create table issue shows
- show create table issue shows