UUID or GUID as Primary Keys? Be Careful! #1804

zaraguo · 2017-06-24T10:12:22Z

@sqrthree Translation complete.

update from origin

update

zaraguo · 2017-06-27T04:48:19Z

@Glowin Inviting you to proofread.

canonxu · 2017-06-27T05:58:55Z

Claiming a proofreading slot @sqrthree

linhe0x0 · 2017-06-27T05:59:01Z

@canonxu Sounds good 🍺

yifili09 · 2017-06-27T07:34:41Z

@sqrthree I'd like to claim a proofreading slot.

linhe0x0 · 2017-06-27T09:31:22Z

@yifili09 Sure thing.

yifili09

This also deepened my own understanding of UUIDs. Keep it up!

yifili09 · 2017-06-28T02:40:35Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md

-1. At scale, when you have multiple databases containing a segment (shard) of your data, for example a set of customers, using a UUID means that one ID is unique across *all* databases, not just the one you’re in now. This makes moving data across databases safe. Or in my case where all of our database shards are merged onto our Hadoop cluster as one, no key conflicts.
-2. You can know your PK before insertion, which avoids a round trip DB hit, and simplifies transactional logic in which you need to know the PK before inserting child records using that key as it’s foreign key (FK)
-3. UUIDs do not reveal information about your data, so would be safer to use in a URL, for example. If I am customer 12345678, it’s easy to guess that there are customers 12345677 and 1234569, and this makes for an attack vector. (But see below for a better alternative).
+1. 在扩展数据库的时候，当你有多个数据库包含同一段（片）数据时，比如一个顾客集，使用 UUID 意味着该 ID 可以跨所有数据库唯一，而不是仅仅本数据库唯一。这保障了跨数据库迁移数据的安全。又比如，我曾在项目中把多个数据库分片合并到一个 Hadoop 集群中，也没有产生键的冲突。


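"使用 UUID 意味着该 ID 可以跨所有数据库唯一" (using a UUID means the ID can be unique across all databases)

"使用 UUID 意味着这个 ID 在所有的数据库中是唯一标识的。" (using a UUID means this ID is a unique identifier in every database.)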

yifili09 · 2017-06-28T02:46:51Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md

-2. You can know your PK before insertion, which avoids a round trip DB hit, and simplifies transactional logic in which you need to know the PK before inserting child records using that key as it’s foreign key (FK)
-3. UUIDs do not reveal information about your data, so would be safer to use in a URL, for example. If I am customer 12345678, it’s easy to guess that there are customers 12345677 and 1234569, and this makes for an attack vector. (But see below for a better alternative).
+1. 在扩展数据库的时候，当你有多个数据库包含同一段（片）数据时，比如一个顾客集，使用 UUID 意味着该 ID 可以跨所有数据库唯一，而不是仅仅本数据库唯一。这保障了跨数据库迁移数据的安全。又比如，我曾在项目中把多个数据库分片合并到一个 Hadoop 集群中，也没有产生键的冲突。
+2. 你可以在插入之前知道你的主键值，这避免了一轮的数据查找，简化了在插入将主键值作为外键的子记录前需要知道该主键值这一场景的逻辑。


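"你可以在插入之前知道你的主键值" (you can know your primary key value before insertion)

"在插入数据之前，你就能知道这个主键的值，" (before inserting the data, you already know the value of this primary key,)

"简化了在插入将主键值作为外键的子记录前需要知道该主键值这一场景的逻辑。" (simplifies the logic for the scenario in which you need to know the primary key value before inserting child records that use it as a foreign key.)

"并且简化了事务的逻辑，即在你插入子记录之前，因为需要使用这个主键作为一个外键，你必须要知道这个主键的值。" (and simplifies the transactional logic: before you insert a child record, you must know this primary key's value, because you need to use it as a foreign key.)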
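
To make the point under discussion concrete (the key is known before insertion), here is a minimal Python sketch. It assumes an in-memory SQLite database and hypothetical `customer` and `address` tables, none of which come from the article. The parent key is generated by the application, so the child row can reference it without reading anything back from the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical schema for illustration only; keys are stored as 16-byte blobs.
conn.execute("CREATE TABLE customer (id BLOB PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE address ("
    " id BLOB PRIMARY KEY,"
    " customer_id BLOB NOT NULL REFERENCES customer(id),"
    " city TEXT)"
)

# Both keys are generated client-side, so they are known before any INSERT.
customer_id = uuid.uuid4().bytes
address_id = uuid.uuid4().bytes

with conn:  # one transaction; no generated key has to be read back
    conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, "Alice"))
    conn.execute(
        "INSERT INTO address VALUES (?, ?, ?)",
        (address_id, customer_id, "Boston"),
    )

row = conn.execute(
    "SELECT c.name, a.city FROM address a JOIN customer c ON c.id = a.customer_id"
).fetchone()
```

With an auto-increment key, the application would instead have to insert the parent first and fetch the generated value before it could build the child row.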

yifili09 · 2017-06-28T03:29:27Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md

-3. UUIDs do not reveal information about your data, so would be safer to use in a URL, for example. If I am customer 12345678, it’s easy to guess that there are customers 12345677 and 1234569, and this makes for an attack vector. (But see below for a better alternative).
+1. 在扩展数据库的时候，当你有多个数据库包含同一段（片）数据时，比如一个顾客集，使用 UUID 意味着该 ID 可以跨所有数据库唯一，而不是仅仅本数据库唯一。这保障了跨数据库迁移数据的安全。又比如，我曾在项目中把多个数据库分片合并到一个 Hadoop 集群中，也没有产生键的冲突。
+2. 你可以在插入之前知道你的主键值，这避免了一轮的数据查找，简化了在插入将主键值作为外键的子记录前需要知道该主键值这一场景的逻辑。
+3. UUIDs 不会透露数据的信息，因此被用在 URL 中也比自增整数更安全。比如，我是编号 12345678 号顾客，那么人们就会猜测编号为 12345677 和 12345679 的顾客的存在，这就提供了一种攻击向量。（但是后面我们会看到一个更好的替代品）


"攻击向量" (attack vector)

"攻击途径" (attack route)

"攻击向量" (attack vector) is an established technical term, though.

yifili09 · 2017-06-28T05:29:35Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-A naive use of a UUID, which might look like `70E2E8DE-500E-4630-B3CB-166131D35C21`, would be to treat as a string, e.g. `varchar(36)` — don’t do that!!
+一个基础的 UUID 大概是这个样子的： `70E2E8DE-500E-4630-B3CB-166131D35C21`，它将会被视为字符串对待，比如 `varchar(36)` - 千万不要这么做！


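"一个基础的 UUID 大概是这个样子的： 70E2E8DE-500E-4630-B3CB-166131D35C21，它将会被视为字符串对待" (a basic UUID looks roughly like 70E2E8DE-500E-4630-B3CB-166131D35C21, and it will be treated as a string)

"把一个 UUID（它看上去可能是 70E2E8DE-500E-4630-B3CB-166131D35C21 这样的）以字符串形式对待是缺乏经验的表现。" (treating a UUID, which might look like 70E2E8DE-500E-4630-B3CB-166131D35C21, as a string is a sign of inexperience.)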
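
For readers following the size argument in this passage, a small Python sketch of the numbers involved, using the UUID quoted in the article. It only measures lengths; the variable names are mine, and how a given database stores either form is outside what it shows.

```python
import uuid

u = uuid.UUID("70E2E8DE-500E-4630-B3CB-166131D35C21")

as_text = str(u)    # canonical text form: 32 hex digits plus 4 hyphens
as_bytes = u.bytes  # the same 128-bit value as raw bytes

text_len = len(as_text)        # characters needed for a varchar(36) column
binary_len = len(as_bytes)     # bytes needed for a native or binary UUID column
ratio_vs_int32 = text_len / 4  # compared with a 4-byte integer key

# The binary form loses nothing: it converts back to the same UUID.
round_trip = uuid.UUID(bytes=as_bytes)
```

The text form is where the article's "9x" figure comes from (36 characters against 4 bytes); the binary form is 16 bytes.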

yifili09 · 2017-06-28T05:30:50Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Think twice — in two cases of very large databases I have inherited at relatively large companies, this was *exactly* the implementation. Aside from the 9x cost in size (36 vs. 4 bytes for an int), strings don’t sort as fast as numbers because they rely on collation rules.
+我想了想 - 就我所接触的就有两个来自我先前公司的大型数据库是这么设计的。除了 9 倍的多余开销外（比起 36 字节，整数类型只占了 4 字节），字符串在排序上也没有数字快，因为它们依赖校对规则。


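"我想了想 -" (I thought about it -)

"我再三考虑了下，" (I thought it over carefully,)

"就我所接触的就有两个来自我先前公司的大型数据库是这么设计的" (of those I have encountered, two large databases from my previous companies were designed this way)

"就我所接手的两个大型企业级数据库来看，他们确实是那么实施的。" (judging from the two large enterprise databases I took over, this is exactly how they were implemented.)

"Collation rules" should be 排序规则 (sorting rules); what is 校对规则 (proofreading rules) supposed to mean?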

yifili09 · 2017-06-28T06:06:44Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Not just on disk but during joins and sorts these keys need to live in memory. Memory is getting cheaper, but whether disk or RAM, it’s limited. And neither is free.
+不单单在磁盘上，在进行 join 和 sort 时这些 key 还需要载入到内存中。内存的确越来越便宜了，但是无论磁盘还是内存它们都是有限的，并且也都不是免费的。


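memory

存储器 (storage device)

RAM

闪存 ?? (flash memory ??)

Translating RAM as 内存 (main memory) should be fine too, right? @canonxu what do you think?

Adjust the word order: "首先我们应该要意识到..." (first, we should acknowledge that...)

超出 (exceed) -> 溢出 (overflow)

"20亿大小" (2 billion in size): is that a row count, or 2 billion MB, 2 billion KB...? It is easy to misread; I suggest "20亿条数据记录" (2 billion data records).

"PostgreSQL 和 PostgreSQL 这类" (the likes of PostgreSQL and PostgreSQL): what category is that? ORDBMS? Do all ORDBMSs have a native 16-byte type? I suggest changing it to "这些" (these), or simply dropping "这类" (the likes of).

"进行统计" (do the statistics): statistics of what? I suggest "评估开销" (assess the cost).

From the context, 20 亿 (2 billion) here refers to the number itself: a 4-byte int tops out at roughly 2 billion (2,147,483,647 when signed), so larger values cannot be represented.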
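
The limit being discussed can be checked directly. A minimal Python sketch, assuming a signed 4-byte integer like a typical SQL `int`; `fits_in_int32` is a hypothetical helper written for this note.

```python
import struct

INT32_MAX = 2**31 - 1  # 2,147,483,647: the "around 2 billion" ceiling


def fits_in_int32(n: int) -> bool:
    """Return True if n can be packed into a signed 4-byte integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


int32_size = struct.calcsize("<i")  # size in bytes of the 4-byte type
int64_size = struct.calcsize("<q")  # size in bytes of the 8-byte alternative
```

So a table whose row count stays below the ceiling still fits, while one that passes it needs at least an 8-byte key.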

yifili09 · 2017-06-28T06:09:07Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-#### It’s really hard to sort random numbers
+#### 随机数排序十分困难


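"对随机排列的数字进行排序是十分困难的" (sorting randomly ordered numbers is very hard)

"the extra size of foreign keys adds up fast": could you reconsider the meaning in context? My guess is that the passage means: if UUIDs are also used as foreign keys, the space overhead grows even faster.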

yifili09 · 2017-06-28T06:09:42Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Another problem is fragmentation — because UUIDs are random, they have no natural ordering so cannot be used for clustering. This is why SQL Server has implemented a `newsequentialid()` function that is suitable for use in clustered indexes, and is [probably the right implementation](https://msdn.microsoft.com/en-us/library/ms189786.aspx) for all UUID PKs. It is probable that there are similar solutions for other databases, certainly PostgreSQL, MySQL and likely the rest.
+另外一个问题就是分裂 - 因为 UUIDs 是随机的，他们没有天然的生成顺序因此不能够被用于集群。这就是为什么 SQL Server 实现了一个 `newsequentialid()` 方法用于集群化索引的使用，这可能就是将 UUIDs 作为主键使用的[正确实践](https://msdn.microsoft.com/en-us/library/ms189786.aspx)了。其他的数据库可能也有类似的解决方案，PostgreSQL，MySQL 肯定是有的，其他的可能有。


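"另外一个问题就是分裂" (another problem is splitting), for "fragmentation"

"另外一个问题是碎片化" (another problem is fragmentation)

Suggest [正确实践] (correct practice) -> [正确打开方式] (the right way to go about it), hehe

@canonxu Haha, that was my first thought too.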
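
A rough Python illustration of why random keys fragment a sorted index while ordered ones do not. The ordered scheme below (a counter in the high bits, random bits underneath) is a hypothetical stand-in chosen for this note; it is not the algorithm behind SQL Server's `newsequentialid()`.

```python
import itertools
import random
import uuid

rng = random.Random(42)  # seeded so the sketch is repeatable
counter = itertools.count(1)


def random_id() -> uuid.UUID:
    """A fully random 128-bit value, standing in for a random UUID."""
    return uuid.UUID(int=rng.getrandbits(128))


def ordered_id() -> uuid.UUID:
    """Hypothetical ordered scheme: counter in the high bits, 80 random bits below."""
    return uuid.UUID(int=(next(counter) << 80) | rng.getrandbits(80))


def out_of_order_inserts(ids) -> int:
    """Count keys smaller than the key inserted just before them.

    Such a key cannot be appended at the end of a sorted index; it has to be
    placed somewhere in the middle.
    """
    return sum(1 for prev, cur in zip(ids, ids[1:]) if cur < prev)


random_ids = [random_id() for _ in range(1000)]
ordered_ids = [ordered_id() for _ in range(1000)]

random_misses = out_of_order_inserts(random_ids)
ordered_misses = out_of_order_inserts(ordered_ids)
```

Every ordered key lands at the end of the index, whereas a large share of the random keys have to be slotted in between existing entries.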

yifili09 · 2017-06-28T07:08:42Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-I would argue that *using a PK in any public context is a bad idea.*
+下面我将阐明 *在公开环境中暴露主键是十分不好的* 这一观点。


阐明 (set out, demonstrate)

讨论？ (discuss?)

yifili09 · 2017-06-28T07:13:10Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-But there’s a far more compelling reason not to use any kind of PK in a public context: if you *ever* need to change keys, all your external references are broken. Think “404 Page Not Found”.
+不在公开环境使用主键还有一个无法反驳的原因：如果你 *一旦* 需要改变这个键值，那么所有外在的引用就不可用了。想象一下 “404 页面无法找到”的情形。 


"无法反驳的原因" (an irrefutable reason)

"更有说服力的原因" (a more compelling reason)

如果 (if) and 一旦 (once) are semantically redundant here.

yifili09 · 2017-06-28T07:44:24Z

@sqrthree Boss, I've finished reviewing. Calling it a day and waiting for my lunch box!

canonxu

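A very good and careful translation! Nice work!

Globally unique IDs are a hard problem when sharding databases and tables. This article explains UUIDs in an accessible way, and I learned a lot too!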

canonxu · 2017-06-28T17:21:22Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-I just read a post on ways to scale your database that hit home with me — the author suggests the use of UUIDs (similar to GUIDs) as the primary key (PK) of database tables.
+在阅读时，一篇谈论如何扩展数据库的文章引起了我的关注 - 作者在文中建议大家使用 UUIDs（类似 GUIDs）作为数据库表的主键。


For "just", add 最近 (recently).

canonxu · 2017-06-28T17:36:03Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Think twice — in two cases of very large databases I have inherited at relatively large companies, this was *exactly* the implementation. Aside from the 9x cost in size (36 vs. 4 bytes for an int), strings don’t sort as fast as numbers because they rely on collation rules.
+我想了想 - 就我所接触的就有两个来自我先前公司的大型数据库是这么设计的。除了 9 倍的多余开销外（比起 36 字节，整数类型只占了 4 字节），字符串在排序上也没有数字快，因为它们依赖校对规则。


"Collation rules" should be 排序规则 (sorting rules); what is 校对规则 (proofreading rules) supposed to mean?

canonxu · 2017-06-28T17:38:41Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Things got really bad in one company where they had originally decided to use Latin-1 character set. When we converted to UTF-8 several of the compound-key indexes were not big enough to contain the larger strings. Doh!
+在一家公司还曾发生过十分糟糕的事情，一开始他们使用 Latin-1 字符集。当我们打算转为 UTF-8 时，好几个联合索引因为太大而存不下。哦！


I suggest "组合索引" (composite index) or "联合索引" (joint index); pick one of the two.

canonxu · 2017-06-28T17:43:52Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Not just on disk but during joins and sorts these keys need to live in memory. Memory is getting cheaper, but whether disk or RAM, it’s limited. And neither is free.
+不单单在磁盘上，在进行 join 和 sort 时这些 key 还需要载入到内存中。内存的确越来越便宜了，但是无论磁盘还是内存它们都是有限的，并且也都不是免费的。


Adjust the word order: "首先我们应该要意识到..." (first, we should acknowledge that...)

canonxu · 2017-06-28T17:45:00Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Not just on disk but during joins and sorts these keys need to live in memory. Memory is getting cheaper, but whether disk or RAM, it’s limited. And neither is free.
+不单单在磁盘上，在进行 join 和 sort 时这些 key 还需要载入到内存中。内存的确越来越便宜了，但是无论磁盘还是内存它们都是有限的，并且也都不是免费的。


超出 (exceed) -> 溢出 (overflow)

canonxu · 2017-06-28T18:33:56Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Indeed, my current company’s context is a perfect example of why UUIDs are needed, and why they are costly, and why exposing primary keys is an issue.
+事实上，我现在公司的环境就是为什么需要 UUIDs 的最好例子，以及为什么 UUIDs 开销巨大，为什么在公开环境中暴露主键是一个问题。


环境 (environment) -> 场景 (scenario)

canonxu · 2017-06-28T18:37:44Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-One solution used in several different contexts that has worked for me is, in short, to use both. (Please note: not a good solution — see note about response to original post below).
+有一个解决方法在多个不同的场景中都起到了作用，简单来说就是，两者都用。（请注意：这不是一个好方法 - 请看下面我记录的 Chris 对原始博文回复）


"有一个在多个不同场景下都有效的解决办法..." (there is one solution that has worked across several different scenarios...)

canonxu · 2017-06-28T18:40:50Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Then *add a column* populated with a UUID (perhaps as a trigger on insert). Within the scope of the database itself, relationships can be managed using the usual PKs and FKs.
+然后 *增加一列* 用于存放 UUID（可以将其设计进插入的预处理操作里）。在一个数据库自身的范围内，可以使用普通的主键和外键来管理关系。


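"perhaps as a trigger on insert": 伴随着 insert 操作一起插入 (inserted together with the insert operation)

My understanding is that inserting the UUID is set up as a hook on insert; simply translating it as "inserted together" feels like it loses something.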
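
To show what "a trigger on insert" can look like in practice, here is a minimal sketch using Python's `sqlite3` module. The `account` table, the `public_id` column and the trigger name are hypothetical, and the value is 16 random bytes rendered as hex rather than a standards-compliant UUID, because SQLite has no built-in UUID function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key for joins and FKs
        public_id TEXT UNIQUE,                        -- external reference
        name      TEXT NOT NULL
    );

    -- Fill in the external identifier whenever a row arrives without one.
    CREATE TRIGGER account_public_id AFTER INSERT ON account
    WHEN NEW.public_id IS NULL
    BEGIN
        UPDATE account
        SET public_id = lower(hex(randomblob(16)))
        WHERE id = NEW.id;
    END;
    """
)

conn.execute("INSERT INTO account (name) VALUES (?)", ("Alice",))
conn.execute("INSERT INTO account (name) VALUES (?)", ("Bob",))

rows = conn.execute("SELECT id, public_id, name FROM account ORDER BY id").fetchall()

# Outside callers look rows up by public_id only; the integer id stays internal.
found = conn.execute(
    "SELECT name FROM account WHERE public_id = ?", (rows[0][1],)
).fetchone()
```

The application never supplies the identifier itself, which is the behavior the word "trigger" carries and "inserted together" does not.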

canonxu · 2017-06-28T18:45:57Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-In another case, we would generate a “slug” of text (e.g. in blog posts like this one) that would make the URL a little more human friendly. If we had a duplicate, we would just append a hashed value.
+另一种情况，我会生成了一“段”文本（例如在像本篇一样的博文）用于 URL 使其更加对用户友好的。如果有冲突，那么只需追加一段哈希值。


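(e.g. in blog posts like this one): the author presumably means something like the URL of this very post:
https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

Yes, that was a slip on my part: there is an extra 在. It should read: "我会生成了一“段”文本（例如像本篇一样的博文）用于 URL" (I would generate a "slug" of text, e.g. for blog posts like this one, for use in the URL). Do you mean I should add the text portion of the URL to the article?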
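
The slug idea is easy to picture with the URL quoted above. A minimal Python sketch, assuming a simple "lowercase and hyphenate" rule and a SHA-1 based suffix; the article does not say how its slugs or hashes are actually produced, and `slugify` and `unique_slug` are hypothetical helpers.

```python
import hashlib
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def unique_slug(title: str, unique_key: str, taken: set) -> str:
    """Return a readable slug, appending a short hash only when the plain slug is taken."""
    slug = slugify(title)
    if slug in taken:
        suffix = hashlib.sha1(unique_key.encode("utf-8")).hexdigest()[:12]
        slug = f"{slug}-{suffix}"
    taken.add(slug)
    return slug


taken_slugs = set()
title = "UUID or GUID as Primary Keys? Be Careful!"
first = unique_slug(title, "post-1", taken_slugs)
second = unique_slug(title, "post-2", taken_slugs)
```

The first post keeps the plain, human-friendly slug; the second one with the same title gets a hashed suffix, much like the trailing `7b2aa3dcb439` in the article's own URL.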

canonxu · 2017-06-28T18:59:48Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Use integers because they are efficient. Use the database implementation of UUIDs in addition for any external reference to obfuscate.
+使用整型是因为它们是高效的。另外也可将数据库实现的 UUIDs 用于混淆外部引用。 :TOBECONFIRMED


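"用于混淆外部引用" (used to obfuscate external references): what does that mean? The author presumably means making external references follow no pattern, so that they cannot be brute-forced? How should "obfuscate" be translated: 迷惑 (confuse)? 模糊化 (blur)? Give it some more thought...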

linhe0x0 · 2017-06-29T02:35:22Z

@zaraguo Both proofreaders have finished~ You can go ahead and revise based on their comments ┏ (゜ω゜)=☞

zaraguo · 2017-06-29T09:56:14Z

@yifili09 @canonxu @sqrthree Revised according to the comments; please take another look for any remaining issues.

canonxu · 2017-06-29T11:06:12Z

OK, looks good, no issues @zaraguo @sqrthree

linhe0x0

There are still a few tiny issues; please fix them, thanks.

linhe0x0 · 2017-06-30T02:19:46Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md

@@ -1,133 +1,133 @@
 > * 原文地址：[UUID or GUID as Primary Keys? Be Careful!](https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439)
 > * 原文作者：[Tom Harrison Jr](https://tomharrisonjr.com/@tomharrisonjr)
 > * 译文出自：[掘金翻译计划](https://github.com/xitu/gold-miner)
-> * 译者：
+> * 译者：[zaraguo](https://github.com/zaraguo)
 > * 校对者：


Please add the proofreader information.

linhe0x0 · 2017-06-30T02:31:16Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-If our goal is to scale, and I mean *really scale* let’s first acknowledge that an `int` is not big enough in many cases, maxing out at around 2 billion, which needs 4 bytes. We have way more than 2 billion transactions in each of several databases.
+如果我们的目标是扩展，我是说 *真正的扩展*。那么首先让我们意识到 `int` 类型在很多情况下是不够大的。在大约 20 亿（需要 4 字节）的时候就溢出了。然而每个数据库中我们都有远超 20 亿大小的数据存在。


Italics issue: please change these to bold.

linhe0x0 · 2017-06-30T02:32:33Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Our database has plenty of intermediate tables that are mainly containers for the foreign keys of others, especially in 1-to-many relations. Accounts have multiple card numbers, addresses, phone numbers, usernames, and all that. For each of these columns in a set of table with billions of accounts, the extra size of foreign keys adds up fast.
+我们的数据库用大量的关系表来存储外键，尤其是在一对多的关系中。账户表内含有多个卡号，地址，电话号码，用户名等等。对于拥有数十亿账户的一组表中的任意一列，外键的空间开销的增长都是十分快速的。


『多个卡号，地址，电话号码，用户名』=>『多个卡号、地址、电话号码、用户名』 (use the enumeration comma 、 between list items)

linhe0x0 · 2017-06-30T02:33:10Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Another problem is fragmentation — because UUIDs are random, they have no natural ordering so cannot be used for clustering. This is why SQL Server has implemented a `newsequentialid()` function that is suitable for use in clustered indexes, and is [probably the right implementation](https://msdn.microsoft.com/en-us/library/ms189786.aspx) for all UUID PKs. It is probable that there are similar solutions for other databases, certainly PostgreSQL, MySQL and likely the rest.
+另外一个问题就是碎片化 - 因为 UUIDs 是随机的，他们没有天然的生成顺序因此不能够被用于集群。这就是为什么 SQL Server 实现了一个 `newsequentialid()` 方法用于集群化索引的使用，这可能就是将 UUIDs 作为主键使用的[正确打开方式](https://msdn.microsoft.com/en-us/library/ms189786.aspx)了。其他的数据库可能也有类似的解决方案，PostgreSQL，MySQL 肯定是有的，其他的可能有。


『PostgreSQL，MySQL』=>『PostgreSQL、MySQL』 (use the enumeration comma 、 between list items)

linhe0x0 · 2017-06-30T02:35:29Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-I would argue that *using a PK in any public context is a bad idea.*
+下面我将阐明 *在公开环境中暴露主键是十分不好的* 这一观点。


The italics issue applies here too.

linhe0x0 · 2017-06-30T02:36:45Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-But there’s a far more compelling reason not to use any kind of PK in a public context: if you *ever* need to change keys, all your external references are broken. Think “404 Page Not Found”.
+不在公开环境使用主键还有一个无法反驳的原因：你 *一旦* 需要改变这个键值，那么所有外在的引用就不可用了。想象一下 “404 页面无法找到”的情形。 


Italics here as well.

linhe0x0 · 2017-06-30T02:38:17Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-Then *add a column* populated with a UUID (perhaps as a trigger on insert). Within the scope of the database itself, relationships can be managed using the usual PKs and FKs.
+然后 *增加一列* 用于存放 UUID（可以将其设计进插入的预处理操作里）。在一个数据库自身的范围内，可以使用普通的主键和外键来管理关系。


And italics here too.

linhe0x0 · 2017-06-30T02:40:43Z

TODO/uuid-or-guid-as-primary-keys-be-careful.md


-But when a reference to the data needs to be exposed to the outside world, *even when “outside” means another internal system,* they must rely only on the UUID.
+当需要暴露一个数据的引用到外部时，*即使这里的“外部”是另一个内部系统，*它们则必须依赖 UUID。


zaraguo · 2017-06-30T07:39:36Z

@sqrthree done

linhe0x0 · 2017-06-30T07:52:43Z

@zaraguo Merged~ Please publish it to your Juejin column soon and send me the link so I can add your points promptly.

zaraguo · 2017-06-30T09:57:07Z

@sqrthree Published to Juejin.
@canonxu @yifili09 Thanks for proofreading.

zaraguo and others added 3 commits June 1, 2017 13:49

Merge pull request #1 from xitu/master

1c435fe

update from origin

Merge pull request #2 from xitu/master

d51a52f

update

uuid-or-guid-as-primary-keys-be-careful translate

ea1c0e6

linhe0x0 added the 校对认领 (proofreading claimed) label Jun 24, 2017

linhe0x0 mentioned this pull request Jun 24, 2017

UUID or GUID as Primary Keys? Be Careful! #1784

Closed

linhe0x0 added the 后端 (backend) label Jun 24, 2017

linhe0x0 added the 正在校对 (proofreading in progress) label Jun 27, 2017

zaraguo closed this Jun 27, 2017

zaraguo reopened this Jun 27, 2017

linhe0x0 removed the 校对认领 (proofreading claimed) label Jun 27, 2017

yifili09 reviewed Jun 28, 2017

canonxu approved these changes Jun 28, 2017

update according suggestion

6f759fe

linhe0x0 requested changes Jun 30, 2017

style: transfer italics to bold

bca6c82

linhe0x0 approved these changes Jun 30, 2017

linhe0x0 merged commit e364d79 into xitu:master Jun 30, 2017

linhe0x0 added 翻译完成 (translation complete) and removed 正在校对 (proofreading in progress) labels Jun 30, 2017

zaraguo deleted the translate branch June 30, 2017 09:57


