
## Machine learning is finding functions

<img width=300 src="https://private-user-images.githubusercontent.com/7970947/303656877-67390915-f4a9-464d-a712-9958ffbf6703.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODM2ODIsIm5iZiI6MTcwNzQ4MzM4MiwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1Njg3Ny02NzM5MDkxNS1mNGE5LTQ2NGQtYTcxMi05OTU4ZmZiZjY3MDMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTI1NjIyWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZWM3NWVhYTZhOTU0MTM2YTFlM2M1MTQzZDg2M2UyZGJkNjkzMDU4MGZlMmRlZGJlNjMzMmM5ODMyNDc3NjNlOSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.ppU_jvck--6Zd8_tHkVPj2plOvVQ5ywVowPjBz0wroI">
<img width=300 src="https://github.com/ascoders/blog/assets/7970947/67390915-f4a9-464d-a712-9958ffbf6703">

As I understand it, the essence of machine learning is **finding a function**. I will explain from two angles why machine learning amounts to finding a function.

Defining the model function means defining a function, though this is not a matter of writing down the final function in one step.

Suppose we define a simple linear function of one variable:

<img width=110 src="https://private-user-images.githubusercontent.com/7970947/303662928-167fac7d-ee5e-4aaa-b04e-e6f27b6ecda6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODUyNzQsIm5iZiI6MTcwNzQ4NDk3NCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2MjkyOC0xNjdmYWM3ZC1lZTVlLTRhYWEtYjA0ZS1lNmYyN2I2ZWNkYTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTMyMjU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDY2MTRiODE1YWE1NDc2ZTc1NmE0NDFiYzI0NWRiMzFjN2RiMTdhNDViNDZjMzM4NDkwMTg4ZmZkOWU0OWY0YiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.nQN0NNCXZ9SWnBz6K0ZcoRdzIKaGx54GGbSHkMNHkSo">
<img width=110 src="https://github.com/ascoders/blog/assets/7970947/167fac7d-ee5e-4aaa-b04e-e6f27b6ecda6">

The unknown parameters are w and b; that is, we assume the function we are ultimately looking for can be expressed as b + wx, but the concrete values of w and b have yet to be found. We can define it like this:
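
A minimal sketch of such a definition, written in the article's JavaScript and using the curried `modelFunction(b, w)(x)` shape referenced later:

```js
// Sketch: fixing the unknown parameters b and w yields the concrete
// function x => b + w * x.
const modelFunction = (b, w) => (x) => b + w * x;
```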

Defining the loss function means defining the loss, which can be understood as the distance between the current function's output and the target.

There are many ways to define a loss function; one of the simplest is the mean squared error:

<img width=190 src="https://private-user-images.githubusercontent.com/7970947/303665423-2a6f8788-bb55-4072-89ea-c3f5db940f87.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODU4OTAsIm5iZiI6MTcwNzQ4NTU5MCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2NTQyMy0yYTZmODc4OC1iYjU1LTQwNzItODllYS1jM2Y1ZGI5NDBmODcucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTMzMzEwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YmViZDQ5OTdkYWVjN2EzMGY1YmY1ZjE3ZDRhYzU1MzMzYTQ3OWJjMmE0ZjEwMWYyYjExMTk3Y2M3M2FmNzhjMCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.a4yVUqCp9YGNzjgEgoU8uWAGxs3-cm-17HExe1TiFbg">
<img width=190 src="https://github.com/ascoders/blog/assets/7970947/2a6f8788-bb55-4072-89ea-c3f5db940f87">

That is, we compute the squared difference between the current prediction `modelFunction(b,w)(x)` and the target value `3x`. The loss function can then be defined like this:
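
A minimal sketch, assuming the training data is a list of `{ x, y }` pairs sampled from y = 3x (the exact shape of the article's data is an assumption here):

```js
// Sketch: mean squared error of the model's predictions over the data.
const lossFunction = (b, w) => (trainingData) => {
  let sum = 0;
  for (const { x, y } of trainingData) {
    const diff = modelFunction(b, w)(x) - y; // prediction minus target
    sum += diff * diff;
  }
  return sum / trainingData.length;
};
```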

Optimization means adjusting the function's parameters so that the value of the loss function is minimized.

Finding the minimum of the loss function requires updating the unknown parameters again and again. If we plot the loss as a graph and want to move toward lower values, we take the partial derivative at the current point to decide in which direction each parameter should be updated:

<img width=300 src="https://private-user-images.githubusercontent.com/7970947/303600726-6e3544a2-c4b6-4874-b1be-d3969207406b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0NjkyODcsIm5iZiI6MTcwNzQ2ODk4NywicGF0aCI6Ii83OTcwOTQ3LzMwMzYwMDcyNi02ZTM1NDRhMi1jNGI2LTQ4NzQtYjFiZS1kMzk2OTIwNzQwNmIucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMDg1NjI3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NGExZTkwYjQ3YjhiNWE4ZTdmNTcwMzZiNDk1MjIwYzdlZDkzNDcwOGUzYTlkN2QxYzQ5MjM3OTU3NzA5ZTY0OSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.P9h5W0FD_q76Ih_V0rkkn4UjFCjxS1Edi4lLAwJrzzs">
<img width=300 src="https://github.com/ascoders/blog/assets/7970947/6e3544a2-c4b6-4874-b1be-d3969207406b">

As shown above, suppose the x axis of the figure is the parameter w and the y axis is the loss computed over all training data. Taking the partial derivative of the loss with respect to w tells us how w should change so the loss gets smaller (when the derivative is negative, move right, that is, increasing w decreases the loss, and vice versa).
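
Written as an update rule, this is the standard gradient-descent step (the learning rate η is my notation, not the article's):

```math
w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}
```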

From the definition of the loss function, we can write out its partial derivatives with respect to b and w:

Partial derivative with respect to b:

<img width=340 src="https://private-user-images.githubusercontent.com/7970947/303667829-d5ea9819-2f33-4ea6-88b6-0d1d5beba31e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODY0NzEsIm5iZiI6MTcwNzQ4NjE3MSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2NzgyOS1kNWVhOTgxOS0yZjMzLTRlYTYtODhiNi0wZDFkNWJlYmEzMWUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTM0MjUxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OWZhMGQ1YzNiMjRjM2RmOTJiMjVlYTkyYjhlNzcxMGJlZDk2YjllZTFjZTY2MjgzNDA2ZjhmODMwYmZhYzI3MCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.EQo5hX9CMcvd6kWheBF3jpyuuTBCX8DPpVCWIsMcDP4">
<img width=340 src="https://github.com/ascoders/blog/assets/7970947/d5ea9819-2f33-4ea6-88b6-0d1d5beba31e">

Partial derivative with respect to w:

<img width=360 src="https://private-user-images.githubusercontent.com/7970947/303667857-92ef497d-6e94-4c98-ac23-476aa31642fe.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODY0NzEsIm5iZiI6MTcwNzQ4NjE3MSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY2Nzg1Ny05MmVmNDk3ZC02ZTk0LTRjOTgtYWMyMy00NzZhYTMxNjQyZmUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTM0MjUxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Yjg4ZDI5OGRhNjQ3ZWQ2ZDZkZDVkOWViODcyYWU5MGVlZTY1Nzc2Yzg5MjMzYmNkMDM1NzIzMGJkNzg2MjU4NSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.boyXK17UIbV2fiRx-n2C0Y9ZaznM6k3vsyhZsLcZj5A">
<img width=360 src="https://github.com/ascoders/blog/assets/7970947/92ef497d-6e94-4c98-ac23-476aa31642fe">

> Note that here we compute the partial derivative for a single training example only, without summing the derivatives over all training data, because there are different strategies for how these derivatives are used later.
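
As a sketch, the two partial derivatives above can be computed for a single `{ x, y }` example like this, assuming the squared-error loss L(b, w) = (b + wx − y)² from earlier (the factor 2 comes from differentiating the square):

```js
// Sketch: per-example partial derivatives of L(b, w) = (b + w*x - y)^2.
const gradient = (b, w) => ({ x, y }) => {
  const error = b + w * x - y;
  return {
    db: 2 * error,     // ∂L/∂b
    dw: 2 * error * x, // ∂L/∂w
  };
};
```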
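A minimal gradient-descent loop tying the sketches together, assuming 500 iterations as in the article's loop; the training data, initialization, and learning rate here are illustrative assumptions:

```js
// Sketch of the training loop. Only the 500 iterations are from the
// article; the data, initialization, and learning rate are assumed.
const trainingData = Array.from({ length: 10 }, (_, i) => ({ x: i, y: 3 * i }));

let b = Math.random();
let w = Math.random();
const learningRate = 0.001; // assumed; small enough to keep updates stable

for (let i = 0; i < 500; i++) {
  for (const example of trainingData) {
    const { db, dw } = gradient(b, w)(example);
    b -= learningRate * db; // step opposite the gradient
    w -= learningRate * dw;
  }
}
```

With settings like these, b drifts toward 0 and w toward 3, matching the convergence described below.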

Visualizing the function-finding process produces the figure below:

<img width=500 src="https://private-user-images.githubusercontent.com/7970947/303650736-fa3cb64a-426c-4bec-a6f9-674ba84ec6e6.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODIyMTksIm5iZiI6MTcwNzQ4MTkxOSwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1MDczNi1mYTNjYjY0YS00MjZjLTRiZWMtYTZmOS02NzRiYTg0ZWM2ZTYuZ2lmP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTIzMTU5WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZTcxZDY2NDY0OTBmMmMyZjAyMzI3OWRlOWExZjdiNTc4YzRlOGI4NzRlNGJjYzljM2Y3YmUzYjQ2MzdkYzRjMyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.AEU4WivgSPXQVsgu2889U69U8SMT9TGwiwSQFtvMB1c">
<img width=500 src="https://github.com/ascoders/blog/assets/7970947/fa3cb64a-426c-4bec-a6f9-674ba84ec6e6">

Notice that no matter how the initial values of b and w are chosen, by the time the loss converges, b approaches 0 and w approaches 3; that is, the model gets arbitrarily close to the function y = 3x.


You may have noticed that the function structure y = b + wx we chose is far too simple: it can only solve linear problems. If we modify the training data just slightly so that it takes a nonlinear shape, the loss shrinks to some value and then cannot decrease any further. The figure makes it obvious that the problem lies neither in how we defined the loss function nor in the optimization process; the function structure defined in the model function step simply cannot match the training data perfectly:

<img width=500 src="https://private-user-images.githubusercontent.com/7970947/303651368-7625f0f3-2fc0-49f1-8638-0c9b9bb3cd76.gif?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MDc0ODM0NDgsIm5iZiI6MTcwNzQ4MzE0OCwicGF0aCI6Ii83OTcwOTQ3LzMwMzY1MTM2OC03NjI1ZjBmMy0yZmMwLTQ5ZjEtODYzOC0wYzliOWJiM2NkNzYuZ2lmP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDIwOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDAyMDlUMTI1MjI4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NTJmNGYyY2YwYzM0OGZjMmI3ZTU4ZjQ3ZTk1OWYxOTJmODI0MDIzNWU1ZDZlNzQ1OWNjYzZmMTliZjY2NzZlOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.1soHCDrRdXc5aklFAYnAwOH4emQvYcveW8c30LLnznI">
<img width=500 src="https://github.com/ascoders/blog/assets/7970947/7625f0f3-2fc0-49f1-8638-0c9b9bb3cd76">

This situation is called model bias. It forces us to increase the complexity of the model function, yet more complex functions become hard to train, which sets off a cycle of solving a problem, discovering a new one, and solving that in turn. This cycle is the history of machine learning, and it is a fascinating one. If, having read this far, you are eager to learn about the challenges ahead and how they are solved, you already have the basic curiosity needed to get started with machine learning. In the next article, we will look at how to define a function that can, in theory, approximate any mapping.
