<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width">
<meta name="theme-color" content="#222"><meta name="generator" content="Hexo 5.4.2">
<link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon-next.png">
<link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32-next.png">
<link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16-next.png">
<link rel="mask-icon" href="/images/logo.svg" color="#222">
<link rel="stylesheet" href="/css/main.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css" integrity="sha256-HtsXJanqjKTc8vVQjO4YMhiqFoXkfBsjBWcX91T1jr8=" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/3.1.1/animate.min.css" integrity="sha256-PR7ttpcvz8qrF57fur/yAx1qXMFJeJFiA6pSzWi0OIE=" crossorigin="anonymous">
<script class="next-config" data-name="main" type="application/json">{"hostname":"changye-chen.github.io.git","root":"/","images":"/images","scheme":"Mist","darkmode":false,"version":"8.16.0","exturl":false,"sidebar":{"position":"left","display":"post","padding":18,"offset":12},"copycode":{"enable":false,"style":null},"bookmark":{"enable":false,"color":"#222","save":"auto"},"mediumzoom":false,"lazyload":false,"pangu":false,"comments":{"style":"tabs","active":null,"storage":true,"lazyload":false,"nav":null},"stickytabs":false,"motion":{"enable":true,"async":false,"transition":{"menu_item":"fadeInDown","post_block":"fadeIn","post_header":"fadeInDown","post_body":"fadeInDown","coll_header":"fadeInLeft","sidebar":"fadeInUp"}},"prism":false,"i18n":{"placeholder":"搜索...","empty":"没有找到任何搜索结果:${query}","hits_time":"找到 ${hits} 个搜索结果(用时 ${time} 毫秒)","hits":"找到 ${hits} 个搜索结果"}}</script><script src="/js/config.js"></script>
<meta property="og:type" content="website">
<meta property="og:title" content="changye-chen">
<meta property="og:url" content="https://changye-chen.github.io.git/index.html">
<meta property="og:site_name" content="changye-chen">
<meta property="og:locale" content="zh_CN">
<meta property="article:author" content="Czh">
<meta property="article:tag" content="C/C++, LLM, Bash, study">
<meta name="twitter:card" content="summary">
<link rel="canonical" href="https://changye-chen.github.io.git/">
<script class="next-config" data-name="page" type="application/json">{"sidebar":"","isHome":true,"isPost":false,"lang":"zh-CN","comments":"","permalink":"","path":"index.html","title":""}</script>
<script class="next-config" data-name="calendar" type="application/json">""</script>
<title>changye-chen</title>
<noscript>
<link rel="stylesheet" href="/css/noscript.css">
</noscript>
</head>
<body itemscope itemtype="http://schema.org/WebPage" class="use-motion">
<div class="headband"></div>
<main class="main">
<div class="column">
<header class="header" itemscope itemtype="http://schema.org/WPHeader"><div class="site-brand-container">
<div class="site-nav-toggle">
<div class="toggle" aria-label="切换导航栏" role="button">
<span class="toggle-line"></span>
<span class="toggle-line"></span>
<span class="toggle-line"></span>
</div>
</div>
<div class="site-meta">
<a href="/" class="brand" rel="start">
<i class="logo-line"></i>
<h1 class="site-title">changye-chen</h1>
<i class="logo-line"></i>
</a>
<p class="site-subtitle" itemprop="description">长夜未至</p>
</div>
<div class="site-nav-right">
<div class="toggle popup-trigger" aria-label="搜索" role="button">
</div>
</div>
</div>
<nav class="site-nav">
<ul class="main-menu menu"><li class="menu-item menu-item-home"><a href="/" rel="section"><i class="fa fa-home fa-fw"></i>首页</a></li><li class="menu-item menu-item-about"><a href="/about/" rel="section"><i class="fa fa-user fa-fw"></i>关于</a></li><li class="menu-item menu-item-categories"><a href="/categories/" rel="section"><i class="fa fa-th fa-fw"></i>分类</a></li><li class="menu-item menu-item-archives"><a href="/archives/" rel="section"><i class="fa fa-archive fa-fw"></i>归档</a></li><li class="menu-item menu-item-schedule"><a href="/schedule/" rel="section"><i class="fa fa-calendar fa-fw"></i>日程表</a></li><li class="menu-item menu-item-sitemap"><a href="/sitemap.xml" rel="section"><i class="fa fa-sitemap fa-fw"></i>站点地图</a></li>
</ul>
</nav>
</header>
<aside class="sidebar">
<div class="sidebar-inner sidebar-overview-active">
<ul class="sidebar-nav">
<li class="sidebar-nav-toc">
文章目录
</li>
<li class="sidebar-nav-overview">
站点概览
</li>
</ul>
<div class="sidebar-panel-container">
<!--noindex-->
<div class="post-toc-wrap sidebar-panel">
</div>
<!--/noindex-->
<div class="site-overview-wrap sidebar-panel">
<div class="site-author animated" itemprop="author" itemscope itemtype="http://schema.org/Person">
<p class="site-author-name" itemprop="name">Czh</p>
<div class="site-description" itemprop="description"></div>
</div>
<div class="site-state-wrap animated">
<nav class="site-state">
<div class="site-state-item site-state-posts">
<a href="/archives/">
<span class="site-state-item-count">6</span>
<span class="site-state-item-name">日志</span>
</a>
</div>
<div class="site-state-item site-state-tags">
<span class="site-state-item-count">3</span>
<span class="site-state-item-name">标签</span>
</div>
</nav>
</div>
</div>
</div>
</div>
</aside>
</div>
<div class="main-inner index posts-expand">
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2024/05/24/%E5%88%A9%E7%94%A8MLC%E9%83%A8%E7%BD%B2LLM%E8%87%B3Android%E8%AE%BE%E5%A4%87/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2024/05/24/%E5%88%A9%E7%94%A8MLC%E9%83%A8%E7%BD%B2LLM%E8%87%B3Android%E8%AE%BE%E5%A4%87/" class="post-title-link" itemprop="url">利用MLC部署LLM至Android设备</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2024-05-24 17:54:22 / 修改时间:19:47:04" itemprop="dateCreated datePublished" datetime="2024-05-24T17:54:22+08:00">2024-05-24</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h1 id="安装MLC"><a href="#安装MLC" class="headerlink" title="安装MLC"></a>安装MLC</h1><h3 id="install-from-pip"><a href="#install-from-pip" class="headerlink" title="install from pip"></a>install from pip</h3><p>从<a target="_blank" rel="noopener" href="https://llm.mlc.ai/docs/install/mlc_llm.html#install-mlc-packages">Document</a>获知<br>使用下面的命令安装MLC</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">conda activate your-environment</span><br><span class="line">python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cu121 mlc-ai-nightly-cu121</span><br></pre></td></tr></table></figure>
<p>(Make sure the wheel matches your CUDA version.)</p>
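<p>If you are not sure which CUDA build to pick, the quick check below may help (a minimal sketch: the wheel suffix, e.g. cu121, should match the CUDA version reported on your machine):</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"># driver and highest supported CUDA version</span><br><span class="line">nvidia-smi</span><br><span class="line"># installed CUDA toolkit version (if nvcc is on PATH)</span><br><span class="line">nvcc --version</span><br></pre></td></tr></table></figure>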
<h3 id="检查安装"><a href="#检查安装" class="headerlink" title="检查安装"></a>检查安装</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python -c "import mlc_llm; print(mlc_llm)"</span><br></pre></td></tr></table></figure>
<p>If no error is raised, the installation succeeded.</p>
<h1 id="准备环境"><a href="#准备环境" class="headerlink" title="准备环境"></a>准备环境</h1><h3 id="安装Android-Studio"><a href="#安装Android-Studio" class="headerlink" title="安装Android Studio"></a>安装Android Studio</h3><p>从<a target="_blank" rel="noopener" href="https://developer.android.com/studio">官网</a>下载安装<br>第一次启动会提示配置代理,按下图配置</p>
<img src="/2024/05/24/%E5%88%A9%E7%94%A8MLC%E9%83%A8%E7%BD%B2LLM%E8%87%B3Android%E8%AE%BE%E5%A4%87/image.png" class="">
<p>Replace the proxy in the screenshot with your own network proxy; you can use the connection check to confirm it works.</p>
<img src="/2024/05/24/%E5%88%A9%E7%94%A8MLC%E9%83%A8%E7%BD%B2LLM%E8%87%B3Android%E8%AE%BE%E5%A4%87/image-1.png" class="">
<h3 id="配置Android-Studio"><a href="#配置Android-Studio" class="headerlink" title="配置Android Studio"></a>配置Android Studio</h3><p>打开Android Studio,点击Configure->SDK Manager<br>在SDK Tools中保证勾选NDK,CMAKE,AndroidPlatformTools</p>
<img src="/2024/05/24/%E5%88%A9%E7%94%A8MLC%E9%83%A8%E7%BD%B2LLM%E8%87%B3Android%E8%AE%BE%E5%A4%87/image-2.png" class="">
<p>CMake is unchecked in the screenshot only because the author already has CMake installed; check it if you have not installed it.</p>
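<p>If you prefer the command line, the same components can also be installed with sdkmanager from the Android command-line tools (a sketch; the NDK and CMake version strings are assumptions and should match what your project expects):</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"># install platform-tools, the NDK and CMake from the command line</span><br><span class="line">sdkmanager "platform-tools" "ndk;27.0.11718014" "cmake;3.22.1"</span><br></pre></td></tr></table></figure>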
<h3 id="安装JAVA"><a href="#安装JAVA" class="headerlink" title="安装JAVA"></a>安装JAVA</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install openjdk-17-jdk</span><br></pre></td></tr></table></figure>
<p>Make sure JAVA_HOME is set and on PATH, and that the JDK version is 17 or newer.</p>
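<p>A quick way to check this (assuming the openjdk-17-jdk package installed above):</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"># JAVA_HOME should point at a JDK installation</span><br><span class="line">echo $JAVA_HOME</span><br><span class="line">java -version</span><br></pre></td></tr></table></figure>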
<h3 id="克隆项目mlc-llm"><a href="#克隆项目mlc-llm" class="headerlink" title="克隆项目mlc-llm"></a>克隆项目mlc-llm</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">git clone https://github.com/mlc-ai/mlc-llm.git</span><br><span class="line">cd mlc-llm</span><br><span class="line">git submodule update --init --recursive</span><br></pre></td></tr></table></figure>
<h3 id="配置环境变量"><a href="#配置环境变量" class="headerlink" title="配置环境变量"></a>配置环境变量</h3><p>在.bashrc或者.zshrc中添加</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">function mlc-config() {</span><br><span class="line"> export ANDROID_HOME=$HOME/Android/Sdk</span><br><span class="line"> export ANDROID_NDK=$HOME/Android/Sdk/ndk/27.0.11718014</span><br><span class="line"> export PATH=$PATH:$HOME/Android/Sdk/build-tools</span><br><span class="line"> export PATH=$PATH:$HOME/Android/Sdk/platform-tools</span><br><span class="line"> export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android29-clang</span><br><span class="line"> export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64</span><br><span class="line"> export TVM_SOURCE_DIR=$HOME/mlc/mlc-llm/3rdparty/tvm</span><br><span class="line"> export MLC_LLM_HOME=$HOME/mlc/mlc-llm</span><br><span class="line"> export MLC_LLM_SOURCE_DIR=$HOME/mlc/mlc-llm</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>On Windows, add these as system environment variables instead.<br>The mlc-llm here is the project path cloned above, which contains the third-party TVM library.<br>The values above are the author's own configuration; adjust them to your installation paths.</p>
<h3 id="使环境变量生效"><a href="#使环境变量生效" class="headerlink" title="使环境变量生效"></a>使环境变量生效</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">source .bashrc</span><br><span class="line">mlc-config</span><br></pre></td></tr></table></figure>
<h1 id="准备模型"><a href="#准备模型" class="headerlink" title="准备模型"></a>准备模型</h1><h3 id="下载模型"><a href="#下载模型" class="headerlink" title="下载模型"></a>下载模型</h3><p>这里有两种模型,一种是基于TVM的模型,一种是基于huggingface的模型, mlc.ai已经提供了一些TVM模型,可以从<a target="_blank" rel="noopener" href="https://huggingface.co/mlc-ai">这里</a>下载,huggingface模型可以从<a target="_blank" rel="noopener" href="https://huggingface.co/models">这里</a>下载</p>
<h3 id="转换模型"><a href="#转换模型" class="headerlink" title="转换模型"></a>转换模型</h3><p>根据命令:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">mlc_llm convert_weight \</span><br><span class="line"> CONFIG \</span><br><span class="line"> --quantization QUANTIZATION_MODE \</span><br><span class="line"> [--model-type MODEL_TYPE] \</span><br><span class="line"> [--device DEVICE] \</span><br><span class="line"> [--source SOURCE] \ #The path to original model weight, infer from config if missing.</span><br><span class="line"> [--source-format SOURCE_FORMAT] \ #The path to original model weight, infer from config if missing.</span><br><span class="line"> --output OUTPUT</span><br></pre></td></tr></table></figure>
<p>The config file usually sits in the same directory as the model weights, so we can omit the source argument and only need to provide the quantization, model-type, device, and output arguments.<br>Taking microsoft/Phi-3-mini-4k-instruct as an example: it is a Hugging Face safetensors model, and we want to convert it into a TVM model.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">MODLE_NAME=path/to/microsoft/Phi-3-mini-4k-instruct # 模型路径</span><br><span class="line">QUANTIZATION=q4f16_1 # 量化方式, 使用mlc-llm convert_weight -h查看支持的量化方式</span><br><span class="line">OUTPUT_DIR=path/to/output # 输出路径</span><br><span class="line">DEVICE=Android # 设备</span><br><span class="line">MODEL_TYPE=Phi-3 # 模型架构</span><br><span class="line">mlc_llm convert_weight $MODLE_NAME \</span><br><span class="line"> --quantization $QUANTIZATION \</span><br><span class="line"> --device $DEVICE \</span><br><span class="line"> --model_type $MODEL_TYPE \</span><br><span class="line"> --output $OUTPUT_DIR </span><br></pre></td></tr></table></figure>
<h3 id="生成mlc-chat-config-json文件"><a href="#生成mlc-chat-config-json文件" class="headerlink" title="生成mlc-chat-config.json文件"></a>生成mlc-chat-config.json文件</h3><p>mlc-chat-config.json是一个用于部署模型的配置文件,我们可以使用mlc_llm gen_config命令生成<br>mlc-chat-config.json的作用有两个:</p>
<ol>
<li>It provides the model configuration information needed for TVM compilation</li>
<li>It provides the chat template used when deploying the model</li>
</ol>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">mlc_llm gen_config $MODLE_NAME \</span><br><span class="line"> --quantization $QUANTIZATION \</span><br><span class="line"> --device $DEVICE \</span><br><span class="line"> --model_type $MODEL_TYPE \</span><br><span class="line"> --output $OUTPUT_DIR</span><br></pre></td></tr></table></figure>
<p>After running this command, mlc-chat-config.json is generated under $OUTPUT_DIR.</p>
<p>With all the steps above completed, the model is ready.<br>The resulting directory structure looks like this:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">-added_tokens.json </span><br><span class="line">-mlc-chat-config.json </span><br><span class="line">-ndarray-cache.json</span><br><span class="line">-params_shard_1.bin</span><br><span class="line">...</span><br><span class="line">-tokenizer.json</span><br><span class="line">-tokenizer.model</span><br><span class="line">-tokenizer_config.json</span><br></pre></td></tr></table></figure>
<h3 id="编译并打包模型"><a href="#编译并打包模型" class="headerlink" title="编译并打包模型"></a>编译并打包模型</h3><p>在多数时候,你并不需要手动编译模型,在本文中,我们要构建一个Android的apk<br>mlc-llm提供了一个命令,可以直接完成编译和打包工作</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cd path/to/mlc-llm/android/MLCChat</span><br><span class="line">vim mlc-package-config.json</span><br></pre></td></tr></table></figure>
<p>Here we need to edit the mlc-package-config.json file:</p>
<figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">{</span></span><br><span class="line"> <span class="attr">"device"</span><span class="punctuation">:</span> <span class="string">"android"</span><span class="punctuation">,</span></span><br><span class="line"> <span class="attr">"model_list"</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line"> <span class="punctuation">{</span></span><br><span class="line"> <span class="attr">"model"</span><span class="punctuation">:</span> <span class="string">"path/to/output/Phi-3-mini-4k-instruct"</span><span class="punctuation">,</span></span><br><span class="line"> <span class="attr">"model_id"</span><span class="punctuation">:</span> <span class="string">"Phi-3-mini-4k-instruct"</span><span class="punctuation">,</span></span><br><span class="line"> <span class="attr">"estimated_vram_bytes"</span><span class="punctuation">:</span> <span class="number">2152151044</span><span class="punctuation">,</span></span><br><span class="line"> <span class="attr">"bundle_weight"</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">true</span></span></span><br><span class="line"> <span class="punctuation">}</span></span><br><span class="line"> <span class="punctuation">]</span></span><br><span class="line"><span class="punctuation">}</span></span><br></pre></td></tr></table></figure>
<p>estimated_vram_bytes is the model's estimated VRAM usage; the on-disk size of the converted weights, shown by the command below, gives a rough value:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">du -b path/to/output/Phi-3-mini-4k-instruct</span><br></pre></td></tr></table></figure>
<p>bundle_weight controls whether the model weights are bundled into the APK. If it is false, the model is downloaded on the device instead, and in that case the model field should be the model's download URL.<br>Once the file is edited, run the following command:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mlc_llm package</span><br></pre></td></tr></table></figure>
<p>Wait for the build to finish.</p>
<h3 id="构建Android项目和安装apk"><a href="#构建Android项目和安装apk" class="headerlink" title="构建Android项目和安装apk"></a>构建Android项目和安装apk</h3><p>使用Android Studio打开path/to/mlc-llm/android/MLCChat<br>等待gradle同步完成后,点击build->build bundle(s)/APK(s)->Build APK(s)<br>这里需要签名,可以根据<a target="_blank" rel="noopener" href="https://developer.android.com/studio/publish/app-signing#generate-key">官网</a>生成签名文件</p>
<p>After the build, android/MLCChat/app/release/app-release.apk is the APK we want.</p>
<p>Note that MLCChat cannot run in a virtual device, because the emulator does not support the GPU, so test on a physical device.<br>To install the APK on the device, use adb.<br>First make sure the device is connected with USB debugging enabled, then run:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">adb devices # 查看设备是否连接</span><br><span class="line">cd path/to/mlc-llm/android/MLCChat/</span><br><span class="line">python bundle_weight.py --apk-path app/release/app-release.apk #这个命令先会将apk安装到设备上,然后会将模型放在设备上</span><br></pre></td></tr></table></figure>
<p>Once installed, an app named MLCChat appears on the device; open it to start using the model.</p>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2023/10/23/LLM%E8%AE%BA%E6%96%87%E8%AF%A6%E8%A7%A3-%E3%80%8A%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%A6%82%E4%BD%95%E6%8D%95%E6%8D%89%E5%8F%98%E5%8C%96%E7%9A%84%E4%B8%96%E7%95%8C%E7%9F%A5%E8%AF%86%EF%BC%9F%E3%80%8B%EF%BC%88%E7%BB%BC%E8%BF%B0%EF%BC%89/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2023/10/23/LLM%E8%AE%BA%E6%96%87%E8%AF%A6%E8%A7%A3-%E3%80%8A%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%A6%82%E4%BD%95%E6%8D%95%E6%8D%89%E5%8F%98%E5%8C%96%E7%9A%84%E4%B8%96%E7%95%8C%E7%9F%A5%E8%AF%86%EF%BC%9F%E3%80%8B%EF%BC%88%E7%BB%BC%E8%BF%B0%EF%BC%89/" class="post-title-link" itemprop="url">LLM论文详解-《大语言模型如何捕捉变化的世界知识?》(综述)</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2023-10-23 22:30:39" itemprop="dateCreated datePublished" datetime="2023-10-23T22:30:39+08:00">2023-10-23</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar-check"></i>
</span>
<span class="post-meta-item-text">更新于</span>
<time title="修改时间:2023-10-25 20:57:13" itemprop="dateModified" datetime="2023-10-25T20:57:13+08:00">2023-10-25</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>尽管大语言模型在许多任务令人印象深刻,但预训练好的模型是静态的,无法应对日新月异的世界变化,<br>例如如果你问chatgpt谁赢得了世界杯冠军,他会回答法国(2017),而不是阿根廷(2022),即使询<br>问时指定2021年,它只会告诉你它的知识来自于2021年以前,并且不知道这之后的事情。这篇文章提供<br>了如何使大语言模型对齐始终变化的世界而不用从零开始重新训练一个的预训练模型的一系列最新进展</p>
<h2 id="介绍"><a href="#介绍" class="headerlink" title="介绍"></a>介绍</h2><p>大语言模型诞生于大量的种类繁多的语料资源库(github,wikipedia,etc),它们的一些重要特征使得<br>其能够适应很多不同的自然语言处理任务。但是,当大语言模型完成预训练以后,就成为静态的模型了<br>,没有机制能够使它们自我更新以适应变化的世界。<br>从直觉上看,要更新一个预训练模型,你可以选择通过隐式的更新模型内部参数的方式,或是显式的用<br>新的检索到的知识改变输出,替换模型参数蕴藏的那些淘汰掉的知识。有大量的工作已经被提出,本文<br>通过方法划分的方式系统研究了其中具有代表性的文章。</p>
<p>Taxonomy: based on whether a method tends to directly change the knowledge implicitly stored in the LLM, or to use external resources to override outdated knowledge, the methods are roughly divided into implicit and explicit approaches.</p>
<ul>
<li>Implicit methods seek to directly replace the knowledge stored in the model parameters</li>
<li>Explicit methods seek to use external resources to replace part of the model's output</li>
</ul>
<h2 id="模型与世界知识隐式对齐"><a href="#模型与世界知识隐式对齐" class="headerlink" title="模型与世界知识隐式对齐"></a>模型与世界知识隐式对齐</h2><p>此前的研究表明,要改变预训练模型的参数有两种方式,重新训练或者是微调。然而重新训练对十亿为<br>数量级的模型参数是不可接受的,微调则可能会导致蝴蝶效应,影响模型的整体表现。为了应对这些问<br>题,研究者们提出knowledge editing和continual learning。</p>
<h3 id="knowledge-editing"><a href="#knowledge-editing" class="headerlink" title="knowledge editing"></a>knowledge editing</h3><p>由于微调学习新知识的不可行,研究者们开始寻求更直接、特定、微粒度的更新知识方法。本文将其分<br>类为meta-learning, hypernetwork和locate-and-editbased方法</p>
<h4 id="meta-learning"><a href="#meta-learning" class="headerlink" title="meta-learning"></a>meta-learning</h4><p>这个方法专注于模型自身的可修改性,目标是在推理时能轻易的更改模型参数。Sinitin等人提出一种<br>元学习方法,用这个方法训练的模型很容易改变它的参数。Chen等人提出一个双循环框架,内循环采用<br>少量梯度更新使预训练的gpt2模型能有效记住外部知识,外循环中,模型参数通过最优元学习动态调整<br>,以纳入有助于推理任务的额外知识。</p>
<h4 id="Hypernetwork-Editer"><a href="#Hypernetwork-Editer" class="headerlink" title="Hypernetwork Editer"></a>Hypernetwork Editer</h4><p>这个方法专注于冻结模型,避免对基础模型做任何修改。早在2017年就有人提出过冻结原始权重训练并<br>仅训练偏移权重来模拟权重更新。更进一步,为了适应大参数模型,2022年lora被提出,它将模型参数<br>矩阵进行低秩分解以显著降低参数量,在原始模型任一层都可以引入分解矩阵,代替原矩阵参与训练。<br>2023年提出老师-学生知识蒸馏模型,以及在表示层进行知识编辑的方法。</p>
<h4 id="locate-and-Edit"><a href="#locate-and-Edit" class="headerlink" title="locate and Edit"></a>locate and Edit</h4><p>这个方法专注于对症下药,通过一系列假设推定特定知识的参数位置,直接更新这部分权重来代表知识<br>更新,2017年一篇文章曾经指出transformer的前馈神经网络事实上是键值对记忆体,Dai等人提出知识<br>神经元的概念,并给出一种在前馈神经网络中标识知识神经元的方法。不久后,在没有微调的情况下,<br>他们直接修改了一些关联的值群,成功更新或删除了知识。</p>
<p>I do not yet fully understand the following passage, so for now I am keeping a translation of it:<br>Unlike the per-neuron view of Geva et al. (2021), Meng et al. (2022a) run causal tracing analysis on GPT-2 and hypothesize that the Transformer MLP can be viewed as a linear associative memory. They verify the hypothesis by directly updating intermediate-layer MLP weights with a rank-one update (Bau et al., 2020). Following Meng et al. (2022a), Meng et al. (2023) propose a scalable multi-layer method that updates an LLM with thousands of facts at once, significantly improving editing efficiency while preserving generalization and specificity. Gupta et al. (2023a) further adapt it to fix commonsense errors. Li et al. (2023b) find that the Multi-Head Self-Attention (MHSA) weights do not need to be updated when new knowledge is introduced; building on this, they propose to precisely update the FFN weights by jointly optimizing the hidden states of the MHSA and FFN Transformer components so that the target knowledge is memorized. Chen et al. (2023b) propose an architecture-adapted multilingual integrated-gradients method to precisely locate knowledge neurons across multiple architectures and languages. Geva et al. (2023) analyze the internal recall process of factual associations in autoregressive LMs, opening a new research direction for knowledge localization and model editing.</p>
<h4 id="其它"><a href="#其它" class="headerlink" title="其它"></a>其它</h4><p>任何方法都需要评估有效性的标准,这段主要提及评估知识编辑有效性的框架和数据集</p>
<h3 id="continual-learning"><a href="#continual-learning" class="headerlink" title="continual learning"></a>continual learning</h3><p>持续学习旨在使模型具有学习连续性数据流且减少灾难性遗忘的能力,本文将其划分为continual<br>pre-training 和 continual knowledge editing</p>
<h4 id="continual-pre-training"><a href="#continual-pre-training" class="headerlink" title="continual pre-training"></a>continual pre-training</h4><p>continual pre-training 意在关注关于字面意义上连续预训练的方法,早期的一些工作表明在预训练模<br>型的基础上继续预训练是有一定潜力的,并且有一些文章已经对世界知识做了讨论,当不断学习时,模<br>型应该保留、获取和更新这些知识。continual pre-training 可以划分为正则化,重映和基于结构的方<br>法。</p>
<h5 id="正则化"><a href="#正则化" class="headerlink" title="正则化"></a>正则化</h5><p>基于正则化的方法是为了应对灾难性遗忘所提出的,它提出一些规则来惩罚先前数据中学习的关键参数<br>的变化,Ke等人(2023)使用基于模型鲁棒性的代理计算每个单元(即注意头和神经元)对LM中一般知识的<br>重要性,以保留学习到的知识。当持续学习新领域时,该方法可以防止对一般知识和领域知识的灾难性<br>遗忘,并通过软掩蔽和对比损失鼓励知识转移。</p>
<h5 id="重映"><a href="#重映" class="headerlink" title="重映"></a>重映</h5><p>这种方法很好理解,在初始预训练语料库可用的情况下,继续预训练时掺入或替换合理质量的新世界知<br>识,以期达到既不遗忘以前的知识,又能学习新的知识的目的。</p>
<h5 id="结构化"><a href="#结构化" class="headerlink" title="结构化"></a>结构化</h5><p>这种方法考虑现实因素,采用不同参数子集来应对不同的语言任务,例如适配器网络,通过冻结原始参<br>数以保留学习到的知识,并添加轻量级的可调参数来持续学习新知识。(我觉得这就是微调呢)当然还<br>有不冻结原始参数的,例如采用</p>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E9%83%A8%E7%BD%B2%E7%AF%87/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E9%83%A8%E7%BD%B2%E7%AF%87/" class="post-title-link" itemprop="url">LLM部署-部署篇</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2023-06-03 14:38:35 / 修改时间:20:00:02" itemprop="dateCreated datePublished" datetime="2023-06-03T14:38:35+08:00">2023-06-03</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<hr>
<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h3><p>完成环境部署后,已经可以开始部署大语言模型了。<br>本文的模型完全基于<a target="_blank" rel="noopener" href="https://huggingface.co/">Hugging Face</a>。<br>内容包括:</p>
<ul>
<li>Download</li>
<li>llama.cpp </li>
<li>text-generation-webui deployment</li>
</ul>
<hr>
<h4 id="下载"><a href="#下载" class="headerlink" title="下载"></a>下载</h4><p><em><strong>以下方法中的decapoda-research/llama-7b-hf只是一个实例</strong></em></p>
<ul>
<li><p>Method 1<br> The simplest approach is the <em><strong>Use in Transformers</strong></em> method.<br> <strong>Make sure you have the transformers module</strong>,<br> installed with the command below:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip install transformers</span><br></pre></td></tr></table></figure>
<p> Then run the following code in a Python interpreter:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> transformers <span class="keyword">import</span> AutoTokenizer, AutoModelForCausalLM</span><br><span class="line"></span><br><span class="line">tokenizer = AutoTokenizer.from_pretrained(<span class="string">"decapoda-research/llama-7b-hf"</span>)</span><br><span class="line"></span><br><span class="line">model = AutoModelForCausalLM.from_pretrained(<span class="string">"decapoda-research/llama-7b-hf"</span>)</span><br></pre></td></tr></table></figure>
<p> This code fetches every file of the llama-7b-hf model under the decapoda-research account from Hugging Face and loads the model into model, ready for use, so the tokenizer line is not strictly necessary.<br> In effect the transformers module downloads the model files for us and stores them in ~/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/blobs/.<br> The drawback of this method is that the downloaded files are named by their sha256 hashes, and it only works for models that have a model card.</p>
</li>
<li><p>Method 2<br> The <em><strong>pyllama</strong></em> method, limited to downloading the original llama models</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">pip install -U pyllama transformers</span><br><span class="line">python -m llama.download --model_size=7B --folder=llama</span><br></pre></td></tr></table></figure>
<p> This method can only download the original llama pth weights.</p>
</li>
<li><p>Method 3<br> The general-purpose approach is the <em><strong>git lfs</strong></em> method.<br> <strong>Make sure you have git</strong>.<br> Install git and git lfs with the commands below:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">sudo apt install git</span><br><span class="line">curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash</span><br><span class="line">sudo apt-get install git-lfs</span><br><span class="line">git lfs install</span><br></pre></td></tr></table></figure>
<p> Now you can clone a given Hugging Face repository with the commands below:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git lfs install</span><br><span class="line">GIT_LFS_SKIP_SMUDGE=1 git <span class="built_in">clone</span> https://huggingface.co/decapoda-research/llama-7b-hf</span><br></pre></td></tr></table></figure>
<blockquote>
<p>The commands above do not download the large files</p>
</blockquote>
<p> Use the command below to download specific files:</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> llama-7b-hf/</span><br><span class="line">git lfs pull --include=<span class="string">"[filename]"</span></span><br></pre></td></tr></table></figure>
<p> [filename] supports wildcards; for example you can run</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">git lfs pull --include=<span class="string">"*.bin"</span></span><br></pre></td></tr></table></figure>
<p> to download every file ending in .bin, or give a single file name to download just that file.</p>
</li>
</ul>
<hr>
<h4 id="llama-cpp"><a href="#llama-cpp" class="headerlink" title="llama.cpp"></a><a target="_blank" rel="noopener" href="https://github.com/ggerganov/llama.cpp">llama.cpp</a></h4><ul>
<li>Introduction<br> llama.cpp is a C/C++ implementation of llama model inference; it lets users run inference on the CPU and provides a model quantization method to reduce RAM usage.</li>
<li>Install <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/ggerganov/llama.cpp.git</span><br><span class="line"><span class="built_in">cd</span> llama.cpp</span><br><span class="line">pip install -r requirements.txt</span><br><span class="line"><span class="comment">#if you want Nvidia GPU acceleration, then</span></span><br><span class="line">make LLAMA_CUBLAS=1</span><br><span class="line"><span class="comment">#otherwise</span></span><br><span class="line"><span class="comment">#make</span></span><br><span class="line"><span class="built_in">cd</span> -</span><br></pre></td></tr></table></figure></li>
<li>Usage<br> Before using llama.cpp for inference, understand that it quantizes PyTorch-format model files into the ggml format and then runs inference on those.<br> The model files downloaded from Hugging Face are hf-format bin files, so they first need to be exported as a PyTorch model.<br> Here we deploy the <a target="_blank" rel="noopener" href="https://github.com/ymcui/Chinese-LLaMA-Alpaca">chinese-llama-alpaca</a> project, which offers an option to export a PyTorch model.<br> If you want to try the original llama with llama.cpp inference, see <a target="_blank" rel="noopener" href="https://gist.github.com/pdtgct/b8bcbf9220d4d5059b62b1c35615a650">convert_hf_llama_to_ggml.md</a><blockquote>
<p>First download decapoda-research/llama-7b-hf; Method 1 is recommended</p>
</blockquote>
<ul>
<li>Clone the project<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">git <span class="built_in">clone</span> https://github.com/ymcui/Chinese-LLaMA-Alpaca.git</span><br><span class="line"><span class="built_in">cd</span> Chinese-LLaMA-Alpaca</span><br></pre></td></tr></table></figure></li>
<li>Download the alpaca lora model<br>Use Method 3 to download the lora weights<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">git lfs install</span><br><span class="line">GIT_LFS_SKIP_SMUDGE=1 git <span class="built_in">clone</span> https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b</span><br><span class="line"><span class="built_in">cd</span> chinese-alpaca-lora-7b</span><br><span class="line">git lfs pull --include <span class="string">"*.bin"</span></span><br><span class="line"><span class="built_in">cd</span> -</span><br></pre></td></tr></table></figure></li>
<li>Create a merge environment and merge the model<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">conda create -n chinese-llama-alpaca python=3.10</span><br><span class="line">conda activate chinese-llama-alpaca</span><br><span class="line">pip install -r requirements.txt</span><br><span class="line"></span><br><span class="line"><span class="built_in">mkdir</span> chinese-alpaca-lora-7b-merged</span><br><span class="line"></span><br><span class="line">python scripts/merge_llama_with_chinese_lora.py \</span><br><span class="line">--base_model decapoda-research/llama-7b-hf \</span><br><span class="line">--lora_model chinese-alpaca-lora-7b \</span><br><span class="line">--output_type pth \</span><br><span class="line">--output_dir chinese-alpaca-lora-7b-merged</span><br></pre></td></tr></table></figure></li>
<li>Place the model and configuration files<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> llama.cpp </span><br><span class="line"><span class="built_in">mkdir</span> zh-models</span><br><span class="line"><span class="built_in">mkdir</span> zh-models/7B</span><br><span class="line">model_dir=<span class="string">"chinese-llama-alpaca/chinese-alpaca-lora-7b-merged"</span></span><br><span class="line"><span class="built_in">mv</span> <span class="variable">$model_dir</span>/consolidated.00.pth zh-models/7B</span><br><span class="line"><span class="built_in">mv</span> <span class="variable">$model_dir</span>/params.json zh-models/7B</span><br><span class="line"><span class="built_in">mv</span> <span class="variable">$model_dir</span>/tokenizer.model zh-models</span><br><span class="line"><span class="built_in">unset</span> model_dir</span><br></pre></td></tr></table></figure></li>
<li>Convert the weights, quantize, and run inference<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># pth to fp16</span></span><br><span class="line">python convert.py zh-models/7B/</span><br><span class="line"><span class="comment"># q4_0 quantization example</span></span><br><span class="line">./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin q4_0</span><br><span class="line"><span class="comment">#inference</span></span><br><span class="line">./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.1</span><br><span class="line"><span class="comment"># the flags above start inference in instruction mode with the alpaca.txt prompt template and a 2048-token context, without cuBLAS</span></span><br><span class="line"><span class="comment"># to use the GPU during inference, make sure the project was built with LLAMA_CUBLAS=1 and add -ngl N to offload N layers to the GPU</span></span><br><span class="line">./main -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.1 -ngl 32</span><br><span class="line"><span class="comment">#Ctrl + c to exit inference</span></span><br><span class="line"><span class="built_in">cd</span> -</span><br></pre></td></tr></table></figure>
There are more usage examples under examples.</li>
</ul>
</li>
</ul>
<hr>
<h4 id="text-generation-webui部署"><a href="#text-generation-webui部署" class="headerlink" title="text-generation-webui部署"></a><a target="_blank" rel="noopener" href="https://github.com/oobabooga/text-generation-webui">text-generation-webui</a>部署</h4><ul>
<li>Introduction<br> text-generation-webui is an open-source GitHub project: a Gradio web UI application for running and fine-tuning large language models such as LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.<br> With its web UI we can easily deploy a large language model.</li>
<li>Install <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">conda create -n textgen python=3.10</span><br><span class="line">conda activate textgen</span><br><span class="line">git <span class="built_in">clone</span> https://github.com/oobabooga/text-generation-webui.git</span><br><span class="line"><span class="built_in">cd</span> text-generation-webui/</span><br><span class="line">pip install -r requirements.txt</span><br><span class="line"><span class="built_in">cd</span> -</span><br></pre></td></tr></table></figure></li>
<li>Usage<br> To use a local model, place it under models/.<br> If we use a model merged by the chinese-llama-alpaca project, specify output_type as huggingface when merging <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> chinese-llama-alpaca</span><br><span class="line">conda activate chinese-llama-alpaca</span><br><span class="line">python scripts/merge_llama_with_chinese_lora.py \</span><br><span class="line">--base_model decapoda-research/llama-7b-hf \</span><br><span class="line">--lora_model chinese-alpaca-lora-7b \</span><br><span class="line">--output_type huggingface \</span><br><span class="line">--output_dir chinese-alpaca-lora-7b-merged-hf</span><br><span class="line">conda deactivate</span><br><span class="line"><span class="built_in">mv</span> chinese-alpaca-lora-7b-merged-hf ../text-generation-webui/models/</span><br><span class="line">conda activate textgen</span><br><span class="line"><span class="comment">#launch chat inference with the hf-format model</span></span><br><span class="line">python server.py --model chinese-alpaca-lora-7b-merged-hf --auto-devices --chat</span><br><span class="line"><span class="comment">#to use llama.cpp inference inside text-generation-webui</span></span><br><span class="line">pip install llama-cpp-python<span class="string">"[server]"</span></span><br><span class="line"><span class="comment">#launch chat inference with the ggml q4_0 quantized model in CPU mode; make sure the model is under models/</span></span><br><span class="line">python server.py --model ggml-model-q4_0.bin</span><br><span class="line"><span class="comment">#to offload layers to the GPU with the unquantized model, use</span></span><br><span class="line">python server.py --model ggml-model-f16.bin --auto-devices -ngl 32</span><br></pre></td></tr></table></figure>
text-generation-webui is a very capable integrated web app; if it is not launched in chat mode it defaults to notebook mode, and you can download and fine-tune Hugging Face models directly from the UI.</li>
</ul>
<hr>
<h4 id="值得关注的信息"><a href="#值得关注的信息" class="headerlink" title="值得关注的信息"></a>值得关注的信息</h4><ul>
<li><a target="_blank" rel="noopener" href="https://github.com/lm-sys/FastChat">lm-sys</a>,发布了Vicuna-13B,达到90% chatGPT性能</li>
<li><a target="_blank" rel="noopener" href="https://huggingface.co/TheBloke">TheBloke</a>发布了众多语言模型的量化版本。</li>
<li>如果想微调lora模型,参照<a target="_blank" rel="noopener" href="https://github.com/tloen/alpaca-lora">alpaca-lora</a>。</li>
<li><a target="_blank" rel="noopener" href="https://github.com/hwchase17/langchain">langchain</a> 一个推理应用工具链</li>
</ul>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E4%BB%A3%E7%90%86%E7%AF%87/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E4%BB%A3%E7%90%86%E7%AF%87/" class="post-title-link" itemprop="url">LLM部署-代理篇</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2023-06-03 14:37:54 / 修改时间:21:04:18" itemprop="dateCreated datePublished" datetime="2023-06-03T14:37:54+08:00">2023-06-03</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E6%A6%82%E8%A7%88/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E6%A6%82%E8%A7%88/" class="post-title-link" itemprop="url">LLM部署-概览</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2023-06-03 14:37:46 / 修改时间:15:41:14" itemprop="dateCreated datePublished" datetime="2023-06-03T14:37:46+08:00">2023-06-03</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<hr>
<h3 id="简介"><a href="#简介" class="headerlink" title="简介"></a>简介</h3><p>在chatGPT大火之后,在<a target="_blank" rel="noopener" href="https://arxiv.org/abs/2203.02155v1">InstructGPT</a>方法上复现和改进的开源模型越来越多地涌现出来,随之而来的消费级显卡部署方案也变得逐渐成熟。<br>本系列文章专注于介绍源于llama(<a target="_blank" rel="noopener" href="https://github.com/facebookresearch/llama">github</a>|<a target="_blank" rel="noopener" href="https://arxiv.org/abs/2302.13971v1">arxiv</a>)的众多语言模型统一和系统的部署方案。</p>
<hr>
<h3 id="目录"><a href="#目录" class="headerlink" title="目录"></a>目录</h3><ul>
<li><a href="./LLM%E9%83%A8%E7%BD%B2-%E7%8E%AF%E5%A2%83%E7%AF%87.md">LLM部署-环境篇</a></li>
<li><a href="./LLM%E9%83%A8%E7%BD%B2-%E4%BB%A3%E7%90%86%E7%AF%87.md">LLM部署-代理篇</a></li>
<li><a href="./LLM%E9%83%A8%E7%BD%B2-%E9%83%A8%E7%BD%B2%E7%AF%87.md">LLM部署-部署篇</a></li>
</ul>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://changye-chen.github.io.git/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E7%8E%AF%E5%A2%83%E7%AF%87/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="/images/avatar.gif">
<meta itemprop="name" content="Czh">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="changye-chen">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | changye-chen">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2023/06/03/LLM%E9%83%A8%E7%BD%B2-%E7%8E%AF%E5%A2%83%E7%AF%87/" class="post-title-link" itemprop="url">LLM部署-环境篇</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">发表于</span>
<time title="创建时间:2023-06-03 14:36:43 / 修改时间:15:41:10" itemprop="dateCreated datePublished" datetime="2023-06-03T14:36:43+08:00">2023-06-03</time>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<hr>
<h3 id="简介"><a href="#简介" class="headerlink" title="简介"></a>简介</h3><p>本篇文章的目的是为了介绍部署方案中的环境配置,环境配置是任何项目部署的关键。<br>笔者使用的操作系统:Ubuntu22.04 LTS</p>
<hr>
<h3 id="Driver-amp-amp-CUDA-Toolkit"><a href="#Driver-amp-amp-CUDA-Toolkit" class="headerlink" title="Driver && CUDA Toolkit"></a>Driver && CUDA Toolkit</h3><p>第一步,查看当前显卡支持的最高版本驱动</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">nvidia-smi</span><br></pre></td></tr></table></figure>
<p>The Driver Version in the command output is the highest GPU driver version supported by the current card, and the same goes for CUDA Version.<br>Step 2: download and install the GPU driver and CUDA Toolkit.<br>Visit <a href="">here</a>; after logging in</p>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
</div>
</main>
<footer class="footer">
<div class="footer-inner">
<div class="copyright">
©
<span itemprop="copyrightYear">2024</span>
<span class="with-love">
<i class="fa fa-heart"></i>
</span>
<span class="author" itemprop="copyrightHolder">Czh</span>
</div>
<div class="powered-by">由 <a href="https://hexo.io/" rel="noopener" target="_blank">Hexo</a> & <a href="https://theme-next.js.org/mist/" rel="noopener" target="_blank">NexT.Mist</a> 强力驱动
</div>
</div>
</footer>
<div class="toggle sidebar-toggle" role="button">
<span class="toggle-line"></span>
<span class="toggle-line"></span>
<span class="toggle-line"></span>
</div>
<div class="sidebar-dimmer"></div>
<div class="back-to-top" role="button" aria-label="返回顶部">
<i class="fa fa-arrow-up fa-lg"></i>
<span>0%</span>
</div>
<noscript>
<div class="noscript-warning">Theme NexT works best with JavaScript enabled</div>
</noscript>
<script src="https://cdnjs.cloudflare.com/ajax/libs/animejs/3.2.1/anime.min.js" integrity="sha256-XL2inqUJaslATFnHdJOi9GfQ60on8Wx1C2H8DYiN1xY=" crossorigin="anonymous"></script>
<script src="/js/comments.js"></script><script src="/js/utils.js"></script><script src="/js/motion.js"></script><script src="/js/schemes/muse.js"></script><script src="/js/next-boot.js"></script>
</body>
</html>