<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>MelonQi Blog</title>
<subtitle>For dream</subtitle>
<link href="/atom.xml" rel="self"/>
<link href="http://melonqi.cn/"/>
<updated>2017-09-21T14:47:51.000Z</updated>
<id>http://melonqi.cn/</id>
<author>
<name>MelonQi</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>AgentPool</title>
<link href="http://melonqi.cn/2017/09/21/AgentPool/"/>
<id>http://melonqi.cn/2017/09/21/AgentPool/</id>
<published>2017-09-21T12:32:46.000Z</published>
<updated>2017-09-21T14:47:51.000Z</updated>
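<content type="html"><![CDATA[<p>To keep my crawler's traffic from being detected and blocked, I had to fall back on free proxies; the real reason, honestly, is that I'm broke and can't afford paid ones. There are quite a few decent proxy sites in China, such as <a href="http://www.66ip.cn" target="_blank" rel="external">66ip.cn (66 Free Proxy IP)</a>. They scan machines from time to time and add any proxies they find to a free proxy pool.</p>
<p>Today's goal is to scrape the usable entries from these proxy lists, using bs4, requests, and redis. Proxies that pass validation are added to redis; the crawler then pulls a working proxy from redis to fetch data, which sidesteps IP-based traffic blocking.</p>
<p>Without further ado, here is the code:</p>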
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">import</span> requests</span><br><span class="line"><span class="keyword">import</span> bs4</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> redis</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"></span><br><span class="line">TIMEOUT = <span class="number">5</span></span><br><span class="line"></span><br><span class="line">USER_AGENTS = [</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"</span>,</span><br><span class="line"> <span 
class="string">"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) 
Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20"</span>,</span><br><span class="line"> <span class="string">"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; 
QQBrowser/7.0.3698.400)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"</span>,</span><br><span class="line"> <span class="string">"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10"</span></span><br><span class="line">]</span><br><span class="line"></span><br><span 
class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_header</span><span class="params">()</span>:</span></span><br><span class="line"> <span class="keyword">return</span> {</span><br><span class="line"> <span class="string">'User-Agent'</span>: random.choice(USER_AGENTS),</span><br><span class="line"> <span class="string">'Accept'</span>: <span class="string">'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'</span>,</span><br><span class="line"> <span class="string">'Accept-Language'</span>: <span class="string">'en-US,en;q=0.5'</span>,</span><br><span class="line"> <span class="string">'Connection'</span>: <span class="string">'keep-alive'</span>,</span><br><span class="line"> <span class="string">'Accept-Encoding'</span>: <span class="string">'gzip, deflate'</span>,</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">valid_agent</span><span class="params">(agent)</span>:</span></span><br><span class="line"> <span class="keyword">try</span>:</span><br><span class="line"> proxies = {<span class="string">"https"</span>: <span class="string">"http://%s"</span> % agent}</span><br><span class="line"> r = requests.get(url=<span class="string">"https://bj.lianjia.com"</span>, headers=get_header(), timeout=TIMEOUT,</span><br><span class="line"> proxies=proxies)</span><br><span class="line"> <span class="keyword">return</span> r.ok <span class="keyword">and</span> len(r.content) > <span class="number">500</span></span><br><span class="line"> <span class="keyword">except</span> Exception:</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span 
class="title">get_valid_agents</span><span class="params">(params)</span>:</span></span><br><span class="line"> count = <span class="number">0</span></span><br><span class="line"> url = params[<span class="number">0</span>]</span><br><span class="line"> redis_connector = params[<span class="number">1</span>]</span><br><span class="line"> response = requests.get(url, headers=get_header())</span><br><span class="line"> soup = bs4.BeautifulSoup(response.content, <span class="string">'lxml'</span>)</span><br><span class="line"> <span class="keyword">for</span> ip_list <span class="keyword">in</span> soup.find_all(<span class="string">'div'</span>, id=<span class="string">"main"</span>):</span><br><span class="line"> <span class="keyword">for</span> table <span class="keyword">in</span> ip_list.find_all(<span class="string">"table"</span>):</span><br><span class="line"> <span class="keyword">for</span> info <span class="keyword">in</span> table.find_all(<span class="string">"tr"</span>)[<span class="number">1</span>:]:</span><br><span class="line"> ip = info.contents[<span class="number">0</span>].string</span><br><span class="line"> port = info.contents[<span class="number">1</span>].string</span><br><span class="line"> agent = ip + <span class="string">":"</span> + port</span><br><span class="line"> <span class="keyword">if</span> valid_agent(agent):</span><br><span class="line"> count += <span class="number">1</span></span><br><span class="line"> <span class="keyword">print</span> agent</span><br><span class="line"> redis_connector.sadd(<span class="string">"ValidAgent"</span>, agent)</span><br><span class="line"> <span class="keyword">return</span> count</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line"> <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line"> start = 
time.time()</span><br><span class="line"> r = redis.Redis(host=<span class="string">'localhost'</span>, port=<span class="number">6379</span>, db=<span class="number">0</span>)</span><br><span class="line"> urls = [(<span class="string">"http://www.66ip.cn/%d.html"</span> % i, r) <span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">1</span>, <span class="number">50</span>)]</span><br><span class="line"> r.delete(<span class="string">'ValidAgent'</span>)</span><br><span class="line"> valid_agent_count = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> param <span class="keyword">in</span> urls:</span><br><span class="line"> <span class="keyword">print</span> param[<span class="number">0</span>]</span><br><span class="line"> valid_agent_count += get_valid_agents(param)</span><br><span class="line"> end = time.time()</span><br><span class="line"> print(<span class="string">"Find %d valid agents, use time %d second"</span> % (valid_agent_count, end - start))</span><br><span class="line"> time.sleep(<span class="number">60</span> * <span class="number">30</span>)</span><br></pre></td></tr></table></figure>
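<p>The script above only fills the <code>ValidAgent</code> set. As a minimal sketch of the consuming side, assuming the crawler talks to the same Redis instance: the helper names <code>pick_proxy</code> and <code>drop_proxy</code> are hypothetical and not part of the original code, while <code>srandmember</code> and <code>srem</code> are the standard redis-py set commands.</p>

```python
def pick_proxy(redis_client, key="ValidAgent"):
    """Return a requests-style proxies dict, or None if the pool is empty.

    redis_client is assumed to be a redis.Redis instance (or anything with
    the same srandmember/srem methods).
    """
    agent = redis_client.srandmember(key)
    if agent is None:
        return None
    if isinstance(agent, bytes):
        # redis-py returns bytes unless decode_responses=True is set
        agent = agent.decode("utf-8")
    return {"https": "http://%s" % agent}


def drop_proxy(redis_client, proxies, key="ValidAgent"):
    """Remove a proxy that stopped working so later picks skip it."""
    agent = proxies["https"][len("http://"):]
    redis_client.srem(key, agent)
```

<p>A crawler could then pass the returned dict as <code>proxies=</code> to <code>requests.get</code>, and call <code>drop_proxy</code> whenever a request through that proxy fails.</p>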
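<p>Sadly, even with free proxies in hand, scraping Lianjia still produced "too much traffic from this IP" warnings. Apparently everyone is using the same free proxies, and once enough people use them they get blocked too. There really is no free lunch, so the free-proxy route is a dead end. I had to fall back on the crudest approach: scrape until blocked, then pause, sleep for a while, and resume once the block has been lifted. It is clumsy, but for data like Lianjia's, which does not change in real time, it should be good enough. A last resort, but an unavoidable one.</p>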
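<p>The pause-and-resume fallback can be sketched roughly as follows. This is an illustration under assumptions, not the code actually used: the function name <code>fetch_with_backoff</code>, the <code>is_blocked</code> callback, and the doubling delay are all hypothetical choices; the text only says to sleep for a while and try again.</p>

```python
import time


def fetch_with_backoff(fetch, is_blocked, max_tries=5, base_delay=60,
                       sleep=time.sleep):
    """Call fetch() until its result is not a block page.

    fetch      -- zero-argument callable returning a response-like object
    is_blocked -- callable that decides whether a response is a block page
    sleep      -- injectable so the wait can be skipped in tests
    Returns the first unblocked response, or None if every try was blocked.
    """
    delay = base_delay
    for attempt in range(max_tries):
        response = fetch()
        if not is_blocked(response):
            return response
        if attempt < max_tries - 1:
            sleep(delay)
            delay *= 2  # assumption: wait longer after each consecutive block
    return None
```

<p>How to recognise a block page is site-specific, so that check is left to the caller.</p>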
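<p>That said, collecting free proxies is not useless. During testing you can use them to check how a page performs from different network IPs and to simulate visits from other users. They can of course also be abused, for example to attack someone's website while hiding your real IP.</p>
<p>I haven't blogged in a long time, which is a bit unfair to my own plan. Consider this a nudge to myself: the plan must go on!</p>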
]]></content>
<summary type="html">
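<p>To keep my crawler's traffic from being detected and blocked, I had to fall back on free proxies; the real reason, honestly, is that I'm broke and can't afford paid ones. There are quite a few decent proxy sites in China, such as <a href="http://www.66ip.cn" target="_blank" rel="external">免费代理ip<em>服务器htt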
</summary>
<category term="Agent" scheme="http://melonqi.cn/tags/Agent/"/>
<category term="Pool" scheme="http://melonqi.cn/tags/Pool/"/>
<category term="redis" scheme="http://melonqi.cn/tags/redis/"/>
</entry>
<entry>
<title>Scraping Lianjia data with multithreaded Python</title>
<link href="http://melonqi.cn/2017/08/29/python-%E5%A4%9A%E7%BA%BF%E7%A8%8B%E6%8A%93%E5%8F%96%E9%93%BE%E5%AE%B6%E6%95%B0%E6%8D%AE/"/>
<id>http://melonqi.cn/2017/08/29/python-多线程抓取链家数据/</id>
<published>2017-08-29T12:15:17.000Z</published>
<updated>2017-08-29T12:57:25.000Z</updated>
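<content type="html"><![CDATA[<p>Python has a somewhat notorious reputation when it comes to parallelism. Following the article <a href="https://segmentfault.com/a/1190000000414339" target="_blank" rel="external">Parallelism in One Line of Python</a>, you can parallelize work quickly and simply with the map method from the multiprocessing library.</p>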
<a id="more"></a>
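<h2 id="map_u5B9E_u73B0_u5E76_u884C_u5316"><a href="#map_u5B9E_u73B0_u5E76_u884C_u5316" class="headerlink" title="Parallelizing with map"></a>Parallelizing with map</h2><p>The map function applies a given function to every element of a sequence; reduce and filter are commonly used relatives. Here we use the map method offered by multiprocessing and by its little-known sub-library multiprocessing.dummy, which makes parallelization straightforward.</p>
<p>dummy is a complete clone of the multiprocessing module. The only difference is that multiprocessing works with processes while dummy works with threads (and therefore carries all the usual limitations of Python multithreading). That makes swapping one for the other extremely easy, so you can pick the library that suits I/O-bound or CPU-bound work. In short: choose multiprocessing.dummy for I/O-bound tasks and multiprocessing for CPU-bound tasks.</p>
<p>For example:</p>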
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> urllib2 </span><br><span class="line"><span class="keyword">from</span> multiprocessing.dummy <span class="keyword">import</span> Pool <span class="keyword">as</span> ThreadPool </span><br><span class="line"></span><br><span class="line">urls = [</span><br><span class="line"> <span class="string">'http://www.python.org'</span>, </span><br><span class="line"> <span class="string">'http://www.python.org/about/'</span>,</span><br><span class="line"> <span class="string">'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/doc/'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/download/'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/getit/'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/community/'</span>,</span><br><span class="line"> <span 
class="string">'https://wiki.python.org/moin/'</span>,</span><br><span class="line"> <span class="string">'http://planet.python.org/'</span>,</span><br><span class="line"> <span class="string">'https://wiki.python.org/moin/LocalUserGroups'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/psf/'</span>,</span><br><span class="line"> <span class="string">'http://docs.python.org/devguide/'</span>,</span><br><span class="line"> <span class="string">'http://www.python.org/community/awards/'</span></span><br><span class="line"> <span class="comment"># etc.. </span></span><br><span class="line"> ]</span><br><span class="line"></span><br><span class="line"><span class="comment"># Make the Pool of workers</span></span><br><span class="line">pool = ThreadPool(<span class="number">4</span>) </span><br><span class="line"><span class="comment"># Open the urls in their own threads</span></span><br><span class="line"><span class="comment"># and return the results</span></span><br><span class="line">results = pool.map(urllib2.urlopen, urls)</span><br><span class="line"><span class="comment">#close the pool and wait for the work to finish </span></span><br><span class="line">pool.close() </span><br><span class="line">pool.join()</span><br></pre></td></tr></table></figure>
<p>For performance numbers, the original article includes a simple demo benchmark; see <a href="https://segmentfault.com/a/1190000000414339" target="_blank" rel="external">Parallelism in One Line of Python</a>.</p>
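<p>The example above uses urllib2 and therefore Python 2. As a small sketch of the same pattern that also runs on Python 3 without network access, here the I/O-bound call is simulated with <code>time.sleep</code>; <code>slow_io</code> and <code>run_parallel</code> are hypothetical names introduced only for illustration.</p>

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def slow_io(n):
    """Stand-in for an I/O-bound call such as an HTTP request."""
    time.sleep(0.1)
    return n * n


def run_parallel(func, items, workers=4):
    """Map func over items with a pool of worker threads.

    Results come back in the same order as the input items.
    """
    pool = ThreadPool(workers)
    try:
        return pool.map(func, items)
    finally:
        # close the pool and wait for the workers to finish
        pool.close()
        pool.join()
```

<p>Changing the import to <code>from multiprocessing import Pool</code> would run the same code with processes instead of threads; on platforms that spawn new interpreters, the pool then needs to be created under an <code>if __name__ == "__main__":</code> guard.</p>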
<h2 id="u6293_u53D6_u5317_u4EAC_u5E02_u4E8C_u624B_u623F_u6570_u636E"><a href="#u6293_u53D6_u5317_u4EAC_u5E02_u4E8C_u624B_u623F_u6570_u636E" class="headerlink" title="Scraping Beijing second-hand housing data"></a>Scraping Beijing second-hand housing data</h2><p>Combining bs4 with map, we can scrape second-hand housing data for all of Beijing. Listings are grouped by district, so it is enough to scrape one district at a time.</p>
<p>For how to use bs4, see <a href="http://dreamversion.github.io/2017/08/20/BeatifulSoup-regex-%E8%A7%A3%E6%9E%90%E9%93%BE%E5%AE%B6%E6%88%BF%E6%BA%90%E4%BF%A1%E6%81%AF/" target="_blank" rel="external">BeautifulSoup & regex: parsing Lianjia listing information</a>.</p>
<p>The complete code is as follows:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">from</span> multiprocessing.dummy <span class="keyword">import</span> Pool <span class="keyword">as</span> ThreadPool</span><br><span class="line"><span class="keyword">import</span> bs4</span><br><span class="line"><span class="keyword">import</span> requests</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> re</span><br><span class="line"><span class="keyword">import</span> demjson</span><br><span class="line"></span><br><span class="line"><span class="comment"># parse single house</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">parse_house</span><span class="params">(house_id)</span>:</span></span><br><span class="line"> house_url = <span class="string">"http://bj.lianjia.com/ershoufang/"</span> + house_id + <span class="string">".html"</span></span><br><span class="line"> <span class="keyword">print</span> house_url</span><br><span class="line"> r = requests.get(house_url)</span><br><span class="line"> soup = bs4.BeautifulSoup(r.content, <span class="string">"lxml"</span>)</span><br><span class="line"> pattern = re.compile(<span class="string">r'init\((.*?)\);'</span>, re.DOTALL)</span><br><span class="line"> <span class="keyword">for</span> script <span class="keyword">in</span> soup.find_all(<span class="string">'script'</span>):</span><br><span class="line"> <span 
class="keyword">if</span> type(script.string) == bs4.element.NavigableString:</span><br><span class="line"> <span class="keyword">if</span> script.text.find(<span class="string">"sellDetail"</span>) >= <span class="number">0</span>:</span><br><span class="line"> match = pattern.search(script.text)</span><br><span class="line"> <span class="keyword">if</span> match:</span><br><span class="line"> json_dict = demjson.decode(match.group(<span class="number">1</span>))</span><br><span class="line"> house_info = json.dumps(json_dict, indent=<span class="number">4</span>)</span><br><span class="line"> <span class="keyword">print</span> house_info</span><br><span class="line"></span><br><span class="line"><span class="comment"># get house ids</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">parse_house_page</span><span class="params">(url)</span>:</span></span><br><span class="line"> r = requests.get(url)</span><br><span class="line"> soup = bs4.BeautifulSoup(r.content, <span class="string">'lxml'</span>)</span><br><span class="line"> pattern = re.compile(<span class="string">r'main\((.*?)\);'</span>, re.DOTALL)</span><br><span class="line"> <span class="keyword">for</span> script <span class="keyword">in</span> soup.find_all(<span class="string">'script'</span>):</span><br><span class="line"> <span class="keyword">if</span> type(script.string) == bs4.element.NavigableString:</span><br><span class="line"> <span class="keyword">if</span> script.text.find(<span class="string">"sellList"</span>) >= <span class="number">0</span>:</span><br><span class="line"> match = pattern.search(script.text)</span><br><span class="line"> <span class="keyword">if</span> match:</span><br><span class="line"> json_dict = demjson.decode(match.group(<span class="number">1</span>))</span><br><span class="line"> house_ids = json_dict[<span class="string">'ids'</span>].split(<span class="string">','</span>)</span><br><span class="line"> 
<span class="keyword">for</span> house_id <span class="keyword">in</span> house_ids:</span><br><span class="line"> parse_house(house_id)</span><br><span class="line"></span><br><span class="line"><span class="comment"># parse a district house</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">parse_district</span><span class="params">(url)</span>:</span></span><br><span class="line"> r = requests.get(url)</span><br><span class="line"> <span class="keyword">print</span> r.content</span><br><span class="line"> soup = bs4.BeautifulSoup(r.content, <span class="string">'lxml'</span>)</span><br><span class="line"> total_page = <span class="number">0</span></span><br><span class="line"> <span class="keyword">for</span> result <span class="keyword">in</span> soup.find_all(<span class="string">'div'</span>, class_=<span class="string">'page-box fr'</span>):</span><br><span class="line"> page_box = result.contents[<span class="number">0</span>]</span><br><span class="line"> total_page_json = json.loads(page_box.attrs[<span class="string">'page-data'</span>])</span><br><span class="line"> total_page = total_page_json[<span class="string">'totalPage'</span>]</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> page <span class="keyword">in</span> range(<span class="number">1</span>, total_page + <span class="number">1</span>):</span><br><span class="line"> page_url = url + <span class="string">"/pg"</span> + str(page) + <span class="string">"/"</span></span><br><span class="line"> <span class="keyword">print</span> page_url</span><br><span class="line"> parse_house_page(page_url)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">scapy_district</span><span class="params">(district_name)</span>:</span></span><br><span class="line"> url = <span 
class="string">"http://bj.lianjia.com/ershoufang/"</span> + district_name + <span class="string">"/"</span></span><br><span class="line"> <span class="keyword">print</span> url</span><br><span class="line"> parse_district(url)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line"> districts = [<span class="string">'dongcheng'</span>, <span class="string">'xicheng'</span>, <span class="string">'chaoyang'</span>, <span class="string">'haidian'</span>, <span class="string">'fengtai'</span>, <span class="string">'shijingshan'</span>, <span class="string">'tongzhou'</span>, <span class="string">'changping'</span>,</span><br><span class="line"> <span class="string">'daxing'</span>, <span class="string">'yizhuangkaifaqu'</span>, <span class="string">'shunyi'</span>, <span class="string">'fangshan'</span>, <span class="string">'mentougou'</span>, <span class="string">'pinggu'</span>, <span class="string">'huairou'</span>, <span class="string">'miyun'</span>,</span><br><span class="line"> <span class="string">'yanqing'</span>]</span><br><span class="line"></span><br><span class="line"> <span class="comment"># districts = ['dongcheng']</span></span><br><span class="line"></span><br><span class="line"> pool = ThreadPool(<span class="number">4</span>)</span><br><span class="line"> results = pool.map(scapy_district, districts)</span><br><span class="line"> <span class="comment"># close the pool and wait for the work to finish</span></span><br><span class="line"> pool.close()</span><br><span class="line"> pool.join()</span><br></pre></td></tr></table></figure>
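<p>One detail in the crawler above is easy to miss: the district URL already ends with "/", so appending "/pg" + page produces a double slash in the page URL. The sketch below is a hedged illustration for Python 3, not the original code. It builds the page URLs without the extra slash and fans the districts out over a thread pool. <code>build_page_urls</code>, <code>fake_fetch</code> and <code>crawl_district</code> are hypothetical names, and <code>fake_fetch</code> stands in for <code>parse_house_page</code> so that nothing touches the network.</p>

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool with the same map() API

BASE_URL = "http://bj.lianjia.com/ershoufang/"  # same base URL as the crawler above


def build_page_urls(district_name, total_page):
    """Return the list-page URLs for one district, pages 1..total_page."""
    district_url = BASE_URL + district_name.strip("/") + "/"
    # district_url already ends with "/", so append "pg<N>/" without a leading slash
    return [district_url + "pg" + str(page) + "/" for page in range(1, total_page + 1)]


def fake_fetch(url):
    """Hypothetical stand-in for parse_house_page(url): no network, just echoes the URL."""
    return url


def crawl_district(district_name, total_page=3):
    # total_page would normally come from the 'page-data' attribute; 3 is a placeholder
    return [fake_fetch(url) for url in build_page_urls(district_name, total_page)]


if __name__ == "__main__":
    pool = ThreadPool(4)
    # map() returns one result per district, in the same order as the input list
    results = pool.map(crawl_district, ["dongcheng", "xicheng"])
    pool.close()
    pool.join()
    for pages in results:
        print(pages)
```

<p>Keeping URL construction in its own function makes it possible to check the generated URLs without sending a single request.</p>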
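<h2 id="u95EE_u9898"><a href="#u95EE_u9898" class="headerlink" title="Problems"></a>Problems</h2><p>While crawling, you run into Lianjia's anti-crawler measures: the crawler gets blocked before it has finished even a single district's second-hand listings. From a quick analysis, Lianjia appears to combine the client IP with a time-limited verification cookie to decide whether the traffic comes from a malicious crawler. Once it flags the traffic, it returns a set of images and asks the visitor to pick the ones that match a prompt, in order to tell whether the visitor is a bot. A correct answer returns a valid cookie, and later requests that carry this cookie can continue.</p>
<p>Possible workarounds:</p>
<ol>
<li><p>Rotate IPs through proxies. Commercial crawler proxies are expensive, so the fallback is free ones such as <a href="http://www.xicidaili.com/" target="_blank" rel="external">Xici</a>. It lists many free proxies, but most of them do not work, so you have to parse the site yourself, collect the proxies that pass a check into a proxy pool, and send every Lianjia request through a proxy from that pool. The hard part is building the pool, and neither the number nor the quality of working proxies is guaranteed.</p>
</li>
<li><p>Recognize the images, or have the program pick images at random and hope it gets lucky, in order to obtain a valid cookie.</p>
</li>
<li><p>Wait a while until the cookie expires on its own and then resume. The drawback is that the data can no longer be guaranteed to be up to date or correct.</p>
</li>
</ol>
<p>For now the workable option is option 1. The next step is to build my own pool of free proxies. Fighting!</p>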
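<p>A proxy pool of the kind described in option 1 could start out roughly as follows. This is only a sketch under assumptions: <code>ProxyPool</code> and its methods are hypothetical names, the liveness check is passed in by the caller (for example a test request sent through the proxy), and the addresses come from a documentation-only IP range rather than a real proxy list.</p>

```python
import random


class ProxyPool(object):
    """Keeps only the proxies that pass a caller-supplied check (hypothetical helper)."""

    def __init__(self, candidates, check):
        # check(proxy) -> bool is supplied by the caller
        self._alive = [proxy for proxy in candidates if check(proxy)]

    def __len__(self):
        return len(self._alive)

    def get(self):
        """Return a random working proxy, or None when the pool is empty."""
        if not self._alive:
            return None
        return random.choice(self._alive)

    def report_failure(self, proxy):
        """Drop a proxy that stopped working."""
        if proxy in self._alive:
            self._alive.remove(proxy)

    @staticmethod
    def as_requests_proxies(proxy):
        """Build the dict accepted by the `proxies=` argument of requests.get."""
        return {"http": "http://" + proxy, "https": "http://" + proxy}


if __name__ == "__main__":
    # Placeholder check: pretend only the first address is alive
    pool = ProxyPool(["192.0.2.1:8080", "192.0.2.2:8080"],
                     check=lambda proxy: proxy.startswith("192.0.2.1:"))
    print(len(pool), pool.get())
```

<p>A request would then look like <code>requests.get(url, proxies=ProxyPool.as_requests_proxies(proxy), timeout=5)</code>, with <code>report_failure</code> called whenever that request fails.</p>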
]]></content>
<summary type="html">
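<p>Python has a somewhat poor reputation when it comes to parallelizing programs. Following <a href="https://segmentfault.com/a/1190000000414339">一行 Python 实现并行化</a> (parallelism in one line of Python), the map function in the multiprocessing library makes parallel execution quick and simple.</p>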
</summary>
<category term="lianjia;python;multiprocess" scheme="http://melonqi.cn/tags/lianjia-python-multiprocess/"/>
</entry>
<entry>
<title>Parsing Lianjia Listings with BeautifulSoup & regex</title>
<link href="http://melonqi.cn/2017/08/20/BeatifulSoup-regex-%E8%A7%A3%E6%9E%90%E9%93%BE%E5%AE%B6%E6%88%BF%E6%BA%90%E4%BF%A1%E6%81%AF/"/>
<id>http://melonqi.cn/2017/08/20/BeatifulSoup-regex-解析链家房源信息/</id>
<published>2017-08-20T08:38:04.000Z</published>
<updated>2017-08-20T10:53:49.000Z</updated>
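<content type="html"><![CDATA[<p>I used to parse web pages with regular expressions alone, which is simple and crude, but on complex pages the effort becomes painful. BeautifulSoup parses HTML much more gracefully, and with regex as a helper it extracts the information we want both cleanly and efficiently.<br><a id="more"></a></p>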
<h2 id="u524D_u671F_u51C6_u5907"><a href="#u524D_u671F_u51C6_u5907" class="headerlink" title="Preparation"></a>Preparation</h2><ol>
<li>Lianjia listing page URL: "<a href="https://bj.lianjia.com/ershoufang/*.html" target="_blank" rel="external">https://bj.lianjia.com/ershoufang/*.html</a>". This post uses "<a href="https://bj.lianjia.com/ershoufang/101101845590.html" target="_blank" rel="external">https://bj.lianjia.com/ershoufang/101101845590.html</a>" as the example.</li>
<li>Tools: Python, the HTTP library requests, json, demjson, bs4, re</li>
</ol>
<h2 id="requests__u5E93_u7B80_u5355_u4F7F_u7528"><a href="#requests__u5E93_u7B80_u5355_u4F7F_u7528" class="headerlink" title="Basic Use of requests"></a>Basic Use of requests</h2><p>requests is an excellent HTTP library for Python.</p>
<p>Our need is simple: fetch the HTML behind a given URL.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">r = requests.get(<span class="string">"https://bj.lianjia.com/ershoufang/101101845590.html"</span>)</span><br><span class="line">html = r.content</span><br></pre></td></tr></table></figure>
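<p><code>r.content</code> is the response body returned for the URL (as raw bytes); the code stores it in the <code>html</code> variable.</p>
<p>For more usage, see the <a href="http://cn.python-requests.org/" target="_blank" rel="external">requests</a> documentation.</p>
<h2 id="BeautifulSoup_u4F7F_u7528"><a href="#BeautifulSoup_u4F7F_u7528" class="headerlink" title="Using BeautifulSoup"></a>Using BeautifulSoup</h2><p>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document. It can save you hours or even days of work.<br><a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/" target="_blank" rel="external">BeautifulSoup documentation</a></p>
<p>BeautifulSoup3 is no longer maintained, so BeautifulSoup4 is the better choice.</p>
<p>BeautifulSoup supports several HTML parsers; this post uses lxml.</p>
<p>BeautifulSoup turns a complex HTML document into a tree in which every node is a Python object. All objects fall into four kinds: Tag, NavigableString, BeautifulSoup, and Comment.</p>
<h3 id="Tag"><a href="#Tag" class="headerlink" title="Tag"></a>Tag</h3><p>A Tag object corresponds to a tag in the original XML or HTML document.</p>
<p>A Tag exposes name, attrs, and string.</p>
<p>attrs is a key-value mapping. For multi-valued attributes such as class the value is a list, which is how multi-valued attributes are supported; other attributes hold a plain string.</p>
<p>string is a NavigableString. Calling unicode() on it (str() in Python 3) converts a NavigableString directly into a Unicode string. If you want to use a NavigableString outside of Beautiful Soup, call unicode() to turn it into a plain Unicode string; otherwise the object keeps a reference to the whole parse tree even after you are done with Beautiful Soup, which wastes memory.</p>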
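<p>The Tag properties above can be tried on a tiny document. This is a minimal sketch for Python 3 with an invented markup string; it uses the built-in "html.parser" backend so it runs without lxml, whereas the rest of this post uses lxml.</p>

```python
from bs4 import BeautifulSoup

# Invented sample markup, not taken from the Lianjia page
markup = '<p class="intro lead" id="first">Hello <b>world</b></p>'
soup = BeautifulSoup(markup, "html.parser")

tag = soup.p
name = tag.name            # tag name as a string
classes = tag["class"]     # multi-valued attribute: a list of class names
tag_id = tag["id"]         # single-valued attribute: a plain string
bold_text = tag.b.string   # NavigableString, still linked to the parse tree
plain = str(bold_text)     # plain string copy; on Python 2 use unicode(bold_text)

print(name, classes, tag_id, plain)
```

<p>Converting with <code>str()</code> before storing the value elsewhere avoids keeping the whole tree alive through the NavigableString.</p>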
<p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/Tag.png" alt="Tag"></p>
<h3 id="BeautifulSoup"><a href="#BeautifulSoup" class="headerlink" title="BeautifulSoup"></a>BeautifulSoup</h3><p>The BeautifulSoup object represents the document as a whole. Most of the time you can treat it as a Tag object.</p>
<h3 id="Comment"><a href="#Comment" class="headerlink" title="Comment"></a>Comment</h3><p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/Comment.png" alt="Comment"><br>Comment deserves special attention.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">markup = <span class="string">"<b><!--Hey, buddy. Want to buy a used parser?--></b>"</span></span><br><span class="line">soup = BeautifulSoup(markup)</span><br><span class="line">comment = soup.b.string</span><br><span class="line">print(comment)</span><br><span class="line">print(type(comment))</span><br></pre></td></tr></table></figure>
<p>Output:</p>
<figure class="highlight ocaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"># u'Hey, buddy. <span class="type">Want</span> <span class="keyword">to</span> buy a used parser'</span><br><span class="line"># <<span class="keyword">class</span> <span class="symbol">'bs4</span>.element.<span class="type">Comment'</span>></span><br></pre></td></tr></table></figure>
<p>The output is not the <code>&lt;!--Hey, buddy. Want to buy a used parser?--&gt;</code> one might expect at first: the comment markers are stripped.</p>
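<p>Because a Comment is a special kind of NavigableString, the way a type check is written decides whether comments slip through. The sketch below, for Python 3 and with an invented markup string, compares <code>isinstance</code> with the exact-type comparison that the crawler code in this post uses on <code>script.string</code>.</p>

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Invented sample: one tag holding a comment, one holding ordinary text
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b><i>plain text</i>"
soup = BeautifulSoup(markup, "html.parser")

comment = soup.b.string
text = soup.i.string

is_comment = isinstance(comment, Comment)
also_navigable = isinstance(comment, NavigableString)   # Comment subclasses NavigableString
exact_type_match = type(comment) == NavigableString     # exact-type check rejects the comment
text_exact_match = type(text) == NavigableString        # ordinary text passes the exact check

print(is_comment, also_navigable, exact_type_match, text_exact_match)
```

<p>So an exact-type comparison is one way to skip comments, while <code>isinstance(x, NavigableString)</code> accepts them.</p>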
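<h3 id="u904D_u5386_u6587_u6863_u6811"><a href="#u904D_u5386_u6587_u6863_u6811" class="headerlink" title="Traversing the Document Tree"></a>Traversing the Document Tree</h3><p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/children.png" alt="children"></p>
<p><code>.contents</code> and <code>.children</code> walk the direct children of a Tag. <code>head_tag.contents</code> returns a list, for example <code>[&lt;title&gt;The Dormouse's story&lt;/title&gt;]</code>. You can also iterate with <code>.children</code>: <code>for child in head_tag.children</code>. For other attributes such as <code>.descendants</code> and <code>.parent</code>, see the official documentation.</p>
<h3 id="u641C_u7D22_u6587_u6863_u6811"><a href="#u641C_u7D22_u6587_u6863_u6811" class="headerlink" title="Searching the Document Tree"></a>Searching the Document Tree</h3><p>The find_all and find methods:</p>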
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">head_tag.find_all(<span class="string">'title'</span>)</span><br><span class="line"><span class="comment"># [<title>The Dormouse's story</title>]</span></span><br></pre></td></tr></table></figure>
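<p>This searches the tags and returns those whose name matches. Besides a plain tag name, the following filters are supported:</p>
<ol>
<li>Regular expressions: <code>find_all(re.compile("^b"))</code></li>
<li>A list of names: <code>find_all(["a", "b"])</code></li>
<li>Every tag: <code>find_all(True)</code></li>
<li>A function used as the filter</li>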
</ol>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">has_class_but_no_id</span><span class="params">(tag)</span>:</span></span><br><span class="line"> <span class="keyword">return</span> tag.has_attr(<span class="string">'class'</span>) <span class="keyword">and</span> <span class="keyword">not</span> tag.has_attr(<span class="string">'id'</span>)</span><br><span class="line"></span><br><span class="line">soup.find_all(has_class_but_no_id)</span><br></pre></td></tr></table></figure>
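<p>The find_all() method returns every tag in the document that matches, even though sometimes we only want a single result. For example, a document has only one &lt;body&gt; tag, so searching for it with find_all() is a poor fit: rather than calling find_all with limit=1, call find() directly. The following two calls are nearly equivalent:</p>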
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">soup.find_all(<span class="string">'title'</span>, limit=<span class="number">1</span>)</span><br><span class="line"><span class="comment"># [<title>The Dormouse's story</title>]</span></span><br><span class="line"></span><br><span class="line">soup.find(<span class="string">'title'</span>)</span><br><span class="line"><span class="comment"># <title>The Dormouse's story</title></span></span><br></pre></td></tr></table></figure>
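<p>The only difference is that find_all() returns a list containing a single element, whereas find() returns the element itself. When nothing matches, find_all() returns an empty list and find() returns None.</p>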
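<p>The difference between the two methods is easiest to see side by side, including the case where nothing matches. This is a small sketch for Python 3 with an invented document and the built-in "html.parser" backend.</p>

```python
from bs4 import BeautifulSoup

# Invented sample document
html_doc = ("<html><head><title>The Dormouse's story</title></head>"
            "<body><p class='story'>...</p></body></html>")
soup = BeautifulSoup(html_doc, "html.parser")

as_list = soup.find_all("title", limit=1)  # list holding one Tag
as_tag = soup.find("title")                # the Tag itself
missing_list = soup.find_all("table")      # empty list when nothing matches
missing_tag = soup.find("table")           # None when nothing matches

print(as_list, as_tag, missing_list, missing_tag)
```

<p>Code that chains attribute access after <code>find()</code> should therefore check for None first, which is what the <code>if match:</code> style guards in this post do for regex matches.</p>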
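<p>Other search methods: find_parents, find_parent, and so on.</p>
<h2 id="Regex"><a href="#Regex" class="headerlink" title="Regex"></a>Regex</h2><p><a href="http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html" target="_blank" rel="external">Python regular expressions</a></p>
<h2 id="u5B9E_u9645_u4EE3_u7801"><a href="#u5B9E_u9645_u4EE3_u7801" class="headerlink" title="Real Code"></a>Real Code</h2><p>After formatting and inspecting the HTML of a Lianjia listing page, you find the listing information in two places.</p>
<ol>
<li>In the body, spread across different tags: the area and the price, for example, sit in separate tags. You can locate each value by walking the children of a tag. The drawbacks are that this takes a lot of code and is inflexible: once the HTML layout changes, the parser easily fails to find what it needs.</li>
<li>In a script block, passed as the argument of a function call. You can reach it by walking children or with find_all.<br>The script tag in question:</li>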
</ol>
<figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br></pre></td><td class="code"><pre><span class="line"><script></span><br><span class="line"> <span class="built_in">require</span>([<span class="string">'ershoufang/sellDetail/detailV3'</span>], <span class="function"><span class="keyword">function</span> (<span class="params">init</span>) </span>{</span><br><span class="line"> init({</span><br><span class="line"> ucid: <span class="string">''</span>,</span><br><span class="line"> houseType: <span class="string">'普通住宅'</span>,</span><br><span class="line"> isUnique: <span class="string">'唯一住宅'</span>,</span><br><span class="line"> registerTime: <span class="string">'满五年'</span>,</span><br><span class="line"> area: <span class="string">'70.07'</span>,</span><br><span class="line"> totalPrice: <span class="string">'435'</span>,</span><br><span class="line"> price: <span class="string">'62081'</span>,</span><br><span class="line"> houseId: <span class="string">'101101845590'</span>,</span><br><span class="line"> resblockId: <span 
class="string">'1111027381003'</span>,</span><br><span class="line"> resblockName: <span class="string">'新龙城'</span>,</span><br><span class="line"> isRemove: <span class="number">1</span>,</span><br><span class="line"> defaultImg: <span class="string">'https://s1.ljcdn.com/feroot/pc/asset/img/blank.gif?_v=20170817190344'</span>,</span><br><span class="line"> defaultBrokerIcon: <span class="string">'https://s1.ljcdn.com/feroot/pc/asset/img/blank.gif?_v=20170817190344'</span>,</span><br><span class="line"> resblockPosition: <span class="string">'116.330198,40.074228'</span>,</span><br><span class="line"> cityId: <span class="string">'110000'</span>,</span><br><span class="line"> changedate: [<span class="number">1</span>, <span class="number">3</span>, <span class="number">4</span>, <span class="number">5</span>, <span class="number">3</span>],</span><br><span class="line"> changenum: [<span class="number">123</span>, <span class="number">567</span>, <span class="number">232</span>, <span class="number">347</span>, <span class="number">122</span>],</span><br><span class="line"> diamondAgent: [],</span><br><span class="line"> diamondAgentPhone: [],</span><br><span class="line"> agentInfo: {</span><br><span class="line"> <span class="string">"1000000010091451"</span>: {</span><br><span class="line"> <span class="string">"ucid"</span>: <span class="string">"1000000010091451"</span>,</span><br><span class="line"> <span class="string">"isQualify"</span>: <span class="literal">true</span>,</span><br><span class="line"> <span class="string">"reason"</span>: <span class="string">"\u9700\u8981\u4e86\u89e3\u672c\u623f\u7684\u4efb\u4f55\u60c5\u51b5\uff0c\u6b22\u8fce\u968f\u65f6\u8054\u7cfb\u6211"</span>,</span><br><span class="line"> <span class="string">"name"</span>: <span class="string">"\u5434\u4e16\u5175"</span>,</span><br><span class="line"> <span class="string">"photo_url"</span>: <span 
class="string">"https:\/\/image1.ljcdn.com\/usercenter\/images\/uc_ehr_avatar\/c0eb44d2-ac7c-4a23-90aa-72764f961300.jpg"</span>,</span><br><span class="line"> <span class="string">"agent_url"</span>: <span class="string">"https:\/\/dianpu.lianjia.com\/1000000010091451"</span>,</span><br><span class="line"> <span class="string">"agent_level"</span>: <span class="string">"\u8d44\u6df1\u7ecf\u7eaa\u4eba"</span>,</span><br><span class="line"> <span class="string">"feedbackGoodRate"</span>: <span class="string">"95%"</span>,</span><br><span class="line"> <span class="string">"totalCommentScore"</span>: <span class="string">"4.9"</span>,</span><br><span class="line"> <span class="string">"commentCount"</span>: <span class="string">"167"</span></span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> hasDaikan: <span class="literal">true</span>,</span><br><span class="line"> uniqueAgent: <span class="literal">false</span>,</span><br><span class="line"> showCart: <span class="string">''</span>,</span><br><span class="line"> hasFangjia: <span class="literal">false</span>,</span><br><span class="line"> test_400_hide: <span class="string">''</span>,</span><br><span class="line"> newTax: <span class="literal">true</span>,</span><br><span class="line"> uuid: <span class="string">'e933ba1f-fa23-4eff-8ec9-bb1ea8f46a54'</span>,</span><br><span class="line"> loadingImg: <span class="string">'https://s1.ljcdn.com/feroot/pc/asset/ershoufang/sellDetail/img/loading.gif?_v=20170817190344'</span>,</span><br><span class="line"> qrImg: <span class="string">'//ajax.api.lianjia.com/qr/getDownloadQr'</span>,</span><br><span class="line"> title: <span class="string">'新龙城 南向 采光好 满五年 中间楼层 诚心出售_北京回龙观新龙城二手房推荐'</span>,</span><br><span class="line"> images: [{</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">1</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span 
class="number">61005103321762</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u5385"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/c649ea1d-a0ca-4ca9-bb5a-141776122c85.jpg"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/c649ea1d-a0ca-4ca9-bb5a-141776122c85.jpg.710x400.jpg"</span>,</span><br><span class="line"> <span class="string">"isHead"</span>: <span class="number">1</span></span><br><span class="line"> }, {</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">1</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span class="number">61005103321763</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u5385"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/8eaee9ad-1ff8-4b73-90ec-9964d2a71208.jpg"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/8eaee9ad-1ff8-4b73-90ec-9964d2a71208.jpg.710x400.jpg"</span></span><br><span class="line"> }, {</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">2</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span class="number">61005103321759</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u5367\u5ba4"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/ae02bf61-fa1e-48b6-aad7-6d23ef8aa361.jpg"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span 
class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/ae02bf61-fa1e-48b6-aad7-6d23ef8aa361.jpg.710x400.jpg"</span></span><br><span class="line"> }, {</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">99</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span class="number">1</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u6237\u578b\u56fe"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/x-se\/\/hdic-frame\/3fb0661e-08cd-441d-afdb-9d85e8b35d24.png"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/x-se\/\/hdic-frame\/3fb0661e-08cd-441d-afdb-9d85e8b35d24.png.533x400.jpg"</span></span><br><span class="line"> }, {</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">3</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span class="number">61005103321761</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u53a8\u623f"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/dfc1c2fb-8819-49d6-8d0c-5bd72687c96c.jpg"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/dfc1c2fb-8819-49d6-8d0c-5bd72687c96c.jpg.710x400.jpg"</span></span><br><span class="line"> }, {</span><br><span class="line"> <span class="string">"code"</span>: <span class="number">4</span>,</span><br><span class="line"> <span class="string">"id"</span>: <span class="number">61005103321760</span>,</span><br><span class="line"> <span class="string">"type"</span>: <span class="string">"\u536b\u751f\u95f4"</span>,</span><br><span class="line"> <span class="string">"uri"</span>: 
<span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/ce458ac9-65ef-420e-baed-61375e7bf135.jpg"</span>,</span><br><span class="line"> <span class="string">"url"</span>: <span class="string">"https:\/\/image1.ljcdn.com\/110000-inspection\/ce458ac9-65ef-420e-baed-61375e7bf135.jpg.710x400.jpg"</span></span><br><span class="line"> }]</span><br><span class="line"> });</span><br><span class="line"> });</span><br><span class="line"> <span class="built_in">require</span>([<span class="string">'common/jquery.popups'</span>], <span class="function"><span class="keyword">function</span> (<span class="params"></span>) </span>{});</span><br><span class="line"> <span class="built_in">require</span>([<span class="string">'common/jquery.fly'</span>], <span class="function"><span class="keyword">function</span> (<span class="params"></span>) </span>{});</span><br><span class="line"> <span class="built_in">require</span>([<span class="string">'ershoufang/sellDetail/comp/imgZoom'</span>], <span class="function"><span class="keyword">function</span> (<span class="params"></span>) </span>{});</span><br><span class="line"> <span class="built_in">require</span>([<span class="string">'common/requestAnimationFrame'</span>], <span class="function"><span class="keyword">function</span> (<span class="params"></span>) </span>{});</span><br><span class="line"><span class="xml"><span class="tag"></<span class="title">script</span>></span></span></span><br></pre></td></tr></table></figure>
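<p>The analysis shows that the listing information sits inside <code>require(['ershoufang/sellDetail/detailV3'], function (init) { init({*}); });</code>, so a regular expression is needed to extract the JSON-like object.</p>
<p>The extracted text is not valid JSON: the keys are unquoted and some values are wrapped in single quotes. demjson is used to decode it leniently; the result is then dumped as a JSON string and printed.</p>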
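<p>The extraction step can be tried in isolation with the standard library only. In the sketch below, for Python 3, <code>script_text</code> is a shortened, hand-written stand-in for the real script block. It applies the same pattern as the full code and then shows that the strict <code>json</code> module rejects the captured text, which is why a lenient decoder such as demjson is needed.</p>

```python
import json
import re

# Shortened, hand-written stand-in for the script block shown above
script_text = """
require(['ershoufang/sellDetail/detailV3'], function (init) {
    init({
        houseId: '101101845590',
        area: '70.07',
        isRemove: 1
    });
});
"""

# Non-greedy capture of everything between "init(" and the first ");" after it.
# re.DOTALL lets "." match newlines, so the capture can span several lines.
pattern = re.compile(r'init\((.*?)\);', re.DOTALL)
match = pattern.search(script_text)
raw_object = match.group(1) if match else None

try:
    json.loads(raw_object)
    strict_json_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    strict_json_ok = False

print(raw_object)
print("valid strict JSON:", strict_json_ok)
```

<p>One caveat of the non-greedy pattern: it stops at the first <code>);</code> it meets, so a string value that happened to contain <code>);</code> would cut the capture short.</p>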
<p>Full code:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> bs4</span><br><span class="line"><span class="keyword">import</span> requests</span><br><span class="line"><span class="keyword">import</span> re</span><br><span class="line"><span class="keyword">import</span> demjson</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line"> r = requests.get(<span class="string">"https://bj.lianjia.com/ershoufang/101101845590.html"</span>)</span><br><span class="line"> soup = bs4.BeautifulSoup(r.content, <span class="string">"lxml"</span>)</span><br><span class="line"> pattern = re.compile(<span class="string">r'init\((.*?)\);'</span>, re.DOTALL)</span><br><span class="line"> <span class="keyword">for</span> script <span class="keyword">in</span> soup.find_all(<span class="string">'script'</span>):</span><br><span class="line"> <span class="keyword">if</span> type(script.string) == bs4.element.NavigableString:</span><br><span class="line"> <span class="keyword">if</span> script.text.find(<span class="string">"sellDetail"</span>) >= <span class="number">0</span>:</span><br><span class="line"> match = 
pattern.search(script.text)</span><br><span class="line"> <span class="keyword">if</span> match:</span><br><span class="line"> json_dict = demjson.decode(match.group(<span class="number">1</span>))</span><br><span class="line"> house_info = json.dumps(json_dict, indent=<span class="number">4</span>)</span><br><span class="line"> <span class="keyword">print</span> house_info</span><br></pre></td></tr></table></figure>
]]></content>
<summary type="html">
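<p>I used to parse web pages with regular expressions alone, which is simple and crude, but on complex pages the effort becomes painful. BeautifulSoup parses HTML much more gracefully, and with regex as a helper it extracts the information we want both cleanly and efficiently.<br>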
</summary>
<category term="BeautifulSoup;regex" scheme="http://melonqi.cn/tags/BeautifulSoup-regex/"/>
</entry>
<entry>
<title>Just Do IT</title>
<link href="http://melonqi.cn/2017/08/10/Just-Do-IT/"/>
<id>http://melonqi.cn/2017/08/10/Just-Do-IT/</id>
<published>2017-08-10T12:56:45.000Z</published>
<updated>2017-08-10T13:13:30.000Z</updated>
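<content type="html"><![CDATA[<p>Since I started working I have not written a blog post in a long time, and my learning feels like it has stalled. On top of that, my KPI results were not as good as I had imagined, so it is time to settle down, learn the things I want to learn, and do the things I want to do.</p>
<p>I have long had an idea for something I want to build. Despite all the purchase restrictions now in place, housing remains something people care a great deal about. I want to build a piece of predictive software that uses historical data to forecast housing prices for a given area or residential compound. Policy changes will affect the results, but it is more than enough as a practice project for machine learning.</p>
<p>The project is planned in several stages:</p>
<ol>
<li><p>Crawler: collect data from Lianjia or 5i5j (我爱我家), including current asking prices, listing details, and historical transaction prices. Likely problems: how to get past anti-crawler limits, which fields to keep, where to store them, and how to categorize the data effectively. The language is tentatively Python and the database tentatively MySQL.<br> Expected duration: 1 to 2 months.</p>
</li>
<li><p>Organize the data, do some research, and produce a heat map. The first phase generates the heat map with existing analysis software; the second phase uses a service to update the heat map dynamically.<br> The backend is tentatively written in Go and Python.<br> Expected duration: 2 to 3 months.</p>
</li>
<li><p>Analyze the data with machine learning and predict the price of a single home or the average price of a compound.<br> This is a chance to try the machine learning algorithms I have learned; the predictions are not meant to be relied on.<br> Expected duration: 1 to 2 months.</p>
</li>
</ol>
<p>During the project I will write up what I have learned and the project's progress every week, and post regularly to keep myself accountable. To improve myself, just do it.</p>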
]]></content>
<summary type="html">
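<p>Since I started working I have not written a blog post in a long time, and my learning feels like it has stalled. On top of that, my KPI results were not as good as I had imagined, so it is time to settle down, learn the things I want to learn, and do the things I want to do.</p>
<p>I have long had an idea for something I want to build. Despite all the purchase restrictions now in place, housing remains something people care a great deal about. I want to build a piece of predictive software,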
</summary>
<category term="机器学习;爬虫;热力图" scheme="http://melonqi.cn/tags/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0-%E7%88%AC%E8%99%AB-%E7%83%AD%E5%8A%9B%E5%9B%BE/"/>
</entry>
<entry>
<title>Thoughts on Passing Arguments to Threads in C++</title>
<link href="http://melonqi.cn/2016/11/03/Cplusplus_thread/"/>
<id>http://melonqi.cn/2016/11/03/Cplusplus_thread/</id>
<published>2016-11-03T04:08:22.000Z</published>
<updated>2016-11-03T04:29:50.000Z</updated>
<content type="html"><![CDATA[<p>Multithreaded C++ development inevitably throws up unexpected problems. I recently ran into one with argument passing, summarized here.<br><a id="more"></a></p>
<h2 id="u53C2_u6570_u4F20_u9012_u5982_u679C_u662F_u4E34_u65F6_u53D8_u91CF_uFF0C_u5C31_u7528malloc_u6216new_u6765_u7533_u8BF7_u53D8_u91CF_uFF0C_u4E0D_u7136_u5728_u7EBF_u7A0B_u8FD0_u884C_u65F6_uFF0C_u4E34_u65F6_u53D8_u91CF_u53EF_u80FD_u4F1A_u88AB_u91CA_u653E_uFF0C_u4F46_u662F_u7EBF_u7A0B_u91CC_u9762_u7684_u662F_u91CE_u6307_u9488_u3002"><a href="#u53C2_u6570_u4F20_u9012_u5982_u679C_u662F_u4E34_u65F6_u53D8_u91CF_uFF0C_u5C31_u7528malloc_u6216new_u6765_u7533_u8BF7_u53D8_u91CF_uFF0C_u4E0D_u7136_u5728_u7EBF_u7A0B_u8FD0_u884C_u65F6_uFF0C_u4E34_u65F6_u53D8_u91CF_u53EF_u80FD_u4F1A_u88AB_u91CA_u653E_uFF0C_u4F46_u662F_u7EBF_u7A0B_u91CC_u9762_u7684_u662F_u91CE_u6307_u9488_u3002" class="headerlink" title="If the argument is a local variable, allocate it with malloc or new; otherwise the local may be destroyed while the thread is running, leaving the thread with a dangling pointer."></a>If the argument is a local variable, allocate it with malloc or new; otherwise the local may be destroyed while the thread is running, leaving the thread with a dangling pointer.</h2><p>Example:</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><pthread.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span 
class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> *<span class="title">func</span><span class="params">(<span class="keyword">void</span> *args)</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> *vec = (<span class="built_in">vector</span><<span class="keyword">int</span>>*) args;</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"vec size:"</span><<vec->size()<<endl;</span><br><span class="line"> <span class="keyword">for</span>(<span class="built_in">vector</span><<span class="keyword">int</span>>::iterator it = vec->begin();it!=vec->end();it++)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"输出 "</span><<*it<<endl;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="keyword">pthread_t</span> thread_ids[<span class="number">2</span>];</span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<<span class="number">2</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> vec;<span class="comment">//临时变量出循环会被释放</span></span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> j=<span class="number">0</span>;j<<span class="number">3</span>;j++)</span><br><span class="line"> {</span><br><span class="line"> vec.push_back(i*j);</span><br><span class="line"> }</span><br><span 
class="line"></span><br><span class="line"> <span class="keyword">int</span> ret=pthread_create(&thread_ids[i],<span class="literal">NULL</span>,func,(<span class="keyword">void</span>*)&vec);</span><br><span class="line"> <span class="keyword">if</span>(ret!=<span class="number">0</span>)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Create Thread Fail!"</span><<endl;</span><br><span class="line"> <span class="keyword">return</span> -<span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<<span class="number">2</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> pthread_join(thread_ids[i],<span class="literal">NULL</span>);<span class="comment">// wait for the thread to finish</span></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The output is shown below:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/pthread0.png" alt="Output">It differs considerably from what was expected.</p>
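<p>Cause analysis:<br><code>vec</code> is a local variable whose address is passed to the thread as its argument. <code>pthread_create</code> returns immediately, so each iteration of the outer <code>for</code> loop finishes quickly and <code>vec</code> is destroyed at the end of that iteration. By the time the thread runs, its argument is a dangling pointer, and reading through it is undefined behavior, so the output is not what was expected.</p>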
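<p>The fix is to allocate <code>vec</code> on the heap with <code>new</code> (a plain <code>malloc</code> does not run constructors, so it is only suitable for types that do not need one, not for a <code>vector</code>). You then have to take care not to leak the memory.</p>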
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><pthread.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"><span class="keyword">using</span> <span 
class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> *<span class="title">func</span><span class="params">(<span class="keyword">void</span> *args)</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> *vec = (<span class="built_in">vector</span><<span class="keyword">int</span>>*) args;</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"vec size:"</span><<vec->size()<<endl;</span><br><span class="line"> <span class="keyword">for</span>(<span class="built_in">vector</span><<span class="keyword">int</span>>::iterator it = vec->begin();it!=vec->end();it++)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">cout</span><<*it<<endl;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="keyword">pthread_t</span> thread_ids[<span class="number">2</span>];</span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<<span class="number">2</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> <span class="comment">// a heap-allocated vector outlives the loop iteration, but when is it deleted?</span></span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> *vec = <span class="keyword">new</span> <span class="built_in">vector</span><<span class="keyword">int</span>>();</span><br><span class="line"> </span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> j=<span 
class="number">0</span>;j<<span class="number">3</span>;j++)</span><br><span class="line"> {</span><br><span class="line"> vec->push_back(i*j);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">int</span> ret=pthread_create(&thread_ids[i],<span class="literal">NULL</span>,func,(<span class="keyword">void</span>*)vec);</span><br><span class="line"> <span class="keyword">if</span>(ret!=<span class="number">0</span>)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Create Thread Fail!"</span><<endl;</span><br><span class="line"> <span class="keyword">return</span> -<span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<<span class="number">2</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> pthread_join(thread_ids[i],<span class="literal">NULL</span>);<span class="comment">// wait for the thread to finish</span></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The output, shown in the figure below, is correct:</p>
<p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/pthread1.png" alt=""></p>
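<p>One possible answer to the "when is it deleted?" question raised in the code comment above is to let the thread take ownership of the vector and free it when it finishes. The following is a minimal sketch under that assumption, not the only way to do it: <code>sum_and_free</code> and <code>run_worker</code> are hypothetical names introduced for illustration, and <code>std::unique_ptr</code> requires C++11.</p>

```cpp
#include <cassert>
#include <memory>
#include <pthread.h>
#include <stdint.h>
#include <vector>

// Thread entry point (hypothetical): takes ownership of the heap-allocated
// vector and frees it automatically when `owned` goes out of scope.
void *sum_and_free(void *args) {
    std::unique_ptr<std::vector<int> > owned(static_cast<std::vector<int> *>(args));
    intptr_t total = 0;
    for (std::vector<int>::const_iterator it = owned->begin(); it != owned->end(); ++it) {
        total += *it;
    }
    // Hand the result back through the pointer-sized return value.
    return reinterpret_cast<void *>(total);
}

// Hypothetical helper: runs one worker on the values 0..n-1 and returns
// their sum, or -1 if the thread could not be created.
long run_worker(int n) {
    assert(n >= 0);
    std::vector<int> *vec = new std::vector<int>();
    for (int j = 0; j < n; j++) {
        vec->push_back(j);
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, sum_and_free, vec) != 0) {
        delete vec;  // the thread never started, so the caller still owns vec
        return -1;
    }
    // After a successful pthread_create the thread owns vec; do not touch it here.
    void *result = NULL;
    pthread_join(tid, &result);
    return static_cast<long>(reinterpret_cast<intptr_t>(result));
}
```

<p>The design choice here is that exactly one side owns the vector at any time: the caller until <code>pthread_create</code> succeeds, the thread afterwards. That removes the need to decide in <code>main</code> when a <code>delete</code> is safe.</p>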
<h2 id="phread_join_u4F1A_u7B49_u7EBF_u7A0B_u7ED3_u675F_uFF0C_u6B63_u786E_u4F7F_u7528pthread_join_u53EF_u4EE5_u9632_u6B62_u4E34_u65F6_u53D8_u91CF_u88AB_u63D0_u524D_u91CA_u653E_u3002"><a href="#phread_join_u4F1A_u7B49_u7EBF_u7A0B_u7ED3_u675F_uFF0C_u6B63_u786E_u4F7F_u7528pthread_join_u53EF_u4EE5_u9632_u6B62_u4E34_u65F6_u53D8_u91CF_u88AB_u63D0_u524D_u91CA_u653E_u3002" class="headerlink" title="pthread_join waits for the thread to finish; using it correctly keeps a local variable from being destroyed too early"></a><code>pthread_join</code> waits for the thread to finish; using <code>pthread_join</code> correctly keeps a local variable from being destroyed too early</h2><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span 
class="string"><pthread.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> *<span class="title">func</span><span class="params">(<span class="keyword">void</span> *args)</span> </span>{</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> *vec = (<span class="built_in">vector</span><<span class="keyword">int</span>> *) args;</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"vec size:"</span> << vec->size() << endl;</span><br><span class="line"> <span class="keyword">for</span> (<span class="built_in">vector</span><<span class="keyword">int</span>>::iterator it = vec->begin(); it != vec->end(); it++) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"输出 "</span> << *it << endl;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">test</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">pthread_t</span> thread_id;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> vec;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> j = <span class="number">0</span>; j < <span class="number">3</span>; j++) {</span><br><span class="line"> vec.push_back(j);</span><br><span class="line"> }</span><br><span class="line"></span><br><span 
class="line"> <span class="keyword">int</span> ret = pthread_create(&thread_id, <span class="literal">NULL</span>, func, (<span class="keyword">void</span> *) &vec);</span><br><span class="line"> <span class="keyword">if</span> (ret != <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"Create Thread Fail!"</span> << endl;</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> test();</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
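<p>Here <code>vec</code> is likewise destroyed as soon as <code>test()</code> returns, while the thread may still be using it, so the program fails in unpredictable ways.</p>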
<p>If <code>test()</code> calls <code>pthread_join</code> and waits there for the thread to finish, <code>vec</code> stays alive for the thread's whole lifetime and the output is correct.</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><pthread.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> *<span 
class="title">func</span><span class="params">(<span class="keyword">void</span> *args)</span> </span>{</span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> *vec = (<span class="built_in">vector</span><<span class="keyword">int</span>> *) args;</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"vec size:"</span> << vec->size() << endl;</span><br><span class="line"> <span class="keyword">for</span> (<span class="built_in">vector</span><<span class="keyword">int</span>>::iterator it = vec->begin(); it != vec->end(); it++) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"输出 "</span> << *it << endl;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">test</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">pthread_t</span> thread_id;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> vec;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> j = <span class="number">0</span>; j < <span class="number">3</span>; j++) {</span><br><span class="line"> vec.push_back(j);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">int</span> ret = pthread_create(&thread_id, <span class="literal">NULL</span>, func, (<span class="keyword">void</span> *) &vec);</span><br><span class="line"> <span class="keyword">if</span> (ret != <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"Create Thread Fail!"</span> << endl;</span><br><span class="line"> <span 
class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> pthread_join(thread_id,<span class="literal">NULL</span>);<span class="comment">// wait for the thread to finish, so the local variable vec is not destroyed by test() while the thread is still running</span></span><br><span class="line"></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> test();</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
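<h2 id="u8C28_u614E_u4F7F_u7528STL_u4F5C_u4E3A_u7EBF_u7A0B_u53C2_u6570"><a href="#u8C28_u614E_u4F7F_u7528STL_u4F5C_u4E3A_u7EBF_u7A0B_u53C2_u6570" class="headerlink" title="Be careful when passing STL containers as thread arguments"></a>Be careful when passing STL containers as thread arguments</h2><p>With STL containers such as <code>vector</code>, passing the address of one of the elements to a thread is risky: a <code>vector</code> reallocates its storage as it grows, so an address handed to a thread earlier may become invalid and produce unexpected results.</p>
<p>Consider the following code:</p>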
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><pthread.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><unistd.h></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span 
class="line"></span><br><span class="line"><span class="function"><span class="keyword">void</span> *<span class="title">func</span><span class="params">(<span class="keyword">void</span> *args)</span> </span>{</span><br><span class="line"> <span class="keyword">int</span> *val = (<span class="keyword">int</span> *)args;</span><br><span class="line"> sleep(<span class="number">1</span>);<span class="comment">// sleep to make the effect more visible</span></span><br><span class="line"> <span class="built_in">cout</span><<*val<<endl;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">NULL</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">pthread_t</span> thread_ids[<span class="number">5</span>];</span><br><span class="line"></span><br><span class="line"> <span class="built_in">vector</span><<span class="keyword">int</span>> vec;</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < <span class="number">5</span>; i++) {</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"capactity:"</span><<vec.capacity()<<endl;</span><br><span class="line"> vec.push_back(i);</span><br><span class="line"> <span class="keyword">int</span> ret = pthread_create(&thread_ids[i], <span class="literal">NULL</span>, func, (<span class="keyword">void</span> *) &(vec[i]));</span><br><span class="line"> <span class="keyword">if</span> (ret != <span class="number">0</span>) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"Create Thread Fail!"</span> << endl;</span><br><span class="line"> <span class="keyword">return</span> -<span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> }</span><br><span 
class="line"></span><br><span class="line"> <span class="keyword">for</span>(<span class="keyword">int</span> i=<span class="number">0</span>;i<<span class="number">5</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> pthread_join(thread_ids[i], <span class="literal">NULL</span>);<span class="comment">// wait for the thread to finish</span></span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Output:</p>
<p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/pthread3.png" alt=""></p>
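<p>Notice that <code>1</code> is never printed.</p>
<p>This bug is subtle. When <code>vec</code> holds only <code>0,1</code>, its <code>capacity</code> is 2 and the storage is full. <code>vec.push_back(2)</code> therefore allocates new storage, which invalidates the address of <code>vec[1]</code> that was passed to <code>pthread_create</code> earlier, so the thread prints a wrong value.</p>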
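<p>One way to keep element addresses stable, assuming the final number of elements is known in advance, is to call <code>reserve</code> before taking any address. The sketch below only illustrates that effect; <code>first_element_stays_put</code> is a hypothetical helper, and it stores addresses as integers so that no dangling pointer is ever dereferenced.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h>
#include <vector>

// Returns true if element 0 has the same address before and after the
// vector grows past its initial capacity. With reserve_first == true the
// storage is pre-allocated, so no reallocation happens during the growth.
bool first_element_stays_put(bool reserve_first) {
    std::vector<int> vec;
    vec.push_back(0);
    // Appending this many more elements overflows the current capacity once.
    const std::size_t to_add = vec.capacity();
    if (reserve_first) {
        vec.reserve(vec.size() + to_add);  // room for everything pushed below
        assert(vec.capacity() >= vec.size() + to_add);
    }
    const uintptr_t before = reinterpret_cast<uintptr_t>(&vec[0]);
    for (std::size_t i = 0; i < to_add; i++) {
        vec.push_back(static_cast<int>(i) + 1);
    }
    const uintptr_t after = reinterpret_cast<uintptr_t>(&vec[0]);
    return before == after;
}
```

<p>In the threaded example above, calling <code>vec.reserve(5)</code> before the loop would have the same effect, because five <code>push_back</code> calls then never exceed the capacity. Filling the vector completely before creating any thread is another option.</p>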
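<p>You might ask whether passing an iterator instead would avoid the problem.<br>The book STL源码剖析 (The Annotated STL Sources) points out that any operation on a <code>vector</code> that triggers a reallocation invalidates every iterator into the original <code>vector</code>, so passing an iterator has the same problem. A corollary: when a loop iterates over a <code>vector</code>, do not perform operations inside the loop that can trigger a reallocation.</p>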
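<p>If a loop really does need to append to the <code>vector</code> it is walking, one common workaround is to iterate by index over a snapshot of the original size, since an index stays meaningful after a reallocation while an iterator does not. A minimal sketch, with <code>append_doubles</code> as a hypothetical example function:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends a doubled copy of every element that was in `values` on entry.
// The loop uses indices and a size snapshot instead of iterators, so a
// reallocation caused by push_back cannot invalidate the loop state.
void append_doubles(std::vector<int> &values) {
    const std::size_t original_size = values.size();
    for (std::size_t i = 0; i < original_size; i++) {
        const int doubled = values[i] * 2;  // copy the value before growing
        values.push_back(doubled);
    }
    assert(values.size() == original_size * 2);
}
```

<p>Calling it on a vector holding 1, 2, 3 leaves 1, 2, 3, 2, 4, 6. The equivalent loop written with <code>begin()</code>/<code>end()</code> iterators would have undefined behavior as soon as a <code>push_back</code> reallocates.</p>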
]]></content>
<summary type="html">
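<p>Multithreaded C++ development inevitably runs into unexpected problems. I recently hit one involving thread argument passing, and this post summarizes it.<br>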
</summary>
<category term="C++" scheme="http://melonqi.cn/tags/C/"/>
<category term="STL" scheme="http://melonqi.cn/tags/STL/"/>
<category term="pthread" scheme="http://melonqi.cn/tags/pthread/"/>
</entry>
<entry>
<title>Parsing XML with Boost and Python</title>
<link href="http://melonqi.cn/2016/07/12/boost-xml/"/>
<id>http://melonqi.cn/2016/07/12/boost-xml/</id>
<published>2016-07-12T14:41:33.000Z</published>
<updated>2016-07-20T01:49:02.000Z</updated>
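<content type="html"><![CDATA[<p>XML (short for eXtensible Markup Language) is an extensible markup language. It looks similar to HTML, but where HTML describes how data is displayed, XML describes the data itself, which makes it a convenient format for exchanging data over a network, for example between a server and a browser. XML can also be used to store data for other systems to consume, although I personally prefer JSON for storing and transferring data.</p>
<p>This post shows how to parse XML files with Python and with the C++ Boost library.<br><a id="more"></a></p>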
<h2 id="XML_u793A_u4F8B"><a href="#XML_u793A_u4F8B" class="headerlink" title="XML example"></a>XML example</h2><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="pi"><?xml version="1.0" encoding="iso-8859-1"?></span></span><br><span class="line"><span class="tag"><<span class="title">bookstore</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">book</span> <span class="attribute">category</span>=<span class="value">"COOKING"</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">title</span> <span class="attribute">lang</span>=<span class="value">"en"</span>></span>Everyday Italian<span class="tag"></<span class="title">title</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">author</span>></span>Giada De Laurentiis<span class="tag"></<span class="title">author</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">year</span>></span>2005<span class="tag"></<span class="title">year</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">price</span>></span>30.00<span class="tag"></<span class="title">price</span>></span></span><br><span class="line"> <span class="tag"></<span 
class="title">book</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">book</span> <span class="attribute">category</span>=<span class="value">"CHILDREN"</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">title</span> <span class="attribute">lang</span>=<span class="value">"en"</span>></span>Harry Potter<span class="tag"></<span class="title">title</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">author</span>></span>J K. Rowling<span class="tag"></<span class="title">author</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">year</span>></span>2005<span class="tag"></<span class="title">year</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">price</span>></span>29.99<span class="tag"></<span class="title">price</span>></span></span><br><span class="line"> <span class="tag"></<span class="title">book</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">book</span> <span class="attribute">category</span>=<span class="value">"WEB"</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">title</span> <span class="attribute">lang</span>=<span class="value">"en"</span>></span>Learning XML<span class="tag"></<span class="title">title</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">author</span>></span>Erik T. 
Ray<span class="tag"></<span class="title">author</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">year</span>></span>2003<span class="tag"></<span class="title">year</span>></span></span><br><span class="line"> <span class="tag"><<span class="title">price</span>></span>39.95<span class="tag"></<span class="title">price</span>></span></span><br><span class="line"> <span class="tag"></<span class="title">book</span>></span></span><br><span class="line"><span class="tag"></<span class="title">bookstore</span>></span></span><br></pre></td></tr></table></figure>
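<p>The first line is the XML declaration. It is optional, but if present it must appear at the very beginning of the file, and it identifies the file as XML. The declaration states the XML version (<code><?xml version="1.0"?></code>) and may also specify the character encoding, for example <code><?xml version="1.0" encoding="utf-8"?></code> for Unicode.</p>
<p>The overall structure of this XML document is shown below:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/TREE.gif" alt="Tree structure"></p>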
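<h2 id="python_u89E3_u6790xml"><a href="#python_u89E3_u6790xml" class="headerlink" title="Parsing XML with Python"></a>Parsing XML with Python</h2><p>Python offers three common ways to parse XML: SAX, DOM, and ElementTree. This post uses DOM; the others are left as follow-up material for a later post.</p>
<p>Without further ado, here is the code.</p>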
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> xml.dom.minidom</span><br><span class="line"></span><br><span class="line">xml_src = <span class="string">"/Users/qiguanjie/Documents/bookstore.xml"</span></span><br><span class="line"></span><br><span class="line">xmldoc = xml.dom.minidom.parse(xml_src)</span><br><span class="line"></span><br><span class="line"><span class="comment"># get root</span></span><br><span class="line">bookstore = xmldoc.documentElement</span><br><span class="line"></span><br><span class="line">books = bookstore.getElementsByTagName(<span class="string">"book"</span>)</span><br><span class="line">print(<span class="string">"There are %d books!"</span> % (len(books)))</span><br><span class="line"></span><br><span class="line">i = <span class="number">1</span></span><br><span class="line"><span 
class="keyword">for</span> book <span class="keyword">in</span> books:</span><br><span class="line"> print(<span class="string">"Book %d"</span> % (i))</span><br><span class="line"> <span class="keyword">if</span> book.hasAttribute(<span class="string">"category"</span>):</span><br><span class="line"> print(<span class="string">"Category: "</span> + book.getAttribute(<span class="string">"category"</span>))</span><br><span class="line"></span><br><span class="line"> title = book.getElementsByTagName(<span class="string">"title"</span>)[<span class="number">0</span>]</span><br><span class="line"> print(<span class="string">"Title: "</span>+title.childNodes[<span class="number">0</span>].data)</span><br><span class="line"></span><br><span class="line"> print(<span class="string">"Language: "</span>+title.getAttribute(<span class="string">"lang"</span>))</span><br><span class="line"></span><br><span class="line"> author = book.getElementsByTagName(<span class="string">"author"</span>)[<span class="number">0</span>]</span><br><span class="line"> print(<span class="string">"Author: "</span>+author.childNodes[<span class="number">0</span>].data)</span><br><span class="line"></span><br><span class="line"> year = book.getElementsByTagName(<span class="string">"year"</span>)[<span class="number">0</span>]</span><br><span class="line"> print(<span class="string">"Year: "</span>+year.childNodes[<span class="number">0</span>].data)</span><br><span class="line"></span><br><span class="line"> price = book.getElementsByTagName(<span class="string">"price"</span>)[<span class="number">0</span>]</span><br><span class="line"> print(<span class="string">"Price: "</span>+price.childNodes[<span class="number">0</span>].data)</span><br><span class="line"></span><br><span class="line"> i+=<span class="number">1</span></span><br></pre></td></tr></table></figure>
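<p>The logic is simple. The usual flow is to get the root, then its child nodes, then their attributes, and so on. To reach the data or attributes of a child's own children, treat that child as the root and recurse. This should feel familiar: it is a tree, and tree structures show up in many variations throughout programming, so understanding one carries over to the rest.</p>
<p>Output:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/parse_bookstore.png" alt="Output"></p>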
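<p>The "treat each child as a new root" idea can be sketched independently of any XML library. The example below is written in C++ with a hypothetical <code>Node</code> struct standing in for an XML element; it is an illustration of the recursion only, not part of <code>xml.dom.minidom</code> or Boost.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory node, standing in for an XML element.
struct Node {
    std::string name;            // element name, e.g. "title"
    std::string text;            // text content, empty for container elements
    std::vector<Node> children;  // child elements in document order
};

// Collects the text of every node called `wanted`, at any depth:
// handle the current node, then treat each child as a new root.
void collect_text(const Node &root, const std::string &wanted,
                  std::vector<std::string> &out) {
    assert(!wanted.empty());  // an element name is never empty
    if (root.name == wanted) {
        out.push_back(root.text);
    }
    for (std::size_t i = 0; i < root.children.size(); i++) {
        collect_text(root.children[i], wanted, out);
    }
}
```

<p>Applied to a tree shaped like the bookstore document, <code>collect_text(store, "title", titles)</code> would fill <code>titles</code> with the book titles in document order.</p>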
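<h2 id="boost_u89E3_u6790xml"><a href="#boost_u89E3_u6790xml" class="headerlink" title="Parsing XML with Boost"></a>Parsing XML with Boost</h2><p>In practice I have run into configuration stored in XML files while the program itself was written in C/C++, where parsing is not as convenient as in Python. One lightweight option is tinyxml, a small C++ parser that does not depend on any third-party library. Another is Boost, which of course requires installing the large Boost library as a dependency. Choose whichever tool fits your needs.</p>
<p>Boost parses XML through property_tree, a powerful tool that is commonly used for reading configuration files and similar data.</p>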
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><boost/property_tree/ptree.hpp></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><boost/property_tree/xml_parser.hpp></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span 
class="string"><boost/foreach.hpp></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"><span class="keyword">namespace</span> pt = boost::property_tree;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span> </span>{</span><br><span class="line"> <span class="keyword">const</span> <span class="built_in">string</span> file_name = <span class="string">"/Users/qiguanjie/Documents/bookstore.xml"</span>;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Create empty property tree object</span></span><br><span class="line"> pt::ptree tree;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Parse the XML into the property tree.</span></span><br><span class="line"> pt::read_xml(file_name, tree, pt::xml_parser::trim_whitespace);</span><br><span class="line"></span><br><span class="line"> pt::ptree &books = tree.get_child(<span class="string">"bookstore"</span>);</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"There are "</span> << books.size() << <span class="string">" books!"</span> << endl;</span><br><span class="line"> <span class="keyword">int</span> i = <span class="number">1</span>;</span><br><span class="line"> BOOST_FOREACH(<span class="keyword">const</span> pt::ptree::value_type &book, books) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"Book "</span> << i << endl;</span><br><span class="line"> <span class="built_in">string</span> category = book.second.get<<span class="built_in">string</span>>(<span class="string">"<xmlattr>.category"</span>);</span><br><span class="line"> <span class="built_in">cout</span> << <span 
class="string">"Category: "</span> << category << endl;</span><br><span class="line"> </span><br><span class="line"> pt::ptree title = book.second.get_child(<span class="string">"title"</span>);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Title: "</span><<title.data()<<endl;</span><br><span class="line"> <span class="built_in">string</span> language = title.get<<span class="built_in">string</span>>(<span class="string">"<xmlattr>.lang"</span>);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Language: "</span><<language<<endl;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">string</span> author = book.second.get<<span class="built_in">string</span>>(<span class="string">"author"</span>);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Author: "</span><<author<<endl;</span><br><span class="line"></span><br><span class="line"> pt::ptree year = book.second.get_child(<span class="string">"year"</span>);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Year: "</span><<year.data()<<endl;</span><br><span class="line"></span><br><span class="line"> <span class="built_in">string</span> price = book.second.get<<span class="built_in">string</span>>(<span class="string">"price"</span>);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Price: "</span><<price<<endl;</span><br><span class="line"> i++;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>CMakeLists.txt for the build:</p>
<figure class="highlight stylus"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="title">cmake_minimum_required</span><span class="params">(VERSION <span class="number">3.3</span>)</span></span></span><br><span class="line"><span class="function"><span class="title">project</span><span class="params">(boost_xml)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">set</span><span class="params">(CMAKE_CXX_FLAGS <span class="string">"${CMAKE_CXX_FLAGS} -std=c++11"</span>)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">set</span><span class="params">(SOURCE_FILES main.cpp)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">INCLUDE_DIRECTORIES</span><span class="params">(/usr/local/include)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">LINK_DIRECTORIES</span><span class="params">(/usr/local/lib)</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">add_executable</span><span class="params">(boost_xml ${SOURCE_FILES})</span></span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="title">TARGET_LINK_LIBRARIES</span><span 
class="params">(boost_xml)</span></span></span><br></pre></td></tr></table></figure>
<p>Run <code>cmake .</code> and then <code>make</code>.<br>Output:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/boost_xml_c++.png" alt="Output of the C++ version"></p>
]]></content>
<summary type="html">
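<p>XML (eXtensible Markup Language) is a markup language. Its syntax resembles HTML, but where HTML describes how data is displayed, XML describes the data itself so that it can be exchanged between systems over the network. Put simply, XML is a data format for transmitting data. It can also be used to store data for other systems to consume, although I prefer JSON for both storing and transmitting data.</p>
<p>This post shows how to parse XML files with Python and with the C++ Boost library.</p>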
</summary>
<category term="boost" scheme="http://melonqi.cn/tags/boost/"/>
<category term="python" scheme="http://melonqi.cn/tags/python/"/>
<category term="xml" scheme="http://melonqi.cn/tags/xml/"/>
</entry>
<entry>
<title>Command-Line Argument Parsing in C++ and Python</title>
<link href="http://melonqi.cn/2016/06/23/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0%E8%A7%A3%E6%9E%90C-%E5%92%8CPython%E7%89%88%E6%9C%AC/"/>
<id>http://melonqi.cn/2016/06/23/命令行参数解析C-和Python版本/</id>
<published>2016-06-23T15:51:25.000Z</published>
<updated>2016-06-24T15:26:30.000Z</updated>
<content type="html"><![CDATA[<p>Programs almost always need to parse command-line arguments, especially on Linux. This post shows how to do it in C++ and in Python; the C++ version uses the Boost library.</p>
<a id="more"></a>
<h2 id="C++_u7248_u672C_u547D_u4EE4_u884C_u53C2_u6570_u89E3_u6790"><a href="#C++_u7248_u672C_u547D_u4EE4_u884C_u53C2_u6570_u89E3_u6790" class="headerlink" title="Parsing command-line arguments in C++"></a>Parsing command-line arguments in C++</h2><p>main.cpp<br>Installing Boost is not covered here; instructions are easy to find online.</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><boost/program_options.hpp></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span 
class="string"><string></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><vector></span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">namespace</span> po = boost::program_options;</span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span> </span>{</span><br><span class="line"> <span class="keyword">bool</span> cmdline_error = <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> po::<span class="function">options_description <span class="title">options</span><span class="params">(<span class="string">"Options"</span>)</span></span>;</span><br><span class="line"> options.add_options()(<span class="string">"help,h"</span>, <span class="string">"Use --help or -h to list all arguments"</span>)</span><br><span class="line"> (<span class="string">"file,f"</span>, po::value<<span class="built_in">vector</span><<span class="built_in">string</span>> >()->composing(), <span class="string">"Get the md5 of a file/files"</span>)</span><br><span class="line"> (<span class="string">"config,c"</span>, po::value<<span class="built_in">string</span>>()->default_value(<span class="string">"/etc/main.conf"</span>), <span class="string">"Configuration"</span>);</span><br><span class="line"></span><br><span class="line"> po::variables_map vm;</span><br><span class="line"> <span class="keyword">try</span> {</span><br><span class="line"></span><br><span class="line"> po::store(po::parse_command_line(argc, argv, options), vm);</span><br><span class="line"> po::notify(vm);</span><br><span 
class="line"> }</span><br><span class="line"> <span class="keyword">catch</span> (po::error &e) {</span><br><span class="line"> cmdline_error = <span class="literal">true</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (vm.count(<span class="string">"help"</span>) || cmdline_error) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"Usage:"</span> << argv[<span class="number">0</span>] << <span class="string">" [OPTIONS]\n"</span> << endl;</span><br><span class="line"> <span class="built_in">cout</span> << options << endl;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (vm.count(<span class="string">"file"</span>)) {</span><br><span class="line"> <span class="built_in">vector</span><<span class="built_in">string</span>> files = vm[<span class="string">"file"</span>].as<<span class="built_in">vector</span><<span class="built_in">string</span>> >();</span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < files.size(); i++) {</span><br><span class="line"> <span class="built_in">cout</span> << <span class="string">"File "</span> << i + <span class="number">1</span> << <span class="string">" "</span> << files[i] << endl;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span>(vm.count(<span class="string">"config"</span>))</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">string</span> config_file = vm[<span class="string">"config"</span>].as<<span class="built_in">string</span>>();</span><br><span class="line"> <span class="built_in">cout</span><< <span 
class="string">"Config File: "</span><<config_file<<endl;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Compile with <code>g++ -o main main.cpp -I/usr/local/include -L/usr/local/lib -lboost_program_options</code></p>
<p>Output:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/boost_program_options1.png" alt="help"></p>
<p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/boost_program_options2.png" alt="all"></p>
<p>For more advanced usage, such as reading options from a configuration file or grouping options, see the <a href="http://www.boost.org/doc/libs/1_61_0/doc/html/program_options/tutorial.html#idp308779504" target="_blank" rel="external">official Boost.Program_options documentation</a>.</p>
<h2 id="python_u7684argparse"><a href="#python_u7684argparse" class="headerlink" title="Python's argparse"></a>Python's argparse</h2>]]></content>
<summary type="html">
<p>Programs almost always need to parse command-line arguments, especially on Linux. This post shows how to do it in C++ and in Python; the C++ version uses the Boost library.</p>
</summary>
<category term="C++" scheme="http://melonqi.cn/tags/C/"/>
<category term="Python" scheme="http://melonqi.cn/tags/Python/"/>
<category term="命令行解析" scheme="http://melonqi.cn/tags/%E5%91%BD%E4%BB%A4%E8%A1%8C%E8%A7%A3%E6%9E%90/"/>
</entry>
<entry>
<title>SplitString.cpp</title>
<link href="http://melonqi.cn/2016/06/23/SplitString/"/>
<id>http://melonqi.cn/2016/06/23/SplitString/</id>
<published>2016-06-23T12:24:22.000Z</published>
<updated>2016-06-23T12:25:43.000Z</updated>
<content type="html"><![CDATA[<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">SplitString</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& s, <span class="built_in">std</span>::<span class="built_in">vector</span><<span class="built_in">std</span>::<span class="built_in">string</span>>& v, <span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& c)</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span>::size_type pos1, pos2;</span><br><span class="line"> pos2 = s.find(c);</span><br><span class="line"> pos1 = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">while</span>(<span class="built_in">std</span>::<span class="built_in">string</span>::npos != pos2)</span><br><span class="line"> {</span><br><span class="line"> v.push_back(s.substr(pos1, pos2-pos1));</span><br><span class="line"> </span><br><span class="line"> pos1 = pos2 + c.size();</span><br><span class="line"> pos2 = s.find(c, pos1);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">if</span>(pos1 != s.length())</span><br><span class="line"> v.push_back(s.substr(pos1));</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure>
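<p>The function above appends fields to <code>v</code> and has two edge cases worth knowing: a trailing delimiter does not produce a trailing empty field, and an empty delimiter would loop forever because <code>s.find("")</code> always matches at the current position. The following is a minimal sketch of a variant that makes both explicit. The name <code>split_fields</code> and the return-by-value interface are illustrative assumptions, not part of the original snippet.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of SplitString that returns the fields instead of
// appending to an output parameter.
std::vector<std::string> split_fields(const std::string& s, const std::string& delim)
{
    // An empty delimiter is rejected: s.find("") always succeeds at the
    // current position, so the loop below would never advance.
    assert(!delim.empty());

    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type end = s.find(delim);
    while (end != std::string::npos) {
        fields.push_back(s.substr(start, end - start));  // may be an empty field
        start = end + delim.size();
        end = s.find(delim, start);
    }
    // Same rule as SplitString: text after the last delimiter is kept,
    // but a trailing delimiter does not add an empty field.
    if (start != s.length())
        fields.push_back(s.substr(start));
    return fields;
}
```

<p>For example, <code>split_fields("a,b,,c", ",")</code> yields the four fields <code>a</code>, <code>b</code>, an empty string, and <code>c</code>, while <code>split_fields("a,b,", ",")</code> yields only <code>a</code> and <code>b</code>.</p>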
]]></content>
<summary type="html">
</summary>
<category term="C++" scheme="http://melonqi.cn/tags/C/"/>
</entry>
<entry>
<title>Lessons from Proxying with nginx</title>
<link href="http://melonqi.cn/2016/04/23/nginx/"/>
<id>http://melonqi.cn/2016/04/23/nginx/</id>
<published>2016-04-23T09:12:43.000Z</published>
<updated>2016-04-23T15:10:22.000Z</updated>
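<content type="html"><![CDATA[<p>I recently set up a personal blog with hexo, started so that it listens on localhost. Other web services run on the same machine, so I put nginx in front as a reverse proxy: everything is served on port 80, and requests are routed to a service by URI. With this setup, the hexo blog failed to load its CSS and other static resources when proxied.</p>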
<a id="more"></a>
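<p>Assume hexo is started with the command hexo server -i localhost and listens on its default port, 4000. A second web service, built with django, listens on port 8080. Assume the domain name is www.melonqi.com.</p>
<p>The overall architecture is shown below:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/nginx.png" alt="Overall architecture"></p>
<h2 id="u95EE_u9898_u590D_u73B0"><a href="#u95EE_u9898_u590D_u73B0" class="headerlink" title="Reproducing the problem"></a>Reproducing the problem</h2><p>In hexo's _config.yml, url and root are set to<br><code>url: http://www.melonqi.com</code><br><code>root: /</code></p>
<p>The nginx configuration, which mainly consists of the server block:</p>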
<figure class="highlight crmsh"><table><tr><td class="code"><pre><span class="line">server {</span><br><span class="line">    listen <span class="number">80</span>;</span><br><span class="line">    server_name www.melonqi.com;</span><br><span class="line">    <span class="keyword">location</span> <span class="title">/tools</span> {</span><br><span class="line">        proxy_pass http://<span class="number">127.0</span>.<span class="number">0.1</span>:<span class="number">8080</span>;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">location</span> <span class="title">/blog</span> {</span><br><span class="line">        proxy_pass http://localhost:<span class="number">4000</span>/;</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
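<p>Note the trailing slash: it is <code>http://localhost:4000/</code>, not <code>http://localhost:4000</code>. Because proxy_pass now has a URI part, nginx replaces the part of the request URI that matched the location with that URI.</p>
<p>Alternatively, rewrite the URI with a rewrite directive:</p>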
<figure class="highlight xquery"><table><tr><td class="code"><pre><span class="line">server {</span><br><span class="line">    listen <span class="number">80</span>;</span><br><span class="line">    server_name www.melonqi.com;</span><br><span class="line">    location /tools {</span><br><span class="line">        proxy_pass http://<span class="number">127.0</span>.<span class="number">0</span>.<span class="number">1</span>:<span class="number">8080</span>;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    location /blog {</span><br><span class="line">        rewrite /blog(.*) /<span class="variable">$1</span> break;</span><br><span class="line">        proxy_pass http://localhost:<span class="number">4000</span>;</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
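<p>nginx now forwards requests to different services depending on the URI. For example, a request for www.melonqi.com/blog/index.html is forwarded to hexo as localhost:4000/index.html.</p>
<p>hexo can handle such a request using its own url and root, and the response goes back through nginx. The page that reaches the browser is unstyled, however: the CSS and other resources fail to load.</p>
<p>Viewing the page source in Chrome shows that the stylesheet links look like css/a.css, so the browser requests www.melonqi.com/css/a.css. Only hexo has that file, but the URI matches no location in nginx, so the request is never forwarded to hexo and the resource fails to load.</p>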
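<p>The routing failure described above can be sketched as a small model of prefix matching. This is a simplified illustration, not nginx's actual implementation: it only handles plain prefix locations, and the names <code>Location</code> and <code>pick_upstream</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of nginx prefix locations: the longest prefix that
// matches the request URI wins. Regex locations and modifiers are ignored.
struct Location {
    std::string prefix;    // e.g. "/blog"
    std::string upstream;  // e.g. "http://localhost:4000"
};

// Returns the upstream of the longest matching prefix, or an empty string
// when no location matches the URI.
std::string pick_upstream(const std::vector<Location>& locations, const std::string& uri)
{
    assert(!uri.empty() && uri[0] == '/');  // request URIs start with '/'
    const Location* best = nullptr;
    for (const Location& loc : locations) {
        const bool matches = uri.compare(0, loc.prefix.size(), loc.prefix) == 0;
        if (matches && (best == nullptr || loc.prefix.size() > best->prefix.size()))
            best = &loc;
    }
    return best ? best->upstream : std::string();
}
```

<p>With the two locations <code>/tools</code> and <code>/blog</code>, <code>/blog/index.html</code> resolves to the hexo upstream, while <code>/css/a.css</code> resolves to nothing, which is why the stylesheet request never reaches hexo.</p>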
<h2 id="u89E3_u51B3_u529E_u6CD5"><a href="#u89E3_u51B3_u529E_u6CD5" class="headerlink" title="Solution"></a>Solution</h2><p>Make hexo emit every resource link with the blog prefix, so that nginx can route those requests to hexo.<br>Change hexo's _config.yml to:<br><code>url: http://www.melonqi.com/blog</code><br><code>root: /blog/</code></p>
<p>Change the nginx configuration to:</p>
<figure class="highlight crmsh"><table><tr><td class="code"><pre><span class="line">server {</span><br><span class="line">    listen <span class="number">80</span>;</span><br><span class="line">    server_name www.melonqi.com;</span><br><span class="line">    <span class="keyword">location</span> <span class="title">/tools</span> {</span><br><span class="line">        proxy_pass http://<span class="number">127.0</span>.<span class="number">0.1</span>:<span class="number">8080</span>;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    <span class="keyword">location</span> <span class="title">/blog</span> {</span><br><span class="line">        proxy_pass http://localhost:<span class="number">4000</span>;</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
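<p>Note that this time it is <code>proxy_pass http://localhost:4000</code>, without the trailing slash, not <code>proxy_pass http://localhost:4000/</code>. Without a URI part, nginx passes the request URI to hexo unchanged.</p>
<p>When nginx receives a request for <code>www.melonqi.com/blog/index.html</code>, it forwards it to hexo as <code>localhost:4000/blog/index.html</code>. Because url and root are set in hexo, the resource links hexo returns look like <code>blog/css/a.css</code>, so the browser requests <code>www.melonqi.com/blog/css/a.css</code>. That URI matches the /blog location, hexo receives the request and returns the resource, and the page renders correctly. Problem solved.</p>
<h2 id="u5FC3_u5F97"><a href="#u5FC3_u5F97" class="headerlink" title="Takeaway"></a>Takeaway</h2><p>It is best to let nginx forward requests as they are, changing only the host, and to make sure the backend web service can handle the request in that form. As far as possible, avoid rules that turn <code>www.melonqi.com/blog/index.html</code> into <code>localhost:4000/index.html</code>; otherwise requests for resource files are likely to break.</p>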
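<p>The difference between the two proxy_pass forms can be sketched as a small function. This is a simplified model of the documented behavior for prefix locations, not nginx code; the function name <code>upstream_uri</code> and its parameters are hypothetical.</p>

```cpp
#include <cassert>
#include <string>

// Simplified model of how the upstream request URI is built for a prefix
// location. proxy_uri is the URI part of proxy_pass: "" for
// "http://localhost:4000" and "/" for "http://localhost:4000/".
std::string upstream_uri(const std::string& location_prefix,
                         const std::string& proxy_uri,
                         const std::string& request_uri)
{
    // The request must have matched the location in the first place.
    assert(request_uri.compare(0, location_prefix.size(), location_prefix) == 0);

    if (proxy_uri.empty())
        return request_uri;  // no URI part: the request URI is forwarded unchanged

    // URI part present: the matched prefix is replaced by it.
    return proxy_uri + request_uri.substr(location_prefix.size());
}
```

<p>In this model, <code>upstream_uri("/blog", "", "/blog/css/a.css")</code> keeps the full path <code>/blog/css/a.css</code>, which is what the final configuration relies on. With a URI part, <code>upstream_uri("/blog/", "/", "/blog/index.html")</code> gives <code>/index.html</code>; note that the literal replacement for the prefix <code>/blog</code> (no trailing slash) gives <code>//index.html</code>, which is one reason a location and its proxy_pass URI are often written with matching trailing slashes.</p>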
]]></content>
<summary type="html">
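<p>I recently set up a personal blog with hexo, started so that it listens on localhost. Other web services run on the same machine, so I put nginx in front as a reverse proxy: everything is served on port 80, and requests are routed to a service by URI. With this setup, the hexo blog failed to load its CSS and other static resources when proxied.</p>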
</summary>
<category term="nginx" scheme="http://melonqi.cn/tags/nginx/"/>
</entry>
<entry>
<title>Using syslog</title>
<link href="http://melonqi.cn/2016/03/18/syslog/"/>
<id>http://melonqi.cn/2016/03/18/syslog/</id>
<published>2016-03-18T02:05:03.000Z</published>
<updated>2016-03-18T07:24:13.000Z</updated>
<content type="html"><![CDATA[<h2 id="syslog_u662F_u4EC0_u4E48"><a href="#syslog_u662F_u4EC0_u4E48" class="headerlink" title="What is syslog"></a>What is syslog</h2><p>Put simply, syslog is a logging facility: programs write log messages through it so that their running state can be monitored.<br><a id="more"></a></p>
<h2 id="u672C_u5730_u751F_u6210_u65E5_u5FD7"><a href="#u672C_u5730_u751F_u6210_u65E5_u5FD7" class="headerlink" title="Writing logs locally"></a>Writing logs locally</h2><p>This section assumes that syslog or rsyslog is already installed and that you are familiar with log facilities and severity levels. If not, see <a href="http://linux.vbird.org/linux_basic/0570syslog.php" target="_blank" rel="external">鸟哥的Linux私房菜 (VBird's Linux guide)</a>.</p>
<p>A C++ program I wrote earlier computes daily traffic statistics and writes the results to the local log through syslog. It is fairly simple.</p>
<h3 id="u6838_u5FC3_u4EE3_u7801_u793A_u4F8B"><a href="#u6838_u5FC3_u4EE3_u7801_u793A_u4F8B" class="headerlink" title="Core code example"></a>Core code example</h3><p>Header file my_log.hpp</p>
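<p>As background for the facility and level mentioned above: a syslog priority is a single integer that combines a facility (which subsystem is logging, such as <code>LOG_LOCAL5</code>) with a severity level (such as <code>LOG_NOTICE</code>). The sketch below illustrates that combination with the masks from <code>syslog.h</code>; the helper names <code>make_priority</code> and <code>split_priority</code> are hypothetical and only serve the illustration.</p>

```cpp
#include <cassert>
#include <syslog.h>

// The two parts of a syslog priority.
struct Priority {
    int facility;  // e.g. LOG_LOCAL5
    int level;     // e.g. LOG_NOTICE
};

// Combines a facility and a level into one priority value, the form that
// syslog() accepts as its first argument.
int make_priority(int facility, int level)
{
    assert((level & ~LOG_PRIMASK) == 0);  // the level must fit in the low bits
    return facility | level;
}

// Splits a combined priority back into facility and level.
Priority split_priority(int priority)
{
    Priority p;
    p.facility = priority & LOG_FACMASK;
    p.level = priority & LOG_PRIMASK;
    return p;
}
```

<p>When only a level is passed to <code>syslog()</code>, the facility given to <code>openlog()</code> is used as the default, so a program can set the facility once at startup and pass only levels afterwards.</p>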
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">ifndef</span> MY_LOG_HPP_</span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">define</span> MY_LOG_HPP_</span></span><br><span class="line"></span><br><span class="line"><span class="preprocessor">#<span class="keyword">pragma</span> once</span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><syslog.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">my_openlog</span><span class="params">(<span class="keyword">const</span> <span class="keyword">char</span> *program, <span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& facility_arg)</span></span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="keyword">void</span> <span class="title">my_closelog</span><span class="params">(<span class="keyword">void</span>)</span></span><br><span class="line"></span>{</span><br><span class="line"> closelog();</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span 
class="function"><span class="keyword">void</span> <span class="title">my_log_push</span><span class="params">(<span class="built_in">std</span>::<span class="built_in">string</span>& content)</span></span>;</span><br><span class="line"><span class="preprocessor">#<span class="keyword">endif</span></span></span><br></pre></td></tr></table></figure>
<p>my_log.cpp</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string">"my_log.hpp"</span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><ctime></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><map></span></span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Open the log with the named facility; unknown names fall back to LOG_USER.</span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">my_openlog</span><span class="params">(<span class="keyword">const</span> <span class="keyword">char</span> *program, <span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span>& facility_arg)</span></span></span><br><span class="line">{</span><br><span class="line">    <span class="keyword">int</span> facility = LOG_USER;</span><br><span class="line">    <span class="built_in">std</span>::<span class="built_in">map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="keyword">int</span>> facilities;</span><br><span class="line"></span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL1"</span>] = LOG_LOCAL1;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL2"</span>] = LOG_LOCAL2;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL3"</span>] = LOG_LOCAL3;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL4"</span>] = LOG_LOCAL4;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL5"</span>] = LOG_LOCAL5;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL6"</span>] = LOG_LOCAL6;</span><br><span class="line">    facilities[<span class="string">"LOG_LOCAL7"</span>] = LOG_LOCAL7;</span><br><span class="line"></span><br><span class="line">    <span class="built_in">std</span>::<span class="built_in">map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="keyword">int</span>>::iterator it = facilities.find(facility_arg);</span><br><span class="line">    <span class="keyword">if</span> (it != facilities.end())</span><br><span class="line">        facility = it->second;</span><br><span class="line">    openlog(program, LOG_CONS | LOG_ODELAY, facility);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// Write one NOTICE-level record in the form identity|timestamp|content.</span></span><br><span class="line"><span class="function"><span class="keyword">void</span> <span class="title">my_log_push</span><span class="params">(<span class="built_in">std</span>::<span class="built_in">string</span>& content)</span></span></span><br><span class="line">{</span><br><span class="line">    <span class="keyword">time_t</span> now;</span><br><span class="line">    <span class="keyword">struct</span> tm tm_now;</span><br><span class="line">    <span class="keyword">char</span> timestamp[<span class="number">32</span>];</span><br><span class="line">    now = time(<span class="literal">NULL</span>);</span><br><span class="line">    localtime_r(&now, &tm_now);</span><br><span class="line">    strftime(timestamp, <span class="keyword">sizeof</span> timestamp, <span class="string">"%Y-%m-%d %H:%M:%S"</span>, &tm_now);</span><br><span class="line">    <span class="keyword">int</span> log_type = LOG_NOTICE;</span><br><span class="line">    <span class="keyword">const</span> <span class="keyword">char</span> *identity = <span class="string">"test"</span>;</span><br><span class="line">    <span class="comment">// The original call dropped content; pass it as an argument, never as the format string.</span></span><br><span class="line">    syslog(log_type, <span class="string">"%s|%s|%s"</span>, identity, timestamp, content.c_str());</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>main.cpp</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string">"my_log.hpp"</span></span></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">(<span class="keyword">int</span> argc, <span class="keyword">char</span> *argv[])</span></span><br><span class="line"></span>{</span><br><span class="line"> my_openlog(argv[<span class="number">0</span>], <span class="string">"LOG_LOCAL5"</span>);</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span> content = <span class="string">"hello"</span>;</span><br><span class="line"> my_log_push(content);</span><br><span class="line"> my_closelog();</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Makefile</p>
<figure class="highlight mel"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">CC = g++</span><br><span class="line">RM = /bin/rm</span><br><span class="line">ECHO = /bin/echo</span><br><span class="line">MAIN_EXE = mylog</span><br><span class="line">EXECUTABLES = <span class="variable">$(</span>MAIN_EXE)</span><br><span class="line">SOURCES = <span class="variable">$(</span>wildcard <span class="variable">*.</span>cpp)</span><br><span class="line">OBJECTS = <span class="variable">$(</span>patsubst <span class="variable">%.</span>cpp, <span class="variable">%.</span>o, <span class="variable">$(</span>SOURCES))</span><br><span class="line"></span><br><span class="line">.PHONY: all clean</span><br><span class="line"></span><br><span class="line">all: <span class="variable">$(</span>EXECUTABLES)</span><br><span class="line"></span><br><span class="line"><span class="variable">$(</span>MAIN_EXE): <span class="variable">$(</span>OBJECTS)</span><br><span class="line"> <span class="variable">$(</span>CC) -o <span class="variable">$@</span> <span class="variable">$^</span> </span><br><span class="line"></span><br><span class="line"><span class="variable">%.</span>o: <span class="variable">%.</span>cpp</span><br><span class="line"> <span class="variable">$(</span>CC) -o <span class="variable">$@</span> -c <span 
class="variable">$<</span> </span><br><span class="line"></span><br><span class="line">clean:</span><br><span class="line"> - <span class="variable">$(</span>RM) -rf <span class="variable">$(</span>OBJECTS) <span class="variable">$(</span>MAIN_EXE)</span><br></pre></td></tr></table></figure>
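<p>Build with <code>make</code>, run <code>./mylog</code>, and check <code>/var/log/messages</code> (assuming the syslog daemon routes this facility there) to see the log entry that was just written.</p>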
]]></content>
<summary type="html">
<h2 id="syslog_u662F_u4EC0_u4E48"><a href="#syslog_u662F_u4EC0_u4E48" class="headerlink" title="syslog是什么"></a>What is syslog</h2><p>Put simply, syslog handles logging and is used to monitor the running state of a program.<br>
</summary>
<category term="syslog,C++" scheme="http://melonqi.cn/tags/syslog%EF%BC%8CC/"/>
</entry>
<entry>
<title>Parsing HTTP Messages with a Finite State Machine</title>
<link href="http://melonqi.cn/2016/01/20/Http-Parser/"/>
<id>http://melonqi.cn/2016/01/20/Http-Parser/</id>
<published>2016-01-20T13:53:08.000Z</published>
<updated>2016-01-21T06:19:34.000Z</updated>
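<content type="html"><![CDATA[<p>Both back-end and client code have to parse HTTP messages. It is essentially string processing, but parsing by repeated substring searches scans the same bytes several times and is much slower, which is unacceptable for performance-minded C++ code. A finite state machine, which needs only a single pass over the input, is used instead.<br><a id="more"></a><br>HTTP request format:</p>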
<blockquote>
<p><code><request-line></code><br><code><headers></code><br><code><blank line></code><br><code>[<request-body>]</code></p>
</blockquote>
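<p>Notes: the first line must be a request line (request-line), which states the request method, the resource being accessed, and the HTTP version in use. It is followed by the header section, which carries additional information for the server. After that comes a blank line, and after the blank line an optional body of arbitrary data may follow.<br>Example:</p>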
<blockquote>
<p><code>GET / HTTP/1.1</code><br><code>Accept: */*</code><br><code>Accept-Language: zh-cn</code><br><code>Accept-Encoding: gzip, deflate</code><br><code>User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)</code><br><code>Host: www.google.cn</code><br><code>Connection: Keep-Alive</code></p>
</blockquote>
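<p>Notes: the first part of the request line says that this is a GET request. The second part is a slash (/), meaning the request is for the root of the site. The last part says that HTTP version 1.1 is used (the other option is 1.0).</p>
<p>The lines after the request line are the headers. Host names the destination of the request. User-Agent is accessible to both server-side and client-side scripts and is the basis of browser-detection logic; the browser defines this value and sends it automatically with every request. Connection is usually set to Keep-Alive by the browser.</p>
<p>The third part is the blank line, which is required even when there is no request body.</p>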
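<p>The layout described above (request line, headers, blank line, optional body) can be illustrated with a small sketch. The names <code>RequestParts</code> and <code>split_request</code> are hypothetical and are not part of the original code. The sketch uses substring search only to show where each part begins and ends; it is not the single-pass technique developed in the rest of this post.</p>

```cpp
#include <string>

// Hypothetical helper type: the three parts of a raw HTTP request.
struct RequestParts {
    std::string request_line;  // e.g. "GET / HTTP/1.1"
    std::string headers;       // header lines separated by CRLF, without the blank line
    std::string body;          // everything after the blank line (may be empty)
};

// Split a raw request at the first CRLF and at the blank line (CRLF CRLF).
// Returns false if either marker is missing, i.e. the request is incomplete.
bool split_request(const std::string& raw, RequestParts& out)
{
    const std::string::size_type line_end = raw.find("\r\n");
    const std::string::size_type head_end = raw.find("\r\n\r\n");
    if (line_end == std::string::npos || head_end == std::string::npos)
        return false;

    out.request_line = raw.substr(0, line_end);
    if (head_end > line_end)
        out.headers = raw.substr(line_end + 2, head_end - line_end - 2);
    else
        out.headers.clear();   // the blank line directly follows the request line
    out.body = raw.substr(head_end + 4);
    return true;
}
```

<p>Called on the example request above (with each line terminated by <code>\r\n</code> and a final blank line), <code>split_request</code> would put <code>GET / HTTP/1.1</code> in <code>request_line</code> and leave <code>body</code> empty.</p>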
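<h2 id="u6709_u9650_u72B6_u6001_u673A"><a href="#u6709_u9650_u72B6_u6001_u673A" class="headerlink" title="有限状态机"></a>Finite state machine</h2><p>A finite state machine (FSM), or state machine for short, is a mathematical model of a finite number of states together with the transitions between them and the associated actions. A state stores information about the past: it reflects the input changes from system start up to the present moment. A transition indicates a state change and is described by the condition that must hold for it to occur. An action describes an activity to be performed at a given moment.</p>
<p>There are several types of action:</p>
<ul>
<li>Entry action: performed when entering a state;</li>
<li>Exit action: performed when leaving a state;</li>
<li>Input action: performed depending on the current state and the input condition;</li>
<li>Transition action: performed on a specific transition.</li>
</ul>
<p>The two most important concepts of a state machine are states and transitions. A state is a condition the program is in, for example a TCP endpoint waiting to accept connections. In a given state, when a particular condition occurs, the program moves to another state.<br>A familiar everyday example: a typical day runs through sleeping, getting up, eating, working, eating, working, eating, relaxing, and sleeping again. Sleeping, eating, and so on are the states we are in, and a transition between states requires some condition to occur; going from sleeping to getting up, for instance, requires 8:00 to arrive. Actions are produced as well, but they are ignored here.<br>In the diagrams, an ellipse represents a state, an arrow represents a transition, and the text above an arrow is its condition.<br>The day as a state transition diagram:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/wholeday.png" alt="一天的流程"><br>The most commonly seen state machine is the TCP state machine.<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/TCP_AC.png" alt="TCP状态机"></p>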
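<p>The everyday example can be written down as a tiny state machine, which may make the roles of states and conditions more concrete. This is only a sketch: the enum <code>DayState</code> and the function <code>next_state</code> are hypothetical names, and apart from the 8:00 condition mentioned in the text, the event names are assumptions made for illustration.</p>

```cpp
#include <string>

// States from the "one day" example.
enum DayState { SLEEPING, GETTING_UP, EATING, WORKING, RELAXING };

// Transition function: given the current state and an event (the condition
// written on an arrow), return the next state. Events that do not match any
// transition leave the state unchanged.
DayState next_state(DayState current, const std::string& event)
{
    switch (current) {
    case SLEEPING:
        if (event == "08:00") return GETTING_UP;     // condition given in the text
        break;
    case GETTING_UP:
        if (event == "meal_time") return EATING;     // assumed event name
        break;
    case EATING:
        if (event == "work_time") return WORKING;    // assumed event name
        if (event == "free_time") return RELAXING;   // assumed event name
        break;
    case WORKING:
        if (event == "meal_time") return EATING;     // assumed event name
        break;
    case RELAXING:
        if (event == "bed_time") return SLEEPING;    // assumed event name
        break;
    }
    return current;  // no matching transition: stay in the same state
}
```

<p>The parser later in this post has the same shape: a <code>switch</code> on the current state, and inside each case a decision based on the input, which there is the next character rather than a named event.</p>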
<h2 id="Parser_u7C7B"><a href="#Parser_u7C7B" class="headerlink" title="Parser类"></a>The Parser class</h2><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> Parser</span><br><span class="line">{</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> Parser(<span class="keyword">char</span> *data,<span class="keyword">int</span> len):</span><br><span class="line"> length(len), version(<span class="number">0</span>), packet(data), pos(data){</span><br><span class="line"> parse_packet();</span><br><span class="line"> }</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">vector</span><<span class="built_in">std</span>::<span class="built_in">string</span>> getParameters();<span class="comment">// list all header keys</span></span><br><span class="line"> <span class="built_in">std</span>::<span class="function"><span class="built_in">string</span> <span class="title">getValueByParameter</span><span class="params">(<span class="built_in">std</span>::<span class="built_in">string</span> parameter)</span></span>;<span 
class="comment">// look up the value for a key</span></span><br><span class="line"> <span class="function"><span class="keyword">inline</span> <span class="keyword">int</span> <span class="title">getVersion</span><span class="params">()</span></span>{<span class="keyword">return</span> version;}<span class="comment">// get the HTTP version</span></span><br><span class="line"> <span class="keyword">inline</span> <span class="built_in">std</span>::<span class="function"><span class="built_in">string</span> <span class="title">getMethod</span><span class="params">()</span></span>{<span class="keyword">return</span> method;}<span class="comment">// get the HTTP method</span></span><br><span class="line"> <span class="keyword">inline</span> <span class="built_in">std</span>::<span class="function"><span class="built_in">string</span> <span class="title">getUri</span><span class="params">()</span></span>{<span class="keyword">return</span> uri;}<span class="comment">// get the URI</span></span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">parse_packet</span><span class="params">()</span></span>;<span class="comment">// parse the whole HTTP packet</span></span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">parse_header_line</span><span class="params">()</span></span>;<span class="comment">// parse one header line</span></span><br><span class="line"> <span class="function"><span class="keyword">int</span> <span class="title">parse_request_line</span><span class="params">()</span></span>;<span class="comment">// parse the request line</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line"> <span class="keyword">int</span> length; <span class="comment">// length of the HTTP packet</span></span><br><span class="line"> <span class="keyword">int</span> version; <span class="comment">// HTTP version</span></span><br><span class="line"> 
<span class="keyword">char</span> *packet;<span class="comment">// the HTTP packet data</span></span><br><span class="line"> <span class="keyword">char</span> *pos;<span class="comment">// current parse position</span></span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span> method;<span class="comment">// HTTP method</span></span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span> uri;<span class="comment">// URI</span></span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">map</span><<span class="built_in">std</span>::<span class="built_in">string</span>, <span class="built_in">std</span>::<span class="built_in">string</span>> KVMap;<span class="comment">// key/value pairs from the headers</span></span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<h2 id="u89E3_u6790_u6574_u4F53"><a href="#u89E3_u6790_u6574_u4F53" class="headerlink" title="解析整体"></a>Parsing the whole packet</h2><p>The packet is parsed line by line: first the request line, then the headers, until the blank line is reached.</p>
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">int</span> Parser::parse_packet()</span><br><span class="line">{</span><br><span class="line"> pos = packet;</span><br><span class="line"> <span class="keyword">int</span> ret = parse_request_line();</span><br><span class="line"> <span class="keyword">switch</span>(ret) {</span><br><span class="line"> <span class="keyword">case</span> HTTP_PARSE_INVALID_REQUEST:</span><br><span class="line"> <span class="keyword">case</span> HTTP_PARSE_IGNORED_METHOD:</span><br><span class="line"> <span class="keyword">return</span> HTTP_ERR;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(;;)</span><br><span class="line"> {</span><br><span class="line"> ret = parse_header_line();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (ret == HTTP_PARSE_HEADER_DONE) {</span><br><span 
class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> (ret == HTTP_PARSE_AGAIN || ret == HTTP_PARSE_INVALID_HEADER) {</span><br><span class="line"> <span class="keyword">return</span> HTTP_ERR;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> ret;</span><br><span class="line"></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
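<h2 id="u89E3_u6790headers"><a href="#u89E3_u6790headers" class="headerlink" title="解析headers"></a>Parsing the headers</h2><p>The headers section of an HTTP request consists of multiple lines, each in key-value form: <code>name: value \r\n</code>. Note that the blank line <code>\r\n</code> must also be handled by the state machine.<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/ac.png" alt="有限状态机"><br>The states in detail:</p>
<ol>
<li>start is the initial state. If the character is <code>\r</code>, the line may be the blank line, which means the headers section is almost finished, so the machine enters the header almost done state; for any other character it moves to the name state. Transitions for unexpected characters are left out of the diagram; for example, in the code below a bare <code>\n</code> in the start state is treated as the end of the headers.</li>
<li>The name state means the key is being parsed. Recording the position of the first character on entering the state and of the last character on leaving it yields the name field. If the character received is ':', the name is complete and the machine enters the space before value state; any character other than <code>\r</code> and <code>\n</code> means only part of the key has been read, so the machine stays in the name state.</li>
<li>The space before value state strips extra spaces in front of the value. As long as the character is ' ', the machine stays in this state; any other character (apart from the line terminators covered in item 6) moves it to the value state.</li>
<li>The value state collects the value, so the positions on entering and leaving the state are recorded. A ' ' may mean that the value is complete, so the machine enters space after value.</li>
<li>In the space after value state, a ' ' keeps the machine in this state; any character other than <code>\r</code>, <code>\0</code>, and <code>\n</code> moves it back to the value state, because a value may itself contain spaces.</li>
<li>In the name, space before value, value, and space after value states, receiving <code>\r</code> moves the machine to the almost done state, meaning the line is nearly finished.</li>
<li>In the almost done state the expected character is <code>\n</code>, which ends the line normally; a <code>\r</code> keeps the machine in this state.</li>
<li>Some details are not handled, such as a full treatment of the <code>\0</code> character; covering them means consulting the HTTP standard and adding more states, which is skipped here.</li>
</ol>
<p>The corresponding code:</p>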
<figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span 
class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span 
class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span 
class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">int</span> Parser::parse_header_line()</span><br><span class="line">{</span><br><span class="line"> <span class="keyword">enum</span> {</span><br><span class="line"> sw_start = <span class="number">0</span>,</span><br><span class="line"> sw_name,</span><br><span class="line"> sw_space_before_value,</span><br><span class="line"> sw_value,</span><br><span class="line"> sw_space_after_value,</span><br><span class="line"> sw_ignore_line,</span><br><span class="line"> sw_almost_done,</span><br><span class="line"> sw_header_almost_done</span><br><span class="line"> };</span><br><span class="line"></span><br><span class="line"> <span class="keyword">char</span> ch, *p=<span class="literal">NULL</span>;</span><br><span class="line"> <span class="keyword">uint32_t</span> state = sw_start;</span><br><span class="line"> <span class="keyword">char</span> *header_name_start=<span class="literal">NULL</span>,*header_name_end=<span class="literal">NULL</span>;</span><br><span class="line"> <span class="keyword">char</span> *header_start=<span class="literal">NULL</span>,*header_end=<span class="literal">NULL</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(p = pos; p < packet +length; p++)</span><br><span class="line"> {</span><br><span class="line"> ch = *p;</span><br><span class="line"> <span class="keyword">switch</span>(state)</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">case</span> sw_start:</span><br><span class="line"> {</span><br><span class="line"> header_name_start = p;</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> 
CR:</span><br><span class="line"> header_end = p;</span><br><span class="line"> state = sw_header_almost_done;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> header_end = p;</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_HEADER_DONE;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> state = sw_name;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">case</span> sw_name:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span>(ch)</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">case</span> <span class="string">':'</span>:</span><br><span class="line"> header_name_end = p;</span><br><span class="line"> state = sw_space_before_value;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"> header_name_end = p;</span><br><span class="line"> header_start = p;</span><br><span class="line"> header_end = p;</span><br><span class="line"> state = sw_almost_done;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> header_name_end = p;</span><br><span class="line"> header_start = p;</span><br><span class="line"> header_end = p;</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> <span 
class="number">0</span>;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">case</span> sw_space_before_value:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> <span class="string">' '</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"> header_start = p;</span><br><span class="line"> header_end = p;</span><br><span class="line"> state = sw_almost_done;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> header_start = p;</span><br><span class="line"> header_end = p;</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">case</span> <span class="string">'\0'</span>:</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_INVALID_HEADER;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> header_start = p;</span><br><span class="line"> state = sw_value;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> sw_value:</span><br><span class="line"> {</span><br><span class="line"> <span 
class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> <span class="string">' '</span>:</span><br><span class="line"> header_end = p;</span><br><span class="line"> state = sw_space_after_value;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"> header_end = p;</span><br><span class="line"></span><br><span class="line"> state = sw_almost_done;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> header_end = p;</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">case</span> <span class="string">'\0'</span>:</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_INVALID_HEADER;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> sw_space_after_value:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> <span class="string">' '</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"></span><br><span class="line"> state = sw_almost_done;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span 
class="line"> <span class="keyword">case</span> <span class="string">'\0'</span>:</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_INVALID_HEADER;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> state = sw_value;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> sw_ignore_line:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> state = sw_start;</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> sw_almost_done:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">if</span>(header_name_start!=<span class="literal">NULL</span></span><br><span class="line"> &&header_name_end>header_name_start</span><br><span class="line"> && header_start!=<span class="literal">NULL</span></span><br><span class="line"> && header_end>header_start)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span> key = <span class="built_in">std</span>::<span 
class="built_in">string</span>(header_name_start,header_name_end-header_name_start);</span><br><span class="line"> <span class="built_in">std</span>::<span class="built_in">string</span> value = <span class="built_in">std</span>::<span class="built_in">string</span>(header_start,header_end-header_start);</span><br><span class="line"> KVMap[key]=value;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_INVALID_HEADER;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">case</span> sw_header_almost_done:</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">switch</span> (ch) {</span><br><span class="line"> <span class="keyword">case</span> LF:</span><br><span class="line"> pos = p + <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_HEADER_DONE;</span><br><span class="line"> <span class="keyword">case</span> CR:</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">default</span>:</span><br><span class="line"> <span class="keyword">return</span> HTTP_PARSE_INVALID_HEADER;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> pos = p;</span><br><span class="line"> <span 
class="keyword">return</span> HTTP_PARSE_AGAIN;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
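<p>The listing above relies on constants (<code>CR</code>, <code>LF</code>, <code>HTTP_PARSE_*</code>) and class members that are defined elsewhere in the project. As a rough, self-contained sketch of the same state machine, the hypothetical free function <code>parse_headers</code> below walks over a whole headers section in one pass. It is a simplification and not the author's implementation: it rejects bare <code>\n</code> line endings and NUL bytes, accepts exactly one <code>\r</code> before each <code>\n</code>, and also stores headers whose value is empty.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>

// Parse "name: value\r\n" lines up to the blank line that ends the headers.
// Returns true once the terminating blank line has been seen, false on
// malformed or incomplete input.
bool parse_headers(const std::string& in, std::map<std::string, std::string>& out)
{
    enum State { START, NAME, SPACE_BEFORE_VALUE, VALUE, SPACE_AFTER_VALUE,
                 ALMOST_DONE, HEADER_ALMOST_DONE };
    State state = START;
    std::size_t name_start = 0, name_end = 0, value_start = 0, value_end = 0;

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char ch = in[i];
        if (ch == '\0') return false;              // NUL is never valid here
        switch (state) {
        case START:
            if (ch == '\r') { state = HEADER_ALMOST_DONE; }
            else if (ch == '\n' || ch == ':') { return false; }
            else { name_start = i; state = NAME; }
            break;
        case NAME:
            if (ch == ':') { name_end = i; state = SPACE_BEFORE_VALUE; }
            else if (ch == '\r' || ch == '\n') { return false; }  // line without ':'
            break;
        case SPACE_BEFORE_VALUE:
            if (ch == ' ') { /* skip leading spaces */ }
            else if (ch == '\r') { value_start = value_end = i; state = ALMOST_DONE; }
            else if (ch == '\n') { return false; }
            else { value_start = i; state = VALUE; }
            break;
        case VALUE:
            if (ch == ' ') { value_end = i; state = SPACE_AFTER_VALUE; }
            else if (ch == '\r') { value_end = i; state = ALMOST_DONE; }
            else if (ch == '\n') { return false; }
            break;
        case SPACE_AFTER_VALUE:
            if (ch == ' ') { /* possibly trailing spaces */ }
            else if (ch == '\r') { state = ALMOST_DONE; }
            else if (ch == '\n') { return false; }
            else { state = VALUE; }                // the space was inside the value
            break;
        case ALMOST_DONE:
            if (ch != '\n') return false;
            out[in.substr(name_start, name_end - name_start)] =
                in.substr(value_start, value_end - value_start);
            state = START;
            break;
        case HEADER_ALMOST_DONE:
            return ch == '\n';                     // blank line: headers are complete
        }
    }
    return false;  // input ended before the blank line
}
```

<p>Because <code>value_end</code> is only moved when a space or <code>\r</code> is seen, spaces inside a value such as <code>Mozilla/4.0 (compatible; MSIE 6.0)</code> are kept while trailing spaces are dropped, which mirrors the space after value state described above.</p>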
<h2 id="u89E3_u6790request-line"><a href="#u89E3_u6790request-line" class="headerlink" title="解析request-line"></a>Parsing the request-line</h2><p>The request line of an HTTP request has the format <code>method uri version\r\n</code>. It is parsed on the same principle, but the code is relatively long and is omitted here; for the full code see <a href="http://7xpwmi.com1.z0.glb.clouddn.com/http_parser.zip" target="_blank" rel="external">http_parser</a>.</p>
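<p>To give an idea of what such a state machine could look like without reproducing the full code, here is a minimal sketch. It is not the code from the archive linked above: <code>RequestLine</code> and this free-standing <code>parse_request_line</code> are hypothetical, the version is kept as text instead of being converted to a number, and the sketch assumes that methods consist only of upper-case letters and that the fields are separated by spaces.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct RequestLine {
    std::string method;   // e.g. "GET"
    std::string uri;      // e.g. "/"
    std::string version;  // e.g. "HTTP/1.1"
};

// Parse "method uri version\r\n" in a single pass.
// Returns true only if the whole line, including CRLF, is well formed.
bool parse_request_line(const std::string& in, RequestLine& out)
{
    enum State { METHOD, SPACES_BEFORE_URI, URI, SPACES_BEFORE_VERSION,
                 VERSION, ALMOST_DONE };
    State state = METHOD;
    std::size_t start = 0;  // index where the current field began

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char ch = in[i];
        switch (state) {
        case METHOD:
            if (ch == ' ') {
                if (i == start) return false;          // empty method
                out.method = in.substr(start, i - start);
                state = SPACES_BEFORE_URI;
            } else if (ch < 'A' || ch > 'Z') {
                return false;                          // assumption: A-Z only
            }
            break;
        case SPACES_BEFORE_URI:
            if (ch == ' ') break;
            if (ch == '\r' || ch == '\n') return false;
            start = i;
            state = URI;
            break;
        case URI:
            if (ch == ' ') {
                out.uri = in.substr(start, i - start);
                state = SPACES_BEFORE_VERSION;
            } else if (ch == '\r' || ch == '\n') {
                return false;                          // version is required here
            }
            break;
        case SPACES_BEFORE_VERSION:
            if (ch == ' ') break;
            if (ch == '\r' || ch == '\n') return false;
            start = i;
            state = VERSION;
            break;
        case VERSION:
            if (ch == '\r') {
                out.version = in.substr(start, i - start);
                state = ALMOST_DONE;
            } else if (ch == ' ' || ch == '\n') {
                return false;
            }
            break;
        case ALMOST_DONE:
            // The line must end with LF and the version must start with "HTTP/".
            return ch == '\n' && out.version.compare(0, 5, "HTTP/") == 0;
        }
    }
    return false;  // input ended before the line was complete
}
```

<p>For the input <code>GET / HTTP/1.1\r\n</code> this sketch fills in <code>GET</code>, <code>/</code>, and <code>HTTP/1.1</code>. A production parser would also validate the characters of the URI and the digits of the version.</p>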
]]></content>
<summary type="html">
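<p>Parsing HTTP messages is a must on both the server side and the client side. It is really just string processing, but searching for substrings tends to be much slower, which is unacceptable in performance-minded C++, so this post parses with a finite state machine that needs only a single pass over the input.<br>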
</summary>
<category term="http" scheme="http://melonqi.cn/tags/http/"/>
<category term="有限状态机" scheme="http://melonqi.cn/tags/%E6%9C%89%E9%99%90%E7%8A%B6%E6%80%81%E6%9C%BA/"/>
</entry>
<entry>
<title>Makefile Study Notes</title>
<link href="http://melonqi.cn/2016/01/16/Makefile/"/>
<id>http://melonqi.cn/2016/01/16/Makefile/</id>
<published>2016-01-15T16:21:37.000Z</published>
<updated>2016-01-15T16:23:50.000Z</updated>
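<content type="html"><![CDATA[<h2 id="Makefile_u662F_u4EC0_u4E48"><a href="#Makefile_u662F_u4EC0_u4E48" class="headerlink" title="Makefile是什么"></a>What is a Makefile</h2><h3 id="u5173_u4E8E_u7A0B_u5E8F_u7684_u7F16_u8BD1_u548C_u94FE_u63A5"><a href="#u5173_u4E8E_u7A0B_u5E8F_u7684_u7F16_u8BD1_u548C_u94FE_u63A5" class="headerlink" title="关于程序的编译和链接"></a>Compiling and linking programs</h3><p>Generally, for both C and C++, source files are first compiled into intermediate object files: .obj files on Windows and .o files on UNIX (Object Files). This step is called compilation (compile). In general, each source file corresponds to one intermediate object file (.o or .obj). The many object files are then combined into an executable, a step called linking (link).</p>
<p>When compiling, the compiler requires correct syntax and correct declarations of functions and variables. For the latter, you usually need to tell the compiler where the header files are (headers should contain only declarations; definitions belong in the C/C++ source files). As long as all the syntax is correct, the compiler can produce the intermediate object file.<br><a id="more"></a><br>Linking mainly resolves functions and global variables, so we use these intermediate object files (.o or .obj) to link our application. The linker does not care which source file a function came from, only which object file contains it. Most of the time there are so many source files, and therefore so many object files, that naming every object file explicitly at link time is very inconvenient. So we bundle the object files into a package: on Windows this is a library file (Library File), i.e. a .lib file; on UNIX it is an Archive File, i.e. an .a file.</p>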
<blockquote>
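<p>A project can contain countless source files, organized into directories by type, function and module. The makefile defines a set of rules that specify which files are compiled first, which are compiled later, which need to be recompiled, and even more complex operations, because a makefile is like a shell script and can also run operating-system commands.</p>
</blockquote>
<p>Simply put, a Makefile tells <code>make</code> how the program should be built. The user just runs the <code>make</code> command to compile the software, which greatly improves development efficiency.</p>
<blockquote>
<p>By default, <code>make</code> uses a file named Makefile or makefile as its build rules. To use a different file, run <code>make -f file</code></p>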
</blockquote>
<h2 id="Makefile_u89C4_u5219"><a href="#Makefile_u89C4_u5219" class="headerlink" title="Makefile规则"></a>Makefile rules</h2><p>A basic Makefile rule has this form:</p>
<figure class="highlight gcode"><table><tr><td class="code"><pre><span class="line">target : prerequisites <span class="comment">(spaces around the colon are optional)</span></span><br><span class="line">	command <span class="comment">(must begin with a tab character)</span></span><br></pre></td></tr></table></figure>
<ol>
<li>There can be one or more targets. A target can be an object file, an executable, or even just a label.</li>
<li>The prerequisites are the files or targets needed to produce the target.</li>
<li>The commands are the script that is run to produce the target.</li>
</ol>
<p>In short, a makefile rule states a build dependency: the target depends on its prerequisites, and the commands describe how to produce it. At build time, if the target does not exist or any prerequisite file is newer than the target, the commands are run to update the target.</p>
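<p>The rebuild decision described above can be sketched in a few lines of C++. This is only a simplified model of the idea, not how <code>make</code> is implemented: <code>FileInfo</code> and <code>needs_rebuild</code> are hypothetical names, timestamps are plain integers, and the sketch ignores the fact that real make first brings each prerequisite up to date.</p>

```cpp
#include <vector>

// Hypothetical model of a file: whether it exists and when it was last modified.
struct FileInfo {
    bool exists;
    long mtime;  // modification time, e.g. seconds since the epoch
};

// Returns true when the rule's commands should run: the target is missing,
// or at least one existing prerequisite is newer than the target.
bool needs_rebuild(const FileInfo& target,
                   const std::vector<FileInfo>& prerequisites) {
    if (!target.exists) return true;
    for (const FileInfo& dep : prerequisites) {
        if (dep.exists && dep.mtime > target.mtime) return true;
    }
    return false;
}
```

<p>For example, an object file last built at time 100 is left alone while its source has time 90, and is rebuilt once the source is saved at time 120.</p>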
<h2 id="u4E07_u80FDMakefile_u6A21_u677F"><a href="#u4E07_u80FDMakefile_u6A21_u677F" class="headerlink" title="万能Makefile模板"></a>A general-purpose Makefile template</h2><figure class="highlight makefile"><table><tr><td class="code"><pre><span class="line"><span class="comment"># Compiler to use</span></span><br><span class="line"><span class="constant">CC</span> = g++</span><br><span class="line"></span><br><span class="line"><span class="comment"># The rm command</span></span><br><span class="line"><span class="constant">RM</span> = /bin/rm</span><br><span class="line"></span><br><span class="line"><span class="comment"># The echo command, used for printing messages</span></span><br><span class="line"><span class="constant">ECHO</span> = /bin/echo</span><br><span class="line"></span><br><span class="line"><span class="comment"># Preprocessor option: defines the HAVE_DEBUG macro, used for debugging</span></span><br><span class="line"><span class="constant">DEFINES</span> = -DHAVE_DEBUG </span><br><span class="line"></span><br><span class="line"><span class="comment"># Compiler options:</span></span><br><span class="line"><span class="comment"># -g emits debug information, -O2 enables optimization,</span></span><br><span class="line"><span class="comment"># -Wall enables the common warnings; adjust to your needs</span></span><br><span class="line"><span class="constant">CPPFLAGS</span> = -g -O2 -Wall</span><br><span class="line"></span><br><span class="line"><span class="comment"># Add the include directory to the header search path</span></span><br><span class="line"><span class="constant">INCLUDES</span> = -Iinclude </span><br><span class="line"></span><br><span
class="line"><span class="comment"># -lcrypto searches for the library named crypto when linking,</span></span><br><span class="line"><span class="comment"># -lssl searches for the library named ssl when linking</span></span><br><span class="line"><span class="comment"># -Llib would add lib to the library search path (not used here)</span></span><br><span class="line"><span class="constant">LDFLAGS</span> = -lcrypto -lssl</span><br><span class="line"></span><br><span class="line"><span class="comment"># Name of the executable to build</span></span><br><span class="line"><span class="constant">MAIN_EXE</span> = hello</span><br><span class="line"></span><br><span class="line"><span class="comment"># Prerequisites of the all target; may list several executables to build</span></span><br><span class="line"><span class="comment"># e.g. with EXECUTABLES = main1 main2, the all target builds both main1 and main2</span></span><br><span class="line"><span class="constant">EXECUTABLES</span> = <span class="variable">$(MAIN_EXE)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Source files:</span></span><br><span class="line"><span class="comment"># the wildcard function finds every file ending in .cpp under ./src,</span></span><br><span class="line"><span class="comment"># produces a space-separated list of file names, and assigns it to SOURCES.</span></span><br><span class="line"><span class="comment"># Files in the current directory appear as bare names; files in subdirectories keep their path, e.g. src/bar.cpp.</span></span><br><span class="line"><span class="constant">SOURCES</span> = <span class="variable">$(wildcard src/*.cpp)</span></span><br><span class="line"></span><br><span
class="line"><span class="comment"># Object files (*.o) to build:</span></span><br><span class="line"><span class="comment"># patsubst is short for pattern substitute.</span></span><br><span class="line"><span class="comment"># This takes every name in SOURCES that ends in .cpp and replaces .cpp with .o.</span></span><br><span class="line"><span class="constant">OBJECTS</span> = <span class="variable">$(patsubst %.cpp, %.o, $(SOURCES)</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># A phony target is not the name of a real file; it is just a name for commands to run on explicit request.</span></span><br><span class="line"><span class="comment"># There are two reasons to use phony targets: to avoid a clash with a file of the same name, and to improve performance.</span></span><br><span class="line"><span class="comment"># If a rule never creates its target file, its commands run every time that target is made.</span></span><br><span class="line"><span class="phony"><span class="keyword">.PHONY</span>: all clean</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># The all target has $(EXECUTABLES) as its prerequisites</span></span><br><span class="line"><span class="comment"># Because all is listed in .PHONY, make always processes it and brings each executable up to date</span></span><br><span class="line">all: $(EXECUTABLES) </span><br><span class="line"></span><br><span class="line"><span class="comment"># Building $(MAIN_EXE) requires $(OBJECTS)</span></span><br><span class="line"><span class="comment"># $(CC) links $(MAIN_EXE); see the g++ documentation for its options</span></span><br><span class="line"><span class="comment"># $@ is the target</span></span><br><span class="line"><span class="comment"># $^ is the list of all prerequisites, separated by spaces.</span></span><br><span class="line"><span class="comment"># Duplicate prerequisites are removed, so each one appears only once.</span></span><br><span class="line"><span class="comment"># Assuming $(OBJECTS) = src/bar.o, this rule expands to</span></span><br><span class="line"><span class="comment">#hello: src/bar.o</span></span><br><span
class="line"><span class="comment"># g++ -o hello src/bar.o -lcrypto -lssl</span></span><br><span class="line">$(MAIN_EXE): $(OBJECTS)</span><br><span class="line">	<span class="variable">$(CC)</span> -o $@ $^ <span class="variable">$(LDFLAGS)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Before $(MAIN_EXE) can be built, $(OBJECTS) must be built</span></span><br><span class="line"><span class="comment"># Pattern rule: src/%.o is built from src/%.cpp, where % is the wildcard</span></span><br><span class="line"><span class="comment"># $< is the name of the first prerequisite.</span></span><br><span class="line"><span class="comment"># In a pattern rule (%), "$<" is the file matched for the current target; the matching files are taken one at a time.</span></span><br><span class="line"><span class="comment"># The rule expands to</span></span><br><span class="line"><span class="comment">#src/bar.o: src/bar.cpp</span></span><br><span class="line"><span class="comment"># g++ -o src/bar.o -c src/bar.cpp -DHAVE_DEBUG -g -O2 -Wall -Iinclude </span></span><br><span class="line">src/%.o: src/%.cpp</span><br><span class="line">	<span class="variable">$(CC)</span> -o $@ -c $< <span class="variable">$(DEFINES)</span> <span class="variable">$(CPPFLAGS)</span> <span class="variable">$(INCLUDES)</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># The clean target</span></span><br><span class="line"><span class="title">clean:</span></span><br><span class="line">	- <span class="variable">$(RM)</span> -rf <span class="variable">$(OBJECTS)</span> <span class="variable">$(MAIN_EXE)</span></span><br></pre></td></tr></table></figure>
<p>Download the makefile (without comments): <a href="http://7xpwmi.com1.z0.glb.clouddn.com/Makefile" target="_blank" rel="external">Makefile</a></p>
<p>For details on using .PHONY, see <a href="http://blog.chinaunix.net/uid-28458801-id-3452277.html" target="_blank" rel="external">.PHONY usage</a></p>
<p>$@ and similar names are special (automatic) variables in Makefiles. For more of them, see: <a href="http://blog.csdn.net/u012474286/article/details/20715331" target="_blank" rel="external">special variables in makefiles</a></p>
<h2 id="u6D4B_u8BD5"><a href="#u6D4B_u8BD5" class="headerlink" title="测试"></a>Testing</h2><p>Create a new project whose job is to compute the MD5 of the string "1234".<br>The project directory layout is:<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/MD5_tree.png" alt="项目目录结构"></p>
<p>The Makefile can be downloaded here: <a href="http://7xpwmi.com1.z0.glb.clouddn.com/Makefile" target="_blank" rel="external">Makefile</a>.</p>
<p>include/common.h</p>
<figure class="highlight cpp"><table><tr><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">ifndef</span> _COMMON_H</span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">define</span> _COMMON_H</span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><string></span></span></span><br><span class="line"></span><br><span class="line"><span class="built_in">std</span>::<span class="function"><span class="built_in">string</span> <span class="title">get_string_MD5</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span> str)</span></span>; </span><br><span class="line"></span><br><span class="line"><span class="preprocessor">#<span class="keyword">endif</span></span></span><br></pre></td></tr></table></figure>
<p>src/common.cpp</p>
<figure class="highlight cpp"><table><tr><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string">"common.h"</span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><openssl/md5.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><string.h></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><stdio.h></span></span></span><br><span class="line"><span class="built_in">std</span>::<span class="function"><span class="built_in">string</span> <span class="title">get_string_MD5</span><span class="params">(<span class="keyword">const</span> <span class="built_in">std</span>::<span class="built_in">string</span> str)</span></span><br><span class="line"></span>{</span><br><span class="line"> MD5_CTX ctx;</span><br><span class="line"> <span class="keyword">char</span> *data = (<span class="keyword">char</span> *)str.c_str();</span><br><span class="line"> <span class="keyword">unsigned</span> <span
class="keyword">char</span> md[<span class="number">16</span>]={<span class="number">0</span>};</span><br><span class="line"> <span class="keyword">char</span> buf[<span class="number">33</span>]={<span class="number">0</span>};</span><br><span class="line"> <span class="keyword">char</span> tmp[<span class="number">3</span>]={<span class="number">0</span>};</span><br><span class="line"> <span class="keyword">int</span> i;</span><br><span class="line"> </span><br><span class="line"> MD5_Init(&ctx);</span><br><span class="line"> MD5_Update(&ctx,data,<span class="built_in">strlen</span>(data));</span><br><span class="line"> MD5_Final(md,&ctx);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span>(i=<span class="number">0</span>;i<<span class="number">16</span>;i++)</span><br><span class="line"> {</span><br><span class="line"> <span class="built_in">sprintf</span>(tmp,<span class="string">"%02X"</span>,md[i]);</span><br><span class="line"> <span class="built_in">strcat</span>(buf,tmp);</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">std</span>::<span class="built_in">string</span>(buf);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>src/main.cpp</p>
<figure class="highlight cpp"><table><tr><td class="code"><pre><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string"><iostream></span></span></span><br><span class="line"><span class="preprocessor">#<span class="keyword">include</span> <span class="string">"common.h"</span></span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="built_in">std</span>;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">int</span> <span class="title">main</span><span class="params">()</span></span><br><span class="line"></span>{</span><br><span class="line"> <span class="built_in">string</span> str = <span class="string">"1234"</span>;</span><br><span class="line"> <span class="built_in">string</span> after = get_string_MD5(str);</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"Before MD5: "</span><<str<<endl;</span><br><span class="line"> <span class="built_in">cout</span><<<span class="string">"After MD5: "</span><<after<<endl;</span><br><span class="line"> <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The example requires the OpenSSL library to be installed. Run <code>make</code> in the project directory, then run <code>./hello</code></p>
<p>The output looks like this:<img src="http://7xpwmi.com1.z0.glb.clouddn.com/MD5_result.png" alt="运行结果"></p>
<p>Download the complete example: <a href="http://7xpwmi.com1.z0.glb.clouddn.com/md5.zip" target="_blank" rel="external">md5.zip</a></p>
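<p>One detail of the hex-encoding step in <code>get_string_MD5</code> deserves a note: a 16-byte digest only becomes a 32-character string if every byte is written as exactly two hex digits. The following is a minimal sketch of that step on its own, independent of OpenSSL; <code>to_hex</code> is a hypothetical helper name, not part of the example project.</p>

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Converts `len` raw bytes into an uppercase hexadecimal string.
// "%02X" pads each byte to two digits, so 0x0A becomes "0A" rather than "A".
std::string to_hex(const unsigned char* bytes, std::size_t len) {
    std::string out;
    out.reserve(len * 2);
    char tmp[3] = {0};  // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof tmp, "%02X", static_cast<unsigned>(bytes[i]));
        out += tmp;
    }
    return out;
}
```

<p>With a plain <code>"%X"</code> format, any byte below 0x10 would contribute a single character and the resulting string would be shorter than 32 characters.</p>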
]]></content>
<summary type="html">
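<h2 id="Makefile_u662F_u4EC0_u4E48"><a href="#Makefile_u662F_u4EC0_u4E48" class="headerlink" title="Makefile是什么"></a>What is a Makefile</h2><h3 id="u5173_u4E8E_u7A0B_u5E8F_u7684_u7F16_u8BD1_u548C_u94FE_u63A5"><a href="#u5173_u4E8E_u7A0B_u5E8F_u7684_u7F16_u8BD1_u548C_u94FE_u63A5" class="headerlink" title="关于程序的编译和链接"></a>Compiling and linking programs</h3><p>Generally, for both C and C++, source files are first compiled into intermediate object files: .obj files on Windows and .o files on UNIX (Object Files). This step is called compilation (compile). In general, each source file corresponds to one intermediate object file (.o or .obj). The many object files are then combined into an executable, a step called linking (link).</p>
<p>When compiling, the compiler requires correct syntax and correct declarations of functions and variables. For the latter, you usually need to tell the compiler where the header files are (headers should contain only declarations; definitions belong in the C/C++ source files). As long as all the syntax is correct, the compiler can produce the intermediate object file.<br>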
</summary>
<category term="Linux" scheme="http://melonqi.cn/tags/Linux/"/>
<category term="Makefile" scheme="http://melonqi.cn/tags/Makefile/"/>
</entry>
<entry>
<title>Hello World</title>
<link href="http://melonqi.cn/2016/01/14/hello-world/"/>
<id>http://melonqi.cn/2016/01/14/hello-world/</id>
<published>2016-01-14T08:06:05.000Z</published>
<updated>2016-01-14T08:06:05.000Z</updated>
<content type="html"><![CDATA[<p>Welcome to <a href="http://hexo.io/" target="_blank" rel="external">Hexo</a>! This is your very first post. Check <a href="http://hexo.io/docs/" target="_blank" rel="external">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="http://hexo.io/docs/troubleshooting.html" target="_blank" rel="external">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues" target="_blank" rel="external">GitHub</a>.<br><a id="more"></a></p>
<h2 id="Quick_Start"><a href="#Quick_Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create_a_new_post"><a href="#Create_a_new_post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure>
<p>More info: <a href="http://hexo.io/docs/writing.html" target="_blank" rel="external">Writing</a></p>
<h3 id="Run_server"><a href="#Run_server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure>
<p>More info: <a href="http://hexo.io/docs/server.html" target="_blank" rel="external">Server</a></p>
<h3 id="Generate_static_files"><a href="#Generate_static_files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure>
<p>More info: <a href="http://hexo.io/docs/generating.html" target="_blank" rel="external">Generating</a></p>
<h3 id="Deploy_to_remote_sites"><a href="#Deploy_to_remote_sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure>
<p>More info: <a href="http://hexo.io/docs/deployment.html" target="_blank" rel="external">Deployment</a></p>
]]></content>
<summary type="html">
<p>Welcome to <a href="http://hexo.io/">Hexo</a>! This is your very first post. Check <a href="http://hexo.io/docs/">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a href="http://hexo.io/docs/troubleshooting.html">troubleshooting</a> or you can ask me on <a href="https://github.com/hexojs/hexo/issues">GitHub</a>.<br>
</summary>
</entry>
<entry>
<title>2015 in Review</title>
<link href="http://melonqi.cn/2016/01/13/2015%E6%80%BB%E7%BB%93/"/>
<id>http://melonqi.cn/2016/01/13/2015总结/</id>
<published>2016-01-13T03:06:10.000Z</published>
<updated>2016-01-14T08:06:05.000Z</updated>
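<content type="html"><![CDATA[<p>2015 is over. Only after growing up do you realize how quickly time flies.<br><a id="more"></a><br>A brand-new 2016 has arrived, and before I knew it I am about to finish my master's degree. This year brought many regrets and many gains. I want to look back on the road I have walked, learn what should be learned, forget what should be forgotten, and move forward bravely.</p>
<h2 id="u627E_u5DE5_u4F5C"><a href="#u627E_u5DE5_u4F5C" class="headerlink" title="找工作"></a>Job hunting</h2><p>Job hunting was the top priority of the year. Unfortunately I was under-prepared and ran out of steam toward the end, so although the result is satisfying enough, it could have been better, for example an sp offer. In the end I settled on Tencent, whose atmosphere I really like. Looking back, I gained a lot along the way, discovered flaws in my own character, and came to know myself more clearly.</p>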
<ol>
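<li>The first offer, if it even counts as one, came from Alibaba Cloud. The internal-referral process went smoothly and they issued a letter of intent, but because of the final salary I decided not to go.</li>
<li>Also through an internal referral, Tencent gave me the best impression: the team atmosphere was relaxed and the salary was satisfactory, so I signed the two-party agreement early.</li>
<li>With offers from Tencent and Alibaba in hand, I fancied sweeping all of BAT, which turned out to be wishful thinking. The search in September and October was a struggle: I failed wherever I interviewed, including Baidu and 360. Mostly I felt hard done by: others were getting sp offers, stock, and packages of 300k+, and I wanted to prove I was no worse than them. Instead I was taught one lesson after another. The root cause was the final-round interview: I was under-prepared and did not win enough favor.</li>
<li>Meanwhile I also received offers from Agricultural Bank of China, Huawei and Chuchujie (楚楚街). My girlfriend will take care of the hukou (household registration) question when the time comes, so I will focus on earning money. In the end I chose Tencent, and I am happy with that.</li>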
</ol>
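<p>To sum up what the job hunt revealed: prepare early. You can start practicing problems on leetcode and reading books half a year in advance. The biggest reason I failed interviews was the final round: when the interviewer asked whether I had any questions, I never came up with a good one, which gave the impression that I had not prepared seriously. The biggest gain was seeing my own character clearly. I am not a commander-in-chief, so from now on I will concentrate on being a good general: doing every task in a down-to-earth way and never going into battle unprepared.</p>
<h2 id="u5B66_u4E1A"><a href="#u5B66_u4E1A" class="headerlink" title="学业"></a>Studies</h2><p>Speaking of studies, I feel I let my advisor down a little. My research direction is theoretical and my thinking in this area is rather weak; the paper alone was revised seven or eight times. No wonder my advisor said supervising me was exhausting, haha. The biggest lesson from studying theory is that whatever you do, you need to calm down and commit fully, without being too utilitarian. Research means pursuing fundamentals and being good at summarizing; step back at the end and you will see the problem from a different angle.</p>
<p>I wrote two papers. Neither has been published yet, but I did right by myself.</p>
<h2 id="u7231_u60C5"><a href="#u7231_u60C5" class="headerlink" title="爱情"></a>Love</h2><p>My girlfriend and I are now in our sixth year together. Once I am working and have earned some money I will marry you, so wait for me. I know I have many shortcomings and that my emotional intelligence is not high, which has caused my dear a lot of trouble. I have found that I can no longer be without you. Mwah.</p>
<h2 id="u656C_u81EA_u5DF1"><a href="#u656C_u81EA_u5DF1" class="headerlink" title="敬自己"></a>A toast to myself</h2><p>Many people helped me this year: my sweetheart, Hu Ge, Jun Ge, Daben (大犇), and my advisor. Thank you for helping me grow so quickly. If you truly live today well, tomorrow will not be bad. I believe the future is bright!</p>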
]]></content>
<summary type="html">
<p>2015 is over. Only after growing up do you realize how quickly time flies.<br>
</summary>
<category term="总结" scheme="http://melonqi.cn/tags/%E6%80%BB%E7%BB%93/"/>
</entry>
<entry>
<title>Bringing an Acer 4736zg Back from the Dead</title>
<link href="http://melonqi.cn/2016/01/12/fix/"/>
<id>http://melonqi.cn/2016/01/12/fix/</id>
<published>2016-01-12T15:42:04.000Z</published>
<updated>2016-01-14T08:06:04.000Z</updated>
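<content type="html"><![CDATA[<p>After a three-hour battle I rescued an old computer. Seeing the "Starting Windows" screen, I was bursting with happiness. What is happiness? It is the satisfaction of seeing the fruits of your effort!<br><a id="more"></a><br>I have an old Acer 4736zg laptop that suddenly would not start: it could not even enter the BIOS, and neither the fan nor the disk ran. Searching online suggested that the capacitors on the back of the CPU were damaged. This is a long-standing Acer problem: sudden freezes, failing to start without mains power and similar symptoms are all caused by these capacitors, and replacing them fixes it.</p>
<h2 id="u5DE5_u5177_u51C6_u5907"><a href="#u5DE5_u5177_u51C6_u5907" class="headerlink" title="工具准备"></a>Tools and parts</h2><ul>
<li>Four to six 330 µF 6.3 V SMD capacitors, available on Taobao</li>
<li>Soldering iron and tweezers</li>
<li>Thin wire; the fine copper wire from inside an old USB cable works</li>
</ul>
<h2 id="u6B65_u9AA4"><a href="#u6B65_u9AA4" class="headerlink" title="步骤"></a>Steps</h2><ol>
<li>Disassemble the laptop and locate the capacitors to be removed.<img src="http://7xpwmi.com1.z0.glb.clouddn.com/%E7%94%B5%E5%AE%B9.jpg" alt="电容"></li>
<li>Remove the old capacitors. The copper layer underneath must be removed as well, otherwise soldering is difficult.<br><img src="http://7xpwmi.com1.z0.glb.clouddn.com/fix1.jpg" alt="去除效果"></li>
<li>The key step is soldering the new capacitors, and slow work gives fine results. My soldering is poor and I only fitted four, so here is a picture borrowed from the web to show the intended result.<img src="http://7xpwmi.com1.z0.glb.clouddn.com/%E6%95%88%E6%9E%9C%E5%9B%BE.jpg" alt="效果图"></li>
<li>Test.<img src="http://7xpwmi.com1.z0.glb.clouddn.com/fix2.jpg" alt="测试"></li>
<li>Reassemble.<img src="http://7xpwmi.com1.z0.glb.clouddn.com/fix3.jpg" alt="装机">It works perfectly, haha.</li>
</ol>
<h2 id="u611F_u60F3"><a href="#u611F_u60F3" class="headerlink" title="感想"></a>Thoughts</h2><p>Do it yourself and you will want for nothing. I not only saved more than 200 yuan, but the recovered laptop can go home for my parents to use; more important still is the satisfaction and happiness gained. One more thing: during the reboot Windows blue-screened with stop code 7B, a boot partition problem, which the Dabaicai (大白菜) PE toolkit solved perfectly. A proper programmer should be able not only to write code but also to fix computers. Keep Dabaicai in your toolbox; you deserve it.</p>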
]]></content>
<summary type="html">
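<p>After a three-hour battle I rescued an old computer. Seeing the "Starting Windows" screen, I was bursting with happiness. What is happiness? It is the satisfaction of seeing the fruits of your effort!<br>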
</summary>
<category term="修电脑" scheme="http://melonqi.cn/tags/%E4%BF%AE%E7%94%B5%E8%84%91/"/>
</entry>
<entry>
<title>Building a Hexo Blog on Tencent Cloud</title>
<link href="http://melonqi.cn/2016/01/12/Hexo-1/"/>
<id>http://melonqi.cn/2016/01/12/Hexo-1/</id>
<published>2016-01-12T03:20:20.000Z</published>
<updated>2016-01-14T08:06:04.000Z</updated>
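<content type="html"><![CDATA[<p>Hexo is a static blog framework based on Node.js. It is quick to set up and easy to maintain, which makes it a good choice for a personal blog. You can also host your own blog on a cloud server, so give it a try.<br><a id="more"></a></p>
<h2 id="u73AF_u5883_u5B89_u88C5"><a href="#u73AF_u5883_u5B89_u88C5" class="headerlink" title="环境安装"></a>Environment setup</h2><p>The author obtained a cloud server and a .cn domain from Tencent Cloud. For students, the domain is currently free for one year and the cloud server costs 1 yuan per month: <a href="http://www.qcloud.com/event/qcloudSchool" target="_blank" rel="external">apply here</a></p>
<p>If you are not a student, register a domain and a server the regular way, or use GitHub. For building a blog on GitHub, see: <a href="http://ibruce.info/2013/11/22/hexo-your-blog/" target="_blank" rel="external">Building a blog with GitHub + Hexo</a></p>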
<blockquote>
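<p>hexo is a static blog framework based on Node.js.</p>
<p>hexo was created by tommy351, a university student in Taiwan. It is a Node.js-based static blog generator that compiles hundreds of posts in just a few seconds. The static pages it generates can be placed directly on platforms such as GitHub Pages, BAE and SAE. First, see how tommy criticizes Octopress →_→<a href="https://zespia.tw/blog/2012/10/11/hexo-debut/" target="_blank" rel="external">Hexo颯爽登場</a></p>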
</blockquote>
<p>Hexo depends on Node.js.</p>
<p>To install it:<br><code>yum install nodejs npm</code></p>
<p>Install Hexo: <code>npm install -g hexo</code></p>
<h2 id="u521D_u59CB_u5316_u76EE_u5F55"><a href="#u521D_u59CB_u5316_u76EE_u5F55" class="headerlink" title="初始化目录"></a>Initializing the directory</h2><p>Create a directory, enter it, and run the initialization commands:</p>
<ol>
<li><code>mkdir blog</code></li>
<li><code>cd blog</code></li>
<li><code>hexo init</code></li>
<li><code>npm install</code></li>
</ol>
<p>To generate the static pages, run <code>hexo generate</code> in the blog directory; the pages are written to the hexo\public\ directory.</p>
<p>The directory layout is:</p>
<p><img src="http://7xpwmi.com1.z0.glb.clouddn.com/tree.png" alt="目录结构"></p>
<ul>
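<li>.deploy: the content that <code>hexo deploy</code> pushes to GitHub</li>
<li>public: the static site output produced by <code>hexo generate</code></li>
<li>scaffolds: layout template files; md files can be added or edited here</li>
<li>scripts: extension scripts; custom JavaScript can be placed here</li>
<li>source: post sources; hexo processes the markdown and html files in this directory. It corresponds to the root of the repo, so the 404 file, favicon.ico, the CNAME file and the like belong here, and new page directories can be created inside it.</li>
<li>_drafts: draft posts</li>
<li>_posts: published posts</li>
<li>themes: theme files</li>
<li>_config.yml: the global configuration file; most settings live here</li>
<li>package.json: application data such as the hexo version, similar to the About dialog in ordinary software</li>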
</ul>
<blockquote>
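<p>Commands must be run in the initialized directory; otherwise they fail without reporting an error.<br>If content is not regenerated correctly after you change a post's tags or body, delete hexo\db.json and retry. If that still fails, delete the corresponding files in the public directory and regenerate.</p>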
</blockquote>
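<h2 id="u542F_u52A8_u670D_u52A1_u5668"><a href="#u542F_u52A8_u670D_u52A1_u5668" class="headerlink" title="启动服务器"></a>Starting the server</h2><p>Run the following command to start a local server for previewing and debugging posts.</p>
<p><code>hexo server</code></p>
<p>hexo uses port 4000 by default, so visit your server's IP on that port to preview the site; the author's blog <a href="/melonqi.cn">MelonQi Blog</a> is one example. The port can be changed with <code>hexo server -p 80</code>.</p>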
<blockquote>
<p>Please use a modern browser, otherwise... you know what may happen!</p>
</blockquote>
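<h2 id="u4E3B_u9898_u5B89_u88C5"><a href="#u4E3B_u9898_u5B89_u88C5" class="headerlink" title="主题安装"></a>Installing a theme</h2><p>Not everyone likes the bundled theme. You can pick one you prefer; see <a href="https://www.zhihu.com/question/24422335" target="_blank" rel="external">What good-looking Hexo themes are there?</a> for suggestions.</p>
<p>Taking <a href="https://github.com/litten/hexo-theme-yilia" target="_blank" rel="external">litten/hexo-theme-yilia</a> as an example, run the following in the blog directory:<br><code>git clone https://github.com/litten/hexo-theme-yilia themes/yilia</code><br>Here yilia is the theme name, which you may choose freely.</p>
<p>To change the settings of the yilia theme, edit _config.yml in the themes/yilia directory.</p>
<p>To switch themes, set <code>theme: yilia</code> in _config.yml in the blog directory and restart the server.</p>
<h2 id="u53D1_u5E03_u6587_u7AE0"><a href="#u53D1_u5E03_u6587_u7AE0" class="headerlink" title="发布文章"></a>Publishing a post</h2><p>Run the new command to create a post with the given name at hexo\source\_posts\postName.md.<br><code>hexo new [layout] "postName" #create a new post</code></p>
<p>layout is an optional parameter that defaults to post. To see which layouts exist, look in the scaffolds directory: the file names there are the layout names. You can add your own layout simply by adding a file, and you can also edit the existing ones; for example, the default layout for post is hexo\scaffolds\post.md.</p>
<p>Now look at the file that was just generated, hexo\source\_posts\postName.md. Its contents are:</p>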
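<blockquote>
<p>title: postName #the name shown on the post page; may be changed freely and does not appear in the URL</p>
<p>date: 2013-12-02 15:30:16 #creation time of the post; usually left alone, though it can be changed freely</p>
<p>categories: #post categories; may be empty. Note the space after the colon</p>
<p>tags: #post tags; may be empty. For multiple tags use the form [tag1,tag2,tag3]. Note the space after the colon</p>
<hr>
<p>Start writing your post body in markdown here.</p>
</blockquote>
<p>The body is written in markdown; for the syntax, see <a href="http://www.appinn.com/markdown/#link" target="_blank" rel="external">markdown syntax</a>.</p>
<p>For more on publishing, see <a href="http://ibruce.info/2013/11/22/hexo-your-blog/" target="_blank" rel="external">hexo你的博客|不如</a>.</p>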
]]></content>
<summary type="html">
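<p>Hexo is a static blog framework based on Node.js. It is quick to set up and easy to maintain, which makes it a good choice for a personal blog. You can also host your own blog on a cloud server, so give it a try.<br>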
</summary>
<category term="hexo" scheme="http://melonqi.cn/tags/hexo/"/>
<category term="腾讯云" scheme="http://melonqi.cn/tags/%E8%85%BE%E8%AE%AF%E4%BA%91/"/>
</entry>
</feed>