<!DOCTYPE html><html class="theme-next mist use-motion" lang="zh-Hans"><head><meta name="generator" content="Hexo 3.9.0"><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"><meta name="theme-color" content="#222"><script src="/lib/pace/pace.min.js?v=1.0.2"></script><link href="/lib/pace/pace-theme-minimal.min.css?v=1.0.2" rel="stylesheet"><meta http-equiv="Cache-Control" content="no-transform"><meta http-equiv="Cache-Control" content="no-siteapp"><link href="/lib/fancybox/source/jquery.fancybox.css?v=2.1.5" rel="stylesheet" type="text/css"><link href="/lib/font-awesome/css/font-awesome.min.css?v=4.6.2" rel="stylesheet" type="text/css"><link href="/css/main.css?v=5.1.3" rel="stylesheet" type="text/css"><link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon-240x240-playpi.png?v=5.1.3"><link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32-playpi.png?v=5.1.3"><link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16-playpi.png?v=5.1.3"><link rel="mask-icon" href="/images/logo-playpi.svg?v=5.1.3" color="#222"><meta name="keywords" content="GitHub Pages,SEO,Baidu spider,Baiduspider"><link rel="alternate" href="/atom.xml" title="虾丸派" type="application/atom+xml"><meta name="description" content="I only recently discovered that Baidu had not indexed most of the pages on my static blog. Apart from a few pages submitted through automatic submission (active push and auto push) or by hand, no pages were indexed (they had all been submitted through the automatic sitemap method, and not a single one was indexed). Checking the Baidu Webmaster Tools console, I found that submitting links via a sitemap does not work, because before collecting links the Baidu spider must fetch the baidusitemap.xml file, and that file lives on GitHub"><meta name="keywords" content="GitHub Pages,SEO,Baidu spider,Baiduspider"><meta property="og:type" content="article"><meta property="og:title" content="The Problem of GitHub Pages Blocking the Baidu Spider"><meta property="og:url" content="https://www.playpi.org/2019010501.html"><meta property="og:site_name" content="虾丸派"><meta property="og:description" content="I only recently discovered that Baidu had not indexed most of the pages on my static blog. Apart from a few pages submitted through automatic submission (active push and auto push) or by hand, no pages were indexed (they had all been submitted through the automatic sitemap method, and not a single one was indexed). Checking the Baidu Webmaster Tools console, I found that submitting links via a sitemap does not work, because before collecting links the Baidu spider must fetch the baidusitemap.xml file, and that file lives on GitHub"><meta property="og:locale" content="zh-Hans"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ujsyasw0j20en0ie42d.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojj5hv3qj20ng0pp0uv.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojjp6f0jj20um08h3yk.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojjzz7kaj20uj0l6wf0.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojkc7sggj212a0kbgmb.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojkvts8yj20v60jmjrm.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojlapxs2j20ke0f1t9f.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojmrakf5j20v90c3dfy.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojn41t81j20r50n8jsy.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnhzr7xj21060gqq3w.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnn4r5bj20zf0l3q49.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnt73w4j20rb0ppgmu.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnz71jxj20rr04dmxf.jpg"><meta 
property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r3ajynl5j20z20ne76i.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojt87natj235s1zw4k8.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojtr12gxj21hc0q9755.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0oju7wg4tj21ar0npjtb.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojur917gj20sp0hz0t7.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0oviygtn3j21hc0qxgnz.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovjinxzvj21hc0qxac2.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovk0xljij21hc0qxta3.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovkf6k08j21hc0qxtb9.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovkoigmaj20s60lymyk.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovn10ghwj20wt0mvgm6.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovncqfzuj20ww0atdfw.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovnq0lhgj20yi0m7q3i.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovl23kqbj20tl0lfabk.jpg"><meta property="og:image" 
content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmkhqwygj20z40lojsp.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmkls6dkj20e008mweh.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0prgtihdkj21gr060js0.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0prhfu9p8j20p40chwf7.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pri38049j21hc0rymzk.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0proxgda3j20u204q0ss.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0qmesphe5j20rb0g0q3o.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r33rvt0oj21hc0q9tdh.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r34hh1lxj21hc0q90xi.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmjyvmbtj218h0qx76r.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmjky1dwj20v50hnq3v.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0qmguu51qj21660i9ab1.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmj6s4syj21150a4dhn.jpg"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0uis3rezoj21hc0q9wgm.jpg"><meta property="og:image" 
content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ptlseegej20p00howf6.jpg"><meta property="og:updated_time" content="2019-01-05T16:42:49.000Z"><meta name="twitter:card" content="summary"><meta name="twitter:title" content="The Problem of GitHub Pages Blocking the Baidu Spider"><meta name="twitter:description" content="I only recently discovered that Baidu had not indexed most of the pages on my static blog. Apart from a few pages submitted through automatic submission (active push and auto push) or by hand, no pages were indexed (they had all been submitted through the automatic sitemap method, and not a single one was indexed). Checking the Baidu Webmaster Tools console, I found that submitting links via a sitemap does not work, because before collecting links the Baidu spider must fetch the baidusitemap.xml file, and that file lives on GitHub"><meta name="twitter:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ujsyasw0j20en0ie42d.jpg"><script type="text/javascript" id="hexo.configurations">var NexT=window.NexT||{},CONFIG={root:"/",scheme:"Mist",version:"5.1.3",sidebar:{position:"left",display:"hide",offset:12,b2t:!1,scrollpercent:!0,onmobile:!1},fancybox:!0,tabs:!0,motion:{enable:!0,async:!1,transition:{post_block:"fadeIn",post_header:"slideDownIn",post_body:"slideDownIn",coll_header:"slideLeftIn",sidebar:"slideUpIn"}},duoshuo:{userId:"0",author:"博主"},algolia:{applicationID:"",apiKey:"",indexName:"",hits:{per_page:10},labels:{input_placeholder:"Search for Posts",hits_empty:"We didn't find any results for the search: ${query}",hits_stats:"${hits} results found in ${time} ms"}}}</script><link rel="canonical" href="https://www.playpi.org/2019010501.html"><title>The Problem of GitHub Pages Blocking the Baidu Spider | 虾丸派</title></head><body itemscope itemtype="http://schema.org/WebPage" lang="zh-Hans"><div class="container sidebar-position-left page-post-detail"><div class="headband"></div><header id="header" class="header" itemscope itemtype="http://schema.org/WPHeader"><div class="header-inner"><div class="site-brand-wrapper"><div class="site-meta"><div class="custom-logo-site-title"><a href="/" class="brand" rel="start"><span class="logo-line-before"><i></i></span> <span class="site-title">虾丸派</span> <span class="logo-line-after"><i></i></span></a></div><h1 
class="site-subtitle" itemprop="description">Jottings</h1></div><div class="site-nav-toggle"><button><span class="btn-bar"></span> <span class="btn-bar"></span> <span class="btn-bar"></span></button></div></div><nav class="site-nav"><ul id="menu" class="menu"><li class="menu-item menu-item-home"><a href="/" rel="section"><i class="menu-item-icon fa fa-fw fa-home"></i><br>Home</a></li><li class="menu-item menu-item-tags"><a href="/tags/" rel="section"><i class="menu-item-icon fa fa-fw fa-tags"></i><br>Tags</a></li><li class="menu-item menu-item-categories"><a href="/categories/" rel="section"><i class="menu-item-icon fa fa-fw fa-th"></i><br>Categories</a></li><li class="menu-item menu-item-archives"><a href="/archives/" rel="section"><i class="menu-item-icon fa fa-fw fa-archive"></i><br>Archives</a></li><li class="menu-item menu-item-about"><a href="/about/" rel="section"><i class="menu-item-icon fa fa-fw fa-user"></i><br>About</a></li><li class="menu-item menu-item-books"><a href="/books/" rel="section"><i class="menu-item-icon fa fa-fw fa-book"></i><br>Books</a></li><li class="menu-item menu-item-guide"><a href="/guide/" rel="section"><i class="menu-item-icon fa fa-fw fa-location-arrow"></i><br>Guide</a></li><li class="menu-item menu-item-search"><a href="javascript:;" class="popup-trigger"><i class="menu-item-icon fa fa-search fa-fw"></i><br>Search</a></li></ul><div class="site-search"><div class="popup search-popup local-search-popup"><div class="local-search-header clearfix"><span class="search-icon"><i class="fa fa-search"></i> </span><span class="popup-btn-close"><i class="fa fa-times-circle"></i></span><div class="local-search-input-wrapper"><input autocomplete="off" placeholder="Search..." 
spellcheck="false" type="text" id="local-search-input"></div></div><div id="local-search-result"></div></div></div></nav></div></header><main id="main" class="main"><div class="main-inner"><div class="content-wrap"><div id="content" class="content"><div id="posts" class="posts-expand"><article class="post post-type-normal" itemscope itemtype="http://schema.org/Article"><div class="post-block"><link itemprop="mainEntityOfPage" href="https://www.playpi.org/2019010501.html"><span hidden itemprop="author" itemscope itemtype="http://schema.org/Person"><meta itemprop="name" content="虾丸派"><meta itemprop="description" content="Recording knowledge | Sharing technology"><meta itemprop="image" content="/images/favicon-1536x1536-playpi.png"></span><span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization"><meta itemprop="name" content="虾丸派"></span><header class="post-header"><h2 class="post-title" itemprop="name headline">The Problem of GitHub Pages Blocking the Baidu Spider</h2><div class="post-meta"><span class="post-time"><span class="post-meta-item-text">Posted on</span> <time title="Created" itemprop="dateCreated datePublished" datetime="2019-01-05T00:42:49+08:00">2019-01-05 </time></span><span class="post-category"><span class="post-meta-divider">|</span> <span class="post-meta-item-text">In</span> <span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/building/" itemprop="url" rel="index"><span itemprop="name">Site Building</span> </a></span></span><span id="busuanzi_container_page_pv" style="display:none"><span class="post-meta-divider">|</span> Views <span id="busuanzi_value_page_pv"></span></span><div class="post-wordcount"><span class="post-meta-item-text">Word count</span> <span title="Word count">6,708 characters </span><span class="post-meta-divider">|</span> <span class="post-meta-item-text">Reading time ≈</span> <span title="Reading time">26 min</span></div></div></header><div class="post-body" itemprop="articleBody"><p>I only recently discovered that Baidu had not indexed most of the pages on my static blog. Apart from a few pages submitted through automatic submission (active push and auto push) or by hand, no pages were indexed (they had all been submitted through the automatic <code>sitemap</code> method, and not a single one was indexed). Checking the Baidu Webmaster Tools console, I found that submitting links through a <code>sitemap</code> does not work. Before the Baidu spider can collect any links, it has to fetch the <code>baidusitemap.xml</code> file, which lives on <code>GitHub Pages</code>, and <code>GitHub Pages</code> blocks the Baidu spider. The spider is therefore stopped at the very first step, fetching <code>baidusitemap.xml</code>: <code>GitHub Pages</code> returns a 403 error (in <code>http</code>, this means access is forbidden), and the crawl fails. (Even if the spider did get <code>baidusitemap.xml</code>, it would not help, because every static page it would crawl next is also hosted on <code>GitHub Pages</code> and would be blocked too.) This post describes the problem in detail and looks for a workable solution.</p><a id="more"></a><h1 id="问题出现"><a href="# 问题出现" class="headerlink" title="The Problem"></a>The Problem</h1><h2 id="网页收录对比差距大"><a href="# 网页收录对比差距大" class="headerlink" title="A Large Gap in Indexed Pages"></a>A Large Gap in Indexed Pages</h2><p>A <code>site</code> search on each search engine shows an obvious difference between Baidu and Google.<br>Baidu search results (only a handful of pages are indexed, and even those were submitted through active push and auto push):</p><p>(The full screenshot was blocked, so here is a partial one instead.)</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ujsyasw0j20en0ie42d.jpg" alt="Baidu search results (partial)" title="Baidu search results (partial)"></p><p>Google search results (many pages indexed, with thorough coverage):</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojj5hv3qj20ng0pp0uv.jpg" alt="Google search results" title="Google search results"></p><p>The first clue came from the Baidu Webmaster Tools console (official site: <a href="https://ziyuan.baidu.com/" target="_blank" rel="noopener">https://ziyuan.baidu.com/</a>). It showed that fetching <code>baidusitemap.xml</code> had failed, and the detailed reason was a crawl failure (<code>http</code> status code 403).</p><p>Crawl failed:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojjp6f0jj20um08h3yk.jpg" alt="Crawl failed" title="Crawl failed"></p><p>Summary of the crawl failure reason:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojjzz7kaj20uj0l6wf0.jpg" alt="Summary of the crawl failure reason" title="Summary of the crawl failure reason"></p><p>Based on the failure reason, I first assumed that the file did not exist or that the link could not be opened (the link is <a href="https://www.playpi.org/baidusitemap.xml">https://www.playpi.org/baidusitemap.xml</a>). I tried it with both a browser and the <code>curl</code> command, and the link is fine: it opens normally. Since a 403 error means access was refused, the problem was probably on the Baidu spider's side (that is, <code>GitHub Pages</code> was blocking it from crawling).</p><p>Opening the link in a browser:</p><p><img 
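src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojkc7sggj212a0kbgmb.jpg" alt="The browser opens it normally" title="The browser opens it normally"></p><p>Note that the link shown in Baidu Webmaster Tools starts with <code>http</code>, not <code>https</code> (see the red box in the failure-summary screenshot above), so I believe the Baidu spider crawls the <code>http</code> link. That is not a problem: my DNS configuration covers every variant of the domain, and even if you type the <code>http</code> link into a browser, it automatically redirects to the <code>https</code> one. Occasionally, though, the link above still fails to open (on certain computers or in certain network environments). I suspect this is caused by a local cache or by the current network's <code>DNS</code> settings rather than by my site.</p><p>A case where the browser cannot open the link (the link itself is not the problem):</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojkvts8yj20v60jmjrm.jpg" alt="The browser cannot open the link" title="The browser cannot open the link"></p><p>Opening it from the command line with the <code>curl</code> command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl https://www.playpi.org/baidusitemap.xml</span><br></pre></td></tr></table></figure><p>Screenshot of the command output:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojlapxs2j20ke0f1t9f.jpg" alt="Command output" title="Command output"></p><h2 id="通过百度反馈寻找原因"><a href="# 通过百度反馈寻找原因" class="headerlink" title="Looking for the Cause Through Baidu Feedback"></a>Looking for the Cause Through Baidu Feedback</h2><p>Next, I submitted feedback to Baidu. The official reply only said that the link was the problem, meaning that it could not be opened normally. In fact the link opens fine in a browser or with checking tools. It fails only for the Baidu spider.</p><p>Submitting feedback (official page: <a href="https://ziyuan.baidu.com/feedback/apply" target="_blank" rel="noopener">https://ziyuan.baidu.com/feedback/apply</a>):</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojmrakf5j20v90c3dfy.jpg" alt="Submitting feedback" title="Submitting feedback"></p><p>The reply to my feedback:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojn41t81j20r50n8jsy.jpg" alt="Feedback reply" title="Feedback reply"></p><p>I had already shown that the link works, so I suspected the Baidu spider itself. Following the suggestion in the reply, I tried the diagnostic tool to see whether it could fetch the page.</p><p>The diagnostic tool failed on every attempt:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnhzr7xj21060gqq3w.jpg" alt="The diagnostic tool failed on every attempt" 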
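title="The diagnostic tool failed on every attempt"></p><p>If the crawl <code>UA</code> is set to mobile (that is, emulating phones, tablets and similar devices), some attempts succeed, whereas every attempt with the <code>PC</code> UA fails.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnn4r5bj20zf0l3q49.jpg" alt="Diagnostic tool: partial success with a mobile UA" title="Diagnostic tool: partial success with a mobile UA"></p><p>The failure reason is still access denied (<code>http</code> status code 403).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnt73w4j20rb0ppgmu.jpg" alt="Access denied" title="Access denied"></p><p>I then went through the documentation (at <a href="https://ziyuan.baidu.com/college/courseinfo?id=267&page=9#007" target="_blank" rel="noopener">https://ziyuan.baidu.com/college/courseinfo?id=267&page=9#007</a>). It lists the hosting provider blocking Baidu <code>Spider</code> from the site as one possible cause of denied access. So I guessed that <code>GitHub Pages</code> was rejecting Baidu <code>Spider</code>'s crawl requests, and set out to verify that guess.</p><p>An excerpt from the documentation:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojnz71jxj20rr04dmxf.jpg" alt="Documentation excerpt" title="Documentation excerpt"></p><p>Searching further, I found many reports of this online. Plenty of people had hit the same problem, but no official statement had been published.</p><p>So I replied to the Baidu Webmaster feedback and asked directly whether the spider always got a 403 error because <code>GitHub Pages</code> was blocking it. I waited more than 2 days (over a weekend) and got no clear answer, only boilerplate. Perhaps they did not want to admit it, so I let the matter drop.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r3ajynl5j20z20ne76i.jpg" alt="Second reply from the Baidu feedback center" title="Second reply from the Baidu feedback center"></p><h2 id="通过 -GitHub-Pages- 找原因"><a href="# 通过 -GitHub-Pages- 找原因" class="headerlink" title="Looking for the Cause Through GitHub Pages"></a>Looking for the Cause Through GitHub Pages</h2><p>Meanwhile, I emailed <code>GitHub</code> technical support and received a confirmation: <code>GitHub</code> blocks the Baidu spider and does not promise to lift the block in the future. The main reason is that the Baidu spider used to crawl so aggressively that <code>GitHub Pages</code> became unavailable or slow for other users, so the spider was blocked (that, of course, is the official explanation).</p><p>The <code>GitHub Pages</code> contact form (just fill in your name, email address and a description): <a href="https://github.com/contact" target="_blank" rel="noopener">https://github.com/contact</a></p><p>I sent them an email, written with the help of Google Translate. It is just about readable.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojt87natj235s1zw4k8.jpg" 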
alt="Email content" title="Email content"></p><p>The confirmation page after the email was sent:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojtr12gxj21hc0q9755.jpg" alt="Email sent successfully" title="Email sent successfully"></p><p>I have uploaded the email to <code>GitHub</code>, where readers can download it: <a href="https://github.com/iplaypi/iplaypistudy/tree/master/iplaypistudy-normal/src/resource/20190105" target="_blank" rel="noopener">feedback_to_GitHub.txt</a>. The full text is reproduced below for reference:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">A doubt with GitHub Pages</span><br><span class="line"></span><br><span class="line">Hello,</span><br><span class="line">I created my own homepage with GitHub Pages,it is https://github.com/iplaypi/iplaypi.github.io.If you input https://iplaypi.github.io,it jumps to https://www.playpi.org automatically because of CNAME file.The website is https://www.playpi.org,and my site only contains static pages and pictures.</span><br><span class="line"></span><br><span class="line">But I have a problem,the following is my detailed description:</span><br><span class="line">I use Google Search Console to crawl my pages and include them.I only need to provide a site file named website.xml,and it works fine.</span><br><span class="line"></span><br><span class="line">But when i use Baidu Webmaster Tools (a tool made by a Chinese search engine company),it doesn't work properly.I only need to provide a site file named 
baiduwebsite.xml,Baidu Spider will crawl the link in this file .But Baidu cannot include my pages finally,and the reason is Baidu Spider can't crawl my html pages.</span><br><span class="line"></span><br><span class="line">So,I am trying to find the real reason,then I succeeded.The real reason is Github Pages forbids the crawling of Baidu Spider.So when Baidu Spider crawls my pages,it will definitely fail.</span><br><span class="line"></span><br><span class="line">Here I want to know is this phenomenon real?If yes,why Github Pages forbids Baidu Spider?And what should i do?</span><br><span class="line"></span><br><span class="line">Thanks.</span><br><span class="line">Best regards.</span><br><span class="line">Perry</span><br></pre></td></tr></table></figure><p>A reply arrived within a few hours.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0oju7wg4tj21ar0npjtb.jpg" alt="GitHub's email reply" title="GitHub's email reply"></p><p>I have uploaded part of the reply to <code>GitHub</code> for readers to download: <a href="https://github.com/iplaypi/iplaypistudy/tree/master/iplaypistudy-normal/src/resource/20190105" target="_blank" rel="noopener">GitHub_reply_to_me.txt</a>. The key part of the reply reads:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">I've confirmed that we are currently blocking the Baidu user agent from crawling GitHub Pages sites. We took this action in response to this user agent being responsible for an excessive amount of requests, which was causing availability issues for other GitHub customers. 
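This is unlikely to change any time soon, so if you need the Baidu user agent to be able to crawl your site you will need to host it elsewhere.</span><br></pre></td></tr></table></figure><p>Now look back at the crawl-failure page in Baidu Webmaster Tools. It shows a user agent setting, that is, the header used to build the <code>http</code> request. Its value is <code>Baiduspider/2.0</code>, which is exactly why <code>GitHub Pages</code> blocks the request.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ojur917gj20sp0hz0t7.jpg" alt="The Baidu spider's UA" title="The Baidu spider's UA"></p><h1 id="解决方案"><a href="# 解决方案" class="headerlink" title="Solutions"></a>Solutions</h1><p>At this point the cause of the problem was clear. It would normally be easy to fix: move the static blog to a different host, for example a project-hosting service such as 码市, <code>gitcafe</code> or Qiniu Cloud, or a cloud server of your own. But I did not want to give up <code>GitHub</code>, which made things more complicated.</p><p>That left 3 options to consider:</p><ul><li>Use a <code>CDN</code> to cache every static page, so that the Baidu spider's requests might never reach <code>GitHub Pages</code>. There is no guarantee this works, but it is worth a try.</li><li>Drop the <strong>sitemap</strong> method of <strong>automatic submission</strong> and switch to <strong>active push</strong>, for which <code>hexo</code> has a plugin. However, I prefer to keep things as simple as possible and did not want yet another plugin. That plugin also needs my Baidu account information in its configuration, which I cannot put in a public repository where anyone could see it, so I ruled it out.</li><li>Each time I update the blog, also deploy an identical copy of it <strong>(think of it as a mirror on another host: either a server you run yourself or a code-hosting service similar to GitHub)</strong> by copying over the contents of the <code>master</code> branch. Then use DNS resolution to route the Baidu spider's traffic to that server (purely so that Baidu can index the site), while all other traffic still goes to <code>GitHub Pages</code>. The Baidu spider can then crawl the blog without trouble. This looks roundabout, but once the details are clear it is simple to implement and reliable, so it is a viable option.</li></ul><h2 id="CDN- 加速"><a href="#CDN- 加速" class="headerlink" title="CDN Acceleration"></a>CDN Acceleration</h2><p>I am not choosing this option for now. The services either charge a fee, add ads to the free tier, or are unreliable, and I prefer a safer approach. Candidate products include Qiniu Cloud, Upyun, Alibaba Cloud and Tencent Cloud.</p><h2 id="选择镜像方式"><a href="# 选择镜像方式" class="headerlink" title="Choosing the Mirror Approach"></a>Choosing the Mirror Approach</h2><p>Having decided to copy the blog and use DNS resolution to redirect traffic, I started deploying. I happened to have a <code>VPS</code> that I use for getting around the Great Firewall, with plenty of unused monthly traffic. So rather than use a third-party hosting service, I deployed directly on my own <code>VPS</code>. The only extra work was setting up a <code>Web</code> server, and the obvious choice was the powerful <code>Nginx</code>.</p><h3 id="更改域名服务器和相关配置"><a href="# 更改域名服务器和相关配置" class="headerlink" title="Changing the Name Servers and Related Settings"></a>Changing the Name Servers and Related Settings</h3><p>1. Add the domain in <code>DNSPod</code></p><p>Register a <code>DNSPod</code> account first. I use the free plan, which naturally has some limits. For example, a host name can have at most 2 <code>A</code> records. (<code>GitHub Pages</code> has 4 <code>ip</code> addresses and I had configured all 4 in <code>Godaddy</code>, but configuring only 2 works just as well. Alternatively, simply configure a <code>CNAME</code> record. I did not know this before and configured <code>ip</code> addresses, which was a hassle: I had to obtain them by running <code>ping iplaypi.github.io</code>, the result differed from run to run, and collecting all 4 turned out to be unnecessary. Of course, if the domain name is blocked but the <code>ip</code> addresses are not, you still need to configure it this way.)<br><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0oviygtn3j21hc0qxgnz.jpg" alt="Adding the domain in DNSPod" title="Adding the domain in DNSPod"></p><p>2. Add DNS records</p><p>I simply copied the records over from <code>Godaddy</code>. Because I use the <code>DNSPod</code> free plan, there are 2 fewer <code>A</code> records, which makes essentially no difference <strong>(in fact it is best not to configure A records at all and just configure a CNAME, which resolves to the ip addresses automatically; I did not know this before)</strong>. I also added one <code>A</code> record for <code>www</code> specifically for the Baidu spider. Its resolution line is set to Baidu, and it points to my own server's <code>ip</code>. (The screenshot is only for illustration: a <code>CNAME</code> record should point to a domain name, and only an <code>A</code> record points to an <code>ip</code>.) If you use a third-party hosting service instead, just add a <code>CNAME</code> record pointing to its domain name (for example <code>yoursite.gitcafe.io</code>).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovjinxzvj21hc0qxac2.jpg" alt="Adding DNS records" title="Adding DNS records"></p><p>The configuration without <code>A</code> records:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovk0xljij21hc0qxta3.jpg" alt="Configuration without A records" title="Configuration without A records"></p><p>3. Set custom name servers in <code>Godaddy</code></p><p>With step 2 done, go back to the domain page in <code>DNSPod</code>, which prompts you to change the <code>NS</code> addresses. If you are not sure what that means, click the link in the prompt to open the help manual. In short, you go to the registrar where you bought the domain and set <code>DNSPod</code>'s name servers there.</p><p>The prompt to change the <code>NS</code> addresses:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovkf6k08j21hc0qxtb9.jpg" alt="Prompt to change the NS addresses" title="Prompt to change the NS addresses"></p><p>The help manual:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovkoigmaj20s60lymyk.jpg" alt="Help manual" title="Help manual"></p><p>I bought the domain from <code>Godaddy</code> (no ICP filing required), so in <code>Godaddy</code> I had to replace the default <code>DNS</code> name servers with the ones assigned by <code>DNSPod</code>. Once the new name servers are in place, the DNS records previously configured in <code>Godaddy</code> no longer have any effect. <code>Godaddy</code> hands all DNS resolution over to the <code>DNSPod</code> name servers I configured, so the configuration now lives entirely in <code>DNSPod</code> (the work done in steps 1 and 2).</p><p>The original DNS records and the original name servers:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovn10ghwj20wt0mvgm6.jpg" alt="Original DNS records" title="Original DNS records"></p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovncqfzuj20ww0atdfw.jpg" alt="Original name servers" title="Original name servers"></p><p>The new name servers after configuration (the old DNS records are gone):</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovnq0lhgj20yi0m7q3i.jpg" alt="New name servers configured" title="New name servers configured"></p><p>After configuring, use the <strong>Self-diagnosis</strong> feature under <strong>Domain Settings</strong>. It reports problems with the domain, mainly because too little time has passed since the change. You need to wait for recursive DNS servers around the world to refresh (up to 72 hours), although the home page is usually reachable within about 10 minutes.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ovl23kqbj20tl0lfabk.jpg" alt="Self-diagnosis" title="Self-diagnosis"></p><h3 id="设置镜像服务器"><a href="# 设置镜像服务器" class="headerlink" title="Setting Up the Mirror Server"></a>Setting Up the Mirror Server</h3><p>Instead of a third-party hosting service such as <code>gitcafe</code>, 码市 or <code>coding</code>, I used my own <code>VPS</code> together with <code>Nginx</code>.</p><h4 id="安装 -Nginx(基于 -CentOS-7-X64)"><a href="# 安装 -Nginx(基于 -CentOS-7-X64)" class="headerlink" title="Installing Nginx (on CentOS 7 x64)"></a>Installing Nginx (on CentOS 7 x64)</h4><p>For the installation on <code>CentOS</code> I referred to <a href="https://gist.github.com/ifels/c8cfdfe249e27ffa9ba1" target="_blank" rel="noopener">https://gist.github.com/ifels/c8cfdfe249e27ffa9ba1</a>. Not everything in it is reliable, so take only the useful parts. This method installs a prebuilt, fairly large package that bundles a set of common modules, some of which you may never use. If you later want to add other modules, the package cannot accommodate them, and you have to download the source and compile it yourself. In short, this installation method is meant for beginners and cannot be customized.</p><p>1. Because of where the <code>Nginx</code> package comes from (it is not in the default <code>CentOS</code> repositories), first create a repository configuration file</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> /etc/yum.repos.d/</span><br><span class="line">vim nginx.repo</span><br></pre></td></tr></table></figure><p>Fill in the following content:</p><figure 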
class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[nginx]</span><br><span class="line">name=nginx repo</span><br><span class="line">baseurl=http://nginx.org/packages/centos/$releasever/$basearch/</span><br><span class="line">gpgcheck=0</span><br><span class="line">enabled=1</span><br></pre></td></tr></table></figure><p>2. Install and configure <code>Nginx</code></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Install</span></span><br><span class="line">yum install nginx -y</span><br><span class="line"><span class="comment"># Configure</span></span><br><span class="line">vi /etc/nginx/nginx.conf</span><br></pre></td></tr></table></figure><p>I have uploaded a template of the configuration below to <code>GitHub</code> for readers to download: <a href="https://github.com/iplaypi/iplaypistudy/tree/master/iplaypistudy-normal/src/resource/20190105" target="_blank" rel="noopener">nginx_conf_http.template</a>.</p><p>Fill in the configuration:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line">user nginx;</span><br><span class="line">worker_processes 1;</span><br><span class="line"></span><br><span class="line">error_log /var/log/nginx/error.log warn;</span><br><span class="line">pid /var/run/nginx.pid;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">events {</span><br><span class="line"> worker_connections 1024;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">http {</span><br><span class="line"> include /etc/nginx/mime.types;</span><br><span class="line"> default_type application/octet-stream;</span><br><span class="line"></span><br><span class="line"> log_format main '$remote_addr - $remote_user [$time_local] "$request" '</span><br><span class="line"> '$status $body_bytes_sent "$http_referer" '</span><br><span class="line"> '"$http_user_agent" "$http_x_forwarded_for"';</span><br><span class="line"></span><br><span class="line"> access_log /site/nginx.access.log main;</span><br><span class="line"></span><br><span class="line"> server {</span><br><span class="line"> listen 80;</span><br><span class="line"> server_name blog.playpi.org www.playpi.org;</span><br><span class="line"> access_log /site/iplaypi.github.io.access.log main;</span><br><span class="line"> root /site/iplaypi.github.io;</span><br><span class="line"> 
}</span><br><span class="line"></span><br><span class="line"> sendfile on;</span><br><span class="line"> #tcp_nopush on;</span><br><span class="line"></span><br><span class="line"> keepalive_timeout 65;</span><br><span class="line"></span><br><span class="line"> #gzip on;</span><br><span class="line"></span><br><span class="line"> include /etc/nginx/conf.d/*.conf;</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>3. Open port 80 (it will not work otherwise) and start <code>Nginx</code></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># List the ports that are already open</span></span><br><span class="line">firewall-cmd --list-ports</span><br><span class="line"><span class="comment"># Open port 80 permanently</span></span><br><span class="line">firewall-cmd --permanent --zone=public --add-port=80/tcp</span><br><span class="line"><span class="comment"># Reload the firewall so the new rule takes effect</span></span><br><span class="line">firewall-cmd --reload</span><br><span class="line"><span class="comment"># Start Nginx</span></span><br><span class="line"><span class="comment"># This does not work: the init script is not found</span></span><br><span class="line">/etc/init.d/nginx start</span><br><span class="line"><span class="comment"># This works (on CentOS 7 it is forwarded to systemctl)</span></span><br><span class="line">service nginx start</span><br></pre></td></tr></table></figure><h4 id="额外考虑情况"><a href="# 额外考虑情况" class="headerlink" title="Additional considerations"></a>Additional considerations</h4><p><strong>1. About HTTPS verification</strong></p><p>Do I need to handle <code>https</code>? If Baiduspider never crawled over <code>https</code> (this covers not only the <code>sitemap.xml</code> file itself but also every link inside it, which all use <code>https</code> as well), I could ignore it. In practice it must be handled: Baiduspider does crawl <code>https</code> links, so <code>https</code> has to be enabled in <code>Nginx</code> too. Moreover, Baidu's <code>https</code> verification also requires the site to serve <code>https</code>; otherwise the application is rejected.</p><p>At some point my domain's verification started failing, although it had passed at first (possibly because <code>GitHub Pages</code> blocks Baiduspider, since the site used to be served entirely by <code>GitHub Pages</code>).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmkhqwygj20z40lojsp.jpg" alt="HTTPS verification failed" title="HTTPS verification failed"></p><p>I wanted to verify again, but it turns out the number of attempts is limited, so I decided to enable <code>https</code> in <code>Nginx</code> first and verify afterwards.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmkls6dkj20e008mweh.jpg" alt="Limit on re-verification attempts" title="Limit on re-verification attempts"></p><p>The goal: enable <code>https</code> in <code>Nginx</code> and make sure every link on the site uses <code>https</code>, while still accepting <code>http</code> and redirecting it to <code>https</code> with a 301.</p><p>1-1. Check the <code>https</code> module of <code>Nginx</code></p><p>First check whether the stock <code>Nginx</code> package I installed includes the <code>https</code> module. Run <strong>nginx -V</strong>: the output contains <strong>--with-http_ssl_module</strong>, so it does.<br><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0prgtihdkj21gr060js0.jpg" alt="Checking the SSL module" title="Checking the SSL module"></p><p>1-2. Obtain a certificate</p><p>You can buy a certificate or apply for a free one from Alibaba Cloud or Tencent Cloud, but I found it more convenient to generate one myself with <code>OpenSSL</code>. First check whether <code>OpenSSL</code> is installed with <strong>openssl version</strong>; if it is not, install it with <strong>yum install -y openssl openssl-devel</strong>. Then generate the certificate:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">openssl req -x509 -nodes -days 36500 -newkey rsa:2048 -keyout /site/ssl-nginx.key -out /site/ssl-nginx.crt</span><br></pre></td></tr></table></figure><p>During generation you are prompted for several fields: country, city, organization name, organizational unit, domain name, email address, and so on. Note that, so that several subdomains can share one certificate, I used a wildcard domain (the asterisk matches any subdomain: <code>*.playpi.org</code>). A certificate generated this way is only good for testing: it is self-signed, so browsers will warn that it is not trusted, and it is better to obtain a certificate some other way (I later got free certificates from Alibaba Cloud or <code>letsencrypt</code>; search this site for the keyword <strong>证书</strong> ("certificate"), or read directly: <a href="https://www.playpi.org/2019030401.html">Applying for a free SSL certificate with Alibaba Cloud</a>).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0prhfu9p8j20p40chwf7.jpg" alt="Certificate fields" title="Certificate fields"></p><p>The complete set of answers:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">Generating a 2048 bit RSA private key</span><br><span class="line">........+++</span><br><span class="line">..............+++</span><br><span class="line">writing new private key to '/site/ssl-nginx.key'</span><br><span class="line">-----</span><br><span class="line">You are about to be asked to enter information that will be incorporated</span><br><span class="line">into your certificate request.</span><br><span class="line">What you are about to enter is what is called a Distinguished Name or a DN.</span><br><span class="line">There are quite a few fields but you can leave some blank</span><br><span class="line">For some fields there will be a default value,</span><br><span class="line">If you enter '.', the field will be left blank.</span><br><span class="line">-----</span><br><span class="line">Country Name (2 letter code) [XX]:CN</span><br><span class="line">State or Province Name (full name) []:Guangdong</span><br><span class="line">Locality Name (eg, city) [Default City]:Guangzhou</span><br><span class="line">Organization 
Name (eg, company) [Default Company Ltd]:playpi</span><br><span class="line">Organizational Unit Name (eg, section) []:playpi</span><br><span class="line">Common Name (eg, your name or your server's hostname) []:*.playpi.org</span><br><span class="line">Email Address []:playpi@qq.com</span><br></pre></td></tr></table></figure><p>1-3. Update the configuration and reload <code>Nginx</code></p><p>Reconfigure the <code>http</code> and <code>https</code> settings (only the main <code>server</code> blocks are listed; the <code>blog</code> subdomain is mainly for testing, and all of its traffic goes to my <code>VPS</code>). Pay particular attention to the regular expression in <code>rewrite</code>: it must swap only the scheme and domain and keep the path, which <code>$1</code> captures; otherwise every request ends up redirected to the home page.</p><p>I have uploaded a template of the configuration below to <code>GitHub</code>; you can download it here: <a href="https://github.com/iplaypi/iplaypistudy/tree/master/iplaypistudy-normal/src/resource/20190105" target="_blank" rel="noopener">nginx_conf_https.template</a>.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span 
class="line"># Only the server blocks are listed here; read them together with nginx_conf_http.template</span><br><span class="line">server {</span><br><span class="line"> listen 80;</span><br><span class="line"> server_name www.playpi.org;</span><br><span class="line"> access_log /site/iplaypi.github.io.http-www-access.log main;</span><br><span class="line"> rewrite ^/(.*)$ https://www.playpi.org/$1 permanent;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> server {</span><br><span class="line"> listen 80;</span><br><span class="line"> server_name blog.playpi.org;</span><br><span class="line"> access_log /site/iplaypi.github.io.http-blog-access.log main;</span><br><span class="line"> rewrite ^/(.*)$ https://blog.playpi.org/$1 permanent;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> server {</span><br><span class="line"> listen 443 ssl; # listening port</span><br><span class="line"> server_name www.playpi.org blog.playpi.org; # domain names</span><br><span class="line"> access_log /site/iplaypi.github.io.https-access.log main;</span><br><span class="line"> root /site/iplaypi.github.io;</span><br><span class="line"> ssl_certificate /site/ssl-nginx.crt; # certificate path</span><br><span class="line"> ssl_certificate_key /site/ssl-nginx.key; # private key path</span><br><span class="line"> ssl_session_cache shared:SSL:1m; # type and size of the SSL session cache</span><br><span class="line"> ssl_session_timeout 5m; # session timeout</span><br><span class="line"> ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4; # ciphers the server accepts for secure connections</span><br><span class="line"> ssl_protocols TLSv1 TLSv1.1 TLSv1.2;</span><br><span class="line"> ssl_prefer_server_ciphers on; # prefer the server's cipher order over the client's</span><br><span class="line"> # Mitigate clickjacking</span><br><span class="line"> add_header X-Frame-Options DENY;</span><br><span class="line"> # Stop browsers from MIME-sniffing the content type</span><br><span class="line"> add_header X-Content-Type-Options nosniff;</span><br><span class="line"> 
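# Enable the browser's XSS filter</span><br><span class="line"> add_header X-Xss-Protection 1;</span><br><span class="line"> }</span><br></pre></td></tr></table></figure><p>Open port 443 and reload <code>Nginx</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># List the ports that are already open</span></span><br><span class="line">firewall-cmd --list-ports</span><br><span class="line"><span class="comment"># Open port 443 permanently</span></span><br><span class="line">firewall-cmd --permanent --zone=public --add-port=443/tcp</span><br><span class="line"><span class="comment"># Reload the firewall so the new rule takes effect</span></span><br><span class="line">firewall-cmd --reload</span><br><span class="line"><span class="comment"># Check that the Nginx configuration is valid</span></span><br><span class="line">nginx -t</span><br><span class="line"><span class="comment"># Reload Nginx with the new configuration</span></span><br><span class="line">nginx -s reload</span><br></pre></td></tr></table></figure><p>1-4. Open a link to check</p><p>Test with the <code>blog</code> subdomain (this also requires an <code>A</code> record for it in <code>DNSPod</code>):</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pri38049j21hc0rymzk.jpg" alt="Testing with the blog subdomain" title="Testing with the blog subdomain"></p><p>Alternatively, simulate the request with <code>curl</code>; because of the redirect, it does not return the page:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0proxgda3j20u204q0ss.jpg" alt="curl cannot fetch the redirected content" title="curl cannot fetch the redirected content"></p><p>Since <code>https</code> is now enabled, you can request the <code>https</code> link directly with <code>curl</code>, using the <code>-k</code> option to skip verification of the invalid certificate:</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0qmesphe5j20rb0g0q3o.jpg" alt="curl accessing the https link with certificate verification disabled" title="curl accessing the https link with certificate verification disabled"></p><p>Then resubmit the <code>https</code> verification in Baidu Webmaster Tools. (Verification fails with the test certificate above, so I applied for certificates from Alibaba Cloud instead, and verification succeeded; the tutorial can be found by searching this site. To give the two subdomains different certificates, the <code>server</code> blocks in <code>Nginx</code> also had to be reconfigured.)</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r33rvt0oj21hc0q9tdh.jpg" alt="blog subdomain verified successfully" title="blog subdomain verified successfully"></p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0r34hh1lxj21hc0q90xi.jpg" alt="www subdomain verified successfully" title="www subdomain verified successfully"></p><p><strong>2. About ports</strong></p><p>Why was it enough, when configuring the DNS records above, to point the Baidu-line <code>A</code> record at the <code>ip</code> of the <code>VPS</code>? An <code>A</code> record only maps a name to an IP address and carries no port; clients connect to the protocol's default port (80 for <code>http</code>, 443 for <code>https</code>). <code>Nginx</code> is the only <code>Web</code> service on my <code>VPS</code>, and it listens on those ports (80 by default, which is configurable); requests for <code>www</code> are served there (as configured in <code>Nginx</code>, together with another <code>server</code> for <code>blog</code>), so the port can be ignored. If a machine runs several <code>Web</code> services, however, make sure their ports do not conflict (for example when <code>Tomcat</code> and <code>Nginx</code> run side by side), give port 80 to <code>Nginx</code>, and let <code>Nginx</code> proxy requests to the other services (for example forwarding the <code>email</code> subdomain to one port and the <code>blog</code> subdomain to another).</p><h4 id="完善自动获取更新脚本,拉取 -mater- 分支的静态页面"><a href="# 完善自动获取更新脚本,拉取 -mater- 分支的静态页面" class="headerlink" title="Improving the auto-update script: pulling the static pages from the master branch"></a>Improving the auto-update script: pulling the static pages from the master branch</h4><p><strong>1. Start with the simple approach</strong></p><p>Just clone the project with <code>git</code> into <code>/site/iplaypi.github.io</code>.</p><p><strong>2. Use a hook to pull the master branch into the target directory automatically</strong></p><p>The simplest approach would be to copy the generated static pages straight to the target host during the <code>travis</code> build, i.e. copy the contents of the <code>public</code> directory to my server with a command such as <code>scp</code>. But that felt too crude; I wanted to use <code>GitHub</code>'s webhook feature instead: whenever a <code>push</code> happens on the project, a script on my server is triggered automatically and performs a <code>pull</code>.</p><p>For details, see my other post: <a href="https://www.playpi.org/2019030601.html">Automatic code updates with GitHub WebHooks</a>.</p><h2 id="验证结果"><a href="# 验证结果" class="headerlink" title="Verifying the result"></a>Verifying the result</h2><p>All of the checks below were done before <code>https</code> was enabled, i.e. without the 301 redirect for <code>http</code>. With the 301 redirect in place the screenshots would look slightly different, and <code>curl</code> would fail outright (you need to request the <code>https</code> link instead).</p><p>The simplest way to verify is to run <strong>抓取诊断</strong> (Fetch Diagnostics) in Baidu Webmaster Tools several times and check whether the success rate is 100%. In my tests every fetch succeeded, so the next step was just to wait for Baidu to crawl on its own (Baiduspider fetches <code>sitemap.xml</code> infrequently; it may take a week).</p><p>There was a hiccup while verifying with Fetch Diagnostics: it failed no matter how often I tried, yet simulated requests with <code>curl</code> succeeded. The failure summary showed that the fetched <code>ip</code> address still belonged to <code>GitHub Pages</code>, meaning Baiduspider's traffic was not reaching my own <code>VPS</code>. At first I suspected the <code>DNSPod</code> settings had not taken effect, but since the simulated <code>curl</code> requests worked, the <code>DNSPod</code> configuration was fine; the problem was on Baidu's side, most likely caching. I then switched between the mobile <code>UA</code> and the <code>PC</code> <code>UA</code>, and after that it worked.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmjyvmbtj218h0qx76r.jpg" alt="Verifying with Fetch Diagnostics" title="Verifying with Fetch Diagnostics"></p><p>In addition, since we know the user agent Baiduspider sends, we can simulate its requests directly with <code>curl</code> and check whether the <code>http</code> response is normal. The command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl -A <span class="string">"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"</span> http://blog.playpi.org/baidusitemap.xml</span><br></pre></td></tr></table></figure><p>The simulated request returns a normal result as well (the screenshot below was taken before <code>https</code> was enabled; once the 301 redirect is on, this no longer works and you need to request the <code>https</code> link directly).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmjky1dwj20v50hnq3v.jpg" alt="Result of the simulated request" title="Result of the simulated request"></p><p>With <code>https</code> enabled, i.e. with <code>http</code> requests 301-redirected, request the <code>https</code> link directly (if the certificate is invalid, as in my screenshot, add the <code>-k</code> option so that <code>curl</code> skips certificate verification).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0qmguu51qj21660i9ab1.jpg" alt="curl accessing the https link with certificate verification disabled" title="curl accessing the https link with certificate verification disabled"></p><p>I also checked the <code>Nginx</code> logs on the <code>VPS</code>: Baiduspider's traffic was indeed arriving there, and everyone was happy.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0pmj6s4syj21150a4dhn.jpg" 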
alt="Nginx log" title="Nginx log"></p><p>After that, I still had to watch Baidu's indexing results (updated three days later; see below).</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0uis3rezoj21hc0q9wgm.jpg" alt="Links submitted via sitemap taking effect" title="Links submitted via sitemap taking effect"></p><h1 id="问题总结"><a href="# 问题总结" class="headerlink" title="Summary"></a>Summary</h1><p>1. This post took me more than a month to complete. Not that the writing itself took a month: from discovering the problem to solving it and finally writing it up, more than a month passed. During that time I read many other people's blog posts, asked several people, and read some technical material. I learned a lot that I had not known before, and solving the problem hands-on made the whole process very rewarding.</p><p>2. When writing <code>Markdown</code>, code blocks are fenced with three backticks. If you are not familiar with the language in a code block, omit the language tag (such as <code>java</code>, <code>bash</code>, <code>javascript</code>); a wrong tag produces a blank generated <code>html</code> page. Likewise, if a code block contains plain English text unrelated to any programming language, leave out the tag as well, or the generated <code>html</code> page will also be blank.</p><p>3. Through hands-on practice I learned some networking concepts, such as <code>CNAME</code>, <code>A</code> records, name servers, subdomains and <code>https</code> certificates, as well as some <code>Nginx</code>.</p><p>4. On access speed: the <code>CDN</code> behind <code>GitHub Pages</code> is quite strong, and pages do not stall. Sometimes, though, <code>GitHub</code> seems to be blocked in China and cannot be opened. Moreover, the whole point of this effort was to get Baiduspider to index my posts, so the self-hosted <code>VPS</code> serves only Baiduspider; ordinary visitors and other crawlers still go to the <code>GitHub Pages</code> site.</p><p>5. About <code>https</code>: with <code>GitHub Pages</code>, the whole service is provided by <code>GitHub Pages</code> and I did not need to care. Once I run my own mirror on a <code>VPS</code>, however, it must offer Baiduspider exactly the same environment; otherwise things fail, for example <code>https</code> verification or link crawling. So <code>https</code> must be enabled, while <code>http</code> remains supported. Below is a flowchart of the network requests, which makes the setup clear.</p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/old/b7f2e3a3gy1g0ptlseegej20p00howf6.jpg" alt="Network request flowchart" title="Network request flowchart"></p><p>I have uploaded the source file of the flowchart above to <code>GitHub</code>; you can download it here: <a href="https://github.com/iplaypi/iplaypistudy/tree/master/iplaypistudy-normal/src/resource/20190105" target="_blank" rel="noopener">网络请求流程图.gliffy</a>.</p></div><div><ul class="post-copyright"><li class="post-copyright-author"><strong>Author:</strong> 虾丸派</li><li class="post-copyright-link"><strong>Link:</strong> <a href="https://www.playpi.org/2019010501.html" title="The problem of GitHub Pages blocking Baiduspider">https://www.playpi.org/2019010501.html</a></li><li class="post-copyright-license"><strong>Copyright notice: </strong>Unless otherwise stated, all posts on this blog are licensed under <a href="https://creativecommons.org/licenses/by-nc-sa/3.0/" rel="external nofollow" target="_blank">CC BY-NC-SA 3.0</a>. Please credit the source when reposting.</li></ul></div><footer class="post-footer"><div class="post-tags"><a href="/tags/building-chn/" rel="tag"><i class="fa fa-tag"></i> Site building</a> <a href="/tags/GitHub-Pages/" rel="tag"><i class="fa fa-tag"></i> GitHub Pages</a> <a href="/tags/SEO/" rel="tag"><i class="fa fa-tag"></i> SEO</a> <a href="/tags/Baidu-Spider-chn/" rel="tag"><i class="fa fa-tag"></i> Baidu spider</a> <a href="/tags/Baiduspider/" rel="tag"><i class="fa fa-tag"></i> Baiduspider</a></div><div class="post-nav"><div class="post-nav-next post-nav-item"><a href="/2019010101.html" rel="next" title="Backing up Weibo movie write-ups"><i class="fa fa-chevron-left"></i> Backing up Weibo movie write-ups</a></div><span class="post-nav-divider"></span><div class="post-nav-prev post-nav-item"><a href="/2019010601.html" rel="prev" title="The issue of Gson converting = to u003d">The issue of Gson converting = to u003d <i class="fa fa-chevron-right"></i></a></div></div></footer></div></article><div class="post-spread"></div></div></div><div class="comments" id="comments"><div 
id="vcomments"></div></div></div><div class="sidebar-toggle"><div class="sidebar-toggle-line-wrap"><span class="sidebar-toggle-line sidebar-toggle-line-first"></span> <span class="sidebar-toggle-line sidebar-toggle-line-middle"></span> <span class="sidebar-toggle-line sidebar-toggle-line-last"></span></div></div><aside id="sidebar" class="sidebar"><div class="sidebar-inner"><ul class="sidebar-nav motion-element"><li class="sidebar-nav-toc sidebar-nav-active" data-target="post-toc-wrap">Table of Contents</li><li class="sidebar-nav-overview" data-target="site-overview-wrap">Overview</li></ul><section class="site-overview-wrap sidebar-panel"><div class="site-overview"><div class="site-author motion-element" itemprop="author" itemscope itemtype="http://schema.org/Person"><img class="site-author-image" itemprop="image" src="/images/favicon-1536x1536-playpi.png" alt="虾丸派"><p class="site-author-name" itemprop="name">虾丸派</p><p class="site-description motion-element" itemprop="description">Recording knowledge | Sharing technology</p></div><nav class="site-state motion-element"><div class="site-state-item site-state-posts"><a href="/archives/"><span class="site-state-item-count">144</span> <span class="site-state-item-name">posts</span></a></div><div class="site-state-item site-state-categories"><a href="/categories/index.html"><span class="site-state-item-count">13</span> <span class="site-state-item-name">categories</span></a></div><div class="site-state-item site-state-tags"><a href="/tags/index.html"><span class="site-state-item-count">294</span> <span class="site-state-item-name">tags</span></a></div></nav><div class="feed-link motion-element"><a href="/atom.xml" rel="alternate"><i class="fa fa-rss"></i> RSS</a></div><div class="links-of-author motion-element"><span class="links-of-author-item"><a href="https://github.com/iplaypi" target="_blank" title="GitHub"><i class="fa fa-fw fa-github"></i>GitHub</a> </span><span class="links-of-author-item"><a href="https://weibo.com/u/3086148515" target="_blank" title="Weibo"><i class="fa 
fa-fw fa-weibo"></i>Weibo</a> </span><span class="links-of-author-item"><a href="mailto:playpi@qq.com" target="_blank" title="E-Mail"><i class="fa fa-fw fa-envelope"></i>E-Mail</a></span></div><div class="cc-license motion-element" itemprop="license"><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" class="cc-opacity" target="_blank" rel="external nofollow"><img src="/images/cc-by-nc-sa.svg" alt="Creative Commons"></a></div><div class="links-of-blogroll motion-element links-of-blogroll-inline"><div class="links-of-blogroll-title"><i class="fa fa-fw fa-link"></i> Links</div><ul class="links-of-blogroll-list"><li class="links-of-blogroll-item"><a href="https://github.com/iplaypi" title="GitHub" target="_blank" rel="external nofollow">GitHub</a></li><li class="links-of-blogroll-item"><a href="https://weibo.com/u/3086148515" title="Weibo" target="_blank" rel="external nofollow">Weibo</a></li><li class="links-of-blogroll-item"><a href="https://www.playpi.org" title="虾丸派" target="_blank" rel="external nofollow">虾丸派</a></li><li class="links-of-blogroll-item"><a href="https://www.playpi.org" title="playpi" target="_blank" rel="external nofollow">playpi</a></li><li class="links-of-blogroll-item"><a href="https://www.liaoxuefeng.com" title="廖雪峰" target="_blank" rel="external nofollow">廖雪峰</a></li><li class="links-of-blogroll-item"><a href="http://www.ruanyifeng.com" title="阮一峰" target="_blank" rel="external nofollow">阮一峰</a></li><li class="links-of-blogroll-item"><a href="https://travis-ci.org/iplaypi/iplaypi.github.io" title="travis-ci" target="_blank" rel="external nofollow">travis-ci</a></li><li class="links-of-blogroll-item"><a href="https://www.vultr.com/?ref=7861302-4F" title="Vultr" target="_blank" rel="external nofollow">Vultr</a></li></ul></div></div></section><section class="post-toc-wrap motion-element sidebar-panel sidebar-panel-active"><div class="post-toc"><div class="post-toc-content"><ol class="nav"><li class="nav-item nav-level-1"><a class="nav-link" 
href="#问题出现"><span class="nav-number">1.</span> <span class="nav-text">The problem</span></a><ol class="nav-child"><li class="nav-item nav-level-2"><a class="nav-link" href="#网页收录对比差距大"><span class="nav-number">1.1.</span> <span class="nav-text">A large gap in indexed pages</span></a></li><li class="nav-item nav-level-2"><a class="nav-link" href="#通过百度反馈寻找原因"><span class="nav-number">1.2.</span> <span class="nav-text">Looking for the cause through Baidu feedback</span></a></li><li class="nav-item nav-level-2"><a class="nav-link" href="#通过 -GitHub-Pages- 找原因"><span class="nav-number">1.3.</span> <span class="nav-text">Looking for the cause in GitHub Pages</span></a></li></ol></li><li class="nav-item nav-level-1"><a class="nav-link" href="#解决方案"><span class="nav-number">2.</span> <span class="nav-text">Solution</span></a><ol class="nav-child"><li class="nav-item nav-level-2"><a class="nav-link" href="#CDN- 加速"><span class="nav-number">2.1.</span> <span class="nav-text">CDN acceleration</span></a></li><li class="nav-item nav-level-2"><a class="nav-link" href="#选择镜像方式"><span class="nav-number">2.2.</span> <span class="nav-text">Choosing the mirror approach</span></a><ol class="nav-child"><li class="nav-item nav-level-3"><a class="nav-link" href="#更改域名服务器和相关配置"><span class="nav-number">2.2.1.</span> <span class="nav-text">Changing the name servers and related settings</span></a></li><li class="nav-item nav-level-3"><a class="nav-link" href="#设置镜像服务器"><span class="nav-number">2.2.2.</span> <span class="nav-text">Setting up the mirror server</span></a><ol class="nav-child"><li class="nav-item nav-level-4"><a class="nav-link" href="#安装 -Nginx(基于 -CentOS-7-X64)"><span class="nav-number">2.2.2.1.</span> <span class="nav-text">Installing Nginx (on CentOS 7 x64)</span></a></li><li class="nav-item nav-level-4"><a class="nav-link" href="#额外考虑情况"><span class="nav-number">2.2.2.2.</span> <span class="nav-text">Additional considerations</span></a></li><li class="nav-item nav-level-4"><a class="nav-link" href="#完善自动获取更新脚本,拉取 -mater- 分支的静态页面"><span class="nav-number">2.2.2.3.</span> <span class="nav-text">Improving the auto-update script: pulling the static pages from the master branch</span></a></li></ol></li></ol></li><li 
class="nav-item nav-level-2"><a class="nav-link" href="#验证结果"><span class="nav-number">2.3.</span> <span class="nav-text">Verifying the result</span></a></li></ol></li><li class="nav-item nav-level-1"><a class="nav-link" href="#问题总结"><span class="nav-number">3.</span> <span class="nav-text">Summary</span></a></li></ol></div></div></section></div></aside></div></main><footer id="footer" class="footer"><div class="footer-inner"><div class="copyright">© 2016–<span itemprop="copyrightYear">2021</span> <span class="post-meta-divider">|</span> <span class="with-love"><i class="fa fa-heart"></i> </span><span class="author" itemprop="copyrightHolder">虾丸派</span> <span class="post-meta-divider">|</span> <span class="post-meta-item-icon"><i class="fa fa-area-chart"></i> </span><span class="post-meta-item-text">Total word count</span> <span title="Total word count">326.3k characters</span></div><div class="powered-by">Powered by <a class="theme-link" target="_blank" href="https://hexo.io" rel="external nofollow">Hexo</a></div><span class="post-meta-divider">|</span><div class="theme-info">Theme <a class="theme-link" target="_blank" href="https://github.com/iissnan/hexo-theme-next" rel="external nofollow">NexT.Mist</a><script async src="//busuanzi.ibruce.info/busuanzi/2.3/busuanzi.pure.mini.js"></script><span id="busuanzi_container_site_pv" style="display:none"><span class="post-meta-divider">|</span> Total views: <span id="busuanzi_value_site_pv"></span> </span><span id="busuanzi_container_site_uv" style="display:none"><span class="post-meta-divider">|</span> Total visitors: <span id="busuanzi_value_site_uv"></span></span></div><div class="busuanzi-count"><script async src="https://dn-lbstatics.qbox.me/busuanzi/2.3/busuanzi.pure.mini.js"></script></div></div></footer><div class="back-to-top"><i class="fa fa-arrow-up"></i> <span id="scrollpercent"><span>0</span>%</span></div></div><script type="text/javascript">"[object Function]"!==Object.prototype.toString.call(window.Promise)&&(window.Promise=null)</script><script type="text/javascript" 
src="/lib/jquery/index.js?v=2.1.3"></script><script type="text/javascript" src="/lib/fastclick/lib/fastclick.min.js?v=1.0.6"></script><script type="text/javascript" src="/lib/jquery_lazyload/jquery.lazyload.js?v=1.9.7"></script><script type="text/javascript" src="/lib/velocity/velocity.min.js?v=1.2.1"></script><script type="text/javascript" src="/lib/velocity/velocity.ui.min.js?v=1.2.1"></script><script type="text/javascript" src="/lib/fancybox/source/jquery.fancybox.pack.js?v=2.1.5"></script><script type="text/javascript" src="/js/src/utils.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/motion.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/scrollspy.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/post-details.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/bootstrap.js?v=5.1.3"></script><script src="//unpkg.com/valine@1.3.7/dist/Valine.min.js"></script><script type="text/javascript">new Valine({av:AV,el:"#comments",verify:!1,notify:!1,app_id:"FC5Jijeg1meo2K2OzPYWK327-gzGzoHsz",app_key:"6A1ReY8tjhPutK00F01YbJSq",placeholder:"没有问题吗?"})</script><script type="text/javascript">var isfetched=!1,isXml=!0,search_path="search.xml";0===search_path.length?search_path="search.xml":/json$/i.test(search_path)&&(isXml=!1);var path="/"+search_path,onPopupClose=function(t){$(".popup").hide(),$("#local-search-input").val(""),$(".search-result-list").remove(),$("#no-result").remove(),$(".local-search-pop-overlay").remove(),$("body").css("overflow","")};function proceedsearch(){$("body").append('<div class="search-popup-overlay local-search-pop-overlay"></div>').css("overflow","hidden"),$(".search-popup-overlay").click(onPopupClose),$(".popup").toggle();var t=$("#local-search-input");t.attr("autocapitalize","none"),t.attr("autocorrect","off"),t.focus()}var searchFunc=function(t,e,s){"use strict";$("body").append('<div class="search-popup-overlay local-search-pop-overlay"><div id="search-loading-icon"><i class="fa 
fa-spinner fa-pulse fa-5x fa-fw"></i></div></div>').css("overflow","hidden"),$("#search-loading-icon").css("margin","20% auto 0 auto").css("text-align","center"),$.ajax({url:t,dataType:isXml?"xml":"json",async:!0,success:function(t){isfetched=!0,$(".popup").detach().appendTo(".header-inner");var o=isXml?$("entry",t).map(function(){return{title:$("title",this).text(),content:$("content",this).text(),url:$("url",this).text()}}).get():t,n=document.getElementById(e),r=document.getElementById(s),t=function(){var m=n.value.trim().toLowerCase(),x=m.split(/[\s\-]+/);1<x.length&&x.push(m);var e,w=[];0<m.length&&o.forEach(function(t){var e=!1,o=0,h=0,n=t.title.trim(),r=n.toLowerCase(),s=t.content.trim().replace(/<[^>]+>/g,""),a=s.toLowerCase(),i=decodeURIComponent(t.url),c=[],l=[];if(""!=n&&(x.forEach(function(t){function e(t,e,o){var n=t.length;if(0===n)return[];var r,s=0,a=[];for(o||(e=e.toLowerCase(),t=t.toLowerCase());-1<(r=e.indexOf(t,s));)a.push({position:r,word:t}),s=r+n;return a}c=c.concat(e(t,r,!1)),l=l.concat(e(t,a,!1))}),(0<c.length||0<l.length)&&(e=!0,o=c.length+l.length)),e){function p(t,e,o,n){for(var r=n[n.length-1],s=r.position,a=r.word,i=[],c=0;s+a.length<=o&&0!=n.length;){a===m&&c++,i.push({position:s,length:a.length});var l=s+a.length;for(n.pop();0!=n.length&&(s=(r=n[n.length-1]).position,a=r.word,s<l);)n.pop()}return h+=c,{hits:i,start:e,end:o,searchTextCount:c}}[c,l].forEach(function(t){t.sort(function(t,e){return e.position!==t.position?e.position-t.position:t.word.length-e.word.length})});t=[];0!=c.length&&t.push(p(0,0,n.length,c));for(var u=[];0!=l.length;){var f=l[l.length-1],d=f.position,g=f.word,v=d-20,f=d+80;v<0&&(v=0),(f=f<d+g.length?d+g.length:f)>s.length&&(f=s.length),u.push(p(0,v,f,l))}u.sort(function(t,e){return t.searchTextCount!==e.searchTextCount?e.searchTextCount-t.searchTextCount:t.hits.length!==e.hits.length?e.hits.length-t.hits.length:t.start-e.start});e=parseInt("1");function $(o,t){var n="",r=t.start;return 
t.hits.forEach(function(t){n+=o.substring(r,t.position);var e=t.position+t.length;n+='<b class="search-keyword">'+o.substring(t.position,e)+"</b>",r=e}),n+=o.substring(r,t.end)}0<=e&&(u=u.slice(0,e));var C="";0!=t.length?C+="<li><a href='"+i+"' class='search-result-title'>"+$(n,t[0])+"</a>":C+="<li><a href='"+i+"' class='search-result-title'>"+n+"</a>",u.forEach(function(t){C+="<a href='"+i+'\'><p class="search-result">'+$(s,t)+"...</p></a>"}),C+="</li>",w.push({item:C,searchTextCount:h,hitCount:o,id:w.length})}}),1===x.length&&""===x[0]?r.innerHTML='<div id="no-result"><i class="fa fa-search fa-5x" /></div>':0===w.length?r.innerHTML='<div id="no-result"><i class="fa fa-frown-o fa-5x" /></div>':(w.sort(function(t,e){return t.searchTextCount!==e.searchTextCount?e.searchTextCount-t.searchTextCount:t.hitCount!==e.hitCount?e.hitCount-t.hitCount:e.id-t.id}),e='<ul class="search-result-list">',w.forEach(function(t){e+=t.item}),e+="</ul>",r.innerHTML=e)};n.addEventListener("input",t),$(".local-search-pop-overlay").remove(),$("body").css("overflow",""),proceedsearch()}})};$(".popup-trigger").click(function(t){t.stopPropagation(),!1===isfetched?searchFunc(path,"local-search-input","local-search-result"):proceedsearch()}),$(".popup-btn-close").click(onPopupClose),$(".popup").click(function(t){t.stopPropagation()}),$(document).on("keyup",function(t){27===t.which&&$(".search-popup").is(":visible")&&onPopupClose()})</script><script>!function(){var t=document.createElement("script"),e=window.location.protocol.split(":")[0];t.src="https"===e?"https://zz.bdstatic.com/linksubmit/push.js":"http://push.zhanzhang.baidu.com/push.js";e=document.getElementsByTagName("script")[0];e.parentNode.insertBefore(t,e)}()</script><script type="text/javascript" src="/js/src/js.cookie.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/scroll-cookie.js?v=5.1.3"></script><script 
src="/live2dw/lib/L2Dwidget.min.js?094cbace49a39548bed64abff5988b05"></script><script>L2Dwidget.init({pluginRootPath:"live2dw/",pluginJsPath:"lib/",pluginModelPath:"assets/",tagMode:!1,debug:!1,model:{scale:1,jsonPath:"/live2dw/assets/hijiki.model.json"},display:{position:"left",width:100,height:200,hOffset:0,vOffset:-20},mobile:{show:!1,motion:!0,scale:.3},log:!1})</script></body></html>