I recently wanted to do some SEO work on my blog and ran into a few pitfalls along the way, so I'm writing them down here.

SEO optimization for Hexo

The SEO work on Hexo covered the following parts:

Home page title optimization

Edit index.swig (your-hexo-site\themes\next\layout\index.swig) and change

{% block title %} {{ config.title }} {% endblock %}

to

{% block title %} {{ config.title }} - {{ theme.description }} {% endblock %}

This follows the common "site name - site description" format.

Generate a sitemap and submit it to Baidu

1. Install the sitemap generator plugins

npm install hexo-generator-sitemap --save
npm install hexo-generator-baidu-sitemap --save

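2. Add the following configuration to the site config file (the _config.yml in the blog root; both plugins read their settings from the site config)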

sitemap: 
  path: sitemap.xml
baidusitemap:
  path: baidusitemap.xml

3. Set the url in the site config file

url: http://www.admintony.com
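After hexo generate, it is worth checking that the generated sitemap lists URLs on the host configured in url. Below is a minimal sketch using only Python's standard library, assuming the file follows the sitemaps.org schema (the namespace below) and that the site url is http://www.admintony.com; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_bytes):
    """Return every <loc> URL listed in a sitemap document (bytes or str)."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def wrong_host(locs, base_url):
    """Return the URLs that do not start with the configured site url."""
    prefix = base_url.rstrip("/") + "/"
    return [u for u in locs if not u.startswith(prefix)]
```

For example, from the blog root after hexo generate, pass open("public/sitemap.xml", "rb").read() to sitemap_locs and the result to wrong_host; an empty list means every entry uses the configured host. The baidusitemap.xml file is not covered by this sketch.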

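Add nofollow attributes

nofollow is an anti-spam link attribute value introduced by Google and widely supported by Baidu, Yahoo, and other major search engines. Its purpose is to tell search engines not to follow (that is, not to crawl) any outbound link on a page marked with rel="nofollow", so that spammy links do not dilute the site's ranking weight.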

Taking Hexo's NexT theme as an example, two files need changes:

1. Find footer.swig in your-hexo-site\themes\next\layout\_partials

Change the following code

{{ __('footer.powered', '<a class="theme-link" href="http://hexo.io">Hexo</a>') }}

to

{{ __('footer.powered', '<a class="theme-link" href="http://hexo.io" rel="external nofollow">Hexo</a>') }}

Change the following code

<a class="theme-link" href="https://github.com/iissnan/hexo-theme-next">

to

<a class="theme-link" href="https://github.com/iissnan/hexo-theme-next" rel="external nofollow">

2. Edit sidebar.swig in your-hexo-site\themes\next\layout\_macro

Change the following code

<a href="{{ link }}" target="_blank">{{ name }}</a>

to

<a href="{{ link }}" target="_blank" rel="external nofollow">{{ name }}</a>

Change the following code

<a href="http://creativecommons.org/licenses/{{ theme.creative_commons }}/4.0" class="cc-opacity" target="_blank">

to

<a href="http://creativecommons.org/licenses/{{ theme.creative_commons }}/4.0" class="cc-opacity" target="_blank" rel="external nofollow">

You can use the chinaz webmaster tools (站长工具) to run various checks.
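As a local complement to those online checks, the sketch below scans a generated HTML page for external links that still lack nofollow. It assumes you check the generated pages in public/ (not the .swig templates) and that own_host is your site's host; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class NofollowChecker(HTMLParser):
    """Collect external <a href> links whose rel attribute lacks "nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        host = urlparse(href).netloc
        # Relative links and links to our own host count as internal
        if not host or host == self.own_host:
            return
        rel = (attrs.get("rel") or "").split()
        if "nofollow" not in rel:
            self.missing.append(href)


def links_missing_nofollow(html, own_host):
    """Return external hrefs in html that are not marked nofollow."""
    checker = NofollowChecker(own_host)
    checker.feed(html)
    checker.close()
    return checker.missing
```

Running it over each file under public/ after hexo generate shows which template links were missed.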

Add robots.txt

1. Add the robots protocol file (place it in the blog\source directory)

# hexo robots.txt
# Disallow lines come first: Baidu applies the first matching Allow/Disallow line
User-agent: *
Disallow: /vendors/
Disallow: /js/
Disallow: /css/
Disallow: /fonts/
Disallow: /fancybox/
Allow: /archives/
Allow: /

Sitemap: http://www.admintony.com/sitemap.xml
Sitemap: http://www.admintony.com/baidusitemap.xml
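The rules can be checked locally with Python's standard urllib.robotparser (site_maps() needs Python 3.8+). Baidu's robots.txt help states that the first matching Allow or Disallow line decides, and robotparser also applies the first match (Google instead uses the most specific rule), so this sketch keeps the Disallow lines ahead of Allow: /. With Allow: / first, every path would be reported as fetchable. robotparser also ends a record at a blank line, so the rules are kept contiguous. build_parser is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Disallow lines listed before "Allow: /" so first-match parsers honour them
RULES = """User-agent: *
Disallow: /vendors/
Disallow: /js/
Disallow: /css/
Disallow: /fonts/
Disallow: /fancybox/
Allow: /
Sitemap: http://www.admintony.com/sitemap.xml
Sitemap: http://www.admintony.com/baidusitemap.xml
"""

robots = build_parser(RULES)
for path in ["/archives/", "/js/src/utils.js", "/css/main.css"]:
    # No "Baiduspider" record exists, so the "*" record is used
    print(path, robots.can_fetch("Baiduspider", "http://www.admintony.com" + path))
print(robots.site_maps())
```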

2. Check and update robots.txt in the Baidu Webmaster Platform (百度站长平台)

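Change the post permalink format

By default, Hexo post URLs take the form domain/year/month/day/postname: a four-level URL that can also become very long, which is unfriendly to search engines. We can change it to domain/postname instead by editing the site _config.yml and setting the permalink field to permalink: :title.html.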
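The permalink change amounts to a fixed mapping from old paths to new ones, which the redirects later in this post depend on. A minimal sketch, assuming the old URLs used Hexo's default :year/:month/:day/:title/ pattern and that :title (the post's file name under source/_posts) did not change; old_to_new is a hypothetical helper.

```python
import re

# Hexo's default permalink :year/:month/:day/:title/ (the old URL shape)
OLD_PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/(.+)/$")


def old_to_new(old_path):
    """Map /year/month/day/title/ to /title.html (permalink: :title.html)."""
    match = OLD_PERMALINK.match(old_path)
    if match is None:
        raise ValueError("not an old-style post path: " + old_path)
    return "/{}.html".format(match.group(4))


print(old_to_new("/2018/03/05/腾讯云COS图床智能上传工具编写/"))
# /腾讯云COS图床智能上传工具编写.html
```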

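keywords and description

Add the following to \scaffolds\post.md so that newly created posts include keyword and description fields:

keywords:
description:

\themes\next\layout\_partials\head.swig contains the following code, which generates a post's keywords: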

{% if page.keywords %}
  <meta name="keywords" content="{{ page.keywords }}" />
{% elif page.tags and page.tags.length %}
  <meta name="keywords" content="{% for tag in page.tags %}{{ tag.name }},{% endfor %}" />
{% elif theme.keywords %}
  <meta name="keywords" content="{{ theme.keywords }}" />
{% endif %}

The post's excerpt is used as its description.
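The head.swig snippet picks keywords in a fixed order: the page's own keywords, then the post's tags, then the theme-wide keywords. Leaving keywords: empty in the scaffold therefore falls back to the tags. The Python function below only mirrors that branching to make the order explicit; it is an illustration, not code Hexo runs, and its name is hypothetical.

```python
def meta_keywords(page_keywords, tag_names, theme_keywords):
    """Mirror head.swig: page keywords, else the post's tags, else theme keywords."""
    if page_keywords:
        return page_keywords
    if tag_names:
        # The swig loop appends a comma after every tag, including the last one
        return "".join("{},".format(name) for name in tag_names)
    if theme_keywords:
        return theme_keywords
    return None  # no <meta name="keywords"> tag is emitted


print(meta_keywords(None, ["hexo", "seo"], "blog"))
# hexo,seo,
```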

Active push plugin for Baidu

1. Install the hexo-baidu-url-submit plugin

npm install hexo-baidu-url-submit --save

2. Configure the site _config.yml

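baidu_url_submit:
  count: 3 ## e.g. 3 means the three most recent links are submitted
  host: www.admintony.com ## the domain registered in the Baidu Webmaster Platform
  token: your_token ## this is your secret key; do not publish it in a public repository!
  path: baidu_urls.txt ## path of the text file in which new links are saved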
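Conceptually, an active push POSTs the saved links, one per line, to Baidu's link-submission API. The sketch below only illustrates that idea with the standard library; it is not the plugin's code, and the endpoint http://data.zz.baidu.com/urls with site and token query parameters is an assumption based on Baidu's webmaster documentation, so copy the exact URL shown in your own console. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Baidu's link-submission endpoint; verify it in your console
PUSH_API = "http://data.zz.baidu.com/urls"


def build_push_request(host, token, urls):
    """Build a POST request whose body is one URL per line (text/plain)."""
    query = urlencode({"site": host, "token": token})
    body = "\n".join(urls).encode("utf-8")
    return Request(
        "{}?{}".format(PUSH_API, query),
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def push(host, token, urls_file="baidu_urls.txt"):
    """Read the saved URLs, submit them, and return the raw response text."""
    with open(urls_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    with urlopen(build_push_request(host, token, urls), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be submitted before using a real token.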

3. Add a new deployer (in the site _config.yml)

deploy:
- type: git
  repo:
    coding: https://username:password@git.coding.net/TinyJay/blog.git,master
- type: baidu_url_submitter # newly added

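Migrating from GitHub to Coding

GitHub blocks the Baidu spider, so the blog could not be indexed while hosted there. I therefore migrated it from GitHub to Coding, which does not block the Baidu spider and gets indexed easily.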

Problems encountered

I mainly ran into the following problems:

1. admintony.com and www.admintony.com are indexed differently

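The address without www is the root (apex) domain, while the www address is a subdomain of it; to a search engine these are two different sites, so their indexing naturally differs too. When checking indexed pages, a query with www returns the count for that specific www host only, while a query without www also includes pages on all subdomains. That is why the count without www is usually larger than the count with www.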

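2. The URL rule has been changed to http://domain/post-name.html, but the indexed URLs are still http://domain/year/month/day/post-name/

After asking Li Chun (李春), I learned that Baidu first crawls the content at a URL into its database and only later checks whether that content meets Baidu's display requirements. Because there is a delay between crawling a URL and indexing it, the old URLs were still being indexed even after I changed the URL rule. Li Chun also advised me to submit a site revision in Baidu.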

Site revision

Revision by rule

301 from old links to new links

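Each old link needs a 301 redirect to its new link; the 301 status code means the resource has moved permanently. Because the blog is served as static files, I approximated this in HTML with a meta refresh. Strictly speaking this is a client-side redirect (the page itself is served with status 200), not a real HTTP 301: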

<META HTTP-EQUIV=REFRESH CONTENT="5;URL=http://www.admintony.com/SVN源代码泄露利用工具.html">

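This means that to redirect all 17 links, I need to create year/month/day/title/index.html under the source directory, where each index.html redirects to the corresponding new link. Writing them one by one would be exhausting, so I wrote a script to do it: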

# -*- coding: utf-8 -*-
"""
Input: an old post path such as /2018/03/05/腾讯云COS图床智能上传工具编写/
The script creates the directories 2018/03/05/腾讯云COS图床智能上传工具编写/
under the current directory and writes an index.html into the last one.
Its content is shown below, where {} is replaced with the post title
taken from the input:

    <html>
    <head>
    <meta http-equiv="refresh" content="5;url=http://admintony.com/{}.html">
    </head>
    <body>
    Sorry for the inconvenience: the site's URL rule has changed. You can read
    the article at <a href="http://admintony.com/{}.html">{}</a>, or wait
    5 seconds to be redirected automatically.
    <br>
    <span>Thank you for following and supporting AdminTony</span>
    </body>
    </html>
"""

import os
import re
import sys

# Redirect target; it must use the same host as the rule submitted to Baidu
# (admintony.com, without www)
NEW_BASE = "http://admintony.com"

# Old permalink shape: /year/month/day/title/
OLD_PATH = re.compile(r'/(\d+)/(\d+)/(\d+)/(.+)/')

TEMPLATE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="refresh" content="5;url={base}/{title}.html">
</head>
<body>
Sorry for the inconvenience: the site's URL rule has changed. You can read the article at <a href="{base}/{title}.html">{title}</a>, or wait 5 seconds to be redirected automatically.
<br>
<span>Thank you for following and supporting AdminTony</span>
<br>
<br>
<span align="right">Regards, AdminTony</span>
</body>
</html>
"""


def make_redirect_page(old_path):
    """Create year/month/day/title/index.html for one old post path."""
    match = OLD_PATH.search(old_path)
    if not match:
        # Skip blank lines or anything that is not an old-style post path
        print("[-] Skipped: {}".format(old_path.strip()))
        return False
    year, month, day, title = match.groups()

    # Create the year/month/day/title directories in one call
    directory = os.path.join(year, month, day, title)
    os.makedirs(directory, exist_ok=True)

    # Write index.html into the title directory
    with open(os.path.join(directory, "index.html"), "w", encoding="utf-8") as f:
        f.write(TEMPLATE.format(base=NEW_BASE, title=title))
    print("[+] Created {}".format(directory))
    return True


def main():
    if len(sys.argv) == 1:
        # Interactive mode: read one path at a time until "exit"
        while True:
            path = input("[+] Please Enter URL:")
            if "exit" in path:
                print("[+] Bye ~ ")
                break
            make_redirect_page(path)
    elif len(sys.argv) == 2:
        # Batch mode: one old path (or full old URL) per line in the given file
        with open(sys.argv[1], "r", encoding="utf-8") as f:
            count = sum(1 for line in f if make_redirect_page(line))
        print("[+] Created {} records in total".format(count))


if __name__ == '__main__':
    main()

Supplement

After I submitted the rule yesterday, a few problems came up:

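1. Redirects do not match the rule

Thinking about where exactly this went wrong, I realized that when making the 301 redirects I had pointed the admintony.com URLs to www.admintony.com, while the rule I filled in was admintony.com/${4}.html, which caused this error.

Solution: change the 301 target from www.admintony.com to admintony.com.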
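To catch this kind of mismatch before resubmitting, each generated page's meta refresh target can be compared with what the rule expects. A minimal sketch, assuming the rule maps /year/month/day/title/ to http://admintony.com/title.html (scheme http, host without www) and that the pages use a meta refresh like the ones in this post; all names are hypothetical.

```python
import re

# Assumed rule: admintony.com/year/month/day/title/ -> http://admintony.com/${4}.html
RULE_HOST = "admintony.com"
META_REFRESH = re.compile(r'content="\d+;\s*url=([^"]+)"', re.IGNORECASE)
OLD_PATH = re.compile(r"/(\d+)/(\d+)/(\d+)/(.+)/$")


def expected_target(old_path):
    """Return the URL the rule says an old path should redirect to."""
    match = OLD_PATH.search(old_path)
    if match is None:
        raise ValueError("not an old-style post path: " + old_path)
    return "http://{}/{}.html".format(RULE_HOST, match.group(4))


def redirect_matches_rule(old_path, page_html):
    """Check that the page's meta refresh target equals the rule's target."""
    match = META_REFRESH.search(page_html)
    return match is not None and match.group(1) == expected_target(old_path)
```

Walking the generated year/month/day/title/index.html files and feeding each path and its content to redirect_matches_rule lists every page that still points at the wrong host.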

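2. Crawling of old links before the revision failed

When making the 301 redirects I only covered the pages Baidu had already indexed; pages that had been submitted to Baidu's database but not yet indexed got no redirect, which caused this error.

The blog has 49 posts so far, and the old link of every one of them needs a 301 redirect. The main difficulty is collecting the old links, since the site has already switched to the new ones, so I wrote a tool to collect them.

How the tool works: it uses regular expressions to match each post's name and publication date on the site's index pages, builds http://admintony.com/year/month/day/post-name/ from them, and saves the result to url.txt (the tool is at the end of this post).

Then, in the blog/source directory, run the redirect script from the 301 section above with url.txt as its argument to generate the redirect pages.

After redeploying the site, submit the site revision rule to Baidu again.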

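# -*- coding: utf-8 -*-

import re
from urllib.parse import unquote

import requests

# Post URL on NexT index pages, e.g. <link itemprop="mainEntityOfPage" href="http://admintony.com/xxx.html">
RE_TITLE = re.compile(r'<link itemprop="mainEntityOfPage" href="(?:.+)/(.+)\.html">')
# Publication date, e.g. datetime="2018-03-05T10:00:00+08:00">
RE_DATE = re.compile(r'datetime="(\d+)-(\d+)-(\d+)T(?:.+)">')


def get_urls(page_url):
    """Rebuild the old-style URLs for every post listed on one index page."""
    res = requests.get(page_url, timeout=10)
    res.raise_for_status()
    titles = RE_TITLE.findall(res.text)
    dates = RE_DATE.findall(res.text)
    # zip() assumes titles and dates appear in the same order on the page
    with open("url.txt", "a", encoding="utf-8") as f:
        for (year, month, day), title in zip(dates, titles):
            # Decode the title in case the theme percent-encodes post URLs
            f.write("http://admintony.com/{}/{}/{}/{}/\n".format(year, month, day, unquote(title)))


def main():
    domain = input("Enter the site URL: ").rstrip("/")
    page_max = int(input("Enter the total number of index pages: "))
    for i in range(1, page_max + 1):
        # Page 1 is the home page; later pages live under /page/N/
        page_url = domain if i == 1 else "{}/page/{}/".format(domain, i)
        print("[+] Crawling page {}".format(i))
        get_urls(page_url)
    print("[+] Done, URLs saved to url.txt")


if __name__ == '__main__':
    main()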

Learning never ends, through bitter and sweet.