Scraping the Baidu Hot Search List with Python


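This is a very basic scraper: it uses Python to fetch the Baidu homepage and print the title and link of each entry in the hot search list.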


Complete source code:


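# Requires the requests and lxml libraries
import json
from urllib import parse

import requests
from lxml import etree

# The request headers must include a plausible User-Agent value
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

# Use a timeout so the script cannot hang, and fail early on HTTP errors
response = requests.get('https://www.baidu.com/', headers=headers, timeout=10)
response.raise_for_status()

html = response.text

document = etree.HTML(html)

# The hot search data is embedded as JSON text inside <textarea id="hotsearch_data">
for content in document.xpath('//textarea[@id="hotsearch_data"]/text()'):
    # Parse with json.loads instead of eval so page content is never executed as code
    for item in json.loads(content).get('hotsearch', []):
        # Print the title
        print(item.get('pure_title'))
        # Print the URL-decoded link
        print(parse.unquote(item.get('linkurl') or ''))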
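
The parsing step can also be tried without touching the network. The following is a minimal sketch, not the script above: it swaps lxml for the standard library's html.parser so it runs with no third-party packages. The names HotSearchParser, extract_hot_items and SAMPLE_HTML are made up for illustration, and the sample markup only imitates the structure the script relies on (a textarea with id "hotsearch_data" holding JSON with a "hotsearch" list); the real page may differ or change.

```python
import json
from html.parser import HTMLParser
from urllib import parse


class HotSearchParser(HTMLParser):
    """Collects the text inside <textarea id="hotsearch_data"> (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting when the target textarea opens
        if tag == "textarea" and dict(attrs).get("id") == "hotsearch_data":
            self._inside = True

    def handle_endtag(self, tag):
        # Stop collecting at the closing textarea tag
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_hot_items(html):
    """Return a list of (title, decoded_url) tuples; empty if no data is found."""
    parser = HotSearchParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    if not raw:
        return []
    # json.loads only parses data; unlike eval it never runs code from the page
    data = json.loads(raw)
    return [
        (item.get("pure_title"), parse.unquote(item.get("linkurl") or ""))
        for item in data.get("hotsearch", [])
    ]


# Made-up sample that mimics the assumed page structure
SAMPLE_HTML = """
<html><body>
<textarea id="hotsearch_data" style="display:none;">
{"hotsearch": [
  {"pure_title": "Example topic", "linkurl": "https%3A//www.baidu.com/s%3Fwd%3Dexample"}
]}
</textarea>
</body></html>
"""

if __name__ == "__main__":
    for title, url in extract_hot_items(SAMPLE_HTML):
        print(title)
        print(url)
```

Keeping the extraction in a function that takes an HTML string makes it easy to check against a saved copy of the page before pointing it at a live response.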

Repost notice
Copyright of this article belongs to the author.

If you repost it, please credit the source. Article URL: https://www.perfcode.com/p/python-get-baidu-hotsearch.html