Scraping All Links on a Web Page with Python

BeautifulSoup requests

Python

2023-02-24 10:31:06

To collect all the links on a web page with Python, you can use the requests library together with the BeautifulSoup library.

Below is a simple example that uses requests to fetch the page content and then uses BeautifulSoup to parse the HTML and extract every link:

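import requests
from bs4 import BeautifulSoup

url = "https://www.baidu.com"
# Fetch the page; the timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
html_content = response.text

# Parse the HTML and collect every <a> tag
soup = BeautifulSoup(html_content, 'html.parser')
links = soup.find_all('a')

# Print each link's href attribute (None if the tag has no href)
for link in links:
    print(link.get('href'))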

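In this example, we first use the requests library to fetch the HTML content of the URL. We then pass that HTML to the BeautifulSoup constructor to create a BeautifulSoup object. Finally, we call find_all to locate every <a> link tag and call get on each tag to read the value of its href attribute.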
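
The href values collected this way are raw attribute values: some may be None, some relative (such as /news), and some not web pages at all (such as javascript: or mailto: links). The following is a minimal sketch of one way to clean them up, assuming you want a de-duplicated list of absolute http(s) URLs. The helper name normalize_links and the example.com addresses are hypothetical and not part of the original example; only the standard library urllib.parse module is used.

```python
from urllib.parse import urljoin, urldefrag, urlparse


def normalize_links(hrefs, base_url):
    """Turn raw href values into a de-duplicated list of absolute http(s) URLs.

    normalize_links is a hypothetical helper written for illustration.
    """
    seen = set()
    result = []
    for href in hrefs:
        # Skip <a> tags without an href (None) and empty href values
        if not href:
            continue
        # Resolve relative paths against the URL of the page they came from
        absolute = urljoin(base_url, href.strip())
        # Drop the "#fragment" part so in-page anchors do not create duplicates
        absolute, _fragment = urldefrag(absolute)
        # Keep only web links; this drops javascript:, mailto:, and similar
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Preserve first-seen order while removing duplicates
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Assumed sample data standing in for the values returned by link.get('href')
    sample = [None, "/news", "https://example.com/about#team", "javascript:void(0)", "/news"]
    for link in normalize_links(sample, "https://example.com/dir/index.html"):
        print(link)
```

With the example above, this could be applied as normalize_links((link.get('href') for link in links), url). Whether fragments should be stripped or non-http links discarded depends on what the collected links are used for, so treat those two filters as choices rather than requirements.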

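Note that BeautifulSoup must be installed before you can use it to parse HTML, and the example also depends on requests. You can install both from a terminal or command prompt with the following command:

pip install beautifulsoup4 requests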