Python 3 Web Crawler (Part 3)

Published by 凡凡不知所错 on 2019-01-07 11:31:54

Some sites accept your requests at first but may ban your IP after a while. The Requests library addresses this with proxies:

```python
import requests

# Map each URL scheme to the proxy that should carry it
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

requests.get('https://www.taobao.com', proxies=proxies)
```

If the proxy requires HTTP Basic Auth, put the credentials in the proxy URL:

```python
import requests

# Credentials go before the host, in user:password@host:port form
proxies = {
    'https': 'http://user:password@10.10.1.10:3128/',
}

requests.get('https://www.taobao.com', proxies=proxies)
```
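One detail the user:password@host form leaves open is what happens when the password itself contains characters such as "@" or ":". These have to be percent-encoded, otherwise the URL is parsed incorrectly. The following is a minimal sketch using only the standard library; `build_proxy_url` is a hypothetical helper introduced here for illustration, and the credentials are made up.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Return a proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including "/"
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{password_enc}@{host}:{port}/"


# Example credentials are invented for this sketch
proxies = {
    "https": build_proxy_url("user", "p@ss:word", "10.10.1.10", 3128),
}
print(proxies["https"])  # http://user:p%40ss%3Aword@10.10.1.10:3128/
```

The resulting dictionary can be passed to requests.get(..., proxies=proxies) in the same way as the hand-written one.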

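But what if we want to access a site from many different IPs?
Many sites enforce a minimum interval between requests from a single IP.
The workaround: first scrape the IP addresses listed on a free-proxy site, then use those IPs to crawl the target site: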

```python
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
}

# Step 1: scrape the proxy table from the free-proxy site
r = requests.get("https://www.xicidaili.com/nt/", headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
rows = soup.find_all('tr')
proxy_list = []
for row in rows[1:]:  # skip the header row
    tds = row.find_all("td")
    # Column 1 holds the IP address, column 2 the port
    ip_temp = 'http://' + tds[1].contents[0] + ":" + tds[2].contents[0]
    proxy_list.append(ip_temp)

# Step 2: the proxies are collected; now crawl the target site through them
run_times = 100000
for i in range(run_times):
    for item in proxy_list:
        proxies = {
            'http': item,
            'https': item,
        }
        print(proxies)
        try:
            # Replace the placeholder with the URL of the site to crawl
            requests.get('target site URL', proxies=proxies, timeout=1)
            print('ok')
        except requests.RequestException:
            # Dead or slow proxy: move on to the next one
            continue
```
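The rotation idea above can also be packaged as a small reusable function. The sketch below is one possible shape, not the author's method: `fetch_with_rotation` and its `fetch` parameter are hypothetical names, and the actual HTTP call is passed in by the caller so the rotation logic stays independent of any one library.

```python
from itertools import cycle


def fetch_with_rotation(url, proxy_list, fetch, max_attempts=10):
    """Try `url` through successive proxies until one succeeds.

    `fetch(url, proxies)` is a caller-supplied function (an assumption of
    this sketch) that returns a response or raises an exception on failure.
    Returns a (proxy, response) pair for the first proxy that works.
    """
    if not proxy_list:
        raise ValueError("proxy_list is empty")
    rotation = cycle(proxy_list)  # wraps around to the first proxy when exhausted
    last_error = None
    for _ in range(max_attempts):
        proxy = next(rotation)
        proxies = {"http": proxy, "https": proxy}
        try:
            return proxy, fetch(url, proxies)
        except Exception as exc:  # narrow this to requests.RequestException in real use
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

With Requests, `fetch` could be something like `lambda url, proxies: requests.get(url, proxies=proxies, timeout=5)`. Capping the number of attempts avoids looping forever when every proxy in the list is dead, which is common with free proxy lists.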
