I originally meant to scrape pictures of handsome guys, but thought better of it, afraid people would start questioning my orientation. Oops!
Here, requests.get issues a GET request, roughly the same as calling urlopen without setting the data parameter.
But requests is much more convenient to use, and it has plenty of powerful features worth digging into when I find the time; consider this a placeholder.
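As a rough illustration (not part of the original post), the two calls below fetch the same page. The urlopen version needs a Request object to carry custom headers and stays a GET as long as data is left unset, while requests takes the headers as a keyword argument:

import requests
from urllib.request import Request, urlopen

head = {"User-Agent": "Mozilla/5.0"}
url = "http://www.shuaia.net"

# GET with requests: headers are passed as a keyword argument
resp = requests.get(url, headers=head)
resp.encoding = "utf-8"
html_a = resp.text

# Equivalent GET with urllib: leaving data unset keeps it a GET request
req = Request(url, headers=head)
html_b = urlopen(req).read().decode("utf-8")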
from bs4 import BeautifulSoup
from urllib.request import urlretrieve
import requests
import os
import time

head = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"}
urlbase = "http://www.shuaia.net"
save_dir = r"F:\Driver_images"   # target folder for the downloads (path separator restored)

# List pages: the home page plus index_2.html through index_4.html
pages_url = [urlbase]
for i in range(2, 5):
    pages_url.append(urlbase + "/index_%d.html" % i)

# Walk the list pages and collect "name=detail-page-url" entries
pictures_url = []
for url in pages_url:
    req = requests.get(url, headers=head)   # headers must be passed as a keyword argument
    req.encoding = "utf-8"
    html = req.text
    soup = BeautifulSoup(html, 'lxml')
    target = soup.find_all("a", class_="item-img")
    for picture in target:
        name = picture.img.get("alt")
        if "花" in name or "女" in name:     # keep only titles containing "花" (flower) or "女" (woman)
            picture_url = picture.get("href")
            final_link = name + "=" + picture_url
            pictures_url.append(final_link)

# Open each detail page, locate the real image URL and save it to disk
for eachurl in pictures_url:
    name, target_url = eachurl.split("=", 1)   # split on the first "=" only, in case the URL contains one
    filename = name + ".jpg"
    pic_req = requests.get(target_url, headers=head)
    pic_req.encoding = "utf-8"
    pic_html = pic_req.text
    soup = BeautifulSoup(pic_html, 'lxml')
    div1 = soup.find("div", class_="wr-single-content-list")
    try:
        pic_url = urlbase + div1.img["src"]
        if not os.path.isdir(save_dir):
            os.makedirs(save_dir)
        urlretrieve(pic_url, os.path.join(save_dir, filename))
        print(name)
    except AttributeError:
        print("Invalid link!")
    # time.sleep(1)  # small site, no throttling needed
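Since requests is already in use, urlretrieve is not strictly needed. Below is a minimal, self-contained sketch of how the same download step could be done with requests alone; the image URL here is a placeholder, not a real path on the site, and the save folder simply mirrors the script above:

import os
import requests

head = {"User-Agent": "Mozilla/5.0"}
save_dir = r"F:\Driver_images"                        # same target folder as the script above
pic_url = "http://www.shuaia.net/some_picture.jpg"    # placeholder image URL for illustration only

img = requests.get(pic_url, headers=head, timeout=10)
img.raise_for_status()                                # raise on 4xx/5xx instead of saving an error page
os.makedirs(save_dir, exist_ok=True)
with open(os.path.join(save_dir, "example.jpg"), "wb") as f:
    f.write(img.content)                              # .content is the raw response body (bytes)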