Web Scraping in Practice
Published: 2021-06-29 12:36:11  Category: Technical article


from bs4 import BeautifulSoup

# Parse the local page and print every article whose rating is above 3
with open('new_index.html', encoding='utf-8') as wb_data:
    soup = BeautifulSoup(wb_data, 'lxml')
    lis = soup.select('body > div.main-content > ul > li')
    for li in lis:
        image = li.select('li > img')[0].get('src')
        title = li.select('li > div.article-info > h3 > a')[0].get_text()
        desc = li.select('li > div.article-info > p.description')[0].get_text()
        rate = li.select('li > div.rate > span')[0].get_text()
        # stripped_strings yields each text fragment with surrounding whitespace removed
        cates = list(li.select('li > div.article-info > p.meta-info')[0].stripped_strings)
        if float(rate) > 3:
            print(title, desc, rate, image)

# Alternative approach, disabled by wrapping it in a string literal:
# select each field across the whole page, then zip the lists together.
# Note: zip() pairs fields by position, so one <li> missing a field misaligns every later row.
'''
from bs4 import BeautifulSoup

info = []
with open('new_index.html', encoding='utf-8') as wb_data:
    soup = BeautifulSoup(wb_data, 'lxml')

    images = soup.select('body > div.main-content > ul > li > img')
    titles = soup.select('body > div.main-content > ul > li > div.article-info > h3 > a')
    descs = soup.select('body > div.main-content > ul > li > div.article-info > p.description')
    rates = soup.select('body > div.main-content > ul > li > div.rate > span')
    cates = soup.select('body > div.main-content > ul > li > div.article-info > p.meta-info')

    # print(images, titles, descs, rates, cates, sep='\n------------------\n')
    for title, desc, rate, cate, image in zip(titles, descs, rates, cates, images):
        data = {
            'title': title.get_text(),
            'desc': desc.get_text(),
            'rate': rate.get_text(),
            'cate': list(cate.stripped_strings),
            'image': image.get('src'),
        }
        # print(data)
        info.append(data)

for i in info:
    if float(i['rate']) > 3:
        print(i['title'], i['cate'])
'''
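The script above needs a local new_index.html, so it cannot be run as-is. The sketch below applies the same per-<li> approach to an inline page instead. SAMPLE_HTML, extract_articles and its min_rate parameter are hypothetical names, and the markup is only assumed to match the structure the selectors expect. Two substitutions are also assumptions rather than the author's method: Python's built-in 'html.parser' replaces 'lxml', so lxml does not have to be installed, and select_one() replaces select(...)[0], so a missing field yields None instead of raising IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to mirror the structure of new_index.html
SAMPLE_HTML = """
<html><body>
<div class="main-content">
  <ul>
    <li>
      <img src="img/a.png">
      <div class="article-info">
        <h3><a href="#">Article A</a></h3>
        <p class="meta-info"><span>tag1</span> <span>tag2</span></p>
        <p class="description">First description</p>
      </div>
      <div class="rate"><span>4.5</span></div>
    </li>
    <li>
      <img src="img/b.png">
      <div class="article-info">
        <h3><a href="#">Article B</a></h3>
        <p class="meta-info"><span>tag3</span></p>
        <p class="description">Second description</p>
      </div>
      <div class="rate"><span>2.0</span></div>
    </li>
  </ul>
</div>
</body></html>
"""


def extract_articles(html, min_rate=3.0):
    """Hypothetical helper: one dict per <li> whose rating is above min_rate."""
    soup = BeautifulSoup(html, 'html.parser')  # built-in parser, no lxml needed
    results = []
    for li in soup.select('body > div.main-content > ul > li'):
        # select_one returns None when nothing matches, instead of raising IndexError
        img = li.select_one('img')
        title = li.select_one('div.article-info > h3 > a')
        desc = li.select_one('div.article-info > p.description')
        rate = li.select_one('div.rate > span')
        meta = li.select_one('div.article-info > p.meta-info')
        if title is None or rate is None:
            continue  # skip malformed items rather than crash
        score = float(rate.get_text(strip=True))
        if score > min_rate:
            results.append({
                'title': title.get_text(strip=True),
                'desc': desc.get_text(strip=True) if desc else '',
                'rate': score,
                'cate': list(meta.stripped_strings) if meta else [],
                'image': img.get('src') if img else None,
            })
    return results


for article in extract_articles(SAMPLE_HTML):
    print(article['title'], article['cate'])
```

Each field is looked up inside its own <li>, so a missing field only affects that item. With the zip() variant, the same gap would shift every later row.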


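Reposted from: https://bypass.blog.csdn.net/article/details/107438902. If this infringes your copyright, please leave a comment with the address of the original article and we will delete this post. We apologize for any inconvenience.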

