Python web scraping: scraping with requests after login
Published: 2021-06-24 15:07:23 | Category: Technical articles


I have the Python requests/Beautiful Soup code below, which logs me in to a URL successfully. After logging in, however, getting the data I need would normally require me to manually:

1) click on 'statement' in the first row:

2) Select dates, click 'run statement':

3) view data:

This is the code that I have used to log in and reach step 1 above:

import requests
from bs4 import BeautifulSoup

logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'

with requests.Session() as s:
    # Update rather than replace, so the session's default headers are kept
    s.headers.update({"User-Agent": "Mozilla/5.0"})

    # Fetch the login page to read its hidden form fields
    res = s.get(logurl)
    soup = BeautifulSoup(res.text, "html.parser")

    # p_arg_names appears several times in the form; collect every value
    arg_names = [tag['value'] for tag in soup.select("[name='p_arg_names']")]

    # Form payload: hidden fields copied from the page, plus the credentials
    values = {
        'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
        'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
        'p_instance': soup.select_one("[name='p_instance']")['value'],
        'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
        'p_request': 'LOGIN',
        'p_t01': 'solar',
        'p_arg_names': arg_names,
        'p_t02': 'password',
        'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
        'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
    }

    # Submit the login form, with the login page as the Referer
    s.headers.update({'Referer': logurl})
    r = s.post(posturl, data=values)
    print(r.content)

My question (as a beginner) is: how can I skip steps 1 and 2 and simply do another headers update and POST to the final URL, using the selected dates as form entries (headers and form info below)? (The Referer header is the URL from step 2 above.)


Edit 1: network request from the CSV file download:

Solution

As others have recommended, Selenium is a good tool for this sort of task. However, I'll suggest a way to use requests for this, since that's what you asked for in the question.

The success of this approach really depends on how the webpage is built and how the data files are made available (if "Save as CSV" in the data view is what you're targeting).

If the login mechanism is cookie-based, you can use Sessions and cookies in requests. When you submit a login form, a cookie is returned in the response headers. That cookie has to accompany every subsequent page request to make your login stick; a requests.Session keeps it in its cookie jar and sends it for you, as long as you keep using the same session object.
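To see the cookie handling without touching the network, here is a minimal sketch. The cookie name and value (EXAMPLE_SESSION, abc123) are made up for illustration and stand in for whatever the real login response sets; cookie_header_for is a hypothetical helper, not part of requests.

```python
import requests


def cookie_header_for(session, url):
    """Return the Cookie header the session would send to `url` (no network I/O)."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers.get("Cookie")


with requests.Session() as s:
    # Stand-in for the Set-Cookie header of a real login response.
    # The name and value are invented for this example.
    s.cookies.set("EXAMPLE_SESSION", "abc123",
                  domain="login.flash.co.za", path="/")

    # Any later request to that host through the same session carries the cookie.
    print(cookie_header_for(s, "https://login.flash.co.za/apex/f?p=pwfone:login"))
```

In the real flow you would not set the cookie by hand: s.post(posturl, data=values) stores it in s.cookies, and later s.get or s.post calls on the same session reuse it.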

Also, inspect the network request for the "Save as CSV" action in the Network pane of your browser's Developer Tools. If the request has a recognizable structure, you may be able to make it directly within your authenticated session, with a statement identifier and dates as the payload, to get your results.
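A rough sketch of such a direct request follows. Everything specific in it is an assumption: STATEMENT_URL, the field names statement_id, date_from and date_to, and the ISO date format are placeholders that must be replaced with what the network pane actually shows for the "Save as CSV" request.

```python
import csv
import io
from datetime import date

import requests

# Placeholder endpoint: copy the real URL from the "Save as CSV" request.
STATEMENT_URL = "https://login.flash.co.za/apex/EXAMPLE_CSV_ENDPOINT"


def build_statement_payload(statement_id, start, end):
    """Build the form fields for the statement request (field names are placeholders)."""
    return {
        "statement_id": statement_id,
        "date_from": start.isoformat(),
        "date_to": end.isoformat(),
    }


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_statement_rows(session, referer, payload):
    """POST the payload through an already logged-in session and parse the CSV reply."""
    response = session.post(
        STATEMENT_URL,
        data=payload,
        headers={"Referer": referer},  # the step 2 page, as in the browser request
        timeout=30,
    )
    response.raise_for_status()
    return parse_csv(response.text)
```

Inside the with requests.Session() as s: block, after the login POST, this would be called along the lines of fetch_statement_rows(s, step2_url, build_statement_payload("123", date(2021, 6, 1), date(2021, 6, 24))), where step2_url and "123" are again stand-ins. If the server expects extra hidden fields such as p_instance, they would need to be read from the preceding page in the same way as for the login form.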

Reposted from: https://blog.csdn.net/weixin_33565558/article/details/111915155
