python - 【用户、商品】【购买、浏览】数据处理
发布日期:2021-06-30 19:51:16 浏览次数:2 分类:技术文章

本文共 3034 字,大约阅读时间需要 10 分钟。

import pandas as pdclos = ['Time', 'Action', 'User', 'Product', 'Quantity', 'Price']orders = pd.read_csv('Data/purchase_order.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8')orders.info()

RangeIndex: 54772 entries, 0 to 54771Data columns (total 6 columns):Dates 54772 non-null datetime64[ns]Action 54772 non-null objectUser 54772 non-null objectProduct 54772 non-null objectQuantity 54772 non-null int64Price 54721 non-null float64dtypes: datetime64[ns](1), float64(1), int64(1), object(3)memory usage: 2.5+ MB
orders.head()

# 求某个商品的总售出量orders[orders['Product'] == 'P0002267974']['Price'].describe()

count     10.0mean     285.0std        0.0min      285.025%      285.050%      285.075%      285.0max      285.0Name: Price, dtype: float64

# 商品去重orders['Product'].unique()

array(['P0006944501', 'P0006018073', 'P0002267974', ..., 'P0022884606',       'P0013911085', 'P0025123755'], dtype=object)
# 商品单价orders.groupby('Product')['Price'].mean().sort_values(ascending=False).head()

ProductP0000143511    438888.0P0000143500    438888.0P0006584093    320000.0P0025280275    183900.0P0000150006    111375.0Name: Price, dtype: float64
# 增加总价格栏位orders['Total_Price'] = orders['Quantity'] * orders['Price']orders.head()

# 查看哪个用户今天消费最高orders.groupby('User')['Total_Price'].sum().sort_values(ascending=False).head()
UserU166708333      2942744.0U10120098943    1451117.0U142809250       747550.0U1006283751      515688.0U10114715330     456782.0Name: Total_Price, dtype: float64
import pandas as pdclos = ['Time', 'Action', 'User', 'Product']views = pd.read_csv('Data/purchase_view.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8')views.info()

# 查看用户购买记录clos = ['Time', 'Action', 'User', 'Product', 'Quantity', 'Price']orders = pd.read_csv('Data/purchase_order.tab', sep='\t',parse_dates={'Dates' : [0]}, names=clos, encoding='utf-8') # reset_index name column,DataFrameorders_cnt = orders.groupby(['User', 'Product'])['Product'].count().reset_index(name='buys')orders_cnt.head()

# 查看用户浏览记录views_cnt = views.groupby(['User', 'Product'])['Product'].count().reset_index(name='views')views_cnt.head()

# 合并购买与浏览记录merge_df = pd.merge(orders_cnt, views_cnt, on=['User', 'Product'], how='right')merge_df.head()

%pylab inlineviews_cnt_by_date.plot(kind='line', figsize=[10, 5])

# 根据小时计算用户的浏览记录总和views_cnt_by_hour = views.groupby(views['Dates'].dt.hour)['Action'].count()views_cnt_by_hour.plot(kind='line', figsize=[10, 5])

# 查看消费大户orders['Total_Price'] = orders['Quantity'] * orders['Price']g = orders.groupby('User')['Total_Price'].sum().sort_values(ascending=False)[0:10]g.plot(kind='bar', figsize=[10, 5])

# merge 用户每日浏览量和购买量views_daily_cnt = views.groupby(by=views['Dates'].dt.date)['Action'].count()orders_daily_cnt = orders.groupby(by=orders['Dates'].dt.date)['Action'].count()df = pd.concat([views_daily_cnt, orders_daily_cnt], axis=1) #用index 做merge keydf.dropna(inplace=True)df.columns = ['views', 'orders']df.plot(kind='line', figsize=[10, 5], rot=45)

转载地址:https://lipenglin.blog.csdn.net/article/details/78009616 如侵犯您的版权,请留言回复原文章的地址,我们会给您删除此文章,给您带来不便请您谅解!

上一篇:python - sql + pandas 与 sqlite 结合
下一篇:python - pandas 从 yahoo finance 读取 BABA 数据进行 visualization

发表评论

最新留言

能坚持,总会有不一样的收获!
[***.219.124.196]2024年04月17日 11时55分15秒