python - Data Transformation - 白红宇's Personal Blog

python - Data Transformation

Published: 2021-06-30 19:51:11 Category: Technical articles


1. Data Transformation

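# coding: utf-8

# In[21]:
import pandas
df = pandas.read_excel('Data/house_sample.xlsx')
df.head(3)

# In[22]:
df['总价'] * 10000  # 总价 = total price, scaled by 10,000

# In[23]:
import numpy as np
np.sqrt(df['总价'])  # square root

# In[24]:
# 均价 (average price) = total price / 建筑面积 (floor area)
df['均价'] = df['总价'] * 10000 / df['建筑面积']
df.head(3)

# ## Converting non-numeric variables

# In[25]:
# 物业费 = property fee; keep only the text before '元' (yuan)
df['物 业 费'].map(lambda e: e.split('元')[0])

# In[26]:
df1 = pandas.DataFrame([
    [60, 70, 50],
    [80, 79, 68],
    [63, 66, 82]], columns=['First', 'Second', 'Third'])
df1

# In[28]:
df1.apply(lambda e: e.max() - e.min())  # range (max - min) of each column

# In[29]:
df.head(3)

# In[30]:
# Mark '暂无资料' ("no data available") as a missing value
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df.applymap(lambda e: np.nan if e == '暂无资料' else e)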
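Because `Data/house_sample.xlsx` is not included with the post, here is a minimal sketch of the same steps on a small made-up DataFrame. The rows, the English column names (`total_price`, `area`, `fee`) and the helper `clean_house_frame` are all hypothetical and chosen only for illustration; the placeholder string '暂无资料' is the one used above. It uses `DataFrame.replace` and the `.str` accessor rather than `applymap` and `map`, which is one possible alternative, not the original author's method.

```python
import numpy as np
import pandas as pd


def clean_house_frame(df):
    """Apply the transformations shown above to a copy of df (hypothetical helper)."""
    out = df.copy()
    # Replace the placeholder string with NaN everywhere in the frame
    out = out.replace('暂无资料', np.nan)
    # Element-wise arithmetic on numeric columns
    out['unit_price'] = out['total_price'] * 10000 / out['area']
    # Keep the text before '元' and convert it to a number; NaN stays NaN
    out['fee_value'] = pd.to_numeric(out['fee'].str.split('元').str[0])
    return out


# Made-up sample rows, not taken from house_sample.xlsx
sample = pd.DataFrame({
    'total_price': [200.0, 350.0, 120.0],
    'area': [80.0, 100.0, 60.0],
    'fee': ['1.5元/平米/月', '暂无资料', '2元/平米/月'],
})

cleaned = clean_house_frame(sample)
print(cleaned[['unit_price', 'fee_value']])
```

Replacing the placeholder before splitting matters here: a row that still held '暂无资料' would not parse as a number, while a NaN passes through `.str.split` and `pd.to_numeric` unchanged.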

2. Time Conversion

from datetime import datetime

current_time = datetime.now()
current_time.strftime('%Y-%m-%d')
current_time.year
current_time.month
current_time.day
current_time.hour
current_time.minute
current_time.second

# coding: utf-8

# ## datetime

# In[4]:
from datetime import datetime
current_time = datetime.now()
current_time

# In[6]:
current_time.strftime('%Y-%m-%d')

# In[8]:
s = '2017/9-12'
datetime.strptime(s, '%Y/%m-%d')

# In[10]:
from datetime import timedelta
current_time - timedelta(days=1)  # yesterday

# In[13]:
# The next 7 days
for i in range(1, 8):
    dt = current_time + timedelta(days=i)
    print(dt.strftime('%Y/%m/%d'))

# ## timestamp

# In[17]:
from time import mktime
mktime(current_time.timetuple())

# In[19]:
from time import time
time()

# In[20]:
datetime.fromtimestamp(1505196106)

# ## Convert 张贴日期 (posting date) to datetime

# In[21]:
import pandas
df = pandas.read_excel('Data/house_sample.xlsx')
df.head(3)

# In[23]:
# '西元' ("Common Era"), '年', '月', '日' are literal text in the format string
df['张贴日期'] = pandas.to_datetime(df['张贴日期'], format='西元%Y年%m月%d日')
df.head(3)
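The conversions above can be sketched as one round trip. This sketch uses a fixed, made-up moment instead of `datetime.now()` so the results are reproducible, and two made-up date strings in place of the 张贴日期 column. It also uses `datetime.timestamp()` instead of `mktime(current_time.timetuple())`; for a naive datetime both interpret the value as local time.

```python
from datetime import datetime, timedelta

import pandas as pd

# A fixed moment (assumption for illustration) instead of datetime.now()
moment = datetime(2017, 9, 12, 14, 1, 46)

# datetime -> string, and string -> datetime
text = moment.strftime('%Y-%m-%d')
parsed = datetime.strptime('2017/9-12', '%Y/%m-%d')

# Date arithmetic with timedelta
yesterday = moment - timedelta(days=1)
next_week = [(moment + timedelta(days=i)).strftime('%Y/%m/%d')
             for i in range(1, 8)]

# datetime -> POSIX timestamp -> datetime (both steps use local time)
ts = moment.timestamp()
restored = datetime.fromtimestamp(ts)

# Parsing a whole column whose values carry literal text around the numbers
dates = pd.Series(['西元2017年09月12日', '西元2017年10月01日'])
converted = pd.to_datetime(dates, format='西元%Y年%m月%d日')

print(text, parsed, yesterday, restored)
print(next_week)
print(converted)
```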

3. Dummy Variables & Reshaping Data

# coding: utf-8

# In[82]:
import pandas
df = pandas.read_excel('Data/house_sample.xlsx')
df.head(3)

# ## Dummy variables

# In[83]:
pandas.get_dummies(df['朝向'])  # 朝向 = orientation (which way the unit faces)

# In[84]:
df = pandas.concat([df, pandas.get_dummies(df['朝向'])], axis=1)
df.head(3)

# In[85]:
del df['朝向']
df.head()

# ## Pivot table (pivot_table)

# In[86]:
# 张贴日期 = posting date, 产权性质 = ownership type, 总价 = total price
df2 = df.pivot_table(index='张贴日期', columns='产权性质', values='总价', aggfunc='sum', fill_value=0)
df2.head(3)

# In[87]:
df3 = df.pivot_table(index='产权性质', columns='张贴日期', values='总价', aggfunc='sum', fill_value=0)
df3

# In[88]:
df3.T.head(3)

# ## stack / unstack: converting between long and wide tables

# In[89]:
# 装修 = decoration, 楼层 = floor
df_multi_idx = df.pivot_table(index=['装修', '楼层'], columns='张贴日期', values='总价', aggfunc='sum', fill_value=0)
df_multi_idx

# In[90]:
df_multi_idx.T

# In[91]:
# Rows to columns: move the index level with fewer categories into the columns.
# The default is the innermost level, which is level 1 here.
df_wide = df_multi_idx.unstack(level=1)
df_wide

# In[92]:
# Columns to rows
df_long = df_wide.stack()
df_long
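To see what each reshaping step produces without the Excel file, the following sketch runs the same calls on four made-up listings. The English column names (`date`, `ownership`, `facing`, `price`) and all values are hypothetical stand-ins for 张贴日期, 产权性质, 朝向 and 总价.

```python
import pandas as pd

# Made-up listings, not taken from house_sample.xlsx
listings = pd.DataFrame({
    'date': ['2017-09-01', '2017-09-01', '2017-09-02', '2017-09-02'],
    'ownership': ['commercial', 'public', 'commercial', 'commercial'],
    'facing': ['south', 'north', 'south', 'east'],
    'price': [200, 150, 300, 100],
})

# One indicator column per category, joined back onto the original rows
dummies = pd.get_dummies(listings['facing'])
with_dummies = pd.concat([listings.drop(columns='facing'), dummies], axis=1)

# Wide table: one row per date, one column per ownership type
wide = listings.pivot_table(index='date', columns='ownership',
                            values='price', aggfunc='sum', fill_value=0)

# stack() moves the column level into the index (wide -> long);
# unstack() moves the innermost index level back to the columns (long -> wide)
long = wide.stack()
round_trip = long.unstack()

print(with_dummies)
print(wide)
print(long)
```

With `fill_value=0`, a date and ownership pair that has no listings shows up as 0 in the wide table rather than as a missing value, so it survives the `stack()` step as an explicit row.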

Reposted from: https://lipenglin.blog.csdn.net/article/details/77946870
