This post collects assorted tips for working with pandas DataFrames. It is updated over time.
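1: Creating a new column from existing columns
Typical use cases: a new column that holds the difference of two columns, or a new column computed from several columns.
Technique: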
    import pandas as pd

    # make a simple dataframe
    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
    df
    #    a  b
    # 0  1  3
    # 1  2  4

    # this just creates an unattached column:
    df.apply(lambda row: row.a + row.b, axis=1)
    # 0    4
    # 1    6

    # do same but attach it to the dataframe
    df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
    df
    #    a  b  c
    # 0  1  3  4
    # 1  2  4  6
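For plain arithmetic such as a sum or difference, the row-wise `apply` is not strictly needed; operating on whole columns is usually shorter and faster. The following is a minimal sketch of that alternative. The column names `d` and `e` are made up for illustration and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Column-wise arithmetic: the sum of two columns as a new column
df['c'] = df['a'] + df['b']

# The "difference of two columns" case mentioned above
df['d'] = df['b'] - df['a']

# assign() returns a new DataFrame instead of modifying df in place
df = df.assign(e=lambda frame: frame['a'] * frame['b'])

print(df)
```

`apply(..., axis=1)` remains useful when the per-row logic cannot be written as column arithmetic.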
2: How to iterate over the rows of a pandas DataFrame
       c1   c2
    0  10  100
    1  11  110
    2  12  120

    In [18]: for index, row in df.iterrows():
       ....:     print(row['c1'], row['c2'])
       ....:
    10 100
    11 110
    12 120
See: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
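As a complement to `iterrows()`, here is a small sketch using `itertuples()`, which yields lightweight namedtuples and is generally faster on larger frames. The DataFrame is rebuilt here with the same values as above so the snippet runs on its own; the variable names `pairs` and `totals` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# itertuples() yields one namedtuple per row; columns become attributes
pairs = []
for row in df.itertuples(index=False):
    pairs.append((row.c1, row.c2))
    print(row.c1, row.c2)

# When the loop only computes a value per row, a column-wise
# expression avoids the explicit loop entirely
totals = (df['c1'] + df['c2']).tolist()
```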
3: Zooming with the mouse scroll wheel when plotting a pandas DataFrame with matplotlib
The main zoom function:
    import matplotlib.pyplot as plt

    def zoom_factory(ax, base_scale=2.):
        def zoom_fun(event):
            # ignore scroll events outside this axes, where xdata/ydata are None
            if event.inaxes is not ax:
                return
            # get the current x and y limits
            cur_xlim = ax.get_xlim()
            cur_ylim = ax.get_ylim()
            cur_xrange = (cur_xlim[1] - cur_xlim[0]) * .5
            cur_yrange = (cur_ylim[1] - cur_ylim[0]) * .5
            xdata = event.xdata  # get event x location
            ydata = event.ydata  # get event y location
            if event.button == 'up':
                # deal with zoom in
                scale_factor = 1 / base_scale
            elif event.button == 'down':
                # deal with zoom out
                scale_factor = base_scale
            else:
                # deal with something that should never happen
                scale_factor = 1
                print(event.button)
            # set new limits, centered on the cursor position
            ax.set_xlim([xdata - cur_xrange * scale_factor,
                         xdata + cur_xrange * scale_factor])
            ax.set_ylim([ydata - cur_yrange * scale_factor,
                         ydata + cur_yrange * scale_factor])
            plt.draw()  # force re-draw

        fig = ax.get_figure()  # get the figure of interest
        # attach the call back
        fig.canvas.mpl_connect('scroll_event', zoom_fun)

        # return the function
        return zoom_fun
Usage (operate directly on the matplotlib axes object ax):
    fig, ax = plt.subplots()
    ax.plot(range(10))
    scale = 1.5
    f = zoom_factory(ax, base_scale=scale)
    plt.show()
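The optional base_scale parameter lets you set the zoom factor to whatever value you want.
Be sure to keep the object f returned by the zoom function. If you do not store f, the returned callback may be garbage collected.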
See: https://stackoverflow.com/questions/11551049/matplotlib-plot-zooming-with-scroll-wheel
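The code above works on any axes object, so the remaining step for a DataFrame is getting hold of the axes that pandas draws on. The sketch below shows one way to do that, under a few assumptions: `attach_scroll_zoom` is a hypothetical, condensed helper written for this example (it is not part of pandas or matplotlib), the sample data is made up, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def attach_scroll_zoom(ax, base_scale=2.0):
    """Hypothetical helper: zoom `ax` around the cursor on scroll events."""

    def on_scroll(event):
        if event.inaxes is not ax:
            return  # cursor is outside this axes, nothing to zoom
        if event.button == "up":
            factor = 1 / base_scale  # zoom in
        elif event.button == "down":
            factor = base_scale  # zoom out
        else:
            return
        x0, x1 = ax.get_xlim()
        y0, y1 = ax.get_ylim()
        half_w = (x1 - x0) * 0.5 * factor
        half_h = (y1 - y0) * 0.5 * factor
        ax.set_xlim(event.xdata - half_w, event.xdata + half_w)
        ax.set_ylim(event.ydata - half_h, event.ydata + half_h)
        ax.figure.canvas.draw_idle()

    ax.figure.canvas.mpl_connect("scroll_event", on_scroll)
    return on_scroll


df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# For a single (non-subplot) plot, DataFrame.plot() returns the matplotlib Axes
ax = df.plot()

# Keep the returned handler in a variable so it stays referenced
zoom_handler = attach_scroll_zoom(ax, base_scale=1.5)

# plt.show()  # uncomment when running with an interactive backend
```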
4: Changing the index in pandas: using a column as the index
    In [1]: import pandas as pd

    In [2]: df = pd.read_csv('hello.csv')

    In [3]: df
    Out[3]:
        name  gender
    0  Lucas    Male
    1   Lucy  Female
    2   Lily  Female
    3    Jim    Male

    In [4]: df.set_index('name')
    Out[4]:
           gender
    name
    Lucas    Male
    Lucy   Female
    Lily   Female
    Jim      Male
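One detail worth spelling out: `set_index` returns a new DataFrame and leaves `df` unchanged, so the result has to be assigned to keep it. The sketch below illustrates this. It builds the same table inline instead of reading `hello.csv` so that it runs on its own, and the variable names `indexed`, `lucy_gender`, and `restored` are illustrative.

```python
import pandas as pd

# Same data as hello.csv above, built inline so the snippet is self-contained
df = pd.DataFrame({
    'name': ['Lucas', 'Lucy', 'Lily', 'Jim'],
    'gender': ['Male', 'Female', 'Female', 'Male'],
})

# set_index returns a new DataFrame; df itself still has the default index
indexed = df.set_index('name')

# With 'name' as the index, rows can be looked up by label
lucy_gender = indexed.loc['Lucy', 'gender']
print(lucy_gender)

# reset_index turns the index back into an ordinary column
restored = indexed.reset_index()
```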
5: Selecting regions of an ndarray by slicing
    In [1]: import numpy as np

    In [2]: a = np.arange(36).reshape((6, 6))

    In [3]: a
    Out[3]:
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29],
           [30, 31, 32, 33, 34, 35]])

    In [4]: a[1, 2]
    Out[4]: 8

    In [5]: a[1, :]
    Out[5]: array([ 6,  7,  8,  9, 10, 11])

    In [6]: a[:, 2]
    Out[6]: array([ 2,  8, 14, 20, 26, 32])

    In [7]: a[0:2, 0:2]
    Out[7]:
    array([[0, 1],
           [6, 7]])
Syntax: inside the square brackets, the first selector picks the rows and the second picks the columns, separated by a comma.
6: Adding a new column that is the cumulative sum of another column
    In [1]: import pandas as pd

    In [2]: num_list = [1, 2, 3, 4]

    In [3]: df = pd.DataFrame(data=num_list)

    In [4]: df[1] = df[0].cumsum()

    In [5]: df
    Out[5]:
       0   1
    0  1   1
    1  2   3
    2  3   6
    3  4  10
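With named columns the intent is easier to read, and the same idea extends to a running total within each group. The following is a minimal sketch. The column names `group`, `value`, `running_total`, and `group_total` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b'],
    'value': [1, 2, 3, 4],
})

# Cumulative sum over the whole column
df['running_total'] = df['value'].cumsum()

# Cumulative sum that restarts for each group
df['group_total'] = df.groupby('group')['value'].cumsum()

print(df)
```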