[Compilation] Pandas DataFrame Tips and Tricks

Original article. If you repost it, please credit: reposted from 勤奋的小青蛙
Link to this article: [Compilation] Pandas DataFrame Tips and Tricks

This post collects various techniques for working with pandas DataFrames and will be updated over time.

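1: Creating a new column from existing columns

Typical use case: a new column holding the difference of two columns, or a new column computed from several columns.

Technique: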

import pandas as pd

# make a simple dataframe
df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
df
#    a  b
# 0  1  3
# 1  2  4

# this just creates an unattached column:
df.apply(lambda row: row.a + row.b, axis=1)
# 0    4
# 1    6

# do same but attach it to the dataframe
df['c'] = df.apply(lambda row: row.a + row.b, axis=1)
df
#    a  b  c
# 0  1  3  4
# 1  2  4  6

See: https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns
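Row-wise apply with axis=1 calls a Python function once per row, which gets slow on large frames. When the new column is plain arithmetic on existing columns, a vectorized expression usually does the same job. Below is a minimal sketch reusing the a/b frame from above; the columns d and e are made-up examples.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Vectorized column arithmetic: operates on whole columns at once
df['c'] = df['a'] + df['b']

# Several columns combined into one, still without a per-row Python loop
df['d'] = df['a'] * 10 - df['b']

# DataFrame.assign returns a new DataFrame with the extra column attached;
# df itself does not gain column 'e'
df2 = df.assign(e=lambda d: d['c'] / d['b'])
print(df2)
```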

2: How to iterate over every row of a Pandas DataFrame

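In [16]: import pandas as pd

In [17]: df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

In [18]: df
Out[18]:
   c1   c2
0  10  100
1  11  110
2  12  120

In [19]: for index, row in df.iterrows():
   ....:     print(row['c1'], row['c2'])
   ....:
10 100
11 110
12 120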

See: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
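iterrows builds a Series for every row, which is relatively slow and does not preserve dtypes across a row. A commonly used alternative is itertuples, sketched below on the same c1/c2 frame. Note that attribute access such as row.c1 only works when the column names are valid Python identifiers.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# itertuples yields one namedtuple per row; fields are accessed as attributes
pairs = []
for row in df.itertuples(index=True):
    # row.Index is the index label; row.c1 / row.c2 are the column values
    pairs.append((row.Index, row.c1, row.c2))

print(pairs)
```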

3: Zooming a matplotlib plot of a Pandas DataFrame with the mouse scroll wheel

The main zoom function:

import matplotlib.pyplot as plt


def zoom_factory(ax,base_scale = 2.):
    def zoom_fun(event):
        # get the current x and y limits
        cur_xlim = ax.get_xlim()
        cur_ylim = ax.get_ylim()
        cur_xrange = (cur_xlim[1] - cur_xlim[0])*.5
        cur_yrange = (cur_ylim[1] - cur_ylim[0])*.5
        xdata = event.xdata # get event x location
        ydata = event.ydata # get event y location
        if xdata is None or ydata is None:
            # the scroll happened outside the axes, so there is no data coordinate
            return
        if event.button == 'up':
            # deal with zoom in
            scale_factor = 1/base_scale
        elif event.button == 'down':
            # deal with zoom out
            scale_factor = base_scale
        else:
            # deal with something that should never happen
            scale_factor = 1
            print(event.button)
        # set new limits
        ax.set_xlim([xdata - cur_xrange*scale_factor,
                     xdata + cur_xrange*scale_factor])
        ax.set_ylim([ydata - cur_yrange*scale_factor,
                     ydata + cur_yrange*scale_factor])
        ax.figure.canvas.draw_idle() # request a re-draw of this axes' figure

    fig = ax.get_figure() # get the figure of interest
    # attach the call back
    fig.canvas.mpl_connect('scroll_event',zoom_fun)

    #return the function
    return zoom_fun

Usage (simply operate directly on the matplotlib Axes object ax):

fig, ax = plt.subplots()
ax.plot(range(10))
scale = 1.5
f = zoom_factory(ax, base_scale=scale)
plt.show()

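The optional parameter base_scale (default 2.0) lets you set the zoom factor to whatever value you want.

Be sure to keep the function returned by zoom_factory in a variable such as f: if you do not store f, the zoom callback may be garbage-collected.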

See: https://stackoverflow.com/questions/11551049/matplotlib-plot-zooming-with-scroll-wheel
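The limit arithmetic inside zoom_fun can be pulled out into a pure function, which makes it easy to check without opening a window. zoomed_limits is a hypothetical helper written only for illustration; it reproduces the same formula: half the current range, multiplied by scale_factor, laid out on both sides of the cursor position.

```python
def zoomed_limits(lim, center, scale_factor):
    """Return new (low, high) limits centred on `center`.

    Hypothetical helper mirroring the arithmetic inside zoom_fun: half of the
    current range is multiplied by scale_factor and placed on both sides of
    the cursor coordinate.
    """
    half_range = (lim[1] - lim[0]) * 0.5
    return (center - half_range * scale_factor,
            center + half_range * scale_factor)


base_scale = 2.0
# Scrolling up (zoom in) uses 1/base_scale; scrolling down (zoom out) uses base_scale
zoom_in = zoomed_limits((0.0, 10.0), 5.0, 1 / base_scale)
zoom_out = zoomed_limits((0.0, 10.0), 5.0, base_scale)
print(zoom_in, zoom_out)
```

Because the new limits are centred on the cursor, the point under the mouse moves to the middle of the view after each scroll; keeping it at the same relative position would need a different formula.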

4: Changing the index in pandas: using a column as the index

In [1]: import pandas as pd

In [2]: df = pd.read_csv('hello.csv')

In [3]: df
Out[3]:
    name  gender
0  Lucas    Male
1   Lucy  Female
2   Lily  Female
3    Jim    Male

In [4]: df.set_index('name')
Out[4]:
       gender
name
Lucas    Male
Lucy   Female
Lily   Female
Jim      Male
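set_index returns a new DataFrame and leaves df unchanged, so the result has to be assigned if you want to keep it. The sketch below builds the same data inline instead of reading hello.csv (the file itself is not shown); alternatively, read_csv accepts an index_col parameter that sets the index while loading.

```python
import pandas as pd

# Inline data standing in for hello.csv (assumed to have name/gender columns)
df = pd.DataFrame({'name': ['Lucas', 'Lucy', 'Lily', 'Jim'],
                   'gender': ['Male', 'Female', 'Female', 'Male']})

# set_index returns a new DataFrame; df itself keeps its default index
indexed = df.set_index('name')

# Rows can now be looked up by name
print(indexed.loc['Lucy', 'gender'])

# reset_index turns the index back into an ordinary column
restored = indexed.reset_index()
print(list(restored.columns))
```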

5: Selecting regions of an ndarray with slicing

In [1]: import numpy as np

In [2]: a = np.arange(36).reshape((6,6))

In [3]: a
Out[3]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [4]: a[1,2]
Out[4]: 8

In [5]: a[1,:]
Out[5]: array([ 6,  7,  8,  9, 10, 11])

In [6]: a[:,2]
Out[6]: array([ 2,  8, 14, 20, 26, 32])

In [7]: a[0:2, 0:2]
Out[7]:
array([[0, 1],
       [6, 7]])

Syntax: inside the square brackets, the first selector picks the rows and the second picks the columns, separated by a comma.
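A few more slicing forms follow the same rows, columns pattern. This sketch also shows that a basic slice is a view: writing through it changes the underlying array, so it works on a copy b to leave a intact.

```python
import numpy as np

a = np.arange(36).reshape((6, 6))

# start:stop:step works in each dimension: every 2nd row, every 3rd column
print(a[::2, ::3])

# Negative indices count from the end: last row, last two columns
print(a[-1, -2:])

# Basic slices are views: zeroing `block` also changes `b` (but not `a`)
b = a.copy()
block = b[0:2, 0:2]
block[:] = 0
print(b[0:2, 0:2])
```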

6: Adding a new column that is the cumulative sum of another column

In [1]: import pandas as pd
In [2]: num_list = [1, 2, 3, 4]
In [3]: df = pd.DataFrame(data=num_list)
In [4]: df[1] = df[0].cumsum()
In [5]: df
Out[5]:
   0   1
0  1   1
1  2   3
2  3   6
3  4  10
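cumsum can also restart for each group. The sketch below uses a hypothetical group column and combines groupby with cumsum, which returns a Series aligned to the original rows, so it can be assigned straight back as a column.

```python
import pandas as pd

# Hypothetical data: a 'group' label and a value per row
df = pd.DataFrame({'group': ['x', 'x', 'y', 'y', 'x'],
                   'value': [1, 2, 3, 4, 5]})

# Running total over the whole column (same idea as above)
df['total'] = df['value'].cumsum()

# Running total that restarts for each group, aligned to the original rows
df['group_total'] = df.groupby('group')['value'].cumsum()
print(df)
```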
