pandas groupby sort within groups

I want to group my dataframe by two columns and then sort the aggregated results within the groups.

    In [167]:
    df

    Out[167]:
       count     job source
    0      2   sales      A
    1      4   sales      B
    2      6   sales      C
    3      3   sales      D
    4      7   sales      E
    5      5  market      A
    6      3  market      B
    7      2  market      C
    8      4  market      D
    9      1  market      E

    In [168]:
    df.groupby(['job','source']).agg({'count':'sum'})

    Out[168]:
                   count
    job    source
    market A           5
           B           3
           C           2
           D           4
           E           1
    sales  A           2
           B           4
           C           6
           D           3
           E           7

I would now like to sort the count column in descending order within each group and then keep only the top three rows, to get something like:

                   count
    job    source
    market A           5
           D           4
           B           3
    sales  E           7
           C           6
           B           4

What you want is effectively a second groupby, applied to the result of the first: within each group, sort the values and take the first three.

Starting from the result of the first groupby:

    In [60]: df_agg = df.groupby(['job','source']).agg({'count':'sum'})

We group by the first level of the index:

    In [63]: g = df_agg['count'].groupby(level=0, group_keys=False)

Then we sort each group in descending order with sort_values and take the first three elements:

    In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))
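For anyone who wants to run the steps so far outside an IPython session, here is a minimal self-contained sketch. It rebuilds the example dataframe from the question and assumes a reasonably recent pandas in which sort_values is available; the variable names follow the snippets above.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# First groupby: total count per (job, source) pair.
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})

# Second groupby: group the aggregated counts by the first index level (job).
# group_keys=False keeps the job key from being added to the index again.
g = df_agg["count"].groupby(level=0, group_keys=False)

# Within each job, sort descending and keep the first three rows.
res = g.apply(lambda x: x.sort_values(ascending=False).head(3))
print(res)
```

The result is a Series indexed by job and source, holding the three largest counts for each job.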

However, there is a shortcut method for exactly this, nlargest:

    In [65]: g.nlargest(3)
    Out[65]:
    job     source
    market  A         5
            D         4
            B         3
    sales   E         7
            C         6
            B         4
    dtype: int64
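Put together, the nlargest route might look like the sketch below. It is self-contained and rebuilds the question's data; the name top3 is only illustrative, and the to_frame() call at the end is an optional extra (not part of the answer above) for getting back a one-column DataFrame shaped like the desired output.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Aggregate, then regroup by job and keep the three largest counts per job.
top3 = (
    df.groupby(["job", "source"])["count"]
    .sum()
    .groupby(level=0, group_keys=False)
    .nlargest(3)
)
print(top3)

# Optional: convert the Series back to a DataFrame with a single count column.
top3_frame = top3.to_frame()
print(top3_frame)
```

nlargest already returns each group's values in descending order, so no separate sort is needed.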

From: stackoverflow.com/q/27842613
