Pandas groupby month and year

I have the following dataframe:

    Date        abc    xyz
    01-Jun-13   100    200
    03-Jun-13   -20    50
    15-Aug-13   40     -5
    20-Jan-14   25     15
    21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will use the grouped data to create a plot showing abc vs. xyz per year/month.

I've tried various combinations of groupby and sum but just can't seem to get anything to work.

Thank you for any assistance.

You can use either resample or Grouper (which resamples under the hood).

First make sure that the date column actually holds datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

    In [11]: df1
    Out[11]:
                abc  xyz
    Date
    2013-06-01  100  200
    2013-06-03  -20   50
    2013-08-15   40   -5
    2014-01-20   25   15
    2014-02-21   60   80
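For reference, here is a minimal sketch of how a frame like df1 could be built from the data in the question. The explicit format string "%d-%b-%y" is my assumption about how the dates are written (day, abbreviated English month name, two-digit year); adjust it if your data differs.

```python
import pandas as pd

# Sample data copied from the question; Date starts out as plain strings.
df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Convert the strings to datetime64 (the format string is an assumption).
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

# Move the dates into the index so that we get a DatetimeIndex.
df1 = df.set_index("Date")
```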

    In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

    In [13]: g.sum()
    Out[13]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

    In [14]: df1.resample("M").sum()  # the same
    Out[14]:
                abc  xyz
    Date
    2013-06-30   80  250
    2013-07-31  NaN  NaN
    2013-08-31   40   -5
    2013-09-30  NaN  NaN
    2013-10-31  NaN  NaN
    2013-11-30  NaN  NaN
    2013-12-31  NaN  NaN
    2014-01-31   25   15
    2014-02-28   60   80

Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since 0.21.
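A caveat for newer pandas versions, as far as I'm aware: since 0.22 the sum of an empty group is 0 rather than NaN, and since 2.2 the "M" alias for month-end has been superseded by "ME". The sketch below tries to sidestep both by passing a pd.offsets.MonthEnd() object as the frequency and asking for min_count=1, so that months with no rows come out as NaN like in the output above. The variable names are only illustrative.

```python
import pandas as pd

# The same data as df1 above, with the dates as a DatetimeIndex.
df1 = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.DatetimeIndex(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"],
        name="Date",
    ),
)

# An offset object avoids depending on the "M" vs "ME" string alias.
month_end = pd.offsets.MonthEnd()

# min_count=1 means a month needs at least one value to get a sum;
# months with no rows then come out as NaN instead of 0.
by_grouper = df1.groupby(pd.Grouper(freq=month_end)).sum(min_count=1)
by_resample = df1.resample(month_end).sum(min_count=1)
```

Both results should cover every month end from 2013-06-30 to 2014-02-28, with the empty months left as NaN.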

I had thought the following would work, but it doesn't (due to as_index not being respected? I'm not sure.). I'm including this for interest's sake.

If Date is a column rather than the index (it has to be a datetime64 column; as I say, hit it with to_datetime), you can use a PeriodIndex:

    In [21]: df
    Out[21]:
            Date  abc  xyz
    0 2013-06-01  100  200
    1 2013-06-03  -20   50
    2 2013-08-15   40   -5
    3 2014-01-20   25   15
    4 2014-02-21   60   80

    In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
    Out[22]:
    <class 'pandas.tseries.period.PeriodIndex'>
    [2013-06, ..., 2014-02]
    Length: 5, Freq: M

    In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

    In [24]: g = df.groupby(per)

    In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
    Out[25]:
             abc  xyz
    2013-06   80  250
    2013-08   40   -5
    2014-01   25   15
    2014-02   60   80

To get the desired result we have to reindex...
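One way that reindexing might look is sketched below; treat it as an illustration under my own assumptions rather than the definitive approach. It builds the full run of monthly periods with pd.period_range and reindexes the grouped sums onto it. The names full_range and filled are mine, and the value columns are selected explicitly because, as far as I know, recent pandas versions refuse to sum a datetime column.

```python
import pandas as pd

# The same data as df above: Date is a regular datetime64 column.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Monthly period for each row, then sum the value columns per period.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Every month between the first and last period that actually occurs.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")

# Reindexing adds the missing months as rows of NaN.
filled = monthly.reindex(full_range)
```

If zeros are preferable to NaN for plotting, monthly.reindex(full_range, fill_value=0) should do that instead.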

From: stackoverflow.com/q/26646191