Add missing dates to pandas dataframe

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series don't always match.

    idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
    s = df.groupby(['simpleDate']).size()

In the above code, idx becomes a range of, say, 30 dates (09-01-2013 to 09-30-2013). However, s may only have 25 or 26 days because no events happened on some dates. I then get an AssertionError because the sizes don't match when I try to plot:

    fig, ax = plt.subplots()    
    ax.bar(idx.to_pydatetime(), s, color='green')

What's the proper way to tackle this? Should I remove the dates with no values from idx, or (which I'd rather do) add the missing dates to the series with a count of 0? I'd rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here's a snippet of s (df.groupby(['simpleDate']).size()); notice there are no entries for 04 and 05:

    09-02-2013     2
    09-03-2013    10
    09-06-2013     5
    09-07-2013     1

You could use Series.reindex:

    import pandas as pd

    idx = pd.date_range('09-01-2013', '09-30-2013')

    s = pd.Series({'09-02-2013': 2,
                   '09-03-2013': 10,
                   '09-06-2013': 5,
                   '09-07-2013': 1})
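    # Convert the string labels to datetimes so they can match idx
    s.index = pd.DatetimeIndex(s.index)

    # Add the missing dates, giving each a count of 0
    s = s.reindex(idx, fill_value=0)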
    print(s)

yields

    2013-09-01     0
    2013-09-02     2
    2013-09-03    10
    2013-09-04     0
    2013-09-05     0
    2013-09-06     5
    2013-09-07     1
    2013-09-08     0
    ...
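
Applied to the grouped counts from the question, the same idea might look like the sketch below. The sample rows are made up for illustration; only the column name simpleDate comes from the question, and the sketch assumes that column already holds datetimes (if it holds strings, converting with pd.to_datetime first would be needed so the labels can match the date range).

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df;
# only the column name 'simpleDate' is taken from the question.
df = pd.DataFrame({
    "simpleDate": pd.to_datetime([
        "2013-09-02", "2013-09-02",
        "2013-09-03",
        "2013-09-06", "2013-09-06", "2013-09-06",
        "2013-09-07",
    ])
})

# Count events per date; dates with no events are absent from the result.
counts = df.groupby("simpleDate").size()

# Full daily range between the first and last observed dates.
idx = pd.date_range(df["simpleDate"].min(), df["simpleDate"].max())

# Align the counts to the full range, filling missing dates with 0.
counts = counts.reindex(idx, fill_value=0)
print(counts)
```

After the reindex, counts and idx have the same length, so passing both to ax.bar should no longer hit the size mismatch.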

From: stackoverflow.com/q/19324453
