pandas DataFrame: replace nan values with average of columns

I've got a pandas DataFrame filled mostly with real numbers, but there are a few NaN values in it as well.

How can I replace each NaN with the average of the column it is in?

This question is very similar to "numpy array: replace nan values with average of columns" but, unfortunately, the solution given there doesn't work for a pandas DataFrame.

You can simply pass the column means to DataFrame.fillna to fill the NaNs directly:

    In [27]: df 
    Out[27]: 
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3       NaN -2.027325  1.533582
    4       NaN       NaN  0.461821
    5 -0.788073       NaN       NaN
    6 -0.916080 -0.612343       NaN
    7 -0.887858  1.033826       NaN
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431

    In [28]: df.mean()
    Out[28]: 
    A   -0.151121
    B   -0.231291
    C   -0.530307
    dtype: float64

    In [29]: df.fillna(df.mean())
    Out[29]: 
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3 -0.151121 -2.027325  1.533582
    4 -0.151121 -0.231291  0.461821
    5 -0.788073 -0.231291 -0.530307
    6 -0.916080 -0.612343 -0.530307
    7 -0.887858  1.033826 -0.530307
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431

The docstring of fillna says that value should be a scalar or a dict; however, it seems to work with a Series as well, matching each entry of the Series to the column of the same name. If you want to pass a dict, you could use df.mean().to_dict().
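For anyone who wants to try this without the random data above, here is a minimal, self-contained sketch. The small frame and the variable names (col_means, filled_series, filled_dict) are made up for illustration and are not from the original answer; it only checks that the Series and dict forms give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the frame shown in the answer above.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
    "C": [10.0, 20.0, 30.0],
})

# Column means as a Series indexed by column name; NaNs are skipped by default.
col_means = df.mean()

# Fill each column's NaNs with that column's mean, using the Series directly.
filled_series = df.fillna(col_means)

# Same operation, passing a plain dict of {column name: mean} instead.
filled_dict = df.fillna(col_means.to_dict())

print(filled_series)
print(filled_series.equals(filled_dict))
```

Note that fillna returns a new DataFrame and leaves df unchanged, so assign the result back (df = df.fillna(df.mean())) if you want to keep it.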

From: stackoverflow.com/q/18689823
