Replace invalid values with None in Pandas DataFrame

Is there any method to replace values with None in Pandas in Python?

You can use df.replace('pre', 'post') to replace one value with another, but this doesn't work if you want to replace a value with None: if you try, you get a strange result.

So here's an example:

    from pandas import DataFrame
    df = DataFrame(['-',3,2,5,1,-5,-1,'-',9])
    df.replace('-', 0)

which returns a successful result.

But,

    df.replace('-', None)

which returns the following result:

    0
    0   - // this isn't replaced
    1   3
    2   2
    3   5
    4   1
    5  -5
    6  -1
    7  -1 // this is changed to `-1`...
    8   9

Why is such a strange result returned?

Since I want to load this data frame into a MySQL database, I can't have NaN values in any element of my data frame and want None there instead. Of course, you can first change '-' to NaN and then convert NaN to None, but I want to know why the DataFrame acts in such a terrible way.

Actually in later versions of pandas this will give a TypeError:

    df.replace('-', None)
    TypeError: If "to_replace" and "value" are both None then regex must be a mapping
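
As for why the older result looked so odd: as far as I can tell, older pandas treated value=None as "no value given" and fell back to its default method='pad', which is a forward fill. Each '-' is then overwritten with the value just above it, and a '-' in the first row has nothing above it, so it stays. The sketch below is plain Python rather than pandas internals; pad_replace is a hypothetical helper that only mimics that fill rule on the data from the question.

```python
def pad_replace(values, target):
    """Hypothetical helper: mimic a forward-fill ('pad') replacement."""
    out = []
    for v in values:
        if v == target and out:
            # Copy the value just above, as a forward fill would.
            out.append(out[-1])
        else:
            # Either a normal value, or a target with nothing above it.
            out.append(v)
    return out


data = ['-', 3, 2, 5, 1, -5, -1, '-', 9]
result = pad_replace(data, '-')
print(result)  # ['-', 3, 2, 5, 1, -5, -1, -1, 9]
```

This matches the output in the question: row 0 keeps its '-', and row 7 picks up the -1 from row 6.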

You can replace with None by passing either a list or a dictionary:

    In [11]: df.replace(['-'], [None])  # or .replace('-', {0: None})
    Out[11]:
          0
    0  None
    1     3
    2     2
    3     5
    4     1
    5    -5
    6    -1
    7  None
    8     9

But I recommend using NaNs rather than None:

    In [12]: df.replace('-', np.nan)
    Out[12]:
         0
    0  NaN
    1    3
    2    2
    3    5
    4    1
    5   -5
    6   -1
    7  NaN
    8    9
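
If the database layer really does need None, one option is to keep NaN while working in pandas and convert only at the end, which is the two-step route mentioned in the question. This is a minimal sketch, assuming the single-column frame from the question; the names cleaned, as_object and for_db are made up for illustration, and the cast to object dtype is my assumption about what is needed for None to survive in the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Step 1: mark the invalid entries as missing.
cleaned = df.replace('-', np.nan)

# Step 2: cast to object dtype so the column can hold None,
# then keep the non-missing cells and put None everywhere else.
as_object = cleaned.astype(object)
for_db = as_object.where(as_object.notna(), None)

# Plain Python values, e.g. to hand to a database driver.
rows = for_db[0].tolist()
```

Doing the conversion last means all earlier steps can still rely on the usual NaN-aware tools such as isna() and dropna().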

From: stackoverflow.com/q/17097236
