Efficiently checking if an arbitrary object is NaN in Python / numpy / pandas?
My numpy arrays use np.nan to designate missing values. As I iterate over the data set, I need to detect such missing values and handle them in special ways.
Naively I used numpy.isnan(val), which works well unless val isn't among the subset of types supported by numpy.isnan(). For example, missing data can occur in string fields, in which case I get:
    >>> np.isnan('some_string')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Not implemented for this type
Other than writing an expensive wrapper that catches the exception and returns False, is there a way to handle this elegantly and efficiently?
pandas.isnull() checks for missing values in both numeric and string/object arrays. From the documentation, it checks for:
NaN in numeric arrays, None/NaN in object arrays
    import pandas as pd
    import numpy as np

    s = pd.Series(['apple', np.nan, 'banana'])
    pd.isnull(s)
    Out:
    0    False
    1     True
    2    False
    dtype: bool
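Since the question iterates over individual values rather than whole arrays, it may help to see that pd.isnull also accepts scalars of arbitrary type. The following is a minimal sketch; the sample values and the variable names `values` and `flags` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Mixed values such as one might meet while iterating over a data set
# (illustrative sample, not taken from the question).
values = ['some_string', np.nan, None, 1.5, float('nan'), pd.NaT]

# pd.isnull accepts a scalar of any of these types and returns a boolean,
# so no TypeError is raised for the string.
flags = [bool(pd.isnull(v)) for v in values]
print(flags)  # [False, True, True, False, True, True]
```

Note that if a value is itself list-like, pd.isnull returns an element-wise array rather than a single boolean, so this sketch assumes scalar values.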
The idea of using numpy.nan to represent missing values is something that pandas introduced, which is why pandas has the tools to deal with it.
This works for datetimes too (if you use pd.NaT instead of np.nan, you won't need to specify the dtype):
    In : import numpy as np; import pandas as pd
    In : from pandas import Series, Timestamp
    In : s = Series([Timestamp('20130101'), np.nan, Timestamp('20130102 9:30')], dtype='M8[ns]')
    In : s
    Out:
    0   2013-01-01 00:00:00
    1                   NaT
    2   2013-01-02 09:30:00
    dtype: datetime64[ns]
    In : pd.isnull(s)
    Out:
    0    False
    1     True
    2    False
    dtype: bool
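The remark about pd.NaT can be sketched as follows: when the missing entry is pd.NaT, no explicit dtype argument is passed and pandas still infers a datetime dtype. The variable names `is_datetime` and `missing` are illustrative, and the exact datetime resolution reported may vary by pandas version, so the check below only asks whether the dtype is some datetime64 type.

```python
import pandas as pd

# Same data as above, but with pd.NaT as the missing marker and no dtype argument.
s = pd.Series([pd.Timestamp('20130101'), pd.NaT, pd.Timestamp('20130102 9:30')])

# True if pandas inferred a datetime64 dtype on its own.
is_datetime = pd.api.types.is_datetime64_any_dtype(s.dtype)

# Element-wise missing-value check, converted to a plain list.
missing = pd.isnull(s).tolist()

print(is_datetime, missing)
```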