How to implement 'in' and 'not in' for Pandas dataframe

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

    df = pd.DataFrame({'countries':['US','UK','Germany','China']})
    countries = ['UK','China']

    # pseudo-code:
    df[df['countries'] not in countries]

My current way of doing this is as follows:

    df = pd.DataFrame({'countries':['US','UK','Germany','China']})
    countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

    # IN
    df.merge(countries,how='inner',on='countries')

    # NOT IN
    not_in = df.merge(countries,how='left',on='countries')
    not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?

You can use pd.Series.isin.

For "IN" use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

    >>> df
      countries
    0        US
    1        UK
    2   Germany
    3     China
    >>> countries
    ['UK', 'China']
    >>> df.countries.isin(countries)
    0    False
    1     True
    2    False
    3     True
    Name: countries, dtype: bool
    >>> df[df.countries.isin(countries)]
      countries
    1        UK
    3     China
    >>> df[~df.countries.isin(countries)]
      countries
    0        US
    2   Germany

From: stackoverflow.com/q/19960077

Back to homepage or read more recommendations: