Python Pandas: Get index of rows which column matches certain value

Given a DataFrame with a column "BoolCol", we want to find the indexes of the DataFrame in which the values for "BoolCol" == True

I currently have the iterating way to do it, which works perfectly:

    for i in range(100,3000):
        if df.iloc[i]['BoolCol']== True:
             print i,df.iloc[i]['BoolCol']

But this is not the correct panda's way to do it. After some research, I am currently using this code:

    df[df['BoolCol'] == True].index.tolist()

This one gives me a list of indexes, but they dont match, when I check them by doing:

    df.iloc[i]['BoolCol']

The result is actually False!!

Which would be the correct Pandas way to do this?

df.iloc[i] returns the ith row of df. i does not refer to the index label, i is a 0-based index.

In contrast, the attributeindex returns actual index labels , not numeric row-indices:

    df.index[df['BoolCol'] == True].tolist()

or equivalently,

    df.index[df['BoolCol']].tolist()

You can see the difference quite clearly by playing with a DataFrame with an "unusual" index:

    df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
           index=[10,20,30,40,50])

    In [53]: df
    Out[53]: 
       BoolCol
    10    True
    20   False
    30   False
    40    True
    50    True

    [5 rows x 1 columns]

    In [54]: df.index[df['BoolCol']].tolist()
    Out[54]: [10, 40, 50]

If you want to use the index ,

    In [56]: idx = df.index[df['BoolCol']]

    In [57]: idx
    Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows usingloc instead of iloc :

    In [58]: df.loc[idx]
    Out[58]: 
       BoolCol
    10    True
    40    True
    50    True

    [3 rows x 1 columns]

Note that loc can also accept boolean arrays :

    In [55]: df.loc[df['BoolCol']]
    Out[55]: 
       BoolCol
    10    True
    40    True
    50    True

    [3 rows x 1 columns]

If you have a boolean array,mask, and need ordinal index values, you can compute them using np.flatnonzero :

    In [110]: np.flatnonzero(df['BoolCol'])
    Out[112]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

    In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
    Out[113]: 
       BoolCol
    10    True
    40    True
    50    True

From: stackoverflow.com/q/21800169

Back to homepage or read more recommendations: