How to "select distinct" across multiple data frame columns in pandas?

I'm looking for a way to do the equivalent to the sql

"SELECT DISTINCT col1, col2 FROM dataframe_table"

The pandas sql comparison doesn't have anything about "distinct"

.unique() only works for a single column, so I suppose I could concat the columns, or put them in a list/tuple and compare that way, but this seems like something pandas should do in a more native way.

Am I missing something obvious, or is there no way to do this?

You can use the drop_duplicates method to get the unique rows in a DataFrame:

    In [29]: df = pd.DataFrame({'a':[1,2,1,2], 'b':[3,4,3,5]})

    In [30]: df
    Out[30]:
       a  b
    0  1  3
    1  2  4
    2  1  3
    3  2  5

    In [32]: df.drop_duplicates()
    Out[32]:
       a  b
    0  1  3
    1  2  4
    3  2  5

You can also provide the subset keyword argument if you only want to use certain columns to determine uniqueness. See the docstring.

From: stackoverflow.com/q/30530663