Count unique values per group with pandas

I need to count the unique ID values in every domain. I have this data:

    ID, domain
    123, 'vk.com'
    123, 'vk.com'
    123, 'twitter.com'
    456, 'vk.com'
    456, 'facebook.com'
    456, 'vk.com'
    456, 'google.com'
    789, 'twitter.com'
    789, 'vk.com'
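
To reproduce the sample data above, here is a minimal sketch that builds it as a DataFrame. It assumes the quote characters are part of each domain string, which matches the quoted names in the first output below.

```python
import pandas as pd

# Sample data from the question; the ' characters are kept inside each domain string.
df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ["'vk.com'", "'vk.com'", "'twitter.com'",
               "'vk.com'", "'facebook.com'", "'vk.com'", "'google.com'",
               "'twitter.com'", "'vk.com'"],
})
print(df)
```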

I tried df.groupby(['domain', 'ID']).count(), but I want to get:

    domain, count
    vk.com   3
    twitter.com   2
    facebook.com   1
    google.com   1
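
The attempt above is close: grouping by both columns produces one group per distinct (domain, ID) pair, so counting those groups within each domain gives the number of unique IDs. A minimal sketch, using the question's data with the quote characters dropped for brevity (the names pairs and counts are illustrative):

```python
import pandas as pd

# Question data, quote characters omitted for brevity.
df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ['vk.com', 'vk.com', 'twitter.com', 'vk.com', 'facebook.com',
               'vk.com', 'google.com', 'twitter.com', 'vk.com'],
})

# One entry per distinct (domain, ID) pair; the value is how often the pair occurs.
pairs = df.groupby(['domain', 'ID']).size()

# Counting the pairs inside each domain gives the number of distinct IDs per domain.
counts = pairs.groupby(level='domain').size()
print(counts)
```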

You need nunique, which counts the distinct values in each group:

    counts = df.groupby('domain')['ID'].nunique()

    print(counts)
    domain
    'facebook.com'    1
    'google.com'      1
    'twitter.com'     2
    'vk.com'          3
    Name: ID, dtype: int64
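
As a cross-check of what nunique computes (this is not part of the original answer), dropping duplicate (domain, ID) rows and then counting the remaining rows per domain should give the same numbers. The sketch uses the question's data without the quote characters:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ['vk.com', 'vk.com', 'twitter.com', 'vk.com', 'facebook.com',
               'vk.com', 'google.com', 'twitter.com', 'vk.com'],
})

via_nunique = df.groupby('domain')['ID'].nunique()

# Keep one row per (domain, ID) pair, then count rows per domain.
via_dedup = df.drop_duplicates(subset=['domain', 'ID'])['domain'].value_counts()

# value_counts orders by frequency and its Series name varies across pandas
# versions, so compare the results as plain dicts.
assert via_nunique.to_dict() == via_dedup.to_dict()
print(via_nunique)
```

One difference to keep in mind: nunique skips missing IDs by default (dropna=True), whereas this cross-check would still count a row whose ID is NaN.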

If you need to strip the ' characters from the domain names:

    counts = df.ID.groupby([df.domain.str.strip("'")]).nunique()
    print(counts)
    domain
    facebook.com    1
    google.com      1
    twitter.com     2
    vk.com          3
    Name: ID, dtype: int64
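
If the cleaned names are needed elsewhere too, a variant (an alternative to the answer's inline stripping, not taken from it) is to strip the column once and then group on it directly. A minimal sketch with the question's quoted data:

```python
import pandas as pd

# Question data, with the ' characters kept inside the strings.
df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ["'vk.com'", "'vk.com'", "'twitter.com'",
               "'vk.com'", "'facebook.com'", "'vk.com'", "'google.com'",
               "'twitter.com'", "'vk.com'"],
})

# Clean the column once, so every later groupby sees the stripped names.
df['domain'] = df['domain'].str.strip("'")
counts = df.groupby('domain')['ID'].nunique()
print(counts)
```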

Or as Jon Clements commented:

    df.groupby(df.domain.str.strip("'"))['ID'].nunique()

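To keep domain as a regular column, with the counts under the original ID column name, use agg with as_index=False (the output below comes from a version of the data with shortened domain names):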

    out = df.groupby('domain', as_index=False).agg({'ID': 'nunique'})
    print(out)
        domain  ID
    0       fb   1
    1      ggl   1
    2  twitter   2
    3       vk   3
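
To get exactly the layout asked for in the question (a column called count, largest first), one option is pandas named aggregation, which requires pandas 0.25 or later. A minimal sketch, using the question's data without the quote characters:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ['vk.com', 'vk.com', 'twitter.com', 'vk.com', 'facebook.com',
               'vk.com', 'google.com', 'twitter.com', 'vk.com'],
})

out = (
    df.groupby('domain', as_index=False)
      .agg(count=('ID', 'nunique'))  # named aggregation: new column "count"
      .sort_values('count', ascending=False, kind='stable')  # largest count first
      .reset_index(drop=True)
)
print(out)
```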

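The difference is that groupby('domain')['ID'].nunique() returns a Series indexed by domain, while agg() with as_index=False returns a DataFrame in which domain is an ordinary column.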
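
A short sketch of that difference, using the question's data without the quote characters. It also shows Series.reset_index(name=...), one way to turn the Series result into a DataFrame with a column name of your choice (the name count here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 123, 123, 456, 456, 456, 456, 789, 789],
    'domain': ['vk.com', 'vk.com', 'twitter.com', 'vk.com', 'facebook.com',
               'vk.com', 'google.com', 'twitter.com', 'vk.com'],
})

s = df.groupby('domain')['ID'].nunique()                              # Series indexed by domain
frame = df.groupby('domain', as_index=False).agg({'ID': 'nunique'})  # DataFrame, domain is a column

# Move the domain index back into a column and name the counts column.
converted = s.reset_index(name='count')

print(type(s).__name__, type(frame).__name__)
print(converted)
```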

From: stackoverflow.com/q/38309729
