Scatter plots in Pandas/Pyplot: How to plot by category

I am trying to make a simple scatter plot in pyplot from a Pandas DataFrame. I want an efficient way to plot two variables with the marker symbols dictated by a third column (the key). I have tried various approaches using df.groupby, without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend with the 'key1' categories. Am I close? Thanks.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
    df['key1'] = (4,4,4,6,6,6,8,8,8,8)
    fig1 = plt.figure(1)
    ax1 = fig1.add_subplot(111)
    ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
    plt.show()

You can use scatter for this, but that requires numerical values for your key1, and you don't get a legend automatically, as you noticed.

It's better to just use plot for discrete categories like this. For example:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    fig, ax = plt.subplots()
    ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend()

    plt.show()
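
Applied to the DataFrame from the question, the same groupby loop might look like the sketch below. The helper name `plot_by_category` is made up for illustration, the date index from the question is left out for brevity, and the random data will not match the original.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_by_category(df, x, y, key):
    """Hypothetical helper: draw one marker series per distinct value of df[key]."""
    fig, ax = plt.subplots()
    ax.margins(0.05)
    # groupby yields (category value, sub-DataFrame) pairs, sorted by category
    for name, group in df.groupby(key):
        ax.plot(group[x], group[y], marker='o', linestyle='', ms=12, label=str(name))
    ax.legend(title=key)
    return fig, ax


# Rebuild a DataFrame shaped like the one in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 1, (10, 3)), columns=['one', 'two', 'three'])
df['key1'] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plot_by_category(df, 'one', 'two', 'key1')
```

Call `plt.show()` afterwards to display the figure. Each category gets its own legend entry because every group is drawn as a separate, labelled series.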


If you'd like things to look like the old default pandas style, use matplotlib's built-in 'ggplot' style, which is the closest current equivalent. (The `pd.tools.plotting.mpl_stylesheet` rcParams and `ax.set_color_cycle` are no longer available in current pandas and matplotlib releases. The style supplies its own colour cycle, and `ax.set_prop_cycle(color=...)` overrides it if you need specific colours.) I'm also tweaking the legend slightly:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    plt.style.use('ggplot') # Sets the look and the colour cycle; call before creating the figure

    fig, ax = plt.subplots()
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')

    plt.show()
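
If you would rather stay with scatter and numeric keys, newer matplotlib versions (3.1 and later, as far as I know) can build legend entries from the collection that `scatter` returns, via `legend_elements()`. A minimal sketch, assuming the numeric `key1` column from the question and freshly generated random data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data shaped like the question's DataFrame (values are random, not the original's)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 1, (10, 3)), columns=['one', 'two', 'three'])
df['key1'] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plt.subplots()
# Colour each point by its numeric key, as in the question
sc = ax.scatter(df['one'], df['two'], c=df['key1'], alpha=0.8)

# One legend handle per distinct value of the colour array
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title='key1')
```

This keeps a single scatter call, but the keys must still be numeric. For string categories the groupby loop is the simpler route.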


From: stackoverflow.com/q/21654635
