How to explode a list inside a DataFrame cell into separate rows

I'm looking to turn a pandas cell containing a list into rows for each of those values.

So, take this:

[image: screenshot of the example DataFrame]

If I'd like to unpack and stack the values in the 'nearest_neighbors' column so that each value becomes its own row within each 'opponent' index, how would I best go about this? Are there pandas methods that are meant for operations like this? I'm just not aware of any.

Thanks in advance, guys.

In the code below, I first reset the index to make the row iteration easier.

I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of the columns. This nested list is ultimately passed to the DataFrame constructor to create the desired DataFrame.

I use a lambda function together with a list comprehension to create a row for each element of nearest_neighbors, paired with the relevant name and opponent.

Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).

    import pandas as pd

    df = (pd.DataFrame({'name': ['A.J. Price'] * 3,
                        'opponent': ['76ers', 'blazers', 'bobcats'], 
                        'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
          .set_index(['name', 'opponent']))

    >>> df
                                                        nearest_neighbors
    name       opponent                                                  
    A.J. Price 76ers     [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
               blazers   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
               bobcats   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]

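    # Turn the (name, opponent) index back into ordinary columns for row-wise access.
    df.reset_index(inplace=True)
    rows = []
    # apply is used only for its side effect: one appended row per neighbor.
    _ = df.apply(lambda row: [rows.append([row['name'], row['opponent'], nn])
                             for nn in row.nearest_neighbors], axis=1)
    # Rebuild the frame and restore the original (name, opponent) index.
    df_new = pd.DataFrame(rows, columns=df.columns).set_index(['name', 'opponent'])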

    >>> df_new
                        nearest_neighbors
    name       opponent                  
    A.J. Price 76ers          Zach LaVine
               76ers           Jeremy Lin
               76ers        Nate Robinson
               76ers                Isaia
               blazers        Zach LaVine
               blazers         Jeremy Lin
               blazers      Nate Robinson
               blazers              Isaia
               bobcats        Zach LaVine
               bobcats         Jeremy Lin
               bobcats      Nate Robinson
               bobcats              Isaia
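The same row-building idea can also be written without relying on the side effect inside apply. The following is only a sketch of one way to do it, assuming the same example data as above; the use of itertuples and the variable names are my own choices, not part of the original code.

```python
import pandas as pd

# Same example data as above, kept as plain columns (no index set yet).
df = pd.DataFrame({
    'name': ['A.J. Price'] * 3,
    'opponent': ['76ers', 'blazers', 'bobcats'],
    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3,
})

# One output row per (name, opponent, neighbor) triple.
# itertuples(index=False, name=None) yields plain tuples in column order.
rows = [
    (name, opponent, nn)
    for name, opponent, neighbors in df.itertuples(index=False, name=None)
    for nn in neighbors
]

df_new = (pd.DataFrame(rows, columns=df.columns)
          .set_index(['name', 'opponent']))
```

This should give the same twelve-row result as the apply version, with the nested comprehension making the "one row per list element" logic explicit.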

EDIT JUNE 2017

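An alternative method, applied to the original DataFrame while it is still indexed by name and opponent (that is, before the reset_index call above), is as follows: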

    >>> (pd.melt(df.nearest_neighbors.apply(pd.Series).reset_index(), 
                 id_vars=['name', 'opponent'],
                 value_name='nearest_neighbors')
         .set_index(['name', 'opponent'])
         .drop('variable', axis=1)
         .dropna()
         .sort_index()
         )
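As for a pandas method meant for this operation: newer pandas releases (0.25 and later) provide DataFrame.explode, which was not available when the approaches above were written. A minimal sketch, assuming such a version is installed and using the same example data:

```python
import pandas as pd

df = (pd.DataFrame({
          'name': ['A.J. Price'] * 3,
          'opponent': ['76ers', 'blazers', 'bobcats'],
          'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3,
      })
      .set_index(['name', 'opponent']))

# explode() emits one row per list element and repeats the index values
# for each of them (assumes pandas >= 0.25).
df_exploded = df.explode('nearest_neighbors')
```

Because the index is repeated rather than rebuilt, no reset_index or set_index round trip is needed here.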

From: stackoverflow.com/q/32468402
