Python pandas: Keep selected column as DataFrame instead of Series

When selecting a single column from a pandas DataFrame (say df.iloc[:, 0], df['A'], or df.A), the result is automatically converted to a Series instead of a single-column DataFrame. However, I am writing some functions that take a DataFrame as an input argument, so I would prefer to work with a single-column DataFrame rather than a Series; that way the function can assume that, say, df.columns is accessible. Right now I have to explicitly convert the Series into a DataFrame using something like pd.DataFrame(df.iloc[:, 0]). This doesn't seem like the cleanest method. Is there a more elegant way to index a DataFrame directly so that the result is a single-column DataFrame instead of a Series?

As @Jeff mentions, there are a few ways to do this, but I recommend using loc/iloc to be more explicit (and to raise errors early if you're trying something ambiguous):

    In [10]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

    In [11]: df
    Out[11]:
       A  B
    0  1  2
    1  3  4

    In [12]: df[['A']]

    In [13]: df[[0]]  # older pandas only; current versions raise KeyError here

    In [14]: df.loc[:, ['A']]

    In [15]: df.iloc[:, [0]]

    Out[12-15]:  # they all return the same thing:
       A
    0  1
    1  3
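
To double-check that these selections really come back as DataFrames, here is a minimal sketch. The names `as_series` and `candidates` are just for illustration, and `Series.to_frame()` is not one of the forms listed above; I include it on the assumption that it is a tidier replacement for the `pd.DataFrame(...)` conversion in the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

# A scalar key returns a Series; a list key keeps the DataFrame shape.
as_series = df['A']

candidates = {
    "to_frame": df['A'].to_frame(),  # convert an existing Series
    "getitem": df[['A']],            # list of labels
    "loc": df.loc[:, ['A']],         # explicit label-based selection
    "iloc": df.iloc[:, [0]],         # explicit position-based selection
}

for name, result in candidates.items():
    # Each result is a 2-row, 1-column DataFrame whose only column is 'A'.
    assert isinstance(result, pd.DataFrame), name
    assert list(result.columns) == ['A'], name
    assert result.shape == (2, 1), name
```

The only difference between the Series and DataFrame forms is the list around the column key, so a function that relies on df.columns can be fed any of the entries in `candidates` directly.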

The last two forms, df.loc[:, ['A']] and df.iloc[:, [0]], remove the ambiguity that arises when column names are integers (precisely why loc/iloc were created). For example:

    In [16]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

    In [17]: df
    Out[17]:
       A  0
    0  1  2
    1  3  4

    In [18]: df[[0]]  # ambiguous: position 0 or the label 0?
    Out[18]:
       A
    0  1
    1  3
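
For comparison, here is a small sketch of how loc and iloc make the intent explicit on the same frame. The variable names `by_label` and `by_position` are mine, chosen for illustration.

```python
import pandas as pd

# Same frame as above: one string label and one integer label.
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 0])

by_label = df.loc[:, [0]]      # the column *named* 0
by_position = df.iloc[:, [0]]  # the *first* column, whatever its name

# loc picks the second column (label 0), iloc picks the first (label 'A').
assert list(by_label.columns) == [0]
assert by_label[0].tolist() == [2, 4]
assert list(by_position.columns) == ['A']
assert by_position['A'].tolist() == [1, 3]
```

Both results are still single-column DataFrames. Note that Out[18] above reflects the pandas version this answer was written against; as far as I know, recent versions resolve df[[0]] by label and return the column named 0, which is one more reason to spell out loc or iloc.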

From: stackoverflow.com/q/16782323
