What is the difference between a pandas Series and a single-column DataFrame?
Why does pandas make a distinction between a Series and a single-column DataFrame?
In other words: what is the reason for the existence of the Series class?
I'm mainly using time series with a datetime index, in case that helps to set the context.
Quoting the pandas docs:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure
(Emphasis mine, sentence fragment not mine)
So the Series is the data structure for a single column of a DataFrame, not only conceptually but literally, i.e. the data in a DataFrame is actually stored in memory as a collection of Series.
Analogously: we need both lists and matrices, because matrices are built from lists. Single-row matrices, while equivalent to lists in functionality, still cannot exist without the list(s) they're composed of.
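To make the column relationship concrete, here is a minimal sketch. The column name `price` and the three-day datetime index are made up for illustration. It only shows what type and shape you get back when you select a column; it does not show how pandas lays the data out internally.

```python
import pandas as pd

# Hypothetical data: one column of prices on a daily datetime index.
idx = pd.date_range("2024-01-01", periods=3, freq="D")
df = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=idx)

col = df["price"]        # single brackets: a one-dimensional Series
one_col = df[["price"]]  # list of labels: a two-dimensional DataFrame

print(type(col).__name__, col.ndim, col.shape)              # Series 1 (3,)
print(type(one_col).__name__, one_col.ndim, one_col.shape)  # DataFrame 2 (3, 1)

# Iterating over the frame's columns hands back (label, Series) pairs,
# which matches the "dict-like container for Series objects" description.
columns = dict(df.items())
print(type(columns["price"]).__name__)                      # Series
```

The single-column DataFrame still carries a column axis (`shape` is `(3, 1)`), while the Series has only the index.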
They both have extremely similar APIs, but you'll find that DataFrame methods always cater to the possibility that you have more than one column. And of course, you can always add another Series (or equivalent object) to a DataFrame, while adding a Series to another Series involves creating a DataFrame.
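A short sketch of both points, assuming two made-up Series named `a` and `b` on a shared datetime index. The same reduction returns a scalar on a Series but one value per column on a DataFrame, and putting two Series side by side produces a DataFrame.

```python
import pandas as pd

# Hypothetical data: two Series sharing a daily datetime index.
idx = pd.date_range("2024-01-01", periods=3, freq="D")
a = pd.Series([1.0, 2.0, 3.0], index=idx, name="a")
b = pd.Series([10.0, 20.0, 30.0], index=idx, name="b")

# Same method, different return shape.
total = a.sum()                    # scalar: 6.0
per_column = a.to_frame().sum()    # Series indexed by column label

# Adding a column to a DataFrame leaves you with a DataFrame.
df = a.to_frame()
df["b"] = b

# Combining two Series column-wise creates a new DataFrame.
combined = pd.concat([a, b], axis=1)

print(total)
print(per_column)
print(type(combined).__name__, list(combined.columns))
```

`to_frame` and `pd.concat(..., axis=1)` are just two of several ways to get from Series to DataFrame; which one fits depends on whether the indexes already line up.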