Difference between map, applymap and apply methods in Pandas
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
    In : frame = DataFrame(np.random.randn(4, 3), columns=list('bde'),
                           index=['Utah', 'Ohio', 'Texas', 'Oregon'])

    In : frame
    Out:
                   b         d         e
    Utah   -0.029638  1.081563  1.280300
    Ohio    0.647747  0.831136 -1.549481
    Texas   0.513416 -0.884417  0.195343
    Oregon -0.485454 -0.477388 -0.309548

    In : f = lambda x: x.max() - x.min()

    In : frame.apply(f)
    Out:
    b    1.133201
    d    1.965980
    e    2.829781
    dtype: float64
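The book's session only shows the default column-wise call. As a minimal sketch (not from the book), here is the same idea with the values copied from the printed frame so the result is reproducible, plus the row-wise variant via axis=1. The name value_range is just an illustrative stand-in for the lambda f.

```python
import pandas as pd

# Values copied from the frame printed above, instead of fresh random numbers.
frame = pd.DataFrame(
    [[-0.029638, 1.081563, 1.280300],
     [0.647747, 0.831136, -1.549481],
     [0.513416, -0.884417, 0.195343],
     [-0.485454, -0.477388, -0.309548]],
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

def value_range(x):
    # x is a Series: one whole column (axis=0) or one whole row (axis=1).
    return x.max() - x.min()

col_range = frame.apply(value_range)          # default axis=0: one result per column
row_range = frame.apply(value_range, axis=1)  # axis=1: one result per row

print(col_range)
print(row_range)
```

In both cases the function receives a whole Series at a time, which is what distinguishes apply from the element-wise methods.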
Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.
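To illustrate that point, here is a small sketch with made-up numbers (they are not from the book) showing that the built-in reduction gives the same result as routing it through apply:

```python
import pandas as pd

# Made-up values, purely for illustration.
frame = pd.DataFrame({"b": [1.0, 2.0, 3.0], "d": [4.0, 5.0, 6.0]})

via_apply = frame.apply(lambda col: col.sum())  # works, but takes a detour
via_method = frame.sum()                        # built-in DataFrame method

same = bool((via_apply == via_method).all())
print(same)

# The built-in methods also take an axis argument, e.g. a mean per row:
row_means = frame.mean(axis=1)
print(row_means)
```

The built-in methods are usually the better choice when one exists, since apply calls a Python function once per column or row.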
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
    In : format = lambda x: '%.2f' % x

    In : frame.applymap(format)
    Out:
                b      d      e
    Utah    -0.03   1.08   1.28
    Ohio     0.65   0.83  -1.55
    Texas    0.51  -0.88   0.20
    Oregon  -0.49  -0.48  -0.31
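A version-tolerant sketch of the same formatting step, under two assumptions of mine rather than the book's: the function is named two_decimals so it does not shadow the built-in format, and DataFrame.map is used where available, since pandas 2.1 deprecated applymap in favor of that name. Only two rows and two columns of the frame above are reused to keep it short.

```python
import pandas as pd

# A subset of the values printed above.
frame = pd.DataFrame(
    {"b": [-0.029638, 0.647747], "e": [1.280300, -1.549481]},
    index=["Utah", "Ohio"],
)

def two_decimals(x):
    # Called once per cell with a single scalar value.
    return "%.2f" % x

# pandas >= 2.1 offers DataFrame.map; older versions only have applymap.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
formatted = elementwise(two_decimals)

print(formatted)
```

The result has the same shape and labels as the input, but every cell is now a string.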
The reason for the name applymap is that Series has a map method for applying an element-wise function:
    In : frame['e'].map(format)
    Out:
    Utah       1.28
    Ohio      -1.55
    Texas      0.20
    Oregon    -0.31
    Name: e, dtype: object
apply works on a row / column basis of a DataFrame,
applymap works element-wise on a DataFrame, and
map works element-wise on a Series.
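To see the three side by side, here is a minimal sketch on a tiny made-up frame (the data and variable names are mine, not the book's). The thing to watch is what each function receives and what shape comes back.

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply: the function gets a whole column (a Series); here it reduces each to a scalar.
per_column = df.apply(lambda col: col.max() - col.min())

# applymap (DataFrame.map from pandas 2.1 on): the function gets one cell at a time.
cellwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
per_element = cellwise(lambda v: v * 2)

# Series.map: the function gets one value of a single column at a time.
one_column = df["a"].map(lambda v: v * 2)

print(per_column)   # Series indexed by column name
print(per_element)  # DataFrame with the same shape as df
print(one_column)   # Series with the same index as df["a"]
```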