What is the difference between join and merge in Pandas?

Suppose I have two DataFrames like so:

    left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})

    right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

I want to merge them, so I try something like this:

    pd.merge(left, right, left_on='key1', right_on='key2')

And I'm happy with the result:

        key1    lval    key2    rval
    0   foo     1       foo     4
    1   bar     2       bar     5

But I'm trying to use the join method, which I've been led to believe is pretty similar.

    left.join(right, on=['key1', 'key2'])

And I get this:

    //anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _validate_specification(self)
        406             if self.right_index:
        407                 if not ((len(self.left_on) == self.right.index.nlevels)):
    --> 408                     raise AssertionError()
        409                 self.right_on = [None] * n
        410         elif self.right_on is not None:

    AssertionError:

What am I missing?

I always use join on indices:

    import pandas as pd
    left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]}).set_index('key')
    right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]}).set_index('key')
    left.join(right, lsuffix='_l', rsuffix='_r')

         val_l  val_r
    key            
    foo      1      4
    bar      2      5
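Applied to the frames from the question, the same idea might look like the sketch below. `join` matches the `on` column(s) of the calling frame against the index of the other frame, so passing two names in `on` asks for a two-level index on `right`, which it doesn't have. This assumes `key2` is meant to be the lookup key for `right`; note that `key2` ends up as the index and so does not appear as a column in the result.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

# join compares left's 'key1' column with right's index,
# so move 'key2' into the index before joining.
joined = left.join(right.set_index('key2'), on='key1')
print(joined)
```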

The same functionality can be had by using merge on the columns, as follows:

    left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]})
    right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]})
    left.merge(right, on='key', suffixes=('_l', '_r'))

       key  val_l  val_r
    0  foo      1      4
    1  bar      2      5
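To check that the two approaches really line up, here is a rough comparison using the same toy data. One difference worth keeping in mind: `join` defaults to `how='left'` while `merge` defaults to `how='inner'`, so the results only coincide when every key is present on both sides, as it is here. The variable names `merged`, `joined`, and `same` are just for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'val': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'val': [4, 5]})

# Column-based: merge on the shared 'key' column.
merged = left.merge(right, on='key', suffixes=('_l', '_r'))

# Index-based: move 'key' into the index on both sides, then join.
joined = left.set_index('key').join(
    right.set_index('key'), lsuffix='_l', rsuffix='_r'
)

# The values match; only the placement of 'key' differs
# (a regular column after merge, the index after join).
same = merged.set_index('key').equals(joined)
print(same)
```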

From: stackoverflow.com/q/22676081
