Combine two pandas Data Frames (join on a common column)

I have 2 dataframes:


    Data columns (total 13 columns):
    business_id      4503  non-null values
    categories       4503  non-null values
    city             4503  non-null values
    full_address     4503  non-null values
    latitude         4503  non-null values
    longitude        4503  non-null values
    name             4503  non-null values
    neighborhoods    4503  non-null values
    open             4503  non-null values
    review_count     4503  non-null values
    stars            4503  non-null values
    state            4503  non-null values
    type             4503  non-null values
    dtypes: bool(1), float64(3), int64(1), object(8)`



    Int64Index: 158430 entries, 0 to 229905
    Data columns (total 8 columns):
    business_id    158430  non-null values
    date           158430  non-null values
    review_id      158430  non-null values
    stars          158430  non-null values
    text           158430  non-null values
    type           158430  non-null values
    user_id        158430  non-null values
    votes          158430  non-null values
    dtypes: int64(1), object(7)

I would like to join these two DataFrames to make them into a single dataframe using the DataFrame.join() command in pandas.

I have tried the following line of code:

    #the following line of code creates a left join of restaurant_ids_frame and   restaurant_review_frame on the column 'business_id'

But when I try this I get the following error:

    Exception: columns overlap: Index([business_id, stars, type], dtype=object)

I am very new to pandas and have no clue what I am doing wrong as far as executing the join statement is concerned.

any help would be much appreciated.

You can use merge to combine two dataframes into one:

    import pandas as pd
    pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer')

where on specifies field name that exists in both dataframes to join on, and how defines whether its inner/outer/left/right join, with outer using 'union of keys from both frames (SQL: full outer join).' Since you have 'star' column in both dataframes, this by default will create two columns star_x and star_y in the combined dataframe. As @DanAllan mentioned for the join method, you can modify the suffixes for merge by passing it as a kwarg. Default is suffixes=('_x', '_y'). if you wanted to do something like star_restaurant_id and star_restaurant_review, you can do:

     pd.merge(restaurant_ids_dataframe, restaurant_review_frame, on='business_id', how='outer', suffixes=('_restaurant_id', '_restaurant_review'))

The parameters are explained in detail in this link.