# Use .corr to get the correlation between two columns

I have a pandas dataframe `Top15`.

I create a column that estimates the number of citable documents per person:

```
Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita']
Top15['Citable docs per Capita'] = Top15['Citable documents'] / Top15['PopEst']
```

I want to know the correlation between the number of citable documents per capita and the energy supply per capita, so I use the `.corr()` method (Pearson's correlation):

```
data = Top15[['Citable docs per Capita','Energy Supply per Capita']]
correlation = data.corr(method='pearson')
```

I want to return a single number, but the result is a full correlation matrix rather than a scalar.

Without the actual data it is hard to answer the question, but I guess you are looking for something like this:

```
Top15['Citable docs per Capita'].corr(Top15['Energy Supply per Capita'])
```

That calculates the correlation between your two columns `'Citable docs per Capita'` and `'Energy Supply per Capita'`.

To give an example:

```
import pandas as pd

df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
print(df)
#    A  B
# 0  0  0
# 1  1  2
# 2  2  4
# 3  3  6
```

Then

```
df['A'].corr(df['B'])
```

gives `1`, as expected, since `B` is exactly twice `A`.

Now, if you change a value, e.g.

```
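df['B'] = df['B'].astype(float)  # make B float first so 4.5 fits the column dtype
df.loc[2, 'B'] = 4.5
print(df)
#    A    B
# 0  0  0.0
# 1  1  2.0
# 2  2  4.5
# 3  3  6.0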
```

the command

```
df['A'].corr(df['B'])
```

returns

```
0.99586
```

which is still close to 1, as expected.

If you apply `.corr` directly to your dataframe, it will return all pairwise correlations between your columns; that's why you then observe `1`s on the diagonal of your matrix (each column is perfectly correlated with itself).

```
df.corr()
```

will therefore return

```
          A         B
A  1.000000  0.995862
B  0.995862  1.000000
```

In the graphic you show, only the upper left corner of the correlation matrix is represented (I assume).
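If you already have the full matrix, you can also pick a single coefficient out of it by label. Here is a minimal sketch using the same toy values as above; the variable names are only for illustration:

```python
import pandas as pd

# Same toy values as in the example above, with B stored as floats
df = pd.DataFrame({'A': [0, 1, 2, 3], 'B': [0.0, 2.0, 4.5, 6.0]})

corr_matrix = df.corr()                    # 2x2 DataFrame of pairwise Pearson correlations
r_from_matrix = corr_matrix.loc['A', 'B']  # pick one off-diagonal entry by row/column label
r_direct = df['A'].corr(df['B'])           # Series.corr gives the scalar directly

print(round(r_from_matrix, 6), round(r_direct, 6))
```

Both routes should give the same number; `Series.corr` just skips building the matrix.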

There can be cases where you get `NaN`s in your result; check this post for an example.
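The linked post is not reproduced here, but two situations where I would expect a `NaN` are sketched below: the two Series share no index labels (pandas aligns on the index before correlating), or one column is constant, so its variance is zero and Pearson's coefficient is undefined. The data is made up for illustration, and these are not necessarily the cases that post covers:

```python
import math
import pandas as pd

# Case 1: the two Series share no index labels, so nothing is left after alignment
s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
s2 = pd.Series([2.0, 4.0, 6.0], index=[10, 11, 12])
r_misaligned = s1.corr(s2)

# Case 2: a constant column has zero variance, so Pearson's r is undefined
df_const = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'C': [5.0, 5.0, 5.0]})
r_constant = df_const.corr().loc['A', 'C']

print(math.isnan(r_misaligned), math.isnan(r_constant))
```

If you hit the first case after filtering or merging, resetting the index on both Series (or comparing the underlying values) is usually what you want.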

If you want to filter entries above/below a certain threshold, you can check this question. If you want to plot a heatmap of the correlation coefficients, you can check this answer, and if you then run into the issue with overlapping axis labels, check the following post.
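For the threshold case, one possible approach (not necessarily the one in the linked question) is to mask the correlation matrix with `DataFrame.where`. The data and the cut-off of `0.9` below are arbitrary choices for illustration:

```python
import pandas as pd

# Hypothetical data: B tracks A closely, C does not
df = pd.DataFrame({
    'A': [0.0, 1.0, 2.0, 3.0],
    'B': [0.0, 2.0, 4.5, 6.0],
    'C': [1.0, -1.0, 1.0, -1.0],
})

corr = df.corr()
threshold = 0.9  # arbitrary cut-off chosen for illustration

# Keep entries whose absolute value exceeds the threshold; the rest become NaN
strong = corr.where(corr.abs() > threshold)
print(strong)
```

Using the absolute value keeps strong negative correlations as well; drop `.abs()` if you only care about positive ones.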

From: stackoverflow.com/q/42579908