# Understanding NumPy's einsum

I'm struggling to understand exactly how `einsum`

works. I've looked at the documentation and a few examples, but it's not seeming to stick.

Here's an example we went over in class:

```
C = np.einsum("ij,jk->ki", A, B)
```

for two arrays`A`

and `B`

I think this would take `A^T * B`

, but I'm not sure (it's taking the transpose of one of them right?). Can anyone walk me through exactly what's happening here (and in general when using `einsum`

)?

(Note: this answer is based on a short blog post about `einsum`

I wrote a while ago.)

## What does `einsum`

do?

Imagine that we have two multi-dimensional arrays, `A`

and `B`

. Now let's suppose we want to...

*multiply*`A`

with`B`

in a particular way to create new array of products; and then maybe*sum*this new array along particular axes; and then maybe*transpose*the axes of the new array in a particular order.

There's a good chance that `einsum`

will help us do this faster and more memory-efficiently that combinations of the NumPy functions like `multiply`

, `sum`

and `transpose`

will allow.

## How does `einsum`

work?

Here's a simple (but not completely trivial) example. Take the following two arrays:

```
A = np.array([0, 1, 2])
B = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
```

We will multiply `A`

and `B`

element-wise and then sum along the rows of the new array. In "normal" NumPy we'd write:

```
>>> (A[:, np.newaxis] * B).sum(axis=1)
array([ 0, 22, 76])
```

So here, the indexing operation on `A`

lines up the first axes of the two arrays so that the multiplication can be broadcast. The rows of the array of products is then summed to return the answer.

Now if we wanted to use `einsum`

instead, we could write:

```
>>> np.einsum('i,ij->i', A, B)
array([ 0, 22, 76])
```

The *signature* string `'i,ij->i'`

is the key here and needs a little bit of explaining. You can think of it in two halves. On the left-hand side (left of the `->`

) we've labelled the two input arrays. To the right of `->`

, we've labelled the array we want to end up with.

Here is what happens next:

`A`

has one axis; we've labelled it`i`

. And`B`

has two axes; we've labelled axis 0 as`i`

and axis 1 as`j`

.By

**repeating**the label`i`

in both input arrays, we are telling`einsum`

that these two axes should be**multiplied**together. In other words, we're multiplying array`A`

with each column of array`B`

, just like`A[:, np.newaxis] * B`

does.Notice that

`j`

does not appear as a label in our desired output; we've just used`i`

(we want to end up with a 1D array). By**omitting**the label, we're telling`einsum`

to**sum**along this axis. In other words, we're summing the rows of the products, just like`.sum(axis=1)`

does.

That's basically all you need to know to use `einsum`

. It helps to play about a little; if we leave both labels in the output, `'i,ij->ij'`

, we get back a 2D array of products (same as `A[:, np.newaxis] * B`

). If we say no output labels, `'i,ij->`

, we get back a single number (same as doing `(A[:, np.newaxis] * B).sum()`

).

The great thing about `einsum`

however, is that is does not build a temporary array of products first; it just sums the products as it goes. This can lead to big savings in memory use.

## A slightly bigger example

To explain the dot product, here are two new arrays:

```
A = array([[1, 1, 1],
[2, 2, 2],
[5, 5, 5]])
B = array([[0, 1, 0],
[1, 1, 0],
[1, 1, 1]])
```

We will compute the dot product using `np.einsum('ij,jk->ik', A, B)`

. Here's a picture showing the labelling of the `A`

and `B`

and the output array that we get from the function:

You can see that label `j`

is repeated - this means we're multiplying the rows of `A`

with the columns of `B`

. Furthermore, the label `j`

is not included in the output - we're summing these products. Labels `i`

and `k`

are kept for the output, so we get back a 2D array.

It might be even clearer to compare this result with the array where the label `j`

is *not* summed. Below, on the left you can see the 3D array that results from writing `np.einsum('ij,jk->ijk', A, B)`

(i.e. we've kept label `j`

):

Summing axis `j`

gives the expected dot product, shown on the right.

## Some exercises

To get more of feel for `einsum`

, it can be useful to implement familiar NumPy array operations using the subscript notation. Anything that involves combinations of multiplying and summing axes can be written using `einsum`

.

Let A and B be two 1D arrays with the same length. For example, `A = np.arange(10)`

and `B = np.arange(5, 15)`

.

- The sum of
`A`

can be written:

```
np.einsum('i->', A)
```

- Element-wise multiplication,
`A * B`

, can be written:

```
np.einsum('i,i->i', A, B)
```

- The inner product or dot product,
`np.inner(A, B)`

or`np.dot(A, B)`

, can be written:

```
np.einsum('i,i->', A, B) # or just use 'i,i'
```

- The outer product,
`np.outer(A, B)`

, can be written:

```
np.einsum('i,j->ij', A, B)
```

For 2D arrays, `C`

and `D`

, provided that the axes are compatible lengths (both the same length or one of them of has length 1), here are a few examples:

- The trace of
`C`

(sum of main diagonal),`np.trace(C)`

, can be written:

```
np.einsum('ii', C)
```

- Element-wise multiplication of
`C`

and the transpose of`D`

,`C * D.T`

, can be written:

```
np.einsum('ij,ji->ij', C, D)
```

- Multiplying each element of
`C`

by the array`D`

(to make a 4D array),`C[:, :, None, None] * D`

, can be written:

```
np.einsum('ij,kl->ijkl', C, D)
```

From: stackoverflow.com/q/26089893