# What is the difference between contiguous and non-contiguous arrays?

In the numpy manual about the reshape() function, it says

```
>>> a = np.zeros((10, 2))
# A transpose makes the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying the
# initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array
```

My questions are:

- What are contiguous and non-contiguous arrays? Is it similar to a contiguous memory block in C, as in "What is a contiguous memory block?"
- Is there any performance difference between these two? When should we use one or the other?
- Why does transpose make the array non-contiguous?
- Why does `c.shape = (20)` throw the error `incompatible shape for a non-contiguous array`?

Thanks for your answer!

A contiguous array is just an array stored in an unbroken block of memory: to access the next value in the array, we just move to the next memory address.

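Consider the 2D array `arr = np.arange(12).reshape(3,4)`. It has three rows: `[0, 1, 2, 3]`, `[4, 5, 6, 7]` and `[8, 9, 10, 11]`.

In the computer's memory, the values of `arr` are stored one after another in a single run: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.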

This means `arr` is a **C contiguous** array because the *rows* are stored as contiguous blocks of memory. The next memory address holds the next value in the same row. If we want to move down a column, we just need to jump over three blocks (e.g. to jump from 0 to 4 means we skip over 1, 2 and 3).
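As a small sketch of how to check this yourself, the snippet below inspects the `flags` and `strides` of `arr`. The helper names `strides_in_elements` and `memory_order` are made up for illustration; strides are divided by `itemsize` so the result does not depend on the platform's default integer width.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# The flags report which memory layouts the array satisfies.
print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.flags['F_CONTIGUOUS'])  # False

# Strides are given in bytes; dividing by itemsize expresses them in elements.
strides_in_elements = tuple(s // arr.itemsize for s in arr.strides)
print(strides_in_elements)  # (4, 1): 4 elements to the next row, 1 to the next column

# order='K' reads the values in the order they sit in memory.
memory_order = arr.ravel(order='K').tolist()
print(memory_order)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```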

Transposing the array with `arr.T` means that C contiguity is lost because adjacent row entries are no longer in adjacent memory addresses. However, `arr.T` is **Fortran contiguous** since the *columns* are in contiguous blocks of memory (each column of `arr.T` is a row of `arr`, and the underlying data has not moved).
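A minimal check of this, assuming the same `arr` as above (the names `arr_t` and `t_strides` are just for illustration): the transpose shares memory with the original and simply swaps the strides.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr_t = arr.T

# The transpose is a view: no data is copied or moved.
print(np.shares_memory(arr, arr_t))  # True

# Only the layout flags and the strides change.
print(arr_t.flags['C_CONTIGUOUS'])  # False
print(arr_t.flags['F_CONTIGUOUS'])  # True

# Strides in elements: 1 to move down a column, 4 to move along a row.
t_strides = tuple(s // arr_t.itemsize for s in arr_t.strides)
print(t_strides)  # (1, 4)
```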

Performance-wise, accessing memory addresses which are next to each other is very often faster than accessing addresses which are more "spread out" (fetching a value from RAM could entail a number of neighbouring addresses being fetched and cached for the CPU). This means that operations over contiguous arrays will often be quicker.

As a consequence of C contiguous memory layout, row-wise operations are usually faster than column-wise operations. For example, you'll typically find that

```
np.sum(arr, axis=1) # sum the rows
```

is slightly faster than:

```
np.sum(arr, axis=0) # sum the columns
```

Similarly, operations on columns will be slightly faster for Fortran contiguous arrays.
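If you want to measure this on your own machine, a rough benchmark along these lines should do. The array `big` and its size are arbitrary choices for illustration, and the timings (and even which axis wins) can vary with array size, hardware and NumPy version, so treat the output as indicative only.

```python
import timeit
import numpy as np

# Hypothetical benchmark array; C contiguous by default.
big = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

row_sums = np.sum(big, axis=1)  # one sum per row
col_sums = np.sum(big, axis=0)  # one sum per column

# Time each reduction over the same number of repetitions.
row_time = timeit.timeit(lambda: np.sum(big, axis=1), number=50)
col_time = timeit.timeit(lambda: np.sum(big, axis=0), number=50)
print(f"axis=1: {row_time:.4f}s  axis=0: {col_time:.4f}s")
```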

Finally, why can't we flatten the Fortran contiguous array by assigning a new shape?

```
>>> arr2 = arr.T
>>> arr2.shape = 12
AttributeError: incompatible shape for a non-contiguous array
```

In order for this to be possible NumPy would have to put the rows of `arr.T` together one after another, giving the order 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11.

(Setting the `shape` attribute directly assumes C order - i.e. NumPy tries to perform the operation row-wise.)

This is impossible to do. For any axis, NumPy needs to have a *constant* stride length (the number of bytes to move) to get to the next element of the array. Flattening `arr.T` in this way would require skipping forwards and backwards in memory to retrieve consecutive values of the array.
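To make the stride argument concrete, here is a sketch that computes where each element of `arr.T` lives in memory (as an offset in elements from the start of the buffer) when the array is walked row by row. The names `offsets` and `steps` are illustrative only.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T  # shape (4, 3)

# Strides of the transposed view, expressed in elements rather than bytes.
s0, s1 = (s // arr2.itemsize for s in arr2.strides)

# Memory offset of every element, visited in C (row-wise) order.
offsets = [i * s0 + j * s1
           for i in range(arr2.shape[0])
           for j in range(arr2.shape[1])]
print(offsets)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

# Step needed to get from each element to the next one.
steps = [b - a for a, b in zip(offsets, offsets[1:])]
print(steps)  # [4, 4, -7, 4, 4, -7, 4, 4, -7, 4, 4]
```

The steps alternate between `4` and `-7`, so no single stride can describe a flat, one-dimensional view over this data.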

If we wrote `arr2.reshape(12)` instead, NumPy would copy the values of `arr2` into a new block of memory (since it can't return a view on to the original data for this shape).
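One way to confirm the copy is `np.shares_memory`. The sketch below also shows, as an aside not covered above, that asking for Fortran order with `order='F'` should let NumPy flatten `arr2` without copying, because that order matches how the data is already laid out. The names `flat_copy` and `flat_view` are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T

# C-order reshape of the transposed array: the values must be copied.
flat_copy = arr2.reshape(12)
print(flat_copy.tolist())                 # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(np.shares_memory(arr2, flat_copy))  # False

# Fortran-order reshape follows the existing memory layout, so a view suffices.
flat_view = arr2.reshape(12, order='F')
print(flat_view.tolist())                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(np.shares_memory(arr2, flat_view))  # True
```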

From: stackoverflow.com/q/26998223