# What is the difference between contiguous and non-contiguous arrays?

In the numpy manual about the reshape() function, it says

```
>>> a = np.zeros((10, 2))
# A transpose makes the array non-contiguous
>>> b = a.T
# Taking a view makes it possible to modify the shape without modifying
# the initial object.
>>> c = b.view()
>>> c.shape = (20)
AttributeError: incompatible shape for a non-contiguous array
```

My questions are:

1. What are contiguous and non-contiguous arrays? Is this similar to the idea of a contiguous block of memory in C?
2. Is there any performance difference between these two? When should we use one or the other?
3. Why does transpose make the array non-contiguous?
4. Why does `c.shape = (20)` throw the error `incompatible shape for a non-contiguous array`?

A contiguous array is just an array stored in an unbroken block of memory: to access the next value in the array, we just move to the next memory address.
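NumPy exposes this property through the array's `flags` attribute. A quick sketch:

```python
import numpy as np

# A freshly created array owns one unbroken block of memory,
# so NumPy reports it as C-contiguous.
arr = np.arange(12).reshape(3, 4)
print(arr.flags['C_CONTIGUOUS'])           # True

# Slicing every other column returns a view whose elements are
# no longer adjacent in memory, so contiguity is lost.
print(arr[:, ::2].flags['C_CONTIGUOUS'])   # False
```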

Consider the 2D array `arr = np.arange(12).reshape(3,4)`. It looks like this:

```
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
```

In the computer's memory, the values of `arr` are stored like this:

```
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
```

This means `arr` is a C contiguous array: the rows are stored as contiguous blocks of memory, so the next memory address holds the next value in that row. If we want to move down a column, we just need to jump over three blocks (e.g. to jump from 0 to 4 means we skip over 1, 2 and 3).
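This row-wise jump is exactly what the array's *strides* record. A small sketch, using an explicit `int64` dtype so the byte counts are platform-independent:

```python
import numpy as np

# Each int64 element occupies 8 bytes.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# strides = bytes to step to reach the next element along each axis:
# moving to the next column is one element (8 bytes); moving to the
# next row means jumping over a whole row of 4 elements (32 bytes).
print(arr.strides)   # (32, 8)
```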

Transposing the array with `arr.T` means that C contiguity is lost, because adjacent row entries are no longer in adjacent memory addresses. However, `arr.T` is Fortran contiguous since the *columns* are in contiguous blocks of memory:

```
array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])
```

The memory itself is unchanged (the transpose is a view), so each column of `arr.T`, for example (0, 1, 2, 3), still occupies a contiguous block.
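One way to see this (a sketch): the transpose is a view on the same memory, with the strides and contiguity flags swapped.

```python
import numpy as np

arr = np.arange(12, dtype=np.int64).reshape(3, 4)
b = arr.T

# No data is copied: the transpose shares arr's memory block.
print(np.shares_memory(arr, b))   # True

# Only the strides are swapped: (32, 8) becomes (8, 32).
print(b.strides)                  # (8, 32)

# Hence b is Fortran contiguous rather than C contiguous.
print(b.flags['C_CONTIGUOUS'])    # False
print(b.flags['F_CONTIGUOUS'])    # True
```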

Performance-wise, accessing memory addresses which are next to each other is very often faster than accessing addresses which are more "spread out" (fetching a value from RAM could entail a number of neighbouring addresses being fetched and cached for the CPU.) This means that operations over contiguous arrays will often be quicker.

As a consequence of the C contiguous memory layout, row-wise operations are usually faster than column-wise operations. For example, you'll typically find that

```
np.sum(arr, axis=1)  # sum the rows
```

is slightly faster than:

```
np.sum(arr, axis=0)  # sum the columns
```

Similarly, operations on columns will be slightly faster for Fortran contiguous arrays.
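A rough way to check this on your own machine (timings vary with hardware, array size and NumPy version, so treat this as a sketch rather than a guarantee):

```python
import numpy as np
from timeit import timeit

big = np.zeros((1000, 1000))   # C contiguous by default

# Summing along axis=1 walks each row in memory order;
# summing along axis=0 repeatedly strides across rows.
t_rows = timeit(lambda: big.sum(axis=1), number=100)
t_cols = timeit(lambda: big.sum(axis=0), number=100)
print(f"row sums: {t_rows:.4f}s, column sums: {t_cols:.4f}s")
```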

Finally, why can't we flatten the Fortran contiguous array by assigning a new shape?

```
>>> arr2 = arr.T
>>> arr2.shape = 12
AttributeError: incompatible shape for a non-contiguous array
```

In order for this to be possible, NumPy would have to put the rows of `arr.T` together like this:

```
| 0 | 4 | 8 | 1 | 5 | 9 | 2 | 6 | 10 | 3 | 7 | 11 |
```

(Setting the `shape` attribute directly assumes C order, i.e. NumPy tries to perform the operation row-wise.)

This is impossible to do. For any axis, NumPy needs to have a constant stride length (the number of bytes to move) to get to the next element of the array. Flattening `arr.T` in this way would require skipping forwards and backwards in memory to retrieve consecutive values of the array.

If we wrote `arr2.reshape(12)` instead, NumPy would copy the values of `arr2` into a new block of memory (since it can't return a view onto the original data for this shape).
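A quick sketch confirming that `reshape` hands back a copy here, not a view:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
arr2 = arr.T

# reshape must copy, since no constant stride can walk arr2's
# values row-wise through the original memory block.
flat = arr2.reshape(12)
print(np.shares_memory(arr2, flat))   # False
print(flat)   # [ 0  4  8  1  5  9  2  6 10  3  7 11]

# Because flat is a copy, writing to it leaves arr untouched.
flat[0] = 99
print(arr[0, 0])   # 0
```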

From: stackoverflow.com/q/26998223
