# When is hash(n) == n in Python?

I've been playing with Python's hash function. For small integers, it appears `hash(n) == n`

always. However this does not extend to large numbers:

```
>>> hash(2**100) == 2**100
False
```

I'm not surprised, I understand hash takes a finite range of values. What is that range?

I tried using binary search to find the smallest number `hash(n) != n`

```
>>> import codejamhelpers # pip install codejamhelpers
>>> help(codejamhelpers.binary_search)
Help on function binary_search in module codejamhelpers.binary_search:
binary_search(f, t)
Given an increasing function :math:`f`, find the greatest non-negative integer :math:`n` such that :math:`f(n) \le t`. If :math:`f(n) > t` for all :math:`n \ge 0`, return None.
>>> f = lambda n: int(hash(n) != n)
>>> n = codejamhelpers.binary_search(f, 0)
>>> hash(n)
2305843009213693950
>>> hash(n+1)
0
```

What's special about 2305843009213693951? I note it's less than `sys.maxsize == 9223372036854775807`

Edit: I'm using Python 3. I ran the same binary search on Python 2 and got a different result 2147483648, which I note is `sys.maxint+1`

I also played with `[hash(random.random()) for i in range(10**6)]`

to estimate the range of hash function. The max is consistently below n above. Comparing the min, it seems Python 3's hash is always positively valued, whereas Python 2's hash can take negative values.

Based on python documentation in `pyhash.c`

file:

For numeric types, the hash of a number x is based on the reduction of x modulo the prime

`P = 2**_PyHASH_BITS - 1`

. It's designed so that`hash(x) == hash(y)`

whenever x and y are numerically equal, even if x and y have different types.

So for a 64/32 bit machine, the reduction would be 2 _PyHASH_BITS - 1, but what is `_PyHASH_BITS`

?

You can find it in `pyhash.h`

header file which for a 64 bit machine has been defined as 61 (you can read more explanation in `pyconfig.h`

file).

```
#if SIZEOF_VOID_P >= 8
# define _PyHASH_BITS 61
#else
# define _PyHASH_BITS 31
#endif
```

So first off all it's based on your platform for example in my 64bit Linux platform the reduction is 261-1, which is `2305843009213693951`

:

```
>>> 2**61 - 1
2305843009213693951
```

Also You can use `math.frexp`

in order to get the mantissa and exponent of `sys.maxint`

which for a 64 bit machine shows that max int is 263:

```
>>> import math
>>> math.frexp(sys.maxint)
(0.5, 64)
```

And you can see the difference by a simple test:

```
>>> hash(2**62) == 2**62
True
>>> hash(2**63) == 2**63
False
```

Read the complete documentation about python hashing algorithm https://github.com/python/cpython/blob/master/Python/pyhash.c#L34

As mentioned in comment you can use `sys.hash_info`

(in python 3.X) which will give you a struct sequence of parameters used for computing hashes.

```
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>>
```

Alongside the modulus that I've described in preceding lines, you can also get the `inf`

value as following:

```
>>> hash(float('inf'))
314159
>>> sys.hash_info.inf
314159
```

From: stackoverflow.com/q/37612524

**★ Back to homepage or read more recommendations:**