hash function in Python 3.3 returns different results between sessions
I've implemented a BloomFilter in Python 3.3 and got different results every session. Drilling down into this weird behavior led me to the built-in hash() function: it returns different hash values for the same string in every session.
>>> hash("235")
-310569535015251310
----- opening a new python console -----
>>> hash("235")
-1900164331622581997
Why is this happening? Why is this useful?
Python uses a random hash seed to prevent attackers from tar-pitting your application by sending you keys designed to collide. See the original vulnerability disclosure. By offsetting the hash with a random seed (set once at startup), attackers can no longer predict which keys will collide.
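To make the tar-pitting risk concrete, here is a small sketch (not the attack from the disclosure) using a hypothetical key class, Colliding, whose instances all share one hash value. That is what attacker-chosen colliding keys look like to a dict, and insertion degrades from roughly constant time per key to a linear scan per key. The key count n is an arbitrary choice for illustration.

```python
import time


class Colliding:
    """Hypothetical key type: every instance has the same hash value,
    mimicking keys an attacker chose so that they all collide."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every key lands in the same probe sequence

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.value == other.value


def time_inserts(keys):
    """Time how long it takes to insert all keys into a fresh dict."""
    start = time.perf_counter()
    d = {}
    for k in keys:
        d[k] = None
    return time.perf_counter() - start


n = 1000
fast = time_inserts(range(n))                        # well-spread int hashes
slow = time_inserts([Colliding(i) for i in range(n)])  # all keys collide
print(f"distinct hashes: {fast:.4f}s, all colliding: {slow:.4f}s")
```

Each colliding insert must compare against every key already stored, so total work grows roughly quadratically in n; a random per-process seed stops an attacker from precomputing such keys for str and bytes.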
You can set a fixed seed or disable the feature by setting the PYTHONHASHSEED environment variable; the default is random, but you can set it to a fixed positive integer value, with 0 disabling the feature altogether.
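A minimal sketch of the effect of PYTHONHASHSEED, assuming Python 3.7+ (for subprocess.run's capture_output): the hypothetical helper hash_in_child starts a fresh interpreter with the given seed and reports hash() of a string there. A fixed seed reproduces the same value across sessions; "random" draws a new salt each time.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) computed by a new interpreter started with
    PYTHONHASHSEED=seed ("random", "0", or a fixed integer as a string)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


fixed_a = hash_in_child("235", "42")
fixed_b = hash_in_child("235", "42")
rand_a = hash_in_child("235", "random")
rand_b = hash_in_child("235", "random")

print(fixed_a == fixed_b)  # True: a fixed seed is reproducible across sessions
print(rand_a == rand_b)    # almost certainly False: each session gets its own salt
```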
Python versions 2.7 and 3.2 have the feature disabled by default (use the -R switch or set PYTHONHASHSEED=random to enable it); it is enabled by default in Python 3.3 and up.
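To check whether randomization is active in a given interpreter, sys.flags.hash_randomization can be inspected. This sketch (assuming Python 3.7+ for capture_output; the helper name randomization_flag is hypothetical) prints the flag for the current process and for child processes started with a few PYTHONHASHSEED values.

```python
import os
import subprocess
import sys

# 1 when str/bytes hashes are salted in this interpreter, 0 when disabled.
print("this interpreter:", sys.flags.hash_randomization)


def randomization_flag(seed_value):
    """Report sys.flags.hash_randomization in a child interpreter
    started with PYTHONHASHSEED=seed_value."""
    env = dict(os.environ, PYTHONHASHSEED=seed_value)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.hash_randomization)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


for value in ("random", "0"):
    print(f"PYTHONHASHSEED={value}: {randomization_flag(value)}")
```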
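If you were relying on the order of keys in a Python dictionary or set, then don't. Python uses a hash table to implement these types, and their order depends on the insertion and deletion history as well as the random hash seed. (Since Python 3.7, dicts are guaranteed to iterate in insertion order, but sets still make no ordering promise.)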
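A sketch of how set iteration order can shift with the hash seed, assuming Python 3.7+; WORDS and set_order are hypothetical names. The raw order from each child interpreter may differ between seeds, while sorting the elements gives the same order every time, which is the safe thing to rely on.

```python
import os
import subprocess
import sys

WORDS = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape"]


def set_order(seed):
    """Iteration order of set(WORDS) in a child interpreter using PYTHONHASHSEED=seed."""
    code = "import sys; print(' '.join(set(sys.argv[1:])))"
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", code, *WORDS],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


orders = {seed: set_order(seed) for seed in (1, 2, 3)}
for seed, order in orders.items():
    print(seed, order)    # raw set order may differ from seed to seed
    print(sorted(order))  # sorted order is the same for every seed
```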
Also see the object.__hash__() special method documentation:
Note: By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
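The quoted note lists str, bytes and datetime as salted; numeric types are not. This sketch (assuming Python 3.7+; child_hash is a hypothetical helper that takes a Python literal as source text) evaluates hash() of a few literals under two different fixed seeds and reports which results changed.

```python
import os
import subprocess
import sys


def child_hash(expr, seed):
    """Evaluate hash(<expr>) in a new interpreter with PYTHONHASHSEED=seed.
    expr is a Python literal written as source text, e.g. '"235"' or '235'."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


for expr in ["235", "235.0", "(1, 2)", '"235"', 'b"235"']:
    values = {child_hash(expr, seed) for seed in (1, 2)}
    print(f"{expr:8} salted: {len(values) > 1}")
```

Numbers (and tuples built only from numbers) hash the same in every session, which is why hash(235) is stable even though hash("235") is not.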
Since the offset in Python 3.3's string hash consists of a prefix and a suffix (the start value and the final XORed value, respectively), you cannot simply store the offset, unfortunately. On the plus side, this also means attackers cannot easily determine the offset with timing attacks.
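Because the salt can't just be saved and reused, a Bloom filter that must give the same answers in every session can derive its bit positions from a deterministic digest instead of hash(). This is a swapped-in technique, not something the answer above prescribes: it uses hashlib.sha256, and the class name StableBloomFilter and its num_bits/num_hashes parameters are hypothetical. Items are assumed to be str.

```python
import hashlib


class StableBloomFilter:
    """Hypothetical minimal Bloom filter whose bit positions come from SHA-256
    rather than hash(), so they are identical in every Python session."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            # Prefix the index so each of the k "hash functions" differs.
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = StableBloomFilter()
bf.add("235")
print("235" in bf)  # True
```

Since the bit array depends only on the inserted items, bytes(bf.bits) can be written to disk and reloaded in a later session. Alternatively, running the original code with a fixed PYTHONHASHSEED also makes hash() reproducible, at the cost of the collision protection described above.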