list() uses more memory than list comprehension
So i was playing with
list objects and found little strange thing that if
list is created with
list() it uses more memory, than list comprehension? I'm using Python 3.5.2
In : import sys In : a = list(range(100)) In : sys.getsizeof(a) Out: 1008 In : b = [i for i in range(100)] In : sys.getsizeof(b) Out: 912 In : type(a) == type(b) Out: True In : a == b Out: True In : sys.getsizeof(list(b)) Out: 1008
From the docs:
Lists may be constructed in several ways:
- Using a pair of square brackets to denote the empty list:
- Using square brackets, separating items with commas:
[a, b, c]
- Using a list comprehension:
[x for x in iterable]
- Using the type constructor:
But it seems that using
list() it uses more memory.
And as much
list is bigger, the gap increases.
Why this happens?
Test with Python 3.6.0b2:
Python 3.6.0b2 (default, Oct 11 2016, 11:52:53) [GCC 5.4.0 20160609] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.getsizeof(list(range(100))) 1008 >>> sys.getsizeof([i for i in range(100)]) 912
Test with Python 2.7.12:
Python 2.7.12 (default, Jul 1 2016, 15:12:24) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.getsizeof(list(xrange(100))) 1016 >>> sys.getsizeof([i for i in xrange(100)]) 920
I think you're seeing over-allocation patterns this is a sample from the source:
/* This over-allocates proportional to the list size, making room * for additional growth. The over-allocation is mild, but is * enough to give linear-time amortized behavior over a long * sequence of appends() in the presence of a poorly-performing * system realloc(). * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... */ new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
Printing the sizes of list comprehensions of lengths 0-88 you can see the pattern matches:
# create comprehensions for sizes 0-88 comprehensions = [sys.getsizeof([1 for _ in range(l)]) for l in range(90)] # only take those that resulted in growth compared to previous length steps = zip(comprehensions, comprehensions[1:]) growths = [x for x in list(enumerate(steps)) if x != x] # print the results: for growth in growths: print(growth)
Results (format is
(list length, (old total size, new total size))):
(0, (64, 96)) (4, (96, 128)) (8, (128, 192)) (16, (192, 264)) (25, (264, 344)) (35, (344, 432)) (46, (432, 528)) (58, (528, 640)) (72, (640, 768)) (88, (768, 912))
The over-allocation is done for performance reasons allowing lists to grow without allocating more memory with every growth (better amortized performance).
A probable reason for the difference with using list comprehension, is that list comprehension can not deterministically calculate the size of the generated list, but
list() can. This means comprehensions will continuously grow the list as it fills it using over-allocation until finally filling it.
It is possible that is will not grow the over-allocation buffer with unused allocated nodes once its done (in fact, in most cases it wont, that would defeat the over-allocation purpose).
list(), however, can add some buffer no matter the list size since it knows the final list size in advance.
Another backing evidence, also from the source, is that we see list comprehensions invoking
LIST_APPEND, which indicates usage of
list.resize, which in turn indicates consuming the pre-allocation buffer without knowing how much of it will be filled. This is consistent with the behavior you're seeing.
list() will pre-allocate more nodes as a function of the list size
>>> sys.getsizeof(list([1,2,3])) 60 >>> sys.getsizeof(list([1,2,3,4])) 64
List comprehension does not know the list size so it uses append operations as it grows, depleting the pre-allocation buffer:
# one item before filling pre-allocation buffer completely >>> sys.getsizeof([i for i in [1,2,3]]) 52 # fills pre-allocation buffer completely # note that size did not change, we still have buffered unused nodes >>> sys.getsizeof([i for i in [1,2,3,4]]) 52 # grows pre-allocation buffer >>> sys.getsizeof([i for i in [1,2,3,4,5]]) 68