Why is variable1 += variable2 much faster than variable1 = variable1 + variable2?
I have inherited some Python code which is used to create huge tables (of up to 19 columns wide by 5000 rows). It took nine seconds for the table to be drawn on the screen. I noticed that each row was added using this code:
sTable = sTable + '\n' + GetRow()
sTable is a string.
I changed that to:
sTable += '\n' + GetRow()
and I noticed that the table now appeared in six seconds.
And then I changed it to:
sTable += '\n%s' % GetRow()
based on these Python performance tips (still six seconds).
Since this was called about 5000 times, it highlighted the performance issue. But why was there such a large difference? And why didn't the compiler spot the problem in the first version and optimise it?
This isn't about using in-place += versus binary +. You didn't tell us the whole story. Your original version concatenated 3 strings, not just two:
sTable = sTable + '\n' + sRow # simplified, sRow is a function call
Python tries to help out and optimises string concatenation, both when using strobj += otherstrobj and when using strobj = strobj + otherstrobj, but it cannot apply this optimisation when more than 2 strings are involved.
Python strings are normally immutable, but if there are no other references to the left-hand string object and it is being rebound anyway, then Python cheats and mutates the string in place. This avoids having to create a new string for each concatenation, and that can lead to a big speed improvement.
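The immutability guarantee still has to hold whenever anything else can see the string, which is why the cheat is limited to the single-reference case. A minimal sketch of the observable semantics (pure Python, no CPython internals assumed):

```python
# Why concatenation must normally create a new object: if another name
# still pointed at the old value, mutating in place would corrupt it.
t = 'spam'
alias = t               # a second reference defeats the in-place cheat
t = t + ' eggs'         # so t must be rebound to a brand-new string
print(alias)            # the original is untouched: 'spam'
print(t)                # 'spam eggs'
print(alias is t)       # False
```

With no second reference, CPython is free to resize the buffer behind the scenes, because no code can observe the difference.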
This is implemented in the bytecode evaluation loop. Both when using BINARY_ADD on two strings and when using INPLACE_ADD on two strings, Python delegates concatenation to a special helper function, string_concatenate(). To be able to optimize the concatenation by mutating the string, it first needs to make sure that the string has no other references to it; if only the stack and the original variable reference it, then this can be done, and the next operation is going to replace the original variable reference.
So if there are just 2 references to the string, and the next operator is one of STORE_FAST (set a local variable), STORE_DEREF (set a variable referenced by closed-over functions) or STORE_NAME (set a global variable), and the affected variable currently references the same string, then that target variable is cleared to reduce the number of references to just 1, the stack.
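You can see the add-then-store pattern the optimisation looks for by disassembling a small function. Note this answer was written against Python 2; on Python 3.11+ the INPLACE_ADD/BINARY_ADD opcodes were folded into a generic BINARY_OP, so the exact opcode names depend on your interpreter version:

```python
import dis

def f(s, row):
    # += on a local: the add opcode is immediately followed by STORE_FAST,
    # exactly the add-then-store shape described above
    s += '\n%s' % row
    return s

# List the opcode names; expect the add (INPLACE_ADD, or BINARY_OP on
# 3.11+) immediately followed by STORE_FAST
ops = [i.opname for i in dis.get_instructions(f)]
print(ops)
```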
And this is why your original code could not use this optimization fully. The first part of your expression is sTable + '\n', and the next operation is another BINARY_ADD:

>>> import dis
>>> dis.dis(compile(r"sTable = sTable + '\n' + sRow", '<stdin>', 'exec'))
  1           0 LOAD_NAME                0 (sTable)
              3 LOAD_CONST               0 ('\n')
              6 BINARY_ADD
              7 LOAD_NAME                1 (sRow)
             10 BINARY_ADD
             11 STORE_NAME               0 (sTable)
             14 LOAD_CONST               1 (None)
             17 RETURN_VALUE
The first BINARY_ADD is followed by a LOAD_NAME to access the sRow variable, not a store operation. This first BINARY_ADD must always result in a new string object, ever larger as sTable grows, and it takes more and more time to create this new string object.
You changed this code to:
sTable += '\n%s' % sRow
which removed the second concatenation. Now the bytecode is:
>>> dis.dis(compile(r"sTable += '\n%s' % sRow", '<stdin>', 'exec'))
  1           0 LOAD_NAME                0 (sTable)
              3 LOAD_CONST               0 ('\n%s')
              6 LOAD_NAME                1 (sRow)
              9 BINARY_MODULO
             10 INPLACE_ADD
             11 STORE_NAME               0 (sTable)
             14 LOAD_CONST               1 (None)
             17 RETURN_VALUE
and all we have left is an INPLACE_ADD followed by a store. Now sTable can be altered in place, without producing an ever larger new string object.
You'd have gotten the same speed difference with:
sTable = sTable + ('\n%s' % sRow)
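You can check that this parenthesised version has the right shape: the right-hand side is reduced to a single value first, so the one remaining add is immediately followed by the store. A quick sketch (opcode names are BINARY_ADD/BINARY_MODULO before Python 3.11, a generic BINARY_OP after):

```python
import dis

# Compile the statement and list its opcodes: the % formatting runs
# first, then a single add, then the store -- the add-then-store
# pattern the optimisation needs.
code = compile(r"sTable = sTable + ('\n%s' % sRow)", '<stdin>', 'exec')
ops = [i.opname for i in dis.get_instructions(code)]
print(ops)  # the opcode just before STORE_NAME is the add
```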
A time trial shows the difference:
>>> import random
>>> from timeit import timeit
>>> testlist = [''.join([chr(random.randint(48, 127)) for _ in range(random.randrange(10, 30))]) for _ in range(1000)]
>>> def str_threevalue_concat(lst):
...     res = ''
...     for elem in lst:
...         res = res + '\n' + elem
...
>>> def str_twovalue_concat(lst):
...     res = ''
...     for elem in lst:
...         res = res + ('\n%s' % elem)
...
>>> timeit('f(l)', 'from __main__ import testlist as l, str_threevalue_concat as f', number=10000)
6.196403980255127
>>> timeit('f(l)', 'from __main__ import testlist as l, str_twovalue_concat as f', number=10000)
2.3599119186401367
The moral of this story is that you should not be using string concatenation in the first place. The proper way to build a new string from lots of other strings is to collect the parts in a list, then use str.join():

table_rows = []
for something in something_else:
    table_rows += ['\n', GetRow()]
sTable = ''.join(table_rows)
This is faster still:
>>> def str_join_concat(lst):
...     res = ''.join(['\n%s' % elem for elem in lst])
...
>>> timeit('f(l)', 'from __main__ import testlist as l, str_join_concat as f', number=10000)
1.7978830337524414
but you cannot beat using just a single '\n'.join() call:

>>> def nl_join_concat(lst):
...     res = '\n'.join(lst)
...
>>> timeit('f(l)', 'from __main__ import testlist as l, nl_join_concat as f', number=10000)
0.23735499382019043
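One caveat when switching to a bare '\n'.join(): str.join() only puts the separator between elements, while the concatenation loops above prepend '\n' to every row, the first included, so the results differ by a leading newline:

```python
lst = ['row1', 'row2']

# The loop versions prepend '\n' to every element, including the first
loop = ''
for elem in lst:
    loop = loop + '\n' + elem
print(repr(loop))              # '\nrow1\nrow2'

# str.join() inserts the separator only *between* elements
print(repr('\n'.join(lst)))    # 'row1\nrow2'
```

If the leading newline matters to your table renderer, stick with the ''.join(['\n%s' % ...]) form.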