You may find yourself writing scripts that call datetime.now() before and after a block of code and then take the delta between the two timestamps, either to see how long the code takes to run or to compare the performance of two or more approaches. I remember doing this in my early days of programming. It's not a bad approach, and it actually works pretty well, but Python is a "batteries-included" language with an extensive standard library available on installation, and modules like timeit will do the job for plenty of the tasks we come across.
timeit does an excellent job, indeed. It executes a block of code N times and repeats that loop R times, reporting the best execution time: for example, running a piece of code 100000 times in each of 5 separate rounds.
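As a rough sketch of that idea in code: timeit.repeat() takes number (the N executions per round) and repeat (the R rounds) and returns one total time per round, so taking min() of that list gives the "best of R" figure. The values of n and r below are arbitrary picks for illustration.

```python
import timeit

n = 100_000  # executions per round (N); arbitrary value for illustration
r = 5        # rounds (R)

# One entry per round: the total seconds spent running the statement n times.
totals = timeit.repeat(stmt="'foo' + 'bar'", number=n, repeat=r)

best = min(totals)  # the fastest of the r rounds
print(f"{n} loops, best of {r}: {best / n * 1e9:.2f} nsec per loop")
```

The minimum is normally the figure to look at, since slower rounds mostly reflect other activity on the machine rather than the code itself.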
Running via Command Line
Here are some quick examples of how to use it from the command line.
Default values
$ python3 -m timeit "'foo' + 'bar'"
50000000 loops, best of 5: 5.31 nsec per loop
Defining 10 rounds
$ python3 -m timeit -r 10 "'foo' + 'bar'"
50000000 loops, best of 10: 5.29 nsec per loop
Defining 10 rounds of 100000 executions
$ python3 -m timeit -r 10 -n 100000 "'foo' + 'bar'"
100000 loops, best of 10: 5.28 nsec per loop
With this understanding we can, for example, compare different methods of joining strings and see how efficient each one of them is:
$ python3 -m timeit -r 10 -n 100000 "''.join(('foo', 'bar'))"
100000 loops, best of 10: 42.5 nsec per loop
$ python3 -m timeit -r 10 -n 100000 "'%s, %s' % ('foo', 'bar')"
100000 loops, best of 10: 63.1 nsec per loop
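For reference, the loop count in the default run (50000000) was picked by timeit itself, not by me. A minimal sketch of the same behaviour from Python, assuming Python 3.6 or later, where Timer.autorange() is available:

```python
import timeit

timer = timeit.Timer(stmt="''.join(('foo', 'bar'))")

# autorange() keeps increasing the loop count until one round takes at least
# 0.2 seconds, then returns that count together with the time the round took.
loops, seconds = timer.autorange()

print(f"{loops} loops, {seconds / loops * 1e9:.1f} nsec per loop")
```

Passing -n on the command line (or number= in Python) skips this step and uses the count you give it.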
Testing Module Functions and Comparing Approaches
Here's a more formal approach, testing a module and a couple of functions from it. I'm now comparing 3 different ways of joining strings, whether by concatenation or interpolation:
def test_str_1() -> str:
    return 'foo' + 'bar'

def test_str_2() -> str:
    return ''.join(('foo', 'bar'))

def test_str_3() -> str:
    # Note: this format string also inserts ', ', so it builds 'foo, bar'
    return '%s, %s' % ('foo', 'bar')

if __name__ == "__main__":
    import timeit
    r = 10
    n = 100000
    print("test_str_1: ", timeit.repeat(setup="from __main__ import test_str_1", stmt="test_str_1()", number=n, repeat=r))
    print("test_str_2: ", timeit.repeat(setup="from __main__ import test_str_2", stmt="test_str_2()", number=n, repeat=r))
    print("test_str_3: ", timeit.repeat(setup="from __main__ import test_str_3", stmt="test_str_3()", number=n, repeat=r))
# $ python3 ex_6.py
# test_str_1: [0.003694088023621589, 0.0037375990068539977, 0.003711616969667375, 0.0036614019772969186, 0.0036590969539247453, 0.003655606007669121, 0.003692480968311429, 0.0037061700131744146, 0.0036806080024689436, 0.003687592048663646]
# test_str_2: [0.008561848953831941, 0.008489075000397861, 0.00847748201340437, 0.00858063600026071, 0.008465468999929726, 0.008496129012200981, 0.008449406013824046, 0.008478877949528396, 0.008490820997394621, 0.008434668998233974]
# test_str_3: [0.013133066997397691, 0.013086761988233775, 0.013070349988993257, 0.013110997970215976, 0.01309297897387296, 0.013118749018758535, 0.013096819049678743, 0.013113581982906908, 0.013256893958896399, 0.013094236026518047]
It seems that the first approach is indeed the most efficient. That surprised me, as I had always thought that ''.join() would be more efficient than normal string concatenation. It's worth noting that each call to timeit.repeat() returns a list, which we can use as a series for plotting on a chart, giving a more pictorial view of the time differences between the approaches.
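One thing that may be skewing this particular comparison: 'foo' + 'bar' is built from two literals, and CPython can compute such constant expressions at compile time, so the timed code may be doing little more than returning a ready-made string. A hedged sketch of a fairer test keeps the operands in variables created by setup. The names a and b are my own placeholders, and I use '%s%s' here so that all three statements build the same string. No results are shown because they depend on the machine and the Python version.

```python
import timeit

r = 10
n = 100_000

# Bind the operands in setup so the timed statements work on variables,
# not on literals the compiler could combine ahead of time.
setup = "a = 'foo'; b = 'bar'"

statements = {
    "concat": "a + b",
    "join": "''.join((a, b))",
    "percent": "'%s%s' % (a, b)",
}

for label, stmt in statements.items():
    totals = timeit.repeat(setup=setup, stmt=stmt, number=n, repeat=r)
    print(f"{label}: best of {r} = {min(totals) / n * 1e9:.1f} nsec per call")
```

On the command line, the same setup string can be passed with the -s option.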
More Comparisons
Lists can be extended in different ways as well: with +, with +=, or with extend(). Let's review these too.
def list_concat_1():
    l = ["a", "b", "c"]
    return l + ["d", "e", "f"]

def list_concat_2():
    l = ["a", "b", "c"]
    l += ["d", "e", "f"]
    return l

def list_concat_3():
    l = ["a", "b", "c"]
    l.extend(["d", "e", "f"])
    return l

if __name__ == "__main__":
    import timeit
    r = 10
    n = 100000
    print("list_concat_1: ", timeit.repeat(setup="from __main__ import list_concat_1", stmt="list_concat_1()", number=n, repeat=r))
    print("list_concat_2: ", timeit.repeat(setup="from __main__ import list_concat_2", stmt="list_concat_2()", number=n, repeat=r))
    print("list_concat_3: ", timeit.repeat(setup="from __main__ import list_concat_3", stmt="list_concat_3()", number=n, repeat=r))
# $ python3 ex_7.py
# list_concat_1: [0.013129231985658407, 0.013070706045255065, 0.012952743971254677, 0.012848541024141014, 0.012713467003777623, 0.012749644985888153, 0.012814807007089257, 0.0128303820383735, 0.012878363020718098, 0.012793435947969556]
# list_concat_2: [0.01266855897847563, 0.01276836299803108, 0.012766197032760829, 0.013836722995620221, 0.012670096999499947, 0.012525454978458583, 0.0126781280268915, 0.01254989899462089, 0.012621487025171518, 0.012537747039459646]
# list_concat_3: [0.01478160498663783, 0.016013532993383706, 0.015955634997226298, 0.014748011017218232, 0.014714905992150307, 0.01476882299175486, 0.015143102034926414, 0.015066416002810001, 0.014712321979459375, 0.014758766046725214]
There isn't much difference, but it's fair to say that extend() isn't the fastest method of extending a list. To spot the difference more easily (it is pretty small, a matter of nanoseconds per call), let's plot these series on a chart.
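Before plotting, it may help to put those numbers on a per-call footing. Each value returned by timeit.repeat() is the total time in seconds for all n executions of one round, so dividing by n gives the time per call. A small sketch using the best round of each series from the output above (the helper name per_call_ns is my own):

```python
n = 100_000  # executions per round, as in the script above

# Best (smallest) round of each series, copied from the output above.
best_seconds = {
    "list_concat_1": 0.012713467003777623,
    "list_concat_2": 0.012525454978458583,
    "list_concat_3": 0.014712321979459375,
}

def per_call_ns(total_seconds: float, executions: int) -> float:
    """Convert the total time of one round into nanoseconds per call."""
    return total_seconds / executions * 1e9

for name, seconds in best_seconds.items():
    print(f"{name}: {per_call_ns(seconds, n):.1f} nsec per call")
```

That works out to roughly 127.1, 125.3 and 147.1 nsec per call, so extend() trails the other two by about 20 nanoseconds in this run.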
Plotting
It only takes a few general settings for a Matplotlib chart, all easy to find in the official Matplotlib documentation:
def list_concat_1():
    l = ["a", "b", "c"]
    return l + ["d", "e", "f"]

def list_concat_2():
    l = ["a", "b", "c"]
    l += ["d", "e", "f"]
    return l

def list_concat_3():
    l = ["a", "b", "c"]
    l.extend(["d", "e", "f"])
    return l

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    import timeit
    r = 10
    n = 1000000
    # The result lists get their own names so they don't shadow the functions being timed
    times_1 = timeit.repeat(setup="from __main__ import list_concat_1", stmt="list_concat_1()", number=n, repeat=r)
    times_2 = timeit.repeat(setup="from __main__ import list_concat_2", stmt="list_concat_2()", number=n, repeat=r)
    times_3 = timeit.repeat(setup="from __main__ import list_concat_3", stmt="list_concat_3()", number=n, repeat=r)
    f = plt.figure()
    f.set_figwidth(10)
    f.set_figheight(5)
    plt.plot(times_1, marker='o', label="sum")
    plt.plot(times_2, marker='o', label="incremented")
    plt.plot(times_3, marker='o', label="extended")
    plt.title(f"List Extension Methods: Time for {n} Executions in Each of {r} Rounds")
    plt.xlabel("Rounds")
    # timeit.repeat() returns the total seconds for all n executions of each round
    plt.ylabel(f"Seconds per {n} executions")
    plt.xticks(range(0, r))
    plt.legend()
    plt.show()
Here are the results:
Indeed, incrementing the list in place with += is the most efficient method for extending a list.
But what about the joining of strings through concatenation or interpolation?
Here's an adaptation of the previous code, but using the string functions observed before:
def test_str_1() -> str:
    return 'foo' + 'bar'

def test_str_2() -> str:
    return ''.join(('foo', 'bar'))

def test_str_3() -> str:
    return '%s, %s' % ('foo', 'bar')

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    import timeit
    r = 10
    n = 1000000
    # The result lists get their own names so they don't shadow the functions being timed
    times_1 = timeit.repeat(setup="from __main__ import test_str_1", stmt="test_str_1()", number=n, repeat=r)
    times_2 = timeit.repeat(setup="from __main__ import test_str_2", stmt="test_str_2()", number=n, repeat=r)
    times_3 = timeit.repeat(setup="from __main__ import test_str_3", stmt="test_str_3()", number=n, repeat=r)
    f = plt.figure()
    f.set_figwidth(10)
    f.set_figheight(5)
    plt.plot(times_1, marker='o', label="sum")
    plt.plot(times_2, marker='o', label="join")
    plt.plot(times_3, marker='o', label="% interpolation")
    plt.title(f"String Joining Methods: Time for {n} Executions in Each of {r} Rounds")
    plt.xlabel("Rounds")
    # timeit.repeat() returns the total seconds for all n executions of each round
    plt.ylabel(f"Seconds per {n} executions")
    plt.xticks(range(0, r))
    plt.legend()
    plt.show()
And the difference, at the nanosecond scale, is even bigger:
Any time difference becomes a big difference when we think about millions of executions of an instruction, even for the list extension methods. Using the most efficient method to achieve something is a practice worth maintaining, and it always pays off. But if the code isn't repeated so many times, it's OK to use whichever method you like, as long as the code is clean and objective (and documented, please).
Final Words
After using timeit, it's hard to think about other methods for measuring the time efficiency of blocks of code, or even small pieces of code. It's not intended to be a full profiling tool, but it at least gives you a clear perspective on how fast an approach or method can be, spotting minimal differences between similar approaches. Those small differences will certainly add up to more time consumed by your program if the code is exposed to thousands of iterations.