Explicit is better than implicit

python
numpy
A basic but not so obvious numpy example.
Author

Fabrizio Damicelli

Published

May 10, 2020

TL;DR: Only use the form array *= something if you’re 100% sure you are doing the right thing; otherwise, just go for array = array * something.

Let’s see why.
We define two functions that, to the eyes of many (including past me), do exactly the same thing.

import numpy as np

def multiply(array, scalar):
    array *= scalar  # <-- handy shorthand, right?  ;)
    return array

def multiply2(array, scalar):
    array = array * scalar
    return array

Let’s see them in action

a = np.arange(10.)  # dot casts to float to avoid type errors
b = np.arange(10.)
a
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
b
array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
multiply(a, 2)
array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
multiply(a, 2)
array([ 0.,  4.,  8., 12., 16., 20., 24., 28., 32., 36.])

Hey, wait! What’s going on?

a
array([ 0.,  4.,  8., 12., 16., 20., 24., 28., 32., 36.])
Warning

The operation modifies the array in place.

Let’s see what the other version of our function does.

multiply2(b, 2)
array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
multiply2(b, 2)
array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])

This time the input array stays the same, i.e., the modification stays within the scope of the function: array * scalar creates a new array, and only the local name array is rebound to it.
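To make the difference visible, here is a minimal sketch that checks object identity and memory sharing for both functions. It redefines the two functions from above so it can run on its own; the names out_a and out_b are just illustrative.

```python
import numpy as np

def multiply(array, scalar):
    array *= scalar          # in place: writes into the caller's buffer
    return array

def multiply2(array, scalar):
    array = array * scalar   # new array; only the local name is rebound
    return array

a = np.arange(5.)
b = np.arange(5.)

out_a = multiply(a, 2)
out_b = multiply2(b, 2)

print(out_a is a)                  # True: the very same object came back
print(out_b is b)                  # False: a fresh array was returned
print(np.shares_memory(out_b, b))  # False: no shared buffer either
```

So multiply hands back the caller's own array after changing it, while multiply2 leaves b untouched.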

The bug is very basic, but in real-life cases it is much harder to spot than in this toy example, for instance in the middle of a long data preprocessing pipeline.
If you load your data once and run the preprocessing pipeline once, you will probably not notice the bug (that’s the tricky thing!).
But if the loaded data are passed through the pipeline more than once (without reloading them), each pass will actually receive a different input.
For example, if you run K-Fold cross-validation, most likely it won’t crash or anything, but you will be passing K different datasets to your model and your validation will be just rubbish!
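A toy version of that scenario might look like the sketch below. The preprocess function and the three-pass loop are hypothetical stand-ins for a real pipeline and for K folds; no actual cross-validation library is involved.

```python
import numpy as np

def preprocess(data, factor):
    # Hypothetical preprocessing step that contains the in-place bug.
    data *= factor
    return data

data = np.arange(4.)   # "loaded once": [0., 1., 2., 3.]

first_values = []
for fold in range(3):  # stand-in for K = 3 folds
    processed = preprocess(data, 2)
    first_values.append(float(processed[1]))

print(first_values)    # [2.0, 4.0, 8.0]: every pass saw different data
```

Nothing crashes, but the value that should have been 2.0 on every pass keeps doubling, because each pass starts from the output of the previous one.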

Conclusions:
- array *= something is very different from array = array * something
- You’d better be really sure of what you’re doing with array *= something.
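If you want such a bug to fail loudly instead of silently, one option (a sketch, not the only way) is to mark the loaded array as read-only via its flags, and to pass an explicit copy whenever a function is allowed to modify its input. The variable names here are illustrative.

```python
import numpy as np

def multiply(array, scalar):
    array *= scalar
    return array

data = np.arange(5.)
data.flags.writeable = False      # mark the buffer as read-only

try:
    multiply(data, 2)             # the in-place op now raises
    raised = False
except ValueError:
    raised = True

print(raised)                     # True

safe = multiply(data.copy(), 2)   # the copy is writable, data is untouched
print(data.tolist())              # [0.0, 1.0, 2.0, 3.0, 4.0]
print(safe.tolist())              # [0.0, 2.0, 4.0, 6.0, 8.0]
```

The copy costs extra memory, so whether this trade-off is worth it depends on the size of your data.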

/Fin

Any bugs, questions, comments, suggestions? Ping me on twitter or drop me an e-mail (fabridamicelli at gmail).