Why is NumPy's where operation faster than the apply function?

This call to apply is row-wise iteration: the lambda is invoked once per row, in Python.

df["log2FC"] = df.apply(
    lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
    else np.log2(x["C2Mean"]),
    axis=1,
)

Your other snippet is vectorized: each np.log2 call operates on a whole column at once, and np.where then selects element-wise according to the condition:

df["log2FC"] = np.where(
    df["C1Mean"] == 0,
    np.log2(df["C2Mean"]),
    np.log2(df["C2Mean"] / df["C1Mean"]),
)

In the apply version, by contrast, each call to np.log2 receives a single scalar, so you pay NumPy's per-call overhead on every row and get none of the benefit of vectorization:

 np.log2(x["C2Mean"] / x["C1Mean"])
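
Measuring both approaches on a synthetic frame makes the gap visible. A rough, self-contained benchmark sketch (the DataFrame is made up; only the column names C1Mean and C2Mean are taken from the question):

import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "C1Mean": rng.integers(0, 5, size=100_000).astype(float),  # includes zeros
    "C2Mean": rng.uniform(0.1, 10.0, size=100_000),
})

start = time.time()
via_apply = df.apply(
    lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
    else np.log2(x["C2Mean"]),
    axis=1,
)
print("apply:   ", time.time() - start, "seconds")

start = time.time()
# Both branches are evaluated on whole columns; rows where C1Mean == 0 produce
# inf in the division branch, but np.where discards those values.
via_where = np.where(
    df["C1Mean"] == 0,
    np.log2(df["C2Mean"]),
    np.log2(df["C2Mean"] / df["C1Mean"]),
)
print("np.where:", time.time() - start, "seconds")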

Suggestion : 2

The timings below compare common operations on plain Python lists with the same operations on NumPy arrays; the benchmark code itself is not reproduced here.

Output: 

Time taken by Lists: 1.1984527111053467 seconds
Time taken by NumPy Arrays: 0.13434123992919922 seconds
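
The code that produced these numbers is not shown above. A minimal sketch, assuming the test timed one bulk element-wise operation over roughly a million elements, would look something like this (absolute times depend on the machine):

import time
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

start = time.time()
doubled_list = [x * 2 for x in py_list]   # Python loops over every element
print("Time taken by Lists:", time.time() - start, "seconds")

start = time.time()
doubled_arr = np_arr * 2                  # one vectorized call; the loop runs in C
print("Time taken by NumPy Arrays:", time.time() - start, "seconds")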

Output:

Concatenation:
   Time taken by Lists: 0.02946329116821289 seconds
Time taken by NumPy Arrays: 0.011709213256835938 seconds

Dot Product:
   Time taken by Lists: 0.179551362991333 seconds
Time taken by NumPy Arrays: 0.004144191741943359 seconds

Scalar Addition:
   Time taken by Lists: 0.09385180473327637 seconds
Time taken by NumPy Arrays: 0.005884408950805664 seconds

Deletion:
   Time taken by Lists: 0.01268625259399414 seconds
Time taken by NumPy Arrays: 3.814697265625e-06 seconds
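
Again, the original benchmark code is omitted; a rough sketch of how these four operations might be timed follows (the sizes and exact operations are assumptions chosen to match the labels above):

import time
import numpy as np

def timed(label, func):
    # Run func once and report the elapsed wall-clock time.
    start = time.time()
    func()
    print(f"   Time taken by {label}: {time.time() - start} seconds")

n = 1_000_000
list1, list2 = list(range(n)), list(range(n))
arr1, arr2 = np.arange(n), np.arange(n)

print("Concatenation:")
timed("Lists", lambda: list1 + list2)
timed("NumPy Arrays", lambda: np.concatenate((arr1, arr2)))

print("Dot Product:")
timed("Lists", lambda: sum(a * b for a, b in zip(list1, list2)))
timed("NumPy Arrays", lambda: np.dot(arr1, arr2))

print("Scalar Addition:")
timed("Lists", lambda: [x + 2 for x in list1])
timed("NumPy Arrays", lambda: arr1 + 2)

print("Deletion:")
start = time.time()
del list1                   # unbinds the name and frees a million separate int objects
print(f"   Time taken by Lists: {time.time() - start} seconds")
start = time.time()
del arr1                    # frees one contiguous memory block
print(f"   Time taken by NumPy Arrays: {time.time() - start} seconds")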

Suggestion : 3

In fact, most of the functions you call using NumPy in your Python code are merely wrappers for underlying code in C, where most of the heavy lifting happens. In this way, NumPy can move the execution of loops to C, which is much more efficient than Python when it comes to looping. Notice that this is only possible because the array enforces that its elements are all of the same kind; otherwise it would not be possible to convert the Python data types to native C ones to be executed under the hood.

In order to really appreciate the speed boosts NumPy provides, we must come up with a way to measure the running time of a piece of code. Times on your machine may differ depending on processing power and other tasks running in the background, but you will nevertheless notice considerable speedups, to the tune of about 20-30x, when using NumPy's vectorized solution.

In this series I will cover best practices on how to speed up your code using NumPy, how to make use of features like vectorization and broadcasting, when to ditch specialized features in favor of vanilla Python offerings, and a case study where we will use NumPy to write a fast implementation of the K-Means clustering algorithm.

As an example of the kind of loop that vectorization replaces, the following snippet adds a column vector to every column of a 2-D array, one column per iteration:

import numpy as np

arr = np.arange(12).reshape(3, 4)

col_vector = np.array([5, 6, 7])

num_cols = arr.shape[1]

# Add col_vector to every column of arr, one column per Python-level iteration
for col in range(num_cols):
   arr[:, col] += col_vector
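
For comparison, here is a sketch of the vectorized equivalent: reshaping col_vector into a column lets broadcasting perform the same per-column addition without a Python loop.

import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# col_vector[:, np.newaxis] has shape (3, 1); broadcasting stretches it across
# all four columns, so the whole addition happens in one C-level operation.
arr += col_vector[:, np.newaxis]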

Suggestion : 4

NumPy provides mathematical functions for fast operations on entire arrays of data without having to write loops. Internally, NumPy stores data in a contiguous block of memory, independent of other built-in Python objects, and its library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences, and NumPy operations perform complex computations on entire arrays without the need for Python for loops.
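
To make the memory claim concrete, here is a rough sketch (numbers are approximate and assume a 64-bit CPython build with NumPy's default integer dtype) comparing the footprint of a million integers stored in a list versus a NumPy array:

import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n)

# The list object holds only pointers; each Python int is a separate heap
# object, so add the per-object sizes for a rough total.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
arr_bytes = np_arr.nbytes      # one contiguous block of n fixed-size integers

print(f"list : ~{list_bytes / 1e6:.0f} MB")
print(f"array: ~{arr_bytes / 1e6:.0f} MB")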

The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:

In[19]: data1 = [6, 7.5, 8, 0, 1]

In[20]: arr1 = np.array(data1)

In[21]: arr1
Out[21]: array([6., 7.5, 8., 0., 1.])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In[22]: data2 = [
   [1, 2, 3, 4],
   [5, 6, 7, 8]
]

In[23]: arr2 = np.array(data2)

In[24]: arr2
Out[24]:
   array([
      [1, 2, 3, 4],
      [5, 6, 7, 8]
   ])

Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:

In[25]: arr2.ndim
Out[25]: 2

In[26]: arr2.shape
Out[26]: (2, 4)

In addition to np.array, there are a number of other functions for creating new arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:

In[29]: np.zeros(10)
Out[29]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In[30]: np.zeros((3, 6))
Out[30]:
   array([
      [0., 0., 0., 0., 0., 0.],
      [0., 0., 0., 0., 0., 0.],
      [0., 0., 0., 0., 0., 0.]
   ])

In[31]: np.empty((2, 3, 2))
Out[31]:
   array([
      [
         [0., 0.],
         [0., 0.],
         [0., 0.]
      ],
      [
         [0., 0.],
         [0., 0.],
         [0., 0.]
      ]
   ])

arange is an array-valued version of the built-in Python range function:

In[32]: np.arange(15)
Out[32]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

In[33]: arr1 = np.array([1, 2, 3], dtype=np.float64)

In[34]: arr2 = np.array([1, 2, 3], dtype=np.int32)

In[35]: arr1.dtype
Out[35]: dtype('float64')

In[36]: arr2.dtype
Out[36]: dtype('int32')

You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:

In[37]: arr = np.array([1, 2, 3, 4, 5])

In[38]: arr.dtype
Out[38]: dtype('int64')

In[39]: float_arr = arr.astype(np.float64)

In[40]: float_arr.dtype
Out[40]: dtype('float64')

In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:

In[41]: arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In[42]: arr
Out[42]: array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In[43]: arr.astype(np.int32)
Out[43]: array([ 3, -1, -2,  0, 12, 10], dtype=int32)

You can also use another array’s dtype attribute:

In[46]: int_array = np.arange(10)

In[47]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In[48]: int_array.astype(calibers.dtype)
Out[48]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

There are shorthand type code strings you can also use to refer to a dtype:

In[49]: empty_uint32 = np.empty(8, dtype='u4')

In[50]: empty_uint32
Out[50]:
   array([0, 1075314688, 0, 1075707904, 0, 1075838976, 0, 1072693248],
      dtype=uint32)

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays applies the operation element-wise:

In[51]: arr = np.array([
   [1., 2., 3.],
   [4., 5., 6.]
])

In[52]: arr
Out[52]:
   array([
      [1., 2., 3.],
      [4., 5., 6.]
   ])

In[53]: arr * arr
Out[53]:
   array([
      [1., 4., 9.],
      [16., 25., 36.]
   ])

In[54]: arr - arr
Out[54]:
   array([
      [0., 0., 0.],
      [0., 0., 0.]
   ])

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In[55]: 1 / arr
Out[55]:
   array([
      [1., 0.5, 0.3333],
      [0.25, 0.2, 0.1667]
   ])

In[56]: arr ** 0.5
Out[56]:
   array([
      [1., 1.4142, 1.7321],
      [2., 2.2361, 2.4495]
   ])

Comparisons between arrays of the same size yield boolean arrays:

In[57]: arr2 = np.array([
   [0., 4., 1.],
   [7., 2., 12.]
])

In[58]: arr2
Out[58]:
   array([
      [0., 4., 1.],
      [7., 2., 12.]
   ])

In[59]: arr2 > arr
Out[59]:
   array([
      [False, True, False],
      [True, False, True]
   ])

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In[60]: arr = np.arange(10)

In[61]: arr
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In[62]: arr[5]
Out[62]: 5

In[63]: arr[5:8]
Out[63]: array([5, 6, 7])

In[64]: arr[5:8] = 12

In[65]: arr
Out[65]: array([0, 1, 2, 3, 4, 12, 12, 12, 8, 9])

An important distinction from Python's built-in lists is that array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array. To give an example of this, I first create a slice of arr:

In[66]: arr_slice = arr[5:8]

In[67]: arr_slice
Out[67]: array([12, 12, 12])

Now, when I change values in arr_slice, the mutations are reflected in the original array arr:

In[68]: arr_slice[1] = 12345

In[69]: arr
Out[69]:
   array([0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])
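
If you want a copy of a slice of an ndarray instead of a view, you need to copy it explicitly, for example with arr[5:8].copy(). A short sketch continuing the example above:

arr_copy = arr[5:8].copy()   # explicit copy: no longer shares memory with arr
arr_copy[1] = 0              # changes only the copy; arr is left unchanged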

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In[72]: arr2d = np.array([
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
])

In[73]: arr2d[2]
Out[73]: array([7, 8, 9])

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:

In[74]: arr2d[0][2]
Out[74]: 3

In[75]: arr2d[0, 2]
Out[75]: 3

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax (note that arr below reflects a further bare-slice assignment of 64 to arr_slice, such as arr_slice[:] = 64, which again wrote through to the original array):

In[88]: arr
Out[88]: array([0, 1, 2, 3, 4, 64, 64, 64, 8, 9])

In[89]: arr[1:6]
Out[89]: array([1, 2, 3, 4, 64])

Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:

In[90]: arr2d
Out[90]:
   array([
      [1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]
   ])

In[91]: arr2d[:2]
Out[91]:
   array([
      [1, 2, 3],
      [4, 5, 6]
   ])

You can pass multiple slices just like you can pass multiple indexes:

In[92]: arr2d[:2, 1:]
Out[92]:
   array([
      [2, 3],
      [5, 6]
   ])

Similarly, I can select the third column but only the first two rows like so:

In[94]: arr2d[:2, 2]
Out[94]: array([3, 6])

Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

In[95]: arr2d[:, :1]
Out[95]:
   array([
      [1],
      [4],
      [7]
   ])

Let’s consider an example where we have some data in an array and an array of names with duplicates. I’ll use the randn function in numpy.random to generate some normally distributed random data:

In[98]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In[99]: data = np.random.randn(7, 4)

In[100]: names
Out[100]: array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In[101]: data
Out[101]:
   array([
      [0.0929, 0.2817, 0.769, 1.2464],
      [1.0072, -1.2962, 0.275, 0.2289],
      [1.3529, 0.8864, -2.0016, -0.3718],
      [1.669, -0.4386, -0.5397, 0.477],
      [3.2489, -1.0212, -0.5771, 0.1241],
      [0.3026, 0.5238, 0.0009, 1.3438],
      [-0.7135, -0.8312, -2.3702, -1.8608]
   ])

Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:

In[102]: names == 'Bob'
Out[102]: array([True, False, False, True, False, False, False])

This boolean array can be passed when indexing the array:

In[103]: data[names == 'Bob']
Out[103]:
   array([
      [0.0929, 0.2817, 0.769, 1.2464],
      [1.669, -0.4386, -0.5397, 0.477]
   ])

To select everything but 'Bob', you can either use != or negate the condition using ~:

In[106]: names != 'Bob'
Out[106]: array([False, True, True, False, True, True, True])

In[107]: data[~(names == 'Bob')]
Out[107]:
   array([
      [1.0072, -1.2962, 0.275, 0.2289],
      [1.3529, 0.8864, -2.0016, -0.3718],
      [3.2489, -1.0212, -0.5771, 0.1241],
      [0.3026, 0.5238, 0.0009, 1.3438],
      [-0.7135, -0.8312, -2.3702, -1.8608]
   ])

The ~ operator can be useful when you want to invert a general condition:

In[108]: cond = names == 'Bob'

In[109]: data[~cond]
Out[109]:
   array([
      [1.0072, -1.2962, 0.275, 0.2289],
      [1.3529, 0.8864, -2.0016, -0.3718],
      [3.2489, -1.0212, -0.5771, 0.1241],
      [0.3026, 0.5238, 0.0009, 1.3438],
      [-0.7135, -0.8312, -2.3702, -1.8608]
   ])