This call to apply is row-wise iteration:
df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
else np.log2(x["C2Mean"]), axis = 1)
Your other snippet is vectorized, because np.where and np.log2 operate on whole columns at once:
df["log2FC"] = np.where(df["C1Mean"] == 0,
np.log2(df["C2Mean"]),
np.log2(df["C2Mean"] / df["C1Mean"]))
In the apply version, your calls to np.log2 gain nothing from vectorization, because you pass scalar values one row at a time:
np.log2(x["C2Mean"] / x["C1Mean"])
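To make the comparison concrete, here is a minimal sketch that runs both snippets on a tiny made-up DataFrame and checks that they agree. The sample values are hypothetical; only the column names C1Mean and C2Mean come from the question. Note that np.where evaluates both branches for every row, so the division is also computed where C1Mean is 0; the np.errstate guard is there only to keep any floating-point warnings quiet.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"C1Mean": [0.0, 2.0, 4.0], "C2Mean": [8.0, 8.0, 1.0]})

# Row-wise version: the lambda runs once per row and np.log2 sees scalars.
row_wise = df.apply(
    lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
    else np.log2(x["C2Mean"]),
    axis=1,
)

# Vectorized version: np.log2 and np.where see whole columns.
# Both branches are evaluated for all rows, hence the errstate guard.
with np.errstate(divide="ignore", invalid="ignore"):
    vectorized = np.where(
        df["C1Mean"] == 0,
        np.log2(df["C2Mean"]),
        np.log2(df["C2Mean"] / df["C1Mean"]),
    )

same = np.allclose(row_wise.to_numpy(), vectorized)
print(same)
```

Both versions compute the same values; the difference is where the loop runs (in Python for apply, in compiled code for np.where).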
Last Updated: 13 Aug, 2021
Output:
Time taken by Lists: 1.1984527111053467 seconds
Time taken by NumPy Arrays: 0.13434123992919922 seconds
Output:
Concatenation:
Time taken by Lists: 0.02946329116821289 seconds
Time taken by NumPy Arrays: 0.011709213256835938 seconds
Dot Product:
Time taken by Lists: 0.179551362991333 seconds
Time taken by NumPy Arrays: 0.004144191741943359 seconds
Scalar Addition:
Time taken by Lists: 0.09385180473327637 seconds
Time taken by NumPy Arrays: 0.005884408950805664 seconds
Deletion:
Time taken by Lists: 0.01268625259399414 seconds
Time taken by NumPy Arrays: 3.814697265625e-06 seconds
In fact, most of the functions you call using NumPy in your Python code are merely wrappers for underlying C code, where most of the heavy lifting happens. In this way, NumPy can move the execution of loops to C, which is much more efficient than Python when it comes to looping. Note that this is only possible because an array requires all of its elements to be of the same type. Otherwise, it would not be possible to convert the Python data types to native C ones to be executed under the hood.
In order to really appreciate the speed boosts NumPy provides, we must come up with a way to measure the running time of a piece of code.
Times on your machine may differ depending on processing power and other tasks running in the background. You will nevertheless notice considerable speedups, on the order of 20-30x, when using NumPy's vectorized solution.
In this series I will cover best practices on how to speed up your code using NumPy, how to make use of features like vectorization and broadcasting, when to ditch specialized features in favor of vanilla Python offerings, and a case study where we will use NumPy to write a fast implementation of the K-Means clustering algorithm.
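One way to measure running time is the standard-library timeit module. The sketch below is an illustration rather than the benchmark that produced the figures quoted above: the array size, the repeat count, and the helper names list_add and array_add are all assumptions chosen for the example.

```python
import timeit

import numpy as np

n = 100_000  # assumed problem size, chosen only for illustration
py_list = list(range(n))
np_arr = np.arange(n)


def list_add():
    # The loop runs in the Python interpreter, one element at a time.
    return [x + 1 for x in py_list]


def array_add():
    # The loop runs inside NumPy's compiled code.
    return np_arr + 1


# timeit calls each function `number` times and returns the total seconds.
list_time = timeit.timeit(list_add, number=20)
array_time = timeit.timeit(array_add, number=20)

print(f"Time taken by Lists: {list_time} seconds")
print(f"Time taken by NumPy Arrays: {array_time} seconds")
```

The absolute numbers depend on the machine, so compare the ratio between the two timings rather than the raw values.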
arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])
num_cols = arr.shape[1]
for col in range(num_cols):
    arr[:, col] += col_vector
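The column loop can be replaced by broadcasting. The sketch below keeps the same arr and col_vector and checks that both approaches give the same result; the names looped and broadcast are introduced here for illustration.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
col_vector = np.array([5, 6, 7])

# Loop version, applied to a copy so the input stays unchanged.
looped = arr.copy()
for col in range(looped.shape[1]):
    looped[:, col] += col_vector

# Broadcasting version: reshape col_vector from (3,) to (3, 1) so that
# NumPy stretches it across the 4 columns without a Python loop.
broadcast = arr + col_vector[:, np.newaxis]

print(broadcast)
# [[ 5  6  7  8]
#  [10 11 12 13]
#  [15 16 17 18]]
```

Adding col_vector directly (shape (3,)) to arr (shape (3, 4)) would fail, because broadcasting aligns trailing dimensions; the extra axis is what makes the shapes compatible.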
NumPy provides mathematical functions for fast operations on entire arrays of data without having to write loops.
NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
NumPy operations perform complex computations on entire arrays without the need for Python for loops.
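The memory claim can be checked roughly with a sketch like the one below. The list figure is only an estimate, assuming CPython: it sums the list object and each integer object, and the exact numbers vary by Python version and platform.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# An int64 array stores 8 bytes per element in one block.
array_bytes = arr.nbytes

# Rough estimate for the list: the list object (an array of pointers)
# plus every integer object it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A freshly created array is laid out contiguously in C (row-major) order.
is_contiguous = arr.flags["C_CONTIGUOUS"]

print(array_bytes, list_bytes, is_contiguous)
```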
The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:
In[19]: data1 = [6, 7.5, 8, 0, 1]
In[20]: arr1 = np.array(data1)
In[21]: arr1
Out[21]: array([6., 7.5, 8., 0., 1.])
Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:
In[22]: data2 = [
[1, 2, 3, 4],
[5, 6, 7, 8]
]
In[23]: arr2 = np.array(data2)
In[24]: arr2
Out[24]:
array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
Since data2 was a list of lists, the NumPy array arr2 has two dimensions, with shape inferred from the data. We can confirm this by inspecting the ndim and shape attributes:
In[25]: arr2.ndim
Out[25]: 2
In[26]: arr2.shape
Out[26]: (2, 4)
In addition to np.array, there are a number of other functions for creating new arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a given length or shape. empty creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:
In[29]: np.zeros(10)
Out[29]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
In[30]: np.zeros((3, 6))
Out[30]:
array([
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]
])
In[31]: np.empty((2, 3, 2))
Out[31]:
array([
[
[0., 0.],
[0., 0.],
[0., 0.]
],
[
[0., 0.],
[0., 0.],
[0., 0.]
]
])
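Although the output above happens to show zeros, np.empty makes no promise about its contents. As a small sketch, when defined starting values matter you could reach for np.zeros, np.full, or np.zeros_like instead; the variable names below are illustrative.

```python
import numpy as np

# Contents are whatever was already in memory; only the shape is guaranteed.
uninitialized = np.empty((2, 3, 2))

# np.full fills every element with the given value.
filled = np.full((2, 3), 7.0)

# np.zeros_like copies the shape and dtype of an existing array.
zeros_like_filled = np.zeros_like(filled)

print(uninitialized.shape)
print(filled)
print(zeros_like_filled)
```

np.empty is mainly useful when every element will be overwritten before it is read.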
arange is an array-valued version of the built-in Python range function:
In[32]: np.arange(15)
Out[32]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
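Like range, np.arange also accepts start, stop, and step arguments, and unlike range it allows a non-integer step. A brief sketch, with values chosen only for illustration:

```python
import numpy as np

# start=2, stop=11 (exclusive), step=3
stepped = np.arange(2, 11, 3)
print(stepped)  # [2 5 8]

# A float step is allowed; the stop value is still excluded.
fractional = np.arange(0, 1, 0.25)
print(fractional)  # [0.   0.25 0.5  0.75]
```

For float steps that are not exactly representable in binary, np.linspace is often a safer choice because it fixes the number of points instead of the step.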
The data type or dtype is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:
In[33]: arr1 = np.array([1, 2, 3], dtype = np.float64)
In[34]: arr2 = np.array([1, 2, 3], dtype = np.int32)
In[35]: arr1.dtype
Out[35]: dtype('float64')
In[36]: arr2.dtype
Out[36]: dtype('int32')
You can explicitly convert or cast an array from one dtype to another using ndarray’s astype method:
In[37]: arr = np.array([1, 2, 3, 4, 5])
In[38]: arr.dtype
Out[38]: dtype('int64')
In[39]: float_arr = arr.astype(np.float64)
In[40]: float_arr.dtype
Out[40]: dtype('float64')
In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:
In[41]: arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
In[42]: arr
Out[42]: array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
In[43]: arr.astype(np.int32)
Out[43]: array([3, -1, -2, 0, 12, 10], dtype = int32)
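Because astype truncates toward zero, values such as 3.7 become 3 rather than 4. If rounding to the nearest integer is what you want, one option is to round first and cast afterwards, as in this sketch using the same values as above:

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

# Plain cast: the fractional part is dropped (truncation toward zero).
truncated = arr.astype(np.int32)
print(truncated)  # [ 3 -1 -2  0 12 10]

# Round to the nearest integer first, then cast.
# np.rint rounds halves to the nearest even value, so 0.5 becomes 0.
rounded = np.rint(arr).astype(np.int32)
print(rounded)  # [ 4 -1 -3  0 13 10]
```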
You can also pass another array’s dtype attribute to astype:
In[46]: int_array = np.arange(10)
In[47]: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype = np.float64)
In[48]: int_array.astype(calibers.dtype)
Out[48]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays applies the operation element-wise:
In[51]: arr = np.array([
[1., 2., 3.],
[4., 5., 6.]
])
In[52]: arr
Out[52]:
array([
[1., 2., 3.],
[4., 5., 6.]
])
In[53]: arr * arr
Out[53]:
array([
[1., 4., 9.],
[16., 25., 36.]
])
In[54]: arr - arr
Out[54]:
array([
[0., 0., 0.],
[0., 0., 0.]
])
In[55]: 1 / arr
Out[55]:
array([
[1., 0.5, 0.3333],
[0.25, 0.2, 0.1667]
])
In[56]: arr ** 0.5
Out[56]:
array([
[1., 1.4142, 1.7321],
[2., 2.2361, 2.4495]
])
Comparisons between arrays of the same size yield boolean arrays:
In[57]: arr2 = np.array([
[0., 4., 1.],
[7., 2., 12.]
])
In[58]: arr2
Out[58]:
array([
[0., 4., 1.],
[7., 2., 12.]
])
In[59]: arr2 > arr
Out[59]:
array([
[False, True, False],
[True, False, True]
])
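A boolean array produced by a comparison can be used directly in further vectorized work. The sketch below reuses arr and arr2 from above to count the True entries and to build the element-wise maximum with np.where; the names mask, count, and larger are introduced for illustration.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

mask = arr2 > arr

# True counts as 1 when summed, so this counts positions where arr2 is larger.
count = int(mask.sum())
print(count)  # 3

# Take the value from arr2 where the mask is True, otherwise from arr.
larger = np.where(mask, arr2, arr)
print(larger)
# [[ 1.  4.  3.]
#  [ 7.  5. 12.]]
```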
NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:
In[60]: arr = np.arange(10)
In[61]: arr
Out[61]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In[62]: arr[5]
Out[62]: 5
In[63]: arr[5: 8]
Out[63]: array([5, 6, 7])
In[64]: arr[5: 8] = 12
In[65]: arr
Out[65]: array([0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
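An important difference from Python lists is that array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array. To give an example of this, I first create a slice of arr: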
In[66]: arr_slice = arr[5: 8]
In[67]: arr_slice
Out[67]: array([12, 12, 12])
Now, when I change values in arr_slice, the mutations are reflected in the original array arr:
In[68]: arr_slice[1] = 12345
In[69]: arr
Out[69]: array([0, 1, 2, 3, 4, 12, 12345, 12, 8, 9])
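If you want an independent copy of a slice rather than a view, you can call the copy method explicitly. The following sketch contrasts the two behaviours; np.shares_memory is used only to confirm which arrays share a buffer.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
copied = arr[5:8].copy()   # an independent copy

# Writing to the copy leaves the original untouched.
copied[:] = -1
after_copy_write = arr.tolist()

# Writing to the view changes the original.
view[:] = 64
after_view_write = arr.tolist()

print(after_copy_write)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(after_view_write)  # [0, 1, 2, 3, 4, 64, 64, 64, 8, 9]
print(np.shares_memory(arr, view), np.shares_memory(arr, copied))  # True False
```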
With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:
In[72]: arr2d = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
In[73]: arr2d[2]
Out[73]: array([7, 8, 9])
Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:
In[74]: arr2d[0][2]
Out[74]: 3
In[75]: arr2d[0, 2]
Out[75]: 3
Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:
In[88]: arr
Out[88]: array([0, 1, 2, 3, 4, 64, 64, 64, 8, 9])
In[89]: arr[1: 6]
Out[89]: array([1, 2, 3, 4, 64])
Consider the two-dimensional array from before, arr2d. Slicing this array is a bit different:
In[90]: arr2d
Out[90]:
array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
In[91]: arr2d[: 2]
Out[91]:
array([
[1, 2, 3],
[4, 5, 6]
])
You can pass multiple slices just like you can pass multiple indexes:
In[92]: arr2d[: 2, 1: ]
Out[92]:
array([
[2, 3],
[5, 6]
])
Similarly, I can select the third column but only the first two rows like so:
In[94]: arr2d[: 2, 2]
Out[94]: array([3, 6])
See Figure 4-2 for an illustration. Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:
In[95]: arr2d[: ,: 1]
Out[95]:
array([
[1],
[4],
[7]
])
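One detail worth checking for yourself is how mixing integer indexes and slices affects the number of dimensions: a slice keeps the axis, while an integer index drops it. A short sketch using the same arr2d:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slicing both axes keeps two dimensions.
block = arr2d[:2, 1:]
print(block.shape)  # (2, 2)

# A slice on the column axis keeps it, giving a 3x1 column.
first_col_2d = arr2d[:, :1]
print(first_col_2d.shape)  # (3, 1)

# An integer index on the column axis drops it, giving a flat array.
first_col_1d = arr2d[:, 0]
print(first_col_1d.shape)  # (3,)
```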
Let’s consider an example where we have some data in an array and an array of names with duplicates. Here I use the randn function in numpy.random to generate some random normally distributed data:
In[98]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In[99]: data = np.random.randn(7, 4)
In[100]: names
Out[100]: array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype = '<U4')
In[101]: data
Out[101]:
array([
[0.0929, 0.2817, 0.769, 1.2464],
[1.0072, -1.2962, 0.275, 0.2289],
[1.3529, 0.8864, -2.0016, -0.3718],
[1.669, -0.4386, -0.5397, 0.477],
[3.2489, -1.0212, -0.5771, 0.1241],
[0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]
])
Suppose each name corresponds to a row in the data array and we wanted to select all the rows with corresponding name 'Bob'. Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string 'Bob' yields a boolean array:
In[102]: names == 'Bob'
Out[102]: array([True, False, False, True, False, False, False])
This boolean array can be passed when indexing the array:
In[103]: data[names == 'Bob']
Out[103]:
array([
[0.0929, 0.2817, 0.769, 1.2464],
[1.669, -0.4386, -0.5397, 0.477]
])
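A boolean array on the row axis can be combined with a slice or an integer on the column axis. The sketch below assumes a seeded generator (np.random.default_rng(0)) in place of randn so that it is reproducible; the actual values will therefore differ from the output shown above.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Seeded generator used here for reproducibility (an assumption of this
# sketch; the text above uses np.random.randn).
rng = np.random.default_rng(0)
data = rng.standard_normal((7, 4))

# Rows where the name is 'Bob', columns from index 2 onward.
bob_tail = data[names == 'Bob', 2:]
print(bob_tail.shape)  # (2, 2)

# Rows where the name is 'Bob', only the last column.
bob_last = data[names == 'Bob', 3]
print(bob_last.shape)  # (2,)
```

The boolean array must have the same length as the axis it indexes, which is 7 in this case.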
In[106]: names != 'Bob'
Out[106]: array([False, True, True, False, True, True, True])
In[107]: data[~(names == 'Bob')]
Out[107]:
array([
[1.0072, -1.2962, 0.275, 0.2289],
[1.3529, 0.8864, -2.0016, -0.3718],
[3.2489, -1.0212, -0.5771, 0.1241],
[0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]
])
In[108]: cond = names == 'Bob'
In[109]: data[~cond]
Out[109]:
array([
[1.0072, -1.2962, 0.275, 0.2289],
[1.3529, 0.8864, -2.0016, -0.3718],
[3.2489, -1.0212, -0.5771, 0.1241],
[0.3026, 0.5238, 0.0009, 1.3438],
[-0.7135, -0.8312, -2.3702, -1.8608]
])
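Several boolean conditions can be combined with the element-wise operators & (and) and | (or); the Python keywords and and or do not work on boolean arrays. A minimal sketch, again assuming a seeded generator for the data:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
data = rng.standard_normal((7, 4))

# Parentheses are required because | binds more tightly than ==.
mask = (names == 'Bob') | (names == 'Will')
print(mask)  # [ True False  True  True  True False False]

selected = data[mask]
print(selected.shape)  # (4, 4)

# ~ inverts the mask, selecting the remaining rows.
rest = data[~mask]
print(rest.shape)  # (3, 4)
```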