 # how is the dtype of a numpy array calculated internally?


First, note that the result dtype is chosen based on the operation's inputs, not on its result values. (The examples below show NumPy's legacy, value-based promotion; NumPy 2.0 adopted NEP 50, under which `A + 60000` raises an `OverflowError` instead of promoting, because the Python scalar does not fit the array's dtype.)

```
>>> A = np.full((2, 2), 30000, 'i2')
>>> A
array([[30000, 30000],
       [30000, 30000]], dtype=int16)
>>> A + 30000  # 1: the scalar fits int16, so the result stays int16 and wraps
array([[-5536, -5536],
       [-5536, -5536]], dtype=int16)
>>> A + 60000  # 2: 60000 does not fit int16, so the result is int32
array([[90000, 90000],
       [90000, 90000]], dtype=int32)
```

Also, and more directly related to your question, type promotion applies only to out-of-place operations, never to in-place ones:

```
# out-of-place
>>> A_new = A + 60000
>>> A_new
array([[90000, 90000],
       [90000, 90000]], dtype=int32)

# in-place
>>> A += 60000
>>> A
array([[24464, 24464],
       [24464, 24464]], dtype=int16)
```

or

```
# out-of-place
>>> A_new = np.where([[0, 0], [0, 1]], 60000, A)
>>> A_new
array([[30000, 30000],
       [30000, 60000]], dtype=int32)

# in-place
>>> A[1, 1] = 60000
>>> A
array([[30000, 30000],
       [30000, -5536]], dtype=int16)
```
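The promotion rule itself can be inspected directly with `np.promote_types`, which returns the smallest dtype to which both of its arguments can be safely cast. A minimal sketch (these particular pairings are stable across NumPy versions, though mixed-kind promotion rules changed under NEP 50):

```python
import numpy as np

# Smallest dtype to which both inputs can be safely cast
print(np.promote_types('i2', 'i4'))  # int32
print(np.promote_types('i2', 'u2'))  # int32: needs a sign bit plus 16 value bits
print(np.promote_types('i2', 'f4'))  # float32: every int16 is exactly representable
```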

Suggestion: 2

So, my first question is: how is this calculated? Does NumPy pick a datatype suitable for the maximum element and apply it to all the elements? If so, doesn't that waste space, since it unnecessarily stores 2 in the second array as a 64-bit integer?

The `np.array` documentation describes the `dtype` parameter as follows:

> dtype : data-type, optional — The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array. For downcasting, use the `.astype(t)` method.

Note that for NumPy to be fast, it is essential that all elements of an array be the same size; otherwise, how would you quickly locate the 1000th element? Also, mixing types would not save much space, since you would have to store the type of every single element on top of the raw data.

It should be noted that "minimum type" is not entirely accurate: for integer arrays, the system (C) default integer is preferred over smaller integer types, as is evident from your example.

```
t = np.array([2, 2])
t.dtype  # the platform's default integer, e.g. dtype('int64')
```
```
t = np.array([2, 22222222222])
t.dtype  # dtype('int64')
```
```
t = np.array([2, 2])
t = 222222222222222  # rebinds the name t to a plain Python int; the array is unchanged
```
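To make the "minimum type" distinction concrete, `np.min_scalar_type` reports the smallest dtype that can hold a single value, while `np.array` applies the default-integer rule described above; a small sketch:

```python
import numpy as np

# Smallest dtype able to hold each individual scalar value
print(np.min_scalar_type(2))     # uint8
print(np.min_scalar_type(-260))  # int16

# np.array nevertheless prefers the platform default integer for small ints
print(np.array([2, 2]).dtype)    # e.g. int64 on most 64-bit platforms
```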

Suggestion: 3

An ndarray is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its shape, a tuple of N non-negative integers that specify the size of each dimension. The type of the items is specified by a separate data-type object (dtype), one of which is associated with each ndarray.

An array is considered aligned if the memory offsets of all elements, and the base offset itself, are a multiple of `self.itemsize`. Understanding memory alignment leads to better performance on most hardware.

An array object represents a multidimensional, homogeneous array of fixed-size items. Arithmetic and comparison operations on ndarrays are defined element-wise and generally yield ndarray objects as results.
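These size and layout attributes can be inspected directly on any array; a minimal sketch:

```python
import numpy as np

x = np.zeros((3, 4), dtype=np.int32)
print(x.itemsize)  # 4: bytes per int32 element
print(x.nbytes)    # 48 = 3 * 4 elements * 4 bytes of raw data
print(x.strides)   # (16, 4): bytes to step to the next row / next column
```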

```>>> x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
>>> type(x)
<class 'numpy.ndarray'>
>>> x.shape
(2, 3)
>>> x.dtype
dtype('int32')```
```
>>> # The element of x in the *second* row, *third* column, namely, 6.
>>> x[1, 2]
6
```
```
>>> y = x[:, 1]
>>> y
array([2, 5], dtype=int32)
>>> y[0] = 9  # this also changes the corresponding element in x
>>> y
array([9, 5], dtype=int32)
>>> x
array([[1, 9, 3],
       [4, 5, 6]], dtype=int32)
```
```
>>> x = np.arange(27).reshape((3, 3, 3))
>>> x
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],
       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],
       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
>>> x.sum(axis=0)
array([[27, 30, 33],
       [36, 39, 42],
       [45, 48, 51]])
>>> # for sum, axis is the first keyword, so we may omit it,
>>> # specifying only its value
>>> x.sum(0), x.sum(1), x.sum(2)
(array([[27, 30, 33],
        [36, 39, 42],
        [45, 48, 51]]),
 array([[ 9, 12, 15],
        [36, 39, 42],
        [63, 66, 69]]),
 array([[ 3, 12, 21],
        [30, 39, 48],
        [57, 66, 75]]))
```

Suggestion: 4

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array.

If casting were to fail for some reason (like a string that cannot be converted to float64), a ValueError will be raised. Here I was a bit lazy and wrote `float` instead of `np.float64`; NumPy aliases the Python types to its own equivalent dtypes.

When slicing like this, you always obtain array views of the same number of dimensions. By mixing integer indexes and slices, you get lower-dimensional slices.
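The failing-cast case mentioned above can be sketched as follows (the strings here are hypothetical, chosen just to trigger the error):

```python
import numpy as np

numeric_strings = np.array(['1.25', '-9.6', 'not-a-number'])
try:
    numeric_strings.astype(np.float64)
except ValueError as exc:
    # the last string cannot be parsed as a float, so astype raises
    print('cast failed:', exc)
```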

The easiest way to create an array is to use the `array` function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion:

```In: data1 = [6, 7.5, 8, 0, 1]

In: arr1 = np.array(data1)

In: arr1
Out: array([6., 7.5, 8., 0., 1.])```

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

```
In: data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]

In: arr2 = np.array(data2)

In: arr2
Out:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
```

Since `data2` was a list of lists, the NumPy array `arr2` has two dimensions with shape inferred from the data. We can confirm this by inspecting the `ndim` and `shape` attributes:

```In: arr2.ndim
Out: 2

In: arr2.shape
Out: (2, 4)```

In addition to `np.array`, there are a number of other functions for creating new arrays. As examples, `zeros` and `ones` create arrays of 0s or 1s, respectively, with a given length or shape. `empty` creates an array without initializing its values to any particular value. To create a higher dimensional array with these methods, pass a tuple for the shape:

```
In: np.zeros(10)
Out: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In: np.zeros((3, 6))
Out:
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In: np.empty((2, 3, 2))
Out:
array([[[0., 0.],
        [0., 0.],
        [0., 0.]],
       [[0., 0.],
        [0., 0.],
        [0., 0.]]])
```

`arange` is an array-valued version of the built-in Python `range` function:

```In: np.arange(15)
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])```

The data type or `dtype` is a special object containing the information (or metadata, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

```
In: arr1 = np.array([1, 2, 3], dtype=np.float64)

In: arr2 = np.array([1, 2, 3], dtype=np.int32)

In: arr1.dtype
Out: dtype('float64')

In: arr2.dtype
Out: dtype('int32')
```

You can explicitly convert or cast an array from one dtype to another using ndarray’s `astype` method:

```In: arr = np.array([1, 2, 3, 4, 5])

In: arr.dtype
Out: dtype('int64')

In: float_arr = arr.astype(np.float64)

In: float_arr.dtype
Out: dtype('float64')```

In this example, integers were cast to floating point. If I cast some floating-point numbers to be of integer dtype, the decimal part will be truncated:

```
In: arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])

In: arr
Out: array([ 3.7, -1.2, -2.6,  0.5, 12.9, 10.1])

In: arr.astype(np.int32)
Out: array([ 3, -1, -2,  0, 12, 10], dtype=int32)
```

You can also use another array’s dtype attribute:

```
In: int_array = np.arange(10)

In: calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)

In: int_array.astype(calibers.dtype)
Out: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
```
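One detail worth noting alongside these casts: `astype` always creates a new array (a copy of the data), even when the new dtype is identical to the old one. A minimal sketch:

```python
import numpy as np

arr = np.arange(5)
same = arr.astype(arr.dtype)  # astype always copies, even for an identical dtype
same[0] = 99
print(same is arr)  # False: a distinct array object
print(arr[0])       # 0: the original data is untouched
```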

There are shorthand type code strings you can also use to refer to a dtype:

```
In: empty_uint32 = np.empty(8, dtype='u4')

In: empty_uint32
Out:
array([         0, 1075314688,          0, 1075707904,          0,
       1075838976,          0, 1072693248], dtype=uint32)
```

Arrays are important because they enable you to express batch operations on data without writing any `for` loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays applies the operation element-wise:

```
In: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In: arr
Out:
array([[1., 2., 3.],
       [4., 5., 6.]])

In: arr * arr
Out:
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In: arr - arr
Out:
array([[0., 0., 0.],
       [0., 0., 0.]])
```

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

```
In: 1 / arr
Out:
array([[1.    , 0.5   , 0.3333],
       [0.25  , 0.2   , 0.1667]])

In: arr ** 0.5
Out:
array([[1.    , 1.4142, 1.7321],
       [2.    , 2.2361, 2.4495]])
```

Comparisons between arrays of the same size yield boolean arrays:

```
In: arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

In: arr2
Out:
array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In: arr2 > arr
Out:
array([[False,  True, False],
       [ True, False,  True]])
```

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists:

```
In: arr = np.arange(10)

In: arr
Out: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In: arr[5]
Out: 5

In: arr[5:8]
Out: array([5, 6, 7])

In: arr[5:8] = 12

In: arr
Out: array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])
```

An important distinction from Python lists is that array slices are views on the original array: the data is not copied. To give an example of this, I first create a slice of `arr`:

```
In: arr_slice = arr[5:8]

In: arr_slice
Out: array([12, 12, 12])
```

Now, when I change values in `arr_slice`, the mutations are reflected in the original array `arr`:

```
In: arr_slice[1] = 12345

In: arr
Out: array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])
```
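Conversely, if you want a copy of a slice rather than a view, you must copy explicitly with the `copy` method; a small sketch:

```python
import numpy as np

arr = np.arange(10)
arr_copy = arr[5:8].copy()  # explicit copy: no longer a view into arr
arr_copy[:] = 0             # mutating the copy...
print(arr)                  # ...leaves the original unchanged
```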

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

```
In: arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In: arr2d[2]
Out: array([7, 8, 9])
```

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent:

```
In: arr2d[0][2]
Out: 3

In: arr2d[0, 2]
Out: 3
```

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax (`arr` here reflects a subsequent assignment of `arr[5:8] = 64`):

```
In: arr
Out: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In: arr[1:6]
Out: array([ 1,  2,  3,  4, 64])
```

Consider the two-dimensional array from before, `arr2d`. Slicing this array is a bit different:

```
In: arr2d
Out:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In: arr2d[:2]
Out:
array([[1, 2, 3],
       [4, 5, 6]])
```

You can pass multiple slices just like you can pass multiple indexes:

```
In: arr2d[:2, 1:]
Out:
array([[2, 3],
       [5, 6]])
```

Similarly, I can select the third column but only the first two rows like so:

```
In: arr2d[:2, 2]
Out: array([3, 6])
```

See Figure 4-2 for an illustration. Note that a colon by itself means to take the entire axis, so you can slice only higher dimensional axes by doing:

```
In: arr2d[:, :1]
Out:
array([[1],
       [4],
       [7]])
```
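Because slice expressions select views, assigning to them modifies the original array in place; a short sketch using the same `arr2d`:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[:2, 1:] = 0  # assign into the selected view: first two rows, columns 1 onward
print(arr2d)
# [[1 0 0]
#  [4 0 0]
#  [7 8 9]]
```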

Let’s consider an example where we have some data in an array and an array of names with duplicates. I’m going to use here the `randn` function in `numpy.random` to generate some random normally distributed data:

```
In: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In: data = np.random.randn(7, 4)

In: names
Out: array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In: data
Out:
array([[ 0.0929,  0.2817,  0.769 ,  1.2464],
       [ 1.0072, -1.2962,  0.275 ,  0.2289],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 1.669 , -0.4386, -0.5397,  0.477 ],
       [ 3.2489, -1.0212, -0.5771,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])
```

Suppose each name corresponds to a row in the `data` array and we wanted to select all the rows with corresponding name `'Bob'`. Like arithmetic operations, comparisons (such as `==`) with arrays are also vectorized. Thus, comparing `names` with the string `'Bob'` yields a boolean array:

```In: names == 'Bob'
Out: array([True, False, False, True, False, False, False])```

This boolean array can be passed when indexing the array:

```
In: data[names == 'Bob']
Out:
array([[ 0.0929,  0.2817,  0.769 ,  1.2464],
       [ 1.669 , -0.4386, -0.5397,  0.477 ]])
```

To select everything but `'Bob'`, you can either use `!=` or negate the condition using `~`:

```
In: names != 'Bob'
Out: array([False,  True,  True, False,  True,  True,  True])

In: data[~(names == 'Bob')]
Out:
array([[ 1.0072, -1.2962,  0.275 ,  0.2289],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 3.2489, -1.0212, -0.5771,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])
```

The `~` operator can be useful when you want to invert a general condition:

```
In: cond = names == 'Bob'

In: data[~cond]
Out:
array([[ 1.0072, -1.2962,  0.275 ,  0.2289],
       [ 1.3529,  0.8864, -2.0016, -0.3718],
       [ 3.2489, -1.0212, -0.5771,  0.1241],
       [ 0.3026,  0.5238,  0.0009,  1.3438],
       [-0.7135, -0.8312, -2.3702, -1.8608]])
```
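To combine multiple boolean conditions, use the arithmetic operators `&` (and) and `|` (or); the Python keywords `and` and `or` do not work with boolean arrays. A sketch with the same `names` array:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
# parentheses are required: == binds more loosely than | here
mask = (names == 'Bob') | (names == 'Will')
print(mask)  # [ True False  True  True  True False False]
```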