This is faster:
datafile['Prev_Price'] = datafile.groupby('OrderId')['Price'].shift(fill_value = 0)
It returns:
   Price   Qty  OrderId  Prev_Price
0  26690  3000  1213772           0
1  26700  3000  1215673           0
2  26705  6000  1216656           0
3  26700  3000  1213772       26690
4  26710  3000  1215673       26700
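As a runnable sketch of the one-liner above, the snippet below rebuilds the five sample rows by hand and applies the same groupby and shift. The variable `prev_fallback` is a hypothetical name used only here; it shows one way to get the same result without passing `fill_value`, assuming no NaN values are otherwise present in Price.

```python
import pandas as pd

# Sample rows copied from the question.
datafile = pd.DataFrame({
    "Price": [26690, 26700, 26705, 26700, 26710],
    "Qty": [3000, 3000, 6000, 3000, 3000],
    "OrderId": [1213772, 1215673, 1216656, 1213772, 1215673],
})

# Previous Price within each OrderId group; first occurrences get 0.
datafile["Prev_Price"] = datafile.groupby("OrderId")["Price"].shift(fill_value=0)

# Same result without fill_value: shift, then fill the NaN values with 0.
# The shift introduces NaN, so the column becomes float and is cast back to int.
prev_fallback = datafile.groupby("OrderId")["Price"].shift().fillna(0).astype(int)

print(datafile)
```

Because the shift is computed per group in a single vectorized pass, no Python-level loop over rows is needed.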
Note: fill_value is a valid argument of pandas.DataFrame.shift since pandas 0.24.0. For older versions, don't pass the argument and replace the NaN values later using datafile.fillna(0).

On a short dataframe like the one you posted this method is actually slower, but I did a couple of tests with bigger dataframes.

I have a large data set (millions of rows) which I read into a pandas DataFrame called datafile.
Each row has an Order ID number, which is non-unique. So my datafile looks something like this:

Price   Qty  OrderId
26690  3000  1213772
26700  3000  1215673
26705  6000  1216656
26700  3000  1213772
26710  3000  1215673
Now, what I want is, for each row, to take the OrderId, find the previous occurrence of that OrderId in the DataFrame, get the corresponding price, and populate it in a new column "Prev_Price". If no previous occurrence is found, the value should be 0. So my output should look like this:

Price   Qty  OrderId  Prev_Price
26690  3000  1213772           0
26700  3000  1215673           0
26705  6000  1216656           0
26700  3000  1213772       26690
26710  3000  1215673       26700
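I tried using numpy and wrote this function:

import numpy as np

# datanp is the DataFrame's data as a NumPy array (columns: Price, Qty, OrderId).
def getPrevPrice_np(x):
    # Among the rows before x, select those with the same OrderId (column 2)
    # and return the Price (column 0) of the last one.
    try:
        return list(datanp[np.where(datanp[0:x, 2] == datanp[x, 2])][:, 0])[-1]
    except IndexError:
        # No earlier row with this OrderId.
        return 0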
Last Updated : 26 Jul, 2020
Syntax:
df.tail(n)
Use pandas.DataFrame.iloc to get the last n rows. It works like list slicing.
Syntax:
df.iloc[-n:]
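A small sketch comparing the two syntaxes above. The DataFrame and the names `last_tail` and `last_iloc` are made up for illustration. It also shows one case where the two differ: with n = 0, `df.tail(0)` is empty, while `df.iloc[-0:]` is the same slice as `df.iloc[0:]` and returns every row.

```python
import pandas as pd

# Hypothetical six-row frame, used only for this comparison.
df = pd.DataFrame({"A": range(6), "B": list("abcdef")})
n = 2

last_tail = df.tail(n)    # last n rows via tail
last_iloc = df.iloc[-n:]  # last n rows via positional slicing

print(last_tail)

# Edge case: n == 0 behaves differently for the two forms.
print(len(df.tail(0)), len(df.iloc[-0:]))
```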
Replace values where the condition is False.

cond: Where cond is True, keep the original value. Where False, replace with the corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return a boolean Series/DataFrame or array. The callable must not change the input Series/DataFrame (though pandas doesn't check it).

other: Entries where cond is False are replaced with the corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return a scalar or Series/DataFrame. The callable must not change the input Series/DataFrame (though pandas doesn't check it).

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used.
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>> s.mask(s > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>> s.where(s > 1, 10)
0    10
1    10
2     2
3     3
4     4
dtype: int64
>>> s.mask(s > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> m = df % 3 == 0
>>> df.where(m, -df)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>> df.where(m, -df) == np.where(m, df, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
>>> df.where(m, -df) == df.mask(~m, -df)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
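The examples above pass boolean objects and scalars. The callable forms of cond and other mentioned in the parameter descriptions can be sketched as follows; the lambdas and the names `kept` and `masked` are illustrative choices, not part of the pandas documentation.

```python
import pandas as pd

s = pd.Series(range(5))

# cond as a callable: keep values greater than 1.
# other as a callable: where cond is False, use ten times the original value.
kept = s.where(lambda x: x > 1, lambda x: x * 10)

# mask is the inverse: values where cond is True are replaced.
masked = s.mask(lambda x: x > 1, lambda x: -x)

print(kept.tolist())
print(masked.tolist())
```

Both callables receive the whole Series, so they should return a result aligned with it rather than operate element by element.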
5. Select Cell Value from DataFrame Using df['col_name'].values[]
df['col_name'].values returns the column as a NumPy array; index into that array to get a cell value, for instance df["Duration"].values[3].
1. Using DataFrame.loc[] to Get a Cell Value by Column Name
# Below are quick examples

# Using loc[]. Get cell value by name & index
print(df.loc['r4']['Duration'])
print(df.loc['r4'][2])

# Using iloc[]. Get cell value by index & name
print(df.iloc[3]['Duration'])
print(df.iloc[3, 2])

# Using DataFrame.at[]
print(df.at['r4', 'Duration'])
print(df.at[df.index[3], 'Duration'])

# Using DataFrame.iat[]
print(df.iat[3, 2])

# Get a cell value
print(df["Duration"].values[3])

# Get cell value from last row
print(df.iloc[-1, 2])
print(df.iloc[-1]['Duration'])
print(df.at[df.index[-1], 'Duration'])
Now, let's create a DataFrame with a few rows and columns, run some examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [24000, 25000, 25000, 24000, 24000],
    'Duration': ['30day', '50days', '55days', '40days', '60days'],
    'Discount': [1000, 2300, 1000, 1200, 2500]
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(technologies, index=index_labels)
print(df)
Yields the output below.

    Courses    Fee Duration  Discount
r1    Spark  24000    30day      1000
r2  PySpark  25000   50days      2300
r3   Hadoop  25000   55days      1000
r4   Python  24000   40days      1200
r5   pandas  24000   60days      2500
Yields the output below. Note that in the examples above, df.loc['r4'] returns a pandas Series, which is then indexed by column name.

40days
If you want to get a cell value by column number or index position, use DataFrame.iloc[]. Index positions run from 0 to length-1. To refer to the last column, use -1 as the column position.

# Using iloc[]. Get cell value by index & name
print(df.iloc[3]['Duration'])
print(df.iloc[3][2])
print(df.iloc[3, 2])
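The accessors above can be compared side by side. This is a minimal sketch that rebuilds the same DataFrame and reads the r4 / Duration cell with a single indexer call each time, which avoids the intermediate Series that chained forms such as df.loc['r4']['Duration'] create. The variable names are illustrative. Recent pandas versions also warn about positional integer access on a label-indexed Series, as in df.iloc[3][2], so the two-argument forms are the safer choice there.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [24000, 25000, 25000, 24000, 24000],
        "Duration": ["30day", "50days", "55days", "40days", "60days"],
        "Discount": [1000, 2300, 1000, 1200, 2500],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

by_label = df.loc["r4", "Duration"]    # row label, column label
by_position = df.iloc[3, 2]            # row position, column position
fast_label = df.at["r4", "Duration"]   # scalar access by label
fast_position = df.iat[3, 2]           # scalar access by position

# Mixing a row position with a column name: look up the column position first.
col_pos = df.columns.get_loc("Duration")
mixed = df.iloc[3, col_pos]

# Last row, last column via negative positions.
last_cell = df.iloc[-1, -1]

print(by_label, by_position, fast_label, fast_position, mixed, last_cell)
```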
The easiest way to create an array is to use the array function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. For example, a list is a good candidate for conversion.

Whenever you see "array", "NumPy array", or "ndarray" in the text, with few exceptions they all refer to the same thing: the ndarray object.

NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they act similarly to Python lists.

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays.
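The meshgrid description above can be sketched as follows. The five-point grid and the names `points`, `xs`, `ys`, and `z` are arbitrary choices made for this illustration.

```python
import numpy as np

# A small 1D grid: [-2, -1, 0, 1, 2].
points = np.arange(-2, 3)

# xs varies along columns, ys varies along rows; together they
# enumerate every (x, y) pair on the grid.
xs, ys = np.meshgrid(points, points)

# Evaluate sqrt(x^2 + y^2) at every grid point in one vectorized expression.
z = np.sqrt(xs ** 2 + ys ** 2)

print(xs.shape, ys.shape)
print(z[2, 2], z[0, 2])
```

Since both inputs are the same array, the resulting grid of distances is symmetric, and the center point (x = 0, y = 0) evaluates to 0.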
In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
In [84]: data = np.random.randn(7, 4)
In [85]: names
Out[85]:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype='|S4')
In [86]: data
Out[86]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])
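The session above stops after creating the two arrays. One plausible use of such a pair, sketched here as an assumption about where the example is heading, is boolean indexing: comparing names to a string yields a boolean array that selects the matching rows of data. The seeded generator is also an assumption, used so the values are repeatable. Under Python 3 the names array has a Unicode dtype rather than the '|S4' byte-string dtype shown above.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

# Seeded generator (an assumption) in place of np.random.randn, for repeatable values.
rng = np.random.default_rng(0)
data = rng.standard_normal((7, 4))

# Elementwise comparison gives one boolean per name.
is_bob = names == "Bob"

# The boolean array selects the rows of data where the name is 'Bob'.
bob_rows = data[is_bob]

print(is_bob)
print(bob_rows.shape)
```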
To work the examples, you'll need matplotlib installed in addition to NumPy.

To create sequences of numbers, NumPy provides the arange function, which is analogous to the Python built-in range but returns an array.

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a. This behavior can be used, for example, to convert an image of labels into a color image using a palette.

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. For example, the array below has two axes: the first has a length of 2, the second a length of 3.

[[1., 0., 0.],
 [0., 1., 2.]]
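The arange function and the palette-style indexing described above can be sketched together. The three-color palette and the 2x2 label image are made-up values for illustration only.

```python
import numpy as np

# arange works like range but returns an array: start, stop (exclusive), step.
seq = np.arange(10, 30, 5)

# Hypothetical palette: one RGB row per label (black, red, green).
palette = np.array([[0, 0, 0],
                    [255, 0, 0],
                    [0, 255, 0]])

# Hypothetical label image: each entry is a row index into the palette.
image = np.array([[0, 1],
                  [2, 0]])

# Indexing with an integer array selects along the first axis of palette,
# so every label is replaced by its RGB row.
colored = palette[image]

print(seq)
print(colored.shape)
```

The result has the shape of the index array followed by the remaining axes of the indexed array, here (2, 2) followed by (3,).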
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
>>> import numpy as np
>>> a = np.array([2, 3, 4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')
>>> a = np.array(1, 2, 3, 4)    # WRONG
Traceback (most recent call last):
  ...
TypeError: array() takes from 1 to 2 positional arguments but 4 were given
>>> a = np.array([1, 2, 3, 4])  # RIGHT
>>> b = np.array([(1.5, 2, 3), (4, 5, 6)])
>>> b
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
>>> c = np.array([[1, 2], [3, 4]], dtype=complex)
>>> c
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])
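The interactive sessions above can be collected into one script-style sketch. It checks the inferred dtypes and catches the error from the wrong call form. The flag `raised` is a name introduced only for this illustration. The default integer dtype can differ by platform, so the check below looks only at the dtype kind rather than assuming 'int64'.

```python
import numpy as np

a = np.array([2, 3, 4])                        # integers: integer dtype
b = np.array([(1.5, 2, 3), (4, 5, 6)])         # mixed int/float: upcast to float64
c = np.array([[1, 2], [3, 4]], dtype=complex)  # dtype given explicitly

# Passing the elements as separate arguments instead of one sequence fails.
raised = False
try:
    np.array(1, 2, 3, 4)
except TypeError:
    raised = True

print(a.dtype, b.dtype, c.dtype, raised)
```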