You need to specify the column colA as the second argument of DataFrame.loc:
print(example_df.loc[idx[:, '2019-01-3':], 'colA'])
id   date
id1  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
id2  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
id3  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
Name: colA, dtype: float64
If you want a one-column DataFrame instead of a Series, pass a one-element list:
print(example_df.loc[idx[:, '2019-01-3':], ['colA']])
                colA
id  date
id1 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
id2 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
id3 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
The same principle applies with a boolean mask:
print(example_df.loc[inner_mask, 'colA'])
id   date
id1  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
id2  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
id3  2019-01-03    1.0
     2019-01-04    1.0
     2019-01-05    1.0
Name: colA, dtype: float64
print(example_df.loc[inner_mask, ['colA']])
                colA
id  date
id1 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
id2 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
id3 2019-01-03   1.0
    2019-01-04   1.0
    2019-01-05   1.0
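The answer above uses example_df, idx and inner_mask without defining them. The following is a minimal sketch of one setup that fits the printed output; the data, the extra column colB, and the way inner_mask is built are assumptions made for illustration, not the asker's original code.

```python
import pandas as pd

# Hypothetical data: three ids, five consecutive days, constant float columns.
ids = ["id1", "id2", "id3"]
dates = pd.date_range("2019-01-01", periods=5, freq="D")
index = pd.MultiIndex.from_product([ids, dates], names=["id", "date"])
example_df = pd.DataFrame({"colA": 1.0, "colB": 2.0}, index=index)

idx = pd.IndexSlice

# Row selector first, column selector second.
as_series = example_df.loc[idx[:, "2019-01-3":], "colA"]   # scalar label: Series
as_frame = example_df.loc[idx[:, "2019-01-3":], ["colA"]]  # one-element list: DataFrame

# Assumed definition of the boolean mask on the inner (date) level.
inner_mask = example_df.index.get_level_values("date") >= pd.Timestamp("2019-01-03")
masked = example_df.loc[inner_mask, "colA"]

print(as_series)
print(as_frame)
```

Passing the column as the second argument is what tells .loc that the idx[...] tuple refers to index levels only, rather than to rows and columns.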
You can provide any of the selectors as if you are indexing by label (see Selection by Label), including slices, lists of labels, labels, and boolean indexers.
You can slice a MultiIndex by providing multiple indexers.
Using a boolean indexer you can provide selection related to the values.
Label based indexing via .loc along the edges of an interval works as you would expect, selecting that particular interval.
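As a rough illustration of two of the points above (multiple indexers on a MultiIndex, and a boolean indexer), here is a small sketch. The frame, its column names x and y, and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a sorted two-level row index.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame({"x": np.arange(6), "y": np.arange(6) * 10}, index=index)

# Multiple indexers: one per index level inside a tuple, then the column selector.
# Here: a label slice on "first" and a list of labels on "second".
subset = df.loc[(slice("bar", "baz"), ["two"]), :]

# Boolean indexer: select rows by value, then pick one column.
large_x = df.loc[df["x"] > 3, "y"]

print(subset)
print(large_x)
```

Giving the column selector explicitly (the trailing `:`) avoids any ambiguity about whether the tuple addresses index levels or rows and columns.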
In [1]: arrays = [
   ...:     ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
   ...:     ["one", "two", "one", "two", "one", "two", "one", "two"],
   ...: ]

In [2]: tuples = list(zip(*arrays))

In [3]: tuples
Out[3]:
[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [4]: index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

In [5]: index
Out[5]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [6]: s = pd.Series(np.random.randn(8), index=index)

In [7]: s
Out[7]:
first  second
bar    one       0.469112
       two      -0.282863
baz    one      -1.509059
       two      -1.135632
foo    one       1.212112
       two      -0.173215
qux    one       0.119209
       two      -1.044236
dtype: float64
In [8]: iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]

In [9]: pd.MultiIndex.from_product(iterables, names=["first", "second"])
Out[9]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])

In [10]: df = pd.DataFrame(
   ....:     [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
   ....:     columns=["first", "second"],
   ....: )

In [11]: pd.MultiIndex.from_frame(df)
Out[11]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('foo', 'one'),
            ('foo', 'two')],
           names=['first', 'second'])
In [12]: arrays = [
   ....:     np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
   ....:     np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
   ....: ]

In [13]: s = pd.Series(np.random.randn(8), index=arrays)

In [14]: s
Out[14]:
bar  one   -0.861849
     two   -2.104569
baz  one   -0.494929
     two    1.071804
foo  one    0.721555
     two   -0.706771
qux  one   -1.039575
     two    0.271860
dtype: float64

In [15]: df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

In [16]: df
Out[16]:
                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
    two  0.357021 -0.674600 -1.776904 -0.968914
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

In [17]: df.index.names
Out[17]: FrozenList([None, None])
In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)

In [19]: df
Out[19]:
first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A       0.895717  0.805244 -1.206412  2.565646  1.431256  1.340309 -1.170299 -0.226169
B       0.410835  0.813850  0.132003 -0.827317 -0.076467 -1.187678  1.130127 -1.436737
C      -1.413681  1.607920  1.024180  0.569605  0.875906 -2.211372  0.974466 -2.006747

In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Out[20]:
first              bar                 baz                 foo
second             one       two       one       two       one       two
first second
bar   one    -0.410001 -0.078638  0.545952 -1.219217 -1.226825  0.769804
      two    -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734
baz   one     0.959726 -1.110336 -0.619976  0.149748 -0.732339  0.687738
      two     0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849
foo   one    -0.954208  1.462696 -1.743161 -0.826591 -0.345352  1.314232
      two     0.690579  0.995761  2.396780  0.014871  3.357427 -0.317441
With this indexing scheme, you can straightforwardly index or slice the series based on this multiple index.
Fortunately, Pandas provides a better way. Our tuple-based indexing is essentially a rudimentary multi-index, and the Pandas MultiIndex type gives us the type of operations we wish to have. We can create a multi-index from the tuples.
These indexers provide an array-like view of the underlying two-dimensional data, but each individual index in loc or iloc can be passed a tuple of multiple indices.
You could get around this by building the desired slice explicitly using Python's built-in slice() function, but a better way in this context is to use an IndexSlice object, which Pandas provides for precisely this situation.
import pandas as pd
import numpy as np
index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]
pop = pd.Series(populations, index=index)
pop
(California, 2000)    33871648
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
(Texas, 2010)         25145561
dtype: int64
pop[('California', 2010):('Texas', 2000)]
(California, 2010)    37253956
(New York, 2000)      18976457
(New York, 2010)      19378102
(Texas, 2000)         20851820
dtype: int64
pop[[i for i in pop.index if i[1] == 2010]]
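The list comprehension above works but is clumsy. As a sketch of the MultiIndex alternative described in the text, the same population data can be indexed by a real MultiIndex and sliced per level. The level names state and year, the variable names pop_mi and frame, and the column name population are assumptions added for readability.

```python
import pandas as pd

index = [('California', 2000), ('California', 2010),
         ('New York', 2000), ('New York', 2010),
         ('Texas', 2000), ('Texas', 2010)]
populations = [33871648, 37253956,
               18976457, 19378102,
               20851820, 25145561]

# Build a MultiIndex from the tuples (level names are illustrative).
mi = pd.MultiIndex.from_tuples(index, names=["state", "year"])
pop_mi = pd.Series(populations, index=mi)

# All states for the year 2010, without a list comprehension.
pop_2010 = pop_mi.loc[:, 2010]

# On a DataFrame, IndexSlice keeps the per-level slice readable.
frame = pop_mi.to_frame("population")
idx = pd.IndexSlice
frame_2010 = frame.loc[idx[:, 2010], :]

print(pop_2010)
print(frame_2010)
```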
May 20, 2021
Step 1: Create a simple DataFrame
import pandas as pd
import numpy as np
import random
# A DataFrame with an initial index. The marks represented here are out of 50.
df = pd.DataFrame({
    'Networking': [45, 34, 23, 8, 21],
    'Web Engineering': [32, 43, 23, 50, 21],
    'Complier Design': [14, 42, 21, 12, 45]
}, index=['Abhishek', 'Saumya', 'Ayushi', 'Saksham', 'Rajveer'])
df
Step 2: Reset the index
df.reset_index()
Resetting the index returns a DataFrame with a default integer index. The old index becomes a regular column, named initial_index here (pandas falls back to the name index when the index itself has no name).
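To make the naming behaviour concrete, here is a small sketch with a shortened version of the frame. The use of rename_axis to give the index the name initial_index is an assumption; the text does not show how the index was named.

```python
import pandas as pd

df = pd.DataFrame(
    {'Networking': [45, 34, 23]},
    index=['Abhishek', 'Saumya', 'Ayushi'],
)

# Unnamed index: the old index lands in a column called "index".
unnamed = df.reset_index()

# Named index (assumed name): the column takes the index name.
named = df.rename_axis('initial_index').reset_index()

# drop=True discards the old index instead of keeping it as a column.
dropped = df.reset_index(drop=True)

print(unnamed.columns.tolist())
print(named.columns.tolist())
print(dropped.columns.tolist())
```

Note that reset_index returns a new DataFrame by default; df itself is unchanged unless inplace=True is passed or the result is assigned back.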
Consider the DataFrame below, where the index has been reset:
# Create the DataFrame
df = pd.DataFrame({
    'Networking': [45, 34, 23, 8, 21],
    'Web Engineering': [32, 43, 23, 50, 21],
    'Complier Design': [14, 42, 21, 12, 45]
}, index=['Abhishek', 'Saumya', 'Ayushi', 'Saksham', 'Rajveer'])
# Reset the index
df.reset_index()
Answer:
# Make a copy of the DataFrame
ques1 = df.copy()
# Reset only the 'Branch' level; col_level picks the column level that receives
# the label, and col_fill names the other column level
ques1.reset_index(level='Branch', col_level=1, col_fill='Department', inplace=True)
ques1
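The frame this answer operates on is not shown, so the following sketch uses made-up data to illustrate what level, col_level and col_fill do. The index levels Name and Branch come from the answers; the two-level columns, the Score and Marks labels, and all values are assumptions.

```python
import pandas as pd

# Hypothetical frame: two-level row index and two-level columns.
index = pd.MultiIndex.from_tuples(
    [('Abhishek', 'CSE'), ('Saumya', 'ECE'), ('Ayushi', 'CSE')],
    names=['Name', 'Branch'],
)
columns = pd.MultiIndex.from_tuples([('Score', 'Percentage'), ('Score', 'Marks')])
df = pd.DataFrame([[90.0, 45], [68.0, 34], [46.0, 23]], index=index, columns=columns)

# Move only the 'Branch' level out of the index.
# col_level=1 puts the label 'Branch' on the second column level,
# and col_fill='Department' fills the first column level above it.
out = df.reset_index(level='Branch', col_level=1, col_fill='Department')

print(out.columns.tolist())
print(out.index.name)
```

With these inputs the new column is labelled ('Department', 'Branch') and Name stays as the row index.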
Answer:
# Make a copy of the DataFrame
ques2 = ques1.copy()
# Reset the index so that the names move into the DataFrame as a column
ques2.reset_index(inplace=True)
# Set the index to Percentage
ques2.set_index('Percentage', inplace=True)
# Reset the index with col_level and col_fill defined
ques2.reset_index(col_level=1, col_fill='Metric', inplace=True)
# Set the index to Name again
ques2.set_index('Name')
Answer:
# Make a copy of the DataFrame
que3 = df.copy()
# Reset the index
que3.reset_index(inplace=True)
# Filter the rows by Branch, then sort by Percentage in decreasing order
output = que3[que3.Branch == 'CSE'].sort_values(by='Percentage', ascending=False)
# Reset the index so that it reflects the rank order
output.reset_index(drop=True, inplace=True)
print([(rank, name) for rank, name in zip(output.index.values + 1, output.Name.values)])