Pandas tries to keep the levels of a MultiIndex unique. When you use loc with a list that refers to values of the first level of the MultiIndex, it keeps the result unique, so a repeated label is returned only once. If you want something different, you'll need to be explicit and use tuples.
specific_index_values = (
    [('A', 'foo'), ('A', 'bar')] * 2 + [('B', 'foo'), ('B', 'bar')]
)
df_2.loc[specific_index_values, :]
           Col_1
Idx1 Idx2
A    foo       0
     bar       1
     foo       0
     bar       1
B    foo       2
     bar       3
Another option, which I find distasteful but which works, is to select each label separately and concatenate the pieces:
pd.concat([df_2.loc[[x]] for x in ['A', 'A', 'B']])
           Col_1
Idx1 Idx2
A    foo       0
     bar       1
     foo       0
     bar       1
B    foo       2
     bar       3
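If writing the tuple list by hand is tedious, one possible alternative is to translate the repeated first-level labels into integer positions and select with iloc. The following is only a sketch that rebuilds df_2 from the question; the helper name take_repeated is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

# Same frame as df_2 in the question.
ix = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['foo', 'bar']], names=['Idx1', 'Idx2']
)
df_2 = pd.DataFrame(np.arange(len(ix)), index=ix, columns=['Col_1'])


def take_repeated(df, labels, level=0):
    """Hypothetical helper: rows for each label, in order, repeats included."""
    level_values = df.index.get_level_values(level)
    # For every requested label, collect the integer positions of its rows.
    positions = np.concatenate(
        [np.flatnonzero(level_values == label) for label in labels]
    )
    return df.iloc[positions]


result = take_repeated(df_2, ['A', 'A', 'B'])
print(result)
```

Because the selection is positional, it does not depend on how loc treats duplicate labels.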
Hierarchical / multi-level indexing is very exciting as it opens the door to some quite sophisticated data analysis and manipulation, especially for working with higher dimensional data. In essence, it enables you to store and manipulate data with an arbitrary number of dimensions in lower dimensional data structures like Series (1d) and DataFrame (2d).
Syntactically integrating MultiIndex in advanced indexing with .loc is a bit challenging, but we've made every effort to do so. In general, MultiIndex keys take the form of tuples.
CategoricalIndex is a type of index that is useful for supporting indexing with duplicates. It is a container around a Categorical and allows efficient indexing and storage of an index with a large number of duplicated elements. Indexing with __getitem__/.iloc/.loc works similarly to an Index with duplicates. The indexers must be in the category or the operation will raise a KeyError.
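A small sketch of the two behaviours just described: tuple keys on a MultiIndex, and label lookups on a CategoricalIndex that holds duplicates. The data and variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# MultiIndex keys are tuples, one element per level.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(np.arange(4), index=index)

full_key = s.loc[("bar", "two")]   # a complete tuple selects one value
partial = s.loc["baz"]             # a first-level label selects a sub-Series

# A CategoricalIndex may hold duplicates; lookups must use an existing category.
cat_df = pd.DataFrame(
    {"value": [10, 20, 30]},
    index=pd.CategoricalIndex(["a", "a", "b"], categories=["a", "b", "c"]),
)
rows_a = cat_df.loc["a"]           # both rows labelled "a"

raised = False
try:
    cat_df.loc["z"]                # "z" is not one of the categories
except KeyError:
    raised = True
```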
A MultiIndex can be created from a list of tuples using MultiIndex.from_tuples:
In [1]: arrays = [
   ...:     ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
   ...:     ["one", "two", "one", "two", "one", "two", "one", "two"],
   ...: ]
   ...:
In [2]: tuples = list(zip(*arrays))
In [3]: tuples
Out[3]:
[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]
In [4]: index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
In [5]: index
Out[5]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
In [6]: s = pd.Series(np.random.randn(8), index=index)
In [7]: s
Out[7]:
first  second
bar    one       0.469112
       two      -0.282863
baz    one      -1.509059
       two      -1.135632
foo    one       1.212112
       two      -0.173215
qux    one       0.119209
       two      -1.044236
dtype: float64
When you want every pairing of the elements in two iterables, MultiIndex.from_product is easier:
In [8]: iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
In [9]: pd.MultiIndex.from_product(iterables, names=["first", "second"])
Out[9]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('baz', 'one'),
            ('baz', 'two'),
            ('foo', 'one'),
            ('foo', 'two'),
            ('qux', 'one'),
            ('qux', 'two')],
           names=['first', 'second'])
You can also construct a MultiIndex from a DataFrame directly, using MultiIndex.from_frame:
In [10]: df = pd.DataFrame(
   ....:     [["bar", "one"], ["bar", "two"], ["foo", "one"], ["foo", "two"]],
   ....:     columns=["first", "second"],
   ....: )
   ....:
In [11]: pd.MultiIndex.from_frame(df)
Out[11]:
MultiIndex([('bar', 'one'),
            ('bar', 'two'),
            ('foo', 'one'),
            ('foo', 'two')],
           names=['first', 'second'])
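The constructors above are interchangeable ways of describing the same index. As a quick check, the sketch below builds one small index four ways and compares the results; the variable names are illustrative only.

```python
import pandas as pd

arrays = [["bar", "bar", "foo", "foo"], ["one", "two", "one", "two"]]
names = ["first", "second"]

from_arrays = pd.MultiIndex.from_arrays(arrays, names=names)
from_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names)
from_product = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two"]], names=names
)
# from_frame takes the level names from the column labels.
from_frame = pd.MultiIndex.from_frame(pd.DataFrame(dict(zip(names, arrays))))

same = all(
    from_arrays.equals(other)
    for other in (from_tuples, from_product, from_frame)
)
print(same)
```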
As a convenience, you can pass a list of arrays directly into Series or DataFrame to construct a MultiIndex automatically:
In [12]: arrays = [
   ....:     np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
   ....:     np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
   ....: ]
   ....:
In [13]: s = pd.Series(np.random.randn(8), index=arrays)
In [14]: s
Out[14]:
bar  one   -0.861849
     two   -2.104569
baz  one   -0.494929
     two    1.071804
foo  one    0.721555
     two   -0.706771
qux  one   -1.039575
     two    0.271860
dtype: float64
In [15]: df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
In [16]: df
Out[16]:
                0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
    two  0.357021 -0.674600 -1.776904 -0.968914
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061
In [17]: df.index.names
Out[17]: FrozenList([None, None])
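When the index is built from bare arrays, the levels have no names, as the FrozenList([None, None]) output shows. One way to name them afterwards is rename_axis; the sketch below uses zeros instead of random data so the result is reproducible.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz"]),
    np.array(["one", "two", "one", "two"]),
]
df = pd.DataFrame(np.zeros((4, 2)), index=arrays)
before = list(df.index.names)          # unnamed levels

# rename_axis returns a new frame whose index levels are named.
named = df.rename_axis(index=["first", "second"])
after = list(named.index.names)

# Named levels can then be addressed by name.
first_values = named.index.get_level_values("first")
```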
The index created above can back either axis of a pandas object:
In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=["A", "B", "C"], columns=index)
In [19]: df
Out[19]:
first        bar                 baz                 foo                 qux
second       one       two       one       two       one       two       one       two
A       0.895717  0.805244 -1.206412  2.565646  1.431256  1.340309 -1.170299 -0.226169
B       0.410835  0.813850  0.132003 -0.827317 -0.076467 -1.187678  1.130127 -1.436737
C      -1.413681  1.607920  1.024180  0.569605  0.875906 -2.211372  0.974466 -2.006747
In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
Out[20]:
first              bar                 baz                 foo
second             one       two       one       two       one       two
first second
bar   one    -0.410001 -0.078638  0.545952 -1.219217 -1.226825  0.769804
      two    -1.281247 -0.727707 -0.121306 -0.097883  0.695775  0.341734
baz   one     0.959726 -1.110336 -0.619976  0.149748 -0.732339  0.687738
      two     0.176444  0.403310 -0.154951  0.301624 -2.179861 -1.369849
foo   one    -0.954208  1.462696 -1.743161 -0.826591 -0.345352  1.314232
      two     0.690579  0.995761  2.396780  0.014871  3.357427 -0.317441
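Once the columns carry a MultiIndex, selection follows the same tuple convention as for rows. The sketch below uses a small integer frame (made-up data) so each result is easy to verify by eye.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    np.arange(12).reshape(3, 4), index=["A", "B", "C"], columns=columns
)

bar_block = df["bar"]                 # DataFrame with columns 'one' and 'two'
bar_one = df[("bar", "one")]          # a single column as a Series
cell = df.loc["B", ("baz", "two")]    # one scalar value
```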
Given the following DataFrame:
In [11]: df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])
In [12]: df.set_index(['A', 'B'], inplace=True)
In [13]: df
Out[13]:
                            C
A         B
 0.902764 -0.259656 -1.864541
-0.695893  0.308893  0.125199
 1.696989 -1.221131 -2.975839
-1.132069 -1.086189 -1.945467
 2.294835 -1.765507  1.567853
-1.788299  2.579029  0.792919
Get the values of A by name:
In [14]: df.index.get_level_values('A')
Out[14]:
Float64Index([0.902764041011, -0.69589264969, 1.69698924476, -1.13206872067,
              2.29483481146, -1.788298829],
             dtype='float64', name='A')
Or by level number:
In [15]: df.index.get_level_values(level=0)
Out[15]:
Float64Index([0.902764041011, -0.69589264969, 1.69698924476, -1.13206872067,
              2.29483481146, -1.788298829],
             dtype='float64', name='A')
A range condition can also combine several index levels:
In [17]: df.loc[(df.index.get_level_values('A') > 0.5) & (df.index.get_level_values('B') < 0)]
Out[17]:
                           C
A        B
0.902764 -0.259656 -1.864541
1.696989 -1.221131 -2.975839
2.294835 -1.765507  1.567853
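The same filter can be written with DataFrame.query, which can refer to named index levels directly. This is a sketch on fixed, made-up values rather than the random data above, so the two approaches can be compared exactly.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0.9, -0.7, 1.7, -1.1],
        "B": [-0.3, 0.3, -1.2, -1.1],
        "C": [1, 2, 3, 4],
    }
).set_index(["A", "B"])

# Boolean mask built from the index levels.
mask = (df.index.get_level_values("A") > 0.5) & (
    df.index.get_level_values("B") < 0
)
by_mask = df.loc[mask]

# query() resolves the level names A and B against the index.
by_query = df.query("A > 0.5 and B < 0")
```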
To extract a specific value you can use xs (cross-section):
In [18]: df.xs(key=0.9027639999999999)
Out[18]:
                  C
B
-0.259656 -1.864541
In [19]: df.xs(key=0.9027639999999999, drop_level=False)
Out[19]:
                           C
A        B
0.902764 -0.259656 -1.864541
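Matching a float label exactly, as above, can be fragile, so the following sketch shows the same xs calls on string labels (invented data). It also uses the level argument to take a cross-section on the inner level.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["x", "y"], ["one", "two"]], names=["A", "B"]
)
df = pd.DataFrame({"C": [1, 2, 3, 4]}, index=index)

outer = df.xs("x")                                  # rows where A == "x"; level A is dropped
inner = df.xs("two", level="B")                     # select on the second level instead
kept = df.xs("two", level="B", drop_level=False)    # keep both levels in the result
```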
Is there any way to select repeated values of a multi-index level using .loc?
First, let's suppose I have a pandas DataFrame with a single index. If I use .loc[] to select index 'A' twice, it returns a DataFrame with index 'A' repeated twice:
df_1 = pd.DataFrame([1, 2, 3], index=['A', 'B', 'C'], columns=['Col_1'])
df_1
   Col_1
A      1
B      2
C      3
df_1.loc[['A', 'A', 'B']]
   Col_1
A      1
A      1
B      2
Now let's suppose we have a DataFrame with a multi-index. If I use .loc[] to select index 'A' twice, it returns a DataFrame with index 'A' included only once:
ix = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['foo', 'bar']], names=['Idx1', 'Idx2']
)
data = np.arange(len(ix))
df_2 = pd.DataFrame(data, index=ix, columns=['Col_1'])
df_2
           Col_1
Idx1 Idx2
A    foo       0
     bar       1
B    foo       2
     bar       3
C    foo       4
     bar       5
df_2.loc[['A', 'A', 'B']]
           Col_1
Idx1 Idx2
A    foo       0
     bar       1
B    foo       2
     bar       3
Updated: July 25, 2022
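import pandas as pd
import numpy as np
# Random values between 1 and 5, rounded to two decimals
x = np.round(np.random.uniform(1, 5, size=(9, 4)), 2)
# Row index: every Region paired with every Division
rowIndx = pd.MultiIndex.from_product(
    [["East", "North", "South"], ["A", "B", "C"]],
    names=["Region", "Division"],
)
# Column index: every quarter paired with Buy and Sell
colIndex = pd.MultiIndex.from_product([["Q1", "Q2"], ["Buy", "Sell"]])
multidf = pd.DataFrame(data=x, index=rowIndx, columns=colIndex)
multidf
# All rows of the East region
multidf.loc[['East']]
# Rows from North onward, columns from Q1 onward
multidf.loc['North':, 'Q1':]
# Columns from ('Q1', 'Sell') through ('Q2', 'Buy')
multidf.loc[:, ('Q1', 'Sell'):('Q2', 'Buy')]
# Every second row, fourth column only (by position)
multidf.iloc[::2, 3:4]
# Regions East through North, Divisions B through C
multidf.loc[(slice('East', 'North'), slice('B', 'C')), :]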
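The slice(...) tuples in the last selection can also be written with pd.IndexSlice, which allows the usual colon syntax. The sketch below rebuilds a frame of the same shape with sequential integers (an assumption made here for reproducibility) and checks that both spellings select the same rows.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_product(
    [["East", "North", "South"], ["A", "B", "C"]],
    names=["Region", "Division"],
)
col_index = pd.MultiIndex.from_product([["Q1", "Q2"], ["Buy", "Sell"]])
multidf = pd.DataFrame(
    np.arange(36).reshape(9, 4), index=row_index, columns=col_index
)

idx = pd.IndexSlice
with_slice = multidf.loc[(slice("East", "North"), slice("B", "C")), :]
with_idx = multidf.loc[idx["East":"North", "B":"C"], :]

# IndexSlice works on the column axis too: every quarter, Sell only.
sells = multidf.loc[:, idx[:, "Sell"]]
```

Label slicing of this kind relies on the index being sorted, which from_product gives here because the input lists are already in order.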
We looked at some ways to select values from a multi-index DataFrame, specifically by label or position. Building on that knowledge, we will take a look at extracting sequences of slices of values.
We have previously used the loc indexer to get the same result. However, now we want to select an element using the stock name instead.
The sort_index method is the same one we've seen in practice, but for multi-index DataFrames specifically, we can fine-tune the sort using the level parameter.
Previously, we created a multi-index from our technology stocks dataset by first reading in the data and then using the set_index method to specify date and name as the two levels of our multi-index.
Importing the DataFrame:
import numpy as np
import pandas as pd
tech = df = pd.read_csv("C:/Users/BHAVYA/Documents/tech_giants.csv")
tech.head()
To calculate for how many years the data has been recorded, we can use the following approach:
tech.year.value_counts() / 5
# Output
# 2016    252.0
# 2014    252.0
# 2015    252.0
# 2017    251.0
# 2018    251.0
# 2019    163.0
# Name: year, dtype: float64
# Setting date as the index (returns a new DataFrame; tech itself is unchanged)
tech.set_index('date')
# Promoting both date and name to the index gives a multi-index
tech = tech.set_index(['date', 'name'])
The resultant DataFrame has a multi-index. It means that a single index object has more than one level or component to it. When we promoted the date and name columns to the index, they were removed from our DataFrame and they no longer show up as regular columns.
type(tech.index)
# Output
# pandas.core.indexes.multi.MultiIndex
Let us use the read_csv method to read the data again, but this time we will rely on the index_col parameter in the read_csv method to indicate that we want a multi-index in the resulting DataFrame.
tech = pd.read_csv("C:/Users/Documents/tech_giants.csv", index_col=['date', 'name'])
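Since tech_giants.csv is not available here, the sketch below feeds read_csv a few made-up rows through io.StringIO to show the index_col behaviour, then sorts on one level and selects by stock name. The column name close, the tickers, and the prices are all invented for illustration.

```python
import io
import pandas as pd

# Made-up rows standing in for tech_giants.csv.
csv_text = """date,name,close
2019-01-02,MSFT,101.12
2019-01-02,AAPL,157.92
2019-01-03,MSFT,97.40
2019-01-03,AAPL,142.19
"""

tech = pd.read_csv(io.StringIO(csv_text), index_col=["date", "name"])
is_multi = isinstance(tech.index, pd.MultiIndex)

# Sort on the name level, then pull out the rows for a single name.
by_name = tech.sort_index(level="name")
aapl = tech.xs("AAPL", level="name")
```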