One possible solution is to use merge on all columns (no on parameter) and then use isin with the subset:
print(ex2.merge(ex).isin(ex2))
   col1  col2  col3
0  True  True  True
1  True  True  True
print(ex2.merge(ex).isin(ex2).all().all())
True
Another idea is to compare MultiIndexes:
i1 = ex2.set_index(ex2.columns.tolist()).index
i2 = ex.set_index(ex.columns.tolist()).index
print(i1.isin(i2).all())
True
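The merge idea can also be wrapped in a small helper. This is only a sketch, not the answer's own code: the function name is_row_subset is hypothetical, and it swaps the isin step for merge's indicator flag, because isin with a DataFrame argument aligns on index labels and so depends on the subset having a default index. It assumes both frames share the same column names and dtypes.

```python
import pandas as pd


def is_row_subset(small, big):
    """Return True if every row of `small` also appears in `big`, ignoring index labels.

    Hypothetical helper; assumes both frames have the same columns and dtypes.
    """
    # Deduplicate `big` so each row of `small` matches at most one row.
    merged = small.merge(big.drop_duplicates(), how="left", indicator=True)
    # The "_merge" column is "both" for rows found in `big`, "left_only" otherwise.
    return bool((merged["_merge"] == "both").all())


ex = pd.DataFrame({
    "col1": ["banana", "tomato", "apple"],
    "col2": ["cat", "dog", "kangoo"],
    "col3": ["tv", "phone", "ps4"],
})
# Same rows as in the question, but with unrelated index labels.
ex2 = pd.DataFrame({
    "col1": ["tomato", "apple"],
    "col2": ["dog", "kangoo"],
    "col3": ["phone", "ps4"],
}, index=[10, 20])

subset_result = is_row_subset(ex2, ex)
reverse_result = is_row_subset(ex, ex2)
print(subset_result, reverse_result)
```

Because the check looks only at the merge indicator, the index labels of either frame never enter the comparison.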
Find if a dataframe is a subset of another dataframe, while ignoring index (asked Jul 23, 2019)
I am trying to find out whether one pandas DataFrame is a subset of another pandas DataFrame. I can compare two dataframes when their indexes match, but in my case the rows have different indexes.
import pandas as pd

ex = pd.DataFrame({
    "col1": ["banana", "tomato", "apple"],
    "col2": ["cat", "dog", "kangoo"],
    "col3": ["tv", "phone", "ps4"]
})
ex2 = pd.DataFrame({
    "col1": ["tomato", "apple"],
    "col2": ["dog", "kangoo"],
    "col3": ["phone", "ps4"]
})
ex2.isin(ex).all().all()
>>> False
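The False result above follows from how DataFrame.isin treats a DataFrame argument: a cell only counts as a match when the value sits under the same index label and column label in both frames. The sketch below illustrates this with the question's data; the variable names ex2_shifted, label_aligned, and shifted_aligned are made up for the illustration.

```python
import pandas as pd

ex = pd.DataFrame({
    "col1": ["banana", "tomato", "apple"],
    "col2": ["cat", "dog", "kangoo"],
    "col3": ["tv", "phone", "ps4"],
})
ex2 = pd.DataFrame({
    "col1": ["tomato", "apple"],
    "col2": ["dog", "kangoo"],
    "col3": ["phone", "ps4"],
})

# Default labels 0 and 1 line up with "banana" and "tomato" in ex, so nothing matches.
label_aligned = bool(ex2.isin(ex).all().all())

# Relabel the rows so they sit under the same index labels as in ex (1 and 2).
ex2_shifted = ex2.copy()
ex2_shifted.index = [1, 2]
shifted_aligned = bool(ex2_shifted.isin(ex).all().all())

print(label_aligned, shifted_aligned)
```

This is why a plain isin call cannot answer the question when the rows carry different indexes.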
When concatenating DataFrames with named axes, pandas will attempt to preserve these index/column names whenever possible. In the case where all inputs share a common name, this name will be assigned to the result. When the input names do not all agree, the result will be unnamed. The same is true for MultiIndex, but the logic is applied separately on a level-by-level basis.
You can concatenate a mix of Series and DataFrame objects. The Series will be transformed to DataFrame with the column name as the name of the Series.
When DataFrames are merged on a string that matches an index level in both frames, the index level is preserved as an index level in the resulting DataFrame.
Strings passed as the on, left_on, and right_on parameters may refer to either column names or index level names. This enables merging DataFrame instances on a combination of index levels and columns without resetting indexes.
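As a small illustration of mixing a Series with a DataFrame in concat, the sketch below uses made-up data; the names df, s, combined, and the index name "row" are assumptions for the example only.

```python
import pandas as pd

# A one-column frame and a named Series sharing the same named index.
df = pd.DataFrame({"A": ["A0", "A1"]}, index=pd.Index([0, 1], name="row"))
s = pd.Series(["X0", "X1"], index=pd.Index([0, 1], name="row"), name="X")

# Concatenating along the columns turns the Series into a column named "X".
combined = pd.concat([df, s], axis=1)

print(combined)
print(combined.index.name)
```

The Series name becomes the new column label, so giving the Series a meaningful name before concatenating avoids an auto-numbered column.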
In [1]: df1 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A0", "A1", "A2", "A3"],
   ...:         "B": ["B0", "B1", "B2", "B3"],
   ...:         "C": ["C0", "C1", "C2", "C3"],
   ...:         "D": ["D0", "D1", "D2", "D3"],
   ...:     },
   ...:     index=[0, 1, 2, 3],
   ...: )
   ...:
In [2]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A4", "A5", "A6", "A7"],
   ...:         "B": ["B4", "B5", "B6", "B7"],
   ...:         "C": ["C4", "C5", "C6", "C7"],
   ...:         "D": ["D4", "D5", "D6", "D7"],
   ...:     },
   ...:     index=[4, 5, 6, 7],
   ...: )
   ...:
In [3]: df3 = pd.DataFrame(
   ...:     {
   ...:         "A": ["A8", "A9", "A10", "A11"],
   ...:         "B": ["B8", "B9", "B10", "B11"],
   ...:         "C": ["C8", "C9", "C10", "C11"],
   ...:         "D": ["D8", "D9", "D10", "D11"],
   ...:     },
   ...:     index=[8, 9, 10, 11],
   ...: )
   ...:
In [4]: frames = [df1, df2, df3]
In [5]: result = pd.concat(frames)
pd.concat(
objs,
axis = 0,
join = "outer",
ignore_index = False,
keys = None,
levels = None,
names = None,
verify_integrity = False,
copy = True,
)
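A few of the parameters in this signature are easiest to see on a toy example. The sketch below uses made-up frames a and b (not from the original text) to show ignore_index, join="inner", and verify_integrity.

```python
import pandas as pd

# Two small frames that share column "A" and both use index labels 0 and 1.
a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
b = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# Default join="outer" keeps the union of columns; ignore_index renumbers the rows.
stacked = pd.concat([a, b], ignore_index=True)

# join="inner" keeps only the columns present in every input.
inner = pd.concat([a, b], join="inner", ignore_index=True)

# verify_integrity=True refuses to build a result with duplicate index labels.
try:
    pd.concat([a, b], verify_integrity=True)
    overlap_rejected = False
except ValueError:
    overlap_rejected = True

print(stacked)
print(inner)
print(overlap_rejected)
```

With the outer join, cells that have no source value (for example column "C" for the rows coming from a) are filled with NaN.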
In[6]: result = pd.concat(frames, keys = ["x", "y", "z"])
In[7]: result.loc["y"]
Out[7]:
A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
frames = [process_your_file(f) for f in files]
result = pd.concat(frames)
Indexing, Slicing and Subsetting DataFrames in Python
We can also select a specific data value using a row and column location within the DataFrame and iloc indexing. Remember that Python indexing begins at 0, so the index location [2, 6] selects the element that is 3 rows down and 7 columns over in the DataFrame.
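To make the [2, 6] example concrete, here is a minimal sketch on a made-up grid (the name grid and its contents are assumptions for illustration, not the lesson's dataset). Each cell holds row * 10 + column, so the value itself shows which position was selected.

```python
import pandas as pd

# A 4 x 8 grid where each cell encodes its own position as row * 10 + column.
grid = pd.DataFrame([[r * 10 + c for c in range(8)] for r in range(4)])

# Row position 2 is the third row; column position 6 is the seventh column.
value = grid.iloc[2, 6]
print(value)  # 26
```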
# Make sure pandas is loaded
import pandas as pd

# Read in the rainfall CSV
rainfall_df = pd.read_csv("data/rainfall_combined.csv")

# TIP: use the .head() method we saw earlier to make output shorter
# Method 1: select a 'subset' of the data using the column name
rainfall_df['raingauges_id']
# Method 2: use the column name as an 'attribute'; gives the same output
rainfall_df.raingauges_id

# Create an object, rainfall_raingauges, that only contains the `raingauges_id` column
rainfall_raingauges = rainfall_df['raingauges_id']

# Select the raingauges and ward columns from the DataFrame
rainfall_df[['raingauges_id', 'ward_id']]
# What happens when you flip the order?
rainfall_df[['ward_id', 'raingauges_id']]
# What happens if you ask for a column that doesn't exist?
rainfall_df['wards']
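The questions in the comments above can be tried without the CSV file. The sketch below uses a small stand-in frame; rainfall_demo and the rain_mm column are hypothetical, and only the raingauges_id and ward_id column names come from the lesson.

```python
import pandas as pd

# Stand-in data; the real lesson reads data/rainfall_combined.csv instead.
rainfall_demo = pd.DataFrame({
    "raingauges_id": [1, 2],
    "ward_id": [10, 20],
    "rain_mm": [0.5, 1.2],  # hypothetical column for illustration
})

# A single label returns a Series.
one_col = rainfall_demo["raingauges_id"]

# A list of labels returns a DataFrame, with columns in the order requested.
two_cols = rainfall_demo[["ward_id", "raingauges_id"]]

# Asking for a column that does not exist raises a KeyError.
try:
    rainfall_demo["wards"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(type(one_col).__name__, list(two_cols.columns), missing_raises)
```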
# Create a list of numbers:
a = [1, 2, 3, 4, 5]
a[0]