pandas: groupby column, merge rows of lists into a single column for group?

You can group by Groups and aggregate each column separately: use sum on the column containing lists (Feature 1) to concatenate the lists within each group, and first on Feature 2 to keep the first value per group:

df.groupby('Groups').agg({
   'Feature 1': 'sum',
   'Feature 2': 'first'
}).reset_index()

    Groups                             Feature 1  Feature 2
0  GROUP A        [abc, def, ghi, jkl, mno, pqr]          1
1  GROUP B            [stu, vwx, yz, xx, yx, zx]          2
2  GROUP C  [text, more, stuff, here, last, one]          3
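
The input frame is not shown above, so the following is only a minimal sketch under assumptions: the sample rows are invented to match the output, and the lists are flattened with itertools.chain.from_iterable instead of sum. That is a swapped-in alternative, not the method from the answer; it should give the same result while avoiding repeated list concatenation.

```python
from itertools import chain

import pandas as pd

# Hypothetical input: two rows per group, each holding a list in 'Feature 1'.
df = pd.DataFrame({
    'Groups': ['GROUP A', 'GROUP A', 'GROUP B', 'GROUP B', 'GROUP C', 'GROUP C'],
    'Feature 1': [
        ['abc', 'def', 'ghi'], ['jkl', 'mno', 'pqr'],
        ['stu', 'vwx', 'yz'], ['xx', 'yx', 'zx'],
        ['text', 'more', 'stuff'], ['here', 'last', 'one'],
    ],
    'Feature 2': [1, 1, 2, 2, 3, 3],
})

# Flatten the lists of each group into one list; keep the first 'Feature 2'.
merged = df.groupby('Groups').agg({
    'Feature 1': lambda lists: list(chain.from_iterable(lists)),
    'Feature 2': 'first',
}).reset_index()

print(merged)
```

For a handful of short lists per group the difference from sum is negligible; the chain variant is mainly worth considering when groups hold many lists.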

Suggestion : 3

With the pandas.DataFrame.groupby() function you can group rows on a column, select the column you want from the grouped result, and collect its values into one list per group with apply(list). Alternatively, pass the conversion to agg(), for example df.groupby("Courses").agg({"Discount": lambda x: list(x)}), which applies the aggregation to every group. The same approach also works on all columns at once.

Below are some examples of grouping rows into lists in a pandas DataFrame.

# Group rows on the 'Courses' column and get a list for the 'Fee' column
df2 = df.groupby('Courses')['Fee'].apply(list)

# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")

# Group Rows into List
df2 = df.groupby("Courses").agg({
   "Discount": lambda x: list(x)
})

# Group Rows into List on All columns
df2 = df.groupby("Courses").agg(list)

# Other way
df2 = df.groupby('Courses').agg(pd.Series.tolist)

The examples in this section use the following DataFrame.

import pandas as pd
technologies = ({
   'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", "PySpark", "Python", "pandas"],
   'Fee': [24000, 25000, 25000, 24000, 24000, 25000, 25000, 24000],
   'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days', '35days'],
   'Discount': [1000, 2300, 1500, 1200, 2500, 2100, 2000, 2500]
})
df = pd.DataFrame(technologies)
print(df)

This yields the output below.

   Courses    Fee Duration  Discount
0    Spark  24000    30day      1000
1  PySpark  25000   40days      2300
2   Hadoop  25000   35days      1500
3   Python  24000   40days      1200
4   pandas  24000   60days      2500
5  PySpark  25000   50days      2100
6   Python  25000   55days      2000
7   pandas  24000   35days      2500

Printing df.groupby('Courses')['Fee'].apply(list) yields the output below.

Courses
Hadoop            [25000]
PySpark    [25000, 25000]
Python     [24000, 25000]
Spark             [24000]
pandas     [24000, 24000]
Name: Fee, dtype: object

Call .reset_index(name="Course_Fee") on the grouped result to turn it back into a DataFrame and assign a name to the list column.

# Assign a column name to the grouped list
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")
print(df2)
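
The snippet above stops at print(df2) without showing what comes back. As a self-contained sketch, the following rebuilds only the Courses and Fee columns from the sample data and inspects the result; the name fees_by_course is a hypothetical helper introduced here for illustration.

```python
import pandas as pd

# Only the two columns needed for this example, copied from the sample data.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", "PySpark", "Python", "pandas"],
    'Fee': [24000, 25000, 25000, 24000, 24000, 25000, 25000, 24000],
})

# One row per course, with all fees of that course collected into a list.
df2 = df.groupby('Courses')['Fee'].apply(list).reset_index(name="Course_Fee")

# Hypothetical helper: map each course to its list of fees for quick lookup.
fees_by_course = dict(zip(df2['Courses'], df2['Course_Fee']))

print(df2.columns.tolist())
print(fees_by_course['Python'])
```

After reset_index, Courses is an ordinary column again rather than the index, which makes the result easier to merge or export.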

Suggestion : 4

On a DataFrame, we obtain a GroupBy object by calling groupby(). We could naturally group by either the A or B columns, or both. With the GroupBy object in hand, iterating through the grouped data is very natural and functions similarly to itertools.groupby(). Groupby also works with some plotting methods; for example, suppose we suspect that some features in a DataFrame may differ by group, as when the values in column 1 where the group is "B" are 3 higher on average. For DataFrame objects, a grouping key can be a string naming either a column or an index level.

The operation is analogous to a SQL query such as:

SELECT Column1, Column2, mean(Column3), sum(Column4)
FROM SomeTable
GROUP BY Column1, Column2
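
As a rough pandas counterpart to the query above, the sketch below groups on two key columns and applies a different aggregation to each value column. The frame some_table and its contents are invented for illustration; only the column names come from the SQL.

```python
import pandas as pd

# Hypothetical table mirroring the column names used in the SQL query.
some_table = pd.DataFrame({
    "Column1": ["a", "a", "b", "b"],
    "Column2": ["x", "x", "y", "z"],
    "Column3": [1.0, 3.0, 5.0, 7.0],
    "Column4": [10, 20, 30, 40],
})

# GROUP BY Column1, Column2 with mean(Column3) and sum(Column4).
# as_index=False keeps the keys as regular columns, like a SQL result set.
result = (
    some_table
    .groupby(["Column1", "Column2"], as_index=False)
    .agg({"Column3": "mean", "Column4": "sum"})
)

print(result)
```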

In [1]: df = pd.DataFrame(
   ...:     [
   ...:         ("bird", "Falconiformes", 389.0),
   ...:         ("bird", "Psittaciformes", 24.0),
   ...:         ("mammal", "Carnivora", 80.2),
   ...:         ("mammal", "Primates", np.nan),
   ...:         ("mammal", "Carnivora", 58),
   ...:     ],
   ...:     index=["falcon", "parrot", "lion", "monkey", "leopard"],
   ...:     columns=("class", "order", "max_speed"),
   ...: )
   ...:

In [2]: df
Out[2]:
          class           order  max_speed
falcon     bird   Falconiformes      389.0
parrot     bird  Psittaciformes       24.0
lion     mammal       Carnivora       80.2
monkey   mammal        Primates        NaN
leopard  mammal       Carnivora       58.0

# default is axis=0
In [3]: grouped = df.groupby("class")

In [4]: grouped = df.groupby("order", axis="columns")

In [5]: grouped = df.groupby(["class", "order"])

In [6]: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
   ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
   ...:         "C": np.random.randn(8),
   ...:         "D": np.random.randn(8),
   ...:     }
   ...: )
   ...:

In [7]: df
Out[7]:
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

In [8]: grouped = df.groupby("A")

In [9]: grouped = df.groupby(["A", "B"])
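
Iterating over a GroupBy object, as mentioned earlier, can be sketched as follows. The small frame and the names group_sizes and pair_keys are assumptions made for this example; they are not part of the session above.

```python
import pandas as pd

# Hypothetical frame with the same key columns as the example above.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "three"],
    "C": [1.0, 2.0, 3.0, 4.0],
})

group_sizes = {}
for name, group in df.groupby("A"):
    # name is the key value; group is the sub-DataFrame for that key.
    group_sizes[name] = len(group)

# With several keys, each name is a tuple of key values.
pair_keys = [name for name, _ in df.groupby(["A", "B"])]

print(group_sizes)
print(pair_keys)
```

Groups are visited in sorted key order by default, so the loop does not follow the original row order.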

In [10]: df2 = df.set_index(["A", "B"])

In [11]: grouped = df2.groupby(level=df2.index.names.difference(["B"]))

In [12]: grouped.sum()
Out[12]:
            C         D
A
bar -1.591710 -1.739537
foo -0.752861 -1.402938

In [13]: def get_letter_type(letter):
   ....:     if letter.lower() in 'aeiou':
   ....:         return 'vowel'
   ....:     else:
   ....:         return 'consonant'
   ....:

In [14]: grouped = df.groupby(get_letter_type, axis=1)
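
Recent pandas releases deprecate the axis argument of groupby(), so the last call may warn or fail depending on the installed version. One possible workaround, shown here as a sketch on an invented numeric frame, is to transpose first so that the column labels become index labels and the function is applied to them.

```python
import pandas as pd

def get_letter_type(letter):
    # Classify a single-letter label as vowel or consonant.
    return 'vowel' if letter.lower() in 'aeiou' else 'consonant'

# Hypothetical numeric frame with single-letter column names.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

# After transposing, the labels 'A'..'D' form the index, so the function
# groups what were originally columns.
grouped = df.T.groupby(get_letter_type)

# Sum within each label group, then transpose back to the original orientation.
totals = grouped.sum().T

print(totals)
```

Transposing copies the data and can change dtypes when columns have mixed types, so this approach fits best on frames with uniformly typed columns.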