naming a column in pandas that is dependent on a function


The function passed to apply must accept a row or a column of df as its argument (a column when axis=0, a row when axis=1), not the column names. You can write a wrapper for this purpose.

def wrapper(x, A, B, C):
   return f(x[A], x[B], x[C])

df.apply(wrapper, axis = 1, args = ('A', 'B', 'C'))

Output:

0 3
1 6
2 9
dtype: int64
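The wrapper above relies on a function f and a DataFrame df that are not shown at that point. A minimal self-contained sketch, assuming f simply adds its three arguments (as it does later on this page) and using parameter names chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 2, 3],
    'C': [1, 2, 3],
    'D': [1, 2, 3],
})

def f(a, b, c):
    # Assumed definition: the page later defines f as a plain sum.
    return a + b + c

def wrapper(row, col_a, col_b, col_c):
    # With axis=1, apply passes each row as a Series; the extra
    # positional arguments come from args=(...).
    return f(row[col_a], row[col_b], row[col_c])

result = df.apply(wrapper, axis=1, args=('A', 'B', 'C'))
print(result)
```

Passing the column names through args keeps the wrapper reusable for any three columns.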

If you are interested in the "apply" function, here is an example:

df = pd.DataFrame({
   'A': [1, 2, 3],
   'B': [1, 2, 3],
   'C': [1, 2, 3],
   'D': [1, 2, 3]
})

def func(row):
   row['result'] = row['A'] + row['B'] + row['C']
   return row

df.apply(func, axis = 1)

Out[67]:
   A  B  C  D  result
0  1  1  1  1       3
1  2  2  2  2       6
2  3  3  3  3       9

If you have to use the function "f" and don't want to change it, maybe this:

df['res'] = f(df['A'], df['B'], df['C'])
df

Out[70]:
   A  B  C  D  res
0  1  1  1  1    3
1  2  2  2  2    6
2  3  3  3  3    9
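Calling f directly on whole columns works because adding pandas Series is element-wise. A short sketch, assuming the same additive f as elsewhere on this page, that compares the column-wise call with the row-wise apply:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 2, 3],
    'C': [1, 2, 3],
    'D': [1, 2, 3],
})

def f(a, b, c):
    # Assumed definition; any function built from element-wise
    # operations can be called on whole columns like this.
    return a + b + c

# Column-wise: one call, each argument is a Series.
df['res'] = f(df['A'], df['B'], df['C'])

# Row-wise equivalent: one call per row.
row_wise = df.apply(lambda row: f(row['A'], row['B'], row['C']), axis=1)
print(df)
```

The column-wise form avoids a Python-level call per row, so it is usually the better choice when f allows it.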

So given:

>>> import pandas as pd
>>> df = pd.DataFrame({
...    'A': [1, 2, 3],
...    'B': [1, 2, 3],
...    'C': [1, 2, 3],
...    'D': [1, 2, 3]
... })
>>> df
   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
>>> def f(A, B, C): return A + B + C
...

We could almost do:

>>> df.apply(lambda row: f(**row), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/Users/juan/anaconda3/envs/ecqm-catalog/lib/python3.7/site-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<stdin>", line 1, in <lambda>
TypeError: ("f() got an unexpected keyword argument 'D'", 'occurred at index 0')

If you know which columns you need, you can select or drop columns so that each row carries only the arguments f expects:

>>> df.drop('D', axis=1).apply(lambda row: f(**row), axis=1)
0 3
1 6
2 9
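Instead of dropping the unwanted columns by hand, the needed columns can be read off the function itself. This is a sketch, not part of the original answer: it uses the standard-library inspect module and assumes that the parameter names of f match column names exactly.

```python
import inspect

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [1, 2, 3],
    'C': [1, 2, 3],
    'D': [1, 2, 3],
})

def f(A, B, C):
    return A + B + C

# Parameter names of f, in declaration order: ['A', 'B', 'C'].
needed = list(inspect.signature(f).parameters)

# Select only those columns, then unpack each row as keyword arguments.
result = df[needed].apply(lambda row: f(**row), axis=1)
print(result)
```

If f gains or loses a parameter, the selection follows automatically, provided a column of that name exists.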

Suggestion : 2


Use format:

a = len(series) - 6
df['M{}'.format(a)] = ...

df['M{}'.format(len(series) - 6)] = ...

Or f-strings for Python 3.6+:

df[f'M{a}'] = ...
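The snippets above leave series and the assigned values elided. A runnable sketch with made-up stand-ins (the 26-element series, the x column and the doubled values are all assumptions for illustration) shows the column name being computed at assignment time:

```python
import pandas as pd

# Hypothetical inputs: any sequence works for len(); 26 items gives a == 20.
series = pd.Series(range(26))
df = pd.DataFrame({'x': [1, 2, 3]})

a = len(series) - 6

# str.format and f-strings build the same label.
assert 'M{}'.format(a) == f'M{a}'

# The column name is just a string, so it can come from any expression.
df[f'M{a}'] = df['x'] * 2
print(df.columns.tolist())
```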

Suggestion : 3

Pandas Change Column Names: changing column names within pandas is easy. You only need to decide which method you want to use; depending on your use case, you can pick the best one for you.

The first method that we suggest is using Pandas rename. rename takes a dict with a key of your old column name and a value of your new column name. Amazingly, it also takes a function! The most straightforward and explicit way to change your column names is via .rename(). I like this method the most because you can easily change one, or all, of your column names via a dict.

Lastly, you could also change your column names by setting your axis. This last method is great, but doesn't have many advantages (if any) over the first two.

The easiest and most popular way is the .rename() method, but look below for two other ways.

pandas.DataFrame.rename(columns = {
   'old_column_name': 'new_column_name'
})

Because rename() also accepts a function, you can apply a string function to your column names and transform all of them at once, for instance to upper-case every column name.

pandas.DataFrame.rename(columns = str.upper)

The second method is to assign a new list of names directly to .columns:

pandas.DataFrame.columns = ['your', 'new', 'column', 'names']

The last method (and our least favorite) is to set_axis on top of your DataFrame and specify axis=1. This will have similar functionality as setting .columns. In this case, Pandas will completely overwrite all of your column names with whatever you give it.

pandas.DataFrame.set_axis(['your', 'new', 'column', 'names'], axis = 1)

import pandas as pd
df = pd.DataFrame([('Foreign Cinema', 'Restaurant', 289.0),
      ('Liho Liho', 'Restaurant', 224.0),
      ('500 Club', 'bar', 80.5),
      ('The Square', 'bar', 25.30)
   ],
   columns = ('name', 'type', 'AvgBill')
)
df
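Applied to the restaurant DataFrame above, the dict, function and .columns approaches look roughly as follows. The new column names are invented for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    [('Foreign Cinema', 'Restaurant', 289.0),
     ('Liho Liho', 'Restaurant', 224.0),
     ('500 Club', 'bar', 80.5),
     ('The Square', 'bar', 25.30)],
    columns=['name', 'type', 'AvgBill'],
)

# 1. rename with a dict: only the listed columns change.
renamed = df.rename(columns={'AvgBill': 'avg_bill'})

# 2. rename with a function: applied to every column name.
upper = df.rename(columns=str.upper)

# 3. assign to .columns: replaces all names, so the list length must match.
replaced = df.copy()
replaced.columns = ['restaurant', 'category', 'bill']

print(renamed.columns.tolist())
print(upper.columns.tolist())
print(replaced.columns.tolist())
```

rename returns a new DataFrame and leaves df untouched, which is why the third variant works on a copy.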

Suggestion : 4

Method #1: Using the rename() function. One way of renaming the columns in a Pandas dataframe is by using the rename() function. This method is quite useful when we need to rename only some selected columns, because we need to specify information only for the columns which are to be renamed.

DataFrame.apply() applies a function along a specified axis, passing each row or column to the custom function as a Series. With groupby, apply() passes each group to the custom function as a DataFrame, whereas transform() passes individual columns as pandas Series.

When you want to rename some selected columns, the rename() function is the best choice. columns.str.replace() is useful only when you want to replace characters; note that passing a custom function to rename() can do the same. Lastly, we could also change column names by setting the axis via set_axis().

Method 4: Rename column names using the DataFrame add_prefix() and add_suffix() functions. Here the prefix or suffix passed in is added to the start or end of every column name.
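A small sketch of add_prefix() and add_suffix(); the DataFrame and the 'col_' and '_raw' strings are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'first': [1, 2], 'last': [3, 4]})

# Both methods return a new DataFrame and affect every column name.
prefixed = df.add_prefix('col_')
suffixed = df.add_suffix('_raw')

print(prefixed.columns.tolist())
print(suffixed.columns.tolist())
```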


a = len(series) - 6
df['M20'] = ...

a = len(series) - 6
df['M{}'.format(a)] = ...

df['M{}'.format(len(series) - 6)] = ...

df[f'M{a}'] = ...

Suggestion : 5

You can use the Pandas dataframe rename() function to rename columns; there are other methods as well, some of which are covered below. rename() is a quite versatile function used not only to rename columns but also row indices, and it lets you modify specific column names. The following is the syntax to change column names using the Pandas rename() function.

df.rename(columns = {
   "OldName": "NewName"
})

Here, we will create a dataframe storing the category and color information of some pets in the columns “Category” and “Color” respectively.

import pandas as pd

# create a dataframe
data = {
   'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
   'Color': ['brown', 'black', 'white', 'green']
}
df = pd.DataFrame(data)

# print dataframe columns
print("Dataframe columns:", df.columns)

# change column name Category to Pet
df = df.rename(columns = {
   "Category": "Pet"
})

# print dataframe columns
print("Dataframe columns:", df.columns)

Output:

Dataframe columns: Index(['Category', 'Color'], dtype='object')
Dataframe columns: Index(['Pet', 'Color'], dtype='object')

Output of a further example, in which the columns Col1_Category and Col2_Color become Category and Color:

Dataframe columns: Index(['Col1_Category', 'Col2_Color'], dtype='object')
Dataframe columns: Index(['Category', 'Color'], dtype='object')

The pandas dataframe set_axis() method can be used to rename a dataframe’s columns by passing a list of all columns with their new names. Note that the length of this list must be equal to the number of columns in the dataframe. The following is the syntax:

df.set_axis(new_column_list, axis = 1)
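A sketch of set_axis() on the pets DataFrame from above, including what happens when the list length does not match. The new names 'Pet' and 'Colour' are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Dog', 'Cat', 'Rabbit', 'Parrot'],
    'Color': ['brown', 'black', 'white', 'green'],
})

# One new name per existing column, in order.
renamed = df.set_axis(['Pet', 'Colour'], axis=1)
print(renamed.columns.tolist())

# A list of the wrong length is rejected with a ValueError.
mismatch_error = None
try:
    df.set_axis(['Pet'], axis=1)
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```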

Suggestion : 6

Use the pandas DataFrame.rename() function to modify specific column names. rename() is a quite versatile function used not only to rename columns but also row indices, and the good thing about it is that you can rename specific columns. It returns a new DataFrame with renamed axis labels (i.e. the renamed columns or rows, depending on usage). To modify the DataFrame in place, set the argument inplace to True.

If the number of columns in the pandas DataFrame is huge, say nearly 100, and we want to replace the spaces in all the column names (where they exist) with underscores, it is not easy to provide a list or dictionary that renames every column.

If you are in a hurry, below are some quick examples of changing specific column names on a DataFrame.
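For the many-columns case, one possible approach is to transform every name with a string operation rather than listing them. The column names below are invented for this sketch; both variants are standard pandas calls.

```python
import pandas as pd

df = pd.DataFrame({
    'Course Name': ['Spark'],
    'Course Fee': [22000],
    'Duration': ['30days'],
})

# Variant 1: pass a function to rename(); it is called on every column name.
via_rename = df.rename(columns=lambda name: name.replace(' ', '_'))

# Variant 2: use the vectorised string methods of the column Index.
via_str = df.copy()
via_str.columns = via_str.columns.str.replace(' ', '_', regex=False)

print(via_rename.columns.tolist())
```

Names without spaces, such as Duration, pass through unchanged.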

# Below are some quick examples.
# Syntax to change column name using rename() function.
df.rename(columns = {
   "OldName": "NewName"
})

# Using rename() function.
df.rename(columns = {
   'Fee': 'Fees'
}, inplace = True)

# Renaming Multiple columns.
df.rename({
      'Courses': 'Course_ Name',
      'Fee': 'CourseFee',
      'Duration': 'CourseDuration'
   },
   axis = "columns", inplace = True)

# Changing Column Attribute.
df.columns.values[0] = 'Course'

# errors parameter to 'raise' when column not present.
df2 = df.rename(columns = {
   'Courses': 'EmpCourses'
}, errors = 'raise')

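The errors parameter in the last quick example matters because rename() silently ignores unknown labels by default. A minimal sketch with a made-up missing column name:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'PySpark'], 'Fee': [22000, 25000]})

# Default (errors='ignore'): an unknown label is skipped without complaint.
ignored = df.rename(columns={'Missing': 'X'})

# errors='raise': the same call fails with a KeyError.
raised = False
try:
    df.rename(columns={'Missing': 'X'}, errors='raise')
except KeyError:
    raised = True

print(ignored.columns.tolist(), raised)
```

Using errors='raise' is a cheap way to catch typos in column names.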
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
   'Courses': ["Spark", "PySpark", "Spark", "Python", "PySpark"],
   'Fee': [22000, 25000, 23000, 24000, 26000],
   'Duration': ['30days', '50days', '30days', '35days', '60days']
}
df = pd.DataFrame(technologies)
print(df)

This yields the output below.

   Courses    Fee Duration
0    Spark  22000   30days
1  PySpark  25000   50days
2    Spark  23000   30days
3   Python  24000   35days
4  PySpark  26000   60days

# Using rename() function.
df.rename(columns = {
   'Fee': 'Fees'
}, inplace = True)
print(df)

You can also update the column names through the DataFrame's columns attribute, either by assigning a new list of names or by indexing into it to change one specific name.

# Changing Column Attribute.
df.columns.values[0] = 'Course'
print(df)
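An Index is meant to be immutable, so writing into df.columns.values may not behave consistently across pandas versions. A more conservative sketch, offered as an alternative rather than as the original author's method, looks the name up by position and passes it to rename():

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': [22000, 25000],
})

# Read the current name at position 0, then rename it explicitly.
first_name = df.columns[0]
renamed = df.rename(columns={first_name: 'Course'})

print(renamed.columns.tolist())
```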

Suggestion : 7

Aggregating functions are the ones that reduce the dimension of the returned objects. With as_index=True, the default, aggregation functions will not return the groups that you are aggregating over if they are named columns; the grouped columns will be the indices of the returned object. Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

The resulting dtype will reflect that of the aggregating function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.
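The effect of as_index can be seen on a tiny example. The data below is made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1, 2, 3, 4],
})

# Default as_index=True: the group labels become the index.
indexed = df.groupby('A').sum()

# as_index=False: the group labels stay as an ordinary column.
flat = df.groupby('A', as_index=False).sum()

print(indexed)
print(flat)
```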

SELECT Column1, Column2, mean(Column3), sum(Column4)
FROM SomeTable
GROUP BY Column1, Column2
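A rough pandas counterpart of the SQL query above, using named aggregation. The table contents and the output column names mean_Column3 and sum_Column4 are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical stand-in for SomeTable.
some_table = pd.DataFrame({
    'Column1': ['a', 'a', 'b', 'b'],
    'Column2': ['x', 'x', 'y', 'y'],
    'Column3': [1.0, 3.0, 5.0, 7.0],
    'Column4': [10, 20, 30, 40],
})

# GROUP BY Column1, Column2 with mean(Column3) and sum(Column4).
result = some_table.groupby(['Column1', 'Column2'], as_index=False).agg(
    mean_Column3=('Column3', 'mean'),
    sum_Column4=('Column4', 'sum'),
)
print(result)
```

as_index=False keeps Column1 and Column2 as regular columns, matching the shape of the SQL result.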

In [1]: df = pd.DataFrame(
   ...:     [
   ...:         ("bird", "Falconiformes", 389.0),
   ...:         ("bird", "Psittaciformes", 24.0),
   ...:         ("mammal", "Carnivora", 80.2),
   ...:         ("mammal", "Primates", np.nan),
   ...:         ("mammal", "Carnivora", 58),
   ...:     ],
   ...:     index=["falcon", "parrot", "lion", "monkey", "leopard"],
   ...:     columns=("class", "order", "max_speed"),
   ...: )
   ...:

In [2]: df
Out[2]:
          class           order  max_speed
falcon     bird   Falconiformes      389.0
parrot     bird  Psittaciformes       24.0
lion     mammal       Carnivora       80.2
monkey   mammal        Primates        NaN
leopard  mammal       Carnivora       58.0

# default is axis=0
In [3]: grouped = df.groupby("class")

In [4]: grouped = df.groupby("order", axis="columns")

In [5]: grouped = df.groupby(["class", "order"])

In [6]: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
   ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
   ...:         "C": np.random.randn(8),
   ...:         "D": np.random.randn(8),
   ...:     }
   ...: )
   ...:

In [7]: df
Out[7]:
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

In [8]: grouped = df.groupby("A")

In [9]: grouped = df.groupby(["A", "B"])

In [10]: df2 = df.set_index(["A", "B"])

In [11]: grouped = df2.groupby(level=df2.index.names.difference(["B"]))

In [12]: grouped.sum()
Out[12]:
            C         D
A
bar -1.591710 -1.739537
foo -0.752861 -1.402938

In [13]: def get_letter_type(letter):
   ....:     if letter.lower() in 'aeiou':
   ....:         return 'vowel'
   ....:     else:
   ....:         return 'consonant'
   ....:

In [14]: grouped = df.groupby(get_letter_type, axis=1)
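Recent pandas releases deprecate the axis=1 argument of groupby, so grouping column labels with a function may need a different spelling there. One possible sketch, using made-up numeric data, transposes the frame so that the column labels become the index, groups with the function, and transposes back:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [10, 20],
    'C': [100, 200],
    'E': [1000, 2000],
})

def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

# After transposing, groupby calls the function on each original column label.
by_label = df.T.groupby(get_letter_type).sum().T
print(by_label)
```

Here columns A and E are summed into vowel, and B and C into consonant.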