pandas weird behavior using .replace() to swap values

In summary, what you see is equivalent to

df.B.replace('a', 'b').replace('b', 'a')

0 a
1 a
Name: B, dtype: object

There is a workaround using str.replace with a lambda callback.

m = {
   'a': 'b',
   'b': 'a'
}
df.B.str.replace('|'.join(m.keys()), lambda x: m[x.group()], regex = True)

0 b
1 a
Name: B, dtype: object
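
As a self-contained sketch of the two behaviors above: the chained replace collapses both values to 'a', while a single pass through the mapping swaps them. The frame contents are assumed from the question. The re.escape call and the Series.map variant are additions that do not appear in the answer, and map only fits when every value in the column is a key of m.

```python
import re
import pandas as pd

df = pd.DataFrame({'B': ['a', 'b']})
m = {'a': 'b', 'b': 'a'}

# Chained replacement: 'a' -> 'b' first, then every 'b' -> 'a'.
chained = df.B.replace('a', 'b').replace('b', 'a')

# Single-pass regex replacement with a callback; regex=True is needed
# for a callable replacement in recent pandas versions.
pattern = '|'.join(re.escape(k) for k in m)
swapped = df.B.str.replace(pattern, lambda x: m[x.group()], regex=True)

# Alternative (not from the answer): look each value up in the dict.
mapped = df.B.map(m)

print(chained.tolist())
print(swapped.tolist())
print(mapped.tolist())
```

Because the callback sees each match exactly once, a value that has already been swapped is never touched again.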

Suggestion : 2

The Pandas .replace() method replaces values in a Pandas dataframe. It can replace values across a single column, multiple columns, or an entire dataframe, and it supports regular expressions to make complex replacements easier. To learn more about .replace(), see the official pandas documentation.

You can also replace values of all or selected columns based on a condition by using the DataFrame.loc[] property. loc[] accesses a group of rows and columns by label(s) or a boolean array, and it can modify the selected values as well as read them. For example, the values of the Fee column can be set to 15000 only for the rows where Fee is greater than 22000.
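
The Fee replacement described above could look like the following minimal sketch. The sample rows and the Courses column are hypothetical; only the Fee column and the two thresholds come from the text.

```python
import pandas as pd

# Hypothetical data; only 'Fee', 22000 and 15000 come from the text.
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Python'],
    'Fee': [20000, 25000, 22000],
})

# The boolean mask selects the rows, 'Fee' selects the column to overwrite.
df.loc[df['Fee'] > 22000, 'Fee'] = 15000

print(df)
```

Only the row with a Fee of 25000 is changed, since 22000 is not strictly greater than the threshold.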


With integer values, swapping through a dict works as expected:

df = pd.DataFrame({
   'A': [0, 1]
})
df.A.replace({
   0: 1,
   1: 0
})

0 1
1 0
Name: A, dtype: int64

With string values, the same pattern returned 'a' for both rows in the question:

df = pd.DataFrame({
   'B': ['a', 'b']
})
df.B.replace({
   'a': 'b',
   'b': 'a'
})

0 a
1 a
Name: B, dtype: object

Suggestion : 3

August 31, 2021

Step 1: Import Pandas

import pandas as pd

Step 2: Read the CSV

# Read the csv file
df = pd.read_csv("data1.csv")

# First 5 rows
df.head()

Let’s load a file with | separator

# Read the csv file with sep = '|'
df = pd.read_csv("data2.csv", sep = '|')
df
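
To try the separator option without a file on disk, here is a small sketch that reads pipe-delimited text from memory. The sample content is a hypothetical stand-in for data2.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for data2.csv.
raw = "name|city|score\nAnn|Oslo|3\nBob|Rome|5\n"

# sep='|' tells the parser to split fields on the pipe character.
df = pd.read_csv(io.StringIO(raw), sep='|')

print(df.columns.tolist())
print(df.shape)
```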

Note: Row numbering starts from 0 and includes the column header row, so header = 1 uses the second line of the file as the header.

# Read the csv file with header parameter
df = pd.read_csv("data1.csv", header = 1)
df.head()
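
A minimal sketch of the header parameter, using in-memory text in place of data1.csv (the content is hypothetical). With header=1, the line at position 1 supplies the column names and the line before it is skipped.

```python
import io
import pandas as pd

# Hypothetical file: line 0 is an export note, line 1 is the real header.
raw = "exported,2021\nname,score\nAnn,3\nBob,5\n"

df = pd.read_csv(io.StringIO(raw), header=1)

print(df.columns.tolist())
print(len(df))
```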

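While reading the CSV file, you can rename the column headers by using the names parameter, which takes a list of column names. If the file already has a header row, also pass header = 0 so that the existing header is replaced instead of being read as data.

# Read the csv file with names parameter; header = 0 replaces the existing header row
df = pd.read_csv("data.csv", names = ['Ranking', 'ST Name', 'Pop', 'NS', 'D'], header = 0)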
df.head()
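
The effect of names on a file that already has a header row can be checked with this sketch. The three-column sample is hypothetical and shorter than the five-column file used above.

```python
import io
import pandas as pd

# Hypothetical file that already has a header row.
raw = "rank,state,population\n1,California,39\n2,Texas,29\n"
new_names = ['Ranking', 'ST Name', 'Pop']

# header=0 marks the first line as a header to be replaced by new_names.
renamed = pd.read_csv(io.StringIO(raw), names=new_names, header=0)

# Without header=0 the old header line is kept as the first data row.
kept = pd.read_csv(io.StringIO(raw), names=new_names)

print(len(renamed), len(kept))
print(kept.iloc[0].tolist())
```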

Answer: By using the na_values parameter.

import pandas as pd

df = pd.read_csv("example1.csv", na_values = ['no', 'not available', '-100'])
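
A small sketch of how na_values behaves, with hypothetical in-memory content standing in for example1.csv. Each listed string is treated as missing wherever it appears as a complete field.

```python
import io
import pandas as pd

# Hypothetical stand-in for example1.csv.
raw = "name,status,score\nAnn,no,10\nBob,yes,-100\nCid,not available,7\n"

df = pd.read_csv(io.StringIO(raw), na_values=['no', 'not available', '-100'])

# Fields equal to one of the markers are now NaN.
print(df.isna())
```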

Answer:

import pandas as pd

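# True for column names that start with a vowel
colnameWithVowels = lambda x: x.lower()[0] in ['a', 'e', 'i', 'o', 'u']

# skipfooter is only supported by the Python parsing engine
df = pd.read_csv("example2.csv", usecols = colnameWithVowels, header = 2, skipfooter = 5, engine = 'python')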
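
The combination of a callable usecols, header and skipfooter can be tried on in-memory text. In this sketch the file layout (two leading lines, a header, two data rows, five footer lines) and the helper name starts_with_vowel are assumptions made for illustration.

```python
import io
import pandas as pd

def starts_with_vowel(name):
    # Keep only columns whose name begins with a vowel.
    return name.lower()[0] in 'aeiou'

# Hypothetical stand-in for example2.csv.
lines = ["report,2021,,", "source,demo,,"]           # lines 0-1: skipped
lines += ["id,amount,name,email"]                    # line 2: header
lines += ["1,10,Ann,a@x.org", "2,20,Bob,b@x.org"]    # data rows
lines += ["total,30,,"] * 5                          # footer to drop
raw = "\n".join(lines)

df = pd.read_csv(io.StringIO(raw), usecols=starts_with_vowel,
                 header=2, skipfooter=5, engine='python')

print(df.columns.tolist())
print(len(df))
```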

Suggestion : 4

I see that saving it as float or str still changes to a general format in Excel. So, you might as well change it to string to maintain the .00 decimals.

df['Active_Time_Spent'] = df['Active_Time_Spent'].astype(str)

Try this:

writer = pd.ExcelWriter('trial.xlsx', engine = 'xlsxwriter')
df.to_excel(writer, index = False, sheet_name = 'Sheet1')
worksheet = writer.sheets['Sheet1']
workbook = writer.book
format1 = workbook.add_format({
   'num_format': '0.00'
})
worksheet.set_column('A:A', None, format1)
writer.close()

Suggestion : 5

  • What if we want to change values while iterating over the rows of a Pandas Dataframe?
  • Loop over Rows of Pandas Dataframe using iterrows()
  • Dataframe got updated, i.e. we changed the values while iterating over the rows of Dataframe. Bonus value for each row became double.
  • We learned about different ways to iterate over all rows of dataframe and change values while iterating.

Suppose we have a dataframe, i.e.

import pandas as pd

# List of Tuples
employees = [('jack', 34, 'Sydney', 5),
   ('Riti', 31, 'Delhi', 7),
   ('Aadi', 16, 'New York', 11)
]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
   columns = ['Name', 'Age', 'City', 'Experience'],
   index = ['a', 'b', 'c'])

print(df)

Contents of the created dataframe are:

   Name  Age      City  Experience
a  jack   34    Sydney           5
b  Riti   31     Delhi           7
c  Aadi   16  New York          11

Let’s iterate over all the rows of above created dataframe using iterrows() i.e.

# Loop through all rows of Dataframe along with index label
for (index_label, row_series) in df.iterrows():
   print('Row Index label : ', index_label)
   print('Row Content as Series : ', row_series.values)
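
To change values while iterating with iterrows(), one option is to write back through the dataframe itself, because the yielded row is generally a copy and not a view. The sketch below doubles the Experience column as an illustrative choice, and using df.at for the assignment is an assumption about how the update could be done.

```python
import pandas as pd

employees = [('jack', 34, 'Sydney', 5),
             ('Riti', 31, 'Delhi', 7),
             ('Aadi', 16, 'New York', 11)]

df = pd.DataFrame(employees,
                  columns=['Name', 'Age', 'City', 'Experience'],
                  index=['a', 'b', 'c'])

for index_label, row_series in df.iterrows():
    # Assigning to row_series would not reach df; df.at writes the cell directly.
    df.at[index_label, 'Experience'] = row_series['Experience'] * 2

print(df['Experience'].tolist())
```

For a simple column-wide change like this, a vectorized expression such as df['Experience'] * 2 is usually preferable to a row loop.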

DataFrame.itertuples() yields, for each row, a named tuple containing all the column names and their values for that row. Let’s use it to iterate over all the rows of the dataframe created above, i.e.

# Iterate over the Dataframe rows as named tuples
for namedTuple in df.itertuples():
   print(namedTuple)

For every row in the dataframe a named tuple is returned. From the named tuple you can access the individual values by indexing.
To access the 1st value, i.e. the value with tag ‘Index’, use:

print(namedTuple[0])
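
A short sketch of both access styles on a two-row version of the dataframe: positional indexing as above, and attribute access by field name. The list names labels and names are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [('jack', 34, 'Sydney', 5), ('Riti', 31, 'Delhi', 7)],
    columns=['Name', 'Age', 'City', 'Experience'],
    index=['a', 'b'],
)

labels = []
names = []
for row in df.itertuples():
    labels.append(row[0])    # position 0 holds the index label
    names.append(row.Name)   # columns are also available as attributes
    assert row.Index == row[0]

print(labels)
print(names)
```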