data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
Specify the substring/pattern to match, and the substring to replace it with.
pd.__version__
# '0.24.1'

df
    time result
1  09:00   +52A
2  10:00   +62B
3  11:00   +44a
4  12:00   +30b
5  13:00  -110a
df['result'] = df['result'].str.replace(r'\D', '', regex=True)
df
    time result
1  09:00     52
2  10:00     62
3  11:00     44
4  12:00     30
5  13:00    110
If you need the result converted to an integer, you can use Series.astype:
df['result'] = df['result'].str.replace(r'\D', '', regex=True).astype(int)
df.dtypes
time object
result int64
dtype: object
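If some rows might contain no digits at all, the replacement leaves an empty string and astype(int) raises on it. A minimal sketch of a more tolerant conversion, assuming a hypothetical sample frame and that missing values are acceptable in the output:

```python
import pandas as pd

# Hypothetical sample data; the last row contains no digits at all.
df = pd.DataFrame({"result": ["+52A", "-110a", "n/a"]})

# Strip every non-digit character; the last row becomes an empty string.
digits = df["result"].str.replace(r"\D", "", regex=True)

# errors="coerce" turns unparseable values (here the empty string) into NaN;
# the nullable "Int64" dtype keeps the remaining values as integers.
df["result"] = pd.to_numeric(digits, errors="coerce").astype("Int64")
print(df)
```

Whether a missing value or an error is the better outcome depends on the data, so treat this as one option rather than a drop-in replacement.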
Useful for extracting the substring(s) you want to keep.
df['result'] = df['result'].str.extract(r'(\d+)', expand=False)
df
    time result
1  09:00     52
2  10:00     62
3  11:00     44
4  12:00     30
5  13:00    110
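One difference from the replace approach is how rows without a match behave. A small sketch, using a hypothetical Series whose last value contains no digits:

```python
import pandas as pd

# Hypothetical data; the last value has nothing for the pattern to capture.
s = pd.Series(["+52A", "-110a", "none"])

# expand=False returns a Series for a single capture group;
# rows where the pattern does not match come back as missing values.
extracted = s.str.extract(r"(\d+)", expand=False)
print(extracted)
```

With str.replace the same row would end up as an empty string instead of a missing value.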
Splitting works assuming all your strings follow this consistent structure.
# df['result'] = df['result'].str.split(r'\D').str[1]
df['result'] = df['result'].str.split(r'\D').str.get(1)
df
    time result
1  09:00     52
2  10:00     62
3  11:00     44
4  12:00     30
5  13:00    110
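To illustrate why the consistent structure matters, here is a sketch with a hypothetical second value that lacks the leading sign; the element at position 1 is then no longer the number:

```python
import pandas as pd

# Hypothetical data: the second value has no leading '+' or '-'.
s = pd.Series(["+52A", "52A"])

# '+52A' splits into ['', '52', ''], but '52A' splits into ['52', ''],
# so position 1 holds the digits only for the first value.
second = s.str.split(r"\D").str.get(1)
print(second.tolist())
```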
import re

# Assumed definitions: p1 and p2 are used below but not defined in this excerpt.
p1 = re.compile(r'\D')
p2 = re.compile(r'\d+')

def eumiro(df):
    return df.assign(
        result=df['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC')))

def coder375(df):
    return df.assign(
        result=df['result'].replace(r'\D', r'', regex=True))

def monkeybutter(df):
    return df.assign(result=df['result'].map(lambda x: x[1:-1]))

def wes(df):
    return df.assign(result=df['result'].str.lstrip('+-').str.rstrip('aAbBcC'))

def cs1(df):
    return df.assign(result=df['result'].str.replace(r'\D', '', regex=True))

def cs2_ted(df):
    # `str.extract` based solution, similar to @Ted Petrou's, so timing together.
    return df.assign(result=df['result'].str.extract(r'(\d+)', expand=False))

def cs1_listcomp(df):
    return df.assign(result=[p1.sub('', x) for x in df['result']])

def cs2_listcomp(df):
    return df.assign(result=[p2.search(x)[0] for x in df['result']])

def cs_eumiro_listcomp(df):
    return df.assign(
        result=[x.lstrip('+-').rstrip('aAbBcC') for x in df['result']])

def cs_mb_listcomp(df):
    return df.assign(result=[x[1:-1] for x in df['result']])
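Functions like these can be compared with the standard timeit module. The following is only a sketch: the data size, the repeat count, and the two function names are assumptions made for illustration, and it makes no claim about which approach is faster on your data.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the five sample values repeated many times.
df = pd.DataFrame({"result": ["+52A", "+62B", "+44a", "+30b", "-110a"] * 2000})


def with_str_replace(df):
    # Vectorised regex replacement through the .str accessor.
    return df.assign(result=df["result"].str.replace(r"\D", "", regex=True))


def with_listcomp(df):
    # Plain Python list comprehension over the column values.
    return df.assign(
        result=[x.lstrip("+-").rstrip("aAbBcC") for x in df["result"]]
    )


runs = 10
for fn in (with_str_replace, with_listcomp):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.5f} s per call")
```

It is worth checking that the candidates return the same values before comparing their speed.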
I'd use the pandas replace function; it is simple and powerful because it accepts regular expressions. Below I use the regex \D to remove any non-digit characters, but you could get quite creative with regex.
data['result'] = data['result'].replace(to_replace=r'\D', value='', regex=True)
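The regex=True flag matters here. As a sketch on a hypothetical Series: without it, Series.replace only swaps values that match the search value exactly, so substrings are left alone.

```python
import pandas as pd

s = pd.Series(["+52A", "-110a"])  # hypothetical sample values

# Without regex=True, only values equal to "+" would be replaced,
# so this Series comes back unchanged.
unchanged = s.replace("+", "")

# With regex=True, the pattern is applied inside each string.
cleaned = s.replace(r"\D", "", regex=True)

print(unchanged.tolist())
print(cleaned.tolist())
```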
To remove the last character:
data['result'] = data['result'].map(lambda x: str(x)[:-1])
To remove the first two characters:
data['result'] = data['result'].map(lambda x: str(x)[2:])
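The same slices can also be written with the .str accessor, which avoids the lambda. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

data = pd.DataFrame({"result": ["+52A", "-110a"]})  # hypothetical sample

# Slice every string in the column at once.
without_last = data["result"].str[:-1]      # drop the last character
without_first_two = data["result"].str[2:]  # drop the first two characters

print(without_last.tolist())
print(without_first_two.tolist())
```

Unlike the map version, this does not call str() on each value, so missing values stay missing rather than turning into sliced text.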
EDIT: 2012-12-07 this works now on the dev branch:
In [8]: df['result'].str.lstrip('+-').str.rstrip('aAbBcC')
Out[8]:
1 52
2 62
3 44
4 30
5 110
Name: result
You can use the string lstrip() function or the string replace() function to remove a prefix from column names. Let’s go over them with the help of examples.
The string lstrip() function removes leading characters from a string. Its argument is treated as a set of characters rather than a literal substring: every leading character that appears in the argument is stripped, so it only acts as a prefix remover when the rest of the name does not begin with one of those characters.
Note that the string replace() function replaces every occurrence of the substring, which can be an issue if the prefix also occurs later in the column name. For that reason lstrip() is often suggested for prefixes, subject to the caveat above; in pandas 1.4 and later, str.removeprefix() removes an exact prefix.
First, we will create a sample dataframe that we will be using throughout this tutorial.
import pandas as pd

# create a dataframe
df = pd.DataFrame({
    "tb1_Name": ["Emma", "Shivam", "Mike", "Noor"],
    "tb1_Age": [16, 17, 14, 16]
})

# display the dataframe
print(df)
Output:
  tb1_Name  tb1_Age
0     Emma       16
1   Shivam       17
2     Mike       14
3     Noor       16
To rename the columns, we will apply the lstrip() function to each column name as follows.
# remove prefix
df.columns = df.columns.str.lstrip("tb1_")

# display the dataframe
print(df)
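One caveat is worth a sketch: lstrip() strips any leading characters found in its argument, so it can remove more than the prefix. The column names below are hypothetical and chosen to expose the difference; str.removeprefix() assumes pandas 1.4 or later.

```python
import pandas as pd

# Hypothetical column names: two of them continue with characters
# that also appear in the prefix "tb1_".
cols = pd.Index(["tb1_Name", "tb1_total", "tb1_b1_score"])

# lstrip treats "tb1_" as the character set {t, b, 1, _}.
stripped = cols.str.lstrip("tb1_")

# removeprefix removes the exact leading substring once (pandas 1.4+).
removed = cols.str.removeprefix("tb1_")

print(list(stripped))
print(list(removed))
```

For the sample dataframe above both calls give the same result, because "Name" and "Age" do not start with any of those characters.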
The string rstrip() function is used to remove trailing characters from a string. As with lstrip(), the argument is treated as a set of characters to strip from the end, not as a literal suffix.
# create a dataframe
df = pd.DataFrame({
    "Name_tb1": ["Emma", "Shivam", "Mike", "Noor"],
    "Age_tb1": [16, 17, 14, 16]
})

# display the dataframe
print(df)
Here the column names in the dataframe df have the suffix “_tb1”, which we want to remove. To rename the columns, we will apply the rstrip() function on each column name as follows.
# remove suffix
df.columns = df.columns.str.rstrip("_tb1")

# display the dataframe
print(df)
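The same character-set behaviour applies at the end of the string. A sketch with a hypothetical column name that ends in characters from the suffix, alongside two alternatives that match the suffix exactly (str.removesuffix() assumes pandas 1.4 or later):

```python
import pandas as pd

# Hypothetical column names; "Debt" ends in characters found in "_tb1".
cols = pd.Index(["Name_tb1", "Debt_tb1"])

# rstrip treats "_tb1" as the character set {_, t, b, 1}.
stripped = cols.str.rstrip("_tb1")

# Exact-suffix alternatives.
removed = cols.str.removesuffix("_tb1")                # pandas 1.4+
anchored = cols.str.replace(r"_tb1$", "", regex=True)  # regex anchored at the end

print(list(stripped))
print(list(removed))
print(list(anchored))
```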
To remove unwanted parts from strings in a column with Python Pandas, we can use the map method. For instance, we can call map with a lambda function that returns the original string with the unwanted parts removed with lstrip and rstrip.
data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
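If the column can contain missing values, the lambda would be called on a float NaN, which has no lstrip method. One way around that, sketched on a hypothetical frame, is the na_action parameter of Series.map:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a missing value in the middle.
data = pd.DataFrame({"result": ["+52A", np.nan, "-110a"]})

# na_action="ignore" passes missing values through untouched
# instead of calling the lambda on them.
cleaned = data["result"].map(
    lambda x: x.lstrip("+-").rstrip("aAbBcC"), na_action="ignore"
)
print(cleaned)
```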
pandas.Series.str.replace, pandas.Series.replace, pandas.Series.str.slice_replace, pandas.Series.str.repeat
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex = True)
0 bao
1 baz
2 NaN
dtype: object
>>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex = False)
0 bao
1 fuz
2 NaN
dtype: object
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr, regex=True)
0 <re.Match object; span=(0, 1), match='f'>oo
1 <re.Match object; span=(0, 1), match='f'>uz
2 NaN
dtype: object
>>> repl = lambda m: m.group(0)[::-1]
>>> ser = pd.Series(['foo 123', 'bar baz', np.nan])
>>> ser.str.replace(r'[a-z]+', repl, regex=True)
0 oof 123
1 rab zab
2 NaN
dtype: object
>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ser = pd.Series(['One Two Three', 'Foo Bar Baz'])
>>> ser.str.replace(pat, repl, regex=True)
0 tWO
1 bAR
dtype: object
>>> import re
>>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE)
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar', regex=True)
0 foo
1 bar
2 NaN
dtype: object
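For a plain string pattern, the case parameter of str.replace is an alternative to compiling the regex yourself. A minimal sketch of the same case-insensitive replacement:

```python
import numpy as np
import pandas as pd

s = pd.Series(["foo", "fuz", np.nan])

# case=False makes the string pattern match regardless of letter case;
# it is used here together with regex=True and an uncompiled pattern.
result = s.str.replace("FUZ", "bar", case=False, regex=True)
print(result)
```

Note that case and flags cannot be combined with an already compiled pattern, so pick one style or the other.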