Use the str.replace() method to replace \xa0 in a string, e.g. result = my_str.replace('\xa0', ' '). The call replaces every occurrence of the \xa0 (non-breaking space) character with a regular space.
The str.replace() method returns a copy of the string with all occurrences of a substring replaced by the provided replacement; the original string is left unchanged.
To clean a list of strings, use a list comprehension. On each iteration, use the str.replace() method to replace occurrences of \xa0 with a space.
Alternatively, use the unicodedata.normalize() function, e.g. result = unicodedata.normalize('NFKD', my_str). With the NFKD or NFKC form, it returns the normal form of the provided Unicode string with every compatibility character replaced by its equivalent, so \xa0 becomes a regular space.
import unicodedata
my_str = 'hello\xa0world'
# ✅ remove \xa0 from string using unicodedata.normalize()
result = unicodedata.normalize('NFKD', my_str)
print(result)  # 👉️ 'hello world'
# ----------------------------------------
# ✅ remove \xa0 from string using str.replace()
result = my_str.replace('\xa0', ' ')
print(result)  # 👉️ 'hello world'
# ----------------------------------------
# ✅ remove \xa0 from list of strings
my_list = ['hello\xa0', '\xa0world']
result = [string.replace('\xa0', ' ') for string in my_list]
print(result)  # 👉️ ['hello ', ' world']

import unicodedata
my_str = 'hello\xa0world'
result = unicodedata.normalize('NFKC', my_str)
print(result)  # 👉️ 'hello world'
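A small sketch of how the two normalization forms differ from a plain str.replace(), in case it helps when choosing between them. The sample string is made up for illustration and the code assumes Python 3. NFKD and NFKC both turn \xa0 into a space, but they also rewrite other compatibility characters (here the "fi" ligature), and NFKD additionally splits accented letters into a base letter plus a combining mark.

```python
import unicodedata

# Hypothetical sample: an accented letter, a non-breaking space,
# and the "fi" ligature (U+FB01).
sample = 'caf\u00e9\xa0\ufb01n'

nfkd = unicodedata.normalize('NFKD', sample)
nfkc = unicodedata.normalize('NFKC', sample)
replaced = sample.replace('\xa0', ' ')

# ascii() shows non-ASCII characters as escapes so the difference is visible.
print(ascii(nfkd))      # 'cafe\u0301 fin'  (accent split off, ligature expanded)
print(ascii(nfkc))      # 'caf\xe9 fin'     (accent kept, ligature expanded)
print(ascii(replaced))  # 'caf\xe9 \ufb01n' (only the non-breaking space changed)
```

If only the non-breaking space should change, str.replace() is the narrower tool.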
Try:
import unicodedata
new_str = unicodedata.normalize("NFKD", unicode_str)
Assume we have the following raw HTML:
raw_html = (
    '<p>Dear Parent,&nbsp;</p>'
    '<p><span style="font-size: 1rem;">This is a test message,&nbsp;</span>'
    '<span style="font-size: 1rem;">kindly ignore it.&nbsp;</span></p>'
    '<p><span style="font-size: 1rem;">Thanks</span></p>'
)
So let's try to extract the text from this HTML string. Each &nbsp; entity comes out as \xa0:
from bs4 import BeautifulSoup
text_string = BeautifulSoup(raw_html, "lxml").text
print(repr(text_string))
# 'Dear Parent,\xa0This is a test message,\xa0kindly ignore it.\xa0Thanks'
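Method # 1 (Recommended): Use BeautifulSoup's get_text method with the strip argument set to True. Each text fragment is stripped of leading and trailing whitespace (including \xa0) before the fragments are joined, so no spaces remain between them. Our code becomes:
clean_text = BeautifulSoup(raw_html, "lxml").get_text(strip=True)
print(clean_text)
# Dear Parent,This is a test message,kindly ignore it.Thanks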
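To see where the \xa0 comes from without installing BeautifulSoup, here is a minimal sketch that uses the standard-library html module instead. This is a swapped-in technique, not the method from the answer above, and the fragment is an invented sample. html.unescape() decodes the &nbsp; entity into the \xa0 character, which can then be replaced with a regular space.

```python
import html

# Hypothetical fragment containing the HTML entity for a non-breaking space.
fragment = 'Dear Parent,&nbsp;This is a test message'

# Decoding the entity yields the \xa0 character.
decoded = html.unescape(fragment)
print(repr(decoded))  # 'Dear Parent,\xa0This is a test message'

# Replace it with a regular space once the text is decoded.
cleaned = decoded.replace('\xa0', ' ')
print(repr(cleaned))  # 'Dear Parent, This is a test message'
```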
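If the string contains the literal four-character text \xa0 (a backslash followed by xa0) rather than the non-breaking space character itself, try this:
string = string.replace('\\xa0', ' ')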
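The two replacement patterns target different things, which a short sketch may make clearer. The sample strings are made up for illustration: one holds the real non-breaking space character, the other holds the four literal characters backslash, x, a, 0.

```python
# One real non-breaking space character (U+00A0).
actual = 'hello\xa0world'
# The literal text "\xa0": backslash, 'x', 'a', '0'.
literal = 'hello\\xa0world'

print(len(actual))   # 11
print(len(literal))  # 14

# '\\xa0' only matches the literal text, so the real character is untouched.
print(actual.replace('\\xa0', ' ') == actual)  # True
print(literal.replace('\\xa0', ' '))           # hello world

# '\xa0' only matches the real character.
print(actual.replace('\xa0', ' '))             # hello world
```

Checking len() or repr() of the input first is a quick way to tell which case applies.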
Try this code. Note that it removes every non-ASCII character, not just \xa0:
import re
re.sub(r'[^\x00-\x7F]+', '', 'paste your string here').strip()
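A quick sketch of what the ASCII-only regex does to text that contains other non-ASCII characters, using an invented sample string. It is meant only to illustrate the trade-off against a targeted replace.

```python
import re

# Hypothetical sample with an accented letter followed by a non-breaking space.
text = 'caf\u00e9\xa0au lait'

# Removes every run of non-ASCII characters, including the accented letter.
ascii_only = re.sub(r'[^\x00-\x7F]+', '', text).strip()
print(ascii_only)  # cafau lait

# Replaces only the non-breaking space and keeps the accented letter.
targeted = text.replace('\xa0', ' ')
print(ascii(targeted))  # 'caf\xe9 au lait'
```

For text in languages that use non-ASCII letters, the targeted replace is usually the safer choice.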
Python recognizes it as a whitespace character, so you can split the string without arguments and join the parts with a normal space:
line = ' '.join(line.split())
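A short illustration of the split-and-join approach, assuming Python 3 and an invented input. Note the side effect: every run of whitespace (spaces, tabs, newlines, non-breaking spaces) collapses to a single space, and leading and trailing whitespace disappears.

```python
# Hypothetical input mixing non-breaking spaces, a tab, and a newline.
line = '  hello\xa0\xa0world \t foo\n'

# split() with no arguments splits on any run of whitespace, including \xa0.
parts = line.split()
print(parts)  # ['hello', 'world', 'foo']

normalized = ' '.join(parts)
print(normalized)  # hello world foo
```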
I ended up here while googling a problem with a non-printable character. I use MySQL with the UTF-8 general_ci collation and deal with the Polish language. For problematic strings I have to proceed as follows:
text = text.replace('\xc2\xa0', ' ')
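The \xc2\xa0 pair is the UTF-8 encoding of the non-breaking space, so it shows up when working with undecoded bytes. A minimal Python 3 sketch with an invented sample, showing both the byte-level replace and the decode-first alternative:

```python
text = 'hello\xa0world'

# Encoding to UTF-8 turns the single character U+00A0 into two bytes.
encoded = text.encode('utf-8')
print(encoded)  # b'hello\xc2\xa0world'

# Option 1: replace the two-byte sequence directly on the bytes object.
cleaned_bytes = encoded.replace(b'\xc2\xa0', b' ')
print(cleaned_bytes)  # b'hello world'

# Option 2: decode first, then replace the single character.
cleaned_text = encoded.decode('utf-8').replace('\xa0', ' ')
print(cleaned_text)  # hello world
```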
\xa0 is actually the non-breaking space in Latin1 (ISO 8859-1), also chr(160). You should replace it with a space.
After trying several methods, to summarize, this is how I did it. The following are two ways of avoiding or removing \xa0 characters from a parsed HTML string.
Please note: this answer is from 2012. Python has moved on, and you should be able to use unicodedata.normalize now.
The BeautifulSoup parsing code above produces \xa0 characters in the string. To remove them properly, we can use two ways.
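The identity of the character can be checked from the interpreter. A minimal sketch, assuming Python 3:

```python
import unicodedata

nbsp = '\xa0'

print(nbsp == chr(160))        # True
print(ord(nbsp))               # 160
print(unicodedata.name(nbsp))  # NO-BREAK SPACE
print(nbsp.encode('latin-1'))  # b'\xa0'
print(nbsp.isspace())          # True
```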