python regex: removing all special characters and numbers not attached to words

To match alphanumeric strings or only letter words you may use the following pattern with re:

import re
# ...
re.findall(r'(?:[^\W\d_]+\d|\d+[^\W\d_])[^\W_]*|[^\W\d_]+', text.lower())

Decomposition:

\b          # word boundary
\d*         # zero or more digits
[^\W\d_]    # one alphabetic character
[^\W_]*     # zero or more alphanumeric characters
\b          # word boundary
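To see what the findall pattern keeps and drops, here is a minimal sketch. The sample text and the names TOKEN_RE, tokens, and cleaned are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern quoted from the answer above. A token either starts with letters
# followed by a digit, or with digits followed by a letter (then any further
# letters/digits), or it is a run of letters only. Bare numbers, underscores
# and punctuation never match.
TOKEN_RE = re.compile(r'(?:[^\W\d_]+\d|\d+[^\W\d_])[^\W_]*|[^\W\d_]+')

# Hypothetical sample input, not taken from the original question.
text = "Hello world2 3rd 42 !! foo_bar a1b2"

tokens = TOKEN_RE.findall(text.lower())
cleaned = " ".join(tokens)

print(tokens)
print(cleaned)
```

The standalone number 42 and the punctuation disappear, while digits attached to letters (world2, 3rd, a1b2) survive. The underscore is treated as a separator, so foo_bar becomes two tokens.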

Try this RegEx instead:

([A-Za-z]+(\d)*[A-Za-z]*)
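A caveat when trying this second pattern with re.findall: because it contains capture groups, findall returns tuples of groups rather than whole matches. The sketch below compares it with a group-free variant; the sample text and variable names are assumptions made for illustration.

```python
import re

# Hypothetical sample input.
text = "abc12def 34 x 3rd"

# With capture groups, findall returns one tuple per match:
# (whole outer group, last digit captured by the inner group or '').
with_groups = re.findall(r'([A-Za-z]+(\d)*[A-Za-z]*)', text)

# Dropping the groups makes findall return the matched substrings directly.
without_groups = re.findall(r'[A-Za-z]+\d*[A-Za-z]*', text)

print(with_groups)
print(without_groups)
```

Note that this pattern requires a token to start with a letter, so the leading digit of "3rd" is cut off and only "rd" is matched.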

Suggestion : 2

This article discusses four different ways to delete special characters from a string in Python, including a regex substitution, a list comprehension, and the filter() function.

In Python, string.punctuation from the string module contains all the ASCII punctuation characters, i.e.

r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""

import string
import re

sample_str = "Test&[88]%%$$$#$%-+String"

# Create a regex pattern to match all special characters in string.
# re.escape() keeps characters such as ']', '\' and '-' literal inside the set.
pattern = '[' + re.escape(string.punctuation) + ']'

# Remove special characters from the string
sample_str = re.sub(pattern, '', sample_str)

print(sample_str)

Output:

Test88String

Using a list comprehension, iterate over the characters of the string one by one and skip the non-alphanumeric ones. This yields the filtered characters, which join() combines into a new string. Assigning the result back to the same variable gives the effect of having deleted all special characters from the string. For example,

sample_str = "Test&[88]%%$$$#$%-+String"

# Remove special characters from a string
sample_str = ''.join(item for item in sample_str if item.isalnum())

print(sample_str)

The filter() function with str.isalnum gives the same result. For example,

sample_str = "Test&[88]%%$$$#$%-+String"

# Remove special characters from a string
sample_str = ''.join(filter(str.isalnum, sample_str))

print(sample_str)
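One point worth knowing about the isalnum() approaches: str.isalnum() is Unicode-aware, so accented letters count as alphanumeric. If only ASCII letters and digits should survive, an extra str.isascii() check (Python 3.7+) can be added. The sample string and variable names below are assumptions for illustration.

```python
# Hypothetical sample input containing non-ASCII letters (é and ï).
sample = "Caf\u00e9-42 na\u00efve!"

# str.isalnum() accepts any Unicode letter or digit.
unicode_kept = ''.join(filter(str.isalnum, sample))

# Restrict to ASCII letters and digits only.
ascii_kept = ''.join(ch for ch in sample if ch.isascii() and ch.isalnum())

print(unicode_kept)
print(ascii_kept)
```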

Suggestion : 3

Given a string consisting of alphabetic and other characters, remove every character that is not a letter and print the resulting string. One approach iterates over the characters of the string and appends each alphabetic character to a new string, so that the new string ends up containing only the letters of the original. Another compares each character's ASCII value against the ranges for a-z and A-Z and removes the characters outside those ranges with the string erase function. A third initializes an empty string together with a string of lowercase letters (la) and a string of uppercase letters (ua), loops over the input, concatenates each character found in la or ua (using the in and not in operators) to the empty string, and displays the result after the loop.

GeeksforGeeks
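The la/ua approach described above could look roughly like the sketch below. The function name keep_letters and the sample input are assumptions; only the la and ua names come from the description.

```python
import string


def keep_letters(text):
    """Return text with every character that is not an ASCII letter removed."""
    la = string.ascii_lowercase  # "abc...z"
    ua = string.ascii_uppercase  # "ABC...Z"
    result = ""
    for ch in text:
        # Keep the character only if it is a lowercase or uppercase letter.
        if ch in la or ch in ua:
            result += ch
    return result


# Hypothetical sample input.
letters_only = keep_letters("$Gee*k;s..fo, r'Ge^eks?")
print(letters_only)
```

For long inputs, collecting the characters in a list and calling ''.join() once is usually preferable to repeated string concatenation.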

Suggestion : 4

A character that is neither a letter nor a digit is called a special character. Removing special characters can make a string easier to read. In one example, the replaceAll() method removes all the special characters from the string and puts a space in place of each. Another example defines the removal logic by hand: the ASCII values of the capital letters run from 65 to 90 (A-Z) and those of the small letters from 97 to 122 (a-z), so each character is compared against these ranges and kept only if it falls inside one of them. The for loop runs over the length of the string, and when it reaches the end we get the resulting string.

This string contains special characters
Hello Java Programmer!
String after removing special characters: ProgrammingLanguage
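The examples described in this suggestion are written in Java and their code is not reproduced here. As a rough Python sketch of the same two ideas (replacing each special character with a space, and keeping only characters whose ASCII code falls in 65-90 or 97-122), assuming hypothetical function names and sample inputs:

```python
import re


def replace_specials_with_space(text):
    """Replace every character that is not an ASCII letter or digit with a space."""
    return re.sub(r"[^A-Za-z0-9]", " ", text)


def keep_ascii_letters(text):
    """Keep only characters whose code is in 65-90 (A-Z) or 97-122 (a-z)."""
    kept = []
    for ch in text:
        code = ord(ch)
        if 65 <= code <= 90 or 97 <= code <= 122:
            kept.append(ch)
    return "".join(kept)


# Hypothetical sample inputs.
spaced = replace_specials_with_space("Hello@Java#Programmer!")
letters = keep_ascii_letters("Pro$gram#ming Lang@uage1")
print(repr(spaced))
print(letters)
```

The first function leaves a trailing space where the final "!" was; call strip() on the result if that is unwanted.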

Suggestion : 5

The backslash either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence; special sequences are discussed below. Some characters, like '|' or '(', are special: they either stand for classes of ordinary characters or affect how the regular expressions around them are interpreted. Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'. re.escape(pattern) escapes special characters in pattern, which is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. Some examples from the re documentation:

>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'

\a \b \f \n \N \r \t \u \U \v \x \\

a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

prog = re.compile(pattern)
result = prog.match(string)
# is equivalent to
result = re.match(pattern, string)
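The effect of re.escape() mentioned above is easiest to see side by side. This is a minimal sketch; the literal and the text being searched are made-up sample values.

```python
import re

# Hypothetical literal text that happens to contain regex metacharacters.
literal = "1+1=2?"
text = "is 1+1=2? yes"

# Used directly as a pattern, '+' and '?' act as quantifiers, so the
# pattern looks for something like "11=" and finds nothing in this text.
unescaped = re.search(literal, text)

# re.escape() makes every metacharacter match itself.
escaped = re.search(re.escape(literal), text)

print(unescaped)
print(escaped.group(0))
```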

Suggestion : 6

All characters, except those having a special meaning in regex, match themselves. For example, the regex x matches the substring "x", 9 matches "9", = matches "=", and @ matches "@"; the same holds for letters, so a matches "a" and b matches "b". To match a character that does have a special meaning, prepend it with a backslash (\), forming an escape sequence. For example, \+ matches "+", \[ matches "[", and \. matches ".".
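A short Python sketch of these rules, using a made-up sample string:

```python
import re

sample = "a.b=c@d"  # hypothetical sample input

# '=' and '@' have no special meaning, so they match themselves.
equals = re.findall(r"=", sample)
at_signs = re.findall(r"@", sample)

# '.' is special: unescaped it matches any character (except a newline).
any_char = re.findall(r".", sample)

# Escaped with a backslash, it matches only a literal dot.
literal_dot = re.findall(r"\.", sample)

print(equals, at_signs, literal_dot)
print(any_char)
```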

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexNumbers {
   public static void main(String[] args) {

      String inputStr = "abc00123xyz456_0"; // Input String for matching
      String regexStr = "[0-9]+"; // Regex to be matched

      // Step 1: Compile a regex via static method Pattern.compile(), default is case-sensitive
      Pattern pattern = Pattern.compile(regexStr);
      // Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // for case-insensitive matching

      // Step 2: Allocate a matching engine from the compiled regex pattern,
      //         and bind to the input string
      Matcher matcher = pattern.matcher(inputStr);

      // Step 3: Perform matching and Process the matching results
      // Try Matcher.find(), which finds the next match
      while (matcher.find()) {
         System.out.println("find() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      }

      // Try Matcher.matches(), which tries to match the ENTIRE input (^...$)
      if (matcher.matches()) {
         System.out.println("matches() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      } else {
         System.out.println("matches() found nothing");
      }

      // Try Matcher.lookingAt(), which tries to match from the START of the input (^...)
      if (matcher.lookingAt()) {
         System.out.println("lookingAt() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      } else {
         System.out.println("lookingAt() found nothing");
      }

      // Try Matcher.replaceFirst(), which replaces the first match
      String replacementStr = "**";
      String outputStr = matcher.replaceFirst(replacementStr); // first match only
      System.out.println(outputStr);

      // Try Matcher.replaceAll(), which replaces all matches
      replacementStr = "++";
      outputStr = matcher.replaceAll(replacementStr); // all matches
      System.out.println(outputStr);
   }
}

#!/usr/bin/env perl

use strict;
use warnings;

my $inStr = 'abc00123xyz456_0';  # input string
my $regex = '[0-9]+';            # regex pattern string in non-interpolating string

# Try match /regex/modifiers (or m/regex/modifiers)
my @matches = ($inStr =~ /$regex/g);  # Match $inStr with regex with global modifier
                                      # Store all matches in an array
print "@matches\n";                   # Output: 00123 456 0

while ($inStr =~ /$regex/g) {
   # The built-in array variables @- and @+ keep the start and end positions
   # of the matches, where $-[0] and $+[0] is the full match, and
   # $-[n] and $+[n] for back references $1, $2, etc.
   print substr($inStr, $-[0], $+[0] - $-[0]), ', ';  # Output: 00123, 456, 0,
}
print "\n";

# Try substitute s/regex/replacement/modifiers
$inStr =~ s/$regex/**/g;  # with global modifier
print "$inStr\n";
# Output: abc**xyz**_**

<!DOCTYPE html>
<!-- JSRegexNumbers.html -->
<html lang="en">
<head>
<meta charset="utf-8">
<title>JavaScript Example: Regex</title>
<script>
var inStr = "abc123xyz456_7_00";

// Use RegExp.test(inStr) to check if inStr contains the pattern
console.log(/[0-9]+/.test(inStr));  // true

// Use String.search(regex) to check if the string contains the pattern
// Returns the start position of the matched substring or -1 if there is no match
console.log(inStr.search(/[0-9]+/));  // 3

// Use String.match() or RegExp.exec() to find the matched substring,
//   back references, and string index
console.log(inStr.match(/[0-9]+/));  // ["123", input:"abc123xyz456_7_00", index:3, length:"1"]
console.log(/[0-9]+/.exec(inStr));   // ["123", input:"abc123xyz456_7_00", index:3, length:"1"]

// With g (global) option
console.log(inStr.match(/[0-9]+/g));  // ["123", "456", "7", "00", length:4]

// RegExp.exec() with g flag can be issued repeatedly.
// Search resumes after the last-found position (maintained in property RegExp.lastIndex).
var pattern = /[0-9]+/g;
var result;
while (result = pattern.exec(inStr)) {
   console.log(result);
   console.log(pattern.lastIndex);
      // ["123"],  6
      // ["456"], 12
      // ["7"],   14
      // ["00"],  17
}

// String.replace(regex, replacement):
console.log(inStr.replace(/\d+/, "**"));   // abc**xyz456_7_00
console.log(inStr.replace(/\d+/g, "**"));  // abc**xyz**_**_**
</script>
</head>
<body>
  <h1>Hello,</h1>
</body>
</html>
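The Java, Perl, and JavaScript listings above all exercise the same pattern, [0-9]+. Since this page is about Python, here is a rough equivalent using the standard re module. The variable names are assumptions; the input string is the one from the Java and Perl listings.

```python
import re

in_str = "abc00123xyz456_0"
digits = re.compile(r"[0-9]+")

# Like Matcher.find() in a loop, or a /g match in Perl: every match.
all_matches = digits.findall(in_str)

# Matches with their start and end positions.
spans = [(m.group(), m.start(), m.end()) for m in digits.finditer(in_str)]

# Like Matcher.matches(): the ENTIRE input must match.
whole = digits.fullmatch(in_str)

# Like Matcher.lookingAt(): the match must begin at the START of the input.
prefix = digits.match(in_str)

# Like replaceFirst() and replaceAll().
first_replaced = digits.sub("**", in_str, count=1)
all_replaced = digits.sub("++", in_str)

print(all_matches)
print(spans)
print(whole, prefix)
print(first_replaced)
print(all_replaced)
```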

Suggestion : 7

The examples below delete numbers from a string in several ways: re.sub() with the pattern r'[0-9]' and an empty replacement string, a translation table that maps every digit to nothing, a generator expression that skips digits while iterating and joins the remaining characters, and filter() with a lambda.

A string is a built-in type in Python, like integer, float, and boolean. Data surrounded by single or double quotes is a string. A string is also known as a sequence of characters.

string1 = "apple"
string2 = "Preeti125"
string3 = "12345"
string4 = "pre@12"

In the example below, the pattern is r'[0-9]' and the replacement is an empty string. The pattern matches every digit in the given string, and the sub() function replaces each match with the empty string, which deletes all the digits.

#regex module
import re

#original string
string1 = "Hello!James12,India2020"

pattern = r'[0-9]'

# Match all digits in the string and replace them with an empty string
new_string = re.sub(pattern, '', string1)

print(new_string)

The below example skips all numbers from the string while iterating and joins all remaining characters to print a new string.

string1 = "Hello!James12,India2020"

#iterating over each element
new_string = ''.join(x for x in string1 if not x.isdigit())

print(new_string)

The filter() function takes a lambda expression and the original string as its arguments. The lambda rejects every digit character, and join() combines the remaining characters.

#original string
string1 = "Hello!James12,India2020"

#Filters all digits
new_string = ''.join(filter(lambda x: not x.isdigit(), string1))

print(new_string)
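The translation-table approach is mentioned in the summary for this suggestion, but no code for it appears here. A minimal sketch, assuming the same sample string as the other examples and a hypothetical variable name for the table:

```python
import string

# original string
string1 = "Hello!James12,India2020"

# Build a table that maps every ASCII digit to None (i.e. deletes it).
# The third argument of str.maketrans lists the characters to remove.
remove_digits = str.maketrans('', '', string.digits)

new_string = string1.translate(remove_digits)

print(new_string)
```

Unlike the isdigit() versions, this removes only the ASCII digits 0-9 listed in string.digits.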