regex statement to check for three capital letters

Regex

(?<![A-Z])[A-Z]{3}(?![A-Z])

Using re.match:

>>> import re
>>> p = re.compile(r'(?<![A-Z])[A-Z]{3}(?![A-Z])')
>>> s = '''ABC
... aABC
... abcABCabcABCDabcABCDEDEDEDa
... ABCDE'''
>>> result = p.match(s)
>>> result.group()
'ABC'

Using re.search:

>>> import re
>>> p = re.compile(r'(?<![A-Z])[A-Z]{3}(?![A-Z])')
>>> s = 'ABcABCde'
>>> p.search(s).group()
'ABC'
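
Both calls above return only a single match. As a minimal sketch, assuming you want every standalone run of three capitals in a text, the same pattern can be passed to findall. The helper name find_triples is made up for illustration and does not come from the original answer.

```python
import re

# Same lookaround pattern as above: exactly three capitals,
# neither preceded nor followed by another capital.
TRIPLE = re.compile(r'(?<![A-Z])[A-Z]{3}(?![A-Z])')


def find_triples(text):
    """Return every standalone run of exactly three capital letters."""
    return TRIPLE.findall(text)


sample = 'ABC\naABC\nabcABCabcABCDabcABCDEDEDEDa\nABCDE'
print(find_triples(sample))  # ['ABC', 'ABC', 'ABC']
```

The runs ABCD, ABCDEDEDED and ABCDE are skipped because every three-letter window inside them touches another capital.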

Try this:

(^|[^A-Z])[A-Z]{3}([^A-Z]|$)

Tested in Ruby, here is what we have:

regexp = /(^|[^A-Z])[A-Z]{3}([^A-Z]|$)/
'ABC'.match(regexp)    # returns a match
'aABC'.match(regexp)   # returns a match
'abcABC'.match(regexp) # returns a match
'AaBbBc'.match(regexp) # returns nil
'ABCDE'.match(regexp)  # returns nil

Keep in mind that a regex engine will try as hard as it can to find a match (this is also one of the biggest weaknesses of regexes, and it is what often causes catastrophic backtracking). This implies that in your current regex:

[^A-Z]*[A-Z]{3}[^A-Z]*

[A-Z]{3} is matching 3 uppercase letters, and both [^A-Z]* are matching nothing (or empty strings). You can see how by using capture groups:

import re

theString = "ABCDE"
pattern = re.compile(r"([^A-Z]*)([A-Z]{3})([^A-Z]*)")
result = pattern.search(theString)

if result:
    print("Matched string: {" + result.group(0) + "}")
    print("Sub match 1: {" + result.group(1) + "} 2. {" + result.group(2) + "} 3. {" + result.group(3) + "}")
else:
    print("No match")

Prints:

Matched string: {ABC}
Sub match 1: {} 2. {ABC} 3. {}

What you need instead is ([^A-Z]|^)([A-Z]{3})([^A-Z]|$). It will match three consecutive uppercase letters only when there are no more uppercase letters around them (the |^ means "or at the beginning of the string" and |$ means "or at the end"). If you use that regex in the little script above, you will not get any match in ABCDE, which is what you wanted. Here it is applied to the string abcABC:

import re

theString = "abcABC"
pattern = re.compile(r"([^A-Z]|^)([A-Z]{3})([^A-Z]|$)")
result = pattern.search(theString)

if result:
    print("Matched string: {" + result.group(0) + "}")
    print("Sub match 1: {" + result.group(1) + "} 2. {" + result.group(2) + "} 3. {" + result.group(3) + "}")

Prints:

Matched string: {cABC}
Sub match 1: {c} 2. {ABC} 3. {}

Actually, you can turn the outer capture groups into non-capture groups, so that only the three uppercase letters are captured:

(?:[^A-Z]|^)([A-Z]{3})(?:[^A-Z]|$)
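
A minimal sketch of how the non-capturing version might be used in Python. It assumes you only care about the three letters themselves; the function name three_caps is hypothetical.

```python
import re

# The outer groups are non-capturing, so group(1) holds only the three capitals.
pattern = re.compile(r"(?:[^A-Z]|^)([A-Z]{3})(?:[^A-Z]|$)")


def three_caps(text):
    """Return the first standalone run of three capitals, or None."""
    result = pattern.search(text)
    return result.group(1) if result else None


print(three_caps("abcABC"))  # ABC
print(three_caps("ABCDE"))   # None
```

Note that group(0) would still include the neighbouring non-capital character, which is why group(1) is returned here.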

You can use this:

    '^(?:.*[^A-Z])?[A-Z]{3}(?:[^A-Z].*)?$'
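
Because this pattern is anchored with ^ and $, it describes the whole string rather than just the three letters. A small sketch under that assumption, with has_triple as an illustrative name:

```python
import re

# Anchored pattern: optional prefix ending in a non-capital, three capitals,
# optional suffix starting with a non-capital.
WHOLE = re.compile(r'^(?:.*[^A-Z])?[A-Z]{3}(?:[^A-Z].*)?$')


def has_triple(text):
    """True if text contains a standalone run of exactly three capitals."""
    return WHOLE.match(text) is not None


for word in ['ABC', 'aABC', 'abcABC', 'AaBbBc', 'ABCDE']:
    print(word, has_triple(word))
```

The first three words are accepted and the last two are rejected, the same split as in the Ruby test above.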

Your regex itself shows what you are doing wrong: both [^A-Z]* parts may match nothing, so it behaves like a plain [A-Z]{3}.

'[^A-Z]*[A-Z]{3}[^A-Z]*'

Example in Perl

#!/usr/bin/perl

use strict;
use warnings;

my @arr = qw(AaBsCc abCDE ABCDE AbcDE abCDE ABC aABC abcABC);

foreach my $string (@arr) {
    # A plain [A-Z]{3} matches any three consecutive capitals, even inside a longer run
    if ($string =~ m/[A-Z]{3}/) {
        print "Matched $string\n";
    }
    else {
        print "Didn't match $string\n";
    }
}

Output:

Didn't match AaBsCc
Matched abCDE
Matched ABCDE
Didn't match AbcDE
Matched abCDE
Matched ABC
Matched aABC
Matched abcABC
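
For comparison, here is a sketch in Python that runs the same word list through both the plain [A-Z]{3} and the lookaround pattern from the first answer. The variable and function names are illustrative only.

```python
import re

words = "AaBsCc abCDE ABCDE AbcDE abCDE ABC aABC abcABC".split()

plain = re.compile(r'[A-Z]{3}')                       # any three capitals in a row
strict = re.compile(r'(?<![A-Z])[A-Z]{3}(?![A-Z])')   # exactly three, no capital neighbours


def matched(pattern, items):
    """Return the items in which pattern is found."""
    return [w for w in items if pattern.search(w)]


print(matched(plain, words))
print(matched(strict, words))
```

The only word on which the two patterns disagree is ABCDE, which the plain pattern accepts and the lookaround pattern rejects.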

Suggestion : 2

  • Either "a non-capital letter" or the start of my string.
  • Followed by "exactly 3 capital letters."
  • Followed by "a non-capital letter" or the end of my string.

It doesn't matter how many non-capital letters precede or follow your 3 capital letters, as long as there is at least 1. So, you just need to look for 1.

At the moment this is my statement:

'[^A-Z]*[A-Z]{3}[^A-Z]*'
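
As a rough sketch, the three requirements can be written as three pattern fragments and compared against the statement above. The names BEFORE, CORE and AFTER are made up for this example, and [^A-Z] is assumed to be an acceptable reading of "a non-capital letter" (it also accepts digits and punctuation).

```python
import re

# Hypothetical names for the three parts of the requirement.
BEFORE = r'(?:^|[^A-Z])'   # start of string, or one non-capital character
CORE = r'[A-Z]{3}'         # exactly three capital letters
AFTER = r'(?:[^A-Z]|$)'    # one non-capital character, or end of string

wanted = re.compile(BEFORE + CORE + AFTER)
current = re.compile(r'[^A-Z]*[A-Z]{3}[^A-Z]*')  # the statement from the question

print(bool(current.search('ABCDE')))  # True: both [^A-Z]* parts match nothing
print(bool(wanted.search('ABCDE')))   # False
print(bool(wanted.search('aABC')))    # True
```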

Suggestion : 3

A regular expression is a pattern which is used to match characters in a string. There are many excellent online resources which explain the syntax rules of regular expressions. The following are examples of some of the most common:

  • [A-Z]{3} Look for three consecutive uppercase letters.

In the example below, FIELD1 uses a regular expression which specifies one or more uppercase letters, followed by a space, followed by a single uppercase letter, followed by a space, followed by one or more uppercase letters. The strings "MARY R SMITH", "W A DOE", or "LARRY G W" would match this regular expression.

TRIGGER1 = UL(1.00, 3.89), LR(2.52, 4.17), *, REGEX = 'PAGE 1'
TRIGGER2 = UL(1.02, 4.60), LR(2.11, 4.95), 0, REGEX = '[0-9]{5} [a-z]{4}'
FIELD1 = UL(1.44, 0.00), LR(2.75, 0.30), 0, (TRIGGER = 2, BASE = TRIGGER,
   REGEX = '[A-Z]+ [A-Z] [A-Z]+')
INDEX1 = 'Name', FIELD1, (TYPE = GROUP)

Suggestion : 5

Asked May 19, 2014 , Updated March 8, 2018

 

// Converts text to uppercase if the field's CSS class is set to upperCaseMe
$(document).ready(function() {
   $('.upperCaseMe input').focusout(function() {
      $(this).val($(this).val().toUpperCase());
   });
});

Suggestion : 6

A password containing at least 1 uppercase letter, 1 lowercase letter, 1 digit, 1 special character and having a length of at least 10.

As the characters/digits can be anywhere within the string, we require lookaheads. Lookaheads are zero-width, meaning they do not consume any of the string. In simple words, the position of checking resets to the original position after each lookahead condition is met.

Assumption: non-word characters are considered special.

^(?=.{10,}$)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).*$

Regex Breakdown

^              #Start of string
(?=.{10,}$)    #Check that there are at least 10 characters in the string.
               #As this is a lookahead, the position of checking resets to the start again
(?=.*[a-z])    #Check that there is at least one lowercase letter in the string.
               #As this is a lookahead, the position of checking resets to the start again
(?=.*[A-Z])    #Check that there is at least one uppercase letter in the string.
               #As this is a lookahead, the position of checking resets to the start again
(?=.*[0-9])    #Check that there is at least one digit in the string.
               #As this is a lookahead, the position of checking resets to the start again
(?=.*\W)       #Check that there is at least one special character in the string.
               #As this is a lookahead, the position of checking resets to the start again
.*$            #Capture the entire string if all the lookahead conditions are met.
               #This is not required if only validation is needed

We can also use the non-greedy version of the above regex:

^(?=.{10,}$)(?=.*?[a-z])(?=.*?[A-Z])(?=.*?[0-9])(?=.*?\W).*$
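
A short sketch of how the first (greedy) pattern from this suggestion could be used for validation in Python. The function name is_valid and the sample passwords are invented for the example. Since \W is what counts as "special" here, an underscore does not qualify.

```python
import re

# Greedy pattern from above: one lookahead per requirement, all anchored at the start.
PASSWORD = re.compile(r'^(?=.{10,}$)(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).*$')


def is_valid(password):
    """True if password satisfies all five conditions."""
    return PASSWORD.match(password) is not None


print(is_valid('Abcdefgh1!'))  # True: 10 characters, all four classes present
print(is_valid('Abcdefg1!'))   # False: only 9 characters
print(is_valid('Abcdefgh12'))  # False: no special character
```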

Suggestion : 7

The \s (lowercase s) matches a whitespace character (blank, tab \t, and newline \r or \n). On the other hand, the \S+ (uppercase S) matches anything that is NOT matched by \s, i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D for non-digit.

The @ matches itself. In regex, all characters other than those having special meanings match themselves, e.g., a matches a, b matches b, etc.

\s (space) matches any single whitespace character (same as [ \t\n\r\f]: blank, tab, newline, carriage return and form feed). The uppercase counterpart \S (non-space) matches any single character that is not matched by \s (same as [^ \t\n\r\f]).
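
The lowercase/uppercase pairs can be seen side by side in a few lines of Python. This is just an illustrative sketch: the first sample string is the one used in the Java example below, and the other two are made up.

```python
import re

text = 'abc00123xyz456_0'

digits = re.findall(r'\d+', text)       # runs of digits
non_digits = re.findall(r'\D+', text)   # runs of everything else
print(digits)      # ['00123', '456', '0']
print(non_digits)  # ['abc', 'xyz', '_']

# \S+ splits on whitespace of any kind (blank, tab, newline)
tokens = re.findall(r'\S+', 'one two\tthree\nfour')
print(tokens)      # ['one', 'two', 'three', 'four']

# \W+ picks out the non-word characters; @ and . are literal non-word characters
separators = re.findall(r'\W+', 'user@example.com')
print(separators)  # ['@', '.']
```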

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexNumbers {
   public static void main(String[] args) {

      String inputStr = "abc00123xyz456_0"; // Input String for matching
      String regexStr = "[0-9]+"; // Regex to be matched

      // Step 1: Compile a regex via static method Pattern.compile(), default is case-sensitive
      Pattern pattern = Pattern.compile(regexStr);
      // Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // for case-insensitive matching

      // Step 2: Allocate a matching engine from the compiled regex pattern,
      //         and bind to the input string
      Matcher matcher = pattern.matcher(inputStr);

      // Step 3: Perform matching and Process the matching results
      // Try Matcher.find(), which finds the next match
      while (matcher.find()) {
         System.out.println("find() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      }

      // Try Matcher.matches(), which tries to match the ENTIRE input (^...$)
      if (matcher.matches()) {
         System.out.println("matches() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      } else {
         System.out.println("matches() found nothing");
      }

      // Try Matcher.lookingAt(), which tries to match from the START of the input (^...)
      if (matcher.lookingAt()) {
         System.out.println("lookingAt() found substring \"" + matcher.group() +
            "\" starting at index " + matcher.start() +
            " and ending at index " + matcher.end());
      } else {
         System.out.println("lookingAt() found nothing");
      }

      // Try Matcher.replaceFirst(), which replaces the first match
      String replacementStr = "**";
      String outputStr = matcher.replaceFirst(replacementStr); // first match only
      System.out.println(outputStr);

      // Try Matcher.replaceAll(), which replaces all matches
      replacementStr = "++";
      outputStr = matcher.replaceAll(replacementStr); // all matches
      System.out.println(outputStr);
   }
}

#!/usr/bin/env perl

use strict;
use warnings;

my $inStr = 'abc00123xyz456_0';  # input string
my $regex = '[0-9]+';            # regex pattern string in non-interpolating string

# Try match /regex/modifiers (or m/regex/modifiers)
my @matches = ($inStr =~ /$regex/g);  # Match $inStr with regex with global modifier
                                      # Store all matches in an array
print "@matches\n";  # Output: 00123 456 0

while ($inStr =~ /$regex/g) {
   # The built-in array variables @- and @+ keep the start and end positions
   # of the matches, where $-[0] and $+[0] is the full match, and
   # $-[n] and $+[n] for back references $1, $2, etc.
   print substr($inStr, $-[0], $+[0] - $-[0]), ', ';  # Output: 00123, 456, 0,
}
print "\n";

# Try substitute s/regex/replacement/modifiers
$inStr =~ s/$regex/**/g;  # with global modifier
print "$inStr\n";  # Output: abc**xyz**_**

<!DOCTYPE html>
<!-- JSRegexNumbers.html -->
<html lang="en">
<head>
<meta charset="utf-8">
<title>JavaScript Example: Regex</title>
<script>
var inStr = "abc123xyz456_7_00";

// Use RegExp.test(inStr) to check if inStr contains the pattern
console.log(/[0-9]+/.test(inStr));  // true

// Use String.search(regex) to check if the string contains the pattern
// Returns the start position of the matched substring or -1 if there is no match
console.log(inStr.search(/[0-9]+/));  // 3

// Use String.match() or RegExp.exec() to find the matched substring,
//   back references, and string index
console.log(inStr.match(/[0-9]+/));  // ["123", input:"abc123xyz456_7_00", index:3, length:"1"]
console.log(/[0-9]+/.exec(inStr));   // ["123", input:"abc123xyz456_7_00", index:3, length:"1"]

// With g (global) option
console.log(inStr.match(/[0-9]+/g));  // ["123", "456", "7", "00", length:4]

// RegExp.exec() with g flag can be issued repeatedly.
// Search resumes after the last-found position (maintained in property RegExp.lastIndex).
var pattern = /[0-9]+/g;
var result;
while (result = pattern.exec(inStr)) {
   console.log(result);
   console.log(pattern.lastIndex);
      // ["123"],  6
      // ["456"], 12
      // ["7"],   14
      // ["00"],  17
}

// String.replace(regex, replacement):
console.log(inStr.replace(/\d+/, "**"));   // abc**xyz456_7_00
console.log(inStr.replace(/\d+/g, "**"));  // abc**xyz**_**_**
</script>
</head>
<body>
  <h1>Hello,</h1>
</body>
</html>