what about this solution in pseudo python:
def classify(journaltext):
prio_list = ["FOO", "BAR", "UPS", ...] # "..."
is a placeholder: you have to give the full list here.
# dictionary:
# - key is the name of the category, must match the name in the above prio_list
# - value is the regex that identifies the category
matchers = {
"FOO": "the regex for FOO",
"BAR": "the regex for BAR",
"UPS": "...",
...
}
for category in prio_list:
if re.match(matchers[category], journaltext):
return category
return "UNKOWN"
# or you can "return None"
Without any kind of extra fluff:
categories = [
('cat1', ['foo']),
('cat2', ['football']),
('cat3', ['abc', 'aba', 'bca'])
]
def classify(text):
for category, matches in categories:
if any(match in text
for match in matches):
return category
return None
Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times.,We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to operate on strings, we’ll begin with the most common task: matching characters.,Up to this point, we’ve simply performed searches against a static string. Regular expressions are also commonly used to modify strings in various ways, using the following pattern methods:,Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
. ^ $ * + ? {} []\ | ()
>>>
import re
>>>
p = re.compile('ab*') >>>
p
re.compile('ab*')
>>> p = re.compile('ab*', re.IGNORECASE)
>>>
import re
>>>
p = re.compile('[a-z]+') >>>
p
re.compile('[a-z]+')
>>> p.match("") >>>
print(p.match(""))
None
>>> m = p.match('tempo')
>>> m
<re.Match object; span=(0, 5), match='tempo'>
Last Updated : 10 Mar, 2022
Output:
List: ['GeeksForGeeks', 'VenD', 5, 9.2]