String parsing using Python?

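stringlist = '[ "A","B","C" , " D"]'
# Splitting on commas alone leaves the brackets and quotes in the pieces
print(stringlist.split(","))

Output :

['[ "A"', '"B"', '"C" ', ' " D"]']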
stringlist = '[ "A","B","C" , " D"]'
['A', 'B', 'C', ' D']
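
The call that turns the string into the list shown above is not included on this page. One way to get that result, assuming the standard json module is acceptable, is json.loads(). This is a minimal sketch, not necessarily what the original answer used; the `cleaned` step is an extra illustration.

```python
import json

stringlist = '[ "A","B","C" , " D"]'

# json.loads() works here because the string happens to be valid JSON
# (square brackets, double-quoted strings).
parsed = json.loads(stringlist)
print(parsed)

# Optional clean-up (an addition for illustration): strip the stray
# space inside " D".
cleaned = [item.strip() for item in parsed]
print(cleaned)
```

Note that json.loads() would fail on a list written with single-quoted items, since JSON requires double quotes.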

Another way to solve this specific problem is the ast module. The ast.literal_eval() function takes a string containing a Python literal (a string, number, tuple, list, dictionary, set, boolean, or None), parses it, and returns the corresponding object. In our case the string represents a list, so ast.literal_eval() returns an actual list. The following code snippet shows how to parse a string representation of a list into a list with ast.literal_eval().

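import ast

stringlist = '[ "A","B","C" , " D"]'
# Parse the string as a Python literal; the result is a real list
print(ast.literal_eval(stringlist))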
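
Because ast.literal_eval() accepts only literals, it also handles other container types and refuses anything that is not a literal. A short sketch of that behaviour; the input strings below are hypothetical and not from the original answer.

```python
import ast

# Hypothetical inputs chosen for illustration
as_tuple = ast.literal_eval('("x", 1, 2.5)')
as_dict = ast.literal_eval('{"a": [1, 2]}')
print(as_tuple, as_dict)

# A function call is not a literal, so literal_eval() raises ValueError
# instead of executing it.
rejected = False
try:
    ast.literal_eval('len("abc")')
except ValueError:
    rejected = True
print("rejected:", rejected)
```

This is the usual reason to prefer it over eval() when the text comes from outside the program.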

Suggestion : 2

Last Updated : 08 Jul, 2022

Output :

['geeks', 'for', 'geeks']
['geeks', ' for', ' geeks']
['geeks', 'for', 'geeks']
['Ca', 'Ba', 'Sa', 'Fa', 'Or']
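
The code behind this output is missing from the page. The input strings below are assumptions chosen so that str.split() reproduces the four lists; the original example may have used different strings.

```python
# Assumed inputs (not taken from the original page)
text = 'geeks for geeks'
out1 = text.split()        # no separator: split on runs of whitespace

word = 'geeks, for, geeks'
out2 = word.split(',')     # the space after each comma stays in the items

word = 'geeks:for:geeks'
out3 = word.split(':')

word = 'CatBatSatFatOr'
out4 = word.split('t')     # the separator can be any substring

for result in (out1, out2, out3, out4):
    print(result)
```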

Suggestion : 3

The split() method takes a maximum of 2 parameters, separator and maxsplit, both optional.


text = 'Python is a fun programming language'

# split the text from space
print(text.split(' '))

# Output: ['Python', 'is', 'a', 'fun', 'programming', 'language']

The syntax of split() is:

str.split(separator, maxsplit)
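
Both arguments are optional, and leaving out the separator is not the same as passing a single space. A minimal sketch of the difference, using a made-up string with a double space:

```python
text = 'Python  is fun'   # hypothetical input; note the two spaces

# No separator: runs of whitespace count as one delimiter
default_split = text.split()

# Explicit ' ': every single space is a delimiter, so an empty string appears
space_split = text.split(' ')

# None as separator keeps the whitespace behaviour while limiting the splits
limited = text.split(None, 1)

print(default_split)
print(space_split)
print(limited)
```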

Example 1: How split() works in Python?

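text = 'Love thy neighbor'

# splits at space
print(text.split())

grocery = 'Milk, Chicken, Bread'

# splits at ','
print(grocery.split(', '))

# splits at ':' (there is no ':' in the string, so it comes back as one item)
print(grocery.split(':'))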

Example 2: How split() works when maxsplit is specified?

grocery = 'Milk, Chicken, Bread, Butter'

# maxsplit: 2
print(grocery.split(', ', 2))

# maxsplit: 1
print(grocery.split(', ', 1))

# maxsplit: 5
print(grocery.split(', ', 5))

# maxsplit: 0
print(grocery.split(', ', 0))
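
The results of the four calls above can be written out as a small checkable sketch. The rsplit() line is an addition for contrast and is not part of the original example.

```python
grocery = 'Milk, Chicken, Bread, Butter'

two = grocery.split(', ', 2)    # at most 2 splits, so 3 items
one = grocery.split(', ', 1)    # at most 1 split, so 2 items
five = grocery.split(', ', 5)   # more than needed: splits everywhere
zero = grocery.split(', ', 0)   # no splits: the whole string in a list

# rsplit() counts its splits from the right-hand end instead
from_right = grocery.rsplit(', ', 1)

for result in (two, one, five, zero, from_right):
    print(result)
```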

Suggestion : 4

The parse module parses strings using a specification based on the Python format() syntax. The conversion of fields to types other than strings is done based on the type in the format specification, which mirrors the format() behaviour. There are no “!” field conversions like format() has.

The module is set up to only export parse(), search(), findall(), and with_pattern() when import * is used:

>>> from parse import *

From there it’s a simple thing to parse a string:

>>> parse("It's {}, I love it!", "It's spam, I love it!")
<Result ('spam',) {}>
>>> _[0]
'spam'

Or to search a string for some pattern:

>>> search('Age: {:d}\n', 'Name: Rufus\nAge: 42\nColor: red\n')
<Result (42,) {}>

If you’re going to use the same pattern to match lots of strings you can compile it once:

>>> from parse import compile
>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
>>> p.parse("It's spam, I love it!")
<Result ('spam',) {}>

The default behaviour is to match strings case insensitively. You may match with case by specifying case_sensitive=True:

>>> parse('SPAM', 'spam', case_sensitive=True) is None
True

A basic version of the Format String Syntax is supported with anonymous (fixed-position), named and formatted fields:

   {[field name]:[format spec]}

Some simple parse() format string examples:

>>> parse("Bring me a {}", "Bring me a shrubbery")
<Result ('shrubbery',) {}>
>>> r = parse("The {} who {} {}", "The knights who say Ni!")
>>> print(r)
<Result ('knights', 'say', 'Ni!') {}>
>>> print(r.fixed)
('knights', 'say', 'Ni!')
>>> print(r[0])
knights
>>> print(r[1:])
('say', 'Ni!')
>>> r = parse("Bring out the holy {item}", "Bring out the holy hand grenade")
>>> print(r)
<Result () {'item': 'hand grenade'}>
>>> print(r.named)
{'item': 'hand grenade'}
>>> print(r['item'])
hand grenade
>>> 'item' in r
True

Dotted names and indexes are possible with some limits. Only word identifiers are supported (i.e. no numeric indexes), and the application must make additional sense of the result:

>>> r = parse("Mmm, {food.type}, I love it!", "Mmm, spam, I love it!")
>>> print(r)
<Result () {'food.type': 'spam'}>
>>> print(r.named)
{'food.type': 'spam'}
>>> print(r['food.type'])
spam
>>> r = parse("My quest is {quest[name]}", "My quest is to seek the holy grail!")
>>> print(r)
<Result () {'quest': {'name': 'to seek the holy grail!'}}>
>>> print(r['quest'])
{'name': 'to seek the holy grail!'}
>>> print(r['quest']['name'])
to seek the holy grail!

Some examples of typed parsing with None returned if the typing does not match:

>>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
<Result (3, 'weapons') {}>
>>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

And messing about with alignment:

>>> parse('with {:>} herring', 'with     a herring')
<Result ('a',) {}>
>>> parse('spam {:^} spam', 'spam    lovely     spam')
<Result ('lovely',) {}>

Width and precision may be used to restrict the size of matched text from the input. Width specifies a minimum size and precision specifies a maximum. For example:

>>> parse('{:.2}{:.2}', 'look')           # specifying precision
<Result ('lo', 'ok') {}>
>>> parse('{:4}{:4}', 'look at that')     # specifying width
<Result ('look', 'at that') {}>
>>> parse('{:4}{:.4}', 'look at that')    # specifying both
<Result ('look at ', 'that') {}>
>>> parse('{:2d}{:2d}', '0440')           # parsing two contiguous numbers
<Result (4, 40) {}>

Your custom type conversions may override the builtin types if you supply one with the same identifier:

>>> def shouty(string):
...    return string.upper()
>>> parse('{:shouty} world', 'hello world', dict(shouty=shouty))
<Result ('HELLO',) {}>

If the type converter has the optional pattern attribute, it is used as a regular expression for better pattern matching (instead of the default one):

>>> def parse_number(text):
...    return int(text)
>>> parse_number.pattern = r'\d+'
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>
>>> _ = parse('Answer: {:Number}', 'Answer: Alice', dict(Number=parse_number))
>>> assert _ is None, "MISMATCH"

You can also use the with_pattern(pattern) decorator to add this information to a type converter function:

>>> from parse import with_pattern
>>> @with_pattern(r'\d+')
... def parse_number(text):
...    return int(text)
>>> parse('Answer: {number:Number}', 'Answer: 42', dict(Number=parse_number))
<Result () {'number': 42}>

If the type converter pattern uses regex grouping (with parentheses), you should indicate this by using the optional regex_group_count parameter in the with_pattern() decorator:

>>> @with_pattern(r'((\d+))', regex_group_count=2)
... def parse_number2(text):
...    return int(text)
>>> parse('Answer: {:Number2} {:Number2}', 'Answer: 42 43', dict(Number2=parse_number2))
<Result (42, 43) {}>

parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:

>>> pattern = '{dir1}/{dir2}'
>>> data = 'root/parent/subdir'
>>> sorted(parse(pattern, data).named.items())
[('dir1', 'root'), ('dir2', 'parent/subdir')]
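
The same left-to-right, shortest-match result can be imitated with non-greedy groups in the standard re module. This is only an analogy for the behaviour described above, using hand-written patterns; it is not a description of how parse is implemented.

```python
import re

data = 'root/parent/subdir'

# Non-greedy groups: each field takes as little text as it can, so the
# first '/' ends dir1 and dir2 absorbs the rest to complete the match.
shortest = re.fullmatch(r'(?P<dir1>.+?)/(?P<dir2>.+?)', data).groupdict()

# Greedy groups, for contrast: dir1 grabs as much as possible.
greedy = re.fullmatch(r'(?P<dir1>.+)/(?P<dir2>.+)', data).groupdict()

print(sorted(shortest.items()))
print(sorted(greedy.items()))
```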