multiple matches using file's startswith

You can use glob to find all your files:

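from glob import glob

path = "path_to/"
# [158] matches one leading character: 1, 5 or 8.
# Commas inside the brackets would also match a literal comma.
files = glob(path + "[158]*")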
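To try the character-class pattern without touching real data, here is a minimal sketch that creates a few empty files in a temporary directory and globs for names starting with 1, 5 or 8. The file names are made up for illustration.

```python
import os
import tempfile
from glob import glob

# Hypothetical file names, created in a throwaway directory for the demo.
names = ["1001_filename", "5002_filename", "8003_filename", "0000_filename"]

with tempfile.TemporaryDirectory() as path:
    for name in names:
        open(os.path.join(path, name), "w").close()

    # [158] is a character class: exactly one character that is 1, 5 or 8.
    # Commas are not separators inside the brackets, so none are needed.
    found = glob(os.path.join(path, "[158]*"))
    matches = sorted(os.path.basename(p) for p in found)

print(matches)  # ['1001_filename', '5002_filename', '8003_filename']
```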

You don't need a regex here. I suggest the usage of plain str.startswith with a tuple of accepted prefixes (tuple prefix accepted since Python 2.5) while iterating over your files. Here's a small demo:

>>> start_list = ('1', '6', '8')
>>> file_list = ['1001_filename', '1004_filename', '0000_filename']
>>> for filename in file_list:
...     if filename.startswith(start_list):
...         print(filename)
...
1001_filename
1004_filename

You can use list comprehension based on timgeb's answer.

start_list = ['1', '6', '8']
file_list = ['1001_filename', '1004_filename', '0000_filename']
c = [
    filename
    for filename in file_list
    if any(filename.startswith(start) for start in start_list)
]
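
Because str.startswith accepts a tuple of prefixes, the any(...) generator can be dropped by converting the list to a tuple first. A small sketch with the same sample data; it also shows that passing the list itself is rejected with a TypeError. The names prefixes and list_accepted are introduced here for illustration.

```python
start_list = ['1', '6', '8']
file_list = ['1001_filename', '1004_filename', '0000_filename']

# startswith accepts a str or a tuple of str, so convert the list once.
prefixes = tuple(start_list)
c = [filename for filename in file_list if filename.startswith(prefixes)]
print(c)  # ['1001_filename', '1004_filename']

# Passing the list directly raises TypeError.
try:
    '1001_filename'.startswith(start_list)
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)  # False
```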

For the record, I agree that this question doesn't need a regex, but I do love regex, so here's how to do it with one:

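from re import match, escape

start_list = ['1', '6', '8']
file_list = ['1001_filename', '1004_filename', '0000_filename']

# Escape each prefix separately, then join them into one alternation.
pattern = '(?:%s)' % '|'.join(escape(start) for start in start_list)

# match() anchors at the start of the string and works on one string at a time.
print([filename for filename in file_list if match(pattern, filename)])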
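
One detail worth checking when building the alternation: re.escape has to be applied to each prefix before joining, since escaping the joined string also escapes the | separators. A sketch with a hypothetical prefix list that contains a regex metacharacter:

```python
import re

# Hypothetical prefixes; '1.' contains the regex metacharacter '.'.
start_list = ['1.', '6', '8']
file_list = ['1.01_filename', '1001_filename', '6000_filename', '0000_filename']

# Escape each prefix, then join with '|' so the alternation stays intact.
pattern = re.compile('(?:%s)' % '|'.join(re.escape(s) for s in start_list))
matched = [f for f in file_list if pattern.match(f)]
print(matched)  # ['1.01_filename', '6000_filename']

# Escaping after joining turns every '|' into a literal character,
# so the pattern only matches names starting with the text '1.|6|8'.
broken = re.compile(re.escape('|'.join(start_list)))
broken_matched = [f for f in file_list if broken.match(f)]
print(broken_matched)  # []
```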

Suggestion : 2

The startswith() method returns True if a string starts with the specified prefix and False otherwise. The prefix can also be a tuple of strings: if the string starts with any item of the tuple, startswith() returns True; if not, it returns False.

Example

message = 'Python is fun'

# check if the message starts with Python
print(message.startswith('Python'))

# Output: True

The syntax of startswith() is:

str.startswith(prefix[, start[, end]])

Example 1: startswith() Without start and end Parameters

text = "Python is easy to learn."

result = text.startswith('is easy')
# returns False
print(result)

result = text.startswith('Python is ')
# returns True
print(result)

result = text.startswith('Python is easy to learn.')
# returns True
print(result)

Example 2: startswith() With start and end Parameters

text = "Python programming is easy."

# start parameter: 7
# 'programming is easy.' string is searched
result = text.startswith('programming is', 7)
print(result)

# start: 7, end: 18
# 'programming' string is searched
result = text.startswith('programming is', 7, 18)
print(result)

result = text.startswith('program', 7, 18)
print(result)
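
One way to read the start and end parameters is as a slice applied before the test. The sketch below, which assumes in-range indices like the ones in Example 2, compares startswith(prefix, start, end) with slicing first and testing afterwards. The checks list and the results name are introduced here for illustration.

```python
text = "Python programming is easy."

# (prefix, start, end) triples taken from Example 2; None means "to the end".
checks = [
    ('programming is', 7, None),
    ('programming is', 7, 18),
    ('program', 7, 18),
]

results = []
for prefix, start, end in checks:
    direct = text.startswith(prefix, start, end)
    sliced = text[start:end].startswith(prefix)
    # For these in-range indices both forms agree.
    assert direct == sliced
    results.append(direct)

print(results)  # [True, False, True]
```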

Example 3: startswith() With Tuple Prefix

text = "programming is easy"
result = text.startswith(('python', 'programming'))

# prints True
print(result)

result = text.startswith(('is', 'easy', 'java'))

# prints False
print(result)

# With start and end parameter
# 'is easy' string is checked
result = text.startswith(('programming', 'easy'), 12, 19)

# prints False
print(result)

Suggestion : 3

Determines if entries of x start or end with string (entries of) prefix or suffix respectively, where strings are recycled to common lengths.

The result is a logical vector, of "common length" of x and prefix (or suffix), i.e., of the longer of the two lengths unless one of them is zero, when the result is also of zero length. A shorter input is recycled to the output length.

The code has an optimized branch for the most common usage in which prefix or suffix is of length one, and is further optimized in a UTF-8 or 8-bit locale if that is an ASCII string.

The equivalences listed under Usage below hold where prefix does not contain special regular expression characters (and, for grepl, where x does not contain missing values).

Usage

startsWith(x, prefix)
endsWith(x, suffix)

startsWith() is equivalent to but much faster than

  substring(x, 1, nchar(prefix)) == prefix

or also

  grepl("^<prefix>", x)
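
The same three-way equivalence can be sketched in Python (an analog for illustration, not R code): a slice comparison, str.startswith, and an anchored regex agree as long as the prefix has no special regular expression characters. The helper names by_slice and by_regex and the sample strings are made up.

```python
import re

def by_slice(x, prefix):
    # Analog of substring(x, 1, nchar(prefix)) == prefix
    return x[:len(prefix)] == prefix

def by_regex(x, prefix):
    # Analog of grepl("^<prefix>", x); the prefix is used as a raw pattern.
    return re.search('^' + prefix, x) is not None

# With a plain prefix, all three forms give the same answer.
samples = ['abc', 'xab', 'ab']
agree = all(
    by_slice(x, 'ab') == x.startswith('ab') == by_regex(x, 'ab')
    for x in samples
)
print(agree)  # True

# With a metacharacter in the prefix, the regex form diverges:
# '.' matches any character, so 'a.b' also matches the start of 'axb'.
literal = 'axb'.startswith('a.b')
as_pattern = by_regex('axb', 'a.b')
print(literal, as_pattern)  # False True
```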

Suggestion : 4

Determine if strings start with pattern.

Create a character vector that contains the name of a file. Determine if the name starts with different substrings.

Create a string array that contains file names. Determine which file names start with either abstract or data.

If pat is an array containing multiple patterns, then startsWith returns 1 if it finds that str starts with any element of pat.

str = ["abstract.docx","data.tar","code.m"; ...
       "data-analysis.ppt","results.ptx","summary.ppt"]

str = 2x3 string
    "abstract.docx"        "data.tar"       "code.m"
    "data-analysis.ppt"    "results.ptx"    "summary.ppt"

pat = "data";
TF = startsWith(str,pat)

TF = 2x3 logical array

   0   1   0
   1   0   0

str(TF)

ans = 2x1 string
    "data-analysis.ppt"
    "data.tar"

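For comparison, here is a rough Python analog of the example above (an illustration with nested lists, not MATLAB code). It builds the same logical mask and then selects the matching names column by column, which is why "data-analysis.ppt" comes before "data.tar" in the result. The variable names are assumptions made for this sketch.

```python
names = [
    ["abstract.docx", "data.tar", "code.m"],
    ["data-analysis.ppt", "results.ptx", "summary.ppt"],
]
pat = "data"

# Element-wise prefix test, same shape as the input (2 rows, 3 columns).
mask = [[name.startswith(pat) for name in row] for row in names]
print(mask)  # [[False, True, False], [True, False, False]]

# Logical indexing in the example walks the array column by column,
# so iterate over columns in the outer loop and rows in the inner loop.
n_rows, n_cols = len(names), len(names[0])
selected = [
    names[r][c]
    for c in range(n_cols)
    for r in range(n_rows)
    if mask[r][c]
]
print(selected)  # ['data-analysis.ppt', 'data.tar']
```
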
Suggestion : 5

Find in Files tab: Allows you to search and replace in multiple files with one action. The files used for the operation are specified by a directory. It can be invoked directly with Search > Find in Files or the keyboard shortcut Ctrl+Shift+F.

There are multiple methods to search (and replace) text in files. You can also mark search results with a bookmark on their lines, or highlight the textual results themselves. Generating a count of matches is also possible.

Find All in Current Document: Lists all the search results in a new Search results window; only searches the active document buffer.

Find All in All Opened Documents: Lists all the search results in a new Search results window; searches through all the file buffers currently open in Notepad++.

When regex “.*” is run against the text “abc”x :

“ matches “
.* matches abc”x
” cannot match $ (End of line) => Backtracking

“ matches “
.* matches abc”
” cannot match letter x => Backtracking

“ matches “
.* matches abc
” matches ” => 1 overall match “abc”

When regex “.*+”, with a possessive quantifier, is run against the text “abc”x :

“ matches “
.*+ matches abc”x (catches all remaining characters)
” cannot match $ (End of line)
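
The difference between the two traces can be reproduced with Python's re module, used here only as a stand-in for the editor's own regex engine. This is a sketch under two assumptions: native possessive quantifiers need Python 3.11 or later, so that part is guarded by a version check, and on any version the same behaviour is emulated with a lookahead plus backreference, which Python does not backtrack into once it has matched.

```python
import re
import sys

text = '“abc”x'

# Greedy .* first grabs everything, then backtracks until ” can match.
greedy = re.search('“.*”', text)
print(greedy.group(0))  # “abc”

# Emulated possessive .*: the lookahead captures the greedy match once,
# \1 consumes it, and the engine cannot give characters back afterwards.
emulated = re.search(r'“(?=(.*))\1”', text)
print(emulated)  # None

# Native possessive quantifier, available in Python 3.11 and later.
if sys.version_info >= (3, 11):
    possessive = re.search('“.*+”', text)
    print(possessive)  # None
```

The greedy pattern finds one match because it is allowed to give back the trailing ”x; the possessive form keeps everything it consumed, so the closing ” never gets a chance to match.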

For example, you get the following subexpression counter values:

# before     --------------branch-reset------------- after
/(?x) ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z )
#     1            2         2  3        2     3     4

Hit Replace All

[Final Data]
AS, AF, AFG, 004, Afghanistan
EU, AX, ALA, 248, Åland Islands
EU, AL, ALB, 008, Albania, People's Socialist Republic of
AF, DZ, DZA, 012, Algeria, People's Democratic Republic of
OC, AS, ASM, 016, American Samoa
EU, AD, AND, 020, Andorra, Principality of
AF, AO, AGO, 024, Angola, Republic of
NA, AI, AIA, 660, Anguilla
AN, AQ, ATA, 010, Antarctica (the territory South of 60 deg S)
NA, AG, ATG, 028, Antigua and Barbuda
SA, AR, ARG, 032, Argentina, Argentine Republic
AS, AM, ARM, 051, Armenia
NA, AW, ABW, 533, Aruba
OC, AU, AUS, 036, Australia, Commonwealth of

This Bb block can be represented by the symbolic regex, below ( Blank chars are ignored, for readability ) :

Sp ( Ac+ | R0 )* Ep