How to efficiently remove consecutive duplicate words or phrases in a string [duplicate]

Last Updated : 20 Jul, 2022

Examples: 

Input: aaaaabbbbbb
Output: ab

Input: geeksforgeeks
Output: geksforgeks

Input: aabccba
Output: abcba

The recursion tree for the string S = aabcca is shown below.  

        aabcca    S = aabcca
        /
       abcca      S = abcca
       /
      bcca        S = abcca
      /
     cca          S = abcca
     /
    ca            S = abca
   /
  a               S = abca (Output String)
 /
empty string
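
The recursion traced above can be sketched in Python roughly as follows. This is a minimal sketch rather than the original article's code; the function names `remove_consecutive_duplicates` and `remove_consecutive_duplicates_iter` are assumptions chosen for illustration. The second function shows an iterative variant built on `itertools.groupby`.

```python
from itertools import groupby


def remove_consecutive_duplicates(s: str) -> str:
    """Collapse each run of identical adjacent characters to one character."""
    # Base case: an empty or one-character string has no adjacent duplicates.
    if len(s) < 2:
        return s
    # If the first two characters match, discard the first and recurse.
    if s[0] == s[1]:
        return remove_consecutive_duplicates(s[1:])
    # Otherwise keep the first character and process the remainder.
    return s[0] + remove_consecutive_duplicates(s[1:])


def remove_consecutive_duplicates_iter(s: str) -> str:
    """Iterative variant: groupby yields one key per run of equal characters."""
    return "".join(key for key, _run in groupby(s))


if __name__ == "__main__":
    for text in ("geeksforgeeks", "aabcca"):
        print(remove_consecutive_duplicates(text))
```

The recursive version uses one stack frame per character, so very long strings can hit Python's recursion limit; the iterative variant avoids that.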

Output for the inputs geeksforgeeks and aabcca:

geksforgeks
abca

Suggestion : 2

I'd go for this creative method of looking for duplicates of growing length:

input = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"

def combine_words(input, length):
    combined_inputs = []
    if len(splitted_input) > 1:
        for i in range(len(input) - 1):
            # add the last word of the right-neighbour (overlapping) sequence (before it has expanded), which is the next word in the original sentence
            combined_inputs.append(input[i] + " " + last_word_of(splitted_input[i + 1], length))
    return combined_inputs, length + 1

def remove_duplicates(input, length):
    bool_broke = False  # this means we didn't find any duplicates here
    for i in range(len(input) - length):
        if input[i] == input[i + length]:  # found a duplicate piece of sentence!
            for j in range(0, length):  # remove the overlapping sequences in reverse order
                del input[i + length - j]
            bool_broke = True
            break  # break the for loop: its range no longer matches the length of the list after removing elements
    if bool_broke:
        return remove_duplicates(input, length)  # if we found a duplicate, look for another duplicate of the same length
    return input

def last_word_of(input, length):
    splitted = input.split(" ")
    if len(splitted) == 0:
        return input
    else:
        return splitted[length - 1]

# make a list of strings which represent every sequence of word_length adjacent words
splitted_input = input.split(" ")
word_length = 1
splitted_input, word_length = combine_words(splitted_input, word_length)

intermediate_output = False

while len(splitted_input) > 1:
    # look whether two sequences of length n (with distance n apart) are equal. If so, remove the n overlapping sequences
    splitted_input = remove_duplicates(splitted_input, word_length)
    splitted_input, word_length = combine_words(splitted_input, word_length)  # make even bigger sequences
    if intermediate_output:
        print(splitted_input)
        print(word_length)
output = splitted_input[0]  # in the end you have a list of length 1, with all possible lengths of repetitive words removed
print(output)

which outputs the fluent sentence:

what type of people were most likely to be able to be 1.35 ?

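In this approach the word order is maintained on Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order; older versions do not guarantee it. Note that it drops every repeated word, not only consecutive repeats: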

String = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
unique_words = dict.fromkeys(String.split())
print(' '.join(unique_words))
# what type of people were most likely to be able 1.35 ?
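
To make the difference between "all repeats" and "consecutive repeats" concrete, here is a small sketch contrasting `dict.fromkeys` with `itertools.groupby`. The function names `drop_all_repeats` and `drop_adjacent_repeats` and the sample sentence are assumptions made for illustration, not part of the original answer.

```python
from itertools import groupby


def drop_all_repeats(text: str) -> str:
    # dict.fromkeys keeps only the first occurrence of every word, in order.
    return " ".join(dict.fromkeys(text.split()))


def drop_adjacent_repeats(text: str) -> str:
    # groupby yields one key per run of identical neighbouring words.
    return " ".join(word for word, _run in groupby(text.split()))


if __name__ == "__main__":
    sample = "to be or not to to be"  # hypothetical sample input
    print(drop_all_repeats(sample))
    print(drop_adjacent_repeats(sample))
```

Neither function handles repeated multi-word phrases such as "to be able to be able"; both work one word at a time.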

Suggestion : 3

I have a string which has recurring phrases, or it might even be a single word which occurs multiple times in a row. I tried various methods but was unable to find a better approach that is both time- and space-efficient.

Even though it is not the desired output, I don't see how it would recognize that it should remove "to be" (of length 2), which occurred 3 places earlier.

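  1. groupby()
  2. re

from itertools import groupby
import re

String = "what type of people were most likely to be able to be able to be able to be able to be 1.35 ?"
# Attempt 1: collapse runs of identical adjacent words
s1 = " ".join([k for k, v in groupby(String.replace("</Sent>", "").split())])
# Attempt 2: collapse a phrase that is followed by one or more copies of itself
s2 = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', String)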
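
One possible tweak to the regex attempt, offered as a sketch rather than a definitive fix: a greedy `(.+)` can latch onto the longest repeated block first, which may leave one repeat behind, whereas a lazy `(.+?)` looks for the shortest phrase that is immediately repeated. The helper name `collapse_repeated_phrases` is an assumption for illustration.

```python
import re

# Lazy group (.+?) tries the shortest phrase that is immediately repeated;
# (?:\s+\1\b)+ then consumes every further adjacent copy of that phrase.
_REPEATED_PHRASE = re.compile(r"\b(.+?)(?:\s+\1\b)+")


def collapse_repeated_phrases(text: str) -> str:
    """Replace each run of an adjacent repeated word or phrase with one copy."""
    return _REPEATED_PHRASE.sub(r"\1", text)


if __name__ == "__main__":
    sentence = ("what type of people were most likely to be able to be able "
                "to be able to be able to be 1.35 ?")
    print(collapse_repeated_phrases(sentence))
```

Backreference patterns like this rely on backtracking, so they may be slow on long inputs; it is worth measuring on realistic data before relying on them.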


Suggestion : 4

In this program, we need to find the duplicate words present in the string and display them. To find the duplicate words, we first split the string into words and then count the occurrences of each word. If a match is found, increment the count by 1 and set the duplicate occurrence to '0' to avoid counting it again. After the inner loop, a count greater than 1 signifies that the word has duplicates in the string.
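
The description above uses nested loops and marks matched words with '0'. As a shorter sketch of the same idea, the example below swaps in `collections.Counter` for the counting step. The function name `find_duplicate_words`, the lower-casing step, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def find_duplicate_words(text: str) -> list:
    # Lower-case first so "Big" and "big" count as the same word (an assumption).
    words = text.lower().split()
    counts = Counter(words)
    # Counter keeps first-seen order, so duplicates come out in reading order.
    return [word for word, count in counts.items() if count > 1]


if __name__ == "__main__":
    # Hypothetical sample sentence chosen for illustration.
    sample = "Big black bug bit a big black dog on his big black nose"
    print("Duplicate words in a given string:")
    for word in find_duplicate_words(sample):
        print(word)
```

Counting with a hash map takes a single pass over the words, whereas the nested-loop version compares every pair of words.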

Duplicate words in a given string:
big
black