regex - Python regular expression to remove matching brackets from a file
I have a LaTeX file with a lot of text marked with \red{}, but there may also be brackets inside the \red{}, like \red{here \underline{underlined} text}. I want to remove the red color, and after some googling I wrote this Python script:
    import re
    import sys

    # Run from a terminal:  python redremover.py filename
    # sys.argv[1] holds the file name
    ifn = sys.argv[1]

    # Read the whole file content into the string c
    with open(ifn, "r") as f:
        c = f.read()

    # Remove all occurrences of \red{...} in c
    c = re.sub(r'\\red\{(?:[^\}|]*\|)?([^\}|]*)\}', r'\1', c)

    # Write c to a new file
    with open("redremoved_" + ifn, "w") as nf:
        nf.write(c)
But it converts

    \red{here \underline{underlined} text}

to

    here \underline{underlined text}

which is not what I want. I want

    here \underline{underlined} text
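To see why the output is cut at the wrong brace, here is a minimal reproduction with the standard `re` module. The character class `[^\}|]*` cannot cross a `}`, so the match ends at the brace after "underlined". For contrast, the sketch also tries a pattern that allows exactly one level of inner `{...}`. That pattern works for this example but fails on deeper nesting. The variable names are illustrative only.

```python
import re

text = r"\red{here \underline{underlined} text}"

# Pattern from the question: [^\}|]* cannot cross a "}", so the match
# stops at the first closing brace, the one after "underlined".
question_pattern = r'\\red\{(?:[^\}|]*\|)?([^\}|]*)\}'
m = re.search(question_pattern, text)
print(m.group(0))                              # \red{here \underline{underlined}
broken = re.sub(question_pattern, r'\1', text)
print(broken)                                  # here \underline{underlined text}

# Allowing exactly one level of inner {...} fixes this example, but a
# plain re pattern can only be written for a fixed maximum depth.
one_level_pattern = r'\\red\{((?:[^{}]|\{[^{}]*\})*)\}'
one_level = re.sub(one_level_pattern, r'\1', text)
print(one_level)                               # here \underline{underlined} text

# Two levels of nesting: no match, so the text comes back unchanged.
deeper = r"\red{a \textbf{b \emph{c} d} e}"
print(re.sub(one_level_pattern, r'\1', deeper) == deeper)  # True
```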
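You can't match an arbitrary level of nested brackets with the re module, since it doesn't support recursion. To solve that, you can use the third-party regex module (install it with pip install regex), which supports recursive patterns: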
    import regex

    c = r'\red{here \underline{underlined} text}'
    c = regex.sub(r'\\red(\{((?>[^{}]+|(?1))*)\})', r'\2', c)
    print(c)  # here \underline{underlined} text
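where (?1) is a recursive call to capture group 1 (the outer {...}), so the pattern matches balanced braces at any depth, and \2 is the text between the outer braces.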
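If installing the regex module is not an option, a small brace-counting scanner using only the standard library can do the same job. This is a sketch, not the answer's method. `strip_command` and `main` are hypothetical names. It assumes UTF-8 input and an argument with no escaped braces (`\{`, `\}`). Unlike a single `regex.sub` pass, it also strips a `\red{}` nested inside another `\red{}`.

```python
import sys


def strip_command(text: str, name: str = "red") -> str:
    r"""Replace every \name{...} with its contents, keeping inner braces.

    Braces are matched by counting depth, so any nesting level works.
    Assumption: escaped braces (\{ and \}) inside the argument are not
    handled and would throw off the depth count.
    """
    marker = "\\" + name + "{"
    out = []
    i = 0
    while True:
        start = text.find(marker, i)
        if start == -1:
            out.append(text[i:])
            break
        out.append(text[i:start])
        j = start + len(marker)
        depth = 1
        # Walk forward until the brace that closes \name{ is consumed.
        while j < len(text) and depth:
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
            j += 1
        if depth:
            # Unbalanced braces: leave the rest of the text untouched.
            out.append(text[start:])
            break
        # Argument without its outer braces; recurse so that a nested
        # \name{...} inside it is stripped as well.
        out.append(strip_command(text[start + len(marker):j - 1], name))
        i = j
    return "".join(out)


def main(argv: list[str]) -> None:
    # Same usage as the question's script: python redremover.py filename
    if len(argv) != 2:
        print("usage: python redremover.py filename", file=sys.stderr)
        return
    ifn = argv[1]
    with open(ifn, "r", encoding="utf-8") as f:  # assumes UTF-8 input
        content = f.read()
    with open("redremoved_" + ifn, "w", encoding="utf-8") as nf:
        nf.write(strip_command(content))


if __name__ == "__main__":
    main(sys.argv)
```

Counting braces by hand keeps the logic in plain Python at the cost of a few more lines than the one-line recursive pattern.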