python - re.sub() not working as I expect


I have the string given below.

appcodename: mozilla<br>appversion: 5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, gecko) ubuntu chromium/41.0.2272.76 chrome/41.0.2272.76 safari/537.36<br> 

I want to extract mozilla from the above string. I use the following Python program.

import re
import json

with open('data.txt', 'rb') as f:
    data = json.load(f)
    message = data['message']
    appcodename = re.sub('.+appcodename: ([^<br>])(.*)',r'\1',message,1)
    print('appcode name {}'.format(appcodename))

The output is:

appcode name m 

What is wrong with my regex?

The problem with your regex is twofold:

  1. You are using a negated character class, [^<br>], which matches any single character except <, b, r and > (their order is irrelevant). This does not cause a problem in this particular case, but it is not advised to use a negated class to prevent matches of a specific sequence of characters.

  2. You want ([^<br>]) to match mozilla, but it can match only 1 character, while mozilla is several characters long.
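Both points can be checked in isolation. The snippet below is a minimal sketch; the shortened input strings are made up for illustration and are not the data from the question.

```python
import re

# Shortened stand-in for the message in the question (assumption).
sample = 'appcodename: mozilla<br>appversion: 5.0<br>'

# Point 2: without a quantifier the group captures exactly one character.
one_char = re.search(r'appcodename: ([^<br>])', sample).group(1)
print(one_char)  # m

# Point 1: the class excludes the single characters <, b, r and >,
# so a value containing r is cut short even though no <br> precedes it.
cut_short = re.search(r'name: ([^<br>]+)', 'name: chrome<br>').group(1)
print(cut_short)  # ch
```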

A quick & dirty fix:

appcodename = re.sub('.*appcodename: ([^<br>]+)(.*)',r'\1',message,1) 

.* allows a match even if the string begins with appcodename, and ([^<br>]+) allows matching of more than 1 character.
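As a rough check of the quick fix, the sketch below runs it on two hypothetical inputs, one that starts with the key and one with text before it, and compares it with the original pattern.

```python
import re

pattern = '.*appcodename: ([^<br>]+)(.*)'

# Hypothetical inputs: one starts with the key, one has text before it.
at_start = 'appcodename: mozilla<br>appversion: 5.0<br>'
with_prefix = 'useragent<br>appcodename: mozilla<br>appversion: 5.0<br>'

fixed_at_start = re.sub(pattern, r'\1', at_start, count=1)
fixed_with_prefix = re.sub(pattern, r'\1', with_prefix, count=1)
print(fixed_at_start, fixed_with_prefix)  # mozilla mozilla

# The original '.+' needs at least one character before the key, so on
# at_start nothing matches and the string comes back unchanged.
unchanged = re.sub('.+appcodename: ([^<br>])(.*)', r'\1', at_start, count=1)
print(unchanged == at_start)  # True
```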

As mentioned above, the negated character class is not advised. Thus, the next step is to make the above better:

appcodename = re.sub(r'.*appcodename: ((?:(?!<br>).)+).*',r'\1',message,1) 

(?:(?!<br>).)+ is a bit slow (it uses a negative lookahead, (?! ... )), but it matches any number of characters as long as <br> does not occur within them. It checks each character, one at a time, and makes sure no <br> starts at that character before attempting to match it. Also, writing the regex as a raw string is advised to avoid unexpected behaviour.
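The difference between the two patterns only shows up when the value contains one of the excluded letters. The sketch below assumes a made-up value, firebird, to illustrate this, and also shows why the raw-string prefix matters for \1.

```python
import re

# Hypothetical value that contains the letters r and b.
message = 'appcodename: firebird<br>appversion: 5.0<br>'

negated = re.sub(r'.*appcodename: ([^<br>]+)(.*)', r'\1', message, count=1)
tempered = re.sub(r'.*appcodename: ((?:(?!<br>).)+).*', r'\1', message, count=1)

print(negated)   # fi
print(tempered)  # firebird

# Without the r prefix, '\1' is the single control character '\x01',
# not a backreference to group 1.
print(len('\1'), len(r'\1'))  # 1 2
```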

Finally, replacing everything before and after the value is not practical; matching it directly makes things simpler:

appcodename = re.search(r'appcodename: ((?:(?!<br>).)+)', message).group(1) 

At this point, you might use this instead, which does not use a negative lookahead and is simpler to read, I believe:

appcodename = re.search(r'appcodename: (.+?)<br>', message).group(1) 
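One thing to keep in mind: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on input without the key. A minimal sketch of a guarded version follows; the helper name extract_appcodename is hypothetical.

```python
import re

def extract_appcodename(message):
    """Return the appcodename value, or None if the key is missing."""
    match = re.search(r'appcodename: (.+?)<br>', message)
    return match.group(1) if match else None

print(extract_appcodename('appcodename: mozilla<br>appversion: 5.0<br>'))  # mozilla
print(extract_appcodename('appversion: 5.0<br>'))  # None
```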
