python - re.sub() not working as I expect -

I have the string given below.

appcodename: mozilla<br>appversion: 5.0 (x11; linux x86_64) applewebkit/537.36 (khtml, gecko) ubuntu chromium/41.0.2272.76 chrome/41.0.2272.76 safari/537.36<br>

I want to extract mozilla from the above string. I use the following Python program.

import re
import json

with open('data.txt', 'rb') as f:
    data = json.load(f)
    message = data['message']
    appcodename = re.sub('.+appcodename: ([^<br>])(.*)', r'\1', message, 1)
    print('appcode name {}'.format(appcodename))

The output:

appcode name m

What is wrong with my regex?

The problem with your regex is twofold:

You are using a negated character class, [^<br>], which matches any single character except <, b, r and > (their order is irrelevant). That does not cause a problem in this particular case, but it is not advised to use a negated class to prevent matches of a specific sequence of characters.
You want ([^<br>]), which can match only 1 character, to match mozilla, which is several characters long.
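Both points can be checked directly. The snippet below is a minimal sketch; the test strings are made up for illustration and are not taken from the question's data.txt.

```python
import re

# A character class is a set of single characters, so order does not matter:
# [^<br>] and [^>rb<] exclude exactly the same four characters.
probe = "abr<>xyz"  # hypothetical probe string
same_set = re.findall(r'[^<br>]', probe) == re.findall(r'[^>rb<]', probe)
print(same_set)

# Without a quantifier, the group captures exactly one character.
one_char = re.search(r'appcodename: ([^<br>])', "appcodename: mozilla<br>").group(1)
print(one_char)
```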

A quick & dirty fix:

appcodename = re.sub('.*appcodename: ([^<br>]+)(.*)',r'\1',message,1)

.* allows a match even if the string begins with appcodename, and ([^<br>]+) allows matching more than 1 character.
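To see the effect of both changes, here is a small sketch that runs the original and the fixed pattern on a shortened version of the message. The shortened message is an assumption; the real one comes from data.txt.

```python
import re

# Shortened, hypothetical message that begins directly with "appcodename".
message = "appcodename: mozilla<br>appversion: 5.0<br>"

# Original pattern: .+ needs at least one character before "appcodename",
# so nothing matches here and re.sub returns the input unchanged.
original_result = re.sub('.+appcodename: ([^<br>])(.*)', r'\1', message, count=1)

# Fixed pattern: .* may match nothing, and + lets the group grow past one character.
fixed = re.sub('.*appcodename: ([^<br>]+)(.*)', r'\1', message, count=1)

print(original_result == message)
print(fixed)
```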

As mentioned above, the negated character class is not advised. Thus, the next step is to make the above better:

appcodename = re.sub(r'.*appcodename: ((?:(?!<br>).)+).*',r'\1',message,1)

(?:(?!<br>).)+ is a bit slow (it uses a negative lookahead, (?! ... )), but it matches any number of characters as long as <br> does not occur within them. It checks each character in turn and, each time, makes sure there is no <br> starting at that position before attempting to match the character. Next, making the regex a raw string (the r prefix) is advised to avoid unexpected behaviours.
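The difference between the two approaches shows up as soon as the value itself contains a b or an r. A minimal sketch, using a made-up value (browser) that is not in the question:

```python
import re

# Hypothetical message whose value contains the letters 'b' and 'r'.
message = "appcodename: browser<br>appversion: 5.0<br>"

# Negated class: 'b' is excluded, so the group cannot even start and there is no match.
with_class = re.search(r'appcodename: ([^<br>]+)', message)

# Lookahead version: stops only at the literal sequence <br>.
with_lookahead = re.search(r'appcodename: ((?:(?!<br>).)+)', message).group(1)

print(with_class)
print(with_lookahead)
```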

Finally, replacing everything before and after is not practical; matching will make things simpler:

appcodename = re.search(r'appcodename: ((?:(?!<br>).)+)', message).group(1)

At this point, you might as well use this instead, which does not use a negative lookahead and is simpler to read, I believe:

appcodename = re.search(r'appcodename: (.+?)<br>', message).group(1)
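One thing to keep in mind: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on such input. A small sketch of a guarded version follows; the helper name extract_appcodename is hypothetical, introduced only for this example.

```python
import re

def extract_appcodename(message):
    """Return the appcodename value, or None if the field is absent."""
    # Lazy .+? stops at the first <br> after the field.
    match = re.search(r'appcodename: (.+?)<br>', message)
    return match.group(1) if match else None

# Shortened, hypothetical message based on the one in the question.
found = extract_appcodename("appcodename: mozilla<br>appversion: 5.0<br>")
missing = extract_appcodename("appversion: 5.0<br>")

print(found)
print(missing)
```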
