bash - Sed command to parse XML from log
I have a log file that has XML embedded in it, and I am trying to parse it out using sed. What is happening is that I get the desired XML, but the line after the desired XML is also being picked up. Here is a sample file:
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1 <application> <name> test </name> </application> 2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
The sed command I am using is:
sed -n '/<application>/,/<\/application>/p' batchlog.txt >> np.out
As mentioned above, I am getting the desired XML, but I am also getting the line after it. How do I avoid this?
The second part of the question is: if I use this in a shell script, what is an efficient way to process each XML chunk (there can be many instances of "application" blocks in the file)? The idea is to replace values within the XML tags contained in each block and rewrite the file, maintaining the original content with the new values within the tags. For this part, an example follows. Within the shell script, as I am parsing the log file and come across application tags, I want to mask the values of any tag that has ssn in its name with *. Example:
<application><firstname>test</firstname><studentssn>123456789</studentssn><address>123 test street</address><parentssn>123456780</parentssn></application>
Now, when the script runs against the log file, it needs to look within any *ssn tag and replace the values with *. Using the following sed command on the command line, I am able to grab the studentssn tag:
sed -n 's:.*<studentssn>\(.*\)</studentssn.*:\1:p'
But I am hoping to make it generic, so that both parentssn and studentssn are picked up, replaced, and written back to the file along with the old non-XML lines and, within the XML, these new values. The modified file would look like this:
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1 <application><firstname>test</firstname><studentssn>*********</studentssn><address>123 test street</address><parentssn>*********</parentssn></application> 2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
This might work (GNU sed):
sed -r '/<(application>).*<\/\1/!b;:a;s/(<((student|parent)ssn>)\**)[^*](.*<\/\2)/\1*\4/;ta' file
This restricts sed processing to lines that contain both the start and end application tag. Each character within the student or parent ssn tags that is not a * is replaced by a *. This is accomplished by checking for a successful substitution using the ta command and looping back to the :a placeholder until no more substitutions occur.
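To see the loop in action, here is a minimal sketch that runs the same command against a small sample. The file name sample.log and the shortened log contents are made up for illustration, and GNU sed is assumed (for the -r switch).

```shell
#!/bin/sh
# Hypothetical sample file; name and contents are for illustration only.
cat > sample.log <<'EOF'
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1
<application><studentssn>123456789</studentssn><address>123 test street</address><parentssn>123456780</parentssn></application>
EOF

# Same command as above (GNU sed assumed).
# Each pass of the :a ... ta loop turns one more non-* character into a *.
masked=$(sed -r '/<(application>).*<\/\1/!b;:a;s/(<((student|parent)ssn>)\**)[^*](.*<\/\2)/\1*\4/;ta' sample.log)
printf '%s\n' "$masked"
```

The first line has no application tags, so it passes through unchanged. On the second line, the nine digits inside studentssn and parentssn should each come out as *, while the address value is left alone.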
N.B. The regexp uses copious back references, and back references within back references (hence the need for the -r switch, to make the final solution more visibly tolerable).
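As for the first part of the question (the extra line being printed), one likely cause is that in a sed range the end pattern is only checked starting from the line after the one that matched the start pattern. If the opening and closing application tags sit on the same line, the range keeps running onto the following lines. A possible alternative, assuming each application block fits on a single line, is grep -o, which prints only the matched part of each line. The file name batch_sample.log below is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file; name and layout are assumptions for illustration.
cat > batch_sample.log <<'EOF'
2015-05-06 04:07:37.386 [info]process:102 - application submitted ==== 1
<application> <name> test </name> </application>
2015-05-06 04:07:39.386 [info] process:103 - application completed ==== 1
EOF

# -o prints only the text that matches, so neighbouring log lines are dropped.
xml=$(grep -o '<application>.*</application>' batch_sample.log)
printf '%s\n' "$xml"
```

Note that .* is greedy, so if several application blocks ever share one line, this would return everything from the first opening tag to the last closing tag on that line.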