java - Avoiding indexing HTML tags as search keywords


I'm indexing keywords in an HTML document, and I don't want the HTML tags themselves to be indexed.

For example:

<div>
  <!-- html code -->
  <span>you welcome</span>
  <!-- simple message searching -->
  <div>
    <h1>testing text</h1>
    <!-- second message -->
  </div>
</div>

Expected keywords:

keywords:you 

How can I avoid HTML tags becoming keywords?

I think I need to parse the HTML and extract the inner text of each tag.
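
The idea of extracting only the inner text can be sketched roughly as below. This is a minimal illustration using only the Java standard library, not a definitive solution: the class name HtmlKeywordExtractor and the method extractKeywords are hypothetical names chosen for this example. It strips comments and tags with regular expressions and splits what remains into lowercase words. Regex stripping is an assumption made to keep the sketch dependency-free; it does not decode entities such as &amp; and does not skip script or style content, so a real HTML parser (for instance jsoup, if adding a library is an option) would likely be more robust on real-world markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class HtmlKeywordExtractor {
    // Matches HTML comments, including ones that span several lines.
    private static final Pattern COMMENTS = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Matches anything that looks like a tag, e.g. <div>, </span>, <h1 class="x">.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    // Matches runs of characters that are neither letters nor digits.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{N}]+");

    // Returns the words found in the text content of the given HTML, in document order.
    static List<String> extractKeywords(String html) {
        // Remove comments first so their text is never treated as content.
        String noComments = COMMENTS.matcher(html).replaceAll(" ");
        // Replace tags with a space so adjacent words do not get glued together.
        String text = TAGS.matcher(noComments).replaceAll(" ");

        List<String> keywords = new ArrayList<>();
        for (String token : NON_WORD.split(text)) {
            if (!token.isEmpty()) {
                keywords.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        String html = "<div> <!-- html code --> <span>you welcome</span> "
                + "<!-- simple message searching --> <div> <h1>testing text</h1> "
                + "<!-- second message --> </div> </div>";
        // Prints: [you, welcome, testing, text]
        System.out.println(extractKeywords(html));
    }
}
```

The returned list can then be passed to whatever indexing step is already in place. Tag names and comment text never reach it because they are removed before tokenizing.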

