java - Avoiding indexing HTML tags as search keywords
I'm indexing keywords in an HTML document, and I don't want to index the HTML tags themselves.
For example:
<div>
  <!-- html code -->
  <span>you welcome</span> <!-- simple message searching -->
  <div>
    <h1>testing text</h1> <!-- second message -->
  </div>
</div>
Expected keywords:
keywords:you
How can I avoid HTML tags becoming keywords?
I think I need to parse the HTML and extract the inner text of each tag.
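
The parse-and-extract idea could look roughly like the sketch below. It assumes the HTML parser bundled with the JDK (javax.swing.text.html) is good enough for the documents being indexed. The class name HtmlKeywordExtractor, the method extractKeywords, and the rule of splitting on non-word characters and lowercasing are all hypothetical choices for illustration, not part of the question. The parser reports tags and comments through separate callbacks, so only the text nodes reach handleText.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects words from text nodes only, never from markup.
final class HtmlKeywordExtractor {

    static List<String> extractKeywords(String html) {
        List<String> keywords = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Invoked for character data between tags; tags and comments
                // go to other callbacks, which are left as no-ops here.
                // Assumption: a keyword is a run of word characters.
                for (String word : new String(data).split("\\W+")) {
                    if (!word.isEmpty()) {
                        keywords.add(word.toLowerCase(Locale.ROOT));
                    }
                }
            }
        };

        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keywords;
    }

    public static void main(String[] args) {
        String html = "<div> <!-- html code --> <span>you welcome</span> "
                + "<!-- simple message searching --> <div> <h1>testing text</h1> "
                + "<!-- second message --> </div> </div>";
        System.out.println(extractKeywords(html));
    }
}
```

If an external dependency is acceptable, a dedicated HTML library such as jsoup can do the same job; Jsoup.parse(html).text() returns the document's text without tags, which can then be tokenized the same way.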