parsing - Processing input before giving it to the parser


What kind of processing should be done on the input before it is given to the parser?

As of now I am using the Stanford parser.jar. There is also the Stanford corenlp.jar. What is the difference between the parsing methods of parser.jar and corenlp.jar?

As per the CoreNLP documentation, we can pass the operations we want to run on the input in the annotators option.

Command:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt

To use only parsing in CoreNLP, can I pass just parse, or should I pass all the annotators except dcoref?

i.e.:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt

or

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt

Does parser.jar have sentence splitting built into the jar?

Can I give it a paragraph as input and get parsed data out sentence by sentence,

or should I give it one sentence at a time?

Thank you.

The CoreNLP annotators can be thought of as a dependency graph. The parser annotator depends only on tokenization (tokenize) and sentence splitting (ssplit). So, to run just the parser, this command is enough:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt

If you know your text is pre-tokenized, the easiest thing is to set the option tokenize.whitespace = "true" in a properties file (or pass it in as a flag: -tokenize.whitespace). To split sentences only at the end of each line, you can set the corresponding option (ssplit.eolonly).

But by default, yes, CoreNLP will tokenize and split sentences for you. You can feed in a pile of text, and it will output parsed sentences.
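
If you drive the pipeline from Java instead of the command line, the same options can be collected in a java.util.Properties object. The following is only a minimal sketch: the class name ParserProps and the build helper are made up for illustration, and the snippet uses nothing but the JDK so that it runs without the CoreNLP jars. The three property names are the ones mentioned above.

```java
import java.util.Properties;

// Hypothetical helper (not part of CoreNLP): collects the options discussed
// above into a Properties object for a parse-only pipeline.
class ParserProps {

    public static Properties build(boolean preTokenized, boolean oneSentencePerLine) {
        Properties props = new Properties();
        // The parser only needs tokenization and sentence splitting before it.
        props.setProperty("annotators", "tokenize,ssplit,parse");
        if (preTokenized) {
            // Input is already tokenized: split tokens on whitespace only.
            props.setProperty("tokenize.whitespace", "true");
        }
        if (oneSentencePerLine) {
            // Treat every line of the input as exactly one sentence.
            props.setProperty("ssplit.eolonly", "true");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(true, true);
        // Print the settings; iteration order of Properties is not guaranteed.
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With the CoreNLP jars on the classpath, this object would be handed
        // to the pipeline constructor: new StanfordCoreNLP(props)
    }
}
```

Calling build(false, false) leaves only the annotators entry, which corresponds to the default behaviour where CoreNLP tokenizes and splits sentences itself.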

