Parsing - Processing input before giving it to the parser
What kind of processing should be done on the input before it is given to the parser?
As of now I am using the Stanford parser.jar, but there is also the Stanford corenlp.jar. What is the difference between the parsing methods of parser.jar and corenlp.jar?
As per the CoreNLP documentation, we can pass the operations we want performed on the input in the annotators option.
command:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt
To use parsing in CoreNLP, can I pass just parse (with tokenize and ssplit), or should I pass all of the annotators except dcoref?
i.e.)
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt
or
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref -file input.txt
Does parser.jar have sentence splitting built into its jar?
Can I give a paragraph as input and get sentence-by-sentence parsed data out,
or should I give it one sentence at a time?
Thank you,
The CoreNLP annotators can be thought of as a dependency graph. The parser annotator depends on tokenization (tokenize) and sentence splitting (ssplit) only. So, to run just the parser, your first command is enough:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,parse -file input.txt
If you know your text is pre-tokenized, the easiest thing to do is to set the option tokenize.whitespace to true in the properties file (or pass it in as a flag: -tokenize.whitespace). If you want sentences to be split at the end of each line, you can set the option ssplit.eolonly.
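As a rough illustration of the options mentioned above, a properties file for pre-tokenized, one-sentence-per-line input might look like the sketch below. The file name parse-only.properties is hypothetical, and the values are assumptions about one particular setup rather than recommended defaults.

```properties
# parse-only.properties (hypothetical file name)
# Run only the annotators the parser depends on, plus the parser itself.
annotators = tokenize, ssplit, parse
# Assume the input is already tokenized: split tokens on whitespace only.
tokenize.whitespace = true
# Assume one sentence per line: split sentences only at end of line.
ssplit.eolonly = true
```

Such a file can be passed with the -props flag in place of -annotators:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props parse-only.properties -file input.txt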
But, by default, yes: CoreNLP will tokenize and split sentences for you. You can feed in a pile of text, and it will output parsed sentences.