machine learning - Difference between Latent and Explicit Semantic Analysis
I'm trying to analyse the paper "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis".
One component of the system described therein that I'm grappling with is the difference between latent and explicit semantic analysis.
I've been writing a document to encapsulate my understanding, but it is somewhat "cobbled together" from sources I don't 100% understand, so I'd like to know whether what I've come up with is accurate. Here it is:
When implementing a process such as singular value decomposition (SVD) or Markov chain Monte Carlo (MCMC) machines, a corpus of documents can be partitioned on the basis of its inherent characteristics and assigned to categories by applying different weights to the features that constitute each singular data index. In this high-dimensional space it is often difficult to determine the combination of factors leading to an outcome or result, so the variables of interest are "hidden" or latent. By defining a set of humanly intelligible categories, i.e. Wikipedia article pages, as the basis for comparison, [Gabrilovich et al. 2007] have devised a system whereby the criteria used to distinguish a datum are readily comprehensible; from the text we note that "semantic analysis is explicit in the sense that we manipulate manifest concepts grounded in human cognition, rather than 'latent concepts' used by latent semantic analysis". This establishes explicit semantic analysis in opposition to latent semantic analysis.
Is this accurate?
Information on this topic is sparse. This question ostensibly deals with a similar issue, though not really.
The difference between latent semantic analysis and so-called explicit semantic analysis lies in the corpus that is used and in the dimensions of the vectors that model word meaning.
Latent semantic analysis starts from document-based word vectors, which capture the association between each word and the documents in which it appears, typically with a weighting function such as tf-idf. It then reduces the dimensionality of these word vectors to (generally) 300, using singular value decomposition. In contrast to the original dimensions (which corresponded to documents), these 300 new dimensions have no straightforward interpretation; they are therefore called "latent". LSA can then be used to classify texts by combining the vectors of all the words in a text.
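For concreteness, here is a minimal LSA sketch in Python with scikit-learn. This is my own illustration, not the paper's code: the toy corpus, the choice of 2 components instead of ~300, and the helper functions `text_vector` and `cosine` are all assumptions made for readability.

```python
# Minimal LSA sketch: tf-idf word-by-document vectors, reduced with truncated SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import numpy as np

corpus = [                                      # toy corpus; each string is one document
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]

vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(corpus)     # shape: (n_docs, n_terms)
term_doc = doc_term.T                           # word vectors: one row per term,
                                                # one column per document

svd = TruncatedSVD(n_components=2)              # ~300 in practice; 2 for the toy data
latent_word_vectors = svd.fit_transform(term_doc)   # (n_terms, 2); dimensions are "latent"

vocab = vectorizer.vocabulary_

def text_vector(text):
    """Combine (here: average) the latent vectors of the words in the text."""
    idx = [vocab[w] for w in text.lower().split() if w in vocab]
    return latent_word_vectors[idx].mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(cosine(text_vector("cats and dogs"), text_vector("the cat sat")))
print(cosine(text_vector("cats and dogs"), text_vector("markets fell")))
```

Note that nothing in `latent_word_vectors` tells you what its two columns "mean"; they are whatever directions the SVD found, which is exactly the sense in which the dimensions are latent.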
From the paper you mention, I understand that explicit semantic analysis is a document-based model as well: it models words in terms of the Wikipedia articles in which they appear. It differs from latent semantic analysis, however, in that (a) the corpus (Wikipedia) cannot be chosen freely, and (b) there is no dimensionality reduction involved. Again, the vectors of the words in a text can be combined to classify or otherwise interpret the text.
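By way of contrast, here is a rough ESA-style sketch. This is my own simplification, not Gabrilovich and Markovitch's implementation: the tiny stand-in "articles", and the helpers `text_concept_vector` and `relatedness`, are assumptions; the real system uses actual Wikipedia articles and an inverted index. The point is that a word's vector is its tf-idf weight in each article, there is no SVD step, and each dimension remains a named, interpretable concept.

```python
# Rough ESA-style sketch: words as tf-idf weight vectors over "concept" documents
# (Wikipedia articles in the real system, a tiny stand-in corpus here). No SVD.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

concepts = {                                    # stand-ins for Wikipedia articles
    "Cat":          "the cat is a small domesticated carnivorous mammal",
    "Dog":          "the dog is a domesticated descendant of the wolf",
    "Stock market": "a stock market is an aggregation of buyers and sellers of stocks",
}

titles = list(concepts)
vectorizer = TfidfVectorizer()
article_term = vectorizer.fit_transform(concepts.values())   # (n_articles, n_terms)
term_article = article_term.T.toarray()                      # word vectors over concepts
vocab = vectorizer.vocabulary_

def text_concept_vector(text):
    """Combine (here: sum) the concept vectors of the words in the text."""
    idx = [vocab[w] for w in text.lower().split() if w in vocab]
    return term_article[idx].sum(axis=0)

def relatedness(a, b):
    va, vb = text_concept_vector(a), text_concept_vector(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))

print(relatedness("cat and dog", "domesticated mammal"))
print(relatedness("cat and dog", "buyers of stocks"))

# Unlike LSA's latent dimensions, each dimension here is a named article:
print(dict(zip(titles, text_concept_vector("domesticated mammal"))))
```

The last line is the "explicit" part: the vector for a text can be read off directly as weights on human-intelligible concepts (Cat, Dog, Stock market), whereas the corresponding LSA vector has no such interpretation.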