performance - Elasticsearch - many small documents vs fewer large documents?


I'm creating a reverse image search system (similar to Google's reverse image search) for a cataloging system used internally at my company. We've been using Elasticsearch with success for our regular search functionality, and I'm planning on hashing our images, creating a separate index for them, and using that for searching. There are many items in the system, each item may have multiple images associated with it, and an item should be find-able by reverse image searching any of its related images.

There are two possible schemas we've thought of:

Making a document for each image, containing the hash of the image and the ID of the item it relates to. This would result in ~7M documents, each of them small since it contains only a single hash and an ID.

Making a document for each item, and storing the hashes of all associated images in an array on that document. This would result in around ~100K documents, but each document could be large, as some items have hundreds of images associated with them.
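The two candidate schemas can be sketched as index mappings. This is a minimal sketch using Elasticsearch 1.x-era syntax (where `"index": "not_analyzed"` keeps the hash from being tokenized, matching the answer below; in modern versions the equivalent is `"type": "keyword"`). All field, type, and index names here are illustrative assumptions, not from the original question:

```python
# Schema 1: one document per image (~7M small documents).
# Each document holds a single image hash and the ID of its parent item.
image_per_doc_mapping = {
    "mappings": {
        "image": {
            "properties": {
                "hash":    {"type": "string", "index": "not_analyzed"},
                "item_id": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}

# Schema 2: one document per item (~100K larger documents).
# "hashes" simply holds an array of values -- Elasticsearch treats every
# field as potentially multi-valued, so no special array mapping is needed.
item_per_doc_mapping = {
    "mappings": {
        "item": {
            "properties": {
                "hashes": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}
```

Either body would be passed to index creation (e.g. `es.indices.create(index="images", body=image_per_doc_mapping)` with the official Python client).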

Which of these schemas would be more performant?

Having attended a recent "Under the Hood" talk by Alexander Reelsen, my answer would be "it depends" and "benchmark it".

As @science_fiction hinted:

  1. Are the images updated frequently? Updates come at a non-negligible cost.
  2. On the other hand, the overhead of 7M documents maybe shouldn't be neglected, whereas in the second scenario you just have `not_analyzed` terms in a field.

If 1. is a low factor, I would start with the second approach first.
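With the second approach, a reverse image lookup reduces to an exact-match term query against the `hashes` field. A minimal sketch (the `hashes` field name and `items` index are assumptions for illustration):

```python
def build_reverse_image_query(image_hash):
    """Build a term query matching any item whose hashes array
    contains the given image hash (exact, untokenized match)."""
    return {
        "query": {
            "term": {
                "hashes": image_hash,
            }
        }
    }

# With the official Python client this would be sent along the lines of:
#   es.search(index="items", body=build_reverse_image_query(some_hash))
# Each hit is an item document, so the matching item ID comes back directly.
```

Because a term query does an exact lookup in the inverted index, it behaves the same whether the field holds one hash or hundreds; the array in schema 2 costs you mainly at indexing/update time, not query time.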

