performance - Elasticsearch - many small documents vs fewer large documents? -
I'm creating a reverse image search system (similar to Google's reverse image search) for a cataloging system used internally at my company. We've been using Elasticsearch with success for our regular search functionality, and I'm planning on hashing our images, creating a separate index for the hashes, and using it for searching. There are many items in the system, each item may have multiple images associated with it, and an item should be findable by reverse image searching any of its related images.
There are two possible schemas we've thought of:
1. Making a document for each image, containing the hash of the image and the id of the item it relates to. This would result in ~7M documents, but they would be small, since each would contain only a single hash and an id.
2. Making a document for each item, and storing the hashes of all its associated images in an array on that document. This would result in around ~100K documents, but each document would be larger, as some items have hundreds of images associated with them.
Which of these schemas would be more performant?
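To make the two schemas concrete, here is a minimal sketch of what an example document might look like under each, expressed as Python dicts. The field names (`image_hash`, `item_id`, `image_hashes`) are assumptions for illustration, not anything defined in the question:

```python
# Schema 1: one small document per image (~7M documents)
image_doc = {
    "image_hash": "a1b2c3d4e5f6",  # perceptual hash of the image
    "item_id": 42,                 # id of the item this image belongs to
}

# Schema 2: one larger document per item (~100K documents)
item_doc = {
    "item_id": 42,
    "image_hashes": ["a1b2c3d4e5f6", "f6e5d4c3b2a1"],  # all hashes for this item
}

# In either case a reverse image search reduces to an exact term lookup
# on the hash field; schema 1 returns the image document (carrying the
# item id), while schema 2 returns the item document directly.
query_schema_1 = {"query": {"term": {"image_hash": "a1b2c3d4e5f6"}}}
query_schema_2 = {"query": {"term": {"image_hashes": "a1b2c3d4e5f6"}}}
```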
Having attended a recent "under the hood" talk by Alexander Reelsen, the honest answers are "it depends" and "benchmark it".
As @science_fiction hinted:
- Are the images frequently updated? If so, that counts against the second approach, since updating an array of hashes means reindexing the whole item document.
- On the other hand, the per-document overhead of ~7M documents maybe shouldn't be neglected, whereas in the second scenario you would simply be storing the hashes as not_analyzed terms in a field.

If 1. is a low factor, I'd start with the second approach first.