Different block sizes in Hadoop
What is the need for smaller/larger blocks in Hadoop?
Concretely, I want to have a larger number of mappers, each getting a smaller piece of data to work on. It seems I need to decrease the block size, but I'm confused (I'm new to Hadoop): do I need to do it while putting the file on HDFS, do I need to specify a related input split size, or both?
I'm sharing the cluster, so I cannot change global settings; I need this on a per-job basis, if possible. And I'm launching the job from code (later from Oozie, possibly).
What a mapper runs on is controlled by the input split, and it is up to you how you specify it. The HDFS block size has nothing to do with it (other than the fact that most splitters use the block size as the basic 'block' for creating input splits, in order to achieve data locality). You can write your own splitter that takes an HDFS block and splits it into 100 splits, if you fancy. Also take a look at Change File Split size in Hadoop. A per-job sketch is shown below.
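Here is a minimal sketch of setting the split size per job, without touching any cluster-wide configuration. It assumes the new `mapreduce` API (Hadoop 2.x); the class name `SmallSplitsDriver`, the identity map-only setup, the 16 MB cap, and the paths taken from `args` are all illustrative choices, not anything prescribed by the original answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallSplitsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "small-splits-demo");
        job.setJarByClass(SmallSplitsDriver.class);

        // Identity mapper, map-only job: the point here is only the split setup.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Cap each input split at 16 MB (illustrative value), regardless of
        // the file's HDFS block size; a 128 MB block then yields ~8 splits,
        // and therefore ~8 mappers.
        FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
        // Equivalent per-job property (Hadoop 2.x name):
        // job.getConfiguration().setLong(
        //     "mapreduce.input.fileinputformat.split.maxsize", 16L * 1024 * 1024);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because this is set on the job object, it affects only this job, which is what you want on a shared cluster.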
Now, that being said, the wisdom of doing this ('many mappers with small splits') is highly questionable. Everybody else is trying to do the opposite (create fewer mappers with aggregated splits). See Dealing with Hadoop's small files problem, The Small Files Problem, Amazon Elastic MapReduce Deep Dive and Best Practices, and so on.
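For completeness, a sketch of that opposite approach, assuming `CombineTextInputFormat` (available in the Hadoop 2.x `mapreduce` API); the driver class name and the 128 MB ceiling are again illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CombinedSplitsDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-small-files");
        job.setJarByClass(CombinedSplitsDriver.class);

        // CombineTextInputFormat packs many small files (or blocks) into a
        // single split, so a directory of thousands of tiny files does not
        // spawn thousands of mappers.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        job.setMapperClass(Mapper.class); // identity mapper, map-only demo
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}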