Why does mapPartitionsWithIndex cause a shuffle in Spark?


I'm new to Spark. I'm investigating shuffle behavior in a test application, and I don't understand why the mapPartitionsWithIndex method in my program causes a shuffle. As the screenshots show, the initial RDD has 2 partitions of 16 MB each, and the shuffle write is 49.8 MB. I know that map, mapPartitions, and mapPartitionsWithIndex are not shuffling transformations the way groupByKey is, yet I see one of them cause a shuffle in Spark. Why?

[Two screenshots omitted]

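I think you are performing a join or group operation after mapPartitionsWithIndex, and that operation is what causes the shuffle. mapPartitionsWithIndex itself is a narrow transformation, but the shuffle write is reported on the stage that produces the shuffle data, and that stage also contains the mapPartitionsWithIndex call.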

You can verify this by modifying the code so that nothing follows mapPartitionsWithIndex except an action.

Current code:

    val rdd = inputrdd1.mapPartitionsWithIndex(....)
    val outrdd = rdd.join(inputrdd2)

Modified code:

    val rdd = inputrdd1.mapPartitionsWithIndex(....)
    println(rdd.count())
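Another way to check, without running the job, is to inspect the RDD lineage. The following is only a minimal sketch, not the asker's program: the `ShuffleCheck` object, the `hasShuffle` and `check` helpers, and the sample data are all hypothetical, and it assumes a local Spark installation. It walks `rdd.dependencies` and reports whether a `ShuffleDependency` appears anywhere in the lineage.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object ShuffleCheck {

  // True if any ancestor of `rdd` is reached through a shuffle dependency.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists { dep =>
      dep.isInstanceOf[ShuffleDependency[_, _, _]] || hasShuffle(dep.rdd)
    }

  // Builds two small sample RDDs (made-up data) and returns
  // (shuffle after mapPartitionsWithIndex only, shuffle after the join).
  def check(sc: SparkContext): (Boolean, Boolean) = {
    val left  = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")), 2)
    val right = sc.parallelize(Seq((1, "x"), (2, "y")), 2)

    // Narrow transformation: each output partition reads one input partition.
    val mapped = left.mapPartitionsWithIndex { (idx, iter) =>
      iter.map { case (k, v) => (k, s"$idx:$v") }
    }

    // Wide transformation: neither side has a partitioner, so both are shuffled.
    val joined = mapped.join(right)

    (hasShuffle(mapped), hasShuffle(joined))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shuffle-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (afterMap, afterJoin) = check(sc)
      println(s"shuffle after mapPartitionsWithIndex: $afterMap")
      println(s"shuffle after join: $afterJoin")
    } finally {
      sc.stop()
    }
  }
}
```

Calling `toDebugString` on an RDD gives a similar picture in text form, with indentation marking each shuffle boundary in the lineage.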
