What's the meaning of the "Stages" on the Spark UI for Streaming Scenarios?


I'm working on Spark Streaming, trying to monitor and improve the performance of my streaming apps. I'm confused by the following questions.

  1. What is the meaning of each stage shown on the Spark portal for a "Spark Streaming" app?
  2. Not every transformation is mapped to a task. How do I find out which tasks a given transformation is mapped to?

Streaming code snippet:

    val transformed = input.flatMap(i => processInput(i))
    val aggregated = transformed.reduceByKeyAndWindow(reduce(_, _), Seconds(aggregateWindowSizeInSeconds), Seconds(slidingIntervalInSeconds))
    val finalized = aggregated.mapValues(finalize(_))
    finalized

(Only flatMap stages appear on the portal.)


Thanks,

Tao

Spark takes the individual commands from your source and optimizes them into a plan of tasks to be executed on the cluster. One example of such an optimization is map-fusion: two calls to map come in, a single map task comes out. A stage is a higher-level boundary between groups of tasks, defined such that to cross that boundary you have to perform a shuffle.
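To make the stage boundary concrete, here is a minimal batch-RDD sketch (my own example, not your streaming code): flatMap and map are narrow transformations and get pipelined into one stage, while reduceByKey needs a shuffle and therefore starts a second stage.

    import org.apache.spark.{SparkConf, SparkContext}

    object StageBoundaryExample {
      def main(args: Array[String]): Unit = {
        // Local context just for illustration.
        val sc = new SparkContext(new SparkConf().setAppName("stage-boundary").setMaster("local[2]"))

        val lines = sc.parallelize(Seq("a b", "b c", "c a"))
        // flatMap and map are narrow transformations: Spark pipelines (fuses) them
        // into the tasks of a single stage.
        val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))
        // reduceByKey has to repartition the data by key, i.e. perform a shuffle,
        // so everything after this point runs in a new stage.
        val counts = pairs.reduceByKey(_ + _)

        // toDebugString prints the lineage; the indentation marks the shuffle
        // boundary that the Spark UI draws between the two stages.
        println(counts.toDebugString)
        counts.collect().foreach(println)

        sc.stop()
      }
    }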

So:

  • each of operators call on rdd result in actions , transformations.
  • these result in dag of operators.
  • the dag compiled stages.
  • each stage executed series of tasks.
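Applied to your snippet, flatMap and mapValues are narrow, so they are pipelined into the stages around the shuffle that reduceByKeyAndWindow introduces for every batch; that is why flatMap shows up folded into a stage rather than as its own entry. A rough sketch of how you could inspect this per micro-batch (assuming your existing input DStream and your processInput / reduce / finalize helpers and window parameters):

    // Sketch only: input, processInput, reduce, finalize and the window
    // parameters are the ones from your own code above.
    import org.apache.spark.streaming.Seconds

    val transformed = input.flatMap(i => processInput(i))       // narrow: pipelined
    val aggregated = transformed.reduceByKeyAndWindow(
      reduce(_, _),
      Seconds(aggregateWindowSizeInSeconds),
      Seconds(slidingIntervalInSeconds))                         // shuffle: new stage per batch
    val finalized = aggregated.mapValues(finalize(_))            // narrow: pipelined after the shuffle

    // Print each micro-batch's lineage; the indentation in toDebugString marks
    // the shuffle/stage boundaries that the UI shows as separate stages.
    finalized.foreachRDD(rdd => println(rdd.toDebugString))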

