What's the meaning of the "Stages" on Spark UI for Streaming Scenarios
I'm working on Spark Streaming and trying to monitor and improve the performance of my streaming apps, but I'm confused about the following questions:
- What is the meaning of each stage shown on the Spark portal for "Spark Streaming" apps?
- Not all "transformations" are mapped to tasks. How can I trace a "transformation" to the tasks it is mapped to?
Streaming code snapshot:
    val transformed = input.flatMap(i => processInput(i))
    val aggregated = transformed.reduceByKeyAndWindow(reduce(_, _), Seconds(aggregateWindowSizeInSeconds), Seconds(slidingIntervalInSeconds))
    val finalized = aggregated.mapValues(finalize(_))
    finalized
(Only the flatMap stages showed up on the portal.)
[Screenshot: Spark Streaming portal]
Thanks,
Tao
Spark takes the individual commands from your source and optimizes them into a plan of tasks to be executed on the cluster. One example of such an optimization is map-fusion: two calls to map come in, one single map task comes out. A stage is a higher-level boundary between groups of tasks, defined such that to cross that boundary you have to perform a shuffle.
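To make the map-fusion idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names `MapFusionSketch`, `parse`, `double`, and `fused` are hypothetical and exist only for illustration. It shows that two consecutive map steps can be composed and run as a single pass over the data, which is roughly the effect of Spark pipelining narrow transformations into one task; it is not Spark's actual implementation.

```scala
object MapFusionSketch {
  // Two hypothetical per-record functions, standing in for two map calls.
  val parse: String => Int = _.trim.toInt
  val double: Int => Int = _ * 2

  // Unfused: two separate passes with an intermediate collection in between.
  def unfused(records: Seq[String]): Seq[Int] =
    records.map(parse).map(double)

  // Fused: compose the two functions first, then make a single pass.
  val fused: String => Int = parse.andThen(double)
  def fusedPass(records: Seq[String]): Seq[Int] =
    records.map(fused)

  def main(args: Array[String]): Unit = {
    val records = Seq(" 1", "2 ", "3")
    // Both variants produce the same result; only the number of passes differs.
    println(unfused(records))
    println(fusedPass(records))
  }
}
```

Both calls produce the same values; the fused version simply avoids materializing the intermediate result between the two map steps.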
So:
- Each of the operators you call on your RDD results in actions and transformations.
- These result in a DAG of operators.
- The DAG is compiled into stages.
- Each stage is executed as a series of tasks.
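The steps in the list can be sketched as a toy model in plain Scala. This is not Spark's scheduler: `StageSketch`, `Op`, and `splitIntoStages` are hypothetical names, the operator chain is linear rather than a full DAG, and the sketch assumes that `reduceByKeyAndWindow` requires a shuffle while `flatMap` and `mapValues` do not. It only illustrates how a chain of operators is cut into stages at shuffle boundaries.

```scala
object StageSketch {
  // A hypothetical operator node: a name plus whether it needs a shuffle.
  final case class Op(name: String, needsShuffle: Boolean)

  // Walk a linear chain of operators and start a new stage at every
  // operator that needs a shuffle; all other operators join the current stage.
  def splitIntoStages(ops: Seq[Op]): Vector[Vector[Op]] =
    ops.foldLeft(Vector.empty[Vector[Op]]) { (stages, op) =>
      if (stages.isEmpty || op.needsShuffle) stages :+ Vector(op)
      else stages.init :+ (stages.last :+ op)
    }

  def main(args: Array[String]): Unit = {
    // The pipeline from the question, modelled by hand (assumed shuffle flags).
    val pipeline = Seq(
      Op("flatMap", needsShuffle = false),
      Op("reduceByKeyAndWindow", needsShuffle = true),
      Op("mapValues", needsShuffle = false)
    )
    splitIntoStages(pipeline).zipWithIndex.foreach { case (stage, i) =>
      println(s"stage $i: " + stage.map(_.name).mkString(" -> "))
    }
  }
}
```

Under these assumptions the model yields two stages: one containing `flatMap`, and one containing `reduceByKeyAndWindow` followed by `mapValues`. On a real RDD, calling `toDebugString` prints the lineage, which can help relate the operators in the code to the stages shown in the UI.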