Spark Kafka WordCount in Python
I've started playing with Apache Spark and am trying to get the Kafka word count example working in Python. I decided on Python because I'll be able to use the language with other big data tech, and Databricks is offering Spark courses that use it.
My question: I'm running the basic word count example from here: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py. It seems to kick off and connect to Kafka according to the logs, but I can't see it produce a word count. I added the lines below to write the output to text files, and it produces a bunch of empty text files. It is connecting to the Kafka topic and there is data in the topic, so how can I see what it is doing with the data, if anything? Is it a timing thing? Cheers.
Code processing the Kafka data:

    counts = lines.flatMap(lambda line: line.split("|")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b) \
        .saveAsTextFiles("sparkfiles")
Data in the Kafka topic:
16|16|mr|joe|t|bloggs
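For reference, a minimal sketch of what the full streaming app around that snippet might look like, adapted from the linked kafka_wordcount.py example to split on the pipe delimiter used in the data above. The app name, consumer group name, and "sparkfiles" output prefix are placeholders, not taken from the original post:

    from __future__ import print_function
    import sys

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
            exit(-1)

        sc = SparkContext(appName="PythonStreamingKafkaWordCount")
        ssc = StreamingContext(sc, 1)  # 1-second batch interval

        zkQuorum, topic = sys.argv[1:]
        # Receiver-based Kafka stream; each record is a (key, value) pair
        kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
        lines = kvs.map(lambda x: x[1])

        counts = lines.flatMap(lambda line: line.split("|")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b)

        counts.pprint()                       # print each batch's counts to the console
        counts.saveAsTextFiles("sparkfiles")  # also write one output directory per batch

        ssc.start()
        ssc.awaitTermination()

Using pprint() alongside saveAsTextFiles() makes it easy to see in the console whether any records arrived during a given batch interval; empty output files simply mean a batch fired with no new data in the topic.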
Sorry, I was being an idiot. When I produced data to the topic while the Spark app was running, I could see the following in the output:
(u'a', 29)
(u'count', 29)
(u'this', 29)
(u'is', 29)
(u'so', 29)
(u'words', 29)
(u'spark', 29)
(u'the', 29)
(u'can', 29)
(u'sentence', 29)
This represents how many times each word appeared in the block (batch) of data processed by Spark.
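In other words, the counts reset with every batch interval. If a running total across all batches is wanted instead, the usual Spark Streaming approach is updateStateByKey. A minimal sketch, assuming the same lines DStream and ssc as in the example above (the checkpoint path is just a placeholder):

    # Stateful counting requires a checkpoint directory
    ssc.checkpoint("checkpoint")  # hypothetical local path

    def update_total(new_values, running_total):
        # new_values: this word's counts in the current batch
        # running_total: count accumulated over previous batches
        return sum(new_values) + (running_total or 0)

    per_batch = lines.flatMap(lambda line: line.split("|")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)

    running = per_batch.updateStateByKey(update_total)

    per_batch.pprint()  # resets every batch interval
    running.pprint()    # keeps growing as more data arrives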