Spark Kafka WordCount in Python
I've started playing with Apache Spark and am trying to get the Kafka word count example working in Python. I decided on Python because I'll be able to use the language with other big data tech, and Databricks is offering Spark courses that use it.
My question: I'm running the basic word count example from here: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py. It seems to kick off and connect to Kafka according to the logs, but I can't see it produce a word count. I added the lines below to write the output to text files, and it produces a bunch of empty text files. It is connecting to the Kafka topic and there is data in the topic, so how can I see what it is doing with the data, if anything? Is it a timing thing? Cheers.
Code processing the Kafka data:

    counts = lines.flatMap(lambda line: line.split("|")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b) \
        .saveAsTextFiles("sparkfiles")
Data in the Kafka topic:
16|16|mr|joe|t|bloggs
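For reference, a minimal sketch of what the full streaming app around that snippet might look like, adapted from the linked kafka_wordcount.py example to split on the pipe delimiter used in the data above. The app name, consumer group name, and "sparkfiles" output prefix are placeholders, not taken from the original post:

    from __future__ import print_function
    import sys

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
            exit(-1)

        sc = SparkContext(appName="PythonStreamingKafkaWordCount")
        ssc = StreamingContext(sc, 1)  # 1-second batch interval

        zkQuorum, topic = sys.argv[1:]
        # Receiver-based Kafka stream; each record is a (key, value) pair
        kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
        lines = kvs.map(lambda x: x[1])

        counts = lines.flatMap(lambda line: line.split("|")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b)

        counts.pprint()                       # print each batch's counts to the console
        counts.saveAsTextFiles("sparkfiles")  # also write one output directory per batch

        ssc.start()
        ssc.awaitTermination()

Using pprint() alongside saveAsTextFiles() makes it easy to see in the console whether any records arrived during a given batch interval; empty output files simply mean a batch fired with no new data in the topic.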
Sorry, I was being an idiot. When I produced data to the topic while the Spark app was running, I could see the following in the output:
(u'a', 29)
(u'count', 29)
(u'this', 29)
(u'is', 29)
(u'so', 29)
(u'words', 29)
(u'spark', 29)
(u'the', 29)
(u'can', 29)
(u'sentence', 29)
This represents how many times each word appeared in the block (batch) of data processed by Spark.
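In other words, the counts reset with every batch interval. If a running total across all batches is wanted instead, the usual Spark Streaming approach is updateStateByKey. A minimal sketch, assuming the same lines DStream and ssc as in the example above (the checkpoint path is just a placeholder):

    # Stateful counting requires a checkpoint directory
    ssc.checkpoint("checkpoint")  # hypothetical local path

    def update_total(new_values, running_total):
        # new_values: this word's counts in the current batch
        # running_total: count accumulated over previous batches
        return sum(new_values) + (running_total or 0)

    per_batch = lines.flatMap(lambda line: line.split("|")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a + b)

    running = per_batch.updateStateByKey(update_total)

    per_batch.pprint()  # resets every batch interval
    running.pprint()    # keeps growing as more data arrives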