hadoop - Streaming HDFS data to Storm (aka HDFS spout)


I would like to know if there is a spout implementation for streaming data from HDFS to Storm (something similar to Spark Streaming from HDFS). I know there is a bolt implementation for writing data into HDFS (https://github.com/ptgoetz/storm-hdfs , http://docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.1.3/bk_user-guide/content/ch_storm-using-hdfs-connector.html), but I could not find anything for the other way around. I would appreciate any suggestions and hints.

One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, include hadoop-common in your pom.xml:

<dependency>
   <groupId>org.apache.hadoop</groupId>
   <artifactId>hadoop-common</artifactId>
   <version>2.6.0.2.2.0.0-2041</version>
</dependency>

Then, in your spout implementation, use an HDFS FileSystem object. For example, here is some pseudo code for emitting each line of a file as a string:

@Override
public void nextTuple() {
   Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
   try {
      // resolve the file system from the path's URI so the hdfs:// scheme is honored
      FileSystem fs = FileSystem.get(pt.toUri(), new Configuration());
      try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)))) {
         String line;
         while ((line = br.readLine()) != null) {
            System.out.println(line);
            // emit the line read from the HDFS file
            // _collector is a private member variable of type SpoutOutputCollector, set in the open method
            _collector.emit(new Values(line));
         }
      }
   } catch (Exception e) {
      _collector.reportError(e);
      LOG.error("hdfs spout error", e);
   }
}
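One caveat with the snippet above: Storm calls nextTuple() repeatedly, so reading the whole file inside it would emit the same file again on every call. A minimal sketch of one way around that is to keep the reader open between calls and hand out a single line per call. The class name LineCursor and its next() method are hypothetical (they are not part of Storm or Hadoop), and the sketch uses only java.io so it can be tried without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper (not a Storm or Hadoop class): wraps an already opened
// reader and returns one line per call, so the caller can emit one tuple per call.
class LineCursor {
    private final BufferedReader reader;
    private boolean exhausted = false;

    LineCursor(BufferedReader reader) {
        this.reader = reader;
    }

    // Returns the next line, or null once the input is exhausted.
    // The reader is closed as soon as the end of the input is reached.
    String next() throws IOException {
        if (exhausted) {
            return null;
        }
        String line = reader.readLine();
        if (line == null) {
            exhausted = true;
            reader.close();
        }
        return line;
    }
}
```

In a spout, the assumption is that the cursor would be created once in the open method from a reader over fs.open(pt), and nextTuple() would call next() and emit the result with _collector.emit(new Values(line)) only when it is not null.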
