hadoop - Streaming HDFS data to Storm (aka HDFS spout)
I would like to know if there is a spout implementation for streaming data from HDFS into Storm (something similar to Spark Streaming reading from HDFS). I know there is a bolt implementation for writing data into HDFS (https://github.com/ptgoetz/storm-hdfs , http://docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.1.3/bk_user-guide/content/ch_storm-using-hdfs-connector.html), but I could not find anything for the other way around. I would appreciate any suggestions and hints.
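One option is to use the Hadoop HDFS Java API. Assuming you are using Maven, include hadoop-common in your pom.xml (in Hadoop 2.x the hdfs:// filesystem implementation itself lives in hadoop-hdfs, so that artifact may also be needed on the classpath):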
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0.2.2.0.0-2041</version>
    </dependency>
Then, in your spout implementation, use an HDFS FileSystem object. For example, here is some pseudocode for emitting each line in a file as a string:
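    // Imports needed by this method (Values comes from Storm's tuple package:
    // backtype.storm.tuple before Storm 1.0, org.apache.storm.tuple from 1.0 on).
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    @Override
    public void nextTuple() {
        Path pt = new Path("hdfs://servername:8020/user/hdfs/file.txt");
        try {
            // Resolve the filesystem from the path's URI, so the hdfs:// scheme is used
            // even when fs.defaultFS is not set in the Configuration.
            FileSystem fs = pt.getFileSystem(new Configuration());
            // try-with-resources closes the reader (and the underlying HDFS stream).
            try (BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                    // Emit the line read from the HDFS file.
                    // _collector is a private member variable of type SpoutOutputCollector,
                    // set in the open() method.
                    _collector.emit(new Values(line));
                }
            }
        } catch (Exception e) {
            _collector.reportError(e);
            log.error("HDFS spout error", e);
        }
        // Note: Storm calls nextTuple() repeatedly, so as written this re-reads
        // and re-emits the whole file on every call.
    }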
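Since Storm calls nextTuple() over and over, a spout usually wants to hand out one line per call rather than loop over the whole file each time. Below is a minimal sketch of that pattern using only the JDK, so it can be run without a cluster. LineAtATimeReader and emitNext are hypothetical names made up for this illustration, not part of Storm or Hadoop; a StringReader stands in for the HDFS stream and a plain Consumer stands in for the collector.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: emits at most one line per call, the shape nextTuple() expects. */
class LineAtATimeReader implements AutoCloseable {
    private final BufferedReader reader;
    private boolean exhausted = false;

    LineAtATimeReader(BufferedReader reader) {
        this.reader = reader;
    }

    /** Passes the next line to the emitter; returns false once the input is exhausted. */
    boolean emitNext(Consumer<String> emitter) {
        if (exhausted) {
            return false;
        }
        try {
            String line = reader.readLine();
            if (line == null) {
                exhausted = true;
                return false;
            }
            emitter.accept(line);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        // The StringReader stands in for new InputStreamReader(fs.open(path)).
        try (LineAtATimeReader src =
                 new LineAtATimeReader(new BufferedReader(new StringReader("first\nsecond\n")))) {
            while (src.emitNext(emitted::add)) {
                // each iteration corresponds to one nextTuple() call
            }
        }
        System.out.println(emitted); // prints [first, second]
    }
}
```

In an actual spout, the idea would be to build the reader once in open() around fs.open(path), and have nextTuple() make a single call such as emitNext(line -> _collector.emit(new Values(line))). Tracking offsets for replay after a failure is left out of this sketch.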