Flume - Twitter source error message


#1

Please see the details below and help me solve the issue.

[cloudera@quickstart bin]$ ./flume-ng agent -n TwitterAgent -Dtwitter4j.streamBaseURL=https://stream.twitter.com/1.1/ -c conf -f /usr/lib/flume-ng/conf/flume.conf

Error Message

19/01/11 13:29:35 INFO node.AbstractConfigurationProvider: Created channel MemChannel
19/01/11 13:29:35 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
19/01/11 13:29:35 ERROR node.PollingPropertiesFileConfigurationProvider: Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.getRequestHeaders()Ljava/util/Map;
at twitter4j.StreamingReadTimeoutConfiguration.getRequestHeaders(TwitterStreamImpl.java:735)
at twitter4j.internal.http.HttpClientWrapper.<init>(HttpClientWrapper.java:43)
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:59)
at twitter4j.TwitterStreamFactory.<init>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.configure(TwitterSource.java:97)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

flume.conf

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = ***********
TwitterAgent.sources.Twitter.consumerSecret = ************
TwitterAgent.sources.Twitter.accessToken = **********
TwitterAgent.sources.Twitter.accessTokenSecret = ***********
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:50070/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

flume-env.sh

export twitter4j.streamBaseURL=https://stream.twitter.com/1.1/

# Environment variables can be set here.

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export FLUME_HOME=/usr/lib/flume-ng
export CLASSPATH=$CLASSPATH:/FLUME_HOME/lib/*

# Note that the Flume conf directory is always included in the classpath.

FLUME_CLASSPATH="/usr/lib/flume-ng/lib/twitter4j-core-3.0.3.jar"
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/twitter4j-media-support-3.0.3.jar"
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/twitter4j-stream-3.0.3.jar"
FLUME_CLASSPATH="/usr/lib/flume-ng/lib/flume-sources-1.0-SNAPSHOT.jar"
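Separately from the twitter4j error, the flume-env.sh above is not valid shell in a few places: twitter4j.streamBaseURL is not a legal shell variable name, /FLUME_HOME is a literal path rather than $FLUME_HOME, and each FLUME_CLASSPATH= line overwrites the one before it. Below is a sketch of the same settings written as valid shell, assuming the paths from the question and relying on the flume-ng launcher sourcing flume-env.sh and passing JAVA_OPTS to the JVM.

```shell
# Sketch: the flume-env.sh settings from the question written as valid shell.
# Paths and jar names are copied from the question.

export FLUME_HOME=/usr/lib/flume-ng

# A Java system property is not a shell variable (dots are not allowed in
# variable names), so hand it to the JVM through JAVA_OPTS instead.
export JAVA_OPTS="${JAVA_OPTS:-} -Dtwitter4j.streamBaseURL=https://stream.twitter.com/1.1/"

# "$FLUME_HOME" expands the variable ("/FLUME_HOME" is a literal directory).
# The quotes keep "*" for Java's classpath wildcard instead of letting the shell glob it.
export CLASSPATH="${CLASSPATH:+$CLASSPATH:}$FLUME_HOME/lib/*"

# Plain ASCII quotes, no space after "=", and ":" between entries;
# a bare FLUME_CLASSPATH=... line would replace the previous value.
FLUME_CLASSPATH="$FLUME_HOME/lib/twitter4j-core-3.0.3.jar"
FLUME_CLASSPATH="$FLUME_CLASSPATH:$FLUME_HOME/lib/twitter4j-media-support-3.0.3.jar"
FLUME_CLASSPATH="$FLUME_CLASSPATH:$FLUME_HOME/lib/twitter4j-stream-3.0.3.jar"
FLUME_CLASSPATH="$FLUME_CLASSPATH:$FLUME_HOME/lib/flume-sources-1.0-SNAPSHOT.jar"
```

Because these jars already sit in /usr/lib/flume-ng/lib, which the launcher puts on the classpath on its own, repeating them in FLUME_CLASSPATH is likely redundant.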

twitter4j.properties

twitter4j.streamBaseURL=https://stream.twitter.com/1.1/


#2

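The error is literally telling you that it cannot find the method twitter4j.conf.Configuration.getRequestHeaders(). A NoSuchMethodError means the class was found at runtime but does not have that method, which typically happens when code compiled against one version of a library runs against another. So I can only assume you have a classpath issue or are using the wrong version of twitter4j.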
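Since the diagnosis points at mixed or wrong twitter4j versions on the classpath, a quick first check is to see which twitter4j versions sit in Flume's lib directory. This is a minimal POSIX shell sketch under stated assumptions: the jars follow the twitter4j-<module>-<version>.jar naming seen in this thread, and the function names list_twitter4j_versions and check_twitter4j_versions are hypothetical. It does not look inside other jars, so twitter4j classes bundled into a jar such as flume-sources-1.0-SNAPSHOT.jar would not show up.

```shell
# Sketch: report which twitter4j versions are present as separate jars in a directory.
# Assumes the twitter4j-<module>-<version>.jar naming used in this thread.

# Print each distinct version found, one per line.
list_twitter4j_versions() {
  dir=${1:-/usr/lib/flume-ng/lib}
  for jar in "$dir"/twitter4j-*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    name=$(basename "$jar" .jar)     # e.g. twitter4j-core-3.0.3
    echo "${name##*-}"               # text after the last "-", e.g. 3.0.3
  done | sort -u
}

# Return non-zero when more than one twitter4j version is present.
check_twitter4j_versions() {
  versions=$(list_twitter4j_versions "$1")
  count=$(printf '%s\n' "$versions" | grep -c . || true)
  if [ "$count" -gt 1 ]; then
    echo "WARNING: mixed twitter4j versions:" $versions
    return 1
  fi
  echo "twitter4j versions: ${versions:-none found}"
}
```

For example, check_twitter4j_versions /usr/lib/flume-ng/lib prints the versions it finds and returns a non-zero status when more than one is present.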


#3

Thanks, Andy, for your prompt response. I went through the steps below and resolved the issue. I can now stream the Twitter data successfully.

1) I removed the following 3 jar files from the /usr/lib/flume-ng/lib/ directory:
    twitter4j-core-3.0.3.jar
    twitter4j-media-support-3.0.3.jar
    twitter4j-stream-3.0.3.jar

I added the following jar files to the /usr/lib/flume-ng/lib/ directory:
    flume-sources-1.0-SNAPSHOT.jar
    twitter4j-core-4.0.7.jar
    twitter4j-async-4.0.7.jar
    twitter4j-stream-4.0.7.jar

2) Some corrections to the files mentioned in my question (flume.conf, flume-env.sh, twitter4j.properties):
    flume.conf - I made one correction:
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/tweet
    flume-env.sh - I commented out all the export and FLUME_CLASSPATH lines, so in short nothing from this file is used.
    twitter4j.properties - I deleted this file from the /usr/lib/flume-ng/conf/ directory.

3) Success Message.
[cloudera@quickstart bin]$ ./flume-ng agent -n TwitterAgent -Dtwitter4j.streamBaseURL=https://stream.twitter.com/1.1/ -c conf -f /usr/lib/flume-ng/conf/flume.conf

19/01/12 03:41:54 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
19/01/12 03:41:54 INFO node.AbstractConfigurationProvider: Created channel MemChannel
19/01/12 03:41:54 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type com.cloudera.flume.source.TwitterSource
19/01/12 03:41:54 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
19/01/12 03:41:54 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [Twitter, HDFS]
19/01/12 03:41:54 INFO node.Application: Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:com.cloudera.flume.source.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6416298a counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
19/01/12 03:41:55 INFO node.Application: Starting Channel MemChannel
19/01/12 03:41:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
19/01/12 03:41:55 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
19/01/12 03:41:55 INFO node.Application: Starting Sink HDFS
19/01/12 03:41:55 INFO node.Application: Starting Source Twitter
19/01/12 03:41:55 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
19/01/12 03:41:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
19/01/12 03:41:55 INFO twitter4j.TwitterStreamImpl: Establishing connection.
19/01/12 03:42:08 INFO twitter4j.TwitterStreamImpl: Connection established.
19/01/12 03:42:08 INFO twitter4j.TwitterStreamImpl: Receiving status stream.
19/01/12 03:42:10 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
19/01/12 03:42:11 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/Tweets3/FlumeData.1547293330321.tmp
19/01/12 03:42:48 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/Tweets3/FlumeData.1547293330321.tmp
19/01/12 03:42:48 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/Tweets3/FlumeData.1547293330321.tmp to hdfs://localhost:8020/Tweets3/FlumeData.1547293330321
19/01/12 03:42:48 INFO hdfs.HDFSEventSink: Writer callback called.
19/01/12 03:42:52 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
19/01/12 03:42:52 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/Tweets3/FlumeData.1547293372322.tmp
19/01/12 03:43:22 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/Tweets3/FlumeData.1547293372322.tmp
19/01/12 03:43:22 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/Tweets3/FlumeData.1547293372322.tmp to hdfs://localhost:8020/Tweets3/FlumeData.1547293372322
19/01/12 03:43:22 INFO hdfs.HDFSEventSink: Writer callback called.
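The jar swap in step 1 could be scripted roughly as below. This is a sketch, not the poster's exact procedure: it assumes the 4.0.7 jars and flume-sources-1.0-SNAPSHOT.jar were already downloaded into some directory (src_dir, hypothetical), and it moves the old 3.0.3 jars into a backup directory (backup_dir, also hypothetical) instead of deleting them so the change can be undone. The function name swap_twitter4j_jars is likewise hypothetical.

```shell
# Sketch of step 1: move the 3.0.3 jars out of Flume's lib directory and copy in
# the new ones. src_dir and backup_dir are hypothetical locations.
swap_twitter4j_jars() {
  lib_dir=$1      # e.g. /usr/lib/flume-ng/lib
  src_dir=$2      # directory that already holds the downloaded new jars
  backup_dir=$3   # where the removed 3.0.3 jars are kept

  mkdir -p "$backup_dir" || return 1
  # Move (rather than delete) the old jars so the change can be undone.
  for old in twitter4j-core-3.0.3.jar twitter4j-media-support-3.0.3.jar \
             twitter4j-stream-3.0.3.jar; do
    if [ -e "$lib_dir/$old" ]; then
      mv "$lib_dir/$old" "$backup_dir/" || return 1
    fi
  done
  # Copy in the jars listed in step 1; fail if any of them is missing.
  for new in flume-sources-1.0-SNAPSHOT.jar twitter4j-core-4.0.7.jar \
             twitter4j-async-4.0.7.jar twitter4j-stream-4.0.7.jar; do
    cp "$src_dir/$new" "$lib_dir/" || return 1
  done
}
```

For example, swap_twitter4j_jars /usr/lib/flume-ng/lib "$HOME/jars" "$HOME/twitter4j-3.0.3-backup", run from a shell that has write access to the lib directory.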
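The flume.conf correction in step 2 matters because an hdfs:// URI has to name the NameNode RPC address (port 8020 on the QuickStart VM, as the working log shows), whereas 50070 is the NameNode web UI (HTTP) port. A minimal sketch for catching this, assuming the agent and sink names from this thread (TwitterAgent, HDFS) and using hypothetical helper names: it reads the sink path from flume.conf and compares its port with the port in fs.defaultFS. Only the port is compared, so localhost and the VM's hostname are treated as equivalent.

```shell
# Sketch: compare the port of the HDFS sink path in flume.conf with the port of
# fs.defaultFS. Agent/sink names (TwitterAgent, HDFS) are the ones used in this thread.

# Print the value of TwitterAgent.sinks.HDFS.hdfs.path from a Flume properties file.
hdfs_sink_path() {
  sed -n 's/^[[:space:]]*TwitterAgent\.sinks\.HDFS\.hdfs\.path[[:space:]]*=[[:space:]]*//p' "$1"
}

# Extract the port from scheme://host:port/...; prints nothing if no port is given.
uri_port() {
  printf '%s\n' "$1" | sed -n 's#^[a-z]*://[^/:]*:\([0-9]*\).*#\1#p'
}

# $1 = flume.conf, $2 = fs.defaultFS value (e.g. hdfs://quickstart.cloudera:8020)
check_hdfs_sink_port() {
  sink_port=$(uri_port "$(hdfs_sink_path "$1")")
  fs_port=$(uri_port "$2")
  if [ -n "$sink_port" ] && [ "$sink_port" = "$fs_port" ]; then
    echo "OK: sink uses NameNode port $sink_port"
  else
    echo "MISMATCH: sink port '$sink_port' vs fs.defaultFS port '$fs_port'"
    return 1
  fi
}
```

On a machine with the Hadoop client installed, the second argument can come from the cluster itself: check_hdfs_sink_port /usr/lib/flume-ng/conf/flume.conf "$(hdfs getconf -confKey fs.defaultFS)".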

Many thanks!

