Streaming API without location/coordinates?


#1

Hello guys,

a few days ago I started a project with Twitter, Hadoop and Flume.
I want to collect tweets and build a map, something like a density map, enriched with additional information from another source.

My Flume agent runs and collects tweets, but none of them have coordinates; only some of them carry a location field, and I need coordinates.
I tried adding the “locations” parameter to my configuration with the value -180,-90,180,90, as described in the API documentation, but the problem stays the same.
Can anybody tell me what to do to get tweets with coordinates that I can put on a map? It would be great if there were a way to receive only tweets that have coordinates, but I would already be happy to get any coordinates at all.
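
To make sure I am reading the API right, here is roughly what I understand the locations filter to mean in plain Twitter4J (the library the Cloudera source is built on). This is only a sketch with placeholder keys and class name, not something from my actual setup:

import twitter4j.FilterQuery;
import twitter4j.Status;
import twitter4j.StatusAdapter;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.ConfigurationBuilder;

public class GeoStreamTest {
    public static void main(String[] args) {
        // placeholder credentials - never post the real ones
        ConfigurationBuilder cb = new ConfigurationBuilder()
                .setOAuthConsumerKey("xxx")
                .setOAuthConsumerSecret("xxx")
                .setOAuthAccessToken("xxx")
                .setOAuthAccessTokenSecret("xxx");

        TwitterStream stream = new TwitterStreamFactory(cb.build()).getInstance();
        stream.addListener(new StatusAdapter() {
            @Override
            public void onStatus(Status status) {
                // getGeoLocation() is null for most tweets; print only the geotagged ones
                if (status.getGeoLocation() != null) {
                    System.out.println(status.getGeoLocation().getLatitude() + ","
                            + status.getGeoLocation().getLongitude() + " " + status.getText());
                }
            }
        });

        // world-wide bounding box: south-west corner first, then north-east
        stream.filter(new FilterQuery().locations(new double[][]{{-180d, -90d}, {180d, 90d}}));
    }
}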

Here you can see my configuration.

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx

TwitterAgent.sources.Twitter.keywords = hadoop, big data, apache foundation, flume, mahout
TwitterAgent.source.Twitter.locations=-180,-90,180,90

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/examples/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

Thanks for your help

Frank


#2

Don’t publish your application’s secrets! You should change those.

Only a small percentage of tweets (maybe 2%) have coordinates. That’s just a fact.
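
If you want to see the ratio in what you have already collected, a rough sketch like the one below could count it. It assumes each line in the files your HDFS sink writes is one raw tweet JSON object (which is what I would expect with writeFormat = Text), and it uses Jackson, which ships with Hadoop; the class name and local-file input are just for illustration:

import java.io.BufferedReader;
import java.io.FileReader;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;

public class CountGeoTweets {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        long total = 0;
        long withCoordinates = 0;

        // args[0]: a local copy of one of the files the HDFS sink wrote
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue;
            }
            total++;
            // "coordinates" is null (or absent) for the vast majority of tweets
            JsonNode coordinates = mapper.readTree(line).path("coordinates");
            if (!coordinates.isMissingNode() && !coordinates.isNull()) {
                withCoordinates++;
            }
        }
        reader.close();
        System.out.println(withCoordinates + " of " + total + " tweets have coordinates");
    }
}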


#3

Info: Including Hadoop libraries found via (/usr/local/hadoop/bin/hadoop) for HDFS access
Warning: $HADOOP_HOME is deprecated.

Warning: $HADOOP_HOME is deprecated.

Info: Excluding /usr/local/hadoop/libexec/…/lib/slf4j-api-1.4.3.jar from classpath
Info: Excluding /usr/local/hadoop/libexec/…/lib/slf4j-log4j12-1.4.3.jar from classpath

  • exec /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xmx20m -Dflume.root.logger=DEBUG, -cp ‘/usr/local/flume/conf:/usr/local/flume/lib/*:/usr/local/flume/lib/flume-sources-1.0-SNAPSHOT.jar:/usr/local/hadoop/libexec/…/conf:/usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar:/usr/local/hadoop/libexec/…:/usr/local/hadoop/libexec/…/hadoop-core-1.0.4.jar:/usr/local/hadoop/libexec/…/lib/asm-3.2.jar:/usr/local/hadoop/libexec/…/lib/aspectjrt-1.6.5.jar:/usr/local/hadoop/libexec/…/lib/aspectjtools-1.6.5.jar:/usr/local/hadoop/libexec/…/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/libexec/…/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/libexec/…/lib/commons-cli-1.2.jar:/usr/local/hadoop/libexec/…/lib/commons-codec-1.4.jar:/usr/local/hadoop/libexec/…/lib/commons-collections-3.2.1.jar:/usr/local/hadoop/libexec/…/lib/commons-configuration-1.6.jar:/usr/local/hadoop/libexec/…/lib/commons-daemon-1.0.1.jar:/usr/local/hadoop/libexec/…/lib/commons-digester-1.8.jar:/usr/local/hadoop/libexec/…/lib/commons-el-1.0.jar:/usr/local/hadoop/libexec/…/lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/libexec/…/lib/commons-io-2.1.jar:/usr/local/hadoop/libexec/…/lib/commons-lang-2.4.jar:/usr/local/hadoop/libexec/…/lib/commons-logging-1.1.1.jar:/usr/local/hadoop/libexec/…/lib/commons-logging-api-1.0.4.jar:/usr/local/hadoop/libexec/…/lib/commons-math-2.1.jar:/usr/local/hadoop/libexec/…/lib/commons-net-1.4.1.jar:/usr/local/hadoop/libexec/…/lib/core-3.1.1.jar:/usr/local/hadoop/libexec/…/lib/hadoop-capacity-scheduler-1.0.4.jar:/usr/local/hadoop/libexec/…/lib/hadoop-fairscheduler-1.0.4.jar:/usr/local/hadoop/libexec/…/lib/hadoop-thriftfs-1.0.4.jar:/usr/local/hadoop/libexec/…/lib/hive-beeline-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-cli-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-common-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-contrib-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-exec-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-hbase-handler-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-hwi-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-jdbc-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-metastore-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-serde-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-service-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hive-shims-0.11.0.jar:/usr/local/hadoop/libexec/…/lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/libexec/…/lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/libexec/…/lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/libexec/…/lib/jasper-compiler-5.5.12.jar:/usr/local/hadoop/libexec/…/lib/jasper-runtime-5.5.12.jar:/usr/local/hadoop/libexec/…/lib/jdeb-0.8.jar:/usr/local/hadoop/libexec/…/lib/jersey-core-1.8.jar:/usr/local/hadoop/libexec/…/lib/jersey-json-1.8.jar:/usr/local/hadoop/libexec/…/lib/jersey-server-1.8.jar:/usr/local/hadoop/libexec/…/lib/jets3t-0.6.1.jar:/usr/local/hadoop/libexec/…/lib/jetty-6.1.26.jar:/usr/local/hadoop/libexec/…/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/libexec/…/lib/jsch-0.1.42.jar:/usr/local/hadoop/libexec/…/lib/junit-4.5.jar:/usr/local/hadoop/libexec/…/lib/kfs-0.2.2.jar:/usr/local/hadoop/libexec/…/lib/log4j-1.2.15.jar:/usr/local/hadoop/libexec/…/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/libexec/…/lib/oro-2.0.8.jar:/usr/local/hadoop/libexec/…/lib/servlet-api-2.5-20081211.jar:/usr/local/hadoop/libexec/…/lib/xmlenc-0.52.jar:/usr/local/hadoop/libexec/…/lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/libexec/…/lib/jsp-2.1/jsp-api-2.1.jar’ -Djava.library.path=:/usr/local/hadoop/libexec/…/lib/native/Linux-amd64-64 
org.apache.flume.node.Application -f conf/flume.conf console -n TwitterAgent
    log4j:WARN No appenders could be found for logger (org.apache.flume.node.PollingPropertiesFileConfigurationProvider).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The output above appears when, after configuring Flume, I test it with this command:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG, console -n TwitterAgent
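
From the FAQ link in the warning I understand that log4j simply did not find a configuration file on the classpath. This is the kind of minimal conf/log4j.properties I assume it is looking for (just a sketch; I am not sure this is the actual problem):

log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{dd MMM yyyy HH:mm:ss} %-5p [%t] %c: %m%n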