I am trying to create an external Hive table on top of a Twitter JSON file. I used a Python script that returns tweets based on latitude and longitude. The script looks like this:
from twitter import *
import sys
import csv
import json

latitude = 33.729388    # geographical centre of search
longitude = 73.093146   # geographical centre of search
max_range = 1000        # search range in kilometres

# load the API credentials from config.py
config = {}
execfile("/home/oracle/Desktop/twitter-1.17.1/config.py", config)

twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))

result_count = 1
last_id = None
while result_count > 0:
    query = twitter.search.tweets(q = "", geocode = "%f,%f,%dkm" % (latitude, longitude, max_range), count = 100, max_id = last_id)
    for result in query["statuses"]:
        print result
        result_count += 1
The script runs successfully and returns a JSON file like this:
{u'contributors': None, u'truncated': False, u'text': u'RT @TajinderBagga: Sulag gai bhai sulag gai, Kejriwal ki Sulag gai \U0001f603\U0001f603\U0001f603', u'is_quote_status': True, u'in_reply_to_status_id': None, u'id': 744806287758614528, u'favorite_count': 0, u'entities': {u'symbols': [], u'user_mentions': [{u'id': 85462891, u'indices': [3, 17], u'id_str': u'85462891', u'screen_name': u'TajinderBagga', u'name': u'Tajinder Pal S Bagga'}], u'hashtags': [], u'urls': [{u'url': u'daslkjsajkdas', u'indices': [71, 94]
Now the problem is that when I create a Hive external table on this JSON file, Hive does not accept column names with single quotes, and the script writes every column name starting with u' and ending with '. Can anyone help me with this?
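One thing I suspect, shown here only as a minimal sketch: the u'...' keys, None and False look like Python's printed form of a dict rather than JSON, so the print statement may be the cause. Assuming that is the case, serializing each status with the standard-library json.dumps should write one valid JSON object per line. The result dict below is a hypothetical stand-in for one element of query["statuses"], with values taken from the sample output, and to_json_line is a helper name made up for illustration.

```python
import json

# Hypothetical stand-in for one element of query["statuses"];
# the values are copied from the sample output in the question.
result = {
    u"contributors": None,
    u"truncated": False,
    u"id": 744806287758614528,
    u"favorite_count": 0,
}


def to_json_line(status):
    """Serialize one status dict as a single line of valid JSON."""
    # json.dumps writes double-quoted keys and null/false instead of
    # Python's u'...' strings and None/False.
    return json.dumps(status, sort_keys=True)


line = to_json_line(result)
print(line)
```

In the script this would mean replacing the print of the raw dict with a print of to_json_line(result), so that each tweet ends up on its own line in the file.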