Error while reading Hive table on top of Twitter JSON file in Python

geo
api

#1

I am trying to create an external Hive table on top of a Twitter JSON file. I have used a Python script which returns tweets on the basis of longitude and latitude. The script is like this:

from twitter import *
import sys
import csv
import json

latitude = 33.729388    # geographical centre of search
longitude = 73.093146   # geographical centre of search
max_range = 1000        # search range in kilometres

# load the API credentials from config.py
config = {}
execfile("/home/oracle/Desktop/twitter-1.17.1/config.py", config)
twitter = Twitter(
    auth = OAuth(config["access_key"], config["access_secret"],
                 config["consumer_key"], config["consumer_secret"]))

result_count = 1
last_id = None
# keep requesting pages until a page comes back empty
while result_count > 0:
    query = twitter.search.tweets(q = "", geocode = "%f,%f,%dkm" % (latitude, longitude, max_range), count = 100, max_id = last_id)
    result_count = 0
    for result in query["statuses"]:
        print result
        result_count += 1
        last_id = result["id"] - 1   # next page: only tweets older than this one

The script runs successfully and returns a JSON file like this:

{u'contributors': None, u'truncated': False, u'text': u'RT @TajinderBagga: Sulag gai bhai sulag gai, Kejriwal ki Sulag gai \U0001f603\U0001f603\U0001f603', u'is_quote_status': True, u'in_reply_to_status_id': None, u'id': 744806287758614528, u'favorite_count': 0, u'entities': {u'symbols': [], u'user_mentions': [{u'id': 85462891, u'indices': [3, 17], u'id_str': u'85462891', u'screen_name': u'TajinderBagga', u'name': u'Tajinder Pal S Bagga'}], u'hashtags': [], u'urls': [{u'url': u'daslkjsajkdas', u'indices': [71, 94]

Now the problem is that when I create a Hive external table on this JSON file, Hive does not accept column names with single quotes, and the script returns a JSON file in which every column name starts with " u' " and ends with " ' ". Can anyone help me with this?


#2

The script is writing a printable string representation of a Python dictionary instead of a JSON object. Changing it to print out a JSON string might work:

print json.dumps(result)
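
To see the difference without calling the Twitter API, here is a small sketch. The sample dictionary and the helper is_valid_json are made up for illustration and only stand in for one element of query["statuses"]. It is written so that it should run on both Python 2 and Python 3. It also shows one possible way to convert lines that were already saved in the dictionary format, using ast.literal_eval, assuming each saved line is a complete dictionary.

```python
import ast
import json

# Hypothetical sample standing in for one element of query["statuses"]
result = {
    "contributors": None,
    "truncated": False,
    "id": 744806287758614528,
    "text": u"Sulag gai \U0001f603",
}

as_repr = str(result)         # what "print result" writes: Python dict syntax
as_json = json.dumps(result)  # what "print json.dumps(result)" writes: JSON


def is_valid_json(text):
    """Return True if text parses as JSON (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json(as_repr))  # the dict representation is not JSON
print(is_valid_json(as_json))  # the json.dumps output is

# A line already saved in the dict format can be parsed back into a
# dictionary and then written out again as JSON.
recovered = json.dumps(ast.literal_eval(as_repr))
print(recovered)
```

As far as I know, the Hive JSON SerDes read one JSON object per line, so printing one json.dumps(result) per tweet should give a file in the layout the external table expects.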


#3

Yes, it's working. Thanks a lot.