Pagination


#1

Hi, I've been trying to retrieve more tweets than the API provides; it's for a data analysis project. The code I've come up with has had limited success: it stops retrieving tweets after the 2nd request but still returns a 200 status, so there is no error, just an empty statuses object :(

<?php
	require_once('twitteroauth/twitteroauth.php');
	require_once('config.php');
	
	$oauth 				= new TwitterOAuth(CONSUMER_KEY,CONSUMER_SECRET,$accessToken, $accessTokenSecret); // create Oauth request

	$cursor				= ""; //next results
	$count				= 2; // number of tweets to return
	$ittReults			= 4; // number of pages to return
			
	/*
		Should return 8 results
		
		2*4 = 8 
		just a small example as the actual plan 
		is to return 400 tweets (max you can get per search)
		
		which would require 4 searches as the maximum number of
		tweets you can fetch with a search is 100
	*/		
			
	$query = array(
		"q" 			=> "#indyref",
		"count" 		=> $count,
		"result_type" 	=> "popular",
		"max_id"		=> "",
		"include_entities" => 0
	);

	for($i = 0; $i < $ittReults; $i++){ // start at 0 so the loop makes exactly $ittReults requests
		$twitter_search		= $oauth->get("/search/tweets",$query);

		if($oauth->http_code == 200){
			foreach ($twitter_search->statuses as $result) {
			  echo $result->user->created_at . " " . $result->user->screen_name . ": " . $result->text . "\n";
			}
			
			/************* Build query string for the next set of results **************/
			$cursor		= ltrim ($twitter_search->search_metadata->next_results,'?');
			$qtArr		= explode("&", $cursor);
			
			foreach ($qtArr as $qVal) {
				$qtArr	=	explode("=", $qVal);
				if($qtArr[0]) $query[$qtArr[0]]	= $qtArr[1];	
			}
		}else{
			echo "SOMETHING WENT WRONG" . "\n";
		}
		
		echo "**************************************" . "\n";
	}
?>

OUTPUT

Mon Jan 12 18:57:19 +0000 2009 halina1979: #EverydaySexism it appears we have regressed to the late 19th century. Get back in the kitchen ladies #indyref http://t.co/9NDcl45WvO
Tue Apr 13 14:32:52 +0000 2010 BlairJenkinsYes: Very proud of our first #indyref campaign broadcast - 5.55 on BBC2, 6.25 on STV and 6.55 on BBC1. Please watch and tell family and friends!


Thu Apr 16 13:01:01 +0000 2009 robertflorence: I’ve clarified the latest campaign video from @UK_Together in case anyone was in doubt. Take a look. #indyref http://t.co/lONiYuBq7w
Thu Apr 19 13:36:09 +0000 2012 YesScotland: Three things we learned from the BBC #indyref debate http://t.co/hyPFWWCRfT http://t.co/TMY8znGDsR



So, as I said above, it works fine for two round trips and then fails to retrieve any more tweets, regardless of how high I set the iteration count. Has anyone had any success with pagination?

Thanks
Simon


#2

I guess the reason it doesn’t work is because from search_metadata->next_results you are getting a value of q that is URL-encoded (%23indyref instead of #indyref). $query[$qtArr[0]] = $qtArr[1]; should be $query[$qtArr[0]] = rawurldecode($qtArr[1]);
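
As a rough sketch of that fix, the query string could also be handed to PHP's built-in parse_str(), which splits on & and = and URL-decodes the values in one step. This is an alternative to the explode() loop plus rawurldecode(), not the original poster's code; the helper name nextResultsToParams and the sample next_results value are made up for illustration.

```php
<?php
// Hypothetical helper (not part of twitteroauth): turn a next_results
// string into an array of decoded query parameters.
function nextResultsToParams(string $nextResults): array
{
    $params = [];
    // parse_str() splits on "&" and "=" and URL-decodes each value,
    // so "%23indyref" comes back as "#indyref".
    parse_str(ltrim($nextResults, '?'), $params);
    return $params;
}

// Made-up value shaped like search_metadata->next_results.
$next   = '?max_id=506000000000000000&q=%23indyref&count=2&include_entities=1';
$params = nextResultsToParams($next);

echo $params['q'], "\n"; // prints "#indyref"
```

The resulting array could then be merged into $query (for example with array_merge) before the next request, assuming the keys in next_results match the parameter names the search call expects.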

In fact you only need to get max_id from search_metadata->next_results. You can also compute it: because max_id is inclusive and results come back newest first, it's the smallest tweet id in the results minus 1. But even if you don't use its value, search_metadata->next_results can still be useful: it's empty if you are at the last page (not getting any tweets would also tell you that).
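
Computing max_id by hand might look something like the sketch below. It assumes results arrive newest first, so the next page lies below the smallest id seen, and that max_id is inclusive, which is why one is subtracted. It also assumes 64-bit PHP so that tweet ids fit in a native integer. The helper name nextMaxId and the sample ids are invented for illustration.

```php
<?php
// Hypothetical helper: max_id to use for the next page, or null when the
// page was empty (no further results). Assumes 64-bit PHP integers.
function nextMaxId(array $statuses): ?string
{
    if (count($statuses) === 0) {
        return null;
    }
    // Read id_str rather than id to avoid float rounding in JSON decoding.
    $ids = array_map(fn($status) => (int) $status->id_str, $statuses);
    // Subtract 1 so the oldest tweet already seen is not returned again.
    return (string) (min($ids) - 1);
}

// Made-up statuses for illustration.
$page = [
    (object) ['id_str' => '506000000000000300'],
    (object) ['id_str' => '506000000000000100'],
    (object) ['id_str' => '506000000000000200'],
];

echo nextMaxId($page), "\n"; // prints "506000000000000099"
```

A null return gives the same stop signal as an empty next_results, so a paging loop could break on either.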

And by the way you can get more than 400 tweets using the search API (but you can only search against the last week of tweets so there might be fewer results of course).


#3

Thanks for the reply. This bit here breaks the query string down into its component parts:

$cursor = ltrim($twitter_search->search_metadata->next_results, '?');
$qtArr = explode("&", $cursor);

foreach ($qtArr as $qVal) {
	$qtArr = explode("=", $qVal);
	if($qtArr[0]) $query[$qtArr[0]] = $qtArr[1];
}

Thanks for the information; you've still given me another route to explore.