A list of all userids with protected accounts?


#1

I have two slightly different but related issues, but first I should give a bit of background.

We have four months of geo-located data collected from the Streaming API. After some correspondence with Twitter we were warned that we could be in breach of the T&Cs. We stopped the application and then purchased a further 3 months of data. Twitter agreed that the T&Cs for purchasing the data covered the use of the data that we had already collected ourselves, giving a dataset covering 7 months.

One of our quality checks involves looking at the distribution of first and last tweet dates across all users. We have noted a discontinuity that corresponds to the cutover between the self-collected and the purchased data. We have established that this is mostly explained by protected accounts. Specifically, tweets from accounts that were public when we collected them but were subsequently protected are in the self-collected data. However, tweets from accounts that were protected at the time the purchased data was extracted are not in the purchased data.

My first question is, is there any way of identifying the userids for protected accounts? The idea would be to use this information to strip out protected accounts. This would not only help with respecting the privacy of users who have retrospectively protected their accounts, it would also provide a more consistent dataset.

Another issue related to this is a second discontinuity that occurs on 16 September. My second question is, was there anything odd that happened with Twitter around that date that might have caused users to stop geolocating tweets?


#2

Hi @NigelSwier, if you have received data through our paid products (Gnip or otherwise), I would suggest contacting your Twitter account manager and they can put you in touch with the right people to work with you directly.

With that said, you can look up 100 users at a time with users/lookup. In each user object returned there is a boolean “protected” property. Additionally, there is a “protected” property in the Tweet object of every Tweet.

You could potentially pull unique user IDs from your Tweet data set and run them against users/lookup. This will tell you if the user is protected at this current time. Depending on how big your data set is, it may take some time while staying within rate limits.
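As a rough illustration of that approach, here is a minimal Python sketch. It assumes each stored Tweet is a dict with the user ID at `tweet["user"]["id"]`, and it takes a `lookup` callable that you would supply yourself: a hypothetical wrapper around users/lookup that accepts up to 100 IDs and returns the list of user objects. Authentication and rate-limit handling are left to that wrapper, and the function names here are made up for the example.

```python
from typing import Any, Callable, Dict, Iterator, List, Set

BATCH_SIZE = 100  # users/lookup takes up to 100 users per request

User = Dict[str, Any]
Tweet = Dict[str, Any]


def batches(ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def find_protected_ids(
    tweets: List[Tweet],
    lookup: Callable[[List[int]], List[User]],
) -> Set[int]:
    """Return the IDs of users whose accounts are protected right now.

    `lookup` is an assumed, caller-supplied wrapper around users/lookup.
    """
    # De-duplicate first so each user is looked up only once.
    unique_ids = sorted({t["user"]["id"] for t in tweets})
    protected: Set[int] = set()
    for batch in batches(unique_ids):
        for user in lookup(batch):
            if user.get("protected"):
                protected.add(user["id"])
    return protected


def strip_protected(tweets: List[Tweet], protected: Set[int]) -> List[Tweet]:
    """Drop every Tweet whose author is in the protected set."""
    return [t for t in tweets if t["user"]["id"] not in protected]
```

Users missing from the lookup response are simply not flagged by this sketch, so you may want to track those IDs separately and decide how to treat them.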

Regarding geo Tweets around 9/16, there was a publicized issue with the iOS 8 release on the Twitter side and there was a dip in mobile usage starting on 9/17. Not sure if it’s related, but it could be.