I’m using a regex (Python) to filter and pull tweets that contain URLs. However, during streaming, certain tweets show up with incomplete URLs. For example: http://t.co/a7sh9… (they trail off with ‘…’, resulting in a broken URL). Is there any way I can obtain the complete URL in this case? Or do I just have to ignore it?
Rather than using a regular expression, you should use the tweet’s URL entities (https://dev.twitter.com/docs/platform-objects/entities), which give you a list of the t.co URLs and the original URLs they point to.
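As a rough sketch of reading those entities, assuming each tweet arrives as the JSON object described on the entities page, with URL entities under `entities` → `urls` and each entry carrying `url` (the t.co link) and `expanded_url` (the original). The helper name `extract_urls` and the sample payload are hypothetical, for illustration only:

```python
import json


def extract_urls(tweet):
    """Return (t.co URL, expanded URL) pairs from a parsed tweet dict.

    Assumes the tweet JSON layout from the entities documentation:
    tweet["entities"]["urls"] is a list of dicts with "url" and "expanded_url".
    Tweets without URL entities yield an empty list.
    """
    urls = tweet.get("entities", {}).get("urls", [])
    return [(u["url"], u.get("expanded_url")) for u in urls]


# Hypothetical sample payload; a real one has many more fields.
raw = """{
  "text": "Check this out http://t.co/abc123",
  "entities": {
    "urls": [
      {
        "url": "http://t.co/abc123",
        "expanded_url": "http://example.com/some/long/path",
        "display_url": "example.com/some/long/path",
        "indices": [15, 33]
      }
    ]
  }
}"""

tweet = json.loads(raw)
for short, full in extract_urls(tweet):
    print(short, "->", full)
```

Because the entity list is built from the tweet itself rather than from its displayed text, it does not depend on how the text was rendered or shortened.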
Without seeing a few example tweets it’s hard to be sure, but the incomplete URLs you describe seem to have been partly cut off by a client. In that case, it would be hard to recover the original URL.
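If you only have the raw text to work with, a simple fallback is to skip URLs that look truncated. This is a minimal sketch, assuming that a trailing ‘…’ (U+2026) or "..." marks a cut-off URL; the pattern `https?://\S+` and the helper name `complete_urls` are hypothetical simplifications, not a robust URL matcher:

```python
import re

# Simplified pattern: "http://" or "https://" followed by non-whitespace.
# Because "\S+" also matches "…", a truncated URL keeps its ellipsis in the match.
URL_RE = re.compile(r"https?://\S+")


def complete_urls(text):
    """Return URLs found in text, dropping any that end in an ellipsis."""
    found = URL_RE.findall(text)
    return [u for u in found if not (u.endswith("\u2026") or u.endswith("..."))]


print(complete_urls("see http://t.co/abc123 and http://t.co/a7sh9\u2026"))
```

This only lets you ignore the broken URLs; it cannot reconstruct them.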