My question is very simple. I would like to know exactly how Twitter matches Tweets against the “track” parameter of the statuses/filter endpoint. In particular, I’m interested in what methods or regexes (if applicable) are used to tokenize a Tweet’s textual content before it is matched against the “track” terms. I hope this information is not confidential. It’s important for my research to understand precisely how Twitter filters Tweets when using statuses/filter.
I have read the official documentation for the statuses/filter “track” parameter many times, but it’s not 100% clear, and its examples don’t cover all cases. For example, I discovered by manual inspection that Twitter also matches against the “retweeted_status” object inside a matched Tweet.
I know this is asking too much, but having a pseudocode snippet of the matching algorithm Twitter uses would be awesome.
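
To make the request concrete, here is a rough Python sketch of how I currently imagine the matching works. It is only my reading of the documentation plus the “retweeted_status” observation above, not Twitter's actual algorithm. The tokenizer regex, the function names (`tokenize`, `parse_track`, `candidate_texts`, `matches`), the choice to also index hashtags and mentions without their prefix, and the choice to test each text separately are all my own assumptions. It also ignores entity fields such as expanded URLs.

```python
import re

# Hypothetical tokenizer: runs of word characters, optionally prefixed by
# '#' or '@'. This regex is a guess, not Twitter's actual rule.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Return the set of lowercased tokens found in text."""
    tokens = set()
    for tok in TOKEN_RE.findall(text.lower()):
        tokens.add(tok)
        # Assumption: the term "twitter" should also match "#twitter"
        # and "@twitter", so index the bare form as well.
        tokens.add(tok.lstrip("#@"))
    return tokens


def parse_track(track):
    """Split a track string: commas act as OR, spaces inside a phrase as AND."""
    return [p.lower().split() for p in track.split(",") if p.strip()]


def candidate_texts(tweet):
    """Yield the texts checked: the Tweet's own and, if present, the retweeted one."""
    yield tweet.get("text", "")
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        yield retweeted.get("text", "")


def matches(tweet, track):
    """True if any phrase has all of its terms present in one candidate text."""
    phrases = parse_track(track)
    for text in candidate_texts(tweet):
        tokens = tokenize(text)
        if any(all(term in tokens for term in phrase) for phrase in phrases):
            return True
    return False
```

With this sketch, a Tweet whose own text is truncated but whose “retweeted_status” text contains every term of a phrase would still match, which is the behaviour I observed. What I would like to know is where the real algorithm differs from this.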