We are working on a school project whose goal is to stream and store Twitter data relevant to our city (Uppsala, Sweden) and present it in different ways. We are now trying to set up a single stream with the Phirehose library, filtering on the track word “uppsala” and on a bounding box of coordinates surrounding the city.
Our concern is how to distinguish tweets whose coordinates fall inside our city “box” from those that are let through only because of the keyword or the “place” attribute. It seems that a tweet can match our track word while having coordinates in another location. We also want to remove all tweets that are let through because their “place” attribute intersects our city coordinates even though they were not actually posted in Uppsala (e.g. getting rid of all tweets tagged with the place “Sweden” that were not posted in Uppsala).
The streaming API documentation says:
“If you would like to exclude place matches or only include places which fall completely within the bounding box, your code will have to perform an additional filtering step after reading the filtered stream”
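For what it's worth, here is a minimal sketch of how we imagine that additional filtering step could look. It is written in Python rather than PHP just to show the logic, and it rests on several assumptions: that each tweet arrives as decoded JSON with a GeoJSON `coordinates` field (longitude first) and a `place.bounding_box` polygon, that `UPPSALA_BOX` is only a rough, made-up box around the city, and that `point_in_box` and `classify_tweet` are our own hypothetical helper names, not part of Phirehose or the Twitter API.

```python
# Hypothetical box around Uppsala: (west, south, east, north) in degrees.
# These numbers are an assumption for illustration, not an official boundary.
UPPSALA_BOX = (17.50, 59.78, 17.80, 59.93)


def point_in_box(lon, lat, box):
    """Return True if the point (lon, lat) lies inside the box."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def classify_tweet(tweet, box=UPPSALA_BOX):
    """Classify a decoded tweet dict as 'exact', 'place' or 'other'.

    'exact': the tweet has a point coordinate inside the box.
    'place': no point coordinate, but the place bounding box lies
             completely inside the box.
    'other': everything else (keyword-only matches, points outside the
             box, large places such as a whole country).
    """
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
        return "exact" if point_in_box(lon, lat, box) else "other"

    place = tweet.get("place")
    if place and place.get("bounding_box"):
        # The bounding box is a polygon: a list of rings of [lon, lat] pairs.
        ring = place["bounding_box"]["coordinates"][0]
        if all(point_in_box(lon, lat, box) for lon, lat in ring):
            return "place"

    return "other"
```

With something like this, we would store only tweets classified as `exact` (and possibly `place`). A tweet tagged with the place “Sweden” would end up as `other`, because its bounding box is not completely inside ours.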
We find it hard to understand how the filtering and tagging actually work, and therefore hard to write code that filters the tweets properly. Could anyone help us, either by explaining how it works or by proposing a solution?
That would be very helpful!