Can a search in the API with hashtags (using #) return non-hashtagged tweets?


#1

Hello Everybody,
I ran a search using hashtags, but I noticed that my output includes tweets with no hashtags in the text.
Is that expected?
Thanks!


#2

Is this legacy / standard search, or premium?

Can you share any code?


#3

We are not paying anything for access; I guess paying is what would make it the premium search?

Here is part of the code. For more, see https://github.com/hsharrison/tweetstash/blob/master/src/tweetstash/search.py:

    search_terms = ['#' + hashtag for hashtag in hashtags]

    # filter.list
    filter_path = config_path / 'filter.list'
    if filter_path.is_file():
        with filter_path.open(encoding='utf-8') as filter_file:
            filters = filter_file.read().splitlines()
        filter_terms = ['#' + hashtag for hashtag in filters]
    else:
        filter_terms = []

    return cls(stash, auth_data, search_terms, filter_terms=filter_terms)

def search_api(self):
    auth = tweepy.AppAuthHandler(*self.auth_data[:2])
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    if not api:
        raise ValueError('Authentication failed')
    return api

def stream_auth(self):
    auth = tweepy.OAuthHandler(*self.auth_data[:2])
    auth.set_access_token(*self.auth_data[2:])
    return auth

def search(self, **stop_after):
    if not stop_after:
        stop_after['days'] = 36500
    stop_delta = timedelta(**stop_after)

    for hashtags in partition_all(max_hashtags_per_search, self.search_terms):
        tweets = search_twitter(self.search_api(), query=' OR '.join(hashtags))
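
For reference, the query strings built in the loop above can be reproduced without any network calls. This is only a minimal sketch, not the project's code: `chunked` is a hypothetical stand-in for `partition_all` (which looks like the toolz helper), the hashtags are made up, and the batch size of 3 is an assumption, since `max_hashtags_per_search` is not shown in the excerpt.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items (stand-in for partition_all)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_queries(hashtags, max_per_query):
    """Return one 'A OR B OR C' query string per batch of hashtags."""
    terms = ['#' + tag for tag in hashtags]
    return [' OR '.join(batch) for batch in chunked(terms, max_per_query)]


# Hypothetical hashtags and batch size, for illustration only.
queries = build_queries(['fitness', 'gym', 'music', 'travel'], max_per_query=3)
print(queries)
```

Printing the queries like this makes it easy to paste the literal rule that was sent, rather than an approximation of it.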

#4

Do you have examples of tweets + rules that didn’t match up like you expected?


#5

Also FWIW your link is bad


#6

Sure. For example:

  1. For some tweets I get an empty text cell in the generated Excel sheet. All the info about the tweet is there, but no text, no hashtags, no content at all is showing…
  2. Back workout
  3. Fav song and fav place all in one
  4. RT @----------: Isn’t She Lovely

#7

Can you paste the literal rules, not an approximation?


#8

I’m the developer. Not sure what the problem is with the link to the source code; I can see it fine, and it is even embedded on the page here.

The search rule was very simple, just a bunch of hashtags concatenated with OR, like “#hashtag1 OR #hashtag2 OR #hashtag3”. We did not save all the information on every tweet so we are not sure if the tweets at issue are replies/retweets or something. They are at least relevant to the topic of the hashtags.

These tweets were collected back in 2016 using the streaming interface provided by the Python library tweepy.


#9

There’s not much to go on here, but to the best of my knowledge, using a hashtag rule should never return tweets that don’t contain that hashtag. (FWIW it also takes quoted retweets into account.) I’ve been using GNIP for years and I can’t remember an instance of this happening.
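
If the raw payloads are still available, one way to check this is to test each stored tweet against the hashtag list locally. Below is a minimal sketch, assuming the tweets were saved as v1.1-style JSON dicts with `entities` and, where present, `extended_tweet`, `retweeted_status`, and `quoted_status`. The helper names and the sample tweet are made up for illustration.

```python
def collect_hashtags(tweet):
    """Return the lowercased hashtags found anywhere in a v1.1-style tweet dict."""
    found = set()
    # Check the top-level entities and, if present, those under extended_tweet.
    for source in (tweet, tweet.get('extended_tweet') or {}):
        for tag in (source.get('entities') or {}).get('hashtags', []):
            found.add(tag['text'].lower())
    # Also look inside the embedded original of a retweet or quote.
    for key in ('retweeted_status', 'quoted_status'):
        nested = tweet.get(key)
        if nested:
            found |= collect_hashtags(nested)
    return found


def matches_rule(tweet, hashtags):
    """True if the tweet carries at least one of the given hashtags."""
    wanted = {tag.lstrip('#').lower() for tag in hashtags}
    return bool(wanted & collect_hashtags(tweet))


# Made-up example: the visible text has no hashtag, the retweeted original does.
sample = {
    'text': 'RT @someone: a song I like',
    'entities': {'hashtags': []},
    'retweeted_status': {
        'text': 'a song I like #Music',
        'entities': {'hashtags': [{'text': 'Music'}]},
    },
}
print(matches_rule(sample, ['#music']))
```

Tweets for which `matches_rule` is false even after looking at the nested objects would be the ones worth posting here as concrete counterexamples.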