GET user API's track feature has unexpected behavior with UTF-8 characters


#1

I’m using the GET user API to track certain keywords, one of which contains an umlaut.
Using Mezgrman’s TweetPony Python library, I start the stream like this:
api.user_stream(processor=processor, replies=True, track="Katze, Kätzchen")
My script gets tweets with “Katze”, “Katze,”, and “Katze.” just fine, but it doesn’t get tweets with “Kätzchen,” or “Kätzchen.”, only those where “Kätzchen” is not directly followed by punctuation.
Is this expected behavior?


#2

I’d just like to bump this up, because either no one has seen this or no one cared.
This is an actual issue I’m having and I’d like to know whether this is an error on Twitter’s end or on mine.


#3

We are having the same encoding-related issues with the Streaming API.

If I search for 障害, I get a lot of tweets per minute. If I start to track that word via the Streaming API, I only get one or two per minute.

I’m streaming with the tracking parameter:
track=%E9%9A%9C%E5%AE%B3&language=ja
(i.e. the same characters I use in Twitter search)
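
For anyone who wants to check that the percent-encoding itself is not the culprit, here is a minimal sketch using only Python's standard library. It assumes the parameters are sent as a UTF-8 encoded query string; the variable names are just for illustration and are not part of any Twitter client library.

```python
from urllib.parse import urlencode, unquote

# Build the query string from the raw (unencoded) values.
# urlencode percent-encodes non-ASCII characters as UTF-8 by default.
params = {"track": "障害", "language": "ja"}
query = urlencode(params)
print(query)

# Decode the encoded value again to confirm it round-trips to the same term.
decoded = unquote("%E9%9A%9C%E5%AE%B3")
print(decoded)
```

If the printed query string matches the one above and the decoded value is 障害 again, the encoding is fine and the missing tweets have some other cause.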

I’m only receiving tweets with the given characters surrounded by spaces or newlines, like:

https://twitter.com/amaama69/status/357754192057733120
or

It looks like Twitter is filtering the hashtags, because sometimes I do receive a tweet with a hashtag:

But I never receive tweets that have the two characters in the middle of a sentence. As Japanese doesn’t use spaces the way Latin-script languages do, this is a significant limitation.

I’m not a native Japanese speaker, but I seem to have the same problem with German and other languages (e.g. Russian).


#4

@svdgraaf The Streaming API doesn’t support languages that are not space-separated, such as Chinese, Japanese, and Korean. Please refer to https://dev.twitter.com/docs/streaming-apis/parameters#track for more info.


#5

Ah, I just noticed a small line in the API documentation stating that languages without spaces between words are currently not supported in the Streaming API:

Non-space separated languages, such as CJK and Arabic, are currently unsupported.
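
To illustrate why this hits Japanese so hard, here is a rough sketch of whitespace-based matching. This is only a simplified model under the assumption that a term must equal a whole space-delimited token; `matches_track` is a hypothetical helper, not Twitter's actual matching code, and it ignores punctuation handling entirely.

```python
def matches_track(term: str, tweet_text: str) -> bool:
    """Simplified model: the term matches only if it is a whole
    whitespace-delimited token of the tweet text (case-insensitive)."""
    tokens = tweet_text.casefold().split()
    return term.casefold() in tokens


# The term stands alone between spaces, so it is found as a token.
spaced = matches_track("障害", "通信 障害 です")

# The term is embedded in a sentence without spaces, so the whole
# sentence is a single token and the term is never matched.
embedded = matches_track("障害", "通信障害が発生しました")

print(spaced, embedded)
```

Under this model, only tweets where the term is surrounded by spaces or newlines get through, which is consistent with what I was seeing.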

You can test your other terms here:
https://dev.twitter.com/docs/streaming-apis/keyword-matching