401 on .NET with certain combinations of languages/characters

dotnet
encoding


We have spotted a 401 error which seems to be related to the combination of:

  • Using .NET to make HTTP Web Requests
  • Having a mix of non-ASCII (or non-Latin; not 100% sure of the correct term here) characters and certain special characters such as (, * and : in a query parameter of a GET request.

We think this might be a quirk, or even a bug, in the .NET Framework. We also have a workaround that appears to work, but I would like somebody at Twitter to comment on this.

Take this example, searching for the Russian word for cat followed by a bracket:
var q = "кот (";

If we do the following:
var encoded = Uri.EscapeDataString(q);
var url = $"https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q={encoded}";

We get:
https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20%28

I believe this is the correctly percent-encoded value according to the Twitter docs. We use this same encoded value when building the OAuth header.
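To pin down what "correctly percent-encoded" means here, below is a minimal sketch of the percent-encoding that OAuth 1.0a (RFC 5849, section 3.6) asks for: only A-Z, a-z, 0-9, '-', '.', '_' and '~' stay literal, and everything else is UTF-8 encoded and written as %XX in uppercase hex. It is written out by hand because, depending on the .NET version and IRI settings, Uri.EscapeDataString may follow RFC 2396 instead and leave characters such as ( and * unescaped. The class and method names are hypothetical, not part of any library.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from any library): OAuth 1.0a / RFC 3986 style
// percent-encoding. Unreserved characters are kept as-is; every other
// character is converted to UTF-8 and each byte is emitted as %XX (uppercase).
public static class OAuthEncoding
{
    public static string PercentEncode(string value)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            bool unreserved =
                (b >= 'A' && b <= 'Z') ||
                (b >= 'a' && b <= 'z') ||
                (b >= '0' && b <= '9') ||
                b == '-' || b == '.' || b == '_' || b == '~';

            if (unreserved)
                sb.Append((char)b);
            else
                sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

With this, "кот (" becomes %D0%BA%D0%BE%D1%82%20%28 and "CAT (" becomes CAT%20%28, i.e. the bracket is always sent as %28.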

However, when we actually send this URL over the wire via an HttpWebRequest object in .NET, the final bracket is observed (in Fiddler) not to be encoded as %28. We also observe this in the Uri object:

var uri = new Uri(url);
// uri.AbsoluteUri == "https://api.twitter.com/1.1/search/tweets.json?count=100&include_entities=true&q=%D0%BA%D0%BE%D1%82%20("
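Since the rewriting shows up in AbsoluteUri as well as in Fiddler, a cheap pre-flight check is to build the Uri from the exact string that was signed and compare the two before sending. This is a hypothetical diagnostic sketch, not an existing API; it only detects that the Uri class changed the string, not why.

```csharp
using System;

// Hypothetical diagnostic: returns true only if System.Uri leaves the
// intended URL untouched. Any canonicalization (for example %28 turned back
// into "(", or a host name lower-cased) makes AbsoluteUri differ from the
// string the OAuth signature was computed over.
public static class UriCheck
{
    public static bool IsPreservedByUri(string intendedUrl)
    {
        var uri = new Uri(intendedUrl);
        // Ordinal comparison: the signature is sensitive to every character.
        return string.Equals(uri.AbsoluteUri, intendedUrl, StringComparison.Ordinal);
    }
}
```

On a setup showing the behaviour described above, calling it with the ...q=%D0%BA%D0%BE%D1%82%20%28 URL would be expected to return false, which could be logged or used to fail fast instead of getting an opaque 401.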

And the request returns a 401 error. The problem, I assume, is that our OAuth code computed the signature with %28 in the encoded parameter value, while the request that actually goes out contains a literal (.
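To illustrate why a single character difference is enough for a 401, here is a simplified sketch of the OAuth 1.0a signature base string (METHOD & encode(base URL) & encode(sorted parameters)) and its HMAC-SHA1 signature. It assumes, as the post does, that the server effectively signs the parameter values in the form they arrive on the wire; it omits the oauth_* parameters a real request includes, and it assumes a runtime whose Uri.EscapeDataString follows RFC 3986 (as modern .NET does). All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Simplified, hypothetical sketch of OAuth 1.0a signing. Parameters are passed
// already percent-encoded (as they appear in the query string).
public static class SignatureSketch
{
    // Encodes a second time for the base string ("%" -> "%25", "&" -> "%26", ...).
    static string Encode(string s) => Uri.EscapeDataString(s);

    public static string BaseString(string method, string baseUrl,
                                    IEnumerable<KeyValuePair<string, string>> encodedParams)
    {
        // Sort by encoded key, then encoded value, using ordinal (byte) order.
        var normalized = string.Join("&",
            encodedParams
                .OrderBy(p => p.Key, StringComparer.Ordinal)
                .ThenBy(p => p.Value, StringComparer.Ordinal)
                .Select(p => p.Key + "=" + p.Value));

        return method.ToUpperInvariant() + "&" + Encode(baseUrl) + "&" + Encode(normalized);
    }

    // HMAC-SHA1 keyed with encode(consumerSecret) & encode(tokenSecret), Base64 output.
    public static string Sign(string baseString, string consumerSecret, string tokenSecret)
    {
        var key = Encoding.ASCII.GetBytes(Encode(consumerSecret) + "&" + Encode(tokenSecret));
        using (var hmac = new HMACSHA1(key))
        {
            return Convert.ToBase64String(hmac.ComputeHash(Encoding.ASCII.GetBytes(baseString)));
        }
    }
}
```

Feeding it q=%D0%BA%D0%BE%D1%82%20%28 (what was signed) and q=%D0%BA%D0%BE%D1%82%20( (what went out) yields two different base strings, so the signatures cannot match under that assumption.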

In Fiddler I can even edit the failing request, change only the ( to %28, replay it, and it runs OK: no 401, and we get results back.

The same thing happens in other tests mixing, for example, Arabic or German (with umlauts) with these ‘special’ characters such as brackets.

Workaround:
Our current workaround is, after running:
var encoded = Uri.EscapeDataString(q);

to ‘manually’ convert the encoded forms of a small set of characters (e.g. (, * and :) back to their original values, and then to use the result both when building the OAuth request and in the HTTP GET request.
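A minimal sketch of that workaround, assuming the reverted set is exactly (, * and : (the post gives these only as examples) and using a hypothetical helper name. Because Uri.EscapeDataString only emits % at the start of a %XX escape, replacing "%28", "%2A" and "%3A" cannot accidentally hit part of another escape such as %252A.

```csharp
using System;
using System.Text;

// Hypothetical helper sketching the workaround: escape the query, then turn
// the escaped forms of a few characters back into literals, so that the
// string that is signed is the same string System.Uri will send.
public static class QueryWorkaround
{
    // Assumed set of characters to leave unescaped (taken from the post's examples).
    static readonly (string Escaped, string Literal)[] Revert =
    {
        ("%28", "("), ("%2A", "*"), ("%3A", ":"),
    };

    public static string EncodeForTwitterSearch(string q)
    {
        var sb = new StringBuilder(Uri.EscapeDataString(q));
        foreach (var (escaped, literal) in Revert)
            sb.Replace(escaped, literal);
        return sb.ToString();
    }
}
```

The returned value would then be used unchanged both in the OAuth base string and in the q= parameter of the GET URL.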

This then runs OK and we get results back which look correct.

A few other points:

  • I have reproduced this exact same issue using another .NET Twitter OAuth library and also a Twitter client built using .NET.
  • If the query contains, for example, just the English ‘CAT (’, the trailing bracket is encoded as %28 and the OAuth all runs fine. It only seems to happen when these particular characters are mixed with characters from some other, non-ASCII character set.

My questions are:

  • Can anybody spot a flaw in our thinking or process here, or does anyone know how to force the .NET web request to send the exact URL we are requesting?
  • Is there an issue with our workaround? I am assuming Twitter's servers process the bracket and the other special characters correctly and do not strictly require them to be percent-encoded (otherwise the requests would fail).