I regularly write for Microsoft’s ScriptJunkie site, where we use the Twitter bookmarklet to count retweets of new articles. We were surprised recently to discover an interesting issue with the bookmarklet that we thought we would share.
There are cases where users will share http://msdn.microsoft.com/en-US/scriptjunkie/hh273390.aspx for a post, as opposed to http://msdn.microsoft.com/en-us/scriptjunkie/hh273390.aspx, and the two URLs end up with very different tweet counts. If you haven’t noticed the difference between the URLs, the first includes a capital ‘US’.
As the retweet paradigm seems to treat URLs that differ only in casing as separate URLs, I was wondering whether there are, or may be, plans to normalize URLs. That way, even if a differently capitalized variant of a URL was shared, all variations of that URL would have the same tweet count (or an aggregated tweet count).
From a technical perspective, this could be as simple as lowercasing URLs that come through or storing a lowercase ‘normalized’ version in a separate field. However, I’m certain that something more intelligent could be possible.
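To make the idea concrete, here is a minimal sketch in Python of what I have in mind. It is purely illustrative and not how Twitter actually counts tweets: the names `normalize_url` and `aggregate_counts` are hypothetical, and I am assuming that lowercasing the scheme, host, and path is acceptable for the sites in question. The scheme and host are case-insensitive anyway, but paths can be case-sensitive on some servers, so a smarter version would need to handle that.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a lowercase 'normalized' form of url to use as a counting key.

    Hypothetical helper: lowercasing the path is the naive step suggested
    above and could merge distinct pages on a case-sensitive server.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),   # scheme is case-insensitive
        parts.netloc.lower(),   # host is case-insensitive
        parts.path.lower(),     # assumption: path casing does not matter here
        parts.query,            # left untouched; values may be case-sensitive
        parts.fragment,
    ))


def aggregate_counts(shared_urls):
    """Count shares per normalized URL, so casing variants are summed."""
    counts = Counter()
    for url in shared_urls:
        counts[normalize_url(url)] += 1
    return counts
```

With something like this, a share of the ‘en-US’ URL and a share of the ‘en-us’ URL would both be counted against the same normalized key, while the original URL could still be stored as shared.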