Search/tweets is occasionally returning with bad content-length header



Starting around 5:30pm Eastern Time yesterday (Nov 7, 2016) we started to see an uncharacteristically high number of responses from calls to the search/tweets endpoint that contained no data. The vast majority of calls to that endpoint continue to return data as expected, but the errors persisted so I investigated.

It turns out that in every case in which we’ve seen a failure, the HTTP headers returned by the call contain 2 “content-length” headers. The first contains a very plausible numeric value for the size of the content, but the second is not a number at all, for example:

content-length: macaw_search

This causes libcurl, which is what we’re using to fetch data, to think the size of the body is 0 and so we get no data.
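
A minimal Python sketch of how such responses could be flagged from a captured header block. It collects every content-length value and reports a problem if there is more than one, or if any value is not a plain number. The function name content_length_problems is hypothetical, and this is only a log-checking aid; it does not reproduce what libcurl does internally.

```python
def content_length_problems(raw_headers: str) -> list:
    """Return a list of problems found with the content-length headers."""
    values = []
    for line in raw_headers.splitlines():
        # The status line has no colon, so partition() leaves sep empty for it.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            values.append(value.strip())

    problems = []
    if len(values) > 1:
        problems.append(f"{len(values)} content-length headers: {values}")
    for value in values:
        # str.isdigit() accepts some non-ASCII digits, so require ASCII too.
        if not (value.isascii() and value.isdigit()):
            problems.append(f"non-numeric content-length: {value!r}")
    return problems


sample = """HTTP/1.1 200 OK
content-encoding: gzip
content-length: 48348
content-length: macaw_search
content-type: application/json;charset=utf-8"""

for problem in content_length_problems(sample):
    print(problem)
```

An empty list means the response had at most one content-length header and it was numeric.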

Is anybody else seeing this?

Here’s a full set of headers included in a response we received earlier today (note the 2 consecutive content-length headers):

HTTP/1.1 200 OK
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-disposition: attachment; filename=json.json
content-encoding: gzip
content-length: 48348
content-length: macaw_search
content-type: application/json;charset=utf-8
date: Tue, 08 Nov 2016 16:16:34 GMT
expires: Tue, 31 Mar 1981 05:00:00 GMT
last-modified: Tue, 08 Nov 2016 16:16:34 GMT
pragma: no-cache
server: tsa_b
set-cookie: lang=en; Path=/
set-cookie: guest_id=v1%3A147862179430166399;; Path=/; Expires=Thu, 08-Nov-2018 16:16:34 UTC
status: 200 OK
strict-transport-security: max-age=631138519
x-access-level: read
x-connection-hash: eb39594668daf971a3e40dfe87b3f3e5
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-rate-limit-limit: 450
x-rate-limit-remaining: 365
x-rate-limit-reset: 1478622136
x-response-time: 272
x-transaction: 0040d4b100742a6c
x-twitter-response-tags: BouncerCompliant
x-xss-protection: 1; mode=block


Sounds really weird. Anyone else able to reproduce?


I’ve been noticing similar problems, though I haven’t had the time to investigate. I’m now logging for non-numeric content-length values, and will report if I find anything.


Also seeing the same issue, but it’s not just affecting the content-length header. I have received “macaw_search, 92” in the x-rate-limit-remaining header. We’re also seeing random data in the x-rate-limit-reset and x-rate-limit-limit headers (we don’t read any of the others).

Examples of data we have received (from my error logs) - all in different requests to the search API, there’s no obvious pattern, but it’s pretty frequent:

x-rate-limit-remaining: 'macaw_search, 92'
x-rate-limit-remaining: '51,
x-rate-limit-remaining: '
x-rate-limit-reset: '1478663054,
x-rate-limit-reset: '008a65fe00d3124b, 1478650828'
x-rate-limit-reset: '1478658420, 1478658393'
x-rate-limit-reset: '1478660684,
x-rate-limit-limit: '009c664f00eef814'
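
One defensive way to read these headers, sketched in Python: accept a value only if it is a plain decimal number of at most 10 digits, and treat anything else (or a missing header) as unknown. The helper name clean_int and the 10-digit cap are assumptions for illustration; the cap is there so that a 16-character hex ID that happens to contain only digits is still rejected.

```python
import re
from typing import Mapping, Optional

# Plain ASCII decimal only, at most 10 digits (enough for an epoch timestamp).
_INT = re.compile(r"[0-9]{1,10}")


def clean_int(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Return the header as an int, or None if it is missing or not a plain number."""
    value = headers.get(name)
    if value is None:
        return None
    value = value.strip()
    if not _INT.fullmatch(value):
        return None
    return int(value)


headers = {
    "x-rate-limit-remaining": "macaw_search, 92",
    "x-rate-limit-reset": "1478658420, 1478658393",
    "x-rate-limit-limit": "450",
}
for name in ("x-rate-limit-limit", "x-rate-limit-remaining", "x-rate-limit-reset"):
    print(name, clean_int(headers, name))
```

A caller that gets None back can fall back to its own conservative estimate instead of trusting the junk value.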


OK thanks - this gives me a little more to go on. I’ll do some digging internally, but I’m a little busy right now so this is not going to be a high priority. Apologies to those of you encountering errors in these areas.


I’m still seeing this too. I will reiterate that the vast majority of the API calls return with correct headers, but I’m still seeing problem responses frequently.

As @markunsworth pointed out above, I’ve now also detected errors in other headers. Here are some snippets of problem headers that came in overnight:

content-encoding: gzip
content-encoding: macaw_search
content-length: 67529

content-encoding: 00644bdb0068deb5
content-length: 43
content-length: 37
content-length: 46093
content-type: text/html; charset=ISO-8859-1

content-encoding: macaw_search
content-length: 59076

It is also really surprising that one of the header sets above included “charset=ISO-8859-1”, as my understanding is that the Twitter API “always” returns utf-8 data?

I do get notified by my software whenever an unparseable response is returned from a Twitter API call and in the past that happened, maybe, once every couple of months or so - so infrequently that I never bothered to look into it. That changed abruptly at 5:30pm Eastern Time on Nov 7 and since then these errors have been occurring frequently.

I’ve been capturing the “full response - including headers” (like the one I pasted into the first post on this thread) for unparsable responses on a couple of my servers since about 10am ET yesterday (Nov 8). @andypiper if you want me to package these up and send them along, just ask.


Thanks for the additional context! All good information. I can’t promise that we can address this immediately, but given information like date of change in behaviour etc, it may be possible to identify a deploy on our side that affected the headers.


I am also seeing it.

This is coming from the statuses/user_timeline endpoint:

HTTP/1.1 200 OK
cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
content-disposition: attachment; filename=json.json
content-encoding: gzip
content-length: 93211
content-length: timelines_api


The first time I see errors for this in our logs is November 3 00:34:30.670 UTC. The distribution of errors is very low up until November 8 at 12:00 UTC, at which point the increase is very noticeable. However, it’s difficult for me to tell if this increase in errors was just down to an increase in the number of requests to the search API, as we don’t store that information in a readily accessible format.
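
If per-request logs were available, one way to separate "more errors" from "more requests" would be to look at the fraction of bad responses per hour rather than the raw count. A rough Python sketch, assuming each log record is a (timestamp, had_bad_headers) pair; the function name hourly_error_rate and the sample records are made up purely for illustration and are not real data from this thread.

```python
from collections import Counter
from datetime import datetime


def hourly_error_rate(records):
    """records: iterable of (datetime, had_bad_headers) pairs.

    Returns {hour: fraction of responses in that hour with bad headers}.
    """
    totals, errors = Counter(), Counter()
    for when, bad in records:
        # Truncate the timestamp to the start of its hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        totals[hour] += 1
        if bad:
            errors[hour] += 1
    return {hour: errors[hour] / totals[hour] for hour in sorted(totals)}


# Illustrative records only.
records = [
    (datetime(2016, 11, 8, 11, 5), False),
    (datetime(2016, 11, 8, 11, 40), False),
    (datetime(2016, 11, 8, 12, 10), True),
    (datetime(2016, 11, 8, 12, 30), False),
]
for hour, rate in hourly_error_rate(records).items():
    print(hour.isoformat(), f"{rate:.0%}")
```

A rate that jumps at a given hour while request volume stays flat would point to a change on the server side rather than in traffic.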


Here are some more.

content-encoding: gzip
content-length: 103654
content-length: timelines_api

content-encoding: gzip
content-length: 72681
content-length: timelines_api

content-encoding: 00e15ced0001a855
content-encoding: gzip
content-length: 200 OK
content-length: 64345

content-encoding: gzip
content-length: 87297
content-length: timelines_api

content-encoding: gzip
content-length: 1; mode=block
content-length: 105
content-length: 27150
content-length: 1444
content-type: text/javascript; charset=utf-8
content-type: text/html;charset=utf-8

set-cookie: _twitter_sess={redacted}; Path=/;; Secure; HTTPOnly
set-cookie: guest_id={redacted};; Path=/; Expires=Sat, 10-Nov-2018 03:20:21 UTC

content-encoding: gzip
content-length: 200 OK
content-length: 35897
content-type: application/json;charset=utf-8
content-type: 1; mode=block

last-modified: Thu, 10 Nov 2016 03:57:28 GMT
last-modified: 1478750636

set-cookie: 1500
set-cookie: guest_id={redacted};; Path=/; Expires=Sat, 10-Nov-2018 03:57:28 UTC

content-encoding: gzip
content-length: 38401
content-length: timelines_api


Just saw this myself for the first time on 2016-11-09T19:42:31.888Z, in the response to a search API request. It’s the first time I remember seeing “macaw_search”. Also note the weird content-encoding and unexpected location.

"headers": {
  "content-encoding": "",
  "content-length": "229",
  "date": "Wed, 09 Nov 2016 19:42:31 GMT",
  "last-modified": "Wed, 09 Nov 2016 19:42:31 GMT",
  "location": "",
  "server": "tsa_b",
  "status": "macaw_search",
  "strict-transport-security": "max-age=631138519",
  "x-connection-hash": "3b3ba3fbdbdf2852aff80afa7a5451e1",
  "x-rate-limit-limit": "450"


OK thanks for all these reports. I’m trying to track down the correct folks to help me look into it!


I’m trying to limit what I show to different examples, and not just repeat examples of essentially the same issue.

Here’s an interesting one.

cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0 <-- Not Included!
content-length: 31
content-length: 00f01cf000530739
content-length: 43212
content-type: text/html; charset=ISO-8859-1
content-type: application/json;charset=utf-8

x-adtype: clear

x-orientation: p


x-refreshtime: 60

x-xss-protection: timelines_api

I’ve never seen x_failurl before, and have no other examples of many of the other headers in this example. I have 3 recorded instances of x-refreshtime, but the other 2 are 30.


Curious where this one is coming from since it includes a mopub URL? What API endpoint is this?


All of the snippets I’ve been providing are coming from statuses/user_timeline


I’ve also seen a mopub url in a header. This response was returned from a call to search/tweets and in addition to the multiple, and bogus, content-length value(s) also includes a mopub url in an x-imptracker header:

HTTP/1.1 200 OK
content-disposition: attachment; filename=json.json
content-encoding: gzip
content-length: 297
content-length: 48
content-type: application/json;charset=utf-8
date: Wed, 09 Nov 2016 00:59:27 GMT
last-modified: Wed, 09 Nov 2016 00:59:27 GMT
server: tsa_b
set-cookie: lang=en; Path=/
set-cookie: guest_id=v1%3A147865316707649398;; Path=/; Expires=Fri, 09-Nov-2018 00:59:27 UTC
status: 200 OK
strict-transport-security: max-age=631138519
x-access-level: read
x-connection-hash: 38f55d862d190b61eecf58e87012bf2d
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-fulladtype: millennial_full
x-nativeparams: {"adHeight":480,"adUnitID":"133415","adWidth":320}
x-orientation: p
x-rate-limit-limit: 450
x-rate-limit-remaining: 270
x-rate-limit-reset: 1478653325
x-response-time: 39
x-scrollable: 0012fb9e001702c9
x-xss-protection: 1; mode=block


We are also experiencing the same issues above. One additional datapoint I haven’t seen mentioned yet: in some responses, the x-rate-limit-remaining and x-rate-limit-reset headers are present (though they contain junk data), but the x-rate-limit-limit header is totally absent.


Thank you all for these data points. Working on a couple of other tasks but we have this on our radar.


Well, the good news is that I haven’t detected an API response with a bad/duplicate content-length header since 22:37 UTC on Friday. Has anybody else detected any problems since that time?


I was just about to report the same thing.