TON API multi-chunk upload errors with HTTP/400

ton

#1

I am constantly hitting an issue with TON multi-chunk uploads: the initial upload request works fine and gets back a success response (HTTP/201 Created, per the log of the initial POST), but all the chunk uploads get back HTTP/400 (Bad Request), even though I seem to be setting all the headers correctly.
The chunk size is set to 8MB and the file size is 4GB.
What is causing this?

Here’s a quick trace of the logging I’m seeing:

2015-06-08 03:03:54,127 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:130 [TON-001] [run] Uploading chunk to https://ton.twitter.com/1.1/ton/data/ta_partner/757216850/KBZDPDdGXmFwzdW.csv?resumable=true&resumeId=703169
2015-06-08 03:03:54,127 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:138 [TON-001] [run] HTTP headers:{X-TON-Expires=Sat, 13 Jun 2015 03:03:54 UTC, Content-Range=bytes 70368744177664-70368752566271/4350085740, X-TON-Content-Type=text/csv, Content-Type=text/csv}
2015-06-08 03:03:54,130 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:138 [TON-002] [run] HTTP headers:{X-TON-Expires=Sat, 13 Jun 2015 03:03:54 UTC, Content-Range=bytes 140737488355328-140737496743935/4350085740, X-TON-Content-Type=text/csv, Content-Type=text/csv}
2015-06-08 03:03:54,141 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:138 [TON-003] [run] HTTP headers:{X-TON-Expires=Sat, 13 Jun 2015 03:03:54 UTC, Content-Range=bytes 211106232532992-211106240921599/4350085740, X-TON-Content-Type=text/csv, Content-Type=text/csv}
2015-06-08 03:03:54,146 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:138 [TON-004] [run] HTTP headers:{X-TON-Expires=Sat, 13 Jun 2015 03:03:54 UTC, Content-Range=bytes 281474976710656-281474985099263/4350085740, X-TON-Content-Type=text/csv, Content-Type=text/csv}
2015-06-08 03:03:54,150 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:138 [TON-005] [run] HTTP headers:{X-TON-Expires=Sat, 13 Jun 2015 03:03:54 UTC, Content-Range=bytes 351843720888320-351843729276927/4350085740, X-TON-Content-Type=text/csv, Content-Type=text/csv}



2015-06-08 03:04:43,027 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:154 [TON-016] [run] Got response=HttpResponseProxy{HTTP/1.1 400 Bad Request [cache-control: no-cache, content-length: 0, date: Mon, 08 Jun 2015 03:04:42 GMT, server: tsa_b, set-cookie: guest_id=v1%3A143373268289484626; Domain=.twitter.com; Path=/; Expires=Wed, 07-Jun-2017 03:04:43 UTC, strict-transport-security: max-age=631138519, x-connection-hash: 665c5a2d34fa48133af5a31b287c9290, x-rate-limit-limit: 90000, x-rate-limit-remaining: 89998, x-rate-limit-reset: 1433733534, x-response-time: 112, x-tsa-request-body-time: 48442] [Content-Length: 0,Chunked: false]}


2015-06-08 03:04:43,588 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:154 [TON-002] [run] Got response=HttpResponseProxy{HTTP/1.1 400 Bad Request [cache-control: no-cache, content-length: 0, date: Mon, 08 Jun 2015 03:04:43 GMT, server: tsa_b, set-cookie: guest_id=v1%3A143373268346751401; Domain=.twitter.com; Path=/; Expires=Wed, 07-Jun-2017 03:04:43 UTC, strict-transport-security: max-age=631138519, x-connection-hash: c15e8ea6b2c8495fc20037c1867e81f8, x-rate-limit-limit: 90000, x-rate-limit-remaining: 89997, x-rate-limit-reset: 1433733534, x-response-time: 118, x-tsa-request-body-time: 49208] [Content-Length: 0,Chunked: false]}

2015-06-08 03:04:45,346 INFO  com.netflix.twitter.api.ton.runnables.ChunksRunnable:154 [TON-013] [run] Got response=HttpResponseProxy{HTTP/1.1 400 Bad Request [cache-control: no-cache, content-length: 0, date: Mon, 08 Jun 2015 03:04:45 GMT, server: tsa_b, set-cookie: guest_id=v1%3A143373268523319474; Domain=.twitter.com; Path=/; Expires=Wed, 07-Jun-2017 03:04:45 UTC, strict-transport-security: max-age=631138519, x-connection-hash: d6d3795479cb100d4c3ce536a14446af, x-rate-limit-limit: 90000, x-rate-limit-remaining: 89996, x-rate-limit-reset: 1433733534, x-response-time: 109, x-tsa-request-body-time: 50920] [Content-Length: 0,Chunked: false]}

#2

By the way, here’s the initial post which succeeded:

2015-06-08 03:03:53,997 INFO  com.netflix.twitter.api.ton.runnables.InitialRunnable:85 [TaskGroup-000] [call] HTTP POST:{X-TON-Content-Length=4350085740, X-TON-Expires=Sat, 13 Jun 2015 03:03:53 UTC, X-TON-Content-Type=text/csv, Content-Type=text/csv}
2015-06-08 03:03:54,112 INFO  com.netflix.twitter.api.ton.runnables.InitialRunnable:97 [TaskGroup-000] [call] Got response=HttpResponseProxy{HTTP/1.1 201 Created [content-length: 0, content-type: text/csv, date: Mon, 08 Jun 2015 03:03:54 GMT, expires: Mon, 15 Jun 2015 03:03:54 UTC, location: /1.1/ton/data/ta_partner/757216850/KBZDPDdGXmFwzdW.csv?resumable=true&resumeId=703169, server: tsa_b, set-cookie: guest_id=v1%3A143373263409270687; Domain=.twitter.com; Path=/; Expires=Wed, 07-Jun-2017 03:03:54 UTC, strict-transport-security: max-age=631138519, x-connection-hash: 3c8441751dcc332cfbfeea8bc8de40f2, x-rate-limit-limit: 90000, x-rate-limit-remaining: 89999, x-rate-limit-reset: 1433733534, x-response-time: 17, x-ton-max-chunk-size: 67108864, x-ton-min-chunk-size: 1048576] [Content-Type: text/csv,Content-Length: 0,Chunked: false]}

#3

And lastly, because I use the Apache HTTP client, the Content-Length gets set automatically by the library before sending the request, which is why it doesn’t appear in (these) logs.


#4

Your first PUT request should have the range 0-XXXX/YYYY

where XXXX is the chunk size minus 1 and YYYY is the full size of the file, as you sent it in X-TON-Content-Length in the first call.

Your numbers are way off.

Also, you do not need the X-TON headers for the PUT requests, just for the first POST.
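For illustration, here is a minimal sketch of the range arithmetic described above. The class and method names (`ChunkRanges`, `contentRange`, `chunkCount`) are made up for this example and are not part of any TON client library. It assumes an 8 MiB chunk size and the total size from the X-TON-Content-Length header in the logs, and keeps everything in `long` so a 4GB file does not wrap around.

```java
// Hypothetical helper, not part of any TON client library.
final class ChunkRanges {

    /** Builds the Content-Range value for the zero-based chunk at chunkIndex. */
    static String contentRange(long chunkIndex, long chunkSize, long totalSize) {
        if (chunkIndex < 0 || chunkSize <= 0 || totalSize <= 0) {
            throw new IllegalArgumentException("index must be >= 0, sizes must be > 0");
        }
        // multiplyExact throws instead of silently wrapping on overflow.
        long start = Math.multiplyExact(chunkIndex, chunkSize);
        if (start >= totalSize) {
            throw new IllegalArgumentException("chunk starts past end of file");
        }
        // The last chunk may be shorter than chunkSize; the end offset is inclusive.
        long end = Math.min(start + chunkSize, totalSize) - 1;
        return "bytes " + start + "-" + end + "/" + totalSize;
    }

    /** Number of chunks needed to cover totalSize bytes (rounds up). */
    static long chunkCount(long chunkSize, long totalSize) {
        return (totalSize + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long chunkSize = 8L * 1024 * 1024;   // 8 MiB, as in the original post
        long totalSize = 4_350_085_740L;     // X-TON-Content-Length from the logs
        // prints "bytes 0-8388607/4350085740"
        System.out.println(contentRange(0, chunkSize, totalSize));
        long last = chunkCount(chunkSize, totalSize) - 1;
        // prints "bytes 4345298944-4350085739/4350085740"
        System.out.println(contentRange(last, chunkSize, totalSize));
    }
}
```

With these inputs the first chunk comes out as `bytes 0-8388607/4350085740`, which is the shape of range the first PUT is expected to carry.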


#5

@pykler OK, so there seems to be an overflow that sneaked into the code somehow, which does indeed mess up the numbers, and we’ll be looking into that.

Your reply makes an interesting point, though, about the initial PUT: must the very first PUT that follows the initial POST have the range 0-XXXX/YYYY, or can it be any other range?

The reason I am asking is that after the initial POST we pipe all the PUT requests through a thread pool, and as such have no control over the scheduling, which means the first PUT could end up being for a chunk that is not the first one in the file. (And looking at the logs here, while the numbers are indeed off, this is definitely the case: the initial chunks uploaded are from somewhere in the middle of the file, so the range doesn’t start with 0.)
Do we have to ensure the initial PUT has the range 0-XXXX?

Secondly, do these chunks have to be uploaded in order?

For instance, imagine this scenario (I’m using small numbers here just as an example; I am aware of the size requirements etc.): file a.txt gets split into 4 chunks:

  1. 0-9
  2. 10-19
  3. 20-29
  4. 30-39

Let’s say that the initial chunk upload is for chunk 1 (0-9), then the other 3 uploads go through a thread pool, so they end up being uploaded in this order:

  1. chunk 4 (30-39)
  2. chunk 2 (10-19)
  3. chunk 3 (20-29)

Will this fail when uploading to TON or will it work? Does the chunk upload have to follow the order 1,2,3,4?

Thanks,


#6

I have not tested but I would think they have to be sequential and in order. Try confirming that the sequential upload works first then parallelize the solution (that would be my approach).
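One low-effort way to try the sequential run without ripping out the thread pool is to size the pool to a single worker, so chunks go out in submission order. The sketch below assumes that setup. `OrderedChunkUploader` and the `uploadChunkAt` callback are hypothetical stand-ins for the real PUT logic, and whether TON actually requires in-order chunks is not confirmed anywhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongConsumer;

// Hypothetical helper: uploadChunkAt stands in for the code that performs one PUT.
final class OrderedChunkUploader {

    /** Runs one upload task per chunk start offset, strictly in ascending order. */
    static void uploadInOrder(long totalSize, long chunkSize, LongConsumer uploadChunkAt) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // A single worker thread executes tasks in the order they were submitted.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (long start = 0; start < totalSize; start += chunkSize) {
                final long offset = start;
                pending.add(pool.submit(() -> uploadChunkAt.accept(offset)));
            }
            // Wait for each chunk in turn; get() rethrows a failure from that chunk.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while uploading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk upload failed", e.getCause());
        } finally {
            // Drops any chunks still queued if we are leaving early.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The 40-byte, 4-chunk example from the thread: offsets 0, 10, 20, 30.
        uploadInOrder(40, 10, offset -> System.out.println("PUT chunk at offset " + offset));
    }
}
```

Once the in-order run is confirmed to work, swapping in a larger pool is a one-line change, which makes it easy to test whether out-of-order chunks are what triggers the 400s.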


#7

OK, I will give that a go. However, if sequential order is required, this is a huge issue for the TON API, because it means uploading large files cannot be optimized in any way and becomes a bottleneck.


#8

@pykler I just validated by going through the code that the overflow issue was just a reporting (logging) issue. The chunks were actually sent with the correct size, but I think the problem was the order of the chunks, which I am trying to confirm.
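As a side note on the logged numbers: a quick arithmetic check shows that the start offsets in the first post are exact multiples of the 8 MiB chunk size squared (2^46). That would be consistent with an offset being multiplied by the chunk size twice somewhere in the logging path, though this is only a guess from the numbers and not something confirmed in the code. `LoggedOffsetCheck` below is a throwaway class written for this check.

```java
// Throwaway check on the offsets copied from the log lines in the first post.
final class LoggedOffsetCheck {

    static final long CHUNK_SIZE = 8L * 1024 * 1024; // 8 MiB = 2^23 bytes

    /** Returns k if loggedStart == k * CHUNK_SIZE * CHUNK_SIZE, otherwise -1. */
    static long squaredChunkMultiple(long loggedStart) {
        long square = CHUNK_SIZE * CHUNK_SIZE; // 2^46, comfortably within long
        return loggedStart % square == 0 ? loggedStart / square : -1;
    }

    public static void main(String[] args) {
        long[] loggedStarts = {
            70368744177664L, 140737488355328L, 211106232532992L,
            281474976710656L, 351843720888320L
        };
        for (long start : loggedStarts) {
            // Each line prints the logged offset and its multiple of CHUNK_SIZE squared.
            System.out.println(start + " -> " + squaredChunkMultiple(start));
        }
    }
}
```

The five logged offsets map to multiples 1 through 5, and each logged range is exactly 8388608 bytes long, which fits the statement that the chunk payloads themselves were the correct size.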


#9

@pykler and are you also saying that for the PUT requests (after the initial POST) none of the X-TON… headers are required? Does it cause issues if they are present?


#10

OK, so having done all of that and ensured the first PUT has the range 0-XXXX, I am still getting HTTP/400. Here are some details:

{X-TON-Expires=Sat, 13 Jun 2015 13:50:39 UTC, Content-Range=bytes 0-4194303/5948800, X-TON-Content-Type=text/comma-separated-values, Content-Type=text/comma-separated-values}

This is part of a test uploading a 5MB file in chunks of 4MB. The content sent is the size advertised in the headers. The initial POST was successful and this is the very first PUT, yet it’s failing.

Ideas?


#11

@liviutudor are you still doing the uploads in parallel or did you switch it to perform the upload sequentially?


#12

@brandonmblack They are going sequentially for this test.


#13

Would a 400 occur if the upload takes longer than 30 seconds?


#14

@liviutudor If your file is less than 8MB you cannot send it with resumable. It can only be sent as a single chunk. This is undocumented but I have tested it and mentioned it in the previous thread.

If your file is greater than 8MB you can chunk it. The POST request has some headers that tell you the min and max chunk size.
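A small sketch of how the min/max values from the POST response could be applied, assuming the header values are plain decimal byte counts as they appear in the logged response (`x-ton-min-chunk-size: 1048576`, `x-ton-max-chunk-size: 67108864`). `ChunkSizeLimits` and `clampChunkSize` are names made up for this example.

```java
// Hypothetical helper: keeps a preferred chunk size inside the limits TON reports.
final class ChunkSizeLimits {

    /**
     * Parses the x-ton-min-chunk-size / x-ton-max-chunk-size header values
     * and returns the preferred size pulled into the [min, max] range.
     */
    static long clampChunkSize(long preferred, String minHeader, String maxHeader) {
        long min = Long.parseLong(minHeader.trim());
        long max = Long.parseLong(maxHeader.trim());
        if (min > max) {
            throw new IllegalArgumentException("min chunk size exceeds max chunk size");
        }
        return Math.max(min, Math.min(preferred, max));
    }

    public static void main(String[] args) {
        // Header values copied from the initial POST response in this thread.
        long chunkSize = clampChunkSize(8L * 1024 * 1024, "1048576", "67108864");
        System.out.println(chunkSize); // prints 8388608 (8 MiB is already within limits)
    }
}
```

This only checks the chunk size against the advertised limits. The minimum total file size for resumable uploads mentioned above is a separate, undocumented constraint and is not modeled here.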


#15

For me, though, the initial POST was never successful if the content-length was too small (less than 8MB).


#16

Can you try leaving the X-TON headers out of your PUT request? I haven’t tested with those present, but the PUT does not need any X-TON headers.


#17

I’m actually using a chunk size of 8MB since this seems to be the “preferred” size for Twitter. From what I can see, it is within the min/max chunk size boundaries coming back from TON. However, I am intermittently seeing HTTP/308 (which seems to indicate success), but most of the time it’s a 400 :(


#18

Good point. OK, I’ll remove those and give it another try.


#19

P.S. I swear the docco used to show the TON headers in PUT before.


#20

Once you get a 400, usually the rest that follow are all 400s too. Essentially, though, a 400 means nothing was accepted, so if you fix that request you should hopefully get the 308 and be able to proceed.
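To make that flow concrete, here is a rough sketch of how an upload loop might react to each chunk response. It encodes only what was observed in this thread: 308 means the chunk was accepted and more are expected, 400 means nothing was accepted. Treating any 2xx as "upload complete" is an assumption of this sketch, and `ChunkResponse` and its `Action` values are hypothetical names.

```java
// Hypothetical status-code handling for chunk PUT responses.
final class ChunkResponse {

    enum Action { SEND_NEXT_CHUNK, UPLOAD_COMPLETE, STOP_AND_FIX_REQUEST, UNEXPECTED }

    /** Maps the HTTP status of a chunk PUT to the next step of the upload loop. */
    static Action actionFor(int status) {
        if (status == 308) {
            return Action.SEND_NEXT_CHUNK;       // chunk accepted, upload not finished
        }
        if (status >= 200 && status < 300) {
            return Action.UPLOAD_COMPLETE;       // assumption: any 2xx ends the upload
        }
        if (status == 400) {
            return Action.STOP_AND_FIX_REQUEST;  // nothing accepted; retrying blindly will not help
        }
        return Action.UNEXPECTED;                // anything else needs a closer look
    }

    public static void main(String[] args) {
        for (int status : new int[] {308, 201, 400, 503}) {
            System.out.println(status + " -> " + actionFor(status));
        }
    }
}
```

Stopping on the first 400 rather than sending the remaining chunks avoids the cascade of 400s described above and keeps the failing request easy to find in the logs.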