{"id":1258,"date":"2022-04-10T03:41:45","date_gmt":"2022-04-10T03:41:45","guid":{"rendered":"http:\/\/tiemensfamily.com\/timoncs\/?p=1258"},"modified":"2022-04-10T03:41:45","modified_gmt":"2022-04-10T03:41:45","slug":"amazon-s3-etag-advanced-information","status":"publish","type":"post","link":"https:\/\/tiemensfamily.com\/timoncs\/2022\/04\/10\/amazon-s3-etag-advanced-information\/","title":{"rendered":"Amazon S3 ETag Advanced Information"},"content":{"rendered":"\n<p>You are probably here because you looked at the ETag of one of your S3 objects, and it had a dash character (&#8220;-&#8221;) in it.  Most of your other ETag values are simple and correct md5sum hashes.  But this one is weird.<\/p>\n\n\n\n<p>Or, you&#8217;re here because the ETag of one of your S3 objects ends with &#8220;-2&#8221;, and you&#8217;ve looked up multipart, and you&#8217;ve seen the <a href=\"https:\/\/aws.amazon.com\/premiumsupport\/knowledge-center\/s3-upload-large-files\/#:~:text=The%20default%20value%20is%201%2C000,upload%20for%20an%20individual%20file.\">multipart documentation<\/a> around &#8216;multipart_threshold&#8217; and &#8216;multipart_chunksize&#8217;, so you know that &#8220;-2&#8221; means the ETag was computed from two (2) chunks.  But things are still not working out.<\/p>\n\n\n\n<p>Or, you&#8217;re here because you know that &#8220;-2&#8221; means two (2) chunks, and you know the default chunk size is 8MB (8*1024*1024 bytes).  Which is all super, except the object is 18MB in size &#8211; and 8+8 is only 16 &#8211; surely S3 is not throwing away chunks?  What is going on here?<\/p>\n\n\n\n<p>The TL;DR answer is &#8211; S3 uses both 8MB and 16MB as the &#8220;default&#8221; chunk size (and, I assume, 32MB, 64MB, etc.  Once you break the rules, nothing stops you from doing it again.)  As a concrete example &#8211; the object size was 17,325,568 bytes and the ETag was &#8220;c44bfa98b2c188777ed18cb9190e304b-2&#8221;.  
I used aws cli (aws-cli\/2.0.50 Python\/3.7.3 Linux) for this upload, so it should have used 8MB chunks, which means the ETag should end in &#8220;-3&#8221;, not &#8220;-2&#8221;.  Running the code (linked below) with 16MB chunks produces a matching ETag from the local file.<\/p>\n\n\n\n<p>I used &#8220;calculate_s3_etag&#8221; from this <a href=\"https:\/\/stackoverflow.com\/questions\/12186993\/what-is-the-algorithm-to-compute-the-amazon-s3-etag-for-a-file-larger-than-5gb\">Stack Overflow <\/a>post by <a href=\"https:\/\/stackoverflow.com\/users\/518169\/hyperknot\">hyperknot<\/a> [which seems to be in <a href=\"https:\/\/github.com\/hisplan\/compute-s3-etag\/blob\/master\/compute_s3_etag.py\">GitHub <\/a>&#8211; but I used the Stack Overflow code, not the GitHub code].  I have confirmed that the Stack Overflow code computes the correct ETag from a local file for my 30,000+ files &#8211; after trying 8MB, then trying 16MB.<\/p>\n\n\n\n<p>Other references:<\/p>\n\n\n\n<ul><li>Seems to indicate only 8MB and 16MB are chunk sizes (8MB for aws cli aka boto3, and 16MB for s3cmd).  Since I&#8217;ve only used &#8216;aws s3 sync &#8230;&#8217; to upload files, and I&#8217;ve seen ETags &#8220;right next to each other&#8221; where one uses 8MB and the other uses 16MB, I know this is not a rule.  Maybe it&#8217;s a &#8220;guideline&#8221;.<\/li><li><a href=\"https:\/\/stackoverflow.com\/questions\/6591047\/etag-definition-changed-in-amazon-s3\">Another Stack Overflow post<\/a> has the code in Python, Go, PowerShell, etc.   
This post also mentions the following setting, which I have not tried yet: <\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>aws configure set default.s3.multipart_threshold 64MB<\/code><\/pre>\n\n\n\n<ul><li><a href=\"https:\/\/pypi.org\/project\/s3id\/\">PyPI <\/a>has a page that talks about defaults of 5MB, 8MB, 15MB and 16MB<\/li><li>This <a href=\"https:\/\/teppen.io\/2018\/06\/23\/aws_s3_etags\/\" data-type=\"URL\" data-id=\"https:\/\/teppen.io\/2018\/06\/23\/aws_s3_etags\/\">teppen.io<\/a> post has some information (but the description doesn&#8217;t agree with any S3 documentation)<\/li><li>This <a href=\"https:\/\/savjee.be\/2015\/10\/Verifying-Amazon-S3-multi-part-uploads-with-ETag-hash\/\" data-type=\"URL\" data-id=\"https:\/\/savjee.be\/2015\/10\/Verifying-Amazon-S3-multi-part-uploads-with-ETag-hash\/\">savjee.be<\/a> post has the implementation in Bash.<\/li><\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>You are probably here because you looked at the ETag of one of your S3 objects, and it had a dash character (&#8220;-&#8221;) in it. Most of your other ETag values are simple and correct md5sum hashes. But this one is weird. 
&hellip; <a href=\"https:\/\/tiemensfamily.com\/timoncs\/2022\/04\/10\/amazon-s3-etag-advanced-information\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[5,7],"tags":[],"_links":{"self":[{"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/posts\/1258"}],"collection":[{"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/comments?post=1258"}],"version-history":[{"count":3,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/posts\/1258\/revisions"}],"predecessor-version":[{"id":1261,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/posts\/1258\/revisions\/1261"}],"wp:attachment":[{"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/media?parent=1258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/categories?post=1258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tiemensfamily.com\/timoncs\/wp-json\/wp\/v2\/tags?post=1258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}