MD5 hash checks on Azure Blob Storage files

TLDR: don't use md5sum; use openssl.

I uploaded a fairly large (7GB) binary file to Azure blob storage using azcopy:

azcopy copy --put-md5 file.dat https://a.blob.core.windows.net/b/file.dat

The --put-md5 option tells azcopy to compute the MD5 hash of the file and put it base64 representation in the Content-MD5 property of the blob.

Now, a certain service needs to download this file locally on startup, and do a hash check to verify its integrity. I decided to put it in a bash script and run it as an init container in the service's pod.

Now the problem is that the md5sum's computation wasn't matching the hash on the blob's Content-MD5 property.

I tried:

 md5sum --binary $filename | awk '{print $1}' | base64

It returns something else altogether. And also, it turns out that the --binary flag does nothing on GNU systems. From the docs:

On systems like GNU that do not distinguish between binary and text files, this option merely flags each input mode as binary: the MD5 checksum is unaffected.

Googled around and found a suggestion to use openssl dgst, and it worked!

openssl dgst -md5 -binary $filename | base64

Turns out, md5sum returns a hex representation of the hash and I had to unhex it before computing its base64:

md5sum --binary $filename | awk '{print $1}' | xxd -p -r | base64