MD5 hash checks on Azure Blob Storage files
TLDR: don't use md5sum; use openssl.
I uploaded a fairly large (7GB) binary file to Azure blob storage using azcopy
:
azcopy copy --put-md5 file.dat https://a.blob.core.windows.net/b/file.dat
The --put-md5
option tells azcopy to compute the MD5 hash of the file and put it base64 representation in the Content-MD5
property of the blob.
Now, a certain service needs to download this file locally on startup, and do a hash check to verify its integrity. I decided to put it in a bash script and run it as an init container in the service's pod.
Now the problem is that the md5sum
's computation wasn't matching the hash on the blob's Content-MD5
property.
I tried:
md5sum --binary $filename | awk '{print $1}' | base64
It returns something else altogether. And also, it turns out that the --binary
flag does nothing on GNU systems. From the docs:
On systems like GNU that do not distinguish between binary and text files, this option merely flags each input mode as binary: the MD5 checksum is unaffected.
Googled around and found a suggestion to use openssl dgst
, and it worked!
openssl dgst -md5 -binary $filename | base64
Turns out, md5sum
returns a hex representation of the hash and I had to unhex it before computing its base64:
md5sum --binary $filename | awk '{print $1}' | xxd -p -r | base64