Cryptographic Hash Functions

A cryptographic hash function creates a digital fingerprint of binary data. It takes an input (message) of arbitrary size and produces a short, fixed-size message digest, so that any change to the original message is expected to result in a different digest.
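One way to observe this property is to hash two messages that differ in a single character. The sketch below uses Python's standard hashlib module, chosen only for illustration, and counts how many of the 256 digest bits change. The helper names sha256_hex and differing_bits are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


m1 = b"message"
m2 = b"messagf"  # differs from m1 in the last character only
d1 = hashlib.sha256(m1).digest()
d2 = hashlib.sha256(m2).digest()

print(sha256_hex(m1))
print(sha256_hex(m2))
print(differing_bits(d1, d2), "of 256 bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to flip; the exact count depends on the inputs.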

A cryptographically strong hash function must satisfy the following requirements:

  • H(m) is relatively easy to compute for any given message m;
  • a hash value has a fixed length, while an input message can be of arbitrary size;
  • hash functions are one-way (preimage-resistant): it should be computationally infeasible to find a message that produces a given hash value;
  • hash functions are collision-resistant: a hash function is strongly collision-resistant if it is computationally infeasible to find any two distinct messages m1 and m2 such that H(m1) = H(m2), and weakly collision-resistant (second-preimage-resistant) if, given a message m1, it is infeasible to find another message m2 such that m1 ≠ m2 and H(m1) = H(m2).
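Collisions always exist because the input space is larger than the output space, so collision resistance is a matter of computational cost. The toy sketch below, written under the assumption that truncating SHA-256 to 16 bits is an acceptable stand-in for a weak hash, brute-forces a collision. By the birthday bound, a collision for an n-bit digest is expected after about 2^(n/2) attempts, here a few hundred, whereas the full 256-bit digest would require about 2^128 evaluations. The functions truncated_sha256 and find_collision are hypothetical helpers, not an attack on SHA-256 itself.

```python
import hashlib
from itertools import count


def truncated_sha256(data: bytes, nbytes: int) -> bytes:
    """Keep only the first nbytes of the SHA-256 digest (a deliberately weak toy hash)."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int) -> tuple:
    """Try messages b'msg-0', b'msg-1', ... until two share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    for i in count():
        msg = b"msg-%d" % i
        h = truncated_sha256(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


m1, m2 = find_collision(2)  # 16-bit toy digest
print(m1, m2, truncated_sha256(m1, 2).hex())
```

The pigeonhole principle guarantees that the loop ends after at most 65,537 messages for a 2-byte digest.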

Hash functions can be used independently (for example, message digests are usually employed to check the integrity of local or downloaded files), or as part of more complex cryptographic operations.


Desktop Tools

OpenSSL is one of the most widely used cryptographic toolkits. It is an open-source implementation of the SSL and TLS protocols together with a general-purpose cryptographic library. LibreSSL, a fork of OpenSSL maintained by the OpenBSD project, also provides implementations of newer cryptographic algorithms. In addition to these TLS projects, developers can turn to system-specific security tools, e.g. sha256sum from the GNU core utilities.

OpenSSL

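Building OpenSSL from source produces the cryptographic libraries as well as the openssl command-line application, which exposes various commands. OpenSSL 1.1.0 and later list them with

openssl list -commands

while older releases use the pseudo-command openssl list-standard-commands.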

The list of the standard commands includes dgst:

openssl dgst -sha256 file.pdf

The default format of the resultant hash is a hexadecimal string. The -binary option makes the dgst command produce the raw byte output:

openssl dgst -sha256 -binary file.pdf

The -r option makes dgst print the hash in the format used by GNU coreutils, which sha256sum produces in binary mode:

sha256sum --binary file.pdf
openssl dgst -sha256 -r file.pdf

Both commands in the example above print the hash as

419c4787949b87893a8e535eb72acd97e10ad3777f5c11fd8b3113bf459a3aea *file.pdf

The -c option separates the digest by colons:

openssl dgst -sha256 -c file.pdf
SHA256(file.pdf)= 41:9c:47:87:94:9b:87:89:3a:8e:53:5e:b7:2a:cd:97:e1:0a:d3:77:7f:5c:11:fd:8b:31:13:bf:45:9a:3a:ea
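The textual layouts shown above differ only in framing. The Python sketch below rebuilds them from a raw digest; it only imitates the printed layout and does not call OpenSSL. The file name abc.txt and its in-memory content are hypothetical, and newer OpenSSL releases may print a different label (for example SHA2-256) in the default format.

```python
import hashlib


def dgst_formats(data: bytes, name: str) -> dict:
    """Render a SHA-256 digest in the layouts printed by the commands above."""
    digest = hashlib.sha256(data).digest()
    hex_digest = digest.hex()
    return {
        # openssl dgst -sha256
        "default": f"SHA256({name})= {hex_digest}",
        # openssl dgst -sha256 -r (coreutils binary-mode layout)
        "coreutils": f"{hex_digest} *{name}",
        # openssl dgst -sha256 -c (bytes separated by colons)
        "colons": f"SHA256({name})= " + ":".join(f"{b:02x}" for b in digest),
    }


for label, line in dgst_formats(b"abc", "abc.txt").items():
    print(label, line)
```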

dgst is not the only OpenSSL command for computing cryptographic hashes. The digest commands are listed by

openssl list -digest-commands

In releases before 1.1.0, the equivalent pseudo-command is openssl list-message-digest-commands.

The implemented digest algorithms and their aliases are listed by

openssl list -digest-algorithms

or, before OpenSSL 1.1.0, by the openssl list-message-digest-algorithms pseudo-command.

As an example of a digest command other than dgst, the following computes the same hash with the sha256 command:

openssl sha256 file.pdf

GNU Core Utilities

The GNU Core Utilities is a package of basic tools for Unix-like operating systems. Among them, md5sum, sha1sum, sha224sum, sha256sum, sha384sum and sha512sum compute message digests of files:

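sha256sum --binary file.xml

With the --binary option, sha256sum prints the message digest as a formatted line:

26b1d4188e3f21cb94a3c6bb1f9f22e0fe2a082d8e69abea08793be53b03811a *file.xml

The line above contains the SHA-256 message digest, the binary-mode flag (*) and the name of the hashed file.

The --text option, which is the default on GNU/Linux, reads the input file in text mode:

sha256sum --text file.xml

In text mode the flag is a space, so two spaces separate the digest from the file name:

26b1d4188e3f21cb94a3c6bb1f9f22e0fe2a082d8e69abea08793be53b03811a  file.xml

On GNU systems both modes read the file identically and therefore produce the same digest.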

The formatted digests of several files can be saved to a single file:

sha256sum file.xml > digests.txt
sha256sum file.pdf >> digests.txt
sha256sum file.docx >> digests.txt

Passing digests.txt to sha256sum together with the --check option makes the utility verify every file listed in it:

sha256sum --check digests.txt
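The verification performed by --check can be sketched in a few lines of Python. This is a simplified reimplementation under stated assumptions: it accepts only SHA-256 lines of the form "<64 hex digits><space><flag><file name>" and does not handle coreutils' escaped file names or --tag (BSD-style) lines. The function names sha256_file and check_digests are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 8192) -> str:
    """Hash a file in chunks and return the hexadecimal digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_digests(list_path: str) -> dict:
    """Verify the files listed in a sha256sum-style digest file.

    Returns a mapping from file name to True (digest matches) or False.
    """
    results = {}
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, rest = line[:64], line[64:]
            # rest is a space, then the mode flag ('*' binary, ' ' text), then the name
            name = rest[2:]
            results[name] = sha256_file(name) == expected.lower()
    return results
```

Calling check_digests("digests.txt") and printing each entry as "name: OK" or "name: FAILED" mirrors the report that sha256sum --check prints.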

Windows CertUtil

The certutil tool is a command-line application installed as part of Windows Certificate Services. Its verbs cover various cryptographic tasks beyond the public-key certificate framework; for example, the -hashfile verb produces the message digest of an input file:

certutil.exe -hashfile file.htm SHA256


Programming Languages

Hash functions are implemented in virtually every programming environment. Applications written in C usually rely on OpenSSL routines, server-side code written in PHP or Java uses the digest APIs of those platforms, and ASP.NET developers combine the cryptographic services of the .NET Framework with C# or Visual Basic. C++, Perl, Ruby and other programming languages provide security APIs in their standard libraries or in supplementary modules. The examples below demonstrate digest computation in PHP, Python, Java and C#.

PHP

PHP is a widely used programming language originally designed for server-side Web development. Standard hash functions in PHP are implemented in three extensions: HASH, Mhash and OpenSSL.

HASH Message Digest Framework

PHP HASH Message Digest Framework is a cryptographic engine for both direct and incremental processing of the message data. The list of the algorithms supported by the extension is retrieved by calling the hash_algos() function:

<?php
 $algorithms = hash_algos();
 foreach ($algorithms as $algorithm) {
  echo "$algorithm\n";
 }
?>

Computing a digest of a string is a single-part operation:

<?php
 $message = 'message';
 $digest = hash('sha256', $message);
 echo $digest;
?>

The operation above is equivalent to either of the following shell commands:

echo -n "message" | openssl dgst -sha256
echo -n "message" | sha256sum

The optional third argument of hash() is a boolean; if it is TRUE, the function returns the raw binary digest instead of a hexadecimal string:

$raw = hash('sha256', $message, TRUE);

The raw digest can be converted to a sequence of Base64 characters:

echo base64_encode($raw);

The same operation can be performed as a shell pipeline:

echo -n "message" | openssl dgst -sha256 -binary | base64

The base64 tool is another GNU core utility; it encodes data read from a file or standard input in Base64.
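The same raw-digest-to-Base64 conversion can be mirrored in Python. The sketch assumes the goal is the Base64 form of the raw 32-byte digest; encoding the 64-character hexadecimal string instead would give a different, longer result. The function name sha256_base64 is illustrative.

```python
import base64
import hashlib


def sha256_base64(message: bytes) -> str:
    """Base64-encode the raw (binary) SHA-256 digest, like dgst -binary | base64."""
    raw = hashlib.sha256(message).digest()  # 32 raw bytes, not hex text
    return base64.b64encode(raw).decode("ascii")


print(sha256_base64(b"abc"))  # ungWv48Bz+pBQUDeXa4iI7ADYaOWF3qctBD/YfIAFa0=
```

A Base64-encoded SHA-256 digest is always 44 characters long (including one padding character), compared with 64 characters in hexadecimal.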

Incremental hashing is provided by the hash_init(), hash_update() and hash_final() functions:

<?php
 $context = hash_init('sha256');
 hash_update($context, 'The thing that hath been, it is that which shall be; ');
 hash_update($context, 'and that which is done is that which shall be done: ');
 hash_update($context, 'and there is no new thing under the sun.');
 $digest = hash_final($context);
 echo $digest;
?>
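For comparison, the same initialize/update/finalize pattern in Python's hashlib, used here only for illustration, is sketched below. It also checks that feeding the three fragments incrementally yields the same digest as hashing their concatenation in one call.

```python
import hashlib

parts = [
    b"The thing that hath been, it is that which shall be; ",
    b"and that which is done is that which shall be done: ",
    b"and there is no new thing under the sun.",
]

ctx = hashlib.sha256()          # counterpart of hash_init('sha256')
for part in parts:
    ctx.update(part)            # counterpart of hash_update()
incremental = ctx.hexdigest()   # counterpart of hash_final()

one_shot = hashlib.sha256(b"".join(parts)).hexdigest()
print(incremental == one_shot)  # True
```

Unlike a finalized PHP context, the Python object can still accept further update() calls after hexdigest().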

The same initialize/update/finalize approach is applied to file hashing:

<?php
 $context = hash_init('sha256');
 hash_update_file($context, 'file.pdf');
 $digest = hash_final($context);
 echo $digest;
?>

An already opened stream can be fed to the hashing context with hash_update_stream():

<?php
 $f = fopen('file.pdf', 'rb');
 $context = hash_init('sha256');
 hash_update_stream($context, $f);
 $digest = hash_final($context);
 fclose($f);
 echo $digest;
?>
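Stream hashing boils down to reading fixed-size chunks until the end of the stream and updating the context with each one. A minimal Python sketch of that loop follows; the function name sha256_stream is hypothetical, and io.BytesIO stands in for an open file.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 8192) -> str:
    """Feed an already opened binary stream to SHA-256 chunk by chunk."""
    h = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b'' signals the end of the stream
            break
        h.update(chunk)
    return h.hexdigest()


print(sha256_stream(io.BytesIO(b"abc"), chunk_size=1))
```

With a real file, the object returned by open('file.pdf', 'rb') can be passed instead; only one chunk is held in memory at a time.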

The convenience function for computing file digests is hash_file():

$digest = hash_file('sha256', 'file.pdf');

Mhash

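Another PHP extension for cryptographic digests and checksums is Mhash. Since PHP 5.3 its functions are emulated by the HASH extension, and as of PHP 8.1 they are deprecated: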

<?php
 $id = mhash_count();
 for ($i = 0; $i <= $id; $i++) {
  $algorithm = mhash_get_hash_name($i);
  if($algorithm != FALSE) {
   echo "$algorithm\n";
  }
 }
?>

Cryptographic algorithms implemented in Mhash are numbered from 0 to the value returned by mhash_count(). Passing an algorithm ID to mhash_get_hash_name() retrieves the algorithm name; IDs that do not correspond to an available algorithm yield FALSE.

Inside the loop above, mhash_get_block_size() returns the size of the digest produced by each algorithm (despite its name, this is the digest length, not the internal block size):

if($algorithm != FALSE) {
 $size = mhash_get_block_size($i);
 echo "$algorithm: message digest size is $size bytes\n";
}
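A hash function has two distinct size parameters: the length of its output and the size of the blocks its compression function consumes. The mhash example above reports the former. For comparison only, the Python sketch below prints both for several SHA family members using hashlib's digest_size and block_size attributes.

```python
import hashlib

# Output (digest) size versus internal block size, both in bytes.
sizes = {}
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    sizes[name] = (h.digest_size, h.block_size)
    print(f"{name}: digest {h.digest_size} bytes, block {h.block_size} bytes")
```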

The extension is based on the open-source mhash library, so developers can compare similar functions in both C and PHP:

C code snippet
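#include <stdio.h>
#include <stdlib.h>
#include <mhash.h>

int main(void) {
 /* mhash_count() returns the highest available algorithm ID */
 unsigned long count = (unsigned long) mhash_count();
 for (unsigned long i = 0; i <= count; i++) {
  /* the name is allocated by mhash and must be freed by the caller */
  char *algorithm = (char *) mhash_get_hash_name((hashid) i);
  if (algorithm != NULL) {
   unsigned long size = (unsigned long) mhash_get_block_size((hashid) i);
   printf("%s: message digest size is %lu bytes\n", algorithm, size);
   free(algorithm);
  }
 }
 return 0;
}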

A message digest is the result of the mhash() function call:

$digest = mhash(MHASH_SHA256, 'message');
echo bin2hex($digest);

The native C routines of mhash are more verbose than their PHP wrappers: they require explicit initialization and finalization of a hashing context.

C code snippet
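#include <stdio.h>
#include <string.h>
#include <mhash.h>

int main(void) {
 const char *message = "message";
 unsigned char digest[32]; /* a SHA-256 digest is 32 bytes long */
 MHASH ctx = mhash_init(MHASH_SHA256);
 if (ctx == MHASH_FAILED) return 1;
 mhash(ctx, message, strlen(message)); /* feed the data */
 mhash_deinit(ctx, digest); /* finalize and copy the digest out */
 for (unsigned int i = 0; i < mhash_get_block_size(MHASH_SHA256); i++) {
  printf("%.2x", digest[i]);
 }
 putchar('\n');
 return 0;
}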

OpenSSL

The PHP interface to OpenSSL provides the openssl_get_md_methods() and openssl_digest() functions for computing cryptographic digests:

<?php
 echo "Digest algorithms without aliases:\n";
 $md = openssl_get_md_methods();
 foreach($md as $algorithm){
  echo "$algorithm\n";
 }

 echo "Digest algorithms with aliases:\n";
 $mda = openssl_get_md_methods(true);
 foreach($mda as $algorithm){
  echo "$algorithm\n";
 }

 echo "Aliases:\n";
 $aliases = array_diff($mda, $md);
 foreach($aliases as $alias){
  echo "$alias\n";
 }
?>

Hashing data with openssl_digest() requires no explicit context initialization:

$message = 'Learning without thought is labor lost; thought without learning is perilous.';
$digest = openssl_digest($message, 'SHA256');

By default, the hash value is returned as a hexadecimal string. If the third argument of openssl_digest() is TRUE, the hash is returned as raw binary data:

$digest = openssl_digest($message, 'SHA256', TRUE);

Python

The hashlib module of the Python standard library exposes the basic hash algorithms:

import hashlib

The algorithms_guaranteed attribute (Python 3.2 and later) is the set of algorithms supported by the module on all platforms:

print(hashlib.algorithms_guaranteed)

MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512 are always present (since Python 3.6, SHA-3 and BLAKE2 are guaranteed as well), so the following code snippet can be employed to instantiate a SHA-256 engine:

import hashlib

sha256 = hashlib.sha256()

# block size is 64 bytes
print(sha256.block_size)

# message digest size is 32 bytes
print(sha256.digest_size)
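Besides named constructors such as hashlib.sha256(), hashlib.new() looks an algorithm up by name at run time, a role comparable to Java's MessageDigest.getInstance() discussed later. The helper name digest_by_name is hypothetical; hashlib.new() raises ValueError for names it does not recognize.

```python
import hashlib


def digest_by_name(name: str, data: bytes) -> str:
    """Look up a hash algorithm by name at run time and hash data with it."""
    return hashlib.new(name, data).hexdigest()


print(digest_by_name("sha256", b"abc"))
print(sorted(hashlib.algorithms_guaranteed))
```

Note that hashlib uses its own spelling of algorithm names (sha256, sha3_256), which differs from the Java standard names (SHA-256, SHA3-256).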

The hashlib functions combined with I/O routines enable file hashing:

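import hashlib

sha256 = hashlib.sha256()
with open('file.pdf', 'rb') as f:
    # read in 8 KiB chunks so that large files need not fit in memory
    for chunk in iter(lambda: f.read(8192), b''):
        sha256.update(chunk)
print(sha256.hexdigest())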

The digest() method returns the raw hash, which can be converted to Base64:

import base64
print(base64.b64encode(sha256.digest()).decode('ascii'))

Java

Java has a sophisticated cryptography architecture designed around algorithm independence and extensibility. An application requests an instance of an engine class, and the Java platform obtains the algorithm implementation from one of the installed cryptographic service providers (CSPs).

The MessageDigest is an engine class for hash functions:

MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

The static getInstance() method above does not specify the name of the CSP, so providers are searched in preference order, and the first implementation of the requested algorithm is returned.

Computing a hash value in Java can be a single-part operation:

byte[] digest = sha256.digest("message".getBytes(StandardCharsets.UTF_8));

Incremental updates are also supported:

// BouncyCastle must be registered once, e.g. at application startup
Security.addProvider(new BouncyCastleProvider());
MessageDigest sha256 = MessageDigest.getInstance("SHA-256", "BC");
byte[] buffer = new byte[8192];
int i;
try (InputStream stream = new BufferedInputStream(new FileInputStream("file.docx"))) {
 while ((i = stream.read(buffer)) != -1) {
  sha256.update(buffer, 0, i);
 }
}
byte[] digest = sha256.digest();

In the example above, the SHA-256 implementation is requested from the BouncyCastle provider ("BC"), which must be registered before use. Chunks of the input file are read into a buffer, and each chunk updates the digest engine. The final digest() call completes the hash computation and returns the SHA-256 hash value.

The raw hash array can be converted to various formats for network transfer or persistent storage; e.g. the Hex class from the org.bouncycastle.util.encoders package transforms the binary digest into a hexadecimal string:

String hexDigest = Hex.toHexString(digest);

Digest Streams

Java streams enable more advanced hashing techniques; for example, an application can calculate the digest of a Web asset while the asset is being downloaded. The resulting value is compared with the hash published by the asset creator, and in case of a mismatch the resource is discarded because it may have been tampered with.

In the following example, a DigestInputStream updates the SHA-256 hash engine with data read from the underlying network stream:

byte[] buffer = new byte[8192];
int i;
URL url = new URL("http://example.com/downloads/image-viewer.exe");
MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
try (InputStream networkStream = url.openConnection().getInputStream();
     DigestInputStream digestStream = new DigestInputStream(new BufferedInputStream(networkStream), sha256)) {
 // every read() passes the bytes through the digest engine
 while ((i = digestStream.read(buffer)) != -1) {
  System.out.println("Hashing " + i + " bytes of the network stream . . .");
 }
}
byte[] digest = sha256.digest();
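The same hash-while-reading idea can be sketched in Python with a small wrapper object. HashingReader and download_and_verify are hypothetical names, not part of any library, and io.BytesIO stands in for the network response; the digest is compared with the value published by the asset creator, and a mismatch rejects the content.

```python
import hashlib
import io
from typing import BinaryIO


class HashingReader:
    """Hypothetical wrapper that hashes every chunk read through it,
    similar in spirit to Java's DigestInputStream."""

    def __init__(self, stream: BinaryIO, algorithm: str = "sha256"):
        self._stream = stream
        self.hash = hashlib.new(algorithm)

    def read(self, size: int = -1) -> bytes:
        data = self._stream.read(size)
        self.hash.update(data)  # hash exactly the bytes handed to the caller
        return data


def download_and_verify(stream: BinaryIO, expected_hex: str,
                        chunk_size: int = 8192) -> bytes:
    """Read the whole stream through a HashingReader and reject the
    content if its SHA-256 digest differs from the published one."""
    reader = HashingReader(stream)
    chunks = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    if reader.hash.hexdigest() != expected_hex.lower():
        raise ValueError("digest mismatch: the resource may have been tampered with")
    return b"".join(chunks)


# io.BytesIO stands in for a network response in this sketch.
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(download_and_verify(io.BytesIO(b"abc"), ABC_SHA256))  # b'abc'
```

Because the response object returned by urllib.request.urlopen() also provides read(), it could be passed in place of the BytesIO object.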

DigestOutputStream works in the opposite direction: the message digest is computed while the message data is being written to the underlying stream. The next example writes a file into a BLOB for an Apache Derby database and simultaneously calculates its hash:

byte[] buffer = new byte[8192];
int i;
File file = new File("document.docx");
String connectionString = "jdbc:derby://db.example.com:1527/officedb";
Connection connection = DriverManager.getConnection(connectionString);
Blob blob = connection.createBlob();
MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
try (InputStream inputStream = new BufferedInputStream(new FileInputStream(file));
     DigestOutputStream digestStream = new DigestOutputStream(blob.setBinaryStream(1), sha256)) {
 // each write() updates the digest before passing the bytes on to the BLOB
 while ((i = inputStream.read(buffer)) != -1) {
  digestStream.write(buffer, 0, i);
 }
}
byte[] digest = sha256.digest();
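A Python analogue of the write-side technique is sketched below. HashingWriter and copy_with_digest are hypothetical names, and an in-memory io.BytesIO target stands in for the database BLOB; the sketch assumes the wrapped stream is buffered, so that each write() consumes all bytes passed to it.

```python
import hashlib
import io
from typing import BinaryIO


class HashingWriter:
    """Hypothetical counterpart of Java's DigestOutputStream: it hashes
    the data it is given and passes the same bytes to the wrapped stream."""

    def __init__(self, stream: BinaryIO, algorithm: str = "sha256"):
        self._stream = stream
        self.hash = hashlib.new(algorithm)

    def write(self, data: bytes) -> int:
        self.hash.update(data)
        # assumes a buffered stream that consumes all bytes in one write()
        return self._stream.write(data)


def copy_with_digest(src: BinaryIO, dst: BinaryIO, chunk_size: int = 8192) -> str:
    """Copy src into dst and return the SHA-256 digest of the copied bytes."""
    writer = HashingWriter(dst)
    for chunk in iter(lambda: src.read(chunk_size), b""):
        writer.write(chunk)
    return writer.hash.hexdigest()


target = io.BytesIO()  # stands in for a BLOB or a file opened for writing
digest = copy_with_digest(io.BytesIO(b"abc"), target)
print(digest)
print(target.getvalue())  # b'abc'
```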

Then the file is stored in the database:

PreparedStatement statement = connection.prepareStatement("INSERT INTO Documents VALUES (?, ?)");
statement.setString(1, file.getName());
statement.setBlob(2, blob);
statement.execute();

The client application can keep the message digest of the file locally. The next time the application retrieves the stored document from the database, it can compute the SHA-256 value again and compare it with the previously saved hash: a mismatch may reveal unauthorized modification of database records.

C#

.NET Framework classes supporting hash algorithms fall into three groups. The first group consists of fully managed implementations, which inherit their API from abstract base classes:

string messageString = "message";
byte[] message = Encoding.UTF8.GetBytes(messageString);
using (SHA256Managed sha256 = new SHA256Managed()) {
 byte[] digest = sha256.ComputeHash(message);
 foreach (byte b in digest) {
  Console.Write(b.ToString("x2"));
 }
}

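SHA256Managed inherits its methods and properties from SHA256, which in turn extends HashAlgorithm, the abstract base class for all hash algorithm implementations.

An instance can also be obtained through the static Create() methods of the base classes; in the .NET Framework these return SHA256Managed by default, while in .NET 6 and later SHA256Managed and the other derived implementation classes are marked obsolete in favor of SHA256.Create():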

HashAlgorithm sha256 = HashAlgorithm.Create("SHA256");

SHA256 sha256 = SHA256.Create();

Alternatively, the CryptoConfig class, which maps algorithm names to cryptography classes, can create the instance; casting to the abstract SHA256 type avoids depending on the concrete class returned:

SHA256 sha256 = (SHA256) CryptoConfig.CreateFromName("SHA256");

The second group of digest-related classes consists of managed wrappers around the FIPS-certified implementations provided by the Windows CryptoAPI, e.g. SHA256CryptoServiceProvider or SHA384CryptoServiceProvider:

SHA256CryptoServiceProvider sha256 = new SHA256CryptoServiceProvider();
byte[] message = Encoding.UTF8.GetBytes("message");
byte[] digest = sha256.ComputeHash(message);

The Cryptography Next Generation (CNG) classes constitute the third group:

SHA256Cng sha256 = new SHA256Cng();

All implementations of hash algorithms expose a uniform API, so any of the classes mentioned above can be used to hash both byte arrays and streams:

using (SHA256Managed sha256 = new SHA256Managed())
using (FileStream stream = new FileStream("utils.lib", FileMode.Open, FileAccess.Read)) {
 byte[] digest = sha256.ComputeHash(stream);
}

Stream hashing provides functionality similar to Java digest streams. The example below computes a message digest of a network resource while the resource is still being loaded by the application:

WebRequest request = WebRequest.Create("http://example.com/freedownloads/audio-player.exe");
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (SHA256Managed sha256 = new SHA256Managed()) {
 byte[] digest = sha256.ComputeHash(stream);
}

Streams in Java and C# must be closed when they are no longer needed. The close call is traditionally placed in a finally block, so that the stream is released even if an exception interrupts the normal flow of the program; Java's try-with-resources statement and C#'s using statement generate this cleanup automatically.


Hash Algorithms

All examples in this article have used SHA-256, one of the most frequently used hash functions. The main hash algorithms are listed below; some of them are considered weak and should be avoided in security-sensitive applications.

The table lists standard algorithm names supported by the BouncyCastle provider: the name, an alias, or the object identifier (OID) acting as an alias can be passed to the getInstance() method of the MessageDigest engine class.

Name | Aliases / OID | Remarks
GOST3411 | GOST, GOST-3411, 1.2.643.2.2.9 | A hash function defined in the Russian standard GOST R 34.11-94; it produces a 32-byte hash. The standard is considered obsolete and insecure: it was replaced by GOST R 34.11-2012, which specifies the Streebog algorithm. Streebog, however, has also been reported to be vulnerable.
MD2 | 1.2.840.113549.2.2 | Produces 16-byte digests; considered weak.
MD4 | 1.2.840.113549.2.4 | Produces 16-byte digests; considered weak.
MD5 | 1.2.840.113549.2.5 | Once a widely used function producing 128-bit digests. Successful collision attacks against MD5 have been published, so it should not be used in security applications.
RIPEMD128 | 1.3.36.3.2.2 | RIPEMD stands for "RACE Integrity Primitives Evaluation Message Digest"; RIPEMD-160 was designed by European researchers in 1996. The main variant, RIPEMD160, produces a 160-bit digest; 128, 256 and 320-bit versions also exist.
RIPEMD160 | 1.3.36.3.2.1 | Available in OpenSSL as the ripemd, rmd160 or RIPEMD160 digest commands.
RIPEMD256 | 1.3.36.3.2.3 |
RIPEMD320 | |
SHA-1 | SHA, SHA1, 1.3.14.3.2.26 | Produces 160-bit digests. Its collision resistance has been broken in practice (a SHA-1 collision was published in 2017), so it should not be used in security-critical applications; major browsers stopped accepting SHA-1 TLS certificates in 2017.
SHA-224 | SHA224, 2.16.840.1.101.3.4.2.4 | SHA-2 is a family of functions developed as a more secure replacement for SHA-1: SHA-224, SHA-256, SHA-384 and SHA-512 are named after their digest lengths in bits.
SHA-256 | SHA256, 2.16.840.1.101.3.4.2.1 |
SHA-384 | SHA384, 2.16.840.1.101.3.4.2.2 |
SHA-512 | SHA512, 2.16.840.1.101.3.4.2.3 |
SHA-512/224 | SHA512/224, 2.16.840.1.101.3.4.2.5 | SHA-512/224 and SHA-512/256 are truncated versions of SHA-512 computed with distinct initial hash values.
SHA-512/256 | SHA512/256, 2.16.840.1.101.3.4.2.6 |
SHA3-224, SHA3-256, SHA3-384, SHA3-512 | | In 2012 the NIST hash function competition selected Keccak as the new Secure Hash Algorithm (SHA-3). SHA-3 can produce outputs of 224, 256, 384 or 512 bits, available under the standard names SHA3-224, SHA3-256, SHA3-384 and SHA3-512, respectively.
Skein-256-128 and related names | | Skein was one of the SHA-3 finalists and can produce digests of various sizes. The BouncyCastle provider supports the standard names Skein-256-128, Skein-256-160, Skein-256-224, Skein-256-256, Skein-512-128, Skein-512-160, Skein-512-224, Skein-512-256, Skein-512-384, Skein-512-512, Skein-1024-384, Skein-1024-512 and Skein-1024-1024.
SM3 | 1.2.156.10197.1.401 | A hash function created in China; it computes 32-byte digests.
TIGER | | The default digest size of the Tiger hash function is 192 bits.
WHIRLPOOL | | Whirlpool returns a 512-bit digest.
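The SHA-3 output sizes listed above can be checked with Python's hashlib, which has guaranteed the SHA-3 variants since Python 3.6 and spells them with underscores (sha3_256) rather than the hyphenated standard names. The sketch also shows that SHA3-256 and SHA-256, despite equal output lengths, are different functions.

```python
import hashlib

# Output sizes of the SHA-3 variants, in bits.
sha3_sizes = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha3_224", "sha3_256", "sha3_384", "sha3_512")
}
print(sha3_sizes)

# Same output length, different algorithms: the digests do not match.
print(hashlib.sha3_256(b"abc").hexdigest() == hashlib.sha256(b"abc").hexdigest())  # False
```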