
hashstress - Hardware stress test using hash calculations

This post is about a little script I wrote named hashstress. It's intended for stressing and testing the correctness of the disk I/O, cache, memory and processing subsystems for an extended period of time. The basic idea is to generate files with random content and repeatedly calculate hash checksums, verifying that they are the same every time.

The script is configured with the size of the dataset, the number of threads, the type of hash function and the number of repetitions.

Background

I was investigating a server with hardware stability issues and wanted to test how it would handle heavy I/O and processing loads for an extended period of time. At the same time I wanted to verify if all the processing output was correct or if data was being corrupted.

I realised that repeated hash checksum calculations would be ideal for the task. Even a single incorrect bit or miscalculation would produce a completely different checksum, and by choosing the size of the dataset the test could target either the memory cache or disk reads.
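This avalanche effect is easy to demonstrate with a quick shell experiment (file names and sizes here are just for illustration, not part of hashstress): flipping a single bit of a file yields a completely unrelated MD5 sum.

```shell
# Create a 1 MB random file and checksum it
head -c 1M /dev/urandom > sample.bin
md5sum sample.bin

# Flip the lowest bit of the first byte in a copy and checksum again;
# the two sums bear no resemblance despite a one-bit difference
cp sample.bin flipped.bin
b=$(dd if=flipped.bin bs=1 count=1 2>/dev/null | od -An -tu1 | tr -d ' ')
printf "\\$(printf '%03o' $(( b ^ 1 )))" | dd of=flipped.bin bs=1 count=1 conv=notrunc 2>/dev/null
md5sum flipped.bin
```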

The script

Get it here: hashstress or from GitHub

The script is written in bash. It uses shred to generate the random files in a temporary location, and for the checksum calculation, programs such as md5sum or hashdeep can be used:

  • md5sum is very fast and suitable for testing high I/O throughput
  • hashdeep performs both MD5 and SHA-256 hashing and will tax the CPU more
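In essence, the inner workings boil down to something like the following sketch (simplified, with illustrative size and repetition values; not the actual hashstress code): fill a temporary file with random data via shred, record a reference checksum, then re-hash repeatedly and compare against the reference every time.

```shell
#!/bin/bash
# Simplified single-thread sketch of the hashstress idea
set -e
SIZE=10M
REPS=5
FILE=$(mktemp)
shred -n 1 -s "$SIZE" "$FILE"      # one pass of random data
REF=$(md5sum < "$FILE")            # reference checksum
for (( i = 1; i <= REPS; i++ )); do
    [ "$(md5sum < "$FILE")" = "$REF" ] || { echo "FAIL at repetition $i"; exit 1; }
done
rm -f "$FILE"
echo "PASS"
```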

Example runs

Download the script and make it executable:

$ git clone https://github.com/larspontoppidan/hashstress.git
$ cd hashstress
$ chmod +x hashstress
$ ./hashstress 
hashstress rev. 2018-06-17, see more: http://larsee.com/blog/tag/hashstress

Syntax: ./hashstress <size pr. thread> <threads> <repetitions> <hash-cmd>

Example: ./hashstress 1G 4 20 md5sum

The following test will generate a dataset of 4 GB and have four threads running md5sum calculations 20 times:

$ ./hashstress 1G 4 20 md5sum
hashstress rev. 2018-06-17, see more: http://larsee.com/blog/tag/hashstress
Test config:
 - Data set: 4 file(s) each 1G
 - Threads: 4, repetitions: 20
 - Hash command: md5sum

Start time: Sun Jun 17 17:38:31 EEST 2018
Generating random data file(s) ...
Generating reference checksum(s) ...
Repetition: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Cleaning up
Repetitions took: 42 seconds
Test result: PASS

The following example is similar but with a dataset of 8 GB:

$ ./hashstress 2G 4 20 md5sum
hashstress rev. 2018-06-17, see more: http://larsee.com/blog/tag/hashstress
Test config:
 - Data set: 4 file(s) each 2G
 - Threads: 4, repetitions: 20
 - Hash command: md5sum

Start time: Sun Jun 17 17:44:54 EEST 2018
Generating random data file(s) ...
Generating reference checksum(s) ...
Repetition: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Cleaning up
Repetitions took: 304 seconds
Test result: PASS

In the first test the dataset was small enough to fit in the system's RAM, so after the initial generation the repetitions ran from the page cache without any disk reads. In the second test the dataset exceeded what could be cached, so the system had to read from disk throughout, hence the much longer execution time.
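Whether a dataset of a given size will fit in the cache can be judged with free, and on Linux the page cache can be dropped between runs to force disk reads; these are generic system tools, not part of hashstress:

```shell
# The "buff/cache" column shows how much memory the kernel currently
# uses for caching; a dataset larger than the free + cache memory
# cannot be served from RAM alone.
free -h

# To force subsequent repetitions to hit the disk even with a small
# dataset, the page cache can be dropped (requires root, refills safely):
#   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
```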

In the second test the CPU was not loaded very much because of the disk I/O bottleneck. To load the CPU more, the hashdeep checksum command may be used instead, for example:

$ sudo apt install hashdeep
$ ./hashstress 2G 4 20 hashdeep
hashstress rev. 2018-06-17, see more: http://larsee.com/blog/tag/hashstress
Test config:
 - Data set: 4 file(s) each 2G
 - Threads: 4, repetitions: 20
 - Hash command: hashdeep

Start time: Sun Jun 17 17:52:37 EEST 2018
Generating random data file(s) ...
Generating reference checksum(s) ...
Repetition: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Cleaning up
Repetitions took: 350 seconds
Test result: PASS

Actual test script

The actual test I ran on the server took about 7 hours and called hashstress multiple times with various parameters. This is the script I used:

#!/bin/bash

LOGFILE=hpe_stress_log
LOOPS=4

for (( j = 1; j <= $LOOPS; j++ ))
do
    echo "--- HPE STRESS LOOP: ${j} ---" >> ${LOGFILE}

    # Doesn't stress HD: (20 min)
    ./hashstress 1600M 4 100 hashdeep >> ${LOGFILE}

    # Stresses HD: (~25 min)
    ./hashstress 2G 4 100 hashdeep >> ${LOGFILE}

    # Doesn't stress HD: (20 min)
    ./hashstress 1G 6 400 md5sum >> ${LOGFILE}

    # Stresses HD: (~25 min)
    ./hashstress 2G 4 100 md5sum >> ${LOGFILE}

    # Doesn't stress HD: (20 min)
    ./hashstress 1G 4 400 md5sum >> ${LOGFILE}
done

# loop total: 110 min x 4 = 440 min = 7 hours

And the server passed all the tests, by the way...
