WPA2 Cracking On GPU with Hashcat and CUDA

In the previous tutorial, we used Hashcat to crack a WPA2 passphrase on the GPU. In this tutorial, we show how to speed up passphrase cracking on an NVIDIA GPU using the CUDA toolkit, optionally together with the CPU. In addition, we demonstrate how to install and use both a legacy and a current version of Hashcat to crack a WPA2 passphrase.

Hardware:
- Laptop: Legion 5 Pro 16IAH7H (for generating AI images with the NVIDIA GPU)
- GPU: NVIDIA GeForce RTX 3070 Ti, 150 W, 8 GB VRAM
- CPU: 12th Gen Intel(R) Core(TM) i9-12900H
- RAM: 32 GB

Software:
- OS: Debian 11
- Driver Version: 470.182.03 from Debian 11 repository
- CUDA Version: 11.4 from Debian 11 repository
- Hashcat 6.1.1 from Debian 11 repository
- Hashcat 6.2.6 from GitHub


Files:
- Pcap file output_file-01.7z with the 4-way handshake - https://brezular.com/wp-content/uploads/2021/02/output_file-01.7z
- Binary hash file output_file-01.hccapx for legacy Hashcat 6.1.1 - https://brezular.com/wp-content/uploads/2021/03/output_file-01.hccapx.zip
- Hash file output_file-01.hc22000 for Hashcat 6.2.4 and later

Note: 
The passphrase is a dictionary word from rockyou.txt: submarine

1. Installing Hashcat
+++++++++++++++++++++

1.1 Legacy Hashcat 6.1.1

We will use the legacy Hashcat version 6.1.1, which you can install from the Debian 11 repository with the command below (Figure 1). This version still supports hash mode 2500 (WPA-EAPOL-PBKDF2) and the binary hash format hccapx.

$ sudo apt install hashcat

Figure 1 - The Hashcat version 6.1.1 Installed from Debian 11 Repository in /usr/bin

The captured pcap file with the 4-way handshake is available here: https://brezular.com/wp-content/uploads/2021/02/output_file-01.7z. However, legacy Hashcat 6.1.1 expects the hccapx format, so we need to create an hccapx hash file from this pcap file. This can be done with the cap2hccapx.bin tool.

Note: An hccapx file contains 13 attributes such as EAPOL, EAPOL length, ESSID, BSSID (MAC of the AP), etc. Read more here (https://hashcat.net/wiki/doku.php?id=hccapx).
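
As a quick sanity check before handing the file to Hashcat, the sketch below relies on two properties of the hccapx format described on the wiki page above: each record is 393 bytes long and starts with the 4-byte signature "HCPX". The function name check_hccapx is a hypothetical helper, not part of Hashcat.

```shell
#!/bin/sh
# check_hccapx is a hypothetical helper name, not a Hashcat tool.
# Assumes the hccapx layout: fixed 393-byte records, each starting
# with the 4-byte signature "HCPX".
check_hccapx() {
    file="$1"
    magic=$(head -c 4 "$file")                 # first 4 bytes of the file
    size=$(wc -c < "$file" | tr -d ' ')        # total size in bytes
    if [ "$magic" != "HCPX" ]; then
        echo "not an hccapx file"
        return 1
    fi
    if [ $((size % 393)) -ne 0 ]; then
        echo "unexpected size: $size bytes"
        return 1
    fi
    echo "$((size / 393)) handshake record(s)"
}
```

Usage: check_hccapx output_file-01.hccapx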

We explained how to install and use cap2hccapx in the first part of the Cracking WPA/WPA2 Pre-shared Key Using GPU tutorial (https://brezular.com/2021/07/01/cracking-wpa-wpa2-pre-shared-key-using-gpu). Therefore, we will skip it for now and just share the hccapx file, which is located in the output_file-01.hccapx.zip archive. This is the hash format that we will be using with Hashcat 6.1.1 installed from the repository.  


1.2 New Hashcat 6.2.6

Since version 6.2.4, the plugins 2500/2501 and 16800/16801 are deprecated and no longer work. If your Hashcat is 6.2.4 or later, you cannot use hash mode 2500, and the hccapx hash format does not work either. In that case, you must install hcxtools and use hcxpcapngtool to convert the captured EAPOL messages (4-way handshake) to the format of hash mode 22000 (WPA-PBKDF2-PMKID+EAPOL). This hash mode is supported by Hashcat 6.0.0 and later.

To demonstrate passphrase cracking with the new Hashcat 6.2.6, we first need to download it from GitHub and compile it as follows:

$ git clone https://github.com/hashcat/hashcat.git
$ cd hashcat/
$ make
$ sudo make install

Figure 2 - The Hashcat Version 6.2.6 from GitHub
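
With both versions installed, it helps to confirm which binary is which. The sketch below assumes the two locations used in this tutorial: /usr/bin/hashcat for the Debian package and /usr/local/bin/hashcat for the build from source. The function name list_hashcat_versions is hypothetical.

```shell
#!/bin/sh
# list_hashcat_versions is a hypothetical helper name.
# Paths are the ones assumed in this tutorial: the Debian package
# installs to /usr/bin, "make install" to /usr/local/bin.
list_hashcat_versions() {
    for bin in /usr/bin/hashcat /usr/local/bin/hashcat; do
        if [ -x "$bin" ]; then
            # --version prints the version string of that binary
            echo "$bin: $("$bin" --version 2>/dev/null)"
        else
            echo "$bin: not installed"
        fi
    done
}

list_hashcat_versions
```

Calling the binaries by their full path, as done in the benchmark commands later on, avoids any ambiguity about which version is run.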

Now we can install the hcxtools package from the Debian 11 repository.

$ sudo apt install hcxtools

To convert the capture to the hash format recognized by Hashcat 6.2.4 and later, use the following command:

$ hcxpcapngtool -o output_file-01.hc22000 output_file-01.cap

The file output_file-01.cap is from the archive we used in Part 1.
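
To check what the conversion produced, the sketch below counts the record types in the resulting file. It assumes the hc22000 line format, in which every line starts with WPA*01* for a PMKID record or WPA*02* for an EAPOL (handshake) record. The function name count_hc22000 is a hypothetical helper.

```shell
#!/bin/sh
# count_hc22000 is a hypothetical helper name.
# Assumes hc22000 lines of the form WPA*TYPE*..., fields separated
# by '*', where TYPE 01 = PMKID and TYPE 02 = EAPOL.
count_hc22000() {
    awk -F'[*]' '
        $1 == "WPA" && $2 == "01" { pmkid++ }
        $1 == "WPA" && $2 == "02" { eapol++ }
        END { printf "PMKID=%d EAPOL=%d\n", pmkid, eapol }
    ' "$1"
}
```

Usage: count_hc22000 output_file-01.hc22000. For a capture that contains only a 4-way handshake, one would expect at least one EAPOL record.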

2. CUDA versus OpenCL
+++++++++++++++++++++

Before we delve into the installation process, let's explore why we use the CUDA toolkit to speed up brute-force attacks.

2.1 CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA. CUDA runs on Windows and Linux (macOS is no longer supported by current CUDA releases), but only on NVIDIA hardware. Hashcat picks the CUDA backend if your hardware supports it and the CUDA toolkit is installed, because CUDA is generally considered the faster runtime on NVIDIA hardware (see thread). If your compute device does not support CUDA, Hashcat falls back to the OpenCL backend.

When it comes to brute-force attacks, CUDA can be advantageous due to the following reasons:

- Massive parallelism: GPUs are designed with thousands of cores, allowing for massive parallel processing. CUDA can distribute the workload across these cores, significantly accelerating the process compared to traditional CPU-based approaches.

- Computational speed: GPUs are optimized for data-parallel computations, and they excel at performing repetitive, computationally intensive tasks. This makes them well-suited for brute-force attacks, which involve repetitive calculations to systematically generate and test potential solutions.

- CUDA libraries: NVIDIA provides libraries like cuRAND (random number generation) and cuBLAS (basic linear algebra subroutines) that are optimized for GPU computing. These libraries can be leveraged to enhance the performance of brute-force algorithms by utilizing the parallel processing capabilities of GPUs.


2.2 OpenCL

Open Computing Language (OpenCL) is an independent, open standard for cross-platform parallel programming. OpenCL applications can run on almost any operating system and on most types of hardware, including NVIDIA and AMD GPUs, FPGAs and ASICs. A study that directly compared CUDA programs with OpenCL on NVIDIA GPUs showed that CUDA was 30% faster than OpenCL (https://www.run.ai/guides/nvidia-cuda-basics-and-best-practices/cuda-vs-opencl).


2.3 Do I Need to Install the NVIDIA Proprietary Driver to Use OpenCL on NVIDIA?

OpenCL is a framework for parallel programming that allows developers to write code that can be executed on various types of processors, including GPUs. However, to utilize OpenCL specifically with NVIDIA GPUs, you need to have the appropriate driver installed. The NVIDIA proprietary driver provides the necessary components and optimizations for OpenCL to work effectively on NVIDIA hardware.


3. NVIDIA Driver Installation and OpenCL Benchmark for 2500 Hash type
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

In this section, we will proceed with the installation of the NVIDIA proprietary driver from the Debian 11 repository.

Add the contrib and non-free repositories to /etc/apt/sources.list (Figure 3):

Figure 3 - Adding Contrib and Non-free Repositories to sources.list
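
If you prefer not to edit the file by hand, the sketch below shows one possible way to do it with sed. It only appends "contrib non-free" to active deb and deb-src lines that end in "main" and prints the result instead of changing the file in place. The function name add_components is hypothetical, and your sources.list may be laid out differently.

```shell
#!/bin/sh
# add_components is a hypothetical helper name.
# Prints the given sources.list with " contrib non-free" appended to
# every active deb/deb-src line that currently ends in "main".
# Commented lines and lines that already list more components are
# left unchanged. The input file itself is not modified.
add_components() {
    sed -E 's/^(deb(-src)?[[:space:]].*[[:space:]]main)[[:space:]]*$/\1 contrib non-free/' "$1"
}
```

Usage: add_components /etc/apt/sources.list > /tmp/sources.list.new, then review the new file and copy it over the original with sudo after making a backup.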

$ sudo apt update

Install NVIDIA proprietary driver:

$ sudo apt install nvidia-driver 

The command installed driver version 470.182.03.
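
One way to confirm the installed driver version is to query it with nvidia-smi. This is a sketch that assumes the nvidia-smi utility is installed (on Debian it may come as a separate package) and that the driver is already loaded, which can require a reboot. The function name show_driver is hypothetical.

```shell
#!/bin/sh
# show_driver is a hypothetical helper name.
# Prints GPU name, driver version and total memory if nvidia-smi is
# available, otherwise a short explanatory message.
show_driver() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
            || echo "nvidia-smi failed; the driver may not be loaded yet"
    else
        echo "nvidia-smi not found"
    fi
}

show_driver
```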

Now OpenCL should be used for the NVIDIA card. Let's check the Hashcat benchmark for hash mode 2500 with the command below (Figure 4):

$ /usr/bin/hashcat -m 2500 -b


Figure 4 - Hashcat 6.1.1 Benchmark for Hash mode 2500 and NVIDIA GPU when OpenCL is Used

The hashrate is 547 kH/s (547,000 hashes per second) for the NVIDIA GeForce RTX 3070 Ti GPU (Device 1) when OpenCL is used with Hashcat 6.1.1. The Intel CPU (Device 2) is skipped and not used in the benchmark. We will show later how to use both the GPU and the CPU if needed.

Note: The hashrate of Hashcat 6.2.6 with hash mode 22000 is about the same. In that case, the benchmark command is:

$ /usr/local/bin/hashcat -m 22000 -b

4. CUDA Toolkit Installation
++++++++++++++++++++++++++++

In this section, we will proceed with the installation of the CUDA toolkit from the Debian 11 repository.

To run a CUDA application, the system needs a CUDA-enabled GPU and an NVIDIA display driver that is compatible with the CUDA toolkit (Figure 5). Since we install both the NVIDIA driver and CUDA from the Debian 11 repository, we meet this requirement.

Figure 5 - Minimum Required Driver Version for CUDA Minor Version Compatibility
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

$ sudo apt install nvidia-cuda-toolkit

Here is the list of packages installed with nvidia-cuda-toolkit package.

We have successfully installed CUDA, so Hashcat now prefers CUDA to OpenCL. The Hashcat benchmark for the CUDA API (Device 1) is 536 kH/s (Figure 6). The OpenCL device (Device 2 - NVIDIA GPU) and Device 4 (Intel CPU) are skipped.

$ /usr/bin/hashcat -m 2500 -b

Figure 6 - Hashcat 6.1.1 Benchmark for Hash mode 2500 and NVIDIA GPU when CUDA is Used

The value of 536 kH/s for CUDA is slightly lower than the 547 kH/s measured with OpenCL. However, as explained here (https://hashcat.net/forum/thread-11482.html), some benchmarks will indeed show up faster for OpenCL, but CUDA is still the preferred runtime. Ultimate speed is not the only consideration. There are limitations that make OpenCL on NVIDIA cards more difficult to work with, or less consistent, than CUDA.

A good example is the memory limit imposed by the OpenCL runtime. It does not directly affect performance in an ideal benchmark, but it does affect the ability to load different hash lists or attacks. NVIDIA interprets the OpenCL spec as limiting a single allocation to 1/4 of the total memory size, so Hashcat must either work around this with multiple allocations or be limited to 1/4 of the VRAM. CUDA does not have this limitation.
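
To put the two benchmark figures in perspective, the short calculation below is a sketch that computes the relative gap between the OpenCL and CUDA results quoted above, and how long a wordlist would take at the CUDA rate. The helper names and the count of 14 million candidates are illustrative assumptions, not values taken from the benchmark.

```shell
#!/bin/sh
# Both helper names are hypothetical.

# Relative difference between two hashrates, in percent of the first.
rate_diff_percent() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Seconds needed to test N candidates at a given rate in kH/s.
seconds_for() {
    awk -v n="$1" -v khs="$2" 'BEGIN { printf "%.1f\n", n / (khs * 1000) }'
}

rate_diff_percent 547 536     # OpenCL vs CUDA: prints 2.0
seconds_for 14000000 536      # assumed 14 million candidates: prints 26.1
```

A gap of about 2% changes the run time of a dictionary attack of this size by well under a second, which supports the point that raw benchmark speed is not the deciding factor here.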


Note: The Hashcat 6.2.6 hashrate is 541.6 kH/s with hash mode 22000. In that case, the benchmark command is:

$ /usr/local/bin/hashcat -m 22000 -b

End.
++++++++++++++++++++++++++++++++++++++++++++++++


Finally, we will discuss how to create checkpoints to interrupt the attack so that we can resume it later without losing the current progress.
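
Hashcat's session mechanism is one way to do this: the --session option names a run, --restore resumes it from the last restore point, and pressing c at the interactive prompt asks Hashcat to stop at the next checkpoint. The following is a minimal sketch; the session name, the function names and the file arguments are hypothetical.

```shell
#!/bin/sh
# SESSION, start_attack and resume_attack are hypothetical names.
SESSION="wpa2_demo"

# Start a dictionary attack (-a 0) in hash mode 22000 under a named session.
# $1 = hash file (e.g. output_file-01.hc22000), $2 = wordlist.
start_attack() {
    hashcat -m 22000 -a 0 --session="$SESSION" "$1" "$2"
}

# Resume the named session from its last restore point.
resume_attack() {
    hashcat --session="$SESSION" --restore
}
```

Usage: start_attack output_file-01.hc22000 rockyou.txt, press c to stop at a checkpoint, and later run resume_attack to continue.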


Benchmark including the CPU (-D 1,2 enables device types 1 = CPU and 2 = GPU):
$ hashcat -m 2500 -b -D 1,2


New method
https://miloserdov.org/?p=7794
https://miloserdov.org/?p=7801