Manny Kressel
CEO and Founder, BitMindz Forensic Solutions
Published May 30, 2018 · Last updated March 2026
Executive Overview & Research Objectives
This white paper presents benchmark findings from BitMindz Forensic Solutions' ongoing research into hardware-accelerated password recovery, with a primary focus on ZIP and 7-Zip archives encrypted using AES-256. As digital forensic investigations increasingly encounter password-protected evidence containers, understanding the practical throughput ceiling of contemporary hardware is critical for case planning, resource allocation, and toolchain selection.
The BitMindz Decryption Engines are purpose-built workstations and server nodes designed to maximize passwords-per-second (p/sec) throughput by exploiting the massively parallel architecture of modern GPUs via OpenCL. Each engine is configured to harness simultaneous CPU and GPU resources, with the GPU handling the bulk of hash computation and the CPU managing job orchestration, dictionary preprocessing, and attack sequencing.
Testing began in 2018 on a single-GPU consumer workstation and has expanded progressively to multi-GPU and multi-node configurations. The data collected across this period provides a unique longitudinal perspective on how GPU architectural improvements — from the Turing to Ada Lovelace microarchitectures — translate into measurable forensic throughput.
Scope & Disclaimer
All testing described herein was conducted in controlled forensic laboratory environments on archival files created specifically for benchmark validation. This research is published for academic, educational, and professional digital forensics purposes. BitMindz does not condone unauthorized access to encrypted data belonging to third parties.
Research Questions
- How does GPU-accelerated password recovery throughput scale across hardware generations for AES-256 ZIP archives?
- What is the performance contribution ratio between CPU and GPU components in a hybrid attack configuration?
- How does thermal loading affect sustained throughput in long-duration recovery operations?
- What are the diminishing returns of adding additional GPU nodes in a distributed recovery cluster?
- How do attack strategies (brute-force vs. dictionary/join attacks) compare in time-to-solution for common password patterns?
Hardware Configurations Under Test
Benchmarks were captured across six distinct system configurations spanning consumer-grade workstations to enterprise multi-GPU servers. The following hardware specifications were extracted directly from the Passware Kit Forensic resource monitoring interface as captured in the benchmark screenshots.
| Config | Period | CPU | CPU / Memory Details | GPU(s) | GPU Details |
|---|---|---|---|---|---|
| A | ≈2020–2021 | AMD Ryzen 9 5950X | Zen 3 · 16C/32T · 64 GB RAM | NVIDIA GeForce RTX 3090 ×1 | Ampere GA102 · 10,496 CUDA · 24 GB GDDR6X |
| B | ≈2022–2023 | AMD Threadripper 3970X | Zen 2 · 32C/64T · 256 GB RAM | NVIDIA GeForce RTX 3090 ×2 | Ampere GA102 · 10,496 CUDA each · 24 GB GDDR6X each |
| C | ≈2023 | Intel Core i9-13950HX | Raptor Lake-HX · 24C/32T · 64 GB RAM | NVIDIA RTX 4090 Laptop GPU ×1 | Ada Lovelace · 88% load · 69°C |
| D/E | ≈2023–2024 | Intel Xeon Gold 6326 ×2 nodes | Ice Lake-SP · 16C @ 2.90GHz · 512 GB RAM/node | NVIDIA RTX 6000 Ada ×16 + RTX 4090L ×1 | Ada Lovelace AD102 · 18,176 CUDA/GPU · 48 GB GDDR6/GPU |
| F | ≈2023–2024 | Intel Xeon Gold 6248R ×2 nodes | 24C @ 3.00GHz · 256 GB RAM/node | NVIDIA RTX 3090 ×4 per node (8 total) | Ampere · 49–68°C sustained |
Benchmark Results & Performance Data
The following table summarizes all benchmark sessions captured, ordered by total system throughput. All measurements are derived directly from the Passware Kit Forensic resource monitoring panels captured during live attack sessions targeting AES-256 encrypted ZIP archives.
| Config | System | GPU(s) | GPU Count | RAM | Attack | Peak p/sec | GPU Load | GPU Temp |
|---|---|---|---|---|---|---|---|---|
| C | i9-13950HX (Laptop) | RTX 4090 Laptop | 1 | 64 GB | Brute-force 5–7 | 6.58M | 98.1% | 69°C |
| A | Ryzen 9 5950X | RTX 3090 ×1 | 1 | 64 GB | Join Attacks | 7.83M | 98.1% | 72°C |
| B | Threadripper 3970X | RTX 3090 ×2 | 2 | 256 GB | Brute-force 5–7 | 15.61M | 97.8% | 82°C ⚠ |
| F | Xeon 6248R ×2 nodes | RTX 3090 ×8 | 8 | 256 GB ×2 | Brute-force 5–7 | 58.68M | 99.5% | 66°C |
| D/E | Xeon 6326 ×2 + i9 Laptop | RTX 6000 Ada ×16 + RTX 4090L | 17 | 512 GB ×2 + 64 GB | Brute-force 5–7 | 193.27M | 99.9% | 75°C |
[Figure: Peak Throughput by Configuration (M p/sec)]
[Figure: Scaling Efficiency: GPU Count vs. Throughput]
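Measured against the single RTX 3090 baseline, throughput at the 17-GPU mark grows faster than GPU count (24.7× from 17 GPUs). This reflects the generational uplift of the RTX 6000 Ada architecture rather than super-linear scaling of identical hardware. Cross-node network overhead is minimal (<1% throughput loss).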
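The scaling figures can be reproduced from the peak values in the results table. The following is a minimal sketch, assuming Configuration A (one RTX 3090) as the baseline. The function names are illustrative, and Configuration C is left out because it uses a different single GPU.

```python
# Peak throughput (million p/sec) and GPU count, copied from the results table.
RESULTS = {
    "A": {"gpus": 1, "peak_mps": 7.83},
    "B": {"gpus": 2, "peak_mps": 15.61},
    "F": {"gpus": 8, "peak_mps": 58.68},
    "D/E": {"gpus": 17, "peak_mps": 193.27},
}

# Assumption: the single RTX 3090 in Configuration A is the reference point.
BASELINE_MPS = RESULTS["A"]["peak_mps"]


def speedup(config: str) -> float:
    """Throughput of a configuration as a multiple of the baseline."""
    return RESULTS[config]["peak_mps"] / BASELINE_MPS


def per_gpu_ratio(config: str) -> float:
    """Speedup divided by GPU count; 1.0 would be exactly linear scaling."""
    return speedup(config) / RESULTS[config]["gpus"]


for name in RESULTS:
    print(f"{name}: {speedup(name):.1f}x baseline, "
          f"{per_gpu_ratio(name):.2f} of linear per GPU")
```

A ratio below 1.0 (Configuration F) indicates some loss relative to linear scaling on the same GPU model. A ratio above 1.0 (Configuration D/E) indicates that the individual GPUs are faster than the baseline card, not that the cluster scales super-linearly.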
Thermal Observation
Configuration B (dual RTX 3090, Threadripper 3970X) recorded the highest sustained GPU temperature at 82°C with a thermal throttle warning on GPU 0. This correlates to a reduced clock state and explains the slightly lower-than-expected throughput for a dual-GPU configuration. Improved case airflow or liquid cooling is recommended for sustained dual-RTX-3090 deployments.
Password Analysis Counts Captured During Benchmark Sessions
| Session | Time Elapsed | Passwords Analyzed | Search Speed | Progress | Target File |
|---|---|---|---|---|---|
| A-1 | 3 min, 2 sec | 823,338,041 | 7,629,247 p/sec | Attack 6/10 | Encryption_Test.zip |
| A-2 | 4 min, 13 sec | 1,376,728,740 | 7,827,033 p/sec | Attack 6/10 | Encryption_Test.zip |
| B-1 | ~30 min | 902,539,036+ | 15,297,272 p/sec (avg) | Ongoing | Encryption_Test.zip |
| D/E | 30 min, 27 sec | 43,758,723,167 | 193,274,061 p/sec | Attack 8/20 | Passware_Test_Win_Zip.zip |
| F-1 | 8 min, 17 sec | 15,039,915,725 | 56,085,472 p/sec | Attack 8/10 | Encryption_Test_3.zip |
| F-2 | 2 min, 41 sec | 5,794,596,061 | 58,684,257 p/sec | Attack 5/7 | Encryption_4.zip |
Cryptographic Algorithms & Encryption Targets
The primary encryption target in all benchmark sessions was AES-256 as implemented in WinZip-compatible ZIP archives, which apply AES in CTR mode (7-Zip's native 7z format, by contrast, uses AES-256 in CBC mode). Understanding the cryptographic underpinning is essential to contextualizing the throughput figures, as the cost of the key derivation performed for each candidate directly determines how quickly candidate passwords can be tested.
AES-256
Advanced Encryption Standard with 256-bit key length. The symmetric block cipher used to encrypt archive payload data. While AES itself is extremely fast in hardware, the password-to-key derivation function creates the performance bottleneck.
PBKDF2-SHA1 (ZIP)
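The WinZip-compatible AES-encrypted ZIP format uses PBKDF2 with HMAC-SHA1 and 1,000 iterations, and supports 128-, 192-, and 256-bit AES keys. For AES-256, the derived output is 66 bytes: a 32-byte encryption key, a 32-byte authentication key, and a 2-byte password verification value. This key-stretching function is the primary performance limiter: each password candidate requires 1,000 HMAC-SHA1 iterations (roughly 2,000 SHA-1 compression calls) for every 20-byte block of derived output that must be computed.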
7-Zip AES-256
7-Zip's implementation uses a custom key derivation with SHA-256 and a configurable iteration count (default 2^19 = 524,288 iterations). This dramatically reduces cracking throughput compared to standard ZIP AES.
SHA-1 / SHA-256
SHA-1 underpins the PBKDF2 key derivation in WinZip-compatible AES encryption. SHA-256 is used in 7-Zip's native format. Both benefit from GPU parallelism, though the number of iterations per password candidate is the dominant performance factor.
ZIP 2.0 (Legacy)
The legacy PKZIP 2.0 encryption uses a much weaker 96-bit key derived from a simple CRC-32 chain. While trivially broken in seconds by modern hardware, it is not the target format used in these benchmarks. All tested archives employed AES-256.
OpenCL Acceleration
All GPU-accelerated sessions utilized OpenCL kernels optimized for AES and PBKDF2/SHA computation. NVIDIA GPUs execute these kernels via their OpenCL runtime, enabling all CUDA cores to operate as parallel password-testing engines simultaneously.
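The per-candidate cost described above can be illustrated on the CPU with Python's standard library. This is a minimal sketch, assuming the key layout given in the WinZip AES specification (reference 3): encryption key, then authentication key, then a 2-byte verifier. The names `DerivedKeys`, `derive_winzip_aes256`, and `check_candidate` are illustrative and do not come from any recovery tool.

```python
import hashlib
from typing import NamedTuple

KEY_BYTES = 32        # AES-256 key length
VERIFIER_BYTES = 2    # password verification value stored in the archive
ITERATIONS = 1000     # PBKDF2 iteration count for WinZip-compatible AES


class DerivedKeys(NamedTuple):
    encryption_key: bytes
    authentication_key: bytes
    verifier: bytes


def derive_winzip_aes256(password: bytes, salt: bytes) -> DerivedKeys:
    """Run PBKDF2-HMAC-SHA1 and split the 66-byte output (assumed layout)."""
    out = hashlib.pbkdf2_hmac(
        "sha1", password, salt, ITERATIONS,
        dklen=2 * KEY_BYTES + VERIFIER_BYTES,
    )
    return DerivedKeys(
        out[:KEY_BYTES],
        out[KEY_BYTES:2 * KEY_BYTES],
        out[2 * KEY_BYTES:],
    )


def check_candidate(password: bytes, salt: bytes, verifier: bytes) -> bool:
    """Cheap first-pass test: compare only the 2-byte verifier."""
    return derive_winzip_aes256(password, salt).verifier == verifier


# Demo with a made-up salt and password (not taken from any benchmark file).
demo_salt = bytes(range(16))
demo_verifier = derive_winzip_aes256(b"example-password", demo_salt).verifier
print(check_candidate(b"example-password", demo_salt, demo_verifier))
```

Because the verifier is only 2 bytes long, about 1 in 65,536 wrong candidates will also pass this check, so a match would still need to be confirmed against the archive's authentication code before being reported as recovered.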
Why GPU Acceleration Dominates
Modern GPUs contain thousands of shader cores designed for highly parallelizable mathematical operations. PBKDF2-SHA1 — the key derivation function in AES-encrypted ZIP archives — can be decomposed into thousands of independent computation streams, each testing a different password candidate. The RTX 3090, with 10,496 CUDA cores, can sustain nearly 8 million such computations per second. The RTX 6000 Ada, with 18,176 CUDA cores, achieves even higher throughput while operating at lower thermal density due to architectural efficiency improvements in Ada Lovelace.
By contrast, a CPU — even a 32-core Threadripper — processes password candidates sequentially across fewer, more complex cores optimized for latency rather than throughput. The resulting CPU contribution is consistently below 2% of total throughput across all configurations tested.
// Conceptual representation of the PBKDF2-SHA1 password check (ZIP AES-256)
function testCandidate(password, salt, verifier) {
  // 1,000 iterations of HMAC-SHA1 key derivation. For AES-256 the output is
  // 66 bytes: 32-byte encryption key + 32-byte authentication key + 2-byte verifier
  dk = PBKDF2(HMAC-SHA1, password, salt, iterations=1000, dkLen=66);
  // Compare the last 2 bytes of the derived output against the stored password verifier
  return dk[64..66] == verifier;
}
// GPU executes ~8,000,000+ instances of testCandidate() per second (RTX 3090)
// GPU executes ~13,000,000+ instances per second (RTX 6000 Ada)
// 17 GPUs in cluster: ~193,000,000+ instances per second
Attack Strategies Employed
| Attack Type | Description | Observed In | Keyspace | Notes |
|---|---|---|---|---|
| Join Attacks | Combines dictionary segments (e.g., word + number + symbol) to generate candidates | Config A (attacks 6/10) | Millions–Billions | High coverage of real-world passwords like "apology($" (len 9) |
| Brute-Force (EN) | Exhaustive enumeration of English character set combinations | Configs B, D/E, F | Exponential by length | 5–7 char: ~24B candidates; covers short numeric/alpha combos |
| Dictionary | Tests known wordlists, common passwords, and variations | Attack queue early stages | Wordlist dependent | Fastest time-to-solution for weak passwords |
| Mask Attack | Pattern-based generation (e.g., ?l?l?l?d?d for 3 lower + 2 digit) | Implicit in attack profiles | Configurable | Efficient when password policy is known |
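The mask notation in the table can be made concrete with a short sketch. It assumes the common convention in which `?l` is a lowercase letter, `?u` an uppercase letter, and `?d` a digit. The helper names are illustrative and do not reflect how Passware Kit Forensic implements masks internally.

```python
import itertools
import math
import string

# Assumed character classes for the "?x" placeholders.
CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
}


def parse_mask(mask: str) -> list[str]:
    """Turn a mask such as '?l?l?d' into one character set per position."""
    if len(mask) % 2 or any(mask[i] != "?" for i in range(0, len(mask), 2)):
        raise ValueError(f"malformed mask: {mask!r}")
    try:
        return [CHARSETS[mask[i + 1]] for i in range(0, len(mask), 2)]
    except KeyError as exc:
        raise ValueError(f"unknown mask class: ?{exc.args[0]}") from None


def mask_keyspace(mask: str) -> int:
    """Number of candidates the mask generates."""
    return math.prod(len(charset) for charset in parse_mask(mask))


def mask_candidates(mask: str):
    """Yield every candidate, with the rightmost position varying fastest."""
    for combo in itertools.product(*parse_mask(mask)):
        yield "".join(combo)


print(mask_keyspace("?l?l?l?d?d"))                        # 26^3 * 10^2
print(list(itertools.islice(mask_candidates("?l?l?l?d?d"), 3)))
```

For the example mask in the table, the keyspace is 26³ × 10² = 1,757,600 candidates, which is far smaller than an unconstrained 5-character search and is why a known password policy makes the attack efficient.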
Observed Password Patterns During Testing
The benchmark sessions captured live "current password" candidates including: "apology($" (length 9, join attack), "deanery7503" (length 11, dictionary+number join), "86q2t0t" (length 7, brute-force alphanumeric), "1hpcak5" (length 7, brute-force), and "1r80r7x" (length 7, brute-force). These illustrate the realistic password patterns encountered in forensic recovery operations.
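The join-attack candidates quoted above follow a segment-concatenation pattern, which the sketch below illustrates. The word, number, and symbol lists are tiny hypothetical stand-ins chosen so that two of the observed candidates appear. Real attack dictionaries are far larger, and the actual segment definitions used in the benchmark sessions are not documented here.

```python
import itertools
import math


def join_candidates(*segments):
    """Yield every concatenation of one entry from each segment list."""
    for parts in itertools.product(*segments):
        yield "".join(parts)


def join_keyspace(*segments) -> int:
    """Number of candidates produced by joining the given segments."""
    return math.prod(len(segment) for segment in segments)


# Hypothetical segment lists for illustration only.
words = ["apology", "deanery"]
four_digit_numbers = [f"{n:04d}" for n in range(10_000)]
symbol_suffixes = ["!", "($", "#1"]

word_plus_number = set(join_candidates(words, four_digit_numbers))
word_plus_symbols = set(join_candidates(words, symbol_suffixes))

print(join_keyspace(words, four_digit_numbers))   # 2 words x 10,000 numbers
print("deanery7503" in word_plus_number)
print("apology($" in word_plus_symbols)
```

The keyspace of a join attack is the product of the segment list sizes, so it grows with dictionary size rather than exponentially with password length, which is why long candidates such as an 11-character word-plus-number join remain reachable.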
Methodology & Testing Protocol
All benchmark data was captured using Passware Kit Forensic, a commercially licensed password recovery platform widely used in law enforcement and digital forensics contexts. The software's resources panel provides real-time per-device throughput measurements, allowing precise attribution of performance contributions to individual CPU and GPU components.
Test File Preparation
Benchmark archive files were created as standard ZIP files using AES-256 encryption, consistent with the output produced by both WinZip and 7-Zip when configured for AES-256 mode. Files observed across sessions include Encryption_Test.zip, Encryption_Test_3.zip, Encryption_4.zip, and Passware_Test_Win_Zip.zip. Each archive was encrypted with a known password to validate recovery success and confirm algorithmic correctness of the attack queue.
Performance Measurement Protocol
Screenshots were captured at representative intervals during active attack sessions, with the software displaying real-time and average throughput figures per device. For the purposes of this study, "peak p/sec" refers to the highest instantaneous throughput observed in the system-level aggregate reading during that session, while "average p/sec" refers to the time-weighted mean reported by the software.
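The distinction between the two metrics can be sketched as follows. The sample values are hypothetical, and the calculation assumes each reading holds until the next one. How Passware Kit Forensic computes its own average is not documented here, so this is only an illustration of the definitions used in this study.

```python
def peak_rate(samples):
    """Highest instantaneous reading among (timestamp_s, rate) samples."""
    return max(rate for _, rate in samples)


def time_weighted_mean(samples):
    """Mean rate, weighting each reading by how long it was in effect.

    Assumption: a reading stays valid until the next timestamp, so the
    final sample only marks the end of the observation window.
    """
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("need at least two samples with increasing timestamps")
    total = 0.0
    for (t0, rate), (t1, _) in zip(samples, samples[1:]):
        total += rate * (t1 - t0)
    return total / duration


# Hypothetical readings: (seconds since start, passwords per second).
readings = [(0, 6.0e6), (10, 8.0e6), (30, 7.0e6), (60, 7.0e6)]

print(f"peak: {peak_rate(readings):,.0f} p/sec")
print(f"average: {time_weighted_mean(readings):,.0f} p/sec")
```

With these readings, the peak is 8.0M p/sec while the time-weighted mean is about 7.17M p/sec, which shows why a session's peak figure can sit noticeably above its average.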
Distributed Agent Configuration
Multi-node configurations (Configs D/E and F) used Passware Kit Forensic's agent protocol over a LAN connection. Remote agent nodes are identified by IP address in the resource panel. The primary workstation coordinates attack distribution and aggregates results; remote nodes execute hash computations and report back at regular intervals. Network overhead was observed to be negligible (<1% throughput variation attributable to network latency) on a gigabit LAN.
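One way a coordinator could divide work among agents is to split the candidate index range in proportion to each node's throughput. The sketch below illustrates that general idea only. It is not a description of the Passware agent protocol, and the agent addresses and weights are hypothetical.

```python
def partition_keyspace(total: int, weights: list[float]) -> list[range]:
    """Split candidate indices [0, total) into contiguous ranges.

    Each range is sized in proportion to the matching weight, and the
    ranges together cover every index exactly once.
    """
    if total < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("total must be >= 0 and all weights positive")
    weight_sum = sum(weights)
    ranges = []
    start = 0
    cumulative = 0.0
    for i, weight in enumerate(weights):
        cumulative += weight
        # The last node always ends at `total` so rounding never drops work.
        is_last = i == len(weights) - 1
        end = total if is_last else round(total * cumulative / weight_sum)
        ranges.append(range(start, end))
        start = end
    return ranges


# Hypothetical agents (documentation-range addresses) and relative speeds.
agents = {"192.0.2.10": 1.0, "192.0.2.11": 1.0, "192.0.2.12": 2.0}
assignments = dict(zip(agents, partition_keyspace(100, list(agents.values()))))

for address, index_range in assignments.items():
    print(address, index_range.start, index_range.stop)
```

Weighting by throughput keeps faster nodes from idling while slower ones finish, which matters in a mixed cluster such as one that pairs server GPUs with a laptop GPU.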
Project Inception — Baseline Configuration
BitMindz begins systematic benchmarking of GPU-accelerated password recovery. Initial configuration utilizes a single consumer GPU. Establishes baseline throughput metrics for AES-256 ZIP archives and documents attack methodology.
Configuration A — AMD Ryzen 9 5950X + RTX 3090
Deployment of Ampere-architecture RTX 3090 (10,496 CUDA cores). First benchmarks to surpass 7.5M p/sec on ZIP AES-256. Join attack methodology documented. GPU thermal behavior at 72°C under sustained 100% load confirmed stable.
Configuration B — Threadripper 3970X + Dual RTX 3090
Dual-GPU configuration deployed in 256 GB RAM workstation. Achieves ~15.5M p/sec sustained throughput. Thermal throttling observed on GPU 0 (82°C) highlights importance of active cooling in multi-GPU dense configurations.
Configurations C & F — Mobile Platform + Quad-GPU Dual-Node
Mobile RTX 4090 laptop platform benchmarked (~6.6M p/sec), validating field-deployable decryption capability. Dual-node 8× RTX 3090 cluster achieves ~58M p/sec, about 7.5× the single-GPU baseline (roughly 94% scaling efficiency across 8 GPUs).
Configurations D/E — Enterprise RTX 6000 Ada Cluster
Peak performance milestone reached: 193.27M p/sec across 17 GPUs (16× RTX 6000 Ada + 1× RTX 4090 Laptop). Over 43 billion passwords checked in 30 minutes. RTX 6000 Ada delivers ~60% higher per-GPU throughput than RTX 3090 at similar thermal envelope.
Next-Generation Engine Builds — Coming Soon
BitMindz has additional decryption engine configurations in development. Upcoming builds are expected to leverage next-generation NVIDIA Blackwell architecture GPUs. Benchmark data will be added to this study upon completion of commissioning and validation testing.
Conclusions & Forward Outlook
Eight years of continuous benchmarking by BitMindz Forensic Solutions has yielded a clear and quantified picture of how GPU-accelerated password recovery performance has evolved — and continues to evolve — across hardware generations.
Key Findings
GPU dominance is total and consistent.
Across every configuration tested, GPU hardware accounts for 98–99.9% of total password-checking throughput. CPU resources play an essential but secondary role as orchestration and dispatch engines. This finding validates the BitMindz engineering philosophy of maximizing GPU count and quality as the primary performance lever.
Architectural generation matters as much as GPU count.
The RTX 6000 Ada (Ada Lovelace) delivers approximately 60% higher per-GPU throughput against ZIP AES-256 compared to the RTX 3090 (Ampere) at comparable sustained loads. In a multi-GPU cluster, upgrading GPU generation can yield greater performance gains than simply adding more cards of an older generation.
Scaling efficiency remains near-linear through 17 GPUs.
The 3-node, 17-GPU configuration achieved 24.7× the throughput of a single RTX 3090 — slightly exceeding the theoretical 24× from the architectural uplift and GPU count combined. This indicates that the distributed agent architecture introduces minimal overhead on gigabit LAN infrastructure.
Thermal management is a critical operational constraint.
The 82°C thermal event observed on a dual-RTX-3090 workstation (GPU 0) represents the most significant risk factor for sustained forensic operations. Enterprise server nodes running RTX 6000 Ada maintained 63–75°C across 30+ minute sessions at full load, demonstrating superior thermal headroom in purpose-built enclosures.
AES-256 ZIP remains a practical recovery target.
At 193M p/sec, an 8-character password drawn from lowercase letters and digits (keyspace ≈ 2.8 trillion) can be exhaustively searched in approximately 4 hours. Complex passwords exceeding 10 characters with mixed case, digits, and symbols remain computationally impractical for brute-force within operational timeframes — underscoring the importance of intelligent attack ordering (dictionary → rule-based → brute-force) in production recovery workflows.
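The search-time estimate above follows directly from keyspace size divided by throughput. The sketch below reproduces it using the peak rate from session D/E. The 95-character case assumes the printable ASCII set as a stand-in for "mixed case, digits, and symbols", and the function names are illustrative.

```python
PEAK_RATE = 193_274_061          # p/sec, peak reading from session D/E
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(charset_size: int, length: int) -> int:
    """Number of fixed-length candidates over a character set."""
    return charset_size ** length


def exhaust_seconds(charset_size: int, length: int,
                    rate: float = PEAK_RATE) -> float:
    """Worst-case time to test every candidate at a constant rate."""
    return keyspace(charset_size, length) / rate


# 8 characters from lowercase letters and digits (26 + 10 = 36 symbols).
hours_8 = exhaust_seconds(36, 8) / SECONDS_PER_HOUR

# 10 characters from the 95 printable ASCII characters (assumed charset).
years_10 = exhaust_seconds(95, 10) / SECONDS_PER_YEAR

print(f"36^8 = {keyspace(36, 8):,} candidates, {hours_8:.2f} hours")
print(f"95^10 = {keyspace(95, 10):.3e} candidates, {years_10:,.0f} years")
```

These are worst-case figures for an exhaustive search at a constant peak rate. On average the password is found after about half the keyspace has been tested, and sustained rates are typically lower than the peak.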
Coming Next — New Engine Specifications
BitMindz has several next-generation Decryption Engine builds currently being commissioned. These systems will incorporate the latest GPU architectures and are expected to significantly advance the performance ceiling documented in this study. Benchmark data from these new configurations will be appended to this white paper once commissioning and validation are complete. Visit www.bitmindz.com for announcements and updated specifications.
Practical Implications for Digital Forensics
For forensic practitioners, these findings have direct operational implications. A single-GPU workstation with an RTX 3090 or equivalent can recover typical short (≤8 character) passwords from AES-256 ZIP archives within minutes to hours using combined dictionary and brute-force strategies. Multi-GPU configurations reduce this window roughly in proportion to aggregate throughput. For large-scale or time-sensitive investigations, a distributed BitMindz Decryption Engine cluster brings most passwords of fewer than 10 characters within practical reach when they are drawn from limited character sets, such as lowercase letters and digits.
The data further reinforces that password complexity — not just length — remains the most effective deterrent. Mixed-case alphanumeric passwords with special characters create keyspaces that remain computationally intractable even at 193M p/sec, making strong password hygiene the most reliable defense against GPU-accelerated recovery attacks.
Future Research Directions
Planned expansions to this benchmark study include: (1) Performance characterization against 7-Zip native encryption (AES-256 with SHA-256 key derivation), which is expected to show significantly lower throughput due to higher iteration count; (2) Integration of next-generation Blackwell-architecture GPUs; (3) Performance profiling of rule-based mutation attacks for enhanced coverage of real-world password patterns; and (4) Cross-platform comparison of CUDA vs. OpenCL kernel performance on identical hardware.
References & Technical Standards
| # | Reference | Relevance |
|---|---|---|
| 1 | NIST FIPS 197 — Advanced Encryption Standard (AES) | AES-256 block cipher specification |
| 2 | RFC 2898 — PKCS #5: Password-Based Cryptography Specification v2.0 (PBKDF2) | Key derivation function for ZIP AES |
| 3 | WinZip AES Encryption Specification (WinZip Computing, LLC) | ZIP AES-256 format implementation |
| 4 | 7-Zip Source Documentation — Igor Pavlov, 7-zip.org | 7z native AES-256 / SHA-256 KDF |
| 5 | NVIDIA Ampere Architecture Whitepaper (GA102) | RTX 3090 architecture reference |
| 6 | NVIDIA Ada Lovelace Architecture Technical Brief | RTX 4090 / RTX 6000 Ada reference |
| 7 | Passware Kit Forensic Documentation — Passware Inc. | Benchmark tool and attack methodology |
| 8 | Khronos OpenCL Specification 3.0 | GPU compute framework for hash kernels |
| 9 | BitMindz Forensic Solutions Internal Benchmark Dataset (2018–2026) | Primary data source for this study |
Have Questions or Comments?
Please reach out to Manny Kressel at manny@bitmindz.com