RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

TL;DR

A user paired an RTX 5080 with an RTX 3090 in one system and reported over 80 tokens/sec on Qwen 3.6 27B Q8. The result shows that mixed-architecture GPU pairs can deliver high-throughput local inference.

In 2026, a user reported that an RTX 5080 combined with an RTX 3090 can exceed 80 tokens per second when running Qwen 3.6 27B Q8, a notable result for a local setup built from two different GPU generations.

The user built the dual-GPU system on an Asus Prime X570-Pro motherboard, which hosts both the RTX 5080 and the RTX 3090. They detailed the BIOS settings, driver installation, and build flags needed to support both GPUs, and reported a generation rate of over 80 tokens/sec on the Qwen 3.6 27B model with Q8 quantization.

This setup utilized custom kernel parameters, patched drivers, and specific llama.cpp startup options, including multi-GPU flags, to optimize VRAM utilization and performance. The result was a substantial increase from previous single-GPU benchmarks, with token generation speeds reaching as high as 90 tokens/sec depending on task complexity.
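The source does not list the exact startup options, so the following is only a minimal sketch of what a multi-GPU llama.cpp launch could look like. It uses the llama.cpp options `--n-gpu-layers`, `--split-mode`, and `--tensor-split`, and splits the model in proportion to each card's VRAM. The model filename, the context size, and the device order (RTX 5080 first) are assumptions for illustration, not the user's actual configuration.

```python
import shlex


def build_llama_server_args(model_path, vram_gib, ctx_size=8192):
    """Build a llama-server argument list that splits layers by VRAM share.

    vram_gib lists the VRAM of each GPU in the order CUDA enumerates them
    (assumed here, not taken from the source).
    """
    total = sum(vram_gib)
    # --tensor-split takes a comma-separated list of proportions, one per GPU.
    split = ",".join(f"{v / total:.2f}" for v in vram_gib)
    return [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", "99",    # offload all layers to the GPUs
        "--split-mode", "layer",   # assign whole layers to each GPU
        "--tensor-split", split,
        "--ctx-size", str(ctx_size),
    ]


# Hypothetical filename; 16 GiB (RTX 5080) and 24 GiB (RTX 3090).
args = build_llama_server_args("qwen-27b-q8_0.gguf", [16, 24])
print(shlex.join(args))
```

A proportional split is only a starting point: the KV cache and compute buffers also consume VRAM, so the ratio usually needs adjusting after observing real memory use.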

Impact of Dual-GPU Setup on Local AI Performance

This achievement demonstrates that users can significantly boost local large language model (LLM) inference speeds by combining high-end GPUs like the RTX 5080 and RTX 3090. It offers a practical pathway for enthusiasts and researchers to run advanced models with high throughput, reducing reliance on cloud services and enabling more powerful on-premise AI experimentation.

Furthermore, the detailed configuration guide provides valuable insights into BIOS tweaks, driver patching, and software flags necessary for multi-GPU operation, which could influence future hardware setups and software development for AI workloads.

Advances in Multi-GPU AI Hardware Configurations in 2026

Over the past year, AI enthusiasts have increasingly experimented with combining different GPU architectures to maximize performance. The RTX 5080, a recent high-performance card, and the widely used RTX 3090, with 24GB VRAM, are now being used together for local large language model inference. Prior benchmarks with single GPUs hovered around 50-60 tokens/sec, but recent configurations, including BIOS adjustments and driver patches, have pushed this above 80 tokens/sec.
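A quick back-of-the-envelope calculation shows why the two cards are pooled for this model. The sketch below assumes that "Q8" means llama.cpp's Q8_0 format (about 8.5 bits per weight, since each block of 32 int8 weights carries one 16-bit scale) and takes the nominal 27 billion parameters at face value. It counts weights only and ignores the KV cache and runtime buffers, so real usage is higher.

```python
# Q8_0 stores 32 int8 weights plus one fp16 scale per block:
# 34 bytes per 32 weights = 8.5 bits per weight.
BITS_PER_WEIGHT_Q8_0 = 8.5
GIB = 2**30


def weight_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


weights = weight_gib(27e9, BITS_PER_WEIGHT_Q8_0)  # nominal parameter count (assumption)
cards = {"RTX 5080": 16, "RTX 3090": 24}          # VRAM in GiB

print(f"weights: {weights:.1f} GiB")
for name, vram in cards.items():
    print(f"{name} alone ({vram} GiB): fits = {weights < vram}")

combined = sum(cards.values())
print(f"both cards ({combined} GiB): fits = {weights < combined}")
```

Under these assumptions the weights alone exceed either card's VRAM but fit in the combined 40 GiB, which is why the model has to be split across both GPUs to stay fully in VRAM.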

While multi-GPU setups are not new, the specific combination of these two cards, with detailed technical configurations, marks a notable step forward in accessible high-speed local AI processing.

“By configuring BIOS, drivers, and llama.cpp with specific flags, I managed to push token speeds above 80/sec on Qwen 3.6 27B Q8 using both the RTX 5080 and RTX 3090.”

— User who conducted the setup

Remaining Technical Challenges and Compatibility Limits

While the user successfully achieved high performance, some aspects remain uncertain, such as long-term stability, driver compatibility with different GPU models, and potential issues with more diverse GPU combinations. The process involved complex BIOS and driver configurations, which may not be straightforward for all users.

It is also unclear whether future updates to drivers or hardware will simplify or complicate multi-GPU setups like this.

Next Steps for Multi-GPU AI Optimization

Further testing is expected to explore stability and scalability across different GPU models and configurations. Software updates, driver patches, and community sharing of optimized settings are likely to improve ease of setup and performance. Additionally, developers may focus on better multi-GPU support in AI frameworks to facilitate wider adoption.

Users interested in replicating this setup should monitor updates from GPU vendors and AI software communities for improved tools and documentation.

Key Questions

Can I achieve similar performance with different GPU models?

Potentially, but success depends on driver support, BIOS configuration, and hardware compatibility. Matching GPUs simplifies setup, but mixed configurations like this are possible with careful tuning.

What BIOS settings are critical for multi-GPU operation?

The setup described relies on three BIOS changes: disabling CSM, enabling Above 4G Decoding, and setting Re-Size BAR Support to Auto or Enabled.
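After changing BIOS settings, it helps to confirm that the driver actually sees both cards. The following is a minimal sketch that parses the output of `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`. The sample text and its memory figures are illustrative values, not output captured from the user's machine.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text):
    """Parse nvidia-smi CSV lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, mem = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem)})
    return gpus


def list_gpus():
    """Run nvidia-smi and return the detected GPUs (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpus(out)


# Illustrative sample; on a real system call list_gpus() instead.
sample = "0, NVIDIA GeForce RTX 5080, 16303\n1, NVIDIA GeForce RTX 3090, 24576\n"
gpus = parse_gpus(sample)
for gpu in gpus:
    print(gpu["index"], gpu["name"], gpu["memory_mib"], "MiB")
```

If only one card appears, the BIOS settings above and the slot configuration are the first things to recheck.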

Is this setup suitable for everyday AI tasks or only experimental use?

While promising, such configurations are primarily for enthusiasts and researchers. Stability and support may vary, so caution is advised for production environments.

Will future driver updates make multi-GPU AI setups easier?

Likely, as GPU vendors and AI frameworks improve multi-GPU support, setup complexity should decrease, making high-performance local AI more accessible.

Source: Hacker News

