RTX 5080 and RTX 3090 Setup: 80 Tok/s on Qwen 3.6 27B Q8

TL;DR

A user configured an RTX 5080 and an RTX 3090 to run together and reported over 80 tokens/sec on Qwen 3.6 27B Q8. The result points to the potential of high-performance local AI setups that mix GPU architectures.

In 2026, a user reported running Qwen 3.6 27B Q8 at over 80 tokens per second on a mixed pair of GPUs, an RTX 5080 and an RTX 3090, a notable result for local AI setups built from cards of different generations.

The user built the dual-GPU system on an Asus Prime X570-Pro motherboard, which hosts both the RTX 5080 and the RTX 3090. They documented the BIOS settings, driver installation, and build flags needed to support both GPUs, and reported over 80 tokens/sec on the Qwen 3.6 27B model with Q8 quantization.
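
The article does not list the exact build flags, so the following is only a sketch of what a build targeting both cards might look like. It assumes llama.cpp's CMake build with its `GGML_CUDA` option and the standard `CMAKE_CUDA_ARCHITECTURES` variable, and it relies on the published compute capabilities of the two cards (8.6 for the RTX 3090, 12.0 for the RTX 5080). The function name and the default build directory are illustrative, not taken from the user's guide.

```python
import shlex

# Compute capabilities written the way CMake expects them:
# RTX 3090 (Ampere) is 8.6, RTX 5080 (Blackwell) is 12.0.
COMPUTE_CAPABILITY = {
    "RTX 3090": "86",
    "RTX 5080": "120",
}


def cmake_configure_args(gpus, build_dir="build"):
    """Return a CMake configure command that compiles CUDA kernels for every listed GPU.

    `build_dir` is a hypothetical default; the user's actual flags are not
    given in the article.
    """
    archs = sorted({COMPUTE_CAPABILITY[g] for g in gpus}, key=int)
    return [
        "cmake", "-B", build_dir,
        "-DGGML_CUDA=ON",
        "-DCMAKE_CUDA_ARCHITECTURES=" + ";".join(archs),
    ]


if __name__ == "__main__":
    # shlex.join quotes the argument containing ';' so it can be pasted into a shell.
    print(shlex.join(cmake_configure_args(["RTX 5080", "RTX 3090"])))
```

Listing both architectures lets a single binary run on either card. The Blackwell target typically needs a recent CUDA toolkit, so the toolkit version is worth checking before configuring.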

The setup used custom kernel parameters, patched drivers, and specific llama.cpp startup options, including multi-GPU flags, to optimize VRAM utilization and performance. The user reported a substantial increase over earlier single-GPU benchmarks, with generation speeds reaching as high as 90 tokens/sec depending on task complexity.
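
The user's exact startup options are not reproduced in the article. As a rough illustration of what multi-GPU flags can look like, the sketch below assembles a `llama-server` command using llama.cpp's `--n-gpu-layers`, `--split-mode`, and `--tensor-split` options, splitting the model in proportion to each card's VRAM. The model filename, the layer count of 99 (a common "offload everything" value), and the 16:24 split are assumptions for illustration only.

```python
import shlex


def llama_server_args(model_path, vram_gb, split_mode="layer", gpu_layers=99):
    """Build a llama-server command that spreads a model across several GPUs.

    vram_gb lists each GPU's memory in CUDA device order. That order can
    differ from the order nvidia-smi prints, so it is worth verifying
    before relying on the split.
    """
    # --tensor-split takes comma-separated proportions, one per device.
    split = ",".join(str(v) for v in vram_gb)
    return [
        "llama-server",
        "-m", model_path,
        "--n-gpu-layers", str(gpu_layers),  # offload all layers to the GPUs
        "--split-mode", split_mode,         # "layer" assigns whole layers to each GPU
        "--tensor-split", split,
    ]


if __name__ == "__main__":
    # Hypothetical model path; assumes device 0 is the 16 GB card and device 1 the 24 GB card.
    print(shlex.join(llama_server_args("qwen-27b-q8_0.gguf", [16, 24])))
```

A proportional split is only a starting point. The KV cache and compute buffers also consume VRAM, so the ratio usually needs adjusting by trial.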

Impact of Dual-GPU Setup on Local AI Performance

The result suggests that users can raise local large language model (LLM) inference speeds by pairing high-end GPUs such as the RTX 5080 and RTX 3090, even when the cards come from different generations. It offers enthusiasts and researchers a practical way to run larger models at high throughput on their own hardware, reducing reliance on cloud services.

The user's configuration guide also documents the BIOS tweaks, driver patching, and software flags needed for multi-GPU operation, which could inform future hardware builds and software development for AI workloads.

Advances in Multi-GPU AI Hardware Configurations in 2026

Over the past year, AI enthusiasts have increasingly experimented with combining different GPU architectures to maximize performance. The RTX 5080, a recent Blackwell-architecture card with 16 GB of GDDR7, and the widely used RTX 3090, with 24 GB of VRAM, are now being used together for local large language model inference. Prior single-GPU benchmarks hovered around 50-60 tokens/sec, but recent configurations, including BIOS adjustments and driver patches, have pushed this above 80 tokens/sec.
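
A quick back-of-the-envelope calculation shows why the combined VRAM matters for this model. The sketch assumes that "Q8" refers to llama.cpp's Q8_0 format (32 int8 weights plus one 16-bit scale per block, or 8.5 bits per weight) and that the model has a nominal 27 billion parameters. It ignores the KV cache and compute buffers, so the real footprint is larger.

```python
GIB = 2 ** 30

# Q8_0 stores blocks of 32 int8 weights plus one fp16 scale:
# 34 bytes per 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = 8.5


def weights_gib(params_billion, bits_per_weight=Q8_0_BITS_PER_WEIGHT):
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def headroom_gib(params_billion, vram_gib):
    """VRAM left across all GPUs after loading the weights (before KV cache)."""
    return sum(vram_gib) - weights_gib(params_billion)


if __name__ == "__main__":
    size = weights_gib(27)
    print(f"weights: {size:.1f} GiB")
    print(f"fits on the 24 GiB card alone: {size <= 24}")
    print(f"headroom on 16 + 24 GiB: {headroom_gib(27, [16, 24]):.1f} GiB")
```

Under these assumptions the weights alone come to roughly 27 GiB, more than either card holds on its own, while the 40 GiB pair leaves room for context.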

While multi-GPU setups are not new, the specific combination of these two cards, with detailed technical configurations, marks a notable step forward in accessible high-speed local AI processing.

“By configuring BIOS, drivers, and llama.cpp with specific flags, I managed to push token speeds above 80/sec on Qwen 3.6 27B Q8 using both the RTX 5080 and RTX 3090.”

— User who conducted the setup

Remaining Technical Challenges and Compatibility Limits

While the user successfully achieved high performance, some aspects remain uncertain, such as long-term stability, driver compatibility with different GPU models, and potential issues with more diverse GPU combinations. The process involved complex BIOS and driver configurations, which may not be straightforward for all users.

It is also unclear whether future updates to drivers or hardware will simplify or complicate multi-GPU setups like this.

Next Steps for Multi-GPU AI Optimization

Further testing is expected to explore stability and scalability across different GPU models and configurations. Software updates, driver patches, and community sharing of optimized settings are likely to improve ease of setup and performance. Additionally, developers may focus on better multi-GPU support in AI frameworks to facilitate wider adoption.

Users interested in replicating this setup should monitor updates from GPU vendors and AI software communities for improved tools and documentation.

Key Questions

Can I achieve similar performance with different GPU models?

Potentially, but success depends on driver support, BIOS configuration, and hardware compatibility. Using identical GPUs simplifies setup, but mixed configurations like this one are possible with careful tuning.

What BIOS settings are critical for multi-GPU operation?

Disabling CSM, enabling Above 4G Decoding, and setting ReSize BAR Support to Auto or Enabled are essential BIOS tweaks for multi-GPU setups.
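
After changing BIOS settings like these, it helps to confirm that the operating system actually sees both cards. The sketch below parses the output of `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits` and reports any expected GPU that is missing. The helper names are illustrative and are not part of the user's guide. It confirms only that the cards enumerate, not that each BIOS option took effect.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'index, name, memory_mib' lines as printed by the query above."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, mem_mib = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory_mib": int(mem_mib)})
    return gpus


def missing_gpus(gpus, expected):
    """Return the expected model names that match no detected GPU name."""
    names = [g["name"] for g in gpus]
    return [e for e in expected if not any(e in n for n in names)]


def list_gpus():
    """Run nvidia-smi and return the parsed GPU list (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

On a machine with the driver installed, `missing_gpus(list_gpus(), ["RTX 5080", "RTX 3090"])` should return an empty list when both cards are detected.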

Is this setup suitable for everyday AI tasks or only experimental use?

While promising, such configurations are primarily for enthusiasts and researchers. Stability and support may vary, so caution is advised for production environments.

Will future driver updates make multi-GPU AI setups easier?

Likely. As GPU vendors and AI frameworks improve multi-GPU support, setup complexity should decrease, making high-performance local AI more accessible.

Source: Hacker News

Author

T3chBillion Team
