UL's Procyon AI benchmarks are designed to cover different real-world workloads: classic computer vision tasks, image generation with Stable Diffusion, and text generation with large language models. The particular value of this suite lies in the fact that identical models are executed on different inference stacks, which makes it clear whether performance advantages come from optimized runtimes, from the hardware itself, or simply from the available VRAM. That last limit is reached quickly in memory-intensive benchmarks, especially on cards with only 8 GB of memory, where runs are either severely constrained or cannot be completed at all.
Interfaces and implementations
Windows ML is a generic inference API integrated into Windows. It builds on DirectML and distributes operators across CPU and GPU. The results are stable and vendor-independent, but in benchmarks they usually lag behind specialized runtimes. The Intel Arc Pro B50, for example, achieved 527 points in the Vision test under Windows ML, while the RTX A1000 fell behind with 311 points and the W7500 with 238 points. With memory-hungry models such as LLaMA 2 in particular, 8 GB cards quickly hit bottlenecks that make a run impossible.
TensorRT is NVIDIA's engine for optimized inference; it fuses operators and uses GPU memory efficiently. In many cases it achieves the best results on RTX cards, as long as enough VRAM is available. In the Stable Diffusion test, the RTX A1000 scored 564 points with TensorRT, well ahead of its ONNX-Olive result of only 174 points, which suffered from memory bottlenecks. With large language models such as LLaMA 2, however, the memory was no longer sufficient and the test was aborted.
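The abort is easy to reconstruct with back-of-the-envelope arithmetic. A rough sketch: the 7B parameter count is an assumption about the model variant used, and the KV cache, activations and runtime overhead come on top of the raw weights, so reality is even tighter than these numbers suggest.

```python
# Rough weight-memory estimate for a 7B-parameter LLM at common precisions.
# Parameter count and precision list are illustrative assumptions; KV cache,
# activations and runtime overhead add further memory on top of the weights.
PARAMS = 7_000_000_000

BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gib(precision: str, params: int = PARAMS) -> float:
    """Return the raw weight footprint in GiB for a given precision."""
    return params * BYTES_PER_WEIGHT[precision] / 1024**3

for p in ("fp16", "int8", "int4"):
    fits_8gb = weight_gib(p) < 8.0
    print(f"{p}: {weight_gib(p):.1f} GiB -> fits in 8 GB VRAM: {fits_8gb}")
```

With FP16 weights alone already at roughly 13 GiB, an 8 GB card has no chance even before overhead, which matches the aborted runs; a 16 GB card clears the hurdle once the weights are quantized.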
OpenVINO is Intel's inference stack; it converts and optimizes models via the Model Optimizer and distributes work across the XMX units as well as CPU and GPU. The Arc Pro B50 regularly achieved the best results in the benchmarks: 609 points in Computer Vision FP32, 757 points in Stable Diffusion FP16, and top scores in text generation with Phi 3.5 (2589 points), Mistral 7B (2479 points), LLaMA 3.1 (2446 points) and LLaMA 2 (2402 points). The decisive factor was not only the optimization but also the larger 16 GB of VRAM, which made the difference with complex language models.
ONNX Olive optimizes models for the ONNX Runtime and is vendor-neutral. On the Arc Pro B50 it achieved solid results, such as 547 points in Stable Diffusion and 1768 points with Phi 3.5. Compared to OpenVINO, however, a visible gap remained because Olive depends more heavily on generic kernels. On the 8 GB cards from AMD and NVIDIA, memory limited the results: the W7500 managed only 467 points in Stable Diffusion and the RTX A1000 174 points, while more complex language models such as LLaMA 2 could no longer be run at all on NVIDIA. AMD's optimized ONNX runtime specifically targets RDNA GPUs, but remains just as memory-bound.
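The vendor neutrality rests on the ONNX Runtime's execution-provider mechanism: a session is handed an ordered preference list and falls back to the next provider (ultimately the CPU) when one is unavailable. The following is a simplified stand-in for that selection logic, not the library's actual implementation; the provider names mirror real ones, but nothing here imports onnxruntime itself.

```python
# Sketch of ONNX Runtime-style execution-provider fallback. The provider
# names mirror real ones (TensorrtExecutionProvider, OpenVINOExecutionProvider,
# DmlExecutionProvider, CPUExecutionProvider), but this resolution logic is a
# simplified illustration, not the library's actual code.
def resolve_providers(preferred: list[str], available: set[str]) -> list[str]:
    """Keep the preferred providers that are actually available, in order,
    and always append the CPU provider as the final fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# On an Intel card without TensorRT, the OpenVINO provider wins:
providers = resolve_providers(
    ["TensorrtExecutionProvider", "OpenVINOExecutionProvider"],
    {"OpenVINOExecutionProvider", "DmlExecutionProvider",
     "CPUExecutionProvider"},
)
print(providers)
```

This fallback behavior is exactly why the same ONNX model can land on TensorRT-class performance on one card and on generic kernels on another, which is the gap the benchmark makes visible.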
AI Computer Vision FP32
The UL Procyon AI Computer Vision benchmark shows a familiar picture in NVIDIA's favor, but with significant progress for Intel. The RTX 4000 Ada achieves 1073 points with TensorRT, closely followed by the RTX 2000 Ada with 942 points. The Intel Arc Pro B60 takes a solid third place with 822 points, clearly outperforming both AMD's Radeon Pro W7700 (733 points) and its own B50 (609 points). These results show that Intel's OpenVINO framework now works very efficiently in classic image-processing tasks such as object recognition and segmentation, and that NVIDIA's remaining lead rests mainly on its Tensor Cores. AMD continues to lag far behind under Windows ML, owing to the still limited optimization of its ML drivers.
AI Image Generation Benchmark – Stable Diffusion 1.5 (FP16)
In the image-generation benchmark with Stable Diffusion 1.5, Intel takes the top spot for the first time. The Arc Pro B60 finishes first with 1362 points, just ahead of the RTX 4000 Ada (1358 points), which shows how efficiently OpenVINO works with Intel's Xe architecture. The B50 follows with 757 points, while AMD's Radeon Pro W7700 performs slightly better than expected with 902 points. NVIDIA loses its usual TensorRT lead here, as the benchmark benefits less from specialized hardware and more from memory bandwidth and execution optimization. That the RTX A1000 falls behind with only 564 points illustrates the high memory requirements of generative workloads and their dependence on a wide memory interface.
AI Text Generation Benchmark – Mistral 7B
The Mistral 7B test reveals the current dominance of the Intel Arc GPUs. With 4284 points, the B60 sets the clear top score, more than double that of the RTX 4000 Ada (2017 points). Even the smaller B50 beats all competing models with 2479 points. OpenVINO shows its strength here with quantized transformer models, mixed precision and high throughput. NVIDIA delivers solid results with TensorRT but lags behind in efficiency per watt. AMD's cards under Windows ML and ONNX reach only about a quarter of Intel's performance at 900 to 1000 points, a clear sign of insufficient optimization for large language models (LLMs).
AI Text Generation Benchmark – LLaMA 3.1
The dominance of the Arc GPUs is confirmed in LLaMA 3.1 as well. The B60 achieves 4190 points and the B50 2446 points, a considerable lead over the RTX 4000 Ada (1773 points) and RTX 2000 Ada (1432 points). The Arc GPUs benefit massively from Intel's own optimizations, which are tailored specifically to text tokenization and attention mechanisms. NVIDIA remains powerful, but its TensorRT implementation shows higher latencies for longer sequences. AMD's Radeon Pro W7700 posts the best result among the RDNA3-based models with 860 points, yet still trails clearly.
AI Text Generation Benchmark – LLaMA 2
In the older LLaMA 2 test, the Arc Pro B60 once again takes the top score with 4342 points, followed by the B50 with 2402 points. NVIDIA can no longer keep up with the RTX 4000 Ada (1705 points) and RTX 2000 Ada (1429 points). Particularly striking is the large gap to AMD, whose ONNX implementation is not competitive at a maximum of 152 points; the reason is the lack of support for quantized INT8 and BF16 models in the current AMD driver environment. Intel, by contrast, can use all common quantization modes with OpenVINO and thus achieves exceptionally high per-clock compute performance.
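What such quantization support means in practice can be illustrated with a generic symmetric INT8 scheme: a textbook sketch, not Intel's or AMD's actual implementation. Weights are mapped onto the signed 8-bit range via a per-tensor scale, stored at a quarter of the FP32 size, and scaled back at inference time.

```python
# Generic symmetric INT8 weight quantization, shown as a textbook
# illustration rather than any specific vendor runtime's scheme. Each weight
# is mapped to the signed 8-bit range [-127, 127] via one per-tensor scale.
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    scale = max(abs(w) for w in weights) / 127.0  # one scale per tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 1.27, -0.88]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage drops from 4 bytes (FP32) to 1 byte per weight; the price is a
# rounding error bounded by half the scale step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert all(-127 <= v <= 127 for v in q)
assert max_err <= scale / 2
```

A runtime that executes such INT8 (or BF16) models natively quarters (or halves) the weight footprint and bandwidth demand, which is precisely where a driver stack lacking this support falls off a cliff in the LLM tests.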
Interim conclusion
Intel's Arc Pro B60 dominates the current AI benchmarks with an impressive lead in text- and image-generation tasks. While NVIDIA remains strong in classic FP32 vision workloads and the TensorRT segment, Intel takes full advantage of the open ONNX and OpenVINO frameworks. The results show that Arc GPUs are suitable in practice not only as workstation graphics cards but increasingly also as efficient accelerators for AI inference. AMD's Windows ML approach, on the other hand, is currently still too limited to keep up in productive AI workflows.
- 1 - Intro, overview and technical data
- 2 - Test system and equipment
- 3 - Teardown: PCB, topology and components
- 4 - Teardown: Cooler and fan
- 5 - Teardown: Material analysis and TIM testing
- 6 - Autodesk AutoCAD
- 7 - Autodesk Inventor Pro
- 8 - PTC Creo
- 9 - Dassault Systèmes Solidworks
- 10 - Autodesk Maya
- 11 - SPECviewperf 15 (2025)
- 12 - Adobe Photoshop 26.10
- 13 - Adobe After Effects 2025
- 14 - Adobe Premiere Pro 25.41
- 15 - AI benchmarks (AI Vision, Image, Text)
- 16 - Rendering
- 17 - Temperatures, clock rate, power consumption, noise
- 18 - Summary and conclusion