The UL Procyon AI benchmarks are designed to represent a range of real-world workloads: classic computer vision tasks, image generation with Stable Diffusion, and text generation with large language models. The particular value of the suite lies in the fact that identical models are executed on different inference stacks. This makes it clear whether a performance advantage comes from an optimized runtime, from the hardware itself, or simply from the available VRAM. Cards with only 8 GB of memory in particular show how quickly that limit is reached in memory-intensive benchmarks: measurements are either severely restricted or cannot be carried out at all.
Interfaces and implementations
Windows ML is a generic inference API integrated into Windows. It builds on DirectML and distributes operators across CPU and GPU. The results are stable and vendor-independent, but in benchmarks they usually lag behind specialized runtimes. For example, the Intel Arc Pro B50 achieved 527 points in the Vision test with Windows ML, while the RTX A1000 trailed with 311 points and the W7500 with 238 points. With memory-hungry models such as LLaMA 2 in particular, 8 GB cards quickly hit bottlenecks that make a run impossible.
TensorRT is NVIDIA’s engine for optimized inference; it fuses operators and uses GPU memory efficiently. In many cases it achieves the best results on RTX cards, as long as enough VRAM is available. In the Stable Diffusion test, the RTX A1000 scored 564 points with TensorRT, well ahead of its ONNX Olive result of only 174 points, which suffered from memory bottlenecks. With large language models such as LLaMA 2, however, the memory was no longer sufficient and the test aborted.
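The effect of operator fusion can be illustrated with a toy sketch in plain Python. This is deliberately not TensorRT API code, just the underlying idea: an unfused pipeline materializes every intermediate result in memory, while a fused kernel computes the whole chain in a single pass.

```python
# Conceptual sketch only, not TensorRT code: why operator fusion saves
# memory bandwidth. The unfused path writes every intermediate result to
# a buffer; the fused path computes scale + bias + ReLU in one pass.

def relu(x):
    return x if x > 0 else 0.0

def unfused(xs, scale, bias):
    t1 = [x * scale for x in xs]   # intermediate buffer 1 (extra memory traffic)
    t2 = [t + bias for t in t1]    # intermediate buffer 2
    return [relu(t) for t in t2]   # output

def fused(xs, scale, bias):
    # One loop: intermediate values never leave "registers".
    return [relu(x * scale + bias) for x in xs]

print(fused([1.0, -2.0, 3.0], 2.0, 1.0))  # → [3.0, 0.0, 7.0]
```

Both functions return identical results; the fused version simply avoids two round-trips through memory, which is where a large part of TensorRT's advantage on bandwidth-limited cards comes from.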
OpenVINO is Intel’s inference stack; it converts and streamlines models via the Model Optimizer and distributes them across XMX units, CPU and GPU. The Arc Pro B50 regularly posted the best results in the benchmarks: 609 points in Computer Vision FP32, 757 points in Stable Diffusion FP16, and top scores in text generation with Phi 3.5 (2589 points), Mistral 7B (2479 points), LLaMA 3.1 (2446 points) and LLaMA 2 (2402 points). The decisive factor was not only the optimization but also the larger 16 GB of VRAM, which made the difference with complex language models.
ONNX Olive optimizes models within the ONNX Runtime and is vendor-neutral. Olive achieved solid results on the Arc Pro B50, such as 547 points in Stable Diffusion and 1768 points with Phi 3.5. Compared to OpenVINO, however, a visible gap remained, because Olive depends more heavily on generic kernels. On the 8 GB cards from AMD and NVIDIA, memory limited the results: the W7500 managed only 467 points in Stable Diffusion and the RTX A1000 174 points, while more complex language models such as LLaMA 2 could no longer be run at all on NVIDIA. The AMD-optimized ONNX runtime specifically targets RDNA GPUs, but it too remains bound by memory.
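The vendor-neutral behavior of ONNX Runtime comes from its execution-provider mechanism: a session is given a priority list of providers, and the runtime falls back to the next entry when one is unavailable. The provider names below are real ONNX Runtime identifiers; the selection function itself is a simplified stand-in for the runtime's internal logic, shown here so the fallback behavior is concrete.

```python
# Simplified illustration of ONNX Runtime's execution-provider fallback.
# Provider names are real ONNX Runtime identifiers; the selection logic
# is a stand-in for what the runtime does internally.

def select_provider(requested, installed):
    """Return the first requested provider that is actually installed."""
    for provider in requested:
        if provider in installed:
            return provider
    return "CPUExecutionProvider"  # always available as the last resort

# An NVIDIA system without TensorRT installed falls back to CUDA:
chosen = select_provider(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
    {"CUDAExecutionProvider", "CPUExecutionProvider"},
)
print(chosen)  # → CUDAExecutionProvider
```

This is also why identical ONNX models can score so differently across the three cards: the same graph lands on whichever kernels the selected provider brings along.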
AI Computer Vision FP32
The vision benchmark focuses on classic image classification and object recognition at full FP32 precision. Here the Intel Arc Pro B50 is clearly ahead with 609 points (OpenVINO) and 527 points (Windows ML). The NVIDIA RTX A1000 achieves 311 points with Windows ML, the AMD Radeon Pro W7500 238 points. Strikingly, the RTX A1000 does not even complete the run with TensorRT because of its limited 8 GB of memory, a clear example of VRAM becoming a hard limit. Intel benefits greatly here from OpenVINO, which distributes the operators efficiently across the XMX units.
AI Image Generation – Stable Diffusion 1.5 (FP16)
Stable Diffusion is memory-intensive and benefits from FP16, which halves VRAM requirements compared to FP32. The Intel Arc Pro B50 achieves 757 points with OpenVINO and 547 points with ONNX Olive. The NVIDIA RTX A1000 scores 564 points with TensorRT, placing it solidly in the midfield, but drops to just 174 points with ONNX Olive for lack of memory optimizations. The AMD Radeon Pro W7500 achieves 467 points with the AMD-optimized ONNX runtime. This shows that both engine efficiency and VRAM are crucial for image generation; cards with only 8 GB quickly reach their limits and deliver inconsistent results.
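How much FP16 actually saves can be put into rough numbers. Assuming the commonly cited figure of about 0.86 billion parameters for the Stable Diffusion 1.5 UNet (an assumption on our part, not a number from the benchmark), the weight memory alone works out as follows:

```python
# Back-of-the-envelope estimate, weights only; activations and the VAE/
# text encoder come on top. The ~0.86B parameter count for the SD 1.5
# UNet is an assumed, commonly cited figure.
UNET_PARAMS = 0.86e9

for precision, bytes_per_weight in [("FP32", 4), ("FP16", 2)]:
    gib = UNET_PARAMS * bytes_per_weight / 2**30
    print(f"{precision}: ~{gib:.1f} GiB of weights")
```

Roughly 3.2 GiB at FP32 versus 1.6 GiB at FP16 for the UNet weights alone, which explains why FP16 is the standard precision for this workload on 8 GB cards.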
AI Text Generation – Phi 3.5
Phi 3.5 is a relatively compact model, so all cards can complete the run. The Intel Arc Pro B50 achieves 2589 points with OpenVINO and 1768 points with ONNX. The NVIDIA RTX A1000 achieves 1114 points, the AMD W7500 729 points. Intel clearly dominates, as OpenVINO uses particularly efficient kernels here; NVIDIA and AMD lag behind but still benefit from the smaller model size.
AI Text Generation – Mistral 7B
Mistral 7B is significantly more memory-intensive and places greater demands on VRAM capacity. The Intel Arc Pro B50 achieves 2479 points with OpenVINO and 1434 with ONNX. The RTX A1000 achieves 967 points, the AMD W7500 only 518. This shows that 8 GB cards can hardly keep up: batch sizes have to be reduced, which slows the token rate considerably. Intel remains clearly superior thanks to its larger memory.
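A quick calculation shows why 7B-class models are so tight on 8 GB cards: at FP16, the weights alone already exceed the card's memory, which is why quantization and smaller batches become necessary in the first place.

```python
# Rough estimate: weight memory of a 7-billion-parameter model,
# ignoring activations and the KV cache, which add further overhead.
def weight_gib(params_billion, bytes_per_weight):
    return params_billion * 1e9 * bytes_per_weight / 2**30

print(f"7B @ FP16: {weight_gib(7, 2):.1f} GiB")   # well above 8 GB
print(f"7B @ INT4: {weight_gib(7, 0.5):.1f} GiB") # fits, with headroom
```

Roughly 13 GiB at FP16 versus about 3.3 GiB at 4-bit quantization; the 16 GB Arc Pro B50 has headroom either way, while the 8 GB cards depend entirely on how aggressively the runtime compresses the model.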
AI Text Generation – LLaMA 3.1
LLaMA 3.1 pushes memory requirements and compute load even higher. The Intel Arc Pro B50 achieves 2446 points with OpenVINO and 1109 points with ONNX. The RTX A1000 achieves 858 points, the AMD W7500 only 477. The gaps widen, and VRAM bottlenecks visibly drag performance down.
AI Text Generation – LLaMA 2
Here the VRAM limit is fully exposed. The Intel Arc Pro B50 still runs at full speed with 2402 points (OpenVINO) and 1249 points (ONNX). The AMD Radeon Pro W7500 delivers practically unusable results at only 129 points, and the NVIDIA RTX A1000 fails outright at the memory limit; the benchmark aborts. Models of the 7B class and above can no longer be processed reliably on 8 GB cards.
Conclusion
The Procyon benchmarks clearly show the differences between the cards and runtimes. Intel dominates almost every test with the Arc Pro B50, thanks to 16 GB of VRAM and OpenVINO optimizations. NVIDIA posts solid results with TensorRT but slumps, or fails entirely, with memory-hungry models. AMD delivers consistent but low scores with the AMD-optimized ONNX runtime and likewise suffers from the 8 GB limit. With large language models it becomes especially clear: 8 GB of VRAM is no longer enough to handle modern AI workloads reliably.
- 1 - Introduction, unboxing and technical data
- 2 - Test system and equipment
- 3 - Teardown: PCB, topology and components
- 4 - Teardown: Cooling solution
- 5 - Teardown: Material analysis and ASTM TIM testing
- 6 - Autodesk AutoCAD
- 7 - Autodesk Inventor Pro
- 8 - PTC Creo
- 9 - Dassault Systèmes Solidworks
- 10 - Autodesk Maya
- 11 - SPECviewperf 15 (2025)
- 12 - Adobe Photoshop 26.10
- 13 - Adobe After Effects 2025
- 14 - Adobe Premiere Pro 25.41
- 15 - AI Benchmarks (AI Vision, Image, Text)
- 16 - Rendering
- 17 - Temperatures, clock rates, power draw and fan speed
- 18 - Summary and conclusion