NVIDIA and Oracle Team Up to Offer Faster, Cheaper AI Inference on the Cloud
-
NVIDIA Triton Inference Server gives Oracle Cloud Infrastructure the flexibility to build and run AI applications efficiently, reducing inference costs by 10% and increasing throughput by up to 76%.
-
Customers use Oracle Cloud Infrastructure Vision AI for object detection and image classification tasks such as automated toll billing and invoice recognition.
-
Triton is being adopted across other Oracle services to give customers an easy path to its inference capabilities.
-
Compared with other serving frameworks, Triton handles concurrent multi-model inference especially well, serving multiple models side by side on shared GPUs.
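Concurrency in Triton is typically configured per model in a `config.pbtxt` file. A minimal sketch follows; the model name, backend, and shapes here are hypothetical, chosen only to illustrate the instance-group and dynamic-batching settings:

```protobuf
# Hypothetical Triton model configuration (config.pbtxt).
name: "toll_plate_detector"      # illustrative model name
platform: "onnxruntime_onnx"     # assumed ONNX backend
max_batch_size: 8

# Run two copies of the model per GPU so requests for this
# model can be served concurrently alongside other models.
instance_group [
  { count: 2, kind: KIND_GPU }
]

# Briefly queue individual requests so they can be batched
# together, trading a little latency for higher throughput.
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```

Because each model carries its own configuration, several differently tuned models can share a GPU, which is the basis of Triton's concurrent multi-model serving.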
-
Oracle is exploring TensorRT-LLM to accelerate large language models on NVIDIA GPUs, striking a balance between numerical precision and performance.