Georg Kalus

OVHcloud AI endpoints are now in General Availability

OVHcloud AI endpoints

OVHcloud last week officially announced the General Availability of its AI Endpoints, which make more than 40 models for text, speech, image, and code tasks easily accessible behind ready-to-use API endpoints.

The AI Endpoints are a convenient, easy-to-use way to leverage powerful AI models for GenAI, text-to-speech, image recognition, image generation, working with code, and more, under a no-commitment, pay-per-use pricing model.
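
To give a feel for "ready-to-use", here is a minimal sketch of calling a chat model through an OpenAI-compatible completions route. The base URL and token are placeholders (check the OVHcloud AI Endpoints catalog for the real values of the model you pick), and the OpenAI-compatible request shape is an assumption based on how such endpoints are commonly exposed:

```python
import json
import os
import urllib.request

# Placeholders -- replace with the endpoint URL and token from your
# OVHcloud AI Endpoints dashboard.
BASE_URL = os.environ.get("OVH_AI_ENDPOINTS_URL", "https://<your-endpoint>/v1")
API_KEY = os.environ.get("OVH_AI_ENDPOINTS_TOKEN", "<your-token>")

def build_chat_payload(prompt: str, model: str = "Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def ask(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example usage (requires a valid endpoint URL and token):
#   print(ask("Summarize what an API endpoint is in one sentence."))
```

Because the request body follows the common OpenAI wire format, existing client libraries and tooling built around that format should work by pointing them at the endpoint URL.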

Available Models at OVHcloud

OVHcloud is launching the AI Endpoints with over 40 models. The available models are listed below.

LLM (Large Language Models)

High-performance models designed to understand and generate text in a human-like way, used for chatbots, text summarization, or content creation.

| Model | Description |
| --- | --- |
| Llama-3.1-8B-Instruct | Llama 3.1 (8B parameter version) is an auto-regressive language model that uses an optimized transformer architecture. It was released by Meta AI on July 23, 2024, and uses supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The knowledge cutoff for this model is December 31, 2023. |
| Llama-3.1-70B-Instruct | Llama 3.1 (70B parameter version) is an auto-regressive language model that uses an optimized transformer architecture. It was released by Meta AI on July 23, 2024, and uses supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The knowledge cutoff for this model is December 31, 2023. |
| Llama-3.3-70B-Instruct | Llama 3.3 is an instruction-tuned generative language model optimized for multilingual dialogue use cases. It was released by Meta AI on December 6, 2024, uses an advanced transformer architecture, and is aligned with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The knowledge cutoff for this 70B model is December 31, 2023. |
| Mistral-Nemo-Instruct-2407 | The Mistral-Nemo-Instruct-2407 model, developed collaboratively by Mistral AI and NVIDIA, is an instruction-tuned LLM released in 2024. Designed for multilingual applications, it excels in tasks such as conversational dialogue, code generation, and instructional comprehension across various languages. |

Reasoning LLM

| Model | Description |
| --- | --- |
| DeepSeek-R1-Distill-Llama-70B | DeepSeek-R1-Distill-Llama-70B was trained via large-scale reinforcement learning. It was released by DeepSeek on January 20, 2025, and is a distilled version of the Llama 3.3 70B model. The knowledge cutoff date for this model is July 1, 2024. |

Visual LLM

| Model | Description |
| --- | --- |
| Qwen2.5-VL-72B-Instruct | Qwen2.5-VL is a powerful vision-language model designed for advanced image understanding. It can generate detailed image captions, analyze documents, perform OCR, detect objects, and answer questions based on visuals, making it useful for AI assistants, RAG, and agents. |
| llava-next-mistral-7b | LLaVA combines a pre-trained large language model with a pre-trained vision encoder for multimodal (image + text) chatbot use cases. LLaVA-NeXT (also known as LLaVA-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. |
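
Visual LLMs take a text question and an image in the same request. Assuming the endpoints follow the common OpenAI-style multimodal message format (an assumption, not confirmed by the announcement), the image is typically passed inline as a base64 data URL:

```python
import base64

def build_vision_message(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style multimodal user message pairing a text
    question with the image encoded as an inline base64 data URL."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# The message then slots into an ordinary chat completion payload.
# The model name below is taken from the catalog above; the bytes are
# a stand-in for real image data.
payload = {
    "model": "Qwen2.5-VL-72B-Instruct",
    "messages": [build_vision_message("What is in this image?", b"\x89PNG...")],
}
```

Inlining the image keeps the request self-contained, at the cost of a roughly 33% size overhead from base64 encoding.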

Code LLM

| Model | Description |
| --- | --- |
| Qwen2.5-Coder-32B-Instruct | Qwen2.5-Coder-32B-Instruct is the 32B instruction-tuned model of the Qwen2.5-Coder series, a family of code-specialized large language models from the Qwen team. It is designed for code generation, code reasoning, and code repair across a wide range of programming languages, while retaining strong general language capabilities. |
| mamba-codestral-7B-v0.1 | Mamba-Codestral-7B-v0.1 (Codestral Mamba) is a code generation model released by Mistral AI in July 2024. Unlike transformer-based models, it is built on the Mamba2 state-space architecture, which offers linear-time inference and efficient handling of long code contexts. |

Embeddings

| Model | Description |
| --- | --- |
| bge-multilingual-gemma2 | BGE-Multilingual-Gemma2 is an LLM-based multilingual embedding model trained on a diverse range of languages and tasks. Its training data spans a broad range of languages, including English, Chinese, Japanese, Korean, French, and more, and covers a variety of task types, such as retrieval, classification, and clustering. It exhibits state-of-the-art (SOTA) results on multilingual benchmarks like MIRACL, MTEB-pl, and MTEB-fr, and achieves excellent performance on other major evaluations, including MTEB, C-MTEB, and AIR-Bench. |
| BGE-M3 | The BGE-M3 model, developed by BAAI and released in early 2024, is a powerful multilingual, multifunctional, and multigranular embedding model designed for retrieval tasks. It supports dense, multi-vector, and sparse retrieval while handling texts of varying lengths across over 100 languages. |
| bge-base-en-v1.5 | This model converts English text into dense vector embeddings, facilitating tasks like semantic similarity search and information retrieval. It was released in September 2023 and was developed by BAAI (the Beijing Academy of Artificial Intelligence). |
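
The retrieval use case these models serve boils down to comparing vectors: each text is mapped to a vector, and semantically similar texts end up with a small angle between their vectors. A minimal sketch of ranking documents by cosine similarity, using tiny made-up vectors in place of real model output (a model like bge-multilingual-gemma2 returns vectors with thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    "doc_b": [0.0, 0.1, 0.9],  # nearly orthogonal to the query
}

# Rank documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```

In a real pipeline the vectors would come from the embeddings endpoint, and for large corpora the brute-force sort would be replaced by an approximate nearest-neighbor index.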

Computer Vision

| Model | Description |
| --- | --- |
| yolov11x-image-segmentation | This YOLO model, developed by Ultralytics, excels in real-time instance segmentation, enabling precise identification and delineation of objects within images. |
| yolov11x-object-detection | This YOLO model, developed by Ultralytics, is a state-of-the-art object detection model released in 2024, offering improved performance in real-time object detection across various image types. |

Image Generation

| Model | Description |
| --- | --- |
| stable-diffusion-xl-base-v10 | SDXL, developed by Stability AI, is an advanced text-to-image model released in July 2023, offering enhanced image generation capabilities. |

Natural Language Processing

| Model | Description |
| --- | --- |
| roberta-base-go_emotions | This model, developed by Sam Lowe, is a fine-tuned version of RoBERTa for multi-label emotion classification, released in 2020. It is designed to identify 28 different emotions in text. |
| bart-large-cnn | The BART large CNN model, published by Meta AI, is a fine-tuned version of BART specifically trained for summarization tasks using the CNN/Daily Mail dataset. It is one of the best pre-trained models available for abstractive text summarization. |
| bert-base-NER | This BERT base NER model is a fine-tuned version of BERT, developed by dslim. It is specifically trained for Named Entity Recognition (NER) tasks, enabling it to identify and classify entities such as locations, organizations, persons, and miscellaneous entities within text. It was fine-tuned on the English version of the CoNLL-2003 Named Entity Recognition dataset. |
| bert-base-multilingual-uncased-sentiment | This BERT base model is a fine-tuned version of BERT, developed by NLP Town. It is specifically trained for sentiment analysis across six languages: English, Dutch, German, French, Spanish, and Italian. It predicts the sentiment of a review on a scale of 1 to 5 stars. |

Translation

| Model | Description |
| --- | --- |
| t5-large | The T5-large model, developed by Google Research, is designed to translate English text into other languages. |

Audio Analysis

| Model | Description |
| --- | --- |
| nvr-tts-it-it | This NVIDIA TTS model generates natural-sounding Italian speech from raw text without requiring additional information. |
| nvr-tts-de-de | This NVIDIA TTS model generates natural-sounding German speech from raw text without requiring additional information. |
| nvr-tts-en-us | This NVIDIA TTS model generates natural-sounding American English speech from raw text without requiring additional information. |
| nvr-asr-en-gb | This NVIDIA ASR model allows you to recognize and transcribe British English audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains. |
| nvr-asr-fr-fr | This NVIDIA ASR model allows you to recognize and transcribe European French audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains. |
| nvr-asr-es-es | This NVIDIA ASR model allows you to recognize and transcribe European Spanish audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains. |
| nvr-tts-es-es | This NVIDIA TTS model generates natural-sounding European Spanish speech from raw text without requiring additional information. |
| nvr-asr-en-us | This NVIDIA ASR model allows you to recognize and transcribe American English audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains. |

Pricing

OVHcloud AI Endpoints are charged per one million (1M) tokens. Below is a comparison with the comparable services of Azure AI, AWS, Scaleway, and IONOS for the Llama 3.3 70B Instruct model.

| Vendor | Model | Input Tokens €/1M | Output Tokens €/1M |
| --- | --- | --- | --- |
| IONOS AI Model Hub | Llama 3.3 70B Instruct | 1,50 € | 1,75 € |
| Scaleway Generative APIs | Llama 3.3 70B Instruct | 0,90 € | 0,90 € |
| OVHcloud AI Endpoints | Llama 3.3 70B Instruct | 0,79 € | 0,79 € |
| AWS Bedrock | Llama 3.3 70B Instruct | 0,72 € | 0,72 € |
| Azure AI | Llama 3.3 70B Instruct | 0,268 € | 0,354 € |
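
Per-1M-token pricing means the bill scales linearly with usage, so comparing vendors for a given workload is simple arithmetic. A small sketch using the rates from the table above (decimal points instead of the table's decimal commas):

```python
# Prices in EUR per 1M tokens, taken from the comparison table above.
PRICES = {
    "IONOS AI Model Hub":       {"input": 1.50,  "output": 1.75},
    "Scaleway Generative APIs": {"input": 0.90,  "output": 0.90},
    "OVHcloud AI Endpoints":    {"input": 0.79,  "output": 0.79},
    "AWS Bedrock":              {"input": 0.72,  "output": 0.72},
    "Azure AI":                 {"input": 0.268, "output": 0.354},
}

def cost_eur(vendor: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in EUR for the given token counts at a vendor's per-1M rates."""
    p = PRICES[vendor]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example workload: 5M input tokens + 1M output tokens in a month.
for vendor in PRICES:
    print(f"{vendor}: {cost_eur(vendor, 5_000_000, 1_000_000):.2f} €")
```

Note that input/output token ratios matter: Azure's asymmetric pricing favors input-heavy workloads, while the flat-rate vendors charge the same either way.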

Try the OVHcloud AI Endpoints

Check out the OVHcloud AI endpoints here: OVHcloud AI Endpoints Playground.
