OVHcloud AI Endpoints are now in General Availability
Last week, OVHcloud officially announced the General Availability of OVHcloud AI Endpoints. The service makes more than 40 models for text, speech, and image tasks easily accessible behind ready-to-use API endpoints.
The AI Endpoints are a convenient, easy-to-use way to leverage powerful AI models for GenAI, text-to-speech, image recognition, image generation, working with code, and more, under a no-commitment, pay-per-use pricing model.
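Because the LLM endpoints expose a familiar OpenAI-style chat-completion API, calling a model takes only a few lines. The sketch below is illustrative, not authoritative: the base URL, model identifier, and environment-variable name are assumptions and should be checked against the OVHcloud console.

```python
import json
import os
import urllib.request

# Assumption for illustration: an OpenAI-compatible gateway URL.
# Verify the actual base URL in the OVHcloud AI Endpoints catalog.
BASE_URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request(
    "Llama-3.3-70B-Instruct",
    "Summarize what AI Endpoints offer in one sentence.",
)

# Hypothetical environment-variable name for the pay-per-use access token.
api_key = os.environ.get("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
if api_key:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape follows the OpenAI convention, existing OpenAI-compatible SDKs should also work by pointing them at the OVHcloud base URL.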
Available Models at OVHcloud
OVHcloud is launching the AI Endpoints with over 40 models, listed below.
LLM (Large Language Models)
High-performance models designed to understand and generate text in a human-like way, used for chatbots, text summarization, or content creation.
Model
Description
Llama-3.1-8B-Instruct
Llama 3.1 (8B parameters version) is an auto-regressive language model that uses an optimized transformer architecture. It was released by Meta AI on July 23, 2024, and utilizes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The knowledge cutoff for this model is December 31, 2023.
Llama-3.1-70B-Instruct
Llama 3.1 (70B parameters version) is an auto-regressive language model that uses an optimized transformer architecture. It was released by Meta AI on July 23, 2024, and utilizes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The knowledge cutoff for this model is December 31, 2023.
Llama-3.3-70B-Instruct
Llama 3.3 is an instruction-tuned generative language model optimized for multilingual dialogue use cases. It was released by Meta AI on December 6, 2024, and utilizes an advanced transformer architecture and is designed to align with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). The knowledge cutoff for this 70B model is December 31, 2023.
Mistral-Nemo-Instruct-2407
The Mistral-Nemo-Instruct-2407 model, developed collaboratively by Mistral AI and NVIDIA, is an instruction-tuned LLM released in 2024. Designed for multilingual applications, it excels in tasks such as conversational dialogue, code generation, and instructional comprehension across various languages.
Reasoning LLM
Model
Description
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Llama-70B is a reasoning model trained via large-scale reinforcement learning. It was released by DeepSeek on January 20, 2025, and distills the reasoning capabilities of DeepSeek-R1 into the Llama 3.3 70B base model. The knowledge cutoff date for this model is July 1, 2024.
Visual LLM
Model
Description
Qwen2.5-VL-72B-Instruct
Qwen2.5-VL is a powerful vision-language model, designed for advanced image understanding. It can generate detailed image captions, analyze documents, OCR, detect objects, and answer questions based on visuals, making it useful for AI assistants, RAG and Agents.
llava-next-mistral-7b
LLaVa combines a pre-trained large language model with a pre-trained vision encoder for multimodal (image + text) chatbot use cases. LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets.
Code LLM
Model
Description
Qwen2.5-Coder-32B-Instruct
Qwen2.5-Coder-32B-Instruct is a code-specific LLM from the Qwen series, released in November 2024. It is designed for code generation, code reasoning, and code fixing across dozens of programming languages, while retaining strong general-purpose and mathematical capabilities.
mamba-codestral-7B-v0.1
Mamba-Codestral-7B-v0.1 is an open code model developed by Mistral AI and released in July 2024. Unlike transformer-based code models, it is built on the Mamba2 architecture, which offers faster inference on long sequences while performing on par with state-of-the-art transformer code models.
Embeddings
Model
Description
bge-multilingual-gemma2
BGE-Multilingual-Gemma2 is an LLM-based multilingual embedding model trained on a diverse range of languages and tasks. It primarily demonstrates the following advancements. Diverse training data: the model's training data spans a broad range of languages, including English, Chinese, Japanese, Korean, French, and more. Additionally, the data covers a variety of task types, such as retrieval, classification, and clustering. Outstanding performance: the model exhibits state-of-the-art (SOTA) results on multilingual benchmarks like MIRACL, MTEB-pl, and MTEB-fr. It also achieves excellent performance on other major evaluations, including MTEB, C-MTEB, and AIR-Bench.
BGE-M3
The BGE-M3 model, developed by BAAI and released in early 2024, is a powerful multilingual, multi-functional, and multi-granular embedding model designed for retrieval tasks. It supports dense, multi-vector, and sparse retrieval while handling texts of varying lengths across more than 100 languages.
bge-base-en-v1.5
This model converts English text into dense vector embeddings, facilitating tasks like semantic similarity search and information retrieval. It was released in September 2023 and was developed by the BAAI (the Beijing Academy of Artificial Intelligence).
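Embedding models like these return dense vectors, and semantic similarity between two texts then reduces to a vector comparison, typically cosine similarity. A minimal sketch of that downstream math (the vectors here are toy values, not real model output):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings"; a real model such as bge-m3 returns
# vectors with hundreds or thousands of dimensions.
query_vec = [0.1, 0.3, 0.5, 0.1]
doc_vec = [0.1, 0.2, 0.6, 0.1]

score = cosine_similarity(query_vec, doc_vec)
```

Scores close to 1.0 indicate semantically similar texts, which is the basis of semantic search and RAG pipelines built on these endpoints.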
Computer Vision
Model
Description
yolov11x-image-segmentation
This YOLO model, developed by Ultralytics, excels in real-time instance segmentation, enabling precise identification and delineation of objects within images.
yolov11x-object-detection
This YOLO model, developed by Ultralytics, is a state-of-the-art object detection model released in 2024, offering improved performance in real-time object detection across various image types.
Image Generation
Model
Description
stable-diffusion-xl-base-v10
SDXL, developed by Stability AI, is an advanced text-to-image model released in July 2023, offering enhanced image generation capabilities.
Natural Language Processing
Model
Description
roberta-base-go_emotions
This model, developed by Sam Lowe, is a fine-tuned version of RoBERTa for multi-label emotion classification, trained on the GoEmotions dataset (released in 2020). It is designed to identify 28 different emotion labels in text.
bart-large-cnn
The bart-large-cnn model, published by Meta AI, is a fine-tuned version of BART specifically trained for summarization using the CNN/Daily Mail dataset. It is one of the most widely used pre-trained models for abstractive text summarization.
bert-base-NER
This BERT-base NER model, developed by dslim, is a fine-tuned version of BERT specifically trained for Named Entity Recognition (NER), enabling it to identify and classify entities such as locations, organizations, persons, and miscellaneous entities within text. It was fine-tuned on the English version of the CoNLL-2003 Named Entity Recognition dataset.
bert-base-multilingual-uncased-sentiment
This BERT-base model, developed by NLP Town, is a fine-tuned version of BERT specifically trained for sentiment analysis across six languages: English, Dutch, German, French, Spanish, and Italian. It predicts the sentiment of a review on a scale of 1 to 5 stars.
Translation
Model
Description
t5-large
The T5-large model, developed by Google Research, is a general text-to-text transformer; among other tasks, it can translate English text into languages such as French, German, and Romanian.
Audio Analysis
Model
Description
nvr-tts-it-it
This NVIDIA TTS model generates natural-sounding Italian speech from raw text without requiring additional information.
nvr-tts-de-de
This NVIDIA TTS model generates natural-sounding German speech from raw text without requiring additional information.
nvr-tts-en-us
This NVIDIA TTS model generates natural-sounding American English speech from raw text without requiring additional information.
nvr-asr-en-gb
This NVIDIA ASR model allows you to recognize and transcribe British English audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains.
nvr-asr-fr-fr
This NVIDIA ASR model allows you to recognize and transcribe European French audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains.
nvr-asr-es-es
This NVIDIA ASR model allows you to recognize and transcribe European Spanish audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains.
nvr-tts-es-es
This NVIDIA TTS model generates natural-sounding European Spanish speech from raw text without requiring additional information.
nvr-asr-en-us
This NVIDIA ASR model allows you to recognize and transcribe American English audio speech into text. It is trained on diverse datasets to ensure high accuracy across different accents and domains.
Pricing
OVHcloud AI Endpoints are charged per one million tokens. Below is a comparison with the comparable services of Azure AI, AWS, Scaleway, and IONOS for the Llama 3.3 70B Instruct model.
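Since billing is per million tokens, estimating the cost of a workload is simple arithmetic. The sketch below uses placeholder prices, not actual OVHcloud rates; check the current pricing page for real figures.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of a request, given per-million-token prices for input and output."""
    return (input_tokens / 1_000_000) * price_in_per_m \
        + (output_tokens / 1_000_000) * price_out_per_m


# Hypothetical prices in EUR per 1M tokens -- placeholders only.
cost = estimate_cost(input_tokens=2_000, output_tokens=500,
                     price_in_per_m=0.70, price_out_per_m=0.70)
```

At these placeholder rates, a request with 2,000 input and 500 output tokens would cost EUR 0.00175, which illustrates how pay-per-use pricing scales with actual consumption rather than reserved capacity.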