Experimental OpenAI-compatible local LLM inference instances of NHR@FAU – LLMs as a Service
Most of the extended models will be turned off again on June 22
There is still no funding for hardware or compute resources. Most of the extended models therefore have to be turned off again on June 22 as stated in the table below.
Terms of use:
- This experimental service is provided as-is for scientific research and development only.
- The service may fail or be stopped at any time without prior notice.
- The models provided are subject to change at any time at NHR@FAU’s sole discretion.
- NHR@FAU may introduce rate limiting as needed once too many requests come in.
- There is no right or entitlement to an API key. NHR@FAU may refuse or revoke API keys at any time at its sole discretion.
- Usage is accounted per API key (i.e. the number of tokens transferred and the model used), but queries and answers are never stored.
- By using the local LLM inference instances of NHR@FAU, you agree to provide, upon request, a short scientific report on your usage once per year.
With regard to the EU AI Act:
- The EU AI Act is not applicable owing to the exception of scientific research and development according to Art. 2 (6).
- We provide open-weight models as-is; detailed model cards for all provided models are available on Hugging Face, which is also where we download the pre-trained models.
- We typically use vLLM as inference software. Inference always runs on-premises on our data center hardware at FAU.
- Customers of the API endpoint are solely responsible for the risk classification and for complying with the obligations of the EU AI Act within their specific use case.
- We have no knowledge of your concrete deployment scenario and therefore cannot perform any end-user risk assessment.
- Use for any kind of high-risk application is forbidden.
Application
Usage examples
Once you have received your personal API key, you can use the LLM inference instances as follows (assuming you make the API key available to the code by setting the environment variable LLMAPI_KEY):
Show available models with curl:
curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
  https://hub.nhr.fau.de/api/llmgw/v1/models | jq .
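The same query can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and it assumes the gateway answers in the usual OpenAI-compatible list format, i.e. a JSON object with a `data` array whose entries carry an `id` field.

```python
import json
import urllib.request

BASE_URL = "https://hub.nhr.fau.de/api/llmgw/v1"


def build_models_request(api_key, base_url=BASE_URL):
    """Build the same GET request that the curl command sends."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def extract_model_ids(payload):
    """Return the sorted model IDs from an OpenAI-style model list."""
    # Assumption: the payload looks like {"data": [{"id": "..."}, ...]}.
    return sorted(entry["id"] for entry in payload.get("data", []))


def list_models(api_key, base_url=BASE_URL):
    """Fetch the model IDs from the gateway (performs a network call)."""
    request = build_models_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Calling `list_models(os.environ["LLMAPI_KEY"])` (after `import os`) should then return the model names your key has access to.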
Simple chat using the OpenAI Python module:
from openai import OpenAI
import os

# Initialize client with private endpoint URL and API key
client = OpenAI(
    # Read the key from LLMAPI_KEY (the client's own default would be OPENAI_API_KEY)
    api_key=os.getenv("LLMAPI_KEY"),
    base_url="https://hub.nhr.fau.de/api/llmgw/v1"
)

# Create a chat completion request
response = client.chat.completions.create(
    model="gpt-oss-120b",  # Replace with a model name available to you!
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": """Help me using the OpenAI API!
Keep in mind that I have to use https://hub.nhr.fau.de/api/llmgw/v1
as base URL for the API and consider that OpenAI changed its API in version 1.0.0.
Simple curl command lines are also welcome."""}
    ],
    temperature=0.7  # Optional parameter
)

# Print the response
print(response.choices[0].message.content)
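Since the terms of use mention rate limiting, a client may want to retry transient failures instead of failing on the first error. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes that the gateway signals rate limiting with HTTP status 429 and that the raised exception exposes the status as an integer `status_code` (as the exceptions of the `openai` module do) or `code` (as `urllib.error.HTTPError` does).

```python
import time


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential delay in seconds for a zero-based attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def http_status(error):
    """Return the integer HTTP status carried by an exception, if any."""
    for name in ("status_code", "code"):
        value = getattr(error, name, None)
        if isinstance(value, int):
            return value
    return None


def is_retryable(error):
    """Treat HTTP 429 (assumed rate-limit signal) and 5xx as transient."""
    status = http_status(error)
    return status is not None and (status == 429 or 500 <= status < 600)


def call_with_retry(func, max_attempts=5, sleep=time.sleep):
    """Call func() and retry transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as error:
            # Give up on the last attempt or on non-transient errors.
            if attempt == max_attempts - 1 or not is_retryable(error):
                raise
            sleep(backoff_delay(attempt))
```

With the client from the example above, a request could be wrapped as `call_with_retry(lambda: client.chat.completions.create(model=..., messages=...))`. The `openai` client also accepts a `max_retries` argument, which may already be sufficient for simple cases.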
Use with OpenCode:
OpenCode cannot detect the available models automatically, so they have to be specified manually. Some models may need additional settings, as shown at https://opencode.ai/docs/providers/#example.
Sample for ~/.config/opencode/opencode.json (only add models that are actually available to you!):
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "myprovider": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "NHR@FAU",
      "options": {
        "baseURL": "https://hub.nhr.fau.de/api/llmgw/v1"
      },
      "models": {
        "gpt-oss-120b": {
          "name": "gpt-oss-120b"
        },
        "Model123": {
          "name": "Model123"
        }
      }
    }
  }
}
You can now start opencode again in your project directory, type /connect, select the newly created NHR@FAU provider and enter your API key when prompted. Alternatively, the API key can be set by creating ~/.local/share/opencode/auth.json with:
{
  "myprovider": {
    "type": "api",
    "key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }
}
Now you can type /model and select one of the defined models.
For more information on OpenCode, see https://opencode.ai/docs/ and https://opencode.ai/docs/tui/.
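Because the model list in opencode.json has to be maintained by hand, it can be convenient to generate the file from a list of model names. The sketch below mirrors the structure of the sample configuration above; the function names are hypothetical, and the provider key `myprovider` is simply taken over from the sample.

```python
import json


def build_opencode_config(model_ids, provider_id="myprovider"):
    """Return an opencode.json structure listing only the given models."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "name": "NHR@FAU",
                "options": {"baseURL": "https://hub.nhr.fau.de/api/llmgw/v1"},
                # One entry per model, using the model ID as display name.
                "models": {model_id: {"name": model_id} for model_id in model_ids},
            }
        },
    }


def render_opencode_config(model_ids):
    """Serialize the configuration as indented JSON text."""
    return json.dumps(build_opencode_config(model_ids), indent=2)
```

For example, `print(render_opencode_config(["gpt-oss-120b"]))` produces text that can be saved as ~/.config/opencode/opencode.json. Models that need additional settings still have to be adjusted manually.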
Use with Claude Code:
Set the following environment variables before starting claude:
export ANTHROPIC_BASE_URL=https://hub.nhr.fau.de/api/llmgw
export ANTHROPIC_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export ANTHROPIC_DEFAULT_SONNET_MODEL=...
export ANTHROPIC_DEFAULT_OPUS_MODEL=...
export ANTHROPIC_DEFAULT_HAIKU_MODEL=...
Replace each ... with an appropriate model you have access to.
Use with Qwen Code:
Set the following environment variables before starting qwen:
export OPENAI_BASE_URL=https://hub.nhr.fau.de/api/llmgw/v1
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export OPENAI_MODEL=...
Replace ... with an appropriate model you have access to.
Use with Kimi Code:
Set the following environment variables before starting kimi:
export KIMI_BASE_URL=https://hub.nhr.fau.de/api/llmgw/v1
export KIMI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export KIMI_MODEL_NAME=...
Replace ... with an appropriate model you have access to.
For more details see https://www.kimi.com/code/docs/en/kimi-code-cli/configuration/environment-variables.html.
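If you switch between these tools regularly, the variables listed above can be collected in a small launcher. This is only a sketch: the helper names are hypothetical, the variable names and base URLs are the ones given in the sections above, and using the same model for all three Claude Code model slots is an assumption made for brevity.

```python
import os
import subprocess

GATEWAY = "https://hub.nhr.fau.de/api/llmgw"


def tool_environment(tool, api_key, model):
    """Return the environment variables for one of the coding tools."""
    if tool == "claude":
        return {
            "ANTHROPIC_BASE_URL": GATEWAY,  # no /v1 suffix, as documented above
            "ANTHROPIC_API_KEY": api_key,
            # Assumption: one model is used for all three slots.
            "ANTHROPIC_DEFAULT_SONNET_MODEL": model,
            "ANTHROPIC_DEFAULT_OPUS_MODEL": model,
            "ANTHROPIC_DEFAULT_HAIKU_MODEL": model,
        }
    if tool == "qwen":
        return {
            "OPENAI_BASE_URL": f"{GATEWAY}/v1",
            "OPENAI_API_KEY": api_key,
            "OPENAI_MODEL": model,
        }
    if tool == "kimi":
        return {
            "KIMI_BASE_URL": f"{GATEWAY}/v1",
            "KIMI_API_KEY": api_key,
            "KIMI_MODEL_NAME": model,
        }
    raise ValueError(f"unknown tool: {tool}")


def launch(tool, api_key, model):
    """Start the tool with the gateway settings added to the environment."""
    env = {**os.environ, **tool_environment(tool, api_key, model)}
    return subprocess.run([tool], env=env, check=False).returncode
```

A call such as `launch("qwen", os.environ["LLMAPI_KEY"], "gpt-oss-120b")` would then start the tool without touching your shell configuration; the model name is only an example and has to be one you have access to.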
Models (as of May 13, 2026 / last update Jun 10, 2026)
Available models
Not all listed models are generally available!
You can always check your access using
curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
https://hub.nhr.fau.de/api/llmgw/v1/models | jq .
Many models need (new) sponsors for their hardware or their compute cycles!
Additional models can only be added if people sponsor the required hardware.
| Model (with HuggingFace link) | License & origin | Sponsor | Hardware & size | Notes |
|---|---|---|---|---|
| GaleneAI/Magistral-Small-2509-FP8-Dynamic | Apache-2.0; originally 🇫🇷 | BKI for DVB (temporarily) | 1x H100 24B | 2026-06-09: Upgraded from vLLM-0.11.0 to 0.20.2 due to issues with interleaved reasoning output |
| RedHatAI/Mistral-Small-3.2-24B-Instruct-2506-FP8 | Apache-2.0; originally 🇫🇷 | BKI for DVB (temporarily) | 1x H100 24B | |
| lightonai/LightOnOCR-2-1B | Apache-2.0; 🇫🇷 | BKI for DVB (temporarily) | shared H100 1B | OCR model |
| llamaindex/vdr-2b-multi-v1 | Apache-2.0; 🇺🇸 | BKI for DVB (temporarily) | shared H100 | embedding model |
| intfloat/multilingual-e5-large | MIT; 🇺🇸/🇨🇳 | -/- | shared H100 | embedding model |
| ibm-granite/granite-4.1-3b | Apache-2.0; 🇺🇸 | -/- | shared H100 dense 3B | context length limited to 64k; 2026-06-09: enabled tool-call |
| microsoft/Phi-4-mini-instruct | MIT; 🇺🇸 | -/- | shared H100 4B | context length limited to 16k; 2026-06-09: enabled tool-call |
| google/gemma-4-E4B-it | Apache-2.0; 🇺🇸 | FAU | 1x L4 8B | |
| RedHatAI/gemma-4-31B-it-FP8-block | Apache-2.0; 🇺🇸 | ⚡ | 1x H100 31B (MoE) | limited availability until June 22, 2026 for evaluation purposes only |
| Qwen/Qwen3.6-35B-A3B-FP8 | Apache-2.0; 🇨🇳 | NHR director’s budget | 1x H100 35B (MoE) | |
| openai/gpt-oss-120b | Apache-2.0; 🇺🇸 | A group from Uni-Würzburg | 2x H100 120B (MoE) | since 2026-05-13 with speculative Eagle3-v3 |
| mistralai/Mistral-Medium-3.5-128B | modified MIT; 🇫🇷 | ⚡ | 4x H100 dense 128B | limited availability until June 22, 2026 for evaluation purposes only |
| deepseek-ai/DeepSeek-V284B4-Flash | MIT; 🇨🇳 | A group from HS-Hof | 4x H100 284B (MoE) | limited availability until June 22, 2026 for evaluation purposes only; very long context |
| moonshotai/Kimi-K2.6 | modified MIT; 🇨🇳 | ⚡ | 8x RTX Pro 6000 BSE 1T (MoE) | limited availability until June 22, 2026 for evaluation purposes only |
| NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 | OpenMDW-1.1; 🇺🇸 | ⚡ | 8x RTX Pro 6000 BSE 550B | limited availability until June 22, 2026 for evaluation purposes only |
You can always check your available models using
curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
https://hub.nhr.fau.de/api/llmgw/v1/models | jq .
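The table also lists embedding models. The following sketch shows how such a model might be queried and how two returned vectors can be compared. It rests on assumptions: that the gateway exposes the OpenAI-compatible `/embeddings` route with the usual request and response format, and that the model name passed in matches one shown by the models query for your key. The function names are hypothetical.

```python
import json
import math
import urllib.request

BASE_URL = "https://hub.nhr.fau.de/api/llmgw/v1"


def build_embeddings_request(api_key, model, texts, base_url=BASE_URL):
    """Build a POST request in the OpenAI-compatible embeddings format."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",  # assumed route, not confirmed by this page
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_vectors(payload):
    """Return the embedding vectors ordered by their input index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(api_key, model, texts):
    """Send the request and return one vector per text (network call)."""
    request = build_embeddings_request(api_key, model, texts)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_vectors(json.load(response))
```

Given two texts, `cosine_similarity(*embed(key, model, [text_a, text_b]))` yields a value close to 1 for semantically similar inputs.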
Deprecated models (to be removed soon; already not generally available)
- Qwen/Qwen3-VL-8B-Instruct; Apache-2.0 license; 🇨🇳; ⚡ 1 H100 => removed 2026-06-12
- Qwen/Qwen3.5-9B; Apache-2.0 license; 🇨🇳; ⚡ 0.5 H100 => removed 2026-06-12
- RedHatAI/gemma-3-27b-it-quantized.w4a16; Gemma license; 🇺🇸; ⚡ 1 H100 => removed 2026-06-12
Changelog
- 2026-05-13: enabled speculative Eagle3-v3 for openai/gpt-oss-120b
- 2026-06-09: enabled tool-call for ibm-granite/granite-4.1-3b
- 2026-06-09: enabled tool-call for microsoft/Phi-4-mini-instruct
- 2026-06-09: Upgraded from vLLM-0.11.0 to 0.20.2 for GaleneAI/Magistral-Small-2509-FP8-Dynamic due to issues with interleaved reasoning output
- 2026-06-10: temporarily added NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
- 2026-06-11: rate limiting and access restrictions had to be introduced due to excessive use and overloading of models
- 2026-06-12: Qwen/Qwen3-VL-8B-Instruct, Qwen/Qwen3.5-9B, RedHatAI/gemma-3-27b-it-quantized.w4a16 have been removed (as announced in May); RedHatAI/gemma-4-31B-it-FP8-block has been added instead