Request LLM API key & documentation – LLMs as a Service

Experimental OpenAI-compatible local LLM inference instances of NHR@FAU – LLMs as a Service

Most of the extended models will be turned off again on June 22

There is still no funding for hardware or compute resources. Most of the extended models therefore have to be turned off again on June 22 as stated in the table below.

Terms of use:

  • This experimental service is provided as-is for scientific research and development only.
  • The service may fail or be stopped at any time without prior notice.
  • The models provided are subject to change at any time at NHR@FAU’s sole discretion.
  • NHR@FAU may introduce rate limiting as needed once too many requests come in.
  • There is no right or entitlement to an API key. NHR@FAU may refuse or revoke API keys at any time at its sole discretion.
  • Usage per API key is recorded (i.e., the number of tokens transferred and the model used), but queries and answers are never stored.
  • By using the local LLM inference instances of NHR@FAU, you agree to provide, upon request, a short scientific report on your usage once per year.

With regard to the EU AI Act:

  • The EU AI Act is not applicable owing to the exception of scientific research and development according to Art. 2 (6).
  • We provide open-weight models as-is; detailed model cards for all provided models are available on HuggingFace, which is also where we download the pre-trained models.
  • We typically use vLLM as inference software. Inference always runs on-premises on our data center hardware at FAU.
  • Customers of the API endpoint are solely responsible for the risk classification and for complying with the obligations of the EU AI Act within their specific use case.
  • We have no knowledge of your concrete deployment scenario and therefore cannot perform any end-user risk assessment.
  • The usage for all kinds of high-risk applications is forbidden.

Application



    Usage examples

    Once you have your personal API key, you can use the LLM inference instances as follows (assuming you make the API key available to the code by setting the environment variable LLMAPI_KEY):

    Show available models with curl:

    curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
             https://hub.nhr.fau.de/api/llmgw/v1/models | jq .
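
    The same query can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, and it assumes the endpoint returns the usual OpenAI-style list of the form {"data": [{"id": "..."}, ...]}.

```python
import json
import urllib.request

BASE_URL = "https://hub.nhr.fau.de/api/llmgw/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the same GET request that the curl command above sends."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def extract_model_ids(payload: dict) -> list[str]:
    """Return the sorted model IDs from an OpenAI-style model list.

    Assumption: the payload looks like {"data": [{"id": "..."}, ...]}.
    """
    return sorted(entry["id"] for entry in payload.get("data", []))


def list_models(api_key: str, timeout: float = 30.0) -> list[str]:
    """Query the endpoint and return the model IDs visible to this key."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

    With the key exported as LLMAPI_KEY, calling list_models(os.environ["LLMAPI_KEY"]) (after import os) should return the model names that can be passed as the model parameter in the examples below.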

    Simple chat using the OpenAI python module:

    from openai import OpenAI
    import os
    
    # Initialize client with private endpoint URL and API key
    client = OpenAI(
        # Read the key from LLMAPI_KEY (the client's own default would be OPENAI_API_KEY)
        api_key=os.getenv("LLMAPI_KEY"),
        base_url="https://hub.nhr.fau.de/api/llmgw/v1"
    )
    
    # Create a chat completion request
    response = client.chat.completions.create(
        model="gpt-oss-120b", # Replace with a model name available to you!
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": """Help me using the OpenAI API!
               Keep in mind that I have to use https://hub.nhr.fau.de/api/llmgw/v1
               as base URL for the API and consider that OpenAI changed its API in version 1.0.0.
               Simple curl command lines are also welcome."""}
        ],
        temperature=0.7 # Optional parameter
    )
    
    # Print the response
    print(response.choices[0].message.content)
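
    Because the service may fail at any time and rate limiting may apply (see the terms of use), scripts that send many requests may want to retry failed calls. The following is a minimal sketch of exponential backoff with jitter; call_with_backoff is a hypothetical helper, and which errors the gateway actually raises when it is overloaded is an assumption.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request() and retry with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait 1x, 2x, 4x, ... the base delay, plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise ValueError("max_attempts must be at least 1")


# Possible use with the client from the example above (not executed here):
#
#   from openai import APIConnectionError, RateLimitError
#   response = call_with_backoff(
#       lambda: client.chat.completions.create(model=..., messages=...),
#       retry_on=(RateLimitError, APIConnectionError),
#   )
```

    The OpenAI Python client also accepts a max_retries argument in its constructor, which may already be sufficient for simple cases.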

    Use with OpenCode:

    OpenCode cannot detect the available models automatically, so they have to be specified manually. Some models may need additional settings, as in the example at https://opencode.ai/docs/providers/#example.
    Sample ~/.config/opencode/opencode.json (only add models that are actually available!):

    {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "myprovider": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "NHR@FAU",
                "options": {
                    "baseURL": "https://hub.nhr.fau.de/api/llmgw/v1"
                },
                "models": {
                    "gpt-oss-120b": {
                        "name": "gpt-oss-120b"
                    },
                    "Model123": {
                        "name": "Model123"
                    }
                }
            }
        }
    }

    You can now start opencode again in your project directory, type /connect, select the newly created NHR@FAU provider and enter your API key when prompted. Alternatively, the API key can be set by creating ~/.local/share/opencode/auth.json with:

    {
        "myprovider": {
            "type": "api",
            "key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
        }
    }

    Now you can type /model and select one of the defined models.

    For more information on OpenCode, see https://opencode.ai/docs/ and https://opencode.ai/docs/tui/.
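
    Since the models have to be listed by hand, the opencode.json shown above could also be generated from a list of model names. This is a minimal sketch that reproduces the structure of the sample file; the function names are hypothetical, and models that need extra settings would still have to be edited manually.

```python
import json


def build_opencode_config(model_ids: list[str]) -> dict:
    """Build the provider section of opencode.json for the given models."""
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "myprovider": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "NHR@FAU",
                "options": {"baseURL": "https://hub.nhr.fau.de/api/llmgw/v1"},
                # One entry per model, keyed and named by the model ID.
                "models": {model_id: {"name": model_id} for model_id in model_ids},
            }
        },
    }


def render_opencode_config(model_ids: list[str]) -> str:
    """Return the configuration as indented JSON text."""
    return json.dumps(build_opencode_config(model_ids), indent=4)
```

    The text returned by render_opencode_config(["gpt-oss-120b"]) can be written to ~/.config/opencode/opencode.json; pass only model names that your API key can actually access.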

    Use with Claude Code:

    Set the following environment variables before starting claude:

    ANTHROPIC_BASE_URL=https://hub.nhr.fau.de/api/llmgw
    ANTHROPIC_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    ANTHROPIC_DEFAULT_SONNET_MODEL=...
    ANTHROPIC_DEFAULT_OPUS_MODEL=...
    ANTHROPIC_DEFAULT_HAIKU_MODEL=...

    Specify models that you have access to.

    Use with Qwen Code:

    Set the following environment variables before starting qwen:

    OPENAI_BASE_URL=https://hub.nhr.fau.de/api/llmgw/v1
    OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    OPENAI_MODEL=...

    Specify a model that you have access to.

    Use with Kimi Code:

    Set the following environment variables before starting kimi:

    export KIMI_BASE_URL=https://hub.nhr.fau.de/api/llmgw/v1
    export KIMI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    export KIMI_MODEL_NAME=...

    Specify a model that you have access to.

    For more details see https://www.kimi.com/code/docs/en/kimi-code-cli/configuration/environment-variables.html.
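
    As an alternative to exporting the variables in the shell, they can be set for a single child process only. The sketch below covers the Qwen Code and Kimi Code variables listed above; tool_env and TOOL_VARIABLES are hypothetical names, and Claude Code is left out because it uses a different base URL and several model variables.

```python
import os
from typing import Mapping, Optional

BASE_URL = "https://hub.nhr.fau.de/api/llmgw/v1"

# Variable names (base URL, API key, model) as listed for each tool above.
TOOL_VARIABLES = {
    "qwen": ("OPENAI_BASE_URL", "OPENAI_API_KEY", "OPENAI_MODEL"),
    "kimi": ("KIMI_BASE_URL", "KIMI_API_KEY", "KIMI_MODEL_NAME"),
}


def tool_env(
    tool: str,
    api_key: str,
    model: str,
    base: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of the environment with the tool's variables set."""
    url_var, key_var, model_var = TOOL_VARIABLES[tool]
    env = dict(os.environ if base is None else base)
    env.update({url_var: BASE_URL, key_var: api_key, model_var: model})
    return env
```

    A launcher could then run, for example, subprocess.run(["kimi"], env=tool_env("kimi", os.environ["LLMAPI_KEY"], "...")) with a model name you have access to, so the key does not remain in the interactive shell environment.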


    Models (as of May 13, 2026 / last update Jun 10, 2026)

    Available models

    Not all listed models are generally available!

    You can always check your access using

    curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
            https://hub.nhr.fau.de/api/llmgw/v1/models | jq .
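
    Because access differs per API key and models can be switched off, a script may want to fall back to another model instead of failing. A minimal sketch, assuming the list of available names has already been obtained from the /models endpoint; pick_model is a hypothetical helper, and the example names are placeholders.

```python
def pick_model(preferred: list[str], available: list[str]) -> str:
    """Return the first preferred model that is currently available."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    raise LookupError(f"none of the preferred models is available: {preferred}")


# Example with placeholder names: the first choice is missing,
# so the second one is selected.
chosen = pick_model(
    preferred=["model-a", "model-b"],
    available=["model-b", "model-c"],
)
print(chosen)
```

    Ordering the preferred list from most to least wanted keeps the selection logic in one place when models are added or removed.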

    Many models need (new) sponsors for their hardware or their compute cycles!

    Additional models can only be added if people sponsor the required hardware.

    | Model (with HuggingFace link) | License & origin | Sponsor | Hardware & size | Notes |
    |---|---|---|---|---|
    | GaleneAI/Magistral-Small-2509-FP8-Dynamic | Apache-2.0; originally 🇫🇷 | BKI for DVB (temporarily) | 1x H100, 24B | 2026-06-09: Upgraded from vLLM-0.11.0 to 0.20.2 due to issues with interleaved reasoning output |
    | RedHatAI/Mistral-Small-3.2-24B-Instruct-2506-FP8 | Apache-2.0; originally 🇫🇷 | BKI for DVB (temporarily) | 1x H100, 24B | |
    | lightonai/LightOnOCR-2-1B | Apache-2.0; 🇫🇷 | BKI for DVB (temporarily) | shared H100, 1B | OCR model |
    | lamaindex/vdr-2b-multi-v1 | Apache-2.0; 🇺🇸 | BKI for DVB (temporarily) | shared H100 | embedding model |
    | intfloat/multilingual-e5-large | MIT; 🇺🇸/🇨🇳 | -/- | shared H100 | embedding model |
    | ibm-granite/granite-4.1-3b | Apache-2.0; 🇺🇸 | -/- | shared H100, dense 3B | context length limited to 64k; 2026-06-09: enabled tool-call |
    | microsoft/Phi-4-mini-instruct | MIT; 🇺🇸 | -/- | shared H100, 4B | context length limited to 16k; 2026-06-09: enabled tool-call |
    | google/gemma-4-E4B-it | Apache-2.0; 🇺🇸 | FAU | 1x L4, 8B | |
    | RedHatAI/gemma-4-31B-it-FP8-block | Apache-2.0; 🇺🇸 | | 1x H100, 31B (MoE) | limited availability until June 22, 2026 for evaluation purpose only |
    | Qwen/Qwen3.6-35B-A3B-FP8 | Apache-2.0; 🇨🇳 | NHR director’s budget | 1x H100, 35B (MoE) | |
    | openai/gpt-oss-120b | Apache-2.0; 🇺🇸 | A group from Uni-Würzburg | 2x H100, 120B (MoE) | since 2026-05-13 with speculative Eagle3-v3 |
    | mistralai/Mistral-Medium-3.5-128B | modified MIT; 🇫🇷 | | 4x H100, dense 128B | limited availability until June 22, 2026 for evaluation purpose only |
    | deepseek-ai/DeepSeek-V284B4-Flash | MIT; 🇨🇳 | A group from HS-Hof | 4x H100, 284B (MoE) | limited availability until June 22, 2026 for evaluation purpose only; very long context |
    | moonshotai/Kimi-K2.6 | modified MIT; 🇨🇳 | | 8x RTX Pro 6000 BSE, 1T (MoE) | limited availability until June 22, 2026 for evaluation purpose only |
    | NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 | OpenMDW-1.1; 🇺🇸 | | 8x RTX Pro 6000 BSE, 550B | limited availability until June 22, 2026 for evaluation purpose only |

    License and origin are based on information from HuggingFace.

    You can always check your available models using

    curl -s -H "Authorization: Bearer $LLMAPI_KEY" \
            https://hub.nhr.fau.de/api/llmgw/v1/models | jq .

    Deprecated models (to be removed soon; already not generally available)

    Changelog

    • 2026-05-13: enabled speculative Eagle3-v3 for openai/gpt-oss-120b
    • 2026-06-09: enabled tool-call for ibm-granite/granite-4.1-3b
    • 2026-06-09: enabled tool-call for microsoft/Phi-4-mini-instruct
    • 2026-06-09: Upgraded from vLLM-0.11.0 to 0.20.2 for GaleneAI/Magistral-Small-2509-FP8-Dynamic due to issues with interleaved reasoning output
    • 2026-06-10: temporarily added NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
    • 2026-06-11: rate limiting and access restrictions had to be introduced due to excessive use and overloading of models
    • 2026-06-12: Qwen/Qwen3-VL-8B-Instruct, Qwen/Qwen3.5-9B, RedHatAI/gemma-3-27b-it-quantized.w4a16 have been removed (as announced in May);
      RedHatAI/gemma-4-31B-it-FP8-block has been added instead