Text Generation | Open Creative Studio

The Text Generation section is where you can allow various AI models to rewrite the user prompts you wrote in the Prompts section of Open Creative Studio to obtain more pleasing images and videos.

Prompt Enricher for T2I

The Prompt Enricher for T2I function enriches your user prompt with additional text generated by a large language model (LLM).

The LLM enriching your user prompt requires instructions on how to perform the enrichment. These instructions are called system prompts.

An enriched prompt can dramatically change the quality of your generated images.

Open Creative Studio offers three example system prompts that you can use to enrich your prompts: a generic one, one focused on film still generation (AI Cinema), and one focused on collage art images.

Open Creative Studio allows you to use either centralized proprietary models (for example: OpenAI GPT-5, Anthropic Claude Opus 4.1, etc.) or open-access models (for example: gpt-oss-120b and gpt-oss-20b, LLaMA 4, Kimi K2, DeepSeek R1, Qwen 3, etc.) installed locally.

Centralized, proprietary LLMs

The use of centralized LLMs requires an API key. To set up your API key, follow these instructions.

WARNING: If you use centralized LLMs, you will be charged every time the Prompt Enricher for T2I function is enabled and a new ComfyUI run is processed.

Open-access, local LLMs

The use of local open access models requires the separate installation of an AI system like LM Studio, Msty, Ollama, or Oobabooga WebUI.

If you are uncertain about which open-access LLM to use, we recommend the use of LM Studio. You can follow this Prompt Enrichment with LM Studio guide to configure it.

Prompt Generator for Upscaler

If you don’t want to manually set a prompt for the Upscaler (SUPIR) function via the Prompt for Upscaler function, you can let a Visual Language Model (VLM) observe your source image, generate a caption of it, and then use that caption as the prompt for the Upscaler (SUPIR) function.

NOTICE: When you activate the Prompt Generator for Upscaler function, the Prompt for Upscaler function is ignored.

Open Creative Studio uses Florence-2 as the VLM of choice for this function.
Florence-2 is an open-access, local VLM known for its reliability and speed, and it requires zero configuration to work.

You can customize the way Florence-2 generates captions to obtain more or less sophisticated prompts.

Caption Generator for IPAdapter/Redux/I2V

The Caption Generator for IPAdapter/Redux/I2V function is designed to generate a caption that will serve as a user prompt for specific Open Creative Studio functions.

In particular, the generated caption will be useful for styling functions (IPAdapter and Redux) and for image-to-video (I2V) generation.

The Caption Generator for IPAdapter/Redux/I2V function works almost exactly like the Prompt Generator for Upscaler function, described above.

The main difference is that while the latter exclusively uses Florence-2 as VLM, this function allows you to choose any type of VLM.

You can use:

Florence-2, which requires zero configuration.
A commercial VLM like OpenAI GPT-4o, which requires an API key.
An open-access VLM that you have installed locally, like LLaMA 4, and you serve via LM Studio or an alternative AI system.

OCS 13.0 Caption Generator for IPAdapter-Redux-I2V

Notice that, just like for the Prompt Enricher for T2I function, the use of a commercial VLM requires an API key. To set up your API key, follow these instructions.

WARNING: Once you enable the Caption Generator for IPAdapter/Redux/I2V function replaces any positive prompt you have written with the generated caption.

Prompt Enrichers for T2V/V2V

You can automatically enrich your user prompt for both text-to-video (T2V) and video-to-video (V2V) generations via independent Prompt Enricher functions.

They work exactly like the Prompt Enricher for T2I function.

Speech / Audio Tags / Lyrics Generators

You can automatically enrich your speech, audio tags, and lyrics via independent Generator functions.

They work similarly to the Prompt Enricher for T2I function, but with a special capability.

If you also enable the Caption Generator for IPAdapter/Redux/I2V function, the user prompt will ask the LLM model of choice to generate speech, audio tags, and lyrics inspired by the caption of the image you selected as source image.