AI Art Glossary (A-Z)

Every AI art and image generation term explained in plain language. Your complete reference from beginner basics to advanced technical concepts.

40+ terms · Updated March 2026 · Beginner friendly

Quick Reference: Most Important AI Art Terms

The essential terms every AI artist needs to know: Prompt (your text description), Negative Prompt (what to exclude), CFG Scale (how closely to follow the prompt), Sampling Steps (how many refinement passes), Seed (the random starting point for reproducibility), Checkpoint (the AI model file), and LoRA (small style add-ons). Master these seven concepts and you can use any AI art tool effectively.

A

Aspect Ratio

The proportional relationship between an image's width and height. Common aspect ratios in AI art include 1:1 (square, great for social media), 16:9 (widescreen, ideal for desktop wallpapers), 2:3 (portrait orientation), and 3:2 (landscape photography). In Midjourney, you set this with --ar 16:9. In Stable Diffusion, you adjust width and height pixel values directly. Choosing the right aspect ratio before generation avoids awkward cropping later.
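
In Stable Diffusion workflows the ratio is just arithmetic on the width and height fields. A small helper sketch (the function name and the multiple-of-64 rounding are illustrative conventions; many models strictly require only multiples of 8):

```python
def dims_for_ratio(ratio_w: int, ratio_h: int, base: int = 512, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with `base` pixels on the short side,
    approximating ratio_w:ratio_h, rounded to a safe multiple."""
    if ratio_w >= ratio_h:  # landscape or square: height is the short side
        return round(base * ratio_w / ratio_h / multiple) * multiple, base
    return base, round(base * ratio_h / ratio_w / multiple) * multiple

print(dims_for_ratio(16, 9))  # (896, 512) -- a near-16:9 widescreen frame
print(dims_for_ratio(2, 3))   # (512, 768) -- portrait orientation
```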

Attention Weighting

A technique for emphasizing or de-emphasizing specific words within a prompt. In Stable Diffusion, parentheses increase weight: (golden hair:1.3) makes the AI pay 30% more attention to "golden hair." In Midjourney, you append a weight with the :: separator, as in golden hair::2. This gives you fine-grained control over which elements of your prompt the AI prioritizes. Useful when the AI keeps ignoring a specific detail.

B

Batch Size

The number of images generated simultaneously in a single operation. Generating a batch of 4 images at once is faster than generating 4 images one at a time because the GPU can process them in parallel. Larger batch sizes require more VRAM. Most users set batch size to 1-4 for standard generation and increase it when exploring variations.

Batch Count

The number of sequential batches to generate. If batch size is 4 and batch count is 3, you will receive 12 images total (4 per batch, 3 batches in sequence). Unlike batch size, increasing batch count does not require additional VRAM — it simply takes proportionally longer.
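
In the Hugging Face diffusers library, batch size maps to the num_images_per_prompt argument while batch count is just an outer loop. A minimal sketch (the checkpoint id stands in for any SD 1.5 model; assumes diffusers and a CUDA GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

batch_size, batch_count = 4, 3
images = []
for _ in range(batch_count):                       # sequential batches: more time, same VRAM
    out = pipe("a lighthouse at dusk, oil painting",
               num_images_per_prompt=batch_size)   # parallel images: more VRAM per batch
    images.extend(out.images)

print(len(images))  # 12 total: 4 per batch x 3 batches
```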

C

CFG Scale (Classifier-Free Guidance Scale)

A parameter that controls how strictly the AI follows your text prompt versus generating freely. Low CFG (1-4) produces creative, sometimes unexpected results with soft, dreamy qualities. Medium CFG (5-9) is the sweet spot for most generations, balancing prompt adherence with natural-looking results. High CFG (10-20) forces strict prompt following but can cause oversaturation, artifacts, and an artificial look. Most users keep CFG between 7 and 9.
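
In diffusers the same parameter is called guidance_scale. A sketch that renders one fixed seed at low, medium, and high CFG so the effect is directly comparable (checkpoint id and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

for cfg in (3, 7, 12):                             # low / medium / high guidance
    gen = torch.Generator("cuda").manual_seed(42)  # fixed seed: only CFG varies
    image = pipe("a cozy cabin in a snowy forest, golden hour",
                 guidance_scale=cfg, generator=gen).images[0]
    image.save(f"cabin_cfg{cfg}.png")
```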

Checkpoint

A saved file containing all the trained weights and parameters of an AI model. Think of it as the "brain" that determines the AI's artistic capabilities. Different checkpoints produce different styles: some excel at photorealism, others at anime, others at concept art. Popular checkpoints include Stable Diffusion v1.5, SDXL, Realistic Vision, and DreamShaper. Checkpoint files are typically 2-7 GB in size.

CLIP (Contrastive Language-Image Pre-training)

The neural network component that translates your text prompt into numerical representations the image generation model can understand. CLIP is what connects human language to visual concepts. When you type "sunset over mountains," CLIP converts that phrase into a mathematical embedding that guides the diffusion process. Different CLIP models interpret language differently, which is why the same prompt can produce different results across tools.

ComfyUI

A node-based graphical interface for Stable Diffusion that lets you build complex image generation workflows by connecting visual blocks. More powerful than the standard WebUI for advanced users because it allows branching workflows, conditional logic, and custom pipelines. Steeper learning curve but unmatched flexibility for professional workflows.

ControlNet

A neural network extension that gives you precise spatial control over AI image generation. Instead of relying solely on text descriptions for composition, ControlNet lets you provide structural guidance through reference inputs: edge maps (Canny), depth maps, pose skeletons (OpenPose), segmentation maps, and more. This means you can specify exactly where the subject stands, how the scene is composed, and what the spatial layout looks like — transforming AI art from "close enough" to "exactly what I envisioned."
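
A minimal Canny-edge example with diffusers (assumes opencv-python is installed; reference.jpg is a hypothetical input photo whose composition you want to keep):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn a reference photo into an edge map that will pin down the composition.
ref = np.array(Image.open("reference.jpg"))
edges = cv2.Canny(ref, 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel image for the pipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

# The prompt decides content and style; the edge map decides where everything goes.
image = pipe("a marble statue in a museum, dramatic lighting", image=edges).images[0]
```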

D

Denoising Strength

In img2img workflows, this parameter controls how much the AI changes the input image. A low value (0.1-0.3) makes subtle adjustments while preserving most of the original. A high value (0.7-1.0) dramatically transforms the image, using the original mainly as a rough composition guide. This is the most important slider in img2img — finding the right denoising strength is the key to getting results that balance your input reference with creative transformation.
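
A sweep over the strength parameter makes the trade-off visible. A sketch with the diffusers img2img pipeline (sketch.png is a hypothetical input; the fixed seed isolates the effect of strength):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

init = Image.open("sketch.png").convert("RGB").resize((512, 512))
for strength in (0.2, 0.5, 0.8):                 # subtle touch-up -> heavy transformation
    gen = torch.Generator("cuda").manual_seed(7)
    out = pipe("a detailed oil painting of a harbor town",
               image=init, strength=strength, generator=gen).images[0]
    out.save(f"harbor_strength_{strength}.png")
```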

Diffusion

The core process behind most modern AI image generation. The AI starts with random noise (like TV static) and gradually removes it in steps, guided by your text prompt, until a coherent image emerges. The name comes from the forward "diffusion" process of progressively adding noise to training images; generation runs that process in reverse. Each step makes the image slightly clearer and more aligned with your prompt. The number of these steps is controlled by the "sampling steps" parameter.
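
The mechanics are easier to feel in a toy. The numpy loop below is not a real diffusion model, just an analogy: a noisy vector is pulled toward a target a little at a time, the way each denoising step pulls the latent toward an image matching the prompt:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 16)   # stand-in for "the image the prompt describes"
x = rng.normal(size=16)              # start from pure noise, as the seed does

steps = 20
for t in range(steps):
    # Remove a fraction of the remaining "noise" each step; later steps
    # make ever finer corrections, like the tail end of a sampling run.
    x = x + (target - x) / (steps - t)

print(np.abs(x - target).max())      # ~0.0: the noise has been denoised away
```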

DreamBooth

A fine-tuning technique that teaches an AI model to recognize and reproduce a specific subject (a person, pet, object, or style) from just 5-30 training images. Unlike LoRA, DreamBooth modifies the entire model checkpoint. It is the go-to method for creating personalized AI models that can generate images of a specific individual or product in any style, pose, or setting.

E

Embedding (Textual Inversion)

A small file that teaches the AI a new concept, style, or object by associating it with a trigger word. Unlike LoRA or DreamBooth, embeddings only add a new "vocabulary word" to the model without changing its weights. They are tiny (typically a few KB) and can capture specific concepts like "a particular art style" or "a specific type of lighting." Negative embeddings are especially popular for improving quality by teaching the model what "bad quality" looks like so it can avoid it.

Euler (Sampler)

One of the most common sampling algorithms used in Stable Diffusion. Euler is fast, produces consistent results, and works well at lower step counts (20-30 steps). The variant "Euler a" (Euler Ancestral) adds stochastic noise at each step, producing more creative and varied results but less reproducible outputs. Euler is often recommended as the default sampler for beginners.

F

Fine-Tuning

The process of further training a pre-trained AI model on specific data to specialize it for a particular task, style, or subject. Fine-tuning is how community models like Realistic Vision or DreamShaper are created — by taking the base Stable Diffusion model and training it further on curated datasets. Methods include full fine-tuning (modifying all weights), DreamBooth (subject-specific), and LoRA (lightweight adaptation).

Flux

A family of open-source AI image generation models developed by Black Forest Labs (founded by former Stability AI researchers). Flux models are known for exceptional photorealism, accurate text rendering within images, and strong prompt adherence. Flux.1 comes in three variants: Pro (API-only, highest quality), Dev (open-weight, non-commercial), and Schnell (fastest, Apache 2.0 license). As of 2026, Flux represents the cutting edge of open-source image generation.

G

Guidance Scale

See CFG Scale. These terms are used interchangeably across different AI art tools. Midjourney uses the term "stylize" for a related but distinct concept that controls aesthetic enhancement.

GPU (Graphics Processing Unit)

The hardware that powers local AI image generation. AI models require massive parallel computations that GPUs excel at. For running Stable Diffusion locally, you need an NVIDIA GPU with at least 6GB VRAM (8GB+ recommended). AMD GPUs have limited support. Cloud services like Google Colab, RunPod, and Vast.ai offer GPU access for users without powerful local hardware.

H

Hallucination

When an AI generates elements that were not requested and do not make logical sense — extra fingers on hands, text that is gibberish, objects merging into each other, or anatomically impossible features. Hallucinations are a known limitation of current AI models. They can be reduced (but not eliminated) through negative prompts, higher sampling steps, appropriate CFG values, and newer model versions.

Hires Fix (High Resolution Fix)

A two-pass generation technique in Stable Diffusion that first creates a low-resolution image and then upscales and refines it. This avoids the composition problems that occur when generating directly at high resolutions (like duplicate subjects or incoherent layouts). Hires fix produces cleaner, more detailed images than single-pass generation and is considered essential for outputs above 768x768 pixels.
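
WebUIs expose this as a single checkbox; with diffusers you can reproduce the idea by hand. A rough sketch of the two passes (the plain PIL resize and the 0.4 strength are illustrative simplifications; the built-in feature typically uses dedicated upscalers):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline

txt2img = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

prompt = "a castle on a cliff, stormy sky"

# Pass 1: compose at the model's native resolution to avoid duplicate subjects.
low = txt2img(prompt, width=512, height=512).images[0]

# Pass 2: enlarge the draft, then let img2img re-add detail at the new size.
big = low.resize((1024, 1024))
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components).to("cuda")
final = img2img(prompt, image=big, strength=0.4).images[0]  # refine, don't recompose
```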

I

img2img (Image-to-Image)

A generation mode that uses an existing image as a starting point and transforms it based on a text prompt. You provide a reference image and the AI modifies it according to your description and the denoising strength setting. Use cases include: transforming a sketch into a finished illustration, changing the style of a photo (photo to oil painting), adding details to a rough composition, and iterating on a previous AI generation.

Inpainting

Selectively regenerating specific areas of an image while keeping the rest intact. You "paint" a mask over the area you want to change, write a prompt describing what should appear there, and the AI fills in only the masked region while maintaining consistency with the surrounding image. Essential for fixing hands, faces, small details, or replacing specific objects without regenerating the entire image.
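
A sketch with the diffusers inpainting pipeline (portrait.png and hand_mask.png are hypothetical inputs; white areas of the mask are regenerated, black areas preserved):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16).to("cuda")

image = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("hand_mask.png").convert("L").resize((512, 512))

fixed = pipe("a well-formed human hand, natural relaxed pose",
             image=image, mask_image=mask).images[0]
```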

K

Karras (Noise Schedule)

A noise scheduling algorithm (developed by Tero Karras at NVIDIA) that controls how noise is added and removed during the diffusion process. The Karras schedule front-loads more denoising in the early steps and uses finer refinement in later steps, often producing sharper, more detailed results than the default schedule. In Stable Diffusion, you will see samplers like "DPM++ 2M Karras" — the "Karras" suffix indicates this improved noise schedule.

L

Latent Space

A compressed mathematical representation of image data where the AI actually performs its generation work. Instead of working directly with millions of pixels, diffusion models operate in this lower-dimensional latent space, which is far more computationally efficient. This is why the process is called "latent diffusion." The VAE component handles encoding images into latent space and decoding them back into pixel space.
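
The compression is easy to measure. A sketch that encodes one image with a standard SD 1.5-compatible VAE and compares shapes (photo.png is a hypothetical input):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

img = Image.open("photo.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale pixels to [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)                        # (1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()

print(x.shape, "->", latents.shape)  # (1, 3, 512, 512) -> (1, 4, 64, 64), ~48x fewer values
```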

LoRA (Low-Rank Adaptation)

A lightweight fine-tuning technique that creates small add-on files (typically 10-200 MB) which modify a base model's behavior to learn specific styles, characters, or concepts. LoRAs are the most popular way to customize AI art because they are small, stackable (you can combine multiple LoRAs), and do not require retraining the entire model. Community-created LoRAs are available on platforms like CivitAI and Hugging Face, covering everything from specific art styles to individual characters.
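
Loading one with diffusers is a one-liner on top of a base pipeline. A sketch (watercolor_style.safetensors is a hypothetical file downloaded from a model hub; the base checkpoint on disk is never modified):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Attach the LoRA's small weight deltas to the loaded model, then generate as usual.
pipe.load_lora_weights(".", weight_name="watercolor_style.safetensors")
image = pipe("a fox in a meadow, watercolor style").images[0]
```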

M

Mask

In inpainting and outpainting workflows, the mask defines which area of the image the AI should regenerate. Masked (usually white) areas are regenerated; unmasked (black) areas are preserved. Precise masking is critical for clean results. Advanced masking techniques include soft edges (feathering) for smooth blending, and using separate masks for different prompt segments.

Midjourney

A proprietary AI art generation tool known for producing highly aesthetic, stylized outputs. Accessed through Discord (and now a web interface), Midjourney excels at artistic interpretations, dramatic lighting, and visual polish. It uses a subscription model ($10-60/month) and does not allow local installation. Midjourney is often considered the easiest tool for producing "beautiful" results with minimal prompt engineering.

Model Merging

The technique of combining two or more checkpoint models into a single new model that blends their capabilities. For example, merging a photorealistic model with an anime model can create a model that produces anime-style images with realistic lighting and detail. Merging ratios control how much of each source model contributes to the result.
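
At its simplest, a "weighted sum" merge is a per-tensor average of two checkpoints. A bare-bones sketch (hypothetical filenames; real merge tools also handle mismatched keys, dtypes, and metadata):

```python
from safetensors.torch import load_file, save_file

a = load_file("photoreal_model.safetensors")
b = load_file("anime_model.safetensors")

alpha = 0.6  # merge ratio: 60% model A, 40% model B
merged = {k: alpha * a[k] + (1 - alpha) * b[k] for k in a if k in b}

save_file(merged, "merged_model.safetensors")
```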

N

Negative Prompt

A text description of everything you do not want in your generated image. Negative prompts are essential for quality control. A basic negative prompt might include: blurry, low quality, watermark, text, deformed, bad anatomy, extra fingers. Advanced negative prompts target specific unwanted artifacts of each model. Some models also support negative embeddings (like "BadDream" or "UnrealisticDream") that encode broad quality exclusions more effectively than text alone.
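
In diffusers the negative prompt is just a second argument next to the prompt. A minimal sketch:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

image = pipe(
    "portrait photo of an elderly fisherman, detailed skin, soft window light",
    negative_prompt="blurry, low quality, watermark, text, deformed, bad anatomy, extra fingers",
).images[0]
```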

O

Outpainting

Extending an existing image beyond its original borders. The AI generates new content that seamlessly continues the composition, maintaining style, perspective, and lighting consistency. Outpainting is used to change an image's aspect ratio (turning a square into a widescreen panorama), add more environment context, or create extended backgrounds for design projects. Both DALL-E and Stable Diffusion support outpainting natively.

Overfitting

When a fine-tuned model has learned its training data too specifically and can only reproduce those exact images rather than generalizing the concept. An overfitted face LoRA, for example, might only generate the subject in the exact poses and lighting conditions from the training images. Preventing overfitting requires careful training with appropriate learning rates, step counts, and diverse training data.

P

Prompt

The text description you provide to an AI art generator that describes the image you want to create. The prompt is the primary creative input in AI art. Effective prompts typically include: subject description, art style, lighting, mood/atmosphere, composition, and quality modifiers. Prompt engineering — the skill of crafting effective prompts — is the most important skill in AI art creation.

Prompt Weighting

See Attention Weighting. The syntax varies by tool: Stable Diffusion uses (word:1.3), Midjourney uses word::1.3, and some tools use capitalization or repetition for emphasis.

R

Real-ESRGAN

A popular AI upscaling model that increases image resolution while adding realistic detail. It is the most widely used upscaler in AI art workflows, capable of 2x-4x enlargement with impressive quality. Different Real-ESRGAN variants are optimized for different content: the standard model works best for photorealistic images, while anime-specific variants handle illustrated styles better.

Refiner

In SDXL workflows, the refiner is a secondary model that processes the output of the base model to add fine details, improve textures, and enhance overall quality. The base model handles composition and major elements, then hands off to the refiner for the finishing pass. The "switch point" (typically 0.7-0.8) determines when the transition occurs. Using a refiner typically adds 20-40% more generation time but noticeably improves quality.
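
With diffusers, the handoff is expressed through denoising_end and denoising_start, and the base passes latents rather than pixels. A sketch of the documented base-plus-refiner pattern:

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16).to("cuda")

prompt = "macro photo of a dragonfly on a dew-covered leaf"

# Base runs the first 80% of denoising and hands off a latent, not pixels.
latents = base(prompt, denoising_end=0.8, output_type="latent").images
# Refiner finishes the last 20%, adding fine texture and detail.
image = refiner(prompt, denoising_start=0.8, image=latents).images[0]
```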

S

Sampler (Sampling Method)

The algorithm that controls how the AI removes noise step by step during image generation. Different samplers produce different results even from identical prompts and seeds. Popular samplers include: Euler (fast, consistent), DPM++ 2M Karras (high quality, widely recommended), DDIM (deterministic, good for animations), and UniPC (fast convergence). The choice of sampler affects image quality, generation speed, and how many steps are needed.
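
In diffusers, samplers are called "schedulers" and can be swapped on a loaded pipeline. A sketch that selects the WebUI favorite "DPM++ 2M Karras":

```python
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# DPM++ 2M with the Karras noise schedule -- the diffusers equivalent of
# picking "DPM++ 2M Karras" from a WebUI sampler dropdown.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True)

image = pipe("a red fox in fresh snow", num_inference_steps=25).images[0]
```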

Sampling Steps

The number of denoising iterations the AI performs during image generation. More steps generally means more detail and refinement, but with diminishing returns. 20-30 steps is the sweet spot for most samplers. Below 15 steps, images look unfinished. Above 50 steps, improvements are minimal while generation time keeps growing in direct proportion to the step count. Some samplers (like DDIM) converge faster and need fewer steps, while others (like Euler) benefit from more.

SDXL (Stable Diffusion XL)

An advanced version of Stable Diffusion that generates images at 1024x1024 base resolution (compared to 512x512 for SD 1.5). SDXL uses a two-stage pipeline with a base model and optional refiner. It offers significantly improved image quality, better text rendering, more coherent compositions, and enhanced prompt understanding. SDXL requires more VRAM than SD 1.5 (8GB minimum recommended) but produces noticeably superior results.

Seed

A number that determines the initial random noise pattern used to start image generation. Using the same seed with the same prompt, model, and settings produces an identical image every time. This makes results reproducible and allows systematic experimentation: change one parameter while keeping the seed fixed to see exactly how that parameter affects the output. Setting seed to -1 (or "random") uses a different seed each time for varied results.
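
In diffusers, the seed lives in a torch.Generator passed to the call. A sketch demonstrating reproducibility:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

prompt = "a hot air balloon over lavender fields"

a = pipe(prompt, generator=torch.Generator("cuda").manual_seed(1234)).images[0]
b = pipe(prompt, generator=torch.Generator("cuda").manual_seed(1234)).images[0]
# a and b are identical; change one setting (CFG, steps, sampler) between runs
# to see its isolated effect while the underlying noise pattern stays fixed.
```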

Stable Diffusion

An open-source AI image generation model originally developed by the CompVis group (LMU Munich) and Runway with compute support from Stability AI, which went on to release later versions. Stable Diffusion is the foundation of the open-source AI art ecosystem, allowing anyone to run image generation locally, create custom models, and build tools on top of it. Its open nature has spawned thousands of community models, LoRAs, and tools. Available versions include SD 1.5 (most community models), SD 2.1, SDXL, and SD 3.x.

Style Transfer

Applying the visual style of one image to the content of another. In AI art, this is commonly achieved through img2img workflows (using a content image as input with a style-describing prompt), IP-Adapter (which can extract style from a reference image), or specific LoRAs trained on particular art styles. Style transfer lets you reimagine any photograph or image in the visual language of any art style.

T

Token

The smallest unit of text that the AI model processes. Prompts are broken into tokens before being interpreted. The CLIP text encoders used by Stable Diffusion models read 77 tokens at a time (roughly 75 usable after the start and end markers); many interfaces work around this by splitting longer prompts into multiple 75-token chunks. Words like "lighthouse" count as one token, while complex words may be split into multiple tokens. Understanding token limits helps you write prompts that stay within the model's processing capacity and ensures all your descriptions are actually "read" by the AI.
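
You can count tokens yourself with the tokenizer SD 1.5 uses. A sketch with the transformers library:

```python
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")  # SD 1.5's tokenizer

prompt = "a lighthouse on a cliff at sunset, dramatic clouds"
print(len(tok(prompt).input_ids))  # token count, including start/end markers
print(tok.tokenize(prompt))        # shows how words split into sub-word tokens
```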

txt2img (Text-to-Image)

The standard generation mode where you provide only a text prompt and the AI creates an image from scratch. This is the most common and intuitive way to create AI art. The AI starts with random noise (determined by the seed) and progressively shapes it into an image guided by your text description. Compare with img2img, which also uses a reference image as input.

Training Data

The dataset of images and text descriptions used to train an AI model. The scope and quality of training data directly impacts what the model can generate. Models trained on photography datasets produce better photorealistic results. Models trained on anime datasets produce better illustrations. Understanding a model's training data helps you write prompts in the "language" the model best understands.

U

Upscaling

Increasing an image's resolution while preserving or enhancing detail. AI upscaling uses neural networks (like Real-ESRGAN, LDSR, or SwinIR) to intelligently add new pixels based on the image content, producing far superior results to traditional bicubic scaling. In AI art workflows, upscaling is typically the final step: generate at the model's native resolution, then upscale 2-4x for the final output. This approach avoids the composition issues that can occur when generating directly at high resolutions.
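
Upscalers are separate models with their own tooling. As one concrete example, diffusers ships a diffusion-based 4x upscaler (Real-ESRGAN lives in its own package with a different API); generated_512.png is a hypothetical input:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16).to("cuda")

low_res = Image.open("generated_512.png").convert("RGB")
big = pipe(prompt="a castle on a cliff, stormy sky",  # the prompt guides added detail
           image=low_res).images[0]                   # output is 4x the input resolution
```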

V

VAE (Variational Autoencoder)

The component that handles converting images between pixel space (what you see) and latent space (where the AI works). The encoder compresses images into latent representations, and the decoder reconstructs them back into pixels. Different VAE models can significantly affect color vibrancy, detail clarity, and overall image quality. Swapping the VAE is one of the easiest ways to improve output quality without changing the checkpoint model. Common signs of a bad VAE include washed-out colors, blurry details, and color banding.
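
Swapping the VAE in diffusers means loading a different AutoencoderKL and handing it to the pipeline. A sketch using a community-favorite SD 1.5 VAE:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    vae=vae, torch_dtype=torch.float16).to("cuda")

image = pipe("a rainy street at night, neon reflections").images[0]
```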

VRAM (Video RAM)

The dedicated memory on a GPU used for storing model weights and processing data during image generation. VRAM is the primary hardware limitation for local AI art generation. Stable Diffusion 1.5 requires 4-6GB VRAM. SDXL needs 8GB minimum. Running ControlNet adds 2-4GB. Large batch sizes and high resolutions increase VRAM usage. If you exceed your VRAM capacity, generation either fails or falls back to slower system RAM processing.

W

WebUI (Automatic1111 / AUTOMATIC1111)

The most popular graphical web interface for running Stable Diffusion locally. Created by AUTOMATIC1111, it provides a browser-based interface for txt2img, img2img, inpainting, extensions, and model management. It supports thousands of community-created extensions that add features like ControlNet, additional samplers, and batch processing tools. The standard recommendation for anyone setting up Stable Diffusion for the first time.

Weight

See Attention Weighting. In the context of models, "weights" refers to the numerical parameters that define a trained model's knowledge — the checkpoint file is essentially a file of billions of weight values.