LoRA Fine-Tuning with Stable Diffusion: How Businesses Can Personalize AI Images (with Code)
In today's world of Generative AI, businesses across industries are looking beyond stock visuals and generic AI images. What they need are personalized, brand-specific, and context-aware visuals that improve user experience and drive conversions.
One of the most powerful techniques to achieve this is LoRA (Low-Rank Adaptation) fine-tuning — a lightweight method to adapt large AI models like Stable Diffusion to a specific style or industry dataset.
We originally built this pipeline for a travel personalization client, who wanted to show real-time destination previews inside their itinerary planner. For confidentiality, we're demonstrating the exact same code here on a public Yarn-Art style dataset — but the workflow is identical to what we delivered for the client.
Yarn art is a form of textile art where images, objects, or decorations are created by arranging, wrapping, or weaving colored yarn. A few examples of yarn art are shown below:

This post will cover:
- How LoRA fine-tuning works step by step
- The travel personalization client case study
- A public dataset demo with code (so you can reproduce it)
- The significance of each stage in the pipeline
- How to evaluate and visualize your results
Business Applications Across Industries
LoRA fine-tuning is not just for travel. Once trained, LoRA adapters unlock industry-specific personalization:
- Travel & Hospitality → Generate real-time destination previews (hotels, landmarks, beaches)
- E-commerce → Create product mockups or seasonal banners in a consistent style
- Marketing & Advertising → Produce brand-consistent visuals instantly for campaigns
- Finance & Insurance → Generate infographics & explainer visuals tailored to customer journeys
- Gaming & Media → Adapt models to a specific art style for immersive experiences
- AI Assistants → Make chatbot / RAG experiences visually dynamic & engaging
Why this matters: LoRA turns AI content from generic → business-aware, brand-aligned, and user-personalized.
Real-World Use Case: Travel Personalization Client
The client's challenge
- Text-only itineraries lacked visual engagement
- Stock photos were too generic and repetitive
- Off-the-shelf AI models produced inconsistent quality
Our solution
- Fine-tuned Stable Diffusion with LoRA on destination-specific curated images
- Generated visuals like "Eiffel Tower at sunset", "Maldives beach resort", "Swiss Alps ski lodge"
- Integrated images dynamically into the itinerary planner
Business impact
Travelers saw personalized visual previews of their journey → increased engagement, longer site interaction, and higher booking conversions.
Why LoRA Fine-Tuning?
Traditional fine-tuning of large models is expensive and resource-intensive. LoRA solves this by:
- Training only small adapter layers in the attention blocks
- Producing tiny weights (~few MBs) instead of retraining full models
- Retaining general world knowledge while adding custom style/domain knowledge
- Allowing fast re-training when new datasets arrive
For our travel client, LoRA was the perfect trade-off: business value with low cost.
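Conceptually, LoRA freezes the original weight matrix W and learns a small low-rank update BA on top of it, so only a tiny fraction of the parameters receives gradients. A minimal PyTorch sketch of the idea (illustrative only — the actual training below uses the PEFT library rather than this hand-rolled layer):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # original weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(768, 768), r=4, alpha=32)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 6,144 vs ~590,000 for the full layer

Scaling the update by alpha/r, freezing the base weights, and zero-initializing B so training starts from the unmodified model are the same conventions PEFT applies inside the attention blocks.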
Workflow & Files
We built the pipeline in a modular way, so each part is replaceable and reusable:
- preprocess_data.py → Cleans images + captions, builds metadata file
- train_metadata.json → Dataset mapping (image → caption)
- config.yaml → Training hyperparameters
- train_lora.py → Main LoRA training script
- generate_images.py → Load model + generate new images
- Colab Notebook → Orchestrates everything in the cloud (GPU ready)

Step 1: Prepare the Environment
Install dependencies:
pip install -r lora_training_project/requirements.txt
Step 2: Preprocess Dataset → train_metadata.json
Run:
python preprocess_data.py --input data/yarn_art_export --output train_metadata.json
It produces entries like:
{
  "image": "data/yarn_art_export/yarn_001.png",
  "caption": "Frog, yarn art style"
}
Captions are critical → they guide how prompts map to visuals. Good captions = better generation.
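The actual preprocess_data.py is not reproduced in this post, but a minimal sketch of what such a script does — resize images to the training resolution, derive a caption per image, and write the metadata JSON — could look like this. The caption convention and file layout here are assumptions for illustration (the real script is driven by the --input/--output flags shown above):

#!/usr/bin/env python3
"""Minimal illustrative preprocessing: pair images with captions and write train_metadata.json."""
import json
from pathlib import Path
from PIL import Image

def build_metadata(input_dir: str, output_file: str, processed_dir: str = "training_data",
                   size: int = 512, style_tag: str = "yarn art style"):
    Path(processed_dir).mkdir(parents=True, exist_ok=True)
    entries = []
    for img_path in sorted(Path(input_dir).glob("*.png")):
        # Resize to the training resolution and save a processed copy
        out_path = Path(processed_dir) / img_path.name
        Image.open(img_path).convert("RGB").resize((size, size)).save(out_path)
        # Caption convention assumed here: "<subject>, yarn art style" derived from the file name
        subject = img_path.stem.replace("_", " ")
        entries.append({"image": str(out_path), "caption": f"{subject}, {style_tag}"})
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    print(f"Wrote {len(entries)} entries to {output_file}")

if __name__ == "__main__":
    build_metadata("data/yarn_art_export", "train_metadata.json")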
Step 3: Configure Training → config.yaml
training:
  base_model: "SG161222/Realistic_Vision_V5.1_noVAE"
  resolution: 512
  train_batch_size: 2
  learning_rate: 1e-4
  num_train_epochs: 10
  mixed_precision: "fp16"
  output_dir: "output"
Key parameters:
- resolution impacts detail vs GPU cost
- epochs control style strength (too low = underfit, too high = overfit)
- fp16 saves VRAM (important for Colab / smaller GPUs)
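Inside the training code the config is just a nested dictionary. A minimal way to load it and read the values above (assuming PyYAML is installed) — note the learning-rate gotcha:

import yaml

with open("config.yaml", "r") as f:
    config = yaml.safe_load(f)

training = config["training"]
print(training["base_model"])    # "SG161222/Realistic_Vision_V5.1_noVAE"
print(training["resolution"])    # 512

# PyYAML parses "1e-4" (no decimal point) as a string, not a float, so cast explicitly
learning_rate = float(training["learning_rate"])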
Step 4: Train LoRA → train_lora.py
Here is what the LoRA training script looks like:
#!/usr/bin/env python3
"""LoRA Training Script for Custom Icon-Style Images"""
import os
import json
import math
import argparse
import random
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from tqdm.auto import tqdm

# Diffusers and transformers
from diffusers import (
    DDPMScheduler,
    UNet2DConditionModel,
    AutoencoderKL
)
from diffusers.optimization import get_scheduler
from transformers import CLIPTextModel, CLIPTokenizer

# LoRA and training utilities
from peft import LoraConfig, get_peft_model
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
import torchvision.transforms as transforms

logger = get_logger(__name__)


class IconDataset(Dataset):
    """Dataset class for loading icon images and captions."""

    def __init__(
        self,
        data_path: str,
        tokenizer: CLIPTokenizer,
        size: int = 512,
        center_crop: bool = True,
        caption_dropout: float = 0.1
    ):
        self.size = size
        self.center_crop = center_crop
        self.tokenizer = tokenizer
        self.caption_dropout = caption_dropout

        # Load training metadata
        with open(data_path, 'r', encoding='utf-8') as f:
            self.metadata = json.load(f)

        # Update paths to be relative to the project root
        self.training_data = []
        for item in self.metadata['training_data']:
            # Make a copy to avoid modifying original
            item = item.copy()
            # Fix the processed_path to be relative to project root
            processed_path = item['processed_path']
            if 'training_data' in processed_path:
                # Keep as is if it already has training_data
                item['processed_path'] = processed_path
            else:
                # Add training_data prefix if needed
                item['processed_path'] = f"training_data/{processed_path}"
            self.training_data.append(item)

        logger.info(f"Loaded dataset with {len(self.training_data)} images")

    def __len__(self):
        return len(self.training_data)

    def __getitem__(self, idx):
        item = self.training_data[idx]
        image_path = item['processed_path']
        caption = item['caption']

        # Fix path separators for cross-platform compatibility
        image_path = image_path.replace('\\', '/')

        # If we're running from outside the project directory, prepend the project path
        if not os.path.exists(image_path):
            # Try with lora_training_project prefix
            alt_path = f"lora_training_project/{image_path}"
            if os.path.exists(alt_path):
                image_path = alt_path
            else:
                # Debug: show what we're looking for and what exists
                print(f"Could not find image at: {image_path}")
                print(f"Also tried: {alt_path}")
                print(f"Current working directory: {os.getcwd()}")
                print("Available files in current directory:")
                for f in os.listdir('.'):
                    print(f" - {f}")
                if os.path.exists('lora_training_project'):
                    print("Files in lora_training_project:")
                    for f in os.listdir('lora_training_project'):
                        print(f" - {f}")
                raise FileNotFoundError(f"Could not find image: {image_path}")

        # Randomly drop caption with probability caption_dropout
        if random.random() < self.caption_dropout:
            caption = ''

        # Load and preprocess image
        image = Image.open(image_path).convert('RGB')

        # Apply transforms
        if self.center_crop:
            transform = transforms.Compose([
                transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
                transforms.CenterCrop(self.size),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5])
            ])
        else:
            transform = transforms.Compose([
                transforms.Resize((self.size, self.size), interpolation=transforms.InterpolationMode.BILINEAR),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5])
            ])
        image = transform(image)

        # Tokenize caption
        tokenized = self.tokenizer(
            caption,
            padding='max_length',
            max_length=self.tokenizer.model_max_length,
            truncation=True,
            return_tensors='pt'
        )

        return {
            'pixel_values': image,
            'input_ids': tokenized.input_ids.squeeze(),
            'attention_mask': tokenized.attention_mask.squeeze()
        }


def load_models(model_id: str, device: torch.device) -> Tuple:
    """Load and return all required models."""
    logger.info(f"Loading models from: {model_id}")

    # Load tokenizer and text encoder
    tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

    # Load VAE and UNet
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

    # Load noise scheduler
    noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

    # Move to device and set to eval mode
    for model in [text_encoder, vae, unet]:
        model.to(device)
        model.eval()
        model.requires_grad_(False)

    logger.info("✓ Models loaded successfully")
    return tokenizer, text_encoder, vae, unet, noise_scheduler


def setup_lora(unet: UNet2DConditionModel, config: Dict) -> UNet2DConditionModel:
    """Setup LoRA configuration for UNet."""
    # Handle nested config structure
    training_config = config.get('training', config)

    lora_config = LoraConfig(
        r=training_config.get('lora_rank', 4),
        lora_alpha=training_config.get('lora_alpha', 32),
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
        lora_dropout=training_config.get('lora_dropout', 0.1),
    )

    unet = get_peft_model(unet, lora_config)
    unet.print_trainable_parameters()

    logger.info("✓ LoRA configuration applied to UNet")
    return unet


def compute_loss(model_pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Compute training loss."""
    return F.mse_loss(model_pred.float(), target.float(), reduction="mean")


# ... (Additional training functions would continue here)
Run training:
python train_lora.py \
--config config.yaml \
--data train_metadata.json \
--output output
Only attention adapters are trained → making training faster, cheaper, and scalable.
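The rest of train_lora.py is the standard diffusers training loop. Stripped down, the core step looks roughly like the sketch below (simplified, assuming the default epsilon-prediction objective): encode each batch into VAE latents, add noise at a random timestep, let the LoRA-augmented UNet predict that noise, and backpropagate the MSE loss — which only updates the adapter weights, since everything else is frozen. It reuses the batch keys from IconDataset and compute_loss from the script above.

def training_step(batch, vae, text_encoder, unet, noise_scheduler, device):
    """One simplified diffusion training step; only LoRA adapter parameters receive gradients."""
    with torch.no_grad():
        # Encode images into the VAE latent space (0.18215 is Stable Diffusion's latent scale factor)
        latents = vae.encode(batch["pixel_values"].to(device)).latent_dist.sample() * 0.18215
        # Encode captions with the frozen CLIP text encoder
        encoder_hidden_states = text_encoder(batch["input_ids"].to(device))[0]

    # Forward diffusion: sample noise and a random timestep, then noise the latents
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
    ).long()
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # The LoRA-augmented UNet predicts the added noise; loss is MSE against the true noise
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    return compute_loss(model_pred, noise)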
Step 5: Generate Images → generate_images.py
Here is what generate_images.py looks like:
#!/usr/bin/env python3
"""Image Generation Script for Trained LoRA Model"""
import os
import json
import argparse
from pathlib import Path
from typing import Dict, List, Optional

import yaml
import torch
from PIL import Image
import numpy as np
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from peft import PeftModel
import pandas as pd


def load_config(config_path: str = "config.yaml") -> Dict:
    """Load configuration from YAML file."""
    try:
        with open(config_path, 'r') as file:
            return yaml.safe_load(file)
    except FileNotFoundError:
        print(f"Warning: Config file {config_path} not found. Using default settings.")
        return {}


def load_trained_pipeline(
    base_model_id: str,
    lora_path: str,
    device: str = "cuda"
) -> StableDiffusionPipeline:
    """Load the trained LoRA model and create inference pipeline."""
    print(f"Loading base model: {base_model_id}")

    # Load base pipeline
    pipeline = StableDiffusionPipeline.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        safety_checker=None,
        requires_safety_checker=False,
    )

    # Load LoRA weights
    print(f"Loading LoRA weights from: {lora_path}")
    pipeline.unet = PeftModel.from_pretrained(pipeline.unet, lora_path)

    # Keep the original scheduler (more compatible)
    print("✓ Using original model scheduler")

    # Move to device
    pipeline = pipeline.to(device)

    # Enable memory efficient attention if available
    if hasattr(pipeline, "enable_xformers_memory_efficient_attention"):
        try:
            pipeline.enable_xformers_memory_efficient_attention()
            print("✓ Enabled xformers memory efficient attention")
        except Exception:
            print("⚠ xformers not available, using standard attention")

    # Enable CPU offload for lower VRAM usage
    if device == "cuda":
        pipeline.enable_model_cpu_offload()
        print("✓ Enabled model CPU offload")

    print("✓ Pipeline loaded successfully")
    return pipeline


def generate_images(
    pipeline: StableDiffusionPipeline,
    prompts: List[str],
    output_dir: str,
    num_images_per_prompt: int = 1,
    num_inference_steps: int = 20,
    guidance_scale: float = 7.5,
    negative_prompt: str = "blurry, low quality, distorted, ugly, bad anatomy",
    height: int = 512,
    width: int = 512,
    seed: Optional[int] = None
) -> List[List[Image.Image]]:
    """Generate images from prompts using the trained LoRA model."""
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    all_images = []
    generation_metadata = []

    # Set up generator for reproducibility
    generator = torch.Generator(device=pipeline.device)
    if seed is not None:
        generator.manual_seed(seed)

    print(f"Generating images for {len(prompts)} prompts...")

    for i, prompt in enumerate(prompts):
        print(f"\nPrompt {i+1}/{len(prompts)}: {prompt}")

        # Generate images
        with torch.no_grad():
            result = pipeline(
                prompt=prompt,
                negative_prompt=negative_prompt,
                num_images_per_prompt=num_images_per_prompt,
                num_inference_steps=num_inference_steps,
                guidance_scale=guidance_scale,
                height=height,
                width=width,
                generator=generator
            )

        images = result.images
        all_images.append(images)

        # Save images
        for j, image in enumerate(images):
            filename = f"prompt_{i+1:02d}_image_{j+1:02d}.png"
            image_path = output_path / filename
            image.save(image_path)
            print(f" ✓ Saved: {filename}")

    return all_images


# ... (Additional generation functions would continue here)
Quick Generation Example
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # fp16 weights need a GPU
pipe.unet.load_attn_procs("output")

prompt = "Pikachu with electricity, yarn art style"
image = pipe(prompt).images[0]
image.show()
Key benefits:
- Plug LoRA into any Stable Diffusion pipeline
- Prompts immediately reflect the trained style/domain
Step 6: Evaluate Results
You'll know training worked if:
- Prompts like "Flowers, yarn art style" generate consistent yarn textures
- New prompts (unseen in training) still follow the style
- Generated visuals feel coherent and high quality
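A quick, practical check is to render a fixed prompt sheet after every training run and compare runs side by side. A minimal sketch, assuming `pipeline` is the object returned by load_trained_pipeline from generate_images.py:

import torch
from PIL import Image

# Fixed evaluation prompts: some close to the training captions, some deliberately unseen
eval_prompts = [
    "Flowers, yarn art style",
    "Eiffel Tower, yarn art style",
    "A sailboat at sunset, yarn art style",
]

def make_contact_sheet(pipeline, prompts, cell=512, seed=42):
    """Render one image per prompt and tile them into a single row for side-by-side review."""
    generator = torch.Generator(device="cpu").manual_seed(seed)  # seeded once so the sheet is reproducible
    sheet = Image.new("RGB", (cell * len(prompts), cell), "white")
    for i, prompt in enumerate(prompts):
        img = pipeline(prompt, num_inference_steps=20, generator=generator).images[0]
        sheet.paste(img.resize((cell, cell)), (i * cell, 0))
    return sheet

# make_contact_sheet(pipeline, eval_prompts).save("eval_sheet.png")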
Results
Using the public Yarn-Art dataset, LoRA produced:
- Pikachu in yarn-style
- Eiffel Tower rendered as yarn

For the travel client, the same pipeline generated:
- Realistic yet stylized destination previews
- Integrated into itineraries as visual journey previews
- Increased user engagement and bookings
Training Tips and Best Practices
Dataset Preparation
- Curate high-quality images (minimum 50-100 images for good results)
- Write descriptive captions that capture the style and content
- Maintain consistent image resolution (512x512 or 768x768)
- Balance your dataset across different subjects and compositions
Hyperparameter Tuning
- Learning Rate: Start with 1e-4, adjust based on loss curves
- LoRA Rank: Higher values (8-16) for complex styles, lower (4-8) for simple ones
- Training Steps: Monitor for overfitting, typically 1000-3000 steps
- Batch Size: Limited by GPU memory, 1-4 usually sufficient
Evaluation Metrics
- Visual Consistency: Generated images should maintain style coherence
- Prompt Adherence: New prompts should produce expected content
- Quality Assessment: Use FID scores (see the sketch after this list) or manual evaluation
- Business KPIs: Track engagement, conversions, user satisfaction
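For the FID-based check mentioned above, torchmetrics ships an off-the-shelf implementation. A minimal sketch (assuming torchmetrics and its image dependencies, e.g. torch-fidelity, are installed; in practice use a few hundred real and generated images rather than the toy tensors below):

import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def compute_fid(real_images: torch.Tensor, generated_images: torch.Tensor) -> float:
    """Both inputs are uint8 tensors of shape (N, 3, H, W) in [0, 255]; lower FID is better."""
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(real_images, real=True)
    fid.update(generated_images, real=False)
    return float(fid.compute())

# Toy tensors just to show the expected shapes and dtypes
real = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 512, 512), dtype=torch.uint8)
print(compute_fid(real, fake))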
Installation and Setup
Requirements
pip install torch torchvision
pip install diffusers transformers accelerate
pip install peft
pip install pillow numpy tqdm pyyaml pandas
Hardware Requirements
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
- RAM: 16GB+ system RAM
- Storage: 10GB+ for models and datasets
- Alternative: Use Google Colab Pro for cloud training
Environment Setup
# Clone or create project directory
mkdir lora_training_project
cd lora_training_project
# Create directory structure
mkdir -p data/yarn_art_export
mkdir -p training_data
mkdir -p output
# Set environment variables
export CUDA_VISIBLE_DEVICES=0 # Use first GPU
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
Conclusion
LoRA fine-tuning turns generic AI models into business-specific content engines.
For businesses, this means:
- Lightweight training on smaller datasets
- Customized styles & domains that align with brand identity
- Direct impact on engagement and conversions
- Cost-effective scaling compared to traditional fine-tuning methods
Next Steps
- Start small: Begin with 50-100 high-quality images in your target style
- Iterate rapidly: Train multiple versions with different hyperparameters
- Measure impact: Track business metrics like engagement and conversions
- Scale gradually: Expand dataset and refine based on results
- Integration: Build APIs or workflows to generate images on-demand
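As a starting point for the integration step above, the trained pipeline can sit behind a small web service. A minimal sketch using FastAPI — the endpoint name, request fields, and adapter path are illustrative assumptions, not part of the delivered project:

# pip install fastapi uvicorn
import base64
import io

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from peft import PeftModel
from pydantic import BaseModel

app = FastAPI()

# Load the pipeline once at startup; paths follow the layout used earlier in this post
pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE", torch_dtype=torch.float16
).to("cuda")
pipe.unet = PeftModel.from_pretrained(pipe.unet, "output")

class GenerateRequest(BaseModel):
    prompt: str
    steps: int = 20

@app.post("/generate")
def generate(req: GenerateRequest):
    """Generate one image for the prompt and return it as a base64-encoded PNG."""
    image = pipe(req.prompt, num_inference_steps=req.steps).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image_base64": base64.b64encode(buf.getvalue()).decode()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000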
The combination of LoRA's efficiency with Stable Diffusion's power creates unprecedented opportunities for businesses to create personalized, on-brand AI-generated content at scale.
The author is the Founder of Shark AI Solutions, which specializes in building production-grade, value-added AI solutions.