How to Train (Image) the Model on Civit

Hi everyone. I'll share my workflow with you: how to train a model using tag-style and caption-style captions. You can save this and leave a tip too 😉

Disclaimer: I'm not a programmer or anything like that, and I'm no master either, still far from it. I just want to share my experience with you, since I love making models.

1. Collecting Samples

You can choose any images you want to train on. They can come from an AI generator, Pixiv, X, Instagram, YouTube, fan art/original art, movies, series, Rule34, etc.

2. Tagging System

After you choose your images, you can tag every part of each image.

Example:

[example image: a girl with a cat in a flower field]

I separate the tags based on what I see: first the main/focus character, second the side characters, and last the environment. This image would contain:

First

1girl, pink hair, blue eyes, white dress, yellow capelet, long hair, single braid hair, flower ornament.

Second

1other, cat, tabby cat.

Last

blue sky, cloud, sunny, flying leaf, flower field, from below.
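The three groups above end up as one comma-separated caption line, usually saved in a `.txt` file with the same name as the image. A minimal Python sketch (the tags are taken from the example above; the sidecar-`.txt` layout is the common trainer convention, but check what your trainer expects):

```python
# Build one comma-separated caption line from grouped tags:
# main character first, side characters second, environment last.
main = ["1girl", "pink hair", "blue eyes", "white dress", "yellow capelet",
        "long hair", "single braid hair", "flower ornament"]
side = ["1other", "cat", "tabby cat"]
environment = ["blue sky", "cloud", "sunny", "flying leaf",
               "flower field", "from below"]

caption = ", ".join(main + side + environment)

# Most trainers read the caption from a .txt file named after the image.
with open("sample_image.txt", "w", encoding="utf-8") as f:
    f.write(caption)

print(caption)
```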

For best results you can add special prompts.

[example image: a man in a multicolored shirt]

This man is wearing a shirt with an unusual color scheme. You can tag it as multicolored shirt, red shirt. Red is the main color; you don't need to add any other colors, since multicolored shirt already covers them.

You can check the tag style on Danbooru for Illustrious models, or on e621 for NAI.

  • For Pony I'm not sure, but it seems pretty similar to Illustrious, with extra-detailed prompts.

  • For SDXL, it's more phrase-based than tag-based (though you can also use tags), and more imaginative than its "daughter" models. Full natural language isn't available.

3. Caption System

As far as I've trained Flux and Chroma, they are pretty similar, and I mostly used auto-captioning. But at first I would write captions something like this:

[example images: characters in Victorian-style outfits]

These images' captions contain the phrase "victorian style". Even in a full sentence, you just need to focus on the main topic; these images stay focused on Victorian style even when I change the race/gender of the character.

4. Settings for Training

In my experience, you don't always need more steps to get good results; you need optimal settings.

  • Epoch

SDXL and its daughter models: 10-20 (depending on how complicated the model is)

Chroma and Flux: 7-10 (for characters). I've never trained a style on Chroma or Flux, but I think it's still pretty similar.

  • Step

I think you don't need a high step count for good results either; fewer than 2000-3000 steps is enough.
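Total steps follow directly from your dataset size, repeats, epochs, and batch size, so you can check that 2000-3000 ceiling before launching a run. A minimal sketch (all the numbers in the example call are illustrative, not settings from this guide):

```python
import math

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Steps per epoch = ceil(images * repeats / batch size), times epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

# e.g. 30 images, 5 repeats, 15 epochs, batch size 2 -> 1125 steps,
# comfortably under the ~2000-3000 ceiling.
print(total_steps(30, 5, 15, 2))
```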

  • Train batch

From my training experience, the best batch size for characters is 2, but sometimes you need 1 or 3.

1 = more detailed, but can get overbaked

2 = the best and safest option, but for very detailed models like attire/costumes, I'd recommend 1.

3-4 = might ignore the character's style and fail to replicate the character faithfully; good for style training. Use 3 to focus on changing the character's visuals, 4 to focus on scenery, architecture, etc.

  • Token

Civitai gives you 3 tokens, which should give you the best experience. I almost always use all 3 for characters: 1 token for the trigger word, and 2 tokens for special tags like barth0_black hair, barth0_brown eyes.

Why do I do this? I've trained models with 1, 2, and 3 tokens.

The payoff shows when your character stops getting mixed up with other characters.

I love bara, so I used Nanami Kento as my experiment character. To me, Nanami has a big influence on other characters: if your model isn't prompted well, Nanami will take over the character's visuals, or if your model is too strong, it will turn Nanami into the model's character. So it's important to always check your training tags.
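One way to apply the three-token idea is to prefix the character's core traits with the trigger word, so those traits stay bound to that character instead of bleeding into others. The helper below is a hypothetical sketch using the barth0 trigger from the example above:

```python
def prefixed_tags(trigger: str, traits: list[str]) -> list[str]:
    """Bind core traits to the trigger word, e.g. 'barth0_black hair'."""
    return [f"{trigger}_{t}" for t in traits]

# Trigger word first, then the trigger-bound trait tags.
caption = ", ".join(["barth0"] + prefixed_tags("barth0", ["black hair", "brown eyes"]))
print(caption)
# barth0, barth0_black hair, barth0_brown eyes
```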

  • Clip skip

I think you guys already know about clip skip.

1 = Realistic

2 = Anime/Toon

But I've tried both with anime-style samples: using clip skip 1 didn't give a significantly different result (for Illustrious). Still, stick to the rules, guys 🤣

  • Shuffle Tag

The part I really like. I'm super lazy, and I know the result will come out good if the samples are good too, so I shuffle the tags. But with fewer samples, I prefer to re-prompt them manually instead of relying on this setting:

[screenshot: the shuffle-tags training option]
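Tag shuffling just randomizes the order of the comma-separated tags each epoch, while keeping the first tag(s), typically the trigger word, pinned in place. A minimal sketch of that idea (trainers such as kohya-ss expose this as built-in options; this is only an illustration of what the setting does):

```python
import random

def shuffle_caption(caption: str, keep_first: int = 1) -> str:
    """Shuffle comma-separated tags, keeping the first `keep_first`
    tags (e.g. the trigger word) fixed at the front."""
    tags = [t.strip() for t in caption.split(",")]
    head, tail = tags[:keep_first], tags[keep_first:]
    random.shuffle(tail)  # shuffles in place
    return ", ".join(head + tail)

print(shuffle_caption("barth0, black hair, brown eyes, suit, necktie"))
```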

  • Network Dim and Alpha

8/4 = for a single character. A couple of characters can also work, but they should be clearly distinguishable; manhwa characters share similar face styles, so they will get mixed.

It also makes your model file smaller.

32/16 = the normal choice for everything, even style models.

64/32 = I've tried this, but it really fills up my storage.

With fewer samples I prefer 8/4 or 32/16.

That notation means 8 for Network Dim and 4 for Network Alpha.
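The rules of thumb above can be condensed into a tiny lookup, handy if you script your training configs. The use-case names are mine, not Civitai settings; the (dim, alpha) pairs are the ones recommended above:

```python
def pick_network(use_case: str) -> tuple[int, int]:
    """Rule-of-thumb (Network Dim, Network Alpha) pairs."""
    table = {
        "single_character": (8, 4),    # smaller file; characters must be clearly separable
        "general": (32, 16),           # safe default, works for style models too
        "heavy_detail": (64, 32),      # larger files, heavier on storage
    }
    return table[use_case]

dim, alpha = pick_network("single_character")
print(dim, alpha)
```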

  • Optimizer

Prodigy: for fewer samples and tags. I prefer Prodigy if you're focusing on characters, poses, or costumes.

Adafactor: for 20 or more samples. Sometimes it gives different results.

AdamW8bit: for big sample sets.
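Those optimizer picks can also be written as a simple rule. The 20-sample boundary comes from the guide; the 60-sample cutoff for "big" is my own assumption, since the guide doesn't put a number on it:

```python
def pick_optimizer(num_samples: int) -> str:
    """Rule-of-thumb optimizer choice by dataset size.
    NOTE: the 60-sample 'big' threshold is an assumption."""
    if num_samples < 20:
        return "Prodigy"      # few samples/tags; characters, poses, costumes
    elif num_samples < 60:
        return "Adafactor"    # 20 or more samples
    return "AdamW8bit"        # big sample sets
```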

Here's a sample.

[screenshots: training settings for the Illustrious run]

Samples: 60, 3 characters (not a couple). Illustrious.

For NAI I still use the same settings, but for Pony and SDXL I go with the basic settings Civitai gives me.

[screenshots: training settings for the Chroma and Flux runs]

Samples: 10. Chroma and Flux.

Note:

The key to training is how many fitting words, phrases, or sentences can describe your image perfectly.

Sometimes the problem isn't too few prompts, but a visual that isn't available in the base model, which will always give a different result.

[screenshot: a failed training attempt]

I tried this one too, and I couldn't publish/download it. The result was also far from what I imagined, even when generating straight from the training images.

I hope this helps you train your own model.

You can ask me questions or give advice in the comments. Thank you 😇
