I wanted to write an article collecting the methods and tips I have found around the site for tagging a dataset. If anyone has good ones to add, please comment.
For my guide on training settings in the on-site generator, see the link here.
_
Methods:
Brute Force: Gather a dataset full of the desired subject. Either use no tags at all, or tag only with an activation phrase [1] in the hope that training picks up on whatever you are trying to teach it. Not advised with a small dataset.
Standard tagging: Gather a dataset full of the desired subject. Tag images with an activation tag if needed [1], then add tags to the images for necessary details.
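As a rough illustration of the workflow above, here is a small Python sketch (the function name and folder layout are my own invention, not from any of the linked guides) that prepends an activation tag to every .txt caption file sitting next to the images:

```python
# Hypothetical helper: prepend an activation tag to each comma-separated
# caption file in a dataset folder, skipping captions that already have it.
from pathlib import Path

def add_activation_tag(dataset_dir: str, activation_tag: str) -> int:
    """Returns the number of caption files that were changed."""
    changed = 0
    for caption_file in sorted(Path(dataset_dir).glob("*.txt")):
        raw = caption_file.read_text(encoding="utf-8")
        tags = [t.strip() for t in raw.split(",") if t.strip()]
        if activation_tag not in tags:
            caption_file.write_text(", ".join([activation_tag] + tags),
                                    encoding="utf-8")
            changed += 1
    return changed
```

This assumes the common one-.txt-per-image layout with comma-separated Danbooru-style tags; adjust for whatever your trainer expects.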
Note: Most models use Danbooru tags. Here is the tag wiki. https://danbooru.donmai.us/wiki_pages
Difference method: The subject of the lora must have a dataset of images in "before" and "after" pairs. For example, Weeliao used images where the "before" images show characters as they normally are, and the "after" images show those same characters in evil-looking dark clothing, with presumably little other difference in pose, background, or camera angle. First, tag the "before" dataset. Second, copy all of the same tags into the "after" dataset (even if those tags would no longer apply) and add your activation tag to the "after" dataset. Third, train the lora on the "before" dataset until it becomes overfitted. Lastly, stop the training, substitute the "after" dataset for the "before" one, and then resume. Once the lora is trained, the activation phrase will be quite strong, with the ability to overpower relevant tags.
Note: I had to Google translate this, so I hope I am interpreting this correctly.
Note 2: Would it work just as well if you trained both datasets in the same session?
Source: https://civitai-proxy.pages.dev/models/1350758/conceptcorruption
Archived quote (translated from Chinese): "I haven't been able to log into civitai lately, so I'm only seeing this now. I used the difference method: I gathered many good-quality before/after corruption comparison images, trained on the pre-corruption images until overfitting, then trained on the post-corruption images on top of that." "I forgot to mention this point: I copied the tags from the pre-corruption images directly onto the post-corruption images and added 'corruption, waruochi'."
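The tag-copying step described above could be sketched like this in Python (all names here are placeholders of mine; it assumes each before/after pair shares the same filename):

```python
# Hypothetical helper for the difference method: mirror each "before"
# caption into the "after" folder, prepending the activation tag(s).
from pathlib import Path

def copy_tags_to_after(before_dir: str, after_dir: str,
                       activation_tags: str) -> int:
    """Returns the number of caption files written to after_dir."""
    written = 0
    for before_caption in sorted(Path(before_dir).glob("*.txt")):
        tags = before_caption.read_text(encoding="utf-8").strip()
        # Same filename in the "after" folder, activation tag(s) first.
        (Path(after_dir) / before_caption.name).write_text(
            f"{activation_tags}, {tags}", encoding="utf-8")
        written += 1
    return written
```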
Lora Subtraction: If you create a lora that can only produce images in one style (as one would expect if the images all come from a single source, like a video game), create a second lora that matches the style of the source material, then merge the two loras at negative strength.
Note: A lora trained exclusively on 3D images can avoid this if you just tag all of the images as "3D".
Source: https://civitai-proxy.pages.dev/articles/21718/training-character-loras-with-low-dataset-diversity
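Dedicated tooling handles the actual merging, but the idea reduces to a weighted sum per weight tensor, with the style lora given a negative weight. A toy Python sketch with plain floats standing in for tensors (the function and keys are made up for illustration):

```python
# Toy illustration of lora merging: each key would really be a weight
# tensor in the lora state dict, not a single float.
def merge_loras(lora_a: dict, lora_b: dict, strength_b: float) -> dict:
    """Weighted merge; a negative strength_b subtracts lora_b's style."""
    return {key: value + strength_b * lora_b.get(key, 0.0)
            for key, value in lora_a.items()}
```

Subtracting at full strength can wreck the character lora too, so real merges usually need some experimentation with the negative weight.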
_
[1]: This source says that using an activation tag for a style lora is not needed since you would always want it to be active. I just give it the activation phrase anyways. https://civitai-proxy.pages.dev/articles/4/make-your-own-loras-easy-and-free
_
_
Tips:
May be essential: In character loras, do not tag for eyes unless a character has different eye colors or details.
Note: For my Sinner lora, every attempt had blurry eyes until I removed the eye tags altogether. The fact that he has a distinct eye shape with a gradient, combined with the small dataset, may have contributed to this, but still... removing those tags outright fixed it.
May be essential: Especially in character loras with small datasets, take full-body images and crop+upscale them so you can add tags like "upper body", "portrait", or "head out of frame". Be sure to remove tags like "[color] shoes" that no longer apply to the crop.
Note: The actual training process involves compressing the source image, adding noise, and scaling up a denoised version of the image to compare against the original. A more zoomed-in upper body or portrait shot sounds like it would help capture smaller details faster. I am guessing that is the basis for this, in addition to just giving it more camera-position tag exposure.
Source: https://civitai-proxy.pages.dev/articles/19619/how-to-train-lora-for-beginners-or-guide and many other guides.
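The retagging half of this tip could be sketched like so (a hypothetical function of mine; the actual cropping and upscaling would happen in an image editor or with an image library):

```python
# Hypothetical helper: build the caption for a cropped copy of an image
# by dropping the old framing tag and any tags that left the frame.
def retag_for_crop(tags, dropped_tags, framing_tag):
    """tags: original caption list; dropped_tags: set of tags no longer
    visible in the crop; framing_tag: e.g. "upper body" or "portrait"."""
    kept = [t for t in tags if t != "full body" and t not in dropped_tags]
    return [framing_tag] + kept
```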
Optional: In character loras, prune tags from the dataset that are inherent to the character so they may be absorbed into the activation tag.
Note: With the exception of the eye issue discussed above, this one is less important. If you have a character with white hair and tag "white hair", it will not matter much.
Source: https://civitai-proxy.pages.dev/articles/4/make-your-own-loras-easy-and-free
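A small Python sketch of the pruning step (the function name and tag list are just examples of mine):

```python
# Hypothetical helper: strip a character's inherent tags (e.g. "white
# hair") from every caption so those traits fold into the activation tag.
from pathlib import Path

def prune_inherent_tags(dataset_dir: str, inherent_tags: set) -> None:
    for caption_file in Path(dataset_dir).glob("*.txt"):
        raw = caption_file.read_text(encoding="utf-8")
        kept = [t.strip() for t in raw.split(",")
                if t.strip() and t.strip() not in inherent_tags]
        caption_file.write_text(", ".join(kept), encoding="utf-8")
```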
_
_
Software to help:
Auto-tagger and dataset prep: https://civitai-proxy.pages.dev/articles/2079/dataset-all-in-one-tools-windows-and-linux
In addition to other resources, this article has a basic img-txt viewer that lets you see both on one screen with minimal bloat. It also auto-corrects to the correct tag: https://civitai-proxy.pages.dev/articles/14075/my-tools


