
Exploring guidance techniques: Clear ways to use in ComfyUI


Mar 27, 2025


Read It First

This article covers a selection of the generation-guidance nodes available in ComfyUI.
Nodes without a paper reference are not included.

The important information about the nodes from the advanced perturbed-attention repository (PAG, SEG, SWG) is consolidated in the PAG section.

The simple cover image was generated with Flux.

Known misinformation in the PAG section has been corrected; sorry for any confusion.

Self-Attention Guidance (SAG)

date: Oct 2022
arXiv: Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
node repo: comfyanonymous/ComfyUI

It repairs images by blurring the regions the model's self-attention focuses on, feeding the degraded input back to the model, and guiding the prediction away from the result.

It has two parameters:

  • scale: 0.1-weak 0.5-moderate 1.0-strong

    • don't go below 0 or above 1, or it will break the image.

  • blur_sigma: a higher value blurs the mask more; the effect is slight.
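For intuition, the final prediction follows the usual guidance-extrapolation pattern. A minimal sketch (the `sag_combine` helper and tensor names are illustrative, not the node's actual API):

```python
import torch

def sag_combine(cond_pred, degraded_pred, scale):
    # Extrapolate away from the prediction made on the blurred,
    # attention-masked input: eps_hat = eps + s * (eps - eps_degraded)
    return cond_pred + scale * (cond_pred - degraded_pred)

cond = torch.randn(1, 4, 8, 8)       # prediction on the clean input
degraded = torch.randn(1, 4, 8, 8)   # prediction on the blurred input
guided = sag_combine(cond, degraded, 0.5)
```

At scale 0 the guidance vanishes and you get the plain conditional prediction back.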

FreeU

date: Sep 2023
arXiv: FreeU: Free Lunch in Diffusion U-Net
node repo: comfyanonymous/ComfyUI

There are two versions, v1 and v2.
Both manipulate the backbone and skip features in the first two decoder blocks of the U-Net.
v2 (an updated version introduced in Oct 2023) adds dynamic adjustment to the b factors.

Per the official repository, the node provides four parameters:

  • b1: backbone factor of the first stage block of decoder.

  • b2: backbone factor of the second stage block of decoder.

  • s1: skip factor of the first stage block of decoder.

  • s2: skip factor of the second stage block of decoder.

Increasing the b factors burns the image but may improve quality.
Increasing the s factors weakens the skip features; the effect is more complicated than you might expect.

The best settings vary between models; expect to experiment.
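Roughly what happens per decoder stage, sketched after the official FreeU code (simplified; `freeu_stage` is an illustrative name, and v2's dynamic b adjustment is omitted):

```python
import torch

def freeu_stage(backbone, skip, b, s):
    # Scale the first half of the backbone channels by the b factor
    h = backbone.clone()
    half = h.shape[1] // 2
    h[:, :half] *= b
    # Scale the low-frequency band of the skip feature by the s factor
    # in Fourier space (a small window around the spectrum's center)
    freq = torch.fft.fftshift(torch.fft.fftn(skip.float(), dim=(-2, -1)), dim=(-2, -1))
    H, W = freq.shape[-2:]
    mask = torch.ones_like(freq)
    mask[..., H // 2 - 1:H // 2 + 2, W // 2 - 1:W // 2 + 2] = s
    skip_out = torch.fft.ifftn(torch.fft.ifftshift(freq * mask, dim=(-2, -1)), dim=(-2, -1)).real
    return h, skip_out

bb = torch.randn(1, 8, 16, 16)
sk = torch.randn(1, 8, 16, 16)
h, sp = freeu_stage(bb, sk, b=1.3, s=0.9)
```

With b=1 and s=1 the stage is a no-op, which is why settings drift model by model: you are tuning how far to push away from identity.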

Characteristic Guidance (CHG)

date: Dec 2023
arXiv: Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale
node repo: redhottensors/ComfyUI-Prediction (outdated)

It applies a non-linear correction to guidance, restoring image quality at high CFG.

It is said to take much more time to process, though an official turbo update (~2x faster) was released in July 2024, right after the latest node update (May 13, 2024). Judging from the plots in the official repo, I don't think it's cost-effective for practical use.
If you want to test it at this moment, you should switch from ComfyUI to Forge or sd-webui.

This entry is on hold until a new functional version comes out.

Adaptive Guidance (AG)

date: Dec 2023
arXiv: Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
node repo: asagi4/ComfyUI-Adaptive-Guidance

When generation shows convergence, it stops using high CFG to speed things up.
It works well when the scheduler's step count is set high enough.

To use it, you have to combine the scheduler, sampler, random-noise, and adaptive-guider nodes.

Here are the parameters:

  • threshold: lowering it stops high CFG earlier; 0.99~1 is recommended.

  • cfg: Just CFG, not anything else.

  • neg_scale: 0-nothing 2-moderate 4-strong

    • It belongs to PrepNegAdaptiveGuider; the default is 1 (no scaling).

    • Leaving it above 4 will burn the image.

  • uncond_zero_scale: if not 0, this auxiliary function kicks in once CFG has dropped to 1.

  • cfg_start_pct: decides when CFG rises from 1; I recommend not setting it above 0.3.
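The underlying idea can be sketched like this (my own simplification of the paper's convergence criterion; the real guider node compares the cond/uncond predictions inside the sampling loop):

```python
import torch
import torch.nn.functional as F

def adaptive_cfg_step(cond, uncond, cfg, threshold, converged):
    # Once the conditional and unconditional predictions agree closely
    # enough, fall back to CFG 1 (and skip the uncond pass) to save time.
    if not converged:
        sim = F.cosine_similarity(cond.flatten(1), uncond.flatten(1), dim=1).mean()
        converged = bool(sim >= threshold)
    scale = 1.0 if converged else cfg
    return uncond + scale * (cond - uncond), converged

cond = torch.randn(1, 4, 8, 8)
out, done = adaptive_cfg_step(cond, cond.clone(), cfg=7.0, threshold=0.99, converged=False)
```

This is why a lower threshold stops high CFG earlier: the similarity crosses it sooner.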

Perturbed-Attention Guidance (PAG)

date: Mar 2024
arXiv: Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
node repo: comfyanonymous/ComfyUI & pamparamm/sd-perturbed-attention

An enhanced version based on the technique used by Self-Attention Guidance.
It replaces carefully selected self-attention maps in the model with the identity matrix to guide the model away from bad image structure.
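The perturbation itself is trivial: with an identity attention map, softmax(QKᵀ)V collapses to V, so every token attends only to itself. A sketch (function names are illustrative):

```python
import torch

def identity_attention(q, k, v):
    # The self-attention map is replaced by the identity matrix,
    # so the output is simply I @ V == V.
    return v

def pag_guidance(cond_pred, perturbed_pred, scale):
    # eps_hat = eps + s * (eps - eps_perturbed)
    return cond_pred + scale * (cond_pred - perturbed_pred)
```

The scale parameter below controls how hard the sample is pushed away from the perturbed prediction.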

Basic:

  • scale: 1-decent 4-moderate 7-strong

    • going beyond 7, you'll start to see burnt spots.

    • pushing it to 15 will definitely burn the image if no extra measure is taken.

Advanced:

  • adaptive_scale: dampens PAG progressively, starting from the last step and reaching toward the first based on this percentage.

  • unet_block: input / output / middle

    • middle is suggested.

  • unet_block_id: leave it at 0 if unsure; see unet_block_list in this section for reference.

  • sigma_start / sigma_end: guidance is active between sigma_start and sigma_end.

    • read this article if you want to know what sigma really is.

    • if both are non-negative, the node is disabled until sigma drops to sigma_start, then disabled again once sigma reaches sigma_end. In the usual case, sigma only decreases during generation.

    • setting both negative disables this windowing entirely.

    • setting only sigma_start negative makes guidance start from the beginning.

    • setting only sigma_end negative makes guidance run until the end.

    • flipped values (sigma_start < sigma_end) stop the node from taking effect.

  • rescale: based on the algorithm from Common Diffusion Noise Schedules and Sample Steps are Flawed; increasing it reduces overexposure (burns). It does nothing when rescale_mode=snf.

  • rescale_mode: full / partial / snf

  • unet_block_list: overrides the target U-Net blocks, written in the form of a tag prompt.

    • d means input, m means middle, and u means output.

      SD1.5 U-Net has layers d0-d5, m0, u0-u8.
      SDXL U-Net has layers d0-d3, m0, u0-u5.

    • in SDXL, d0, d1, u3, u4, u5 have index values 0~1; other blocks have index values 0~9 (pamparamm documented this incorrectly).
      entries look like m0.7, u0.4, d1.0, d2.2-9.

    • here is an example: d1.0, m0.1, u2.9.

These nodes won't react to negative prompts, similar to Flux distilled guidance.

Semantic-Aware Guidance (S-CFG)

date: Apr 2024
arXiv: Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
node repo: shiimizu/ComfyUI-semantic-aware-guidance

This node has no configuration.
It treats the latent as several semantic units and uses them for guidance.
Expect generation to take about 1.5 to 3 times longer with it enabled.

It's good at correcting unfaithful shapes but fails to improve inpaint / outpaint quality; not recommended.

Smoothed Energy Guidance (SEG)

date: Aug 2024
arXiv: Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
node repo: logtd/ComfyUI-SEGAttention / pamparamm/sd-perturbed-attention

A novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation.

There are two node implementations in ComfyUI:
a simple version by logtd and an advanced version by pamparamm.

They both have three important factors:

  • scale (γ): 3 is recommended in the paper; setting it over 7 will burn the image.

  • blur (σ): a higher value strengthens the effect; for convenience you can set it to ∞.

  • inf switch:
    The simple version has a switch called 'inf_blur'; setting it to true overrides blur with ∞.
    In the advanced version, setting blur_sigma below zero makes it ∞.
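What the blur actually does, in a simplified 1-D sketch (the real node blurs the query over its 2-D spatial layout; `blur_query` is an illustrative name):

```python
import torch
import torch.nn.functional as F

def blur_query(q, sigma):
    # q: (batch, tokens, channels). sigma = inf collapses every token
    # to the mean query, i.e. the strongest possible smoothing.
    if sigma == float("inf"):
        return q.mean(dim=1, keepdim=True).expand_as(q)
    radius = max(1, int(3 * sigma))
    x = torch.arange(-radius, radius + 1, dtype=q.dtype)
    kernel = torch.exp(-x.pow(2) / (2 * sigma**2))
    kernel = (kernel / kernel.sum()).view(1, 1, -1)
    b, t, c = q.shape
    # run the Gaussian kernel over the token dimension, channel by channel
    qt = q.permute(0, 2, 1).reshape(b * c, 1, t)
    qt = F.conv1d(qt, kernel, padding=radius)
    return qt.reshape(b, c, t).permute(0, 2, 1)

q = torch.randn(1, 32, 8)
```

Larger sigma means a wider kernel, hence the "higher value strengthens the process" behavior.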

Advanced version has additional parameters:

  • unet_block: input / output / middle

    • middle is suggested.

  • unet_block_id: leave it at 0 if unsure; see the PAG section if you're curious.

  • sigma_start / sigma_end: the sigma to start guidance at / the sigma to end it at; see the PAG section.

  • rescale: increasing it reduces overexposure (burns); does nothing when rescale_mode=snf.

  • rescale_mode: full / partial / snf

  • unet_block_list: override target U-Net blocks.

The simple version noticeably dampens the sensitivity of CFG.
The advanced version doesn't affect CFG that much, but its blur parameter seems broken.
Keep it at -1 if you're lazy. Otherwise, append .contiguous() to this line in seg_attention_wrapper of guidance_utils.py (a crude fix):

q = q.reshape(bs, inner_dim, -1).permute(0,2,1)

Adaptive Projected Guidance (APG)

date: Oct 2024
arXiv: Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
node repo: asagi4/ComfyUI-Adaptive-Guidance from MythicalChu/ComfyUI-APG_ImYourCFGNow

Focusing on the orthogonal projection of the CFG update direction, this component prevents images from over-burning through adaptive CFG manipulation.

There are 5 keys:

  • momentum (β):
    the momentum for the CFG update direction;
    a negative value pushes guidance away from the common route and helps reduce glitches.
    Per the paper, around -0.75 is optimal.

  • eta (η):
    reduces the strength of the 'parallel component' in the calculation.
    Lower values help stop burning; 0 is recommended unless you want the burn.
    Per the researchers' test charts, ≤0.5 is optimal.

  • norm_threshold (r):
    caps the norm of the guidance update, keeping the result closer to the model's original prediction.
    Set too high, the cap never engages and the image is allowed to burn as usual.
    The paper says 2.5 gives the best fidelity, but I suggest leaving it at the default (15),
    because 2.5 consistently gives greyish or bad output in SDXL.

  • mode:
    it has two options, 'normal' and 'denoised'.
    'denoised' option can cool the image down.

  • adaptive_momentum:
    this artifact remover gradually brings momentum toward 0 each step.
    At 0.18 (the recommended value), momentum reaches 0 near the end of generation;
    at 0.19, around the middle.
    High values make it unresponsive to nudging.

Please note that if you let CFG go beyond 15, your image may still get fried.
Setting the scheduler's 'denoise' value under 1 triggers a bug; see the APG node repo for details.
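Putting the keys together, here is a per-step sketch of the mechanism as I understand it from the paper (simplified; the adaptive_momentum decay and the node's exact internals are omitted, and the names are mine):

```python
import torch

def apg_step(cond, uncond, scale, beta, eta, r, buf):
    diff = cond - uncond
    buf = diff + beta * buf          # momentum on the CFG update direction
    diff = buf
    if r > 0:                        # norm_threshold r caps the update size
        norm = torch.linalg.vector_norm(diff, dim=(1, 2, 3), keepdim=True)
        diff = diff * torch.clamp(r / norm, max=1.0)
    # split the update into components parallel / orthogonal to cond
    v = cond / torch.linalg.vector_norm(cond, dim=(1, 2, 3), keepdim=True)
    parallel = (diff * v).sum(dim=(1, 2, 3), keepdim=True) * v
    diff = eta * parallel + (diff - parallel)  # eta damps the parallel part
    return cond + (scale - 1) * diff, buf

c, u = torch.randn(1, 4, 8, 8), torch.randn(1, 4, 8, 8)
out, buf = apg_step(c, u, scale=7.0, beta=0.0, eta=1.0, r=0.0, buf=torch.zeros_like(c))
```

With β=0, η=1, and the cap disabled this reduces exactly to standard CFG; the parallel component is what drives oversaturation, which is why η=0 keeps images from burning.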

Sliding Window Guidance (SWG)

date: Nov 2024
arXiv: The Unreasonable Effectiveness of Guidance for Diffusion Models
node repo: pamparamm/sd-perturbed-attention

It uses sliding windows to crop multiple overlapping tiles from the image data
and aggregates them, averaging the overlaps, into a negative noise prediction.
A formula then combines the positive and negative noise predictions for guidance.

It has 5 keys:

  • scale (w): the paper says it's optimal around 0.2.

    • like PAG, it has the 7/15 burn thresholds; staying near them risks burning.

  • tile_width (l): the width of cropped tiles.

  • tile_height (k): the height of cropped tiles.

  • tile_overlap (s): the overlap size in pixels; sensitive.

  • sigma_start / sigma_end: see the PAG section on this page.

Set l=k=2s or l=k=3s and you will have a good time.
Processing time scales with the number of tiles: more tiles, longer waits.

I don't recommend using this for inpaint / outpaint; it drags performance down with no proven quality boost.
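The tiling-and-averaging step can be sketched as follows (an illustrative simplification; in the real node this happens on latents inside the sampler, and `model` stands for any per-tile noise predictor):

```python
import torch

def sliding_window_pred(x, model, tile, stride):
    # Average predictions over overlapping square crops; overlapping
    # regions are normalized by how many tiles covered each pixel.
    _, _, H, W = x.shape
    out, count = torch.zeros_like(x), torch.zeros_like(x)
    for top in range(0, H - tile + 1, stride):
        for left in range(0, W - tile + 1, stride):
            sl = (slice(None), slice(None),
                  slice(top, top + tile), slice(left, left + tile))
            out[sl] += model(x[sl])
            count[sl] += 1
    return out / count.clamp(min=1)

x = torch.randn(1, 4, 8, 8)
# tile=4 with stride=2 corresponds to the l=k=2s recommendation
pred = sliding_window_pred(x, lambda t: t, tile=4, stride=2)
```

The stride here is tile minus overlap, which is why tile count (and thus runtime) grows quickly as the overlap increases.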

PLADIS

date: Mar 2025
arXiv: PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
node repo: pamparamm/sd-perturbed-attention

This method replaces cross-attention with a combination of dense and sparse attention computed from itself. It's a resource-efficient approach with distilled-model support.

  • scale (λ): see Figure 5 and Figure 8 in the paper; the default (λ=2) is best.

    • I personally recommend setting it under 1, for the reason shown below.

  • sparse_func: you can choose between entmax1.5 (default) and sparsemax.

    • α=1.5 for entmax1.5 and α=2 for sparsemax, if the annotation is right.

    • see Figure 4, Table 5, and "The Effect of α" in chapter 6 of the paper; the default is recommended.

  • unet_block_list: see Table 9 and appendix G.3 in the paper; all > all up > all down / mid.

PLADIS may not be friendly to end users: if you enable CPU offload, details will turn into mush.

If you're facing this issue, here's a fix that might help, though it may not align perfectly with the original approach described in the paper:

# insert the code below into pladis_utils.py in the root folder of the node
# What happens to the pladis_sim matrix in this fix:
#     clamp negative values to 0 -> sum each row
#  -> clamp row sums to avoid division by zero -> normalize each row

#       pladis_sim = pladis_scale * sparse_sim + (1 - pladis_scale) * dense_sim

        pladis_sim = torch.clamp(pladis_sim, min=0.0)
        row_sums = pladis_sim.sum(dim=-1, keepdim=True)
        row_sums = torch.clamp(row_sums, min=1e-6)
        pladis_sim = pladis_sim / row_sums

#       out = pladis_sim.to(v.dtype) @ v

Or you can take the sigma-controlling approach:

## In pladis_nodes.py
# Insert sigma_start and sigma_end ui controls into INPUT_TYPES(cls)
                "sigma_start": (IO.FLOAT, {"default": -1.0, "min": -1.0, "max": 10000.0, "step": 0.01, "round": False}),
                "sigma_end": (IO.FLOAT, {"default": -1.0, "min": -1.0, "max": 10000.0, "step": 0.01, "round": False}),
# Insert sigma_start and sigma_end parameters into def patch()
        sigma_start=-1.0,
        sigma_end=-1.0,
# Replace 'pladis_attention = pladis_attention_wrapper(scale, sparse_func)' with
        sigma_start = float("inf") if sigma_start < 0 else sigma_start  
        pladis_attention = pladis_attention_wrapper(scale, sparse_func, sigma_start, sigma_end)

## In pladis_utils.py
# Insert sigma_start and sigma_end parameters into def pladis_attention_wrapper()
, sigma_start=float("inf"), sigma_end=-1.0
# Add sigma checking logic by replacing these lines
# def _pladis_sparse_attention(  
#    q: torch.Tensor,  
#    k: torch.Tensor,  
#    v: torch.Tensor,  
#    extra_options: dict,  
# ):
# with
	def _pladis_sparse_attention(  
		q: torch.Tensor,  
		k: torch.Tensor,  
		v: torch.Tensor,  
		extra_options: dict,  
	):  
		# Check sigma range  
		sigma = extra_options.get("sigmas", None)  
		if sigma is not None and not (sigma_end < sigma[0] <= sigma_start):  
			# Return normal dense attention if outside sigma range  
			heads = extra_options["n_heads"]  
			attn_precision = extra_options.get("attn_precision")  
			  
			b, _, dim_head = q.shape  
			dim_head //= heads  
			  
			scale: int = dim_head**-0.5  
			  
			q, k, v = map(  
				lambda t: t.unsqueeze(3)  
				.reshape(b, -1, heads, dim_head)  
				.permute(0, 2, 1, 3)  
				.reshape(b * heads, -1, dim_head)  
				.contiguous(),  
				(q, k, v),  
			)  
			  
			sim = q @ k.transpose(-2, -1) * scale  
			del q, k  
			  
			dense_sim = torch.softmax(sim, dim=-1)  
			out = dense_sim.to(v.dtype) @ v  
			  
			out = out.unsqueeze(0).reshape(b, heads, -1, dim_head).permute(0, 2, 1, 3).reshape(b, -1, heads * dim_head)  
			return out

Normalized Attention Guidance (NAG)

date: May 2025
arXiv: Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
node repo: pamparamm/sd-perturbed-attention / ChenDarYen/ComfyUI-NAG

//TODO

"Better negative guidance" addon, being competitive to Negative Prompt in Prompt (NegPiP) and Perp-Neg.

Three core parameters:

  • scale (ϕ): see figure 10 around page 9 in the paper.

  • tau (τ): normalization threshold

  • alpha (α): refinement factor
    see appendix F around page 16 in the paper.
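My reading of the paper's core operation, as a sketch: extrapolate the positive attention output away from the negative one, clamp the L1-norm ratio at τ, then blend with α. This is a simplification, not the node's exact code:

```python
import torch

def nag(z_pos, z_neg, scale, tau, alpha):
    # Extrapolate the positive attention output away from the negative one
    z = z_pos + scale * (z_pos - z_neg)
    # Clamp the L1-norm ratio against the positive branch at tau
    ratio = z.norm(p=1, dim=-1, keepdim=True) / z_pos.norm(p=1, dim=-1, keepdim=True)
    z = torch.where(ratio > tau, z * (tau / ratio), z)
    # alpha blends the refined output back toward the positive branch
    return alpha * z + (1 - alpha) * z_pos

zp = torch.randn(2, 16, 8)
zn = torch.randn(2, 16, 8)
out = nag(zp, zn, scale=4.0, tau=2.5, alpha=0.5)
```

Note the guidance happens in attention feature space rather than on the noise prediction, which is what lets it work on distilled models.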

Token Perturbation Guidance (TPG)

date: Jun 2025
arXiv: Token Perturbation Guidance for Diffusion Models
node repo: pamparamm/sd-perturbed-attention

//TODO

This module perturbs token representations within the diffusion model.
See Table 1, Table 2, and chapter 6.2: it generates less abstract patterns and may offer superior quality.

  • scale (γ): default is 3; should be fine unless you adopt monsters like the CFG++ sampler.

  • sigma_start / sigma_end / rescale / rescale_mode: see the PAG section on this page.

  • unet_block_list: default is d2.2-9, d3, which is fine; pamparamm must have tested it.

Frequency-Decoupled Guidance (FDG)

date: Jun 2025
arXiv: Guidance in the Frequency Domain Enables High-Fidelity Sampling at Low CFG Scales
node repo: sdtana/ComfyUI-FDG / silveroxides/ComfyUI_FDGuidance

//TODO
Like Guidance Interval (Guidance Limiter in ComfyUI-ppm), it adapts the CFG scale.

These nodes will override your CFG scale.
ComfyUI-FDG provides fdg_steps, which functions similarly to sigma_end: after FDG runs for the specified number of steps from the beginning, plain CFG takes over.

Three core parameters:

  • levels: the number of levels in the Laplacian pyramid.
    When set to 1, you are effectively using cfg scale = guidance_high.
    When set above 2, linear interpolation is used to set the guidance scales for the intermediate levels.

  • guidance_high: guidance scale for high-frequency data; set it high for quality.

  • guidance_low: guidance scale for low-frequency data; set it low for diversity.
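A two-level sketch of the idea (my simplification, using a blur/residual split in place of the full Laplacian pyramid; with guidance_high == guidance_low it reduces exactly to standard CFG):

```python
import torch
import torch.nn.functional as F

def fdg(cond, uncond, guidance_high, guidance_low):
    def split(x):
        # low band: downsample then upsample; high band: the residual
        low = F.interpolate(F.avg_pool2d(x, 2), scale_factor=2,
                            mode="bilinear", align_corners=False)
        return low, x - low
    c_lo, c_hi = split(cond)
    u_lo, u_hi = split(uncond)
    # each frequency band gets its own CFG scale
    low = u_lo + guidance_low * (c_lo - u_lo)
    high = u_hi + guidance_high * (c_hi - u_hi)
    return low + high

c, u = torch.randn(1, 4, 16, 16), torch.randn(1, 4, 16, 16)
out = fdg(c, u, guidance_high=7.0, guidance_low=3.0)
```

Driving the high band hard while keeping the low band gentle is the whole trick: structure and color stay diverse while fine detail gets the strong guidance.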
