
Benchmarking RunPod GPUs with Yet Another Workflow

It's been a long time since I've actively tested different GPUs on RunPod (this link will kick us both some credit if you sign up with it). My original GPU testing was done with someone else's workflow "a long time ago," using advice from when I was early in my learning. After a prompting conversation on Reddit, I decided it was time for a fresh round of testing with Yet Another Workflow, my take on an artist-forward and beginner-friendly workflow that treats you like an intelligent, curious person.

I have written an article about getting started with my workflow using RunPod's cloud GPU service. Both articles contain some write-ups of cost, which I'll be updating soon based on this testing.

I've been recommending the L40S, which is quite cost effective. But where does it sit when running my own workflow, in my own template, at November 2025 prices?

Let's find out!

TL;DR

  • Best Performance for the Price: RTX 5090

  • Best High Performance for the Price: H100 SXM

Methodology

I had a very simple approach to this: Run the default prompt for the workflow in v0.36 using the standard sampler setup at different resolutions and steps.

  • Resolutions: 1280x768, 2:3 @ 768 (640 × 896), 2:3 @ 660 (512 × 768)

  • Steps: 10, 8, 4

  • Frames: 61

  • Post processing: the default workflow chain with 32fps GIMM interpolation

That's 9 videos per GPU. I used the same seed for each (561278855360750), which is equivalent to this video. For this test, I chose the following GPUs with CUDA 12.8 and SageAttention enabled:

  • H200 SXM

  • H200 NVL

  • H100 NVL

  • H100 SXM

  • RTX PRO 6000

  • RTX PRO 6000 WK

  • RTX 5090

  • L40S

I've only selected cards that can load the full 14B models without blockswap. With blockswap, you can use slower GPUs, but we're looking at performance for the price, so the L40S is our minimum here.

Also worth noting: all of the times were collected excluding the initial model loading time, which, loosely speaking, scales with the general power of the card.

Results

The RTX 5090 is a very good value for the rental price. (I did experience a particularly slow spin-up time for this one while testing.) It's both cheaper and more performant than the L40S. It's the best value per video, by a good measure.

The H100 SXM, however, is an extremely good deal if you don't want to wait as long and are willing to spend a little more per video. At 1280x768, there's a notable jump from 243 seconds down to 172 seconds between the RTX PRO 6000 and the H100 SXM. It's a huge performance boost, and it seems to offer the best ratio of videos per hour to cost. While you are paying more, your throughput in time per video is still excellent, especially at larger sizes.

  1. H100 SXM averages about 56 videos per hour ($2.69).

  2. RTX 5090 averages about 39 videos per hour ($0.89).

  3. L40S averages about 21 videos per hour ($0.86).

Videos per hour here is calculated by taking the generation times at each resolution, converting each to a videos-per-hour rate, and averaging those rates across the three resolutions.
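If you want to run the same math against your own numbers, here's a minimal sketch of the videos-per-hour and cost-per-video calculation. The generation times below are placeholders, not my measured data; the hourly rates are the ones quoted above.

```python
# Sketch of the videos-per-hour and cost-per-video math.
# gen_times are HYPOTHETICAL per-resolution generation times in seconds,
# one per tested resolution -- substitute your own measurements.
gpus = {
    "H100 SXM": {"gen_times": [172, 90, 50], "hourly_rate": 2.69},
    "RTX 5090": {"gen_times": [243, 130, 75], "hourly_rate": 0.89},
    "L40S":     {"gen_times": [450, 240, 140], "hourly_rate": 0.86},
}

for name, g in gpus.items():
    # Convert each gen time to a videos-per-hour rate, then average the rates.
    rates = [3600 / t for t in g["gen_times"]]
    videos_per_hour = sum(rates) / len(rates)
    cost_per_video = g["hourly_rate"] / videos_per_hour
    print(f"{name}: {videos_per_hour:.1f} videos/hr, ${cost_per_video:.3f}/video")
```

Using the averages from the list above (56, 39, and 21 videos per hour), the cost per video works out to roughly $0.048 for the H100 SXM, $0.023 for the RTX 5090, and $0.041 for the L40S, which is why the 5090 wins on value per video.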

I have attached the spreadsheet for the data I collected for this with the generation times and costs.
