Hello everyone, I never thought I would be where I am today: grateful to my 12K followers, getting AI contract work, and most importantly, getting free stuff 😁. Before I get started, if you think making videos isn't something you can do, I recommend you give it a shot. I have a much more generalized article that I think is helpful if you've never made a video before:

Also, this is more AI related than video editing related, so maybe you'll find some of these tips helpful for things outside of video creation.
I wanted to write this article because I developed some new processes while working on a recent contract that I really feel brought my stuff to the next level.
Also, I'm pretty pumped about Vidu's new model, Q2. Full disclosure: I am on their creative partner program but they didn't ask me to write this, nor did they see it before publication.
I was provided a referral link; if you have a new account, use code: SCHMEDE
You will get 100 credits, which is about 4 ref2vid generations or about 3 Q2 cinematic img2vid generations. This code only works on a brand new account (I asked).
I was recently working on a video to show off what I'm able to make now, and halfway through, Vidu Q2 dropped. I may be biased, but I think Q2 is amazing, so I ended up starting over using Q2. This article will be about two things: tips about video creation and showing off Vidu Q2. At the time of writing (09/29/25) it isn't available onsite on Civitai yet, but I'm sure it will be at some point.
Vidu Q1/Q2 Comparison
First let me show you a side-by-side comparison of my first draft using Q1 and then what I was able to do with Q2:
If you want to see the full finished video:
While this isn't a direct comparison, there are many things I wouldn't have bothered trying until I saw how much Q2 could do. It shows what I knew how to do with Q1 versus what I've since learned Q2 is able to do.
For a fairer comparison, I recreated this video, a previous submission, with exactly the same generation data: same input image, same prompt. The point is, if I'd had Q2 then and known what it was capable of, I would have gone for a lot more:
My observation is that it's much more expressive and the prompt adherence is much better (being able to generate up to 8 seconds is nice too!).
The prompts I used for the original don't take advantage of what Q2 is capable of, but some of the things I'm impressed with can still be seen above:
0:58 - This is a mild example, but I've noticed Q2 has a more "cinematic camera"; it's really noticeable when you do fight scenes and the perspective shake syncs with an impact. Civit has trouble linking shorts, but here are some examples: example 1 | example 2
1:14 - I prompted for gunfire a few times originally and wasn't able to get it (you can do this with Q1, but this was early on when I was using it); it wasn't an issue at all for Q2.
1:57 - Actual crying
If you want to see all the generation data, I have a collection of the source clips, with the generation data attached to each post, here.

Tips and Tricks
Character Consistency
Now having done some contract work, I'm surprised by the number of people who don't know how to inpaint or train models, which is really important for meeting requirements that call for consistency and specifics. I started doing this before things like Nano Banana came out, and I still find myself liking the following process because it gives you a much greater degree of control and precision than just generating a new image when you get close to what you need, but not exactly. Like I said earlier, I am in the Vidu Creative Partner Program; I'm not saying only Vidu can do the following, but it's the tool I'm familiar with, and I'm sure you can apply these concepts to similar tools. I've had the luxury of trying a lot of different stuff with Vidu without cost (thank you, Vidu), so hopefully my findings will help you save some credits/buzz if you decide to use Vidu.
One thing to note: I believe using Vidu ref2vid on Civit requires a minimum of two images, whereas on Vidu's site you can use a single image for ref2vid, which has grown to be my preference (at the time of writing, 09/29/25, Q2 is only available for img2vid). With ref2vid you can change the aspect ratio of a generation while keeping all of the elements of the input image, and you have more flexibility with the video generation.
I like to use Vidu ref2vid to create a reference so that I can begin creating a consistent character. If you follow me for my models, you may have seen my Illustrious XL model that is based on my avatar, an image I generated a long time ago with ChatGPT and was never able to replicate, until Vidu ref2vid. I used the same method to create my Illustrious XL model:

This means that all you need is one image and you have the tools to create that character consistently. Here is the reference I used for my video:

I've found that naming the character in the description of the reference is useful when using multiple references: you can just refer to them by name in the prompt, and it helps ensure you get all the elements you want when two references share similar characteristics.
Reference Creation
That reference is already done, so it already has three different images from different angles showing the details that I want present on the character, but this is what I would prompt to get those different angles, so that the reference has a good understanding of what the character looks like no matter the angle:

Frame Extraction / Generation Data
This results in something like the following, where I can find the best frame out of all of these generations to build a strong reference:

I then use frame extraction to get the images for building the new reference:


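If you'd rather script this step than use an editor or the onsite tool, here's a minimal frame-extraction sketch using OpenCV (the file names and output folder are placeholders, not part of my actual setup):

```python
# Minimal frame-extraction sketch using OpenCV (pip install opencv-python).
# "clip.mp4" and the output folder are placeholders; swap in your own paths.
import cv2
from pathlib import Path

video_path = "clip.mp4"              # the generated video you want stills from
out_dir = Path("extracted_frames")
out_dir.mkdir(exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_idx = 0
while True:
    ok, frame = cap.read()           # frames come back as BGR numpy arrays
    if not ok:
        break
    # Save every 5th frame so you aren't drowning in near-duplicates;
    # adjust the step (or save everything) depending on the clip.
    if frame_idx % 5 == 0:
        cv2.imwrite(str(out_dir / f"frame_{frame_idx:04d}.png"), frame)
    frame_idx += 1
cap.release()
```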
I've made a bunch of videos with this reference, and I have the luxury of being a creative partner, but if you are trying to be as efficient as possible, I would leverage that one reference to create individual generations of the character in different settings and angles, focusing the prompt on rotating around the character, so that you can get far more usable data from a single generation. You can also reduce the number of images needed, if you have to, by using flip augmentation during training.
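If your trainer doesn't expose a flip-augmentation option, you can pre-flip the images yourself; here's a minimal sketch with Pillow (folder and file names are placeholders). Just skip this for characters with asymmetric features, since flipping mirrors them.

```python
# Quick flip-augmentation sketch with Pillow (pip install pillow).
# Only needed if your trainer doesn't offer flip augmentation itself,
# and don't use it if the character has asymmetric features
# (side ponytail, scar over one eye, etc.) that a mirror would break.
from pathlib import Path
from PIL import Image, ImageOps

src = Path("dataset")                      # placeholder: your training images
for img_path in src.glob("*.png"):
    flipped = ImageOps.mirror(Image.open(img_path))
    flipped.save(img_path.with_name(img_path.stem + "_flip.png"))
    # If you keep .txt caption sidecars, duplicate them so the flipped
    # image carries the same tags.
    caption = img_path.with_suffix(".txt")
    if caption.exists():
        caption.with_name(img_path.stem + "_flip.txt").write_text(caption.read_text())
```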
Model Training
What I do is quickly group the images by which features aren't visible (for example, I exclude 'blue eyes' from the batch tagging when she is facing away or her eyes are closed) as a quick-and-dirty training approach, like this:


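I do this grouping through batch tagging in the trainer, but if you keep kohya-style .txt captions locally, the scripted equivalent looks something like this (the folder name and tag are placeholders):

```python
# Strip a tag from every caption in a group where the feature isn't visible.
# This mirrors what I do with batch tagging in the trainer UI; the folder
# name and tag are placeholders for your own setup.
from pathlib import Path

folder = Path("dataset/facing_away")   # the group where the feature is hidden
tag_to_drop = "blue eyes"

for cap_path in folder.glob("*.txt"):
    tags = [t.strip() for t in cap_path.read_text().split(",")]
    kept = [t for t in tags if t and t != tag_to_drop]
    cap_path.write_text(", ".join(kept))
```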
I typically train models for Illustrious XL (I don't really do SDXL that much anymore), so I just trained this model like I do for Illustrious: I'm looking for about 50 images and train up to 1,000 steps (the number of steps before the cost starts increasing), with all default settings except for keeping 3 tokens (the max amount allowed by the Civit trainer for XL-based models).
The only things I'm changing from the default SDXL settings are increasing the number of repeats until the total steps are about 1,000, checking 'Shuffle Tags', and setting 'Keep Tokens' to 3.

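For picking the repeat count, here's the rough back-of-the-envelope math, assuming the usual kohya-style step count of images × repeats × epochs ÷ batch size (I haven't verified that this is exactly how the Civit trainer counts, so treat it as a ballpark):

```python
# Ballpark math for landing near 1,000 steps. Assumes the usual kohya-style
# formula (images * repeats * epochs / batch_size); treat it as an estimate
# rather than exactly what the Civit trainer reports.
images = 50
epochs = 10          # placeholder: whatever your trainer defaults to
batch_size = 4       # placeholder
target_steps = 1000

repeats = max(1, round(target_steps * batch_size / (images * epochs)))
steps = images * repeats * epochs // batch_size
print(f"repeats={repeats} -> ~{steps} steps")    # repeats=8 -> ~1000 steps
```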
The purpose of this is mostly to touch up still images so that I can use them as inputs for img2vid, because if I can really define the specific details I need present in the start and end frames, I can really anchor the character consistency.
I'm not too worried about the training quality, as I'm just using the model for inpainting, so it doesn't really need to be that good when the image I'm inpainting is basically 90% of what I need it to be anyway:

I've never actually used these models to generate images on their own outside of inpainting, but I think this one turned out pretty well. Here are some generations from that model:



Inpainting
The point of all of this is that if I have a scene where I feel like it's not quite there, I can extract frames and use my model to inpaint and fix whatever I feel is necessary:


This is most helpful for wider shots where the character is farther away, since some of that detail might not show up at a distance.
Here is a before and after example:
Before:

After:

I've found that clarity of detail is important: the longer the video generation goes, the more likely it is to lose the defining features if the details aren't very clear, especially for a wider, darker shot like this.
Inpainting is also very useful when the framing is correct but the detail is all lost (or to fix "mistakes"), like in this example:
Before:

After:

Daisy-chaining multiple clips for a long sequence
The concept I'm referring to here is taking the last frame from a video and using it as the first frame of the next. I'm sure some of you have seen examples of this with varying success.
If you haven't, and you give it a shot, what you'll find is that it's usually very obvious where one clip ends and the next begins, and it doesn't look very good. In my opinion, this is due to two things: first, the first frame of an img2vid generation will usually look slightly different than the subsequent frames; second, there is usually a very sudden change in character / camera movement.
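Grabbing that last frame as a clean still is the same frame-extraction trick as before; here's a minimal OpenCV sketch in case you want it scripted (the file name is a placeholder):

```python
# Grab the last frame of a clip to use as the start frame of the next
# generation. "clip_01.mp4" is a placeholder; frame counts can be slightly
# off for some codecs, so we step backwards until a frame actually reads.
import cv2

cap = cv2.VideoCapture("clip_01.mp4")
idx = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
ok, frame = False, None
while not ok and idx >= 0:
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    idx -= 1
if ok:
    cv2.imwrite("clip_01_last_frame.png", frame)
cap.release()
```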
The first long sequence in this video is three clips daisy-chained together. It still looks pretty obvious where it happened, but trust me, it looked a lot worse before I did any editing:
Getting it to that point is pretty simple; there are three things I think make it look a lot better:
Identify and Remove inconsistent frames
This one is pretty simple, and sometimes you get lucky and don't have any (you're less likely to have inconsistent first frames if the video isn't realistic). Sometimes it can happen on the last frame too, meaning the last few frames of the first clip have to be removed as well, so that the latest frame in the first clip and the earliest frame in the second clip are consistent in color.
I've tried just using color correction to get rid of this issue, but I'm not good enough at that to pull it off.
Just go frame by frame and compare the clips by playing through the end of the first and the beginning of the second until the color looks consistent across the transition. Sometimes the number of frames you have to cut out is just too much and it might be worth generating a new video instead, depending on which one is the bigger problem. This is hard to show visually without a video, but I'll have pictures for the next few tips.
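If you want a rough numeric check instead of purely eyeballing it, you can compare average frame colors across the cut. This is just my own quick heuristic, not a standard method; the window size and file names are placeholders, and your eyes still get the final say:

```python
# Rough helper for finding a color-consistent cut point between the end of
# clip A and the start of clip B. It compares the average BGR color of the
# last few frames of A against the first few frames of B and reports the
# closest pair; everything after/before those frames would get trimmed.
import cv2
import numpy as np

def mean_colors(path, indices):
    cap = cv2.VideoCapture(path)
    means = {}
    for i in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = cap.read()
        if ok:
            means[i] = frame.reshape(-1, 3).mean(axis=0)
    cap.release()
    return means

window = 6                                   # frames to consider on each side
cap = cv2.VideoCapture("clip_a.mp4")         # placeholder paths
count_a = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

end_a = mean_colors("clip_a.mp4", range(count_a - window, count_a))
start_b = mean_colors("clip_b.mp4", range(0, window))

best = min(
    ((a, b, float(np.linalg.norm(end_a[a] - start_b[b]))) for a in end_a for b in start_b),
    key=lambda t: t[2],
)
print(f"cut clip A after frame {best[0]}, start clip B at frame {best[1]} (diff {best[2]:.1f})")
```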
Cross dissolve the two clips
I can't believe I didn't think about this before, but even if it's not a seamless transition, a very short cross dissolve might be just enough to blend it all together.

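I do the dissolve in my editor, but if you'd rather script it, ffmpeg's xfade filter does the same thing. This sketch assumes both clips share the same resolution and frame rate; the paths, the 0.25 s duration, and the offset are placeholders:

```python
# Scripted cross dissolve with ffmpeg's xfade filter, as an alternative to
# doing it in the editor. Assumes both clips have matching resolution and
# frame rate; paths, fade duration, and offset are placeholders.
import subprocess

fade_duration = 0.25          # seconds of overlap between the clips
first_clip_length = 8.0       # seconds; the dissolve starts near the end of clip A
offset = first_clip_length - fade_duration

subprocess.run([
    "ffmpeg", "-y",
    "-i", "clip_a.mp4",
    "-i", "clip_b.mp4",
    "-filter_complex",
    f"[0:v][1:v]xfade=transition=fade:duration={fade_duration}:offset={offset}[v]",
    "-map", "[v]",
    "stitched.mp4",
], check=True)
```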
I've been trying to make stitched-together clips look better for a while now, basically since I started getting into making videos, and this has been far more successful than anything else I've tried (and I've tried a lot, considering how little I know about video editing).
Camera movement between clips
I realize now, after the fact, that it would be a good idea to plan out the camera movement first. A continuation of the movement is going to be hard, since hoping it moves the exact same way is just rolling the dice. I think a sudden change in camera movement is one of the most jarring and noticeable seams when stitching clips together.
In my opinion, it is better to plan a change in camera movement, and even then it can be very jarring because it's sudden. After realizing this, I think trying to smooth out that change in direction is what makes it look better:

I'm not sure how it's done exactly in DaVinci Resolve or whatever other popular video editing software you may use, but messing with the speed prior to the transition so that it slows down gradually makes the whole thing look a lot more natural.
Here I have the two clips nested, and you can see where I'm adjusting the speed. Another tip is to use effect controls on the subsequent clip to make the transition more natural. In the preceding clip, the camera is moving forward, and I was getting a very sudden stop on the next clip. It's still kind of sudden, but it's smoother than it was before. I'm zooming in a bit after the second clip starts; I had to play around with it until I found the right amount and duration, but the end result is a lot less jarring and conspicuous than it was before.

Removing small imperfections, frame-by-frame
This one is very simple, and it's nice when there are just a few random imperfections that you notice.
I noticed this imperfection at the bottom right of the clip for just a moment; it was only three frames. The brightness of the imperfection made it very noticeable, even though it was there for a very short amount of time.

I just exported that frame into an image:

Using Microsoft's image viewing app (I don't even know what it's called), I used the AI erase feature to get rid of it, and did the same for the other two frames:

Then I just put it over the clip in the timeline and it was like it was never there, super easy.

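I just drop the cleaned-up frames over the clip in the editor, but here's a scripted equivalent if you already know the frame indices. It re-encodes the clip with OpenCV, so expect a little quality loss; the paths and indices are placeholders:

```python
# Scripted version of dropping the cleaned-up frames back over the clip.
# I do this in the editor timeline; this sketch rewrites the whole clip
# with OpenCV, swapping in the fixed PNGs at the given frame indices.
import cv2

fixed = {                       # frame index -> cleaned-up PNG (placeholders)
    121: "fixed_0121.png",
    122: "fixed_0122.png",
    123: "fixed_0123.png",
}

cap = cv2.VideoCapture("clip.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("clip_fixed.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx in fixed:
        patched = cv2.imread(fixed[idx])
        if patched is not None:
            frame = cv2.resize(patched, (w, h))   # make sure sizes match
    out.write(frame)
    idx += 1

cap.release()
out.release()
```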
Anyways, I hope some of that was helpful to you. If you haven't been following my journey, I never did any video stuff before AI, and I got into AI during SDXL, so if you think you can't do stuff like this, I didn't know how to do any of this before. If you think this is all super basic and easy, then feel free to reach out to me and tell me how to do it better so I can learn 😂 I'm not an expert and I just want to make cool stuff.
If you like my stuff and want to support me, what would mean the most to me is if you followed my YouTube

or Twitter account

I never really developed them and I don't have much of a following there compared to my Civitai account, so that would be really helpful to me.
See you all next time ✌️


