Fine-tuning MidJourney Image Outputs with ChatGPT

By Laurynas Ramanauskas · 5 min read

MidJourney is great at vibes, but not details.

You might get the mood right, the camera angle spot-on, the colors exactly how you pictured them. But then the model gives your subject an extra finger. Or the lighting looks like it came from five different suns. You end up rerolling the prompt over and over, trying to guide it back toward what you wanted in the first place.

That’s where ChatGPT, especially with GPT-4o, becomes a shortcut. You can drop the image in, tell it what’s wrong, and ask it to adjust it to fix the issues. It’s fast, and more often than not, it gets you much closer on the next try.

Let’s Walk Through a Real Use Case

To make this less abstract, I’ll show you a real example of a DJ mid-performance that I was working on. 

The first version from MidJourney had a strong visual identity. It looked like a promo photo you’d see on a club lineup poster. At a quick glance everything felt right, but a closer look revealed its deficiencies: his headphones disappeared into the middle of his head, and his hands were missing fingers.

Here’s how the conversation with ChatGPT went:

1st prompt: “First of all the DJ is wearing botched headphones. I want you to put actual headphones on the DJ that have a cable which goes to the DJ equipment.”

The model added realistic headphones and even gave them a cable that connected naturally to the mixer. This was a big step up already, and it anchored the image in reality. Impressively, I didn’t even need to mention the finger issue; ChatGPT fixed that on its own.

However, it introduced a very “plasticky” feeling to the image, as the photorealistic effects such as ISO grain and depth-of-field blur disappeared. I needed to fix that.

2nd prompt: “Add grain to this image so it looks very similar to the original image I’ve uploaded.”

But ChatGPT gave me way too much grain. It looked like someone dumped an 80% noise filter across the whole thing. The style was gone. It felt too harsh.

3rd prompt: “It’s a bit too grainy. Can you tone it down a little bit?”

This time it backed off. The texture got better, but everything was still in focus, which made it feel like a 3D render.

4th prompt: “Now can you add depth of field/lens blur so the DJ equipment is gradually blurred towards the camera?”

I wanted the image to feel more realistic, like it was taken with a camera lens at a wide aperture. It completely missed this time and added blur that looked super unrealistic, like someone had just plopped a layer with a background-blur effect on top of the picture.

5th prompt: “I feel like it’s gaussian blur, not lens blur. Can you fix it?”

This fixed it. This is where specificity helps: if you have at least a surface-level understanding of photography, being specific about what you want beats writing three paragraphs of prompts. What it did introduce, however, was pixelation, which made the image look unrealistic again. A simple prompt fixed that too.

6th prompt: “The blur you added introduced weird pixelation, could you fix it?”

Finally, it got there. The final version had realistic focus, a believable sense of depth, and detail where it mattered. It wasn’t perfect, but it was close enough to use.

And we have our final image. It’s not perfect, but guess what, real photos aren’t either.

This kind of iterative back-and-forth might sound slow, but it took less than five minutes. The best advice I can give is to stick to simple formulas: “this looks weird, add/remove this” and “okay, try this instead.”

Why This Works So Well

ChatGPT understands structure. It’s not just guessing how a body should look. It’s trained on massive amounts of visual and anatomical context. So when it sees something that looks off, it can suggest ways to clean it up without killing the mood of the original image.

The workflow is simple:

  1. Generate your image in MidJourney.
  2. If something’s broken (hands, posture, lighting, etc.), download the image.
  3. Upload it to ChatGPT with a short message about what feels off.
  4. Iterate with small, specific requests.

Repeat as needed.
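If you end up doing this often, the same loop can be scripted. Here’s a minimal sketch using the official `openai` Python SDK and its Images edit endpoint; the model name, file paths, and prompt list are placeholders for illustration, not the exact setup from this article, and the SDK call shape may differ depending on your SDK version.

```python
# Sketch of the "upload and fix" loop via the OpenAI Images API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
import base64

# Placeholder complaints, one per iteration (paraphrasing the article's prompts).
FIX_PROMPTS = [
    "Put actual headphones on the DJ with a cable that goes to the equipment.",
    "Add grain so it looks very similar to the original image.",
    "It's a bit too grainy, tone it down a little.",
    "Add lens blur so the DJ equipment is gradually blurred toward the camera.",
]

def request_fix(image_path: str, complaint: str, out_path: str) -> None:
    """Send one image plus one short complaint; save the edited result."""
    from openai import OpenAI  # deferred so the sketch reads without the SDK
    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with open(image_path, "rb") as img:
        result = client.images.edit(
            model="gpt-image-1",  # placeholder model name
            image=img,
            prompt=complaint,
        )
    # The edited image comes back base64-encoded.
    with open(out_path, "wb") as out:
        out.write(base64.b64decode(result.data[0].b64_json))

# Usage: one complaint at a time, inspecting each result before the next.
# for i, prompt in enumerate(FIX_PROMPTS):
#     request_fix(f"dj_v{i}.png", prompt, f"dj_v{i + 1}.png")
```

The key design point matches the advice above: each call carries a single, specific complaint rather than one giant prompt describing everything at once.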

Things to Watch Out For

Like any workflow hack, there are a few quirks. Here’s what to expect.

1. ChatGPT can flatten the image style

Sometimes the improved version loses the little bits of charm – such as grain, depth of field, or lens glow. You can fix that by asking for things like “soft film grain” or “shallow lens blur”; if you forget, you might get something that feels too clean.

2. Don’t keep poking the same prompt

If it doesn’t work after a couple of tries, open a new chat and explain the problem from scratch. Clean prompts work better than patched-up ones; you’ll get further starting fresh than editing a broken prompt over and over.

3. Specific is good until it isn’t

If you try to over-control the outcome with a wall of description, the model gets confused. Try to fix one or two things at a time, and leave enough breathing room for the image to evolve naturally.

Final Thoughts

This workflow doesn’t replace your creativity. It just helps shape it into something more reliable. MidJourney creates the spark. ChatGPT helps you fine-tune it.

The next time your image is close but not quite right, drop it into ChatGPT and have a quick conversation about what went wrong. Chances are, the next version will be the one you were actually trying to make in the first place.