Reverse Transcription – Daniel Miessler

April 23, 2023

What happens when everyone can become a video star just by having a script?

Created/Updated: November 3, 2022

There are dozens of reasons to be bullish on AI right now, especially in the generative AI space where we have models producing so much extraordinary art. But I’m excited about something else that this technology is naturally evolving into, which I’m calling Reverse Transcription.

A bit of background

Right now there’s a massive push towards content creation and content creators. Millions of people are watching creators on TikTok, YouTube, Twitter, or Substack, and many of them, especially young people, are thinking they want that to be their career.

MKBHD doing his thing

But there’s a chasm between those who can write, those who can also make a podcast, and those who can also make videos. And then there are the unicorns who can do all of that, feature themselves in their videos, and still have some of the best production in the world. That’s people like MKBHD.

So, you know how generative AI can create images from text? Well, it can also do that with video. Here’s a company called Synthesia that’s already doing this commercially.

That’s an avatar speaking the text that you give it. And it looks like a real person.

Now imagine MKBHD doing that, but passing along what his studio background looks like, and what kind of t-shirt he’s wearing, and what stylization he wants in the video.

The future of video production

One of my professional videos with my RED camera and Neumann mic, wearing one of my merch shirts, speaking in my most energetic voice, excited, optimistic

So you pass it a script that you want it to read, along with this prompt, and a few seconds later you have a full video. With bokeh, with all the details that make it look like your own set. And the avatar on the screen looks exactly like you. The speech. The mannerisms. Everything.

How? Because you pointed it to all your previous videos, and it just figured out what “youness” actually means.

What’s so crazy about this is that if you need to cut a word out, add a sentence, or whatever, you just edit the script and resend it. Even better, you can change what you’re wearing, change the studio, or put yourself speaking from the beach.
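To make that edit-and-resend loop concrete, here is a minimal sketch of what a request to such a tool might look like. Everything in it is hypothetical: `RenderRequest`, its field names, and the placeholder video identifiers are made up for illustration and are not Synthesia's API or any real service's. It only models the inputs (script, style prompt, reference videos) and shows that each revision is a small text change; no video is actually rendered.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderRequest:
    """Hypothetical bundle of inputs for one text-to-video render."""
    script: str                             # the words the avatar should speak
    style_prompt: str                       # the look and delivery, in plain language
    reference_videos: tuple[str, ...] = ()  # past videos the model would learn "youness" from


first_cut = RenderRequest(
    script="Welcome back. Today we look at reverse transcription.",
    style_prompt="professional studio, merch shirt, energetic and optimistic voice",
    reference_videos=("my-previous-video-1", "my-previous-video-2"),  # placeholder names
)

# Cutting a word or adding a sentence is only a text edit to the script.
second_cut = replace(first_cut, script=first_cut.script + " Stick around to the end.")

# Changing the setting is a prompt edit; the script carries over unchanged.
beach_cut = replace(second_cut, style_prompt="speaking from the beach, same merch shirt")
```

The request is immutable and each revision is a copy, so earlier cuts stay around for comparison. That is a design choice for this sketch, not a claim about how any real product works.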

The hard parts of video production become easy, which will bring all the focus back to the content itself.

As the AI improves it’ll do the prompt engineering as well.

Of course, some people will still be better or worse at using these tools. And just like with AI art, the people who are best at it will be those who actually know how to make the stuff organically. But that will be most true in the beginning. The better the tech gets, the more that gap will close.

What we’re about to see is extraordinary.

The ability to go from text to a perfect podcast, or a perfect YouTube video. Without any audio or video work being done by the creator.

Think of how much new content is about to be created. And how it’s going to fundamentally change the creator space.
