The company behind the ChatGPT chatbot and the still-image generator DALL-E, OpenAI, made even more news this past Friday when the San Francisco-based generative AI firm announced that it had completed a deal valuing the artificial intelligence leader at a staggering $80 billion. The jaw-dropping figure is $50 billion greater than the Thrive Capital valuation released publicly in early 2023. Rather than a traditional funding round, the new deal lets OpenAI employees cash out their shares.
Suffice it to say, AI isn’t the technology of the future; it’s already here, driving the discussion surrounding everything digital today. Just take a look at Dr. Richard Kerr’s recent AI report from Davos for further proof.
OpenAI’s Sora sneak peek
So what is Sora, besides the Japanese word for sky, and why has it led to an astronomical increase in OpenAI’s valuation? Let’s take a look.
Sora can create videos of up to 60 seconds featuring:
- Highly detailed scenes translated from text.
- Lifelike renderings often indistinguishable from captured video.
- Astonishingly complex camera movement.
- Multiple characters with vibrant emotions.
Sora is an end-to-end diffusion transformer model, and its sample videos, released alongside their corresponding prompts, are so alarmingly lifelike, and its animation so splendid, that OpenAI has opted to share its latest technology only with a small group of researchers and academics, holding off on Sora’s release to the general public.
Instead, the company maintained it was still working to understand the system’s potential dangers, especially regarding political disinformation in a general election year. The aim was to “red team” Sora, a term borrowed from security practice for adversarially probing the ways a technology can be misused, all while offering an enticing glimpse of the platform’s capabilities in a bid to drum up enthusiasm. Or, in this instance, much more than enthusiasm.
In an effort to deter its detractors, OpenAI announced last week that its AI image-generation tools would soon add watermarks to all images, following the Coalition for Content Provenance and Authenticity (C2PA) standard, a Joint Development Foundation project formed through an alliance between Adobe, Arm, Intel, Microsoft, and Truepic.
Images generated by OpenAI’s online chatbot, ChatGPT, and its stand-alone image-generation technology, DALL-E, will include both a visible watermark and hidden metadata designed to identify the works as created by artificial intelligence. Competitors Google and Meta have pledged to match OpenAI’s efforts to combat misuse of their technologies.
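To make that two-layer approach concrete, here is a minimal Python sketch of the general idea, not OpenAI’s implementation. Real C2PA manifests are cryptographically signed structures that dedicated C2PA tooling embeds and verifies; the visible stamp and the `provenance` text chunk below are simplified, hypothetical stand-ins.

```python
# Illustrative only: real C2PA manifests are signed and parsed by dedicated
# tooling, not stored as a plain PNG text chunk like the one used here.
from PIL import Image, ImageDraw
from PIL.PngImagePlugin import PngInfo

def tag_image(src_path: str, dst_path: str) -> None:
    """Stamp a visible watermark and attach a hypothetical provenance field."""
    img = Image.open(src_path).convert("RGB")
    ImageDraw.Draw(img).text((10, img.height - 20), "AI-generated", fill=(255, 255, 255))
    meta = PngInfo()
    meta.add_text("provenance", "generator=DALL-E;standard=C2PA-like")  # hypothetical field
    img.save(dst_path, "PNG", pnginfo=meta)

def looks_ai_generated(path: str) -> bool:
    """Check for the hypothetical provenance field."""
    return "provenance" in Image.open(path).info
```

The caveat, and the reason C2PA pairs metadata with cryptographic signing, is that a visible mark can be cropped out and a plain metadata field stripped on re-save.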
How is Sora different?
Sora is built upon a foundation of prior research in generative modeling of visual data. Earlier work employed methods such as recurrent networks, generative adversarial networks (GANs), autoregressive transformers, and diffusion models. However, those models often focused on a narrow category of visual data, shorter videos, or videos of a fixed size. Sora surpasses these limitations, generating videos across diverse durations, aspect ratios, and resolutions.
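OpenAI’s technical report attributes this flexibility to “spacetime patches”: a video of any shape is cut into fixed-size space-time blocks that become a variable-length token sequence for the diffusion transformer. The PyTorch sketch below illustrates only that tokenization step; the patch sizes are illustrative, not Sora’s actual hyperparameters.

```python
import torch

def spacetime_patches(video: torch.Tensor, pt: int = 4, ph: int = 16, pw: int = 16) -> torch.Tensor:
    """Cut a (T, C, H, W) video into (num_patches, pt * ph * pw * C) tokens."""
    T, C, H, W = video.shape
    # Trim so each axis divides evenly into patches (padding would also work).
    video = video[: T - T % pt, :, : H - H % ph, : W - W % pw]
    t, h, w = video.shape[0] // pt, video.shape[2] // ph, video.shape[3] // pw
    return (
        video.reshape(t, pt, C, h, ph, w, pw)
             .permute(0, 3, 5, 1, 4, 6, 2)  # -> (t, h, w, pt, ph, pw, C)
             .reshape(t * h * w, -1)        # one token per spacetime patch
    )

# Clips of different durations and aspect ratios just yield different
# numbers of tokens -- the transformer never needs a fixed video size.
print(spacetime_patches(torch.randn(16, 3, 128, 256)).shape)  # (512, 3072)
print(spacetime_patches(torch.randn(60, 3, 256, 128)).shape)  # (1920, 3072)
```

Because the model consumes a token sequence rather than a fixed grid, diverse durations, aspect ratios, and resolutions simply produce longer or shorter sequences.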
Duration and image quality aside, AI video generation has faced challenges maintaining consistency and reproducibility across different scenes. Earlier text-to-video models had a difficult time interpreting multiple perspectives and points of view from a user’s prompt describing a scene, circumstance, or location.
However, Sora can generate videos with dynamic camera motion. As the camera shifts and rotates, people and scene elements move consistently through three-dimensional space. This results in a host of sweeping tracking, dolly, and aerial shots as well as multiple vantage points.
Furthermore, Sora’s video generation model maintains narrative consistency by combining a deep understanding of language with visual context, allowing the platform to interpret prompts accurately.
Simply put, Sora possesses the comprehensive knowledge of language necessary to capture and replicate the emotions and personalities of characters from its given prompts, portraying its videos’ subjects as expressive characters. This is a quantum leap forward in Gen AI video construction. It’s as if Sora functions as its own autonomous filmmaker, depicting a script laid out by the user and magically evoking a self-contained narrative with an Aristotelian emotional journey.
In fact, this may be the key to fully comprehending the public’s awestruck response to the recently revealed technology. A written prompt can now be imbued with more than subjects and actions; video generation can now distinguish the adverbs and adjectives that evoke and replicate, or, in the case of animated characters, personify, human emotion.
Game-Changing Gen AI: Beyond Prompt-to-Video
While OpenAI’s Sora has been promoted as a Gen AI technology for creating video sequences in mere seconds, Sora, or its underlying technology, may be poised to make an even more significant impact worldwide.
The impact of this breakthrough is expected to span many aspects of video creation, and it may well evolve beyond video into 3D modeling. If it does, not only video but also the visuals in virtual spaces like the metaverse could soon be generated in real time by AI.
Currently, Sora is perceived as ‘merely’ a video generation model, but some early prognosticators analyzing its underlying technologies have suggested Sora might amount to a data-driven physics engine. That is, a model that has culled knowledge from a vast amount of real-world video might, akin to Unreal Engine, come to understand physical laws and phenomena.
If so, the emergence of text-to-3D modeling in the near future is probable.
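For a purely conceptual picture of what “data-driven physics engine” means, contrast a hand-coded update rule with a learned one. This is not Sora’s code; `LearnedDynamics` and its dimensions are hypothetical stand-ins for a model that infers its update rule from video alone.

```python
import torch
import torch.nn as nn

def explicit_step(pos: torch.Tensor, vel: torch.Tensor, dt: float = 1 / 30):
    """Explicit engine: gravity on a 2D point mass, written down by hand."""
    vel = vel + torch.tensor([0.0, -9.81]) * dt
    return pos + vel * dt, vel

class LearnedDynamics(nn.Module):
    """Hypothetical learned engine: the next state is predicted from data,
    with no physical law coded anywhere."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.GELU(), nn.Linear(256, dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return state + self.net(state)  # residual next-state prediction
```

If a video model has implicitly learned the equivalent of `explicit_step` from footage alone, text-to-3D generation and real-time simulation become natural extensions rather than separate problems.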
What does Sora mean for the creative class?
Our take on AI in general at Worky has been that it’s a tool, not a replacement for human talent, especially creative talent. But seeing Sora in action stretches the limits of that definition. With high-quality video, inclusive of narrative storytelling and emotion, a mere prompt away, one can certainly see a world where AI begins to replace a number of skilled human creatives. So far, where the written word is concerned, AI has merely eliminated the blank page. A thoughtful article that is insightful and instructive, not just a roundup of readily available research, still requires a human touch.
Technology has a long history of enabling and democratizing creation from the bottom up.
Initially, we expect this type of tool to improve and accelerate creative work. While this may be a quicker evolution of Gen AI than expected, the ability to cut costs while simultaneously ramping up output means production across social channels, advertising, gaming, and even engineering and product design may never be the same. Generating content, simulations, prototypes, and models in any field that benefits from affordable 2D or 3D work suddenly feels attainable.
Creating visual experiences quickly and affordably means brands and agencies can bring more production in-house. It also enables a new set of capabilities among creators, visual effects professionals, and animators. There will be advances and implications for PropTech, MedTech, and other sectors where the physical and digital worlds are already converging.