The Future of AI Video (I)

The first part of this article discusses the industry milestones and the future of video generated with artificial intelligence (AI).

By: PhD. Luis Fernando Gutiérrez Cano and Mag. Luis Jorge Orcasitas Pacheco*

In a recent announcement, OpenAI introduced Sora, a model that translates the textual description provided by the user into a realistic video of the imagined scene. The prompts can range from general descriptions to detailed instructions that specify every element of the video.

OpenAI's demonstration showed how the technology works and the advances made during its research and development, setting a significant milestone for AI in video generation. In the image above, an elegant woman walks down a Tokyo street filled with warm neon lights and lively street signage. She wears a black leather jacket, a long red dress, black boots, and a black bag, accessorized with sunglasses and red lipstick. She walks confidently and casually, while the damp, reflective street creates a mirror effect of the colored lights. Numerous pedestrians pass through the center of Tokyo.

In the videos that OpenAI has published at https://openai.com/research/video-generation-models-as-world-simulators, you can see striking temporal coherence, an exceptional level of detail in each element, convincing reflections, and consistent camera perspectives. However, shortcomings are also visible, such as incoherent movements or the appearance of spurious elements.

That said, the results suggest that scaling video generation models is a promising path toward building general-purpose simulators of the physical world.

Real-world simulators
Large-scale video generation models are promising tools for building general-purpose simulators of the physical world, as OpenAI states. A prominent example of their potential is the recreation of complex phenomena such as waves, which have often been difficult to depict, as James Fam, a route-to-market executive (https://www.linkedin.com/in/james-fam-0b6a8916a/), points out. Models like Sora can render them with remarkable realism, as can be seen in the image below. Although still under development, these AI models show great potential for generating realistic simulations of the physical world.

The article Video generation models as world simulators (OpenAI, 2024) explains that the technology relies on large-scale training of generative models on video data and uses an architecture that operates on spatio-temporal patches of latent codes, which allows it to generate up to one minute of high-quality video. This innovation could transform content production by allowing anyone to create high-quality videos efficiently from simple instructions. However, its capacity to generate misinformation and to call authorship into question poses important ethical and economic dilemmas. Figure 3 shows the procedures and reference criteria that each development provides to the audiovisual industry and to the users of this innovation.

The success of large language models is due in part to the use of tokens, units that unify various forms of text such as code, mathematical expressions, and several natural languages. In this context, OpenAI's report considers how generative models of visual data can inherit these benefits. Whereas language models work with text tokens, Sora relies on visual patches, which prove to be a scalable and effective representation for training generative models on a wide variety of images and videos.
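As an illustration only, the idea of cutting a video into spatio-temporal patches can be sketched in a few lines of Python. The function name, the patch sizes, and the toy clip below are assumptions made for this example. OpenAI has not published Sora's actual patch dimensions, and Sora applies the idea to compressed latent codes rather than to raw pixels as done here.

```python
from typing import List

Video = List[List[List[float]]]  # indexed as video[frame][row][column]


def to_spacetime_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a video into non-overlapping pt x ph x pw blocks and flatten each one."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # One patch spans several frames as well as a region of each frame.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                patches.append(patch)
    return patches


# Hypothetical grayscale clip: 4 frames of 4 x 6 pixels, each pixel with a unique value.
clip = [[[float(t * 100 + y * 10 + x) for x in range(6)] for y in range(4)] for t in range(4)]
tokens = to_spacetime_patches(clip, pt=2, ph=2, pw=3)
print(len(tokens), len(tokens[0]))  # number of patches, values per patch
```

Each flattened patch plays the role that a text token plays in a language model: the clip becomes a sequence that a transformer can process.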

It is important to note that Sora is a diffusion transformer (a transformer is a type of neural network architecture that transforms an input sequence into an output sequence), and models of this type have proven well suited to video creation. As training progresses, there is a noticeable improvement in the quality of video samples generated with fixed seeds and inputs. A seed is the initial value used to start an algorithmic generation process; keeping it fixed ensures that the process starts consistently and reproducibly every time it is run, so that differences between samples reflect the training rather than chance.
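The role of a fixed seed can be shown with Python's standard random module. This is a minimal sketch, assuming that generation starts from Gaussian noise; the function name and the seed values are hypothetical and say nothing about how Sora itself is seeded.

```python
import random
from typing import List


def initial_noise(seed: int, size: int) -> List[float]:
    """Draw `size` Gaussian samples from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=1234, size=5)
same_b = initial_noise(seed=1234, size=5)
other = initial_noise(seed=4321, size=5)

print(same_a == same_b)  # same seed: identical starting noise
print(same_a == other)   # different seed: different starting noise
```

With the starting noise held constant, two checkpoints of the same model can be compared on equal terms.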

Previous methods for image and video generation often resized and cropped videos to a standard size. Training on data at its native size, by contrast, has been found to offer several advantages. Sampling flexibility allows Sora to create content for different devices directly at their native aspect ratios, and to prototype content quickly at smaller sizes before generating at full resolution, all with the same model. Training on videos at their native aspect ratios has also been found to improve composition and framing.
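One way to picture why a patch-based model can handle different sizes is to count how many patches each format produces. The sketch below is an assumption-laden illustration: the patch size of 4 frames by 16 by 16 pixels and the 48-frame clip length are hypothetical values chosen for the example, not figures published by OpenAI.

```python
from typing import Tuple


def patch_grid(width: int, height: int, frames: int,
               patch: Tuple[int, int, int] = (4, 16, 16)) -> Tuple[int, int, int]:
    """Return how many patches fit along time, height, and width."""
    pt, ph, pw = patch
    # Ceiling division: a partial patch at the border would need padding.
    return (-(-frames // pt), -(-height // ph), -(-width // pw))


def token_count(width: int, height: int, frames: int) -> int:
    """Total number of patches (tokens) for a clip of the given size."""
    t, h, w = patch_grid(width, height, frames)
    return t * h * w


for name, (w, h) in {"landscape": (1920, 1080),
                     "portrait": (1080, 1920),
                     "small preview": (480, 270)}.items():
    print(name, patch_grid(w, h, 48), token_count(w, h, 48))
```

Under these assumptions, landscape and portrait clips differ only in the shape of the patch grid, and a small preview simply yields a much shorter sequence for the same model.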

In addition to generating video from text, Sora can also be prompted with other inputs, such as pre-existing images or videos. This capability expands the range of image and video editing tasks that Sora can perform, such as creating perfectly looping videos, animating static images, and extending videos forward or backward in time. So far, Sora is not available to the general public, because it is at a crucial stage in which it is essential to ensure that it will not be misused and that it will be safe for future users. In addition, it has not yet been confirmed whether Sora will have a pricing plan or a free version.

The video generation offered by Sora is presented as a versatile solution for fields such as creative content production, marketing, education, and entertainment. Sora is emerging as a strong option for those looking to produce high-quality video content efficiently. By leveraging artificial intelligence models, it could significantly speed up video creation, making it a useful tool for advertising agencies, marketing departments, educational institutions, and entertainment industry professionals.

The integration of artificial intelligence in video generation, as evidenced by Sora, marks a significant milestone in technological evolution. From its modest beginnings with ELIZA to current generation models, artificial intelligence continues to redefine the concept of creativity and digital interaction, promising a future full of digital possibilities and tools.

Significant advancement in every development for the industry.
Sora represents a notable advance in generative models of visual data: it is a generalist model capable of producing videos and images across a wide range of durations, aspect ratios, and resolutions, up to a full minute of high-definition video. This progress not only expands the creative and production possibilities in the AV industry, but also paves the way toward more accurate and realistic simulations of the physical world.

First milestone for the audiovisual sector: This advance represents a breakthrough for the audiovisual sector, with a series of benefits, challenges, and opportunities. A video compression network reduces the dimensionality of visual data, which can improve Sora's efficiency and performance in content generation. However, compression poses challenges for quality and for interpreting (decoding) the compressed data. Despite these challenges, the technique lets Sora generate high-quality visual content within a compressed latent space, which could lead to significant advances in the AV industry.
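To make the notion of reducing the dimensionality of visual data concrete, the following sketch shrinks a toy clip by averaging neighboring pixels across space and time. Average pooling is only a stand-in chosen for this example: Sora's compression is a trained neural network whose details are not public, and the function names and sizes here are hypothetical.

```python
from typing import List

Video = List[List[List[float]]]  # indexed as video[frame][row][column]


def average_pool(video: Video, ft: int, fs: int) -> Video:
    """Shrink a video by averaging blocks of ft frames x fs rows x fs columns.

    A crude stand-in for a learned encoder; sizes must divide evenly.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % ft or height % fs or width % fs:
        raise ValueError("video dimensions must be divisible by the pooling factors")
    latent = []
    for t0 in range(0, frames, ft):
        plane = []
        for y0 in range(0, height, fs):
            row = []
            for x0 in range(0, width, fs):
                block = [
                    video[t][y][x]
                    for t in range(t0, t0 + ft)
                    for y in range(y0, y0 + fs)
                    for x in range(x0, x0 + fs)
                ]
                row.append(sum(block) / len(block))
            plane.append(row)
        latent.append(plane)
    return latent


def count_values(video: Video) -> int:
    """Number of scalar values stored in the video."""
    return sum(len(row) for frame in video for row in frame)


# Hypothetical clip: 4 frames of 4 x 4 pixels.
clip = [[[float(t * 100 + y * 10 + x) for x in range(4)] for y in range(4)] for t in range(4)]
latent = average_pool(clip, ft=2, fs=2)
print(count_values(clip), "->", count_values(latent))
```

The compressed version is eight times smaller, but the fine detail inside each block is lost, which is the quality trade-off described above. A learned encoder and decoder aim to keep far more of that detail at a similar reduction.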

Scaling transformers for video generation
Sora is a diffusion model: given noisy input patches, it is trained to predict the original "clean" patches. As a diffusion transformer, it exhibits remarkable scaling properties for video generation, with sample quality improving markedly as training compute increases (Figure 5).
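The training objective of predicting "clean" patches from noisy ones can be illustrated with a toy example. The noise formula below is the variance-preserving blend commonly used in diffusion models and is an assumption here, as are the patch values and function names; OpenAI has not published Sora's noise schedule or loss function.

```python
import math
import random
from typing import List


def add_noise(clean: List[float], noise: List[float], alpha: float) -> List[float]:
    """Blend a clean patch with noise; alpha=1 keeps it clean, alpha=0 gives pure noise."""
    a, b = math.sqrt(alpha), math.sqrt(1.0 - alpha)
    return [a * c + b * n for c, n in zip(clean, noise)]


def mse(predicted: List[float], target: List[float]) -> float:
    """Mean squared error between a prediction and the clean target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


rng = random.Random(0)
clean_patch = [0.2, 0.8, 0.5, 0.1]  # hypothetical flattened patch
noise = [rng.gauss(0.0, 1.0) for _ in clean_patch]
noisy_patch = add_noise(clean_patch, noise, alpha=0.5)

# A trained network would map noisy_patch (plus conditioning such as a text
# prompt) to an estimate of clean_patch. Two stand-in "predictions" follow.
loss_if_perfect = mse(clean_patch, clean_patch)    # ideal denoiser
loss_if_untrained = mse(noisy_patch, clean_patch)  # "model" that returns its input unchanged

print(loss_if_perfect, loss_if_untrained)
```

Training pushes the loss from the second value toward the first. At generation time the model starts from noise and removes it step by step.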

*Luis Fernando Gutiérrez Cano and Luis Jorge Orcasitas Pacheco are professors and researchers at the Universidad Pontificia Bolivariana, Medellín campus, in the undergraduate and postgraduate programs of the Faculty of Social Communication-Journalism. This installment was prepared with the support of students Laura Sofía Arboleda Ortega and Mariana Giraldo Correa.

