
The Importance of AI Driven Video


According to TrustRadius, since the Covid-19 outbreak began, the web and video conferencing category for business technology saw a 500% increase in buyer activity. In addition, 67% of companies increased their spending on video conferencing. These statistics, among others, show the need for video in today’s dispersed digital landscape.

Video is one thing that can still bring us together, despite the limitations of an online work world. In the previous blog post, we defined what an incidental content creator is and showed that there are unprecedented amounts of content out there with untapped potential. We also discussed the current shortcomings in video organization and tagging. Following up on that, let’s dive deeper into the alternative solutions available in this brave new world of computerized communication, and how artificial intelligence (AI) applied to video can help.

The Basics of How Ziotag AI Works

AI is a broad technology sector with many facets and applications. One of AI’s most valuable capabilities is the ability to ‘codify’ what a subject matter expert might do in a given situation or job function and then insert that choice or decision directly into a process. That is a significant part of the definition of AI-driven video: we use AI technology to give viewers an experience that is more directed, controllable, shareable and actionable. This is done by inserting AI into various parts of the recording and viewing process. In Ziotag’s case, the process begins by taking the audio from a video and creating a word transcription. This is nothing new in video technology, but it is an essential first step for us. What is cutting-edge is the ability to transcribe not only the spoken word but also the text derived from the visual elements in the video.

Ziotag does this using OCR (optical character recognition) technology, which reads still snapshots from your video and determines whether there is relevant text to extract, such as text on PowerPoint slides or words written on a whiteboard behind the speaker. Ziotag’s AI combines the audio transcription with the text derived from OCR to gather the content it needs to work its magic.
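To make the idea of combining the two text sources concrete, here is a minimal Python sketch. It assumes that speech recognition and OCR have already run and that each produced a list of (start time in seconds, text) pairs; the `Segment` class and the `merge_streams` function are hypothetical names, not part of any Ziotag API. The sketch simply interleaves both sources on one timeline and drops OCR text that repeats across sampled frames, such as a slide left on screen.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    source: str   # "audio" (speech transcript) or "ocr" (on-screen text)
    text: str


def merge_streams(transcript, ocr_frames):
    """Combine transcript segments and OCR text into one time-ordered list.

    Both inputs are lists of (start_seconds, text) pairs. OCR text identical
    to the previous non-empty OCR result (e.g. a slide left on screen) is
    kept only once, and empty OCR results are dropped.
    """
    merged = [Segment(start, "audio", text) for start, text in transcript]
    last_ocr = None
    for start, text in sorted(ocr_frames):
        cleaned = text.strip()
        if cleaned and cleaned != last_ocr:
            merged.append(Segment(start, "ocr", cleaned))
            last_ocr = cleaned
    # Order by time; at equal timestamps, "audio" sorts before "ocr".
    merged.sort(key=lambda seg: (seg.start, seg.source))
    return merged


if __name__ == "__main__":
    transcript = [(0.0, "Welcome to the quarterly review."),
                  (12.5, "Revenue grew this quarter.")]
    ocr_frames = [(10.0, "Q3 Revenue"), (11.0, "Q3 Revenue"), (20.0, "")]
    for seg in merge_streams(transcript, ocr_frames):
        print(f"{seg.start:6.1f}  {seg.source:<5}  {seg.text}")
```

A real pipeline would also need fuzzy de-duplication, since OCR output for the same slide can differ slightly from frame to frame.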

This all-encompassing view of the video content allows Ziotag’s AI to understand it more deeply and to recognize not only words but entire contextual concepts. So, rather than simply matching syntax against a hard-coded script, it builds a cohesive and comprehensive sentence model, fleshing out a concept that can then be applied. The AI also detects speaker changes from the image. Then, informative sections and chapters are created based on changes in the topic being discussed, timestamping the different groups of conceptual text. This makes for a significantly more accurate classification of all of the content in the video.
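Topic-based chaptering can be illustrated with a much simpler technique than the deep learning model described here. The sketch below is an assumption, not Ziotag’s method: it turns each sentence into a bag-of-words vector and starts a new chapter whenever a sentence’s cosine similarity to the chapter so far falls below a hypothetical `threshold` parameter.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on",
             "we", "it", "this", "that", "for", "our"}


def bag_of_words(sentence):
    """Count the non-stopword words in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def segment_chapters(sentences, threshold=0.1):
    """Group consecutive sentences into chapters, splitting on topic shifts."""
    chapters, current, current_vec = [], [], Counter()
    for sentence in sentences:
        vec = bag_of_words(sentence)
        # A sentence that shares too little vocabulary with the chapter so far
        # is treated as the start of a new topic.
        if current and vec and cosine(vec, current_vec) < threshold:
            chapters.append(current)
            current, current_vec = [], Counter()
        current.append(sentence)
        current_vec.update(vec)
    if current:
        chapters.append(current)
    return chapters
```

Word overlap misses paraphrases ("orca" vs. "killer whale"), which is exactly the gap that learned sentence vectors and an ontology are meant to close.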

How Artificial Intelligence Makes a Difference in Video

Ziotag’s AI looks at the transcription, OCR-defined timestamps, and what are known as ontology tags. This multi-faceted model is built using a form of AI known as deep learning. Deep learning loosely imitates the way the human brain processes data and creates patterns for use in decision-making.

An ontology is a set of concepts and categories in a subject area or domain that shows their common properties and the relations between them. Ziotag’s AI uses this ontology to create tags within video media, starting with an inventory of all the different terms that people might use when talking about a given topic. Ziotag’s ontology covers more than 50,000 concepts and how they relate to each other.
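An ontology can be modeled as a graph whose nodes are concepts and whose edges are relations. This minimal sketch uses a hand-written, hypothetical six-concept `ONTOLOGY` dictionary built from marine-biology terms; `related_concepts` walks the graph breadth-first to find concepts within a given number of hops, and `tag_text` tags a piece of text with the concepts it mentions verbatim. A production ontology would also store relation types and synonyms, which this sketch omits.

```python
from collections import deque

# Toy ontology: each concept maps to the concepts it is directly related to.
# These entries are hypothetical examples, not Ziotag's data.
ONTOLOGY = {
    "killer whale": {"orca", "whale pod", "marine biology"},
    "orca": {"killer whale", "food chain"},
    "whale pod": {"killer whale"},
    "marine biology": {"biology", "killer whale"},
    "food chain": {"biology", "orca"},
    "biology": {"marine biology", "food chain"},
}


def related_concepts(concept, max_hops=2):
    """Breadth-first search: map each concept within max_hops to its hop distance."""
    seen = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in ONTOLOGY.get(current, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    del seen[concept]  # exclude the starting concept itself
    return seen


def tag_text(text):
    """Return, sorted, the ontology concepts that appear verbatim in the text."""
    lowered = text.lower()
    return sorted(c for c in ONTOLOGY if c in lowered)
```

Hop distance gives a crude notion of relatedness: a direct neighbour such as "orca" is treated as closer to "killer whale" than "biology", which is two hops away.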

For example, if someone were having a conversation about killer whales, then orca, biology, food chain, or whale pod could be subjects that relate to that topic. For every sentence in the transcription, the Ziotag AI creates a vector: in other words, a point in space that acts like an arrow pointing toward all the related terms the sentence could be connected to.
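The sentence-vector idea can be sketched with toy numbers. In the code below, the three-dimensional `WORD_VECTORS` and `CONCEPT_VECTORS` tables are invented for illustration (a trained deep learning model would learn hundreds of dimensions from data), a sentence vector is the average of its known word vectors, and concepts are ranked by cosine similarity, that is, by how closely they point in the same direction as the sentence. Averaging word vectors is a simple stand-in, not a description of Ziotag’s model.

```python
import math
import re

# Invented 3-dimensional embeddings; the axes loosely stand for
# (ocean, life science, finance). A trained model learns these from data.
WORD_VECTORS = {
    "orca": [0.9, 0.6, 0.0],
    "whale": [0.9, 0.5, 0.0],
    "hunt": [0.4, 0.7, 0.0],
    "seals": [0.8, 0.5, 0.0],
    "earnings": [0.0, 0.0, 1.0],
    "revenue": [0.0, 0.1, 0.9],
}
CONCEPT_VECTORS = {
    "killer whale": [1.0, 0.5, 0.0],
    "food chain": [0.5, 1.0, 0.0],
    "sales numbers": [0.0, 0.0, 1.0],
}


def sentence_vector(sentence):
    """Average the vectors of the sentence's known words (zero vector if none)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def nearest_concepts(sentence, k=2):
    """Return the k concepts whose vectors point most nearly the same way as the sentence."""
    vec = sentence_vector(sentence)
    ranked = sorted(CONCEPT_VECTORS,
                    key=lambda c: cosine(vec, CONCEPT_VECTORS[c]),
                    reverse=True)
    return ranked[:k]
```

With these toy numbers, a sentence about an orca hunting seals lands closest to "killer whale" and then "food chain", even though neither phrase appears in it.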

The deep learning process goes further, analyzing related text, organizing it into a titled "chapter" and outputting the phrase or blurb best suited as that chapter’s title. The model recognizes when the context has changed and starts a new chapter accordingly. From there, an entire table of contents is generated for each video recording, with all of the information accurately and efficiently organized. This ability to understand and classify video content is what we call the Ziotag "Media Contextualization Engine" (MCE).
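Once chapters exist, assembling a table of contents is mostly bookkeeping. The sketch below assumes chapters arrive as (start time in seconds, list of sentences) pairs and, as a deliberately crude stand-in for the deep learning title generator described here, titles each chapter with its most frequent non-stopword terms. The function names and the `n_words` parameter are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on",
             "we", "it", "this", "that", "for", "our"}


def chapter_title(sentences, n_words=2):
    """Title a chapter with its n most frequent non-stopword words."""
    counts = Counter(
        w
        for s in sentences
        for w in re.findall(r"[a-z]+", s.lower())
        if w not in STOPWORDS
    )
    # most_common breaks ties by first appearance in the chapter.
    top = [word for word, _ in counts.most_common(n_words)]
    return " ".join(top).title() if top else "Untitled"


def table_of_contents(chapters):
    """chapters: list of (start_seconds, [sentences]) in playback order."""
    lines = []
    for start, sentences in chapters:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"{minutes:02d}:{seconds:02d}  {chapter_title(sentences)}")
    return lines


if __name__ == "__main__":
    chapters = [(0, ["Orcas hunt seals.", "Orcas live in pods."]),
                (125, ["Revenue grew.", "Revenue beat forecasts."])]
    print("\n".join(table_of_contents(chapters)))
```

Each entry pairs a timestamp with a title, which is what lets a viewer jump straight to a chapter instead of scrubbing through the recording.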

The MCE lets you think about videos the way you think about books. When you’re looking for specific information on a certain topic, you rarely want to read every page of every book. The table of contents lets you find the chapter on that topic and get to the content you’re looking for more efficiently.

The MCE also allows for superior indexing. Say you need to find a topic but can’t remember the exact words or terms that were used. Using its ontology, the MCE takes the descriptors you give it, such as keywords, and finds the meaning vectors related to the subject you are referring to. For example, if you search ‘Reported earnings are X’, the Ziotag AI will understand that you are looking for sales numbers related to X and show you all of the results in order of relevance.
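Meaning-based search can be approximated by expanding a query with ontology-related terms before matching. In this minimal sketch, the `RELATED_TERMS` map is a hypothetical stand-in for the ontology: words the user typed get weight 1.0, related terms get 0.5, and chapters are returned in order of their summed weights. This term-overlap score is a simplification; a vector-similarity ranking like the one the MCE is described as using could replace it.

```python
import re

# Hypothetical related-term map standing in for a real ontology.
RELATED_TERMS = {
    "earnings": {"revenue", "sales", "profit"},
    "revenue": {"earnings", "sales"},
    "orca": {"killer", "whale"},
}


def terms(text):
    """Lowercase alphabetic tokens of a string."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query):
    """Weight the user's own words 1.0 and ontology-related words 0.5."""
    weights = {}
    for word in terms(query):
        weights[word] = max(weights.get(word, 0.0), 1.0)
        for related in RELATED_TERMS.get(word, ()):
            weights[related] = max(weights.get(related, 0.0), 0.5)
    return weights


def search(query, chapters):
    """Rank (start_seconds, text) chapters by the summed weight of matching terms."""
    weights = expand_query(query)
    results = []
    for start, text in chapters:
        words = set(terms(text))
        score = sum(w for term, w in weights.items() if term in words)
        if score > 0:
            results.append((start, text, score))
    # Highest score first; ties keep playback order.
    results.sort(key=lambda r: (-r[2], r[0]))
    return results


if __name__ == "__main__":
    chapters = [(0.0, "Welcome and agenda"),
                (95.0, "Sales grew in the third quarter"),
                (240.0, "Profit and revenue outlook")]
    # [(240.0, 'Profit and revenue outlook', 1.0), (95.0, 'Sales grew in the third quarter', 0.5)]
    print(search("reported earnings", chapters))
```

Neither matching chapter contains the word "earnings"; both are found only through the related terms, which is the behavior the paragraph describes.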

Ziotag AI Applied 

Through this systematized technology, you have a library of content that can be absorbed in whichever way is most convenient for you: expanding chapters, searching subjects, reading, or listening at your pleasure. Since everything is represented as meaning vectors, you can tailor the organization of your video to your needs. Rather than searching along the timeline, as has been typical in video to date, you can search by the things most interesting to you, such as images, topics, ontology tags, keywords, or meanings. As the AI gets to know a person, it will increasingly customize the user experience, improving with every interaction.

This can be extended further to the enterprise or university level. At universities, lectures can be recorded and archived, making all of the subjects covered easily searchable and shareable by professors and students alike. In the enterprise, meaning vectors can be assigned to job descriptions, important topics from business meetings, or even the things that employees discuss on a daily basis. The videos can then be grouped and reused across the company, for example to train new hires, update employees in current roles, or bring an employee who missed an important meeting up to speed.

Ziotag for the Future 

Ziotag’s combination of technologies and AI is yet another multiplier on how quickly people can learn from video. With the ability to compartmentalize information accurately, search topics easily, and absorb video content rapidly, AI applied to video technology will propel the remote work world into the future.
