NJIT Hosts Multimedia Retrieval Forum to Make Smarter Info Networks
New Jersey Institute of Technology hosted the 12th ACM International Conference on Multimedia Retrieval, drawing in-person and remote participants for presentations on critical societal and technical questions in the art of searching for rich media online.
It's easy to find single-type data such as text, whether you're using a consumer tool like Google or a professional application built for investigators and scientists, but the state of the art for retrieving multimedia data is murkier, experts at the conference said. Years of experiments and papers have produced simple features, such as grouping similar pictures into albums, and perhaps intelligence applications that can't be widely discussed, amid a sea of academic research.
Alejandro James, chief scientist at New York-based Dataminr, cautioned attendees to learn from iconic mistakes made over the past few decades of multimedia search-and-retrieval research.
James noted that information retrieval is often confused with simple searching and browsing, rather than being properly understood as the art and science of finding accurate results. He cited the example of a user searching for pictures of cats by submitting a typical line drawing. The software would comically fail at Internet Pictionary, he joked, because a line drawing gives the engine none of the color or fur texture it relies on, let alone any way to distinguish cats from every other four-legged animal. A human, however, would instantly recognize the house cat.
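James' point is that the features classic retrieval systems rely on simply don't exist in a sketch. The gap can be shown numerically; the snippet below is a toy illustration, where synthetic pixel arrays stand in for a cat photo, a dog photo and a line drawing (all three are invented for the demo, not real images), and a color-histogram matcher finds the sketch equally far from both photos:

```python
import numpy as np

rng = np.random.default_rng(0)

def color_histogram(img, bins=8):
    """Per-channel color histogram, normalized to a probability distribution."""
    hist = np.concatenate(
        [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    ).astype(float)
    return hist / hist.sum()

def hist_distance(a, b):
    """L1 distance between the two images' color histograms."""
    return np.abs(color_histogram(a) - color_histogram(b)).sum()

# Synthetic stand-ins: a "cat photo" in warm browns, a "dog photo" in
# grayish browns, and a line drawing that is white with sparse black strokes.
cat = rng.normal([160, 120, 80], 20, (64, 64, 3)).clip(0, 255)
dog = rng.normal([140, 130, 110], 20, (64, 64, 3)).clip(0, 255)
sketch = np.full((64, 64, 3), 255.0)
sketch[::8, :, :] = 0.0  # black strokes every eighth row

# The sketch is no closer to the cat than to the dog: color carries the
# signal, and the drawing has none of it.
print(f"sketch-to-cat: {hist_distance(sketch, cat):.2f}, "
      f"sketch-to-dog: {hist_distance(sketch, dog):.2f}")
```

Because the drawing carries no color or texture statistics, any feature built on them treats it as equally distant from everything, which is why matching sketches to photos needs shape- or semantics-aware representations instead.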
James said Google and Facebook are both working on ways to add context, not just keywords, to multimedia searching. The technology has a long way to go: the Google Photos app on your phone might successfully suggest grouping your vacation pictures into an album, yet mistake a photo of men in Western suits alongside men in Middle Eastern white robes for groomsmen and a bridal party.
James' employer, Dataminr, takes a different angle on multimedia retrieval. Its software scans sources such as social media posts and extracts potentially relevant data to share with first responders, aiming to save lives by adding context to emergency calls.
AI fighting disinformation is like a cat-and-mouse game, where the mouse is winning
The conference also included a new workshop, the International Workshop on Multimedia AI Against Disinformation, or MAD2022. "The project addresses media, which is a crucial sector with a great power that can shape societal values, debates, opinions and could nurture a democratic society or, if things go wrong, lead to polarization and crises. … While ensuring that the European values of ethical and trustworthy AI are embedded in future AI deployments. Therefore, using AI to fight disinformation is a core interest," explained conference organizer Bogdan Ionescu, of Politehnica University of Bucharest.
"The objective is to encourage and discuss cutting-edge solutions to fighting disinformation at all levels such as deep fakes, fake news and manipulated content," Ionescu said. "AI fighting disinformation is like a cat-and-mouse game, where the mouse is winning. Disinformation is generated mostly by machines which are also used to uncover this content. Even worse, machines are trained to avoid being caught cheating and generating this fake content. For instance, [generative] adversarial networks use a dedicated network which tries to spot the artificially generated content. This process is optimized and optimized until it fails to detect it. This is a specific example for deep fake generation."
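The adversarial loop Ionescu describes can be sketched in a few lines. Below is a deliberately tiny illustration, not a real deepfake system: a one-parameter "generator" learns to shift noise toward a target distribution while a logistic-regression "discriminator" tries to tell real from generated, and the generator is optimized until the discriminator can no longer do so (the target mean, learning rate and batch size are all invented demo values):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# "Real" data: samples from a normal distribution centered at 4.
# Generator: shifts standard normal noise by one learnable offset, theta.
# Discriminator: D(x) = sigmoid(w * x + b), trained to output 1 for real
# samples and 0 for generated ones.
REAL_MEAN = 4.0
theta = 0.0
w, b = 0.0, 0.0
lr, batch = 0.05, 64
history = []

for step in range(3000):
    real = rng.normal(REAL_MEAN, 1.0, batch)
    fake = rng.normal(0.0, 1.0, batch) + theta

    # Discriminator step: descend the loss -log D(real) - log(1 - D(fake)).
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    w -= lr * (np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake))
    b -= lr * (np.mean(-(1 - d_real)) + np.mean(d_fake))

    # Generator step: descend -log D(fake), i.e. try to fool the discriminator.
    d_fake = sigmoid(w * fake + b)
    theta -= lr * np.mean(-(1 - d_fake) * w)
    history.append(theta)

print(f"generator offset settles near {np.mean(history[-500:]):.1f} "
      f"(real mean is {REAL_MEAN})")
```

Scaled up from one parameter to deep networks over pixels, this same optimize-until-undetected loop is what makes deepfake generation, in Ionescu's phrase, a cat-and-mouse game.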
"Nevertheless, there is good news as everything manipulated leaves a trace, like in real-life forensics. The researchers are catching these leaks with their algorithms. For instance, [optical measurements] can be used to measure the heart rate from a video of a person's face. When forged, these videos fail to provide the same continuity and thus heart rate estimation fails and has a specific, unnatural signature. Therefore, AI has to be always a step ahead of the forging techniques," he added.
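The heart-rate cue Ionescu cites is commonly known as remote photoplethysmography (rPPG): skin color fluctuates subtly with each heartbeat, and a forgery tends to destroy that periodicity. Here is a minimal sketch of the estimation step, run on a synthetic trace rather than real video (the 30 fps frame rate, 72 bpm pulse and noise level are all made-up demo values):

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 30.0                       # assumed video frame rate, Hz
t = np.arange(0, 10, 1.0 / fs)  # 10 seconds of frames
pulse_hz = 1.2                  # a 72 bpm pulse; this synthetic trace stands
                                # in for the averaged green channel of skin pixels
trace = (0.02 * np.sin(2 * np.pi * pulse_hz * t)
         + 0.005 * rng.standard_normal(t.size))

def estimate_bpm(signal, fs):
    """Find the dominant frequency in the physiologically plausible band
    (0.7 to 4 Hz, i.e. 42 to 240 bpm) and convert it to beats per minute."""
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz

print(f"estimated heart rate: {estimate_bpm(trace, fs):.0f} bpm")
# → estimated heart rate: 72 bpm
```

On a forged face, the same band of the spectrum typically shows no clean peak; that missing or erratic pulse is the kind of unnatural signature the detection algorithms hunt for.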
Vincent Oria, chairman of the computer science department in NJIT's Ying Wu College of Computing, served as the conference general chairman and said he was pleased with the turnout and international participation. Organizers had expected most people to attend in person, with only a few joining virtually, but the opposite turned out to be true, he noted.
Oria said one of his personal favorite parts of the event was the best paper session. The winning professional paper was "Cross-Modal Retrieval between Event-Dense Text and Image," led by researchers at Wuhan University of Technology. The winning student paper was "Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval," led by researchers at Jiangxi Normal University.
Recordings of many conference sessions are available through the ACM Digital Library.