If AI image generators are so smart, why do they struggle to write and count?

Published by Seyedali Mirjalili on July 6, 2023

AI’s limitations with writing

Humans can easily recognise text symbols (such as letters, numbers and characters) written in a wide range of fonts and handwriting. We can also produce text in different contexts, and understand how context can change meaning.

Current AI image generators lack this inherent understanding. They have no true comprehension of what any text symbols mean. These generators are built on artificial neural networks trained on massive amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in the training images are associated with various entities. For example, two inward-facing lines that meet might represent the tip of a pencil, or the roof of a house.

But when it comes to text and quantities, the associations must be incredibly accurate, since even minor imperfections are noticeable. Our brains can overlook slight deviations in a pencil’s tip, or a roof – but not as much when it comes to how a word is written, or the number of fingers on a hand.

As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Since text comes in so many different styles – and since letters and numbers are used in seemingly endless arrangements – the model often won’t learn how to effectively reproduce text.

The main reason for this is insufficient training data. AI image generators require much more training data to accurately represent text and quantities than they do for other tasks.

The tragedy of AI hands

Issues also arise when dealing with smaller objects that require intricate details, such as hands.

Image: Canva’s Text to Image generator’s response to the prompt “Young woman holding up two hands, five fingers each”.

In training images, hands are often small, holding objects, or partially obscured by other elements. It becomes challenging for AI to associate the term “hand” with the exact representation of a human hand with five fingers.

Consequently, AI-generated hands often look misshapen, have too many or too few fingers, or are partially covered by objects such as sleeves or purses.

We see a similar issue when it comes to quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four”.

As such, an image generator may respond to a prompt for “four apples” by drawing on learning from myriad images featuring many quantities of apples – and return an output with the incorrect amount.

In other words, the huge diversity of associations within the training data impacts the accuracy of quantities in outputs.
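The “four apples” case can be pictured with a deliberately simplified sketch. This is a toy illustration only, not how any real image generator works: it assumes a hypothetical model that has linked the word “apples” to pictures of apples but ignores the number in the prompt, so it simply reproduces whatever quantities it saw in training. The list `training_counts` and both function names are made up for the example.

```python
from collections import Counter

# Hypothetical toy "training set": every image is captioned with "apples",
# and each number is how many apples that image actually shows.
training_counts = [1, 2, 2, 3, 3, 3, 4, 5, 6, 12]


def learned_count_distribution(counts):
    """Fraction of training images showing each quantity of apples."""
    total = len(counts)
    return {n: c / total for n, c in sorted(Counter(counts).items())}


def chance_of_correct_count(counts, requested):
    """Chance that a model which ignores the requested number and only
    reproduces the quantities seen in training shows exactly `requested`."""
    return learned_count_distribution(counts).get(requested, 0.0)


# Only 1 of the 10 toy images shows exactly four apples.
print(chance_of_correct_count(training_counts, 4))  # 0.1
```

In this made-up data set, a prompt for “four apples” would come out right only one time in ten, because most apple images the toy model learned from show some other quantity.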

Will AI ever be able to write and count?

It’s important to remember text-to-image and text-to-video conversion is a relatively new concept in AI. Current generative platforms are “low-resolution” versions of what we can expect in the future.

With advancements being made in training processes and AI technology, future AI image generators will likely be much more capable of producing accurate visualisations.

It’s also worth noting most publicly accessible AI platforms don’t offer the highest level of capability. Generating accurate text and quantities demands highly optimised and tailored networks, so paid subscriptions to more advanced platforms will likely deliver better results.

Seyedali Mirjalili, Professor, Director of Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia

This article is republished from The Conversation under a Creative Commons license.
