Thoughts on AI Images: Art, Very Convincing Nonsense, and Visual Literacy

The world of art, of which the world of photography is a minor, rather insignificant part, prefers to hype up the latest technological advancements. Thankfully, the craze over so-called NFTs died down pretty quickly after “crypto”, a form of fake money that combines libertarian thinking with old-fashioned Ponzi scheming, imploded. Just a short time later, advancements in what is called “artificial intelligence” (AI) resulted in the release of new tools that are able to produce text that reads like real text (more on this later) and pictures that look like… well, not quite real pictures, but they seem to be getting there.

So far, the talk in the art world about AI images reminds me of when artists started exploring Second Life. There was a lot of hype, a lot of rather half-baked work. And then the fad disappeared pretty quickly. I’m thinking that AI images might stay around a little longer, but I could easily be wrong.

It’s worthwhile to point out that part of the panic that erupted over the tools bears similarities to how the world of photography treated the advent of digital cameras. I do not mean to dismiss the criticism that’s leveled at AI, even though some of it seems misguided to me. Let me start with the criticism that does have merit. There is considerable merit to discussing the problem that new tools lower costs for people buying the products and that they end up replacing human workers. (Obviously, if you’re a neoliberal capitalist, these aren’t problems but features.)

While I do have thoughts about all of that, it’s a politics problem, not a photography one. As long as we don’t pressure lawmakers to mitigate the effects of a rampant capitalism, we’re not going to solve problems like this. Copyright (AI images use other people’s pictures) obviously is a huge problem as well. In the US, the people who can fix this problem sit in Washington, DC.

It’s useful to keep in mind, though, that in the world of photography, we’re not entirely blameless ourselves. If rates go down all the time, if there’s a race to get everything cheaper and cheaper, then the fact that someone will be willing to work for that lower rate (or even for free) is not helping the overall cause.

In much the same fashion, to pick a completely unrelated example, if the market for photobooks isn’t growing because it’s mostly photographers buying other photographers’ books, then making more books for photographers also isn’t really a solution to the problem. Like I said, these are basically questions of politics and/or community, and we might want to treat them that way.

Coming back to photography: over its relatively short history, new technologies have arrived at a rapid rate, creating new tasks and then making them obsolete just as quickly. As far as I can tell, debates over new technologies in photography have always resembled the ones we’re now seeing in the context of AI. But when it comes to AI I do think that there are a few things that are interesting to talk about, in particular because they have repercussions beyond photography.

I should briefly preface the rest with what I’ve seen in terms of AI images. I’ve seen a lot of them, and so far I haven’t seen anything that has a lot of substance. I also tried an AI image tool for my own work and came to the conclusion that while occasionally, there are interesting images, AI falls way short in a larger, important sense: it’s not capable of producing something that is coherently speaking of its maker’s vision. For example, the visual narration in my book Vaterland completely evaporated when I replaced my pictures with AI ones.

But I also thought that it would only be fair if I elaborated on my own thinking around AI images. So that’s what I want to do in the following. After all, it’s not that I’m uninterested in AI images in theory. But for me, they have to cross a certain bar to become art.

To begin with, creating images in a computer is not new. After all, there is computer-generated imagery (CGI). This article explains why furniture maker IKEA uses CGI for their catalogues. When you use CGI, you will have to start from scratch or from what you have already set up.

AI image tools offer something completely different, in that they assemble new images from a database of already existing ones. If you use CGI and you want a photograph of a chair, you will have to tell the computer exactly what the chair is supposed to look like, how it’s lit etc. With AI, you can tell it “show me a photo of a chair” and it will produce one:

This is one of the images produced by Stable Diffusion when I prompted it with “Show me a picture of a chair” (I used the free demo version). It looks like a chair, but it’s also wonky. Somehow, the geometry isn’t quite right and neither are various constituent parts. The articles I’ve read so far about AI photographs typically run along the lines of being amazed how realistic they look (which is debatable, but such nuance is typically omitted) and how advances in technology will make the pictures look even better (which might or might not be true; we’ll see).

It’s interesting and disturbing to notice how similar a lot of these articles are to the generally uncritical and hype-prone articles you can find in the world of tech.

Regardless, the picture of the chair was produced by assembling it from existing pictures. Conceptually, that’s interesting, because for me, the first big thing I think about is the following (which is probably based on my background as a theoretical physicist). Let’s use an example. If you buy a bunch of Lego bricks, you can create all kinds of things — as long as you follow what the bricks will allow you to do. In other words, you will be restricted by the options presented by them.

It is as if an artist, let’s say Ai Weiwei, decided to make something from Lego bricks. Whatever he decided to do, he’d be limited by what the bricks allowed him to do. In fact, Ai Weiwei indeed just used Lego bricks to recreate (if that’s the word) a painting by Claude Monet. “By recreating this famous scene,” we are being told, “Ai Weiwei challenges our ideas of reality and beauty.” (Does he, though?)

I think a good way to look at Water Lilies #1 would be to view it either in the context of art in general (which might yield too low a bar, but your mileage might vary) or in the context of this artist’s own back catalogue. Taking the latter approach, does this piece of art strike you as being in the same league as his earlier work?

By the way, that’s a Claude Monet painting made from Lego according to DALL-E 2.

The point I’m after has something to do with originality, but it’s not quite that. I personally find originality completely overrated. There are a few photographs that I admire and that were genuinely original when they were made. However, the bulk of the photographs I appreciate were completely unoriginal when they were first made, and that didn’t matter one bit. For example, people have taken millions of pictures of landscapes before, and there still are very good landscape pictures being made. The same is true for family photography or any other genre of photography.

But using the term “originality” is too misleading anyway, because mostly, great art is not necessarily appreciated for that (or that alone). To get back to what I’m thinking about: can AI create very good art out of the constituent parts of existing art in a way that moves beyond those parts?

Could AI create a Beethoven symphony out of the parts of Bach concertos? You might think that, well, yes, Beethoven and Bach used the same types of scales and some of the same instruments. But that’s not necessarily how you go from Bach to Beethoven, and it’s not how you’d get to Schoenberg (to throw in a more contemporary composer).

If you think about this on a visual level, you could ask the same question in a different context. If you look at the history of photography, could you create later photography from pieces of earlier photography?

This question can get very interesting when a first, basic answer would be: yes. Take collage. For example, László Moholy-Nagy’s or Hannah Höch’s collages are made from existing photographic material in the most literal sense. But could AI really do the same job? Could an algorithm replicate the creative genius of a Moholy-Nagy or a Höch?

My current sense is that that is unlikely. But that’s the only interesting question around AI photography for me. Everything else is merely craft. There’s nothing wrong with craft, but I’m interested in art.

You might argue that AI is merely being used by someone, so it’s not the AI that makes the pictures (this is debatable, but let’s buy into this), it’s the operator. However, the problem does not disappear. It shifts: assuming that the operator possesses enough creative genius, will they be able to make AI do things on their terms — instead of getting images based on the algorithms?

It’s a bit like looking at Facebook pages. They all look the same because people can only fill pre-arranged templates. You can only work with the options provided to you. In principle, this is not a new problem for photographers: you always can only do what your tools allow you to. But with AI, you now have a new parameter: everything you can do has to be based on something that already exists (older images).

When evaluating AI photographs in an art-historical or critical sense, you have to be careful, though. Let’s use an imaginary example. It would be pretty straightforward to imagine AI being used to create the equivalent of Cindy Sherman’s Untitled Film Stills. If that body of work didn’t exist and if Cindy Sherman decided she wanted to create such a body of work today, she’d have to feed her own portrait into the AI system, to then produce those film stills.

However, the question is whether this would be interesting. After all, the original Untitled Film Stills existed in a specific context, the so-called Pictures Generation. They thus acquired their initial meaning in an artistic climate in which artists looked into the role and value of pictures and into how existing pictures shape our world and expectations. If you imagine an AI generation of film stills in that context (admittedly an absurd idea, given that back then AI didn’t exist), then it’s easy to see how Sherman’s AI Untitled Film Stills would have done the same job as the actual ones.

However, if you imagine Cindy Sherman producing the pictures today, far removed from the Pictures Generation and with us being now in the world of fake news etc., the idea would probably fall pretty flat. The conversation about photographs has moved far from the concerns of the Pictures Generation.

If you look at the untitled film still I had Stable Diffusion produce, you can see how the AI can’t get hands right. That’s a problem, albeit not in the art-historical or critical sense I’m interested in here. In fact, you could argue that the weird hand is the only interesting element of the picture. However, it’s hard to see how this AI shortcoming translates into a form of artistic merit, especially if we demand that such merit has to be at least somewhat related to a maker’s intent.

Consequently, in an artistic context, AI photographs need to at least aspire to have artistic merit. By that I mean that their makers have to attempt to contribute to the current artistic discourse. So far, I don’t see that happening (nope). Trying to make funny pictures or trying to prove that you can get realistic looking pictures with AI — that’s too low a bar.

There is a second, very interesting and very important aspect to AI photography. In a loose sense, it centers around the intersection of veracity and believability. Something might not have to be truthful to still be believable. For example, little children believe in Santa Claus. This is the general area where the generation of material, whether visual or textual, ultimately can — and very likely will — become political.

A little while ago, I tested ChatGPT to see whether I could make it write nonsense. To be more precise, I wondered whether the AI would correct factual mistakes I included in my questions. It did not. In all fairness to ChatGPT, I just got access to Google’s Bard, and it happily served me nonsense in the same fashion. Instead of correcting me, both produced what I called Very Convincing Nonsense.

Very convincing nonsense is a piece of information that looks or reads as truthful and that is convincing in form, but that is actually not accurate. This type of nonsense is great for comedy. But it’s not funny otherwise, especially not if it ends up being used by Vladimir Putin’s troll farms or any of their Western equivalents (which are largely driven to undermine our democracies).

Here, you have an actual example that was disseminated by the person it is supposed to depict, Donald Trump (I found this on the same day that I started writing this article). If you look carefully at the image, you can see at least two of the standard problems of current AI image generators. The hands aren’t right. Furthermore, kneeling with your right knee behind your left heel is very, very difficult. (I literally tried this. In general, I have very good balance. But I found it almost impossible to balance the way shown in the image.)

Essentially, you have to be able to recognize the nonsense if it is embedded in something that looks or reads as convincing. If you’re unable to detect it, then… well, you’ll just take what you see at face value. In fact, while I was working on this piece, an AI picture of the pope in some stylish white winter coat fooled a lot of people. When I saw it, I didn’t believe for a second that it was real. Apparently, a lot of people did.

On the other hand, most of the people for whom the Trump image was made probably don’t believe any more in what it shows than citizens of the Soviet Union believed socialist-realist art. It’s hard to imagine that any of those hard-right Christians believe that Trump is religious. But in the image there’s a code that is transmitted. And that code matters, because the image serves to deliver it, instead of what it depicts in a literal fashion.

Here is a recent example of an Instagram post by a member of Germany’s neofascist AfD party (it would seem that after some discussion of the imagery, the guy pulled the image). Norbert Kleinwaechter (whose last name ironically translates into “Little Guard” in English) is vice chairman of the party’s faction in Germany’s Bundestag (the federal parliament). The text reads “No to even more refugees”. There’s nothing subtle about the image, but obviously that’s par for the course for a party that has a history of producing racism. Note that the fictional person at the leftmost edge of the frame has five fingers.

If you’re not part of the target group, it’s important to be able to read the codes. They might be blatant as in the case of the AfD image, but they can also be more subtle. The visual code often connects to an invisible code that delivers the actual message. Just like in the case of the extremely well balanced Trump, photographic veracity isn’t the actual point.

In my book Photography’s Neoliberal Realism, I talk about codes in a different context. It’s easy to make fun of images like Trump’s or the photographs discussed in my book. But if your response ends there, you’re not performing the crucial and more important second step: understanding the codes that are being exchanged. You’re short-circuiting your critical faculties.

A good and very instructive example of detecting very convincing nonsense in an AI image was recently discussed by Hiroko Yoda. Someone had AI generate a picture of a Japanese woman in a kimono (if you look at Yoda’s Twitter thread, you’ll see that the original post she referred to has now been deleted). “I’m certified as a kimono consultant in Japan,” Yoda writes, “and this triggered me in all sorts of ways.”

To begin with, Yoda notes, there are some obvious craft issues. The AI showed a kimono that for a number of reasons couldn’t really be made, whether in terms of the materials or in the way it was folded. But there also is a very important cultural issue tied to it: “the biggest issue is super critical. Look closely at her white undergarment visible at collar. It’s folded right over left — used only for the dead. This is super creepy. So you have a white-faced woman wrapped in fabric scrap with odd hair accessories & funeral undergarments; if I ran into her in Kyoto’s Gion at night I would probably freak out!”

I’ll be honest and admit that I would not have been able to notice any of the problems because I’m not familiar with the details of kimonos. (I can see how the chair I had AI create doesn’t make sense, though.) But I think you can easily see how what Yoda describes is very important: nobody in their right mind would wear a kimono this way in Japan. Obviously, the AI has no idea there is a problem because otherwise, we must assume, it would not have created the image this way.

What we’re left with is a visual example of very convincing nonsense: an image that does not make sense. But you would only know if, in this case, you were a kimono consultant (or, at the very least, someone who knows as much about the garment). And that is exactly the larger problem with AI images that we’re about to run into more and more. The problem is not only that images get produced to show something that didn’t happen or doesn’t exist (even though that’s bad enough).

The larger problem, at least in my view, is produced by images that convey nonsensical information even if they were supposed to be truthful and accurate, images that are so convincing that we take them at face value. I suppose you could view this problem as the equivalent of glitch artifacts. But in AI images, apart from wonky hands or other optical problems, the more dangerous glitch artifacts are only visible if you have enough background knowledge to detect them.

One of the solutions we have for this problem is a vastly increased awareness of the importance of visual literacy. Specifically, by visual literacy I mean an understanding of how images convey their meaning. We will have to become a lot more aware of how we consume images. This would involve teaching visual literacy in schools and universities (outside of art departments or classes).

We will probably also get used to the fact that we often have to research images online. If we see an image we might have to look around and see where else it shows up, to infer something about its truth value. My guess is that verification tools will become available. It’s easy to imagine an arms race between AI image-creation and verification tools.

Whatever the outcome, now is the time to start becoming more critical of AI tools. Now is the time to start thinking about how to deal with them. AI image click bait is fun — but it’s just possible that at least some of the time is better spent on learning more about how to critically look at images.