Brief Experiments With Dall.E 3


In the world of photography, there is a lot of talk about photo-realistic imagery created with the help of artificial intelligence (AI). There have even been some attempts to produce what we could loosely call art using AI. The efforts I have seen fall way short of what I consider meaningful art, so I’m going to simply ignore them here. But I’m interested in the general question: is it (or will it be) possible to use AI to create art?

To a large extent, the answer will depend on how you define art — or what you expect from it. In general, I prefer the latter approach to thinking about art. Trying to define art seems like the kind of futile endeavour that only serves to create endless and ultimately meaningless discussions (which, granted, fuel part of the academic discourse). Focusing instead on what one expects from art gets at whom art ultimately is made for: people with their own distinct preferences, biographies, socializations, etc.

That all said, my real interest in AI photography (let’s just call it that) originates somewhere outside the confines of art. To begin with, I have a background in the sciences (I have a doctorate in theoretical physics). I am very familiar with modeling ideas, running computer simulations, and related scientific endeavours.

Given that background, I view AI photography simply as such a project. For me, AI-generated photographs are simulations of photographs that, much like scientific simulations, are intended to match what we could loosely call reality as closely as possible.

Computers are not sentient. When an algorithm produces what looks like a believable photograph, that’s interesting to me — at least to some extent: you can use input data (other people’s photographs) and “prompt” the machine to produce output data that mostly conforms to the conventions of the input data. I’m writing “mostly” because people typically do not have six fingers on their hands. My guess is that such problems will eventually disappear.

Attempt to re-create a photograph from my book Vaterland

But my real interest in AI photography centers on something different — even as it is related to the above. As a scientist, I know of the importance of algorithms and their input data. In a nutshell, that’s what I spent three pretty challenging years of my life on (I worked on cosmological parallel supercomputer simulations — essentially creating model universes in the computer). I know that what you put into the computer determines what comes out.

In that earlier life (if I may call it that), what came out of the simulations was tested against what had actually been measured (by other astronomers). Inevitably, the simulations would fall short of observations, meaning we’d have to go back and run another simulation. This often involved changing input parameters, but it also meant trying to create more complex models (algorithms).

In the world of photography, the focus is the exact opposite: instead of looking at what you put in, people focus on the output — the photographs. I understand why that is the case, and I talked about that in the very beginning of this article. However, I find it a lot more interesting to study what the output might tell us about what was being put into the machine.

Unlike in the case of the work I did for my doctorate, I have limited access to the input or the model itself. The one thing I can control is the chain of words — the so-called prompt — that will have the machine make a picture for me. But that’s already interesting enough. I can’t claim to be a real expert in this. Unlike, say, Roland Meyer, I don’t spend a lot of time with AI-image generators.

My approach to such generators is very simple. In the sciences, you always prefer the simplest possible approach. If you can do something in two ways, one being a lot more complicated than the other, you pick the simpler one (the principle is called Occam’s Razor). I’m also a writer, and for pretty much all of my writing, I try to be clear and, again, simple.

Attempt to re-create a photograph from my book Vaterland

Thus, when I use AI-image generators, I work with very simple, often terse prompts. If the resulting photograph doesn’t quite look like what I thought I wanted, I usually do not change the prompt. Instead, it’s that difference that interests me: how or why does a simple, short prompt produce this picture? Obviously, I don’t know and have no way of knowing. But if I play this game often enough, I believe that I can find out about the generator itself and the input data.

In other words, I believe that the output of AI-image generators tells us something about the ideologies of both the algorithms and the input data. If this sounds too abstract for you, here’s a very simple way to understand it. If you decide to create an algorithm that produces portraits and you only feed it with photographs of white people — do you think that the output will reflect the entire population of, say, the United States? It’s easy to see how that won’t be the case.

Mind you, that’s not some generic example I came up with. That’s one of the main problems with many AI algorithms, which can have gruesome results for those whose data are absent from the input. I wrote an in-depth article for FOAM Magazine about this, so I’m going to refer to that if you’re curious to learn more (the article is called Event Horizon, and it was published in issue 56).

The other day, I found out about Microsoft/Bing’s Image Creator. It now uses Dall.E 3, the updated version of Dall.E 2. I had used Dall.E 2 in 2022 to re-create each photograph in my book Vaterland (you can read about the experiment here). In a nutshell, I found that Dall.E 2 was able to produce pretty good looking pictures (some were better than others); but when put together in a sequence, they simply didn’t work at all as a photobook.

I was curious to see how Dall.E 3 would do: not so great, actually. In fact, the outcome was so bad that I didn’t finish producing all the photographs. That’s why most of them are in their original colours here — I didn’t bother converting them into my b/w, given that this would still not have made them mine. (Given that this article is not about my artistic strategies, I won’t dive more deeply into what I mean by this.)

Attempt to re-create a photograph from my book Vaterland

The first immediate problem I ran into was that unlike in the earlier version, Dall.E 3 produces images that look like they’re out of something like The Hobbit or any such Hollywood fantasy. I could have lived with that, but it got a lot worse. All of the portraits I produced looked like an assortment of hipsters.

Even worse, some of my prompts delivered what came across as a Westerners’ idea of a post-Soviet Eastern European wasteland. Parts of Vaterland had been photographed in Poland. Asking for “An empty billboard with paint markings on it in front of apartment buildings in Warsaw” delivered endless rows of brutal, desolate buildings. On Instagram, a Ukrainian photographer friend told me they reminded him of parts of the cities of Mariupol or Bakhmut (both now destroyed by the russian invaders).

In fact, none of my attempts to produce images “from Warsaw” resulted in anything credible or anything I actually saw in the various weeks I was there. When I asked the generator to produce images around the Palace of Culture and Science (the famous Stalin-era building), I ended up with some very strange, cartoonish-looking versions of it.

You might be tempted to tell me that I should use more complex prompts. It’s possible — I didn’t try this — that you could produce more realistic looking images of the Polish capital. However, I believe that things should work the other way around. You should have to prompt an image generator to get a cartoon — instead of having to try to get away from it.

It’s possibly too depressing an exercise to describe what exactly these cartoon images produced by Dall.E 3 represent. Roland Meyer might have more insight. But I think that this experiment can serve as a warning to all those who think that they can use AI-image generators and produce art. Maybe it’s art — but whose art is it? Yours — or the programmers’?

You can talk yourself into believing that you can refine your prompts until you get the desired result. But is this really the case? In a similar fashion, you could try a different AI generator. But does that solve the actual problem at hand?

Furthermore, there are all the pictures you can’t make. For good reasons, companies restrict the possibilities of AI-image generators. Read this article if you want to find out why they have to do that. But such restrictions do not necessarily work in an artist’s favour.

If we simply stick with Vaterland, I was unable to use some of the simple prompts with Dall.E 2 because they contained words or terms that were forbidden. Mind you, my book is explicitly critical of both Germany’s culture of memory and of its neofascist AfD party. The Holocaust and World War 2 play important roles in the book. With Dall.E 2, I had to try to create some pictures without being able to say what they were.

To be honest, I don’t know how one would go about solving this particular problem. I’m probably glad that Nazis are unable to create certain images (even as the article I just linked to makes it clear that this is still a problem), even if this means that I can’t make other pictures that I use in an anti-Nazi context. Furthermore, none of this is a real problem for me, given that I make my pictures in the real world.

Whatever you want to make of the above, be aware that ultimately, AI-image generators are cliché-production machines more than anything else. The next visual cliché — informed by the many biases in the input (aka the source photographs) — is just one prompt away. So don’t focus on the spectacle — instead look at the machinations that produce it.