The age of mechanical reproduction is over.
Welcome to the age of artificial reproduction.
In April 2022, creative AI had its iPhone moment. Thanks to incredible progress in machine learning, a new movement has emerged in the creative world this year.
“Text-to-Image” AI is here.
And it changes everything.
In this essay, I want to elaborate on some of my recent thoughts and tweets on these new generative systems. It’s an article for every creative and for everyone who has thus far ignored the complexity of the situation that we’ve gotten ourselves into.
In my opinion, this technology is the most important consumer technology since Google Search, and I will go as far as claiming that these systems will usher in a new and potentially final industrial revolution.
With these new prompt-based systems, we can now generate (almost) any image we want by simply instructing the machine with natural language.
The machine does in seconds what a human needs years to learn.
As a result, art and design as crafts will change radically. Humans will be competing against machines in a new kind of information warfare.
Supply chains in the creative industry will change very rapidly, resulting in the loss and transformation of millions of jobs, unfortunately including the most skilled creative craftspeople out there. This new form of artificial content production will challenge how we perceive the creative industry as a whole and how we are going to value our analog offline world.
At the end of this article I make the following 3 business predictions:
- Many companies will soon run Text-to-Image AIs inhouse
- Google Search is dead
- AI Media will replace human media in the long run
Status Quo — 4 important Text-to-Image AIs
Before we start, it is important to comprehend what “Text-to-Image AI” actually is.
Text-to-Image (and also Text-to-Video) AI systems have received a massive tailwind since the release of OpenAI’s DALL-E 2 in April 2022. Today, only a few months later, such tools generate millions of fictional images for the Internet every day.
Since the release of DALL-E 2, 4 models have sparked my interest (in chronological order): DALL-E 2, Midjourney, Craiyon, and Stable Diffusion.
For the rest of this article, it is crucial to understand what these Text-to-Image models (technically diffusion models, not classic transformers) are capable of, so I want to start with a short and simple introduction to each of them.
If you know the above or have used them, feel free to skip this entire section.
The release of DALL-E 2 was the iPhone moment of creative AI: an industry milestone and the first time the broad public got its hands on a powerful user-friendly Text-to-Image model.
DALL-E 2 is owned by OpenAI, an American company that has dedicated itself to the “safe” development of AI software. We will see later why this is difficult, if not impossible.
Generating an image with DALL-E 2 is fairly straightforward, once you have signed up. Similar to a Google Search, DALL-E 2 only provides users with a text input field (a “prompt”), and the possibility to upload an image file.
Once an image has been uploaded or generated, DALL-E 2 provides a set of additional tools. One is called “Outpainting”, which lets users expand existing images with AI-generated content (IMO the best use for DALL-E). Another is “Inpainting”, an “Edit Image” functionality where users can replace parts inside an image.
DALL-E 2 is especially useful for generating images of absurd, complicated scenes. Without specific instructions, the quality of the output most often resembles the aesthetics of a Google Image Search result, but it can also turn out rather weird, especially when it comes to generating real people.
At $15 per 115 credits (1 credit = 4 generated images), it is rather expensive.
Midjourney has become my personal go-to system. It was the first AI model open to the broad public through a Discord server, which allowed for community collaboration. To date, Midjourney has evolved a lot and improved its algorithm several times. By sending messages that start with “/imagine prompt” to the server or its chat bot, subscribed users can generate several images at a time.
Thanks to Midjourney’s understanding of complex inputs with a multitude of commands, it is IMO the best AI for generating photorealistic results and currently the preferred tool for professional creatives.
Midjourney also allows users to generate pictures of existing people, something DALL-E still largely forbids.
The project’s website offers a dashboard for logged-in users with additional tools and features, almost like an AI version of Tumblr.
Its unlimited subscription plan costs $30 per month and is thus far cheaper than DALL-E for power users. All images generated on Midjourney are displayed publicly, but there is a $50 tier that offers privacy features.
A list of Midjourney’s Artist styles can be found in this Google Doc: https://docs.google.com/spreadsheets/d/1cm6239gw1XvvDMRtazV6txa9pnejpKkM5z24wRhhFz0/edit#gid=438712621
Craiyon, also known as DALL-E Mini, is probably the funniest solution among the 4. While the results are primarily weird or creepy, they are generated very fast and in batches of 9 pictures. Craiyon allows generating pictures of real people; however, the results are always highly vague and only capture the “look and feel” of a person, never an actually realistic face.
Craiyon is free to use, but its license includes a few particular clauses, like one on NFTs: if you create images for NFTs using its software, you will owe 20% of the sales to Craiyon.
Finally, Stable Diffusion was the algorithm that accelerated the entire AI game when it came out.
It is the first powerful open-source model and can thus be implemented by anyone, for anything. Given the quality of its results, this is a game-changer that should not be underestimated. While the most common Stable Diffusion model and its source code have some NSFW protections in place, they can easily be circumvented or disabled to remove literally any content blockade…
Stable Diffusion quickly became extremely popular and has since been embedded into many different tools, such as Photoshop plugins.
The images generated through stability.ai, the company that first released the algorithm together with LMU Munich, carry a CC0 license, which means that every generated image lands in the public domain and can thus be used by anybody for free. This is also a significant game-changing factor for the creative industry, as we will see below.
There are many GitHub repositories and a variety of Google Colab notebooks that already allow anyone to deploy their own Stable Diffusion model. For business implementations, however, I highly recommend self-hosting on dedicated GPUs, which cost around $0.50–$1.00 per hour, plus any server and database costs.
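To make the self-hosting economics above concrete, here is a minimal back-of-the-envelope sketch in Python. The throughput and overhead figures are illustrative assumptions on my part, not benchmarks of any particular GPU or Stable Diffusion setup:

```python
def cost_per_image(gpu_rate_per_hour: float,
                   images_per_hour: int,
                   overhead_per_hour: float = 0.0) -> float:
    """Estimate the cost of a single generated image on a self-hosted GPU.

    gpu_rate_per_hour: rental price of the GPU (e.g. $0.50-$1.00/h, as above)
    images_per_hour:   throughput of the deployment (an assumed figure)
    overhead_per_hour: server/database costs spread over the same hour
    """
    return (gpu_rate_per_hour + overhead_per_hour) / images_per_hour

# Illustrative: a $0.75/h GPU producing 300 images/h with $0.15/h overhead
print(round(cost_per_image(0.75, 300, 0.15), 4))  # 0.003 dollars per image
```

Even with pessimistic assumptions, the marginal cost per image lands at a fraction of a cent, which is the whole point of moving generation in-house.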
Furthermore, it is important to note that Midjourney also uses Stable Diffusion under the hood; the results, however, still differ somewhat between the two.
For all the above algorithms, many different variations and combinations are possible. For example, one can use the “Neural Filters” in Photoshop to edit the generated people, or layer other AI models like GFPGAN on top to enhance faces directly during the generation process.
Bonus: AI Stockphoto sites
Finally, it is important to add the new AI stock photo sites to the above. I predicted the birth of such websites in April 2022, when I first wrote about DALL-E 2. They have now become a reality in the form of sites like lexica.art or krea.ai.
Meanwhile, traditional stock photo sites are starting to ban AI-generated images. Since all of the images listed on these new sites are in the public domain, they can be downloaded and used for anything, free of charge. This is a game-changer.
The immediate consequences
When talking to people about these new tools, the most common reactions are the following:
- People don’t know what Artificial Intelligence really is
- They don’t comprehend that the generated images “don’t really exist”
- Once they understand, the first reaction is usually “mind blown” and then fear.
It is therefore clear that there is a great need for education in this field today.
In the same sense that people needed to learn that they must not believe everything they see on the Internet, they must now learn that they must not believe anything.
The new world requires the assumption of the “A priori fake”: Everything you see on a digital medium is fake unless proven otherwise.
The machines can now invent and copy any digital image the same way humans can. But it doesn’t end there: they can even generate things that humans wouldn’t be capable of.
One example of something that is very hard for a human: perfectly copying and combining the styles of famous old masters like van Gogh or Dalí. Doing so would require many years of training from an artist. The AI does the same task in seconds.
It is a form of industrial revolution, but this time it feels like the cherry on top. The 5th industrial revolution, the AI revolution, is foremost a cultural revolution. A revolution of the arts - a new era for art, and also a new era for the machines.
In this sense, I believe it is fair to say that AI Art is currently opening up a completely new field. It’s like a Steam Engine for Art, like the re-invention of the first camera, or the invention of Gutenberg’s press, with the difference that this time it is the end of the road.
There is simply no other creative technique that will come after this one.
I like to compare AI Art to the discovery of imaginary and complex numbers (ℂ) in math, where suddenly we were able to calculate things that were thought to be impossible. While human imagination is incredibly large, it has physical constraints that AI does not. The reverse may be true as well, but only ever for a short period of time: AI can always copy and learn the styles of humans.
However, humans can not always copy the styles of AI.
Traveling through these creative disciplines, one major chasm right now is the digital vs. the analog world. While humans can still excel in the analog world today, it is fairly easy to imagine robots that will eventually be able to copy most of our physical processes as well.
For other analog types of art, however, this is more difficult to achieve. That’s why, IMO, the value we attribute to analog, performance, and plastic art will increase a lot compared to easily reproducible digital art like photography or illustration.
This brings us to the 4th reaction that people often come up with when learning about AI art, which is exactly this: “But can it do analog art?”.
The answer is no, it cannot. Not yet. But then, the real question is: does it really matter in today’s digital world?
How the creative industry will change
Now that the “genie is out of the bottle” and open source Text-to-image AI is accessible to anyone who signs up to these new user interfaces, there is no going back anymore.
In order to comprehend the transformative industrial power of these new algorithms, let’s start with a few examples where Image AIs will replace or change existing jobs. Obviously, not all of these jobs will simply disappear. There are nuances to how strongly they will be disrupted, so those will need to be addressed (the horse also didn’t vanish once we had the first cars).
Photographer: Stock photography is dead. I cannot think of a single use case for this kind of photography anymore.
Photo Model: Many fashion models will be replaced by AI. Famous actors will sell their image rights to agencies who then sell their faces to brands. For personal use, you are now Pygmalion: you can generate a new AI Waifu for every day of the year.
Concept Artist: Their work can now easily be done in-house by anyone. Character designers, level designers, and other disciplines that used to require a lot of hard work are now easily replaceable.
Illustrator: They will use AI to visualize their ideas and only do manual edits in Illustrator, Photoshop, and so on. Although I’m not sure yet how this will work with vector files.
Stylists, Makeup Artists, Fashion Designers, Product Designers, …
All of the above will use AI as inspiration and to save time on mockups or “rough” sketches.
Besides the mentioned jobs, there is an entire industry that will change with them. Model agencies may switch to selling AI models only. As photographers get fewer jobs, fewer set assistants will be needed, and so on.
In the worst cases, this new technology may lead to existential crises for artists who spent their entire lives developing their own style, only to be copied by a kid with a computer today.
This kid will have a new job: “Prompt Designer”. Their skill is intellectual rather than physical. This designer should have extensive knowledge of art history, a big library of different styles and prompt inputs (a “prompt-book”), as well as some basic image-editing skills to finalize their results.
The revolution doesn’t stop at the creative industry, though. For example, one thing I have been contemplating a lot is how Text-to-Image AI could be employed in psychology.
Take the example of a Myers-Briggs or OCEAN personality test where, upon completion, the test-taker is presented with a set of character designs showcasing their inner persona. This would not only make the test results more interesting to evaluate, it would also act as a mirror for the test-taker.
I think we are only beginning to see how this technology will disrupt businesses all over the world, but there are a few rather simple predictions that can be made already today.
1. Many companies will run Text-to-Image AI in-house
Today the creative and cultural industry accounts for roughly 5–10% of global economic value. A big portion of this value will move to the high-tech industry.
While the prices of image generation will drop to $0 over time (due to high competition and ad-based models), it is essential to note that the new gatekeepers of the art and culture industry will be tech companies and tech-savvy “artists” or developers. With tools like Stable Diffusion being freely available and open source, it is not hard to imagine more and more service providers emerging who set up these tools in-house for other companies. In fact, we are already doing this at Neon Internet.
Even though employees and freelancers can now use the existing tools and easily browse the aggregator websites like Lexica.art or krea.ai, at some point the question of image rights and licensing will become very important.
With DALL-E, users share the rights with OpenAI. However, with Stability.ai, every image lands in the public domain under a CC0 license…
The first logical step for companies (especially global corporates) is to move the work of their prompt designers onto privately owned servers so that they own the exclusivity to their visuals.
The next logical step after that is a global legal framework and prompt copyrights, because these designers will generate images that look like real humans, who (fortunately) still have fundamental rights.
In order to find out whether you are in the Stability AI dataset (LAION), you can use this dataset browser: https://laion-aesthetic.datasette.io.
So running AI in-house is more of a legal question than a technological challenge.
2. Google Search is dead
When I first learned about lexica.art and saw that my prediction of AI Stock photo websites became a reality, I wrote a longer Twitter thread on the implications.
Let me elaborate on this thread here once again.
When we use Google or any other search engine to find images online, we either want to
A) Steal a JPEG of an object
B) Learn about or get inspired by unknown objects
Both will now become infinitely more powerful due to Image AIs.
The new AI Stockphoto websites have millions of images indexed already. All of the images are available for free under a CC0 public domain license.
So you can already steal any image from there for free today. Therefore point A (“Steal a JPEG of an object”) is disrupted and doesn’t need much more explanation.
To be fair, however, Search Engines still have an edge when it comes to real places and real people, but in that case, it is not a technological, but a legal problem. Therefore in the short term, we will probably switch between these 2 types of Search Engines.
Point B (“Learn about or get inspired by unknown objects”) is more complicated and requires more foresight. Right now, browsing these new image dictionaries is a pretty wild experience. Images are still a bit weird, uncanny, creepy, and often total nonsense. This will change soon.
Midjourney already has human classifiers, for example: you can select one of 4 reactions to an image. Platforms may also introduce social functions like upvote and downvote buttons. This will easily train the AIs to separate the good from the bad, kickstarting a self-reinforcing feedback loop.
Then there are “Related images”, “Different styles” and so on. We will have graphs that let us browse through every dank shit we can (or cannot) think of, and we will be able to generate this dank shit in any style we know of.
Think of this like infinite recursive art loops: if you like an image, there is always another loop or derivation that anyone can add to what was already there.
Now, these platforms are not only aggregating the images, but they’re also aggregating the related prompt inputs.
What this means is that you will be able to bookmark your favorite prompt inputs and then either
- generate an infinite amount of images in the same style or
- browse through an infinite gallery of images that were already generated by someone else.
You could for example let the engine run overnight, come back the next morning, and pick the 5 images you like best.
Then start again: Infinite recursive art loops. Edit only one word in the prompt input and get millions of different results.
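The “edit only one word, get millions of results” loop above can be sketched as a simple prompt-expansion routine. The template and word lists here are illustrative assumptions, just to show how quickly the combinations multiply:

```python
from itertools import product

def expand_prompts(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Fill every {slot} in the template with each combination of options."""
    keys = list(slots)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(slots[k] for k in keys))]

# Two tiny word lists already yield 3 x 2 = 6 distinct prompts; a real
# prompt-book with dozens of entries per slot yields millions.
variants = expand_prompts(
    "a {animal} astronaut, {style}",
    {"animal": ["cat", "fox", "owl"],
     "style": ["watercolor", "synthwave"]},
)
print(len(variants))  # 6
```

Feed each variant to a generator overnight, and the next morning you pick the handful of images you like best.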
Down the line, all of the above means that in the future you will not only be able to “learn about an unknown object”, you will simply be able to find ANY object from the set of ALL possible images.
B is disrupted.
Google Search is dead.
What works for images, also works for texts, music, and eventually video.
Obviously, Google has this on their radar, which is why they are running their own AI called Imagen. I believe that rather sooner than later, Google will add Imagen to Google Image Search: “Didn’t find what you were looking for? Generate it!”
3. AI Media will replace human media in the long run
Finally, the last prediction I want to make: “AI will kill the video star”.
AI actors and fictional characters will replace real actors as protagonists in our collective human story. These characters can be cartoons, 3D renders, or totally life-like.
Or in other words: AI-generated media and its anchormen will become our most trusted and most popular source of entertainment. Most information and culture will be derived from AI media. This relates strongly to 2) as well. At first, it will be uncanny and weird, then it will morph into the new normal.
AI will spread its tentacles into every form of digital media. Its characters, items, and visuals will be everywhere, indistinguishable from “real things”. They will become our surrogates and enable Avatarism.
With all the recent buzz about Metaverse technologies, combine all of what you’ve just read with progress in Augmented and Virtual Reality, Speech Recognition, more powerful GPUs, and so on and you can only discover that things are about to go very weird very soon.
In this sense, it seems obvious that the AI Singularity is nearing. According to leading AI researchers, the technological Singularity is expected at some point in the next 25 years. Ray Kurzweil famously predicted it for 2045. We are clearly reaching the point where we merge directly with AI through the media we consume: we are paving the way for zero resistance to Singularity by being in touch with AI visuals every day.
I often like to quote Elon Musk on this point: “Who controls the memes, controls the universe”. AI will soon control the memes. It will design the memes and produce the artifacts of our modern history. Humans will only be the vessels that carry the message. Or to quote Marshall McLuhan: Humans will be the sex organs of the machine world.
What I have observed is this: everyone on my team who has touched Text-to-Image AI became addicted. Everybody was fascinated and wanted to play around. As soon as people have their “eureka” moment, they can’t stop. AI media has far fewer entry barriers than current forms of media. Everybody can now create their own science fiction.
The possibilities and dreams that are being unlocked with this new technology are immense. So are the political and ethical implications.
My personal conclusion
When I did my first deep dive into DALL-E 2 back in April, I wrote a lengthy Twitter article on the matter. My conclusion back then was: Destroy this technology now.
Today, after having generated thousands of images with DALL-E 2, Midjourney, Stable Diffusion, and other tools, I hold on to this belief.
Yes, it has been among the most creative fun I’ve had in my entire life (I can also definitely say that I am addicted). I spent sleepless nights thinking about all the new stuff that I could finally do that wasn’t possible before. But like with every addiction, at some point the high will turn into a low.
And here lies the problem: There are some technologies that are too powerful to be let loose. Nuclear weapons are one of them. Generative AI is another one. It is a brainfuck. It is mass manipulation technology and again: It’s the climax of creative technology.
I fear we have no idea about the Pandora’s box that we have opened.
In the same sense that Nuclear technology can be used to do a lot of good, it can also be used to do a lot of harm. What we are experiencing now with generative AIs is like radioactive quackery during the 20th century:
Back then, people believed it was a great thing to brush your teeth with thorium. Today we know better.
With this new form of art, we will reach a point of total digital abundance. Not only will the creative industry drown in complete abundance, but we will also lose our own sense of joy that comes from being creative in the long run. Ask yourself: What will it mean to be a great artist if you can simply be copied by anyone?
Will it be an honor to become an “In the style of” prompt input?
There is also a philosophical question here: Maybe, after thousands of years of art, we have reached the point of maximum inflation. There may be no great “old masters” in the future anymore.
Summing up, it is a fact that these tools are extremely cool and effective. They give creatives an edge over what was possible so far; an edge over their competition in the creative content industry.
But in the near future, this will no longer be true.
They will become the standard in the creative industry.
By using them, we are removing ourselves from the equation.
We are letting the machines write history for us.
What needs to happen
I am by no means the first person to talk about the issues around AI technology. OpenAI itself is perfectly aware of the implications of its products. Elon Musk, who by the way is a co-founder of and investor in OpenAI, famously believes that AI is far more dangerous than nuclear weapons.
OpenAI has extensive terms for sharing and creating content: they try to keep the negative implications to a minimum, remove social biases from their results, and address the superficial problems with filters.
Unfortunately, the truth is that this stance is nothing more than a light facade as long as the market stays unregulated. Even though they are the market leader, they are in direct competition with open-source software that simply ignores these risks and believes that humans will know what’s right and what’s wrong:
“We hope everyone will use this in an ethical, moral and legal manner and contribute both to the community and discourse around it.”
As long as regulators don’t step in and introduce exact legal, ethical, and moral guidelines, open-source solutions will decide the fate of the AI market. The “damage” has already been done by Stability AI; the information hazard is spreading. Stability.ai has let the genie out of the bottle, a genie that is responsible for millions of very uncanny, weird creations, deep fakes and gore, and the evaporation of creative-economy jobs.
But they have also created incredible new opportunities that are so unbelievably cool to play around with…
So: how can we find a middle ground here?
A few months ago, I read an extremely interesting book by Joël Luc Cachelin: Culture 2040: Trends, Potentials, Scenarios of Support (I think it’s only available in German and French right now).
My main learning was the following:
The further we progress on our mass migration into the digital realm, the stronger we want to turn back.
A natural resistance forms (Quote from the book: “Doing nothing will become a rebellious act”).
With all our technological advances, we have brought ourselves into the following predicament:
The people being born today are born “online” by default. They play on their parents’ phones, watch cartoons on YouTube, and so on. “Going offline” has become a conscious choice one needs to make; a choice that requires a lot of action.
But people are lazy. We have always been attracted to “Bread and Games”. We want to observe and enjoy. It often takes a catastrophe to realize that somewhere along the path one got lost and mass hypnotized.
I believe that before we can turn back and escape from the upcoming fake virtual world of total abundance, we must first reach its abyss, look deep inside it, and realize: Nope, that’s not the way we want to go as a species. Only then can we pull ourselves out of this strange mess that we’ve gotten ourselves into.
Some day in the distant future, we may decide to step away from our intelligent machines and let human craftsmanship have sovereignty over Art and Culture again. But that time is not now.
The age of artificial reproduction has begun.
It will be weird.
Thanks to my sparring partners Max Kreis, Sebastian Zimmerhackl, Mathieu Schiltz, Jeff Braun, and my associates at Neon Internet Karim Youssef, Sacha Schmitz, and Jacques Weniger.