Imagine a world where you have a personal assistant that accomplishes any task you dream of, builds any app you can think of, and can give you advice on virtually any topic you want. This futuristic idea has nearly become a reality with generative AI. The world of generative AI has become incredibly saturated over just the past few months, as competitors from Big Tech companies like Google and Microsoft are racing to give customers the best product possible. People like Bill Gates have called this event the “AI Revolution,” and getting on board as fast as possible is prudent. Learning more about generative AI can help us be more aware of the unique advantages and disadvantages this “AI revolution” brings.
ChatGPT was released to the public on November 30, 2022, and has been praised for its detailed responses on topics ranging from programming to cooking to economics. ChatGPT, a product of OpenAI, has been virtually inducted into popular culture. Hundreds of millions of people worldwide have used its impressive capabilities to answer specific questions that Google or other search engines may struggle with. One of its strong suits is coding, as it can produce basic algorithms on numerous topics in various languages. Additionally, it can help plan out documents and articles. For example, when asked for a list of topics for this article, ChatGPT spits out numerous suggestions, such as the “history of generative AI, types of generative models, and applications of generative AI, and ethical considerations” (ChatGPT). Of course, many of these topics are beyond the scope of this article, but it is impressive to see how easy it is to find topics to write about. The model takes things a step further by writing entire articles and poems in whatever tone or writing style you want. However, it is noteworthy that generative AI models can only take inspiration from human creations; that is, AI can never be truly creative. Everything it produces is a rehashed version of something humans made in the past, and this is one of the most significant limitations of modern artificial intelligence.
There are two main types of generative AI models: the variational autoencoder (VAE) and the generative adversarial network (GAN). A VAE is composed of two parts: an encoder network and a decoder network. The encoder network takes an input, usually an image, and maps it to a latent space. A latent space is a compressed representation of the data that groups similar inputs together and is hidden (or latent) from the programmer. The decoder network takes samples from this latent space and maps them back to input space, producing an image similar to but slightly different from the original. Given multiple inputs, a VAE can create an image that combines features of all of them. VAEs are thus well suited to image generation, as in OpenAI’s DALL-E model. A GAN also uses two neural networks, but they are instead known as the generator and the discriminator. The generator takes a random noise array as input and generates an image. The discriminator is then fed both a real image and the generated image and must decide which one is real. The GAN is trained through a min-max game between the generator and the discriminator: the generator must create increasingly realistic images so that the discriminator cannot tell the generated images from the real ones, while the discriminator must learn to better distinguish real images from fakes. Through this process, a GAN can produce diverse, high-quality samples that are hard to match with other model families. GANs can be used for a variety of applications, including image, text, and video generation. The main difference between the two is their input: VAEs start from real images, whereas GANs start from noise samples. Their training algorithms also differ; a VAE minimizes a loss function, while a GAN improves through the min-max game described above.
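The VAE flow just described (encode to a latent distribution, sample, decode) can be sketched as a toy. This is not a trained model: random linear maps stand in for the learned encoder and decoder networks, and the dimensions, weight names, and functions here are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a flattened 8x8 "image" and a 2-D latent space.
INPUT_DIM, LATENT_DIM = 64, 2

# Random linear maps stand in for the trained encoder/decoder networks.
W_enc_mu = rng.normal(size=(LATENT_DIM, INPUT_DIM)) * 0.1
W_enc_logvar = rng.normal(size=(LATENT_DIM, INPUT_DIM)) * 0.1
W_dec = rng.normal(size=(INPUT_DIM, LATENT_DIM)) * 0.1

def encode(x):
    """Map an input to the parameters (mean, log-variance) of a Gaussian in latent space."""
    return W_enc_mu @ x, W_enc_logvar @ x

def sample_latent(mu, logvar):
    """Draw a latent sample via the reparameterization trick: z = mu + sigma * eps."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent sample back to input space (the reconstruction)."""
    return W_dec @ z

x = rng.normal(size=INPUT_DIM)            # a stand-in "image"
mu, logvar = encode(x)                    # compress to latent parameters
z = sample_latent(mu, logvar)             # sample from the latent space
reconstruction = decode(z)                # similar to, but not identical to, x

# Training would minimize this reconstruction error plus a KL term that
# keeps the latent distribution close to a standard normal (not shown).
recon_loss = np.mean((x - reconstruction) ** 2)
```

Because the decoder samples from a distribution rather than copying the input, feeding the same image twice yields slightly different reconstructions, which is exactly the "similar but slightly different" behavior described above.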
Other examples of AI technologies include GPT (Generative Pre-Trained Transformer), the technology that ChatGPT runs on. Its key strength is understanding and generating human language, which it learns by analyzing large bodies of text and picking up the patterns and meanings behind different words and phrases. This field, called NLP (natural language processing), can be broken into five steps: tokenization, part-of-speech tagging, parsing, entity recognition, and tone analysis. Tokenization breaks the text into sentences and words for easier processing. Part-of-speech tagging marks each word with its corresponding part of speech. Parsing examines how the words in a sentence relate to one another, extracting meaning from their structure. The model then checks whether it can recognize any entities, such as names of people, places, or organizations. Finally, the model estimates the tone of the text by comparing it with samples it has seen before. Once it has learned these patterns, the model can perform a wide range of tasks, like answering questions, writing articles in a particular tone, and giving recommendations. GPT models are best used for generating high-quality, natural-sounding text that resembles what a human might write.
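The pipeline steps above can be illustrated with a deliberately tiny sketch. Real NLP systems learn these behaviors from data; here, hand-written lexicons and simple rules stand in for them, and all the word lists and function names are invented for illustration (parsing, step 3, is omitted for brevity).

```python
import re

# Tiny hand-written lexicons; a real system would learn these from data.
POS = {"the": "DET", "cat": "NOUN", "sat": "VERB", "on": "ADP", "mat": "NOUN",
       "loves": "VERB", "paris": "PROPN", "alice": "PROPN"}
POSITIVE, NEGATIVE = {"loves", "great"}, {"hates", "awful"}

def tokenize(text):
    """Step 1: split the text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tag(tokens):
    """Step 2: label each token with its part of speech (UNK if unseen)."""
    return [(t, POS.get(t, "UNK")) for t in tokens]

def entities(text):
    """Step 4: flag capitalized words (beyond the sentence start) as entities."""
    words = text.split()
    return [w.strip(".,") for w in words[1:] if w[0].isupper()]

def tone(tokens):
    """Step 5: crude tone estimate from sentiment word counts."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

sentence = "Alice loves Paris."
tokens = tokenize(sentence)       # ['alice', 'loves', 'paris']
tagged = tag(tokens)              # [('alice', 'PROPN'), ('loves', 'VERB'), ('paris', 'PROPN')]
found = entities(sentence)        # ['Paris']
mood = tone(tokens)               # 'positive'
```

A GPT model does not literally run these rules; it learns such regularities implicitly from text, but the sketch shows the kinds of structure the five steps are meant to capture.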
Ever since the advent of ChatGPT, numerous companies have come out with their own takes on AI chatbots. Microsoft has invested $10 billion in OpenAI to gain access to its AI model and use it in the Bing search engine, and claims that Bing Chat is more powerful than ChatGPT thanks to its ability to search the web. Rushing to keep up, Google has announced its Bard technology, though it is not yet available to the public. Bard has already been observed making mistakes, most notably in its own demo, which hurt its prospects ahead of release. Meta also released its take on AI, Galactica, in November, aimed at scientists and researchers. Meta said its chatbot could “provide assistance to scientists and researchers with summaries of academic articles, solutions to math problems, the ability to annotate molecules, and more” (Roth, 2023). However, in its public beta the chatbot produced disappointing responses that were biased and even dangerous, so Meta promptly took it offline. On the image side, DALL-E, Midjourney, and Stable Diffusion have been the most prominent text-to-image generators, with Stable Diffusion being open-source while the others are not. All of these technologies have raised ethical concerns, however, as their training data comes from real art created by artists. These artists have claimed they are not receiving due credit or royalties for AI-created images, raising a moral dilemma: does AI-created art need to be attributed to the artists that inspired it? Such questions will become the center of discussion in the coming years as AI improves and becomes more realistic. In the meantime, artificial intelligence chatbots will become increasingly useful in our daily lives and a staple of popular culture.