Generative AI Models

AI Club

Written by: Arfa Ahmed — TE Electronics, NEDUET.

In the realm of artificial intelligence, Generative AI is a breakthrough development, particularly in deep learning. The term refers to the creation of content, including writing, photos, audio, film and much more. In practice, it builds on large deep learning models, called foundation models, that can perform multiple tasks out of the box, such as summarisation, question answering and classification. A foundation model can then be adapted to a specific use case with very little example data and very little additional training.

Generative AI uses machine learning models, such as large language models, to learn how humans create content, and then produces new content by applying the patterns it has learned. These models can be trained with supervised, unsupervised or semi-supervised learning. Given a set of human-created materials, including text, photos, audio, videos, graphics, datasets and related tables, the model is trained to produce new material that resembles the human-created examples. Such systems interpret vast amounts of data, produce insights, and respond in text, pictures and other easily navigable formats.

Various generative AI models are used for specific tasks; each has its own typical uses and supporting platforms.

Generative Adversarial Networks (GANs)

GANs are generative models for unsupervised learning: they discover and learn the patterns or regularities in input data well enough to generate new data similar to the original dataset. GANs frame the problem as a supervised learning task with two sub-models: a generator, trained to produce new examples, and a discriminator, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator can no longer reliably tell real data from generated data (a minimal training-loop sketch follows the platform list below). Some of the applications of GANs are:

  • Image generation, where they can generate highly realistic images from random noise, such as faces, animals and landscapes.
  • Video generation, where they can generate videos from noise or based on input such as creating animations, simulation of physical phenomena etc.
  • Data augmentation, where they can generate additional training data for machine learning models.
  • Text-to-image synthesis, where they can generate images from textual data.

Platforms that support developing and deploying GANs include TF-GAN (TensorFlow), NVIDIA's GAN libraries, Hugging Face Transformers, Runway ML, Fast.ai and others.
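
To make the generator/discriminator interplay concrete, here is a minimal PyTorch sketch of a GAN training loop on toy two-dimensional data. It is only an illustration under assumed settings: the network sizes, learning rates, batch size and data distribution are made up for the example and are not taken from any of the platforms above.

```python
# A minimal GAN training-loop sketch in PyTorch (illustrative only).
# The "real" data is a toy 2-D Gaussian; layer sizes, learning rates
# and batch size are arbitrary assumptions, not taken from any library.
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 8, 2, 64

generator = nn.Sequential(
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)
discriminator = nn.Sequential(
    nn.Linear(data_dim, 32), nn.LeakyReLU(0.2),
    nn.Linear(32, 1),  # raw logit: real vs. fake
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) * 0.5 + 2.0   # samples from the "real" distribution
    fake = generator(torch.randn(batch, latent_dim))  # generator maps noise to data space

    # Discriminator step: push real samples towards label 1, generated ones towards 0.
    d_loss = bce(discriminator(real), torch.ones(batch, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```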

Variational Autoencoders (VAEs): Variational Autoencoders are intended for the generation and reconstruction of data. A VAE consists of two networks, an encoder and a decoder. The encoder learns an efficient encoding of the data and passes it into a bottleneck (latent) layer, while the decoder samples from that latent space to regenerate data similar to the dataset. The reconstruction error, together with a regularisation term on the latent space, forms the loss that is backpropagated through both networks.
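
As a rough illustration of this encoder/bottleneck/decoder flow, the sketch below shows a tiny PyTorch VAE using the reparameterisation trick and a reconstruction-plus-KL loss. The layer sizes, the 784-dimensional input and the random stand-in batch are assumptions made purely for the example.

```python
# Minimal VAE sketch in PyTorch (illustrative; sizes and data are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, data_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction error plus KL divergence to the standard-normal prior.
    recon_err = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl

model = VAE()
x = torch.rand(32, 784)                  # stand-in batch, e.g. flattened images
recon, mu, logvar = model(x)
loss = vae_loss(x, recon, mu, logvar)
loss.backward()
```

Sampling a random z from the prior and passing it through the decoder alone is what produces entirely new data, which is the mechanism behind the image-generation use case listed below.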

VAEs have a wide range of applications in many domains such as:

  • Image generation and manipulation, where they can generate new images by sampling from the latent space and decoding them back to image space.
  • Data compression, where they can compress data into a lower-dimensional latent space, which then can be used to reconstruct the original data.
  • Anomaly detection, where they can detect anomalies, making it useful in the field of fraud detection, network security and medical diagnosis.
  • Music and audio generation, where they can generate new music or audio by learning the underlying patterns in existing audio.

Some specific tools and libraries used for VAEs are Edward (for TensorFlow), Pyro, TensorFlow Probability and Fast.ai.

Transformers

Transformers are a type of neural network architecture designed especially for sequential data, i.e. they transform input sequences into output sequences by learning the context of and relationships between sequence components. They have changed NLP by handling long-range text dependencies and processing sequences in parallel, which reduces training and processing time (a self-attention sketch follows the list of tools below). Transformers have many use cases on any kind of sequential data. Some of them are:

  • Natural Language Processing, where they enable machines to understand, interpret and generate human language in a way that’s more accurate than ever before.
  • Machine Translation, where they are used to provide real-time, accurate translations between languages.
  • DNA Sequence Analysis, where they are used to predict genetic mutations, understand genetic patterns and help identify regions of DNA responsible for certain diseases.
  • Protein Structure Analysis, where their ability to model sequential data makes them well-suited to modelling the long chains of complex protein structures.

Some specific tools and libraries for Transformers are PyTorch, TensorFlow, Hugging Face Transformers, JAX, OpenNMT, AllenNLP and many more.
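
At the core of every Transformer is scaled dot-product self-attention, in which each token attends to every other token in parallel. The sketch below shows just that one operation in PyTorch, leaving out multi-head splitting, masking and positional encodings; the dimensions, weights and random inputs are assumptions for illustration only.

```python
# Scaled dot-product self-attention, the core operation inside a Transformer
# (a minimal sketch; dimensions and inputs are arbitrary assumptions).
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); every token attends to every other token in parallel.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)  # pairwise similarities
    weights = F.softmax(scores, dim=-1)                     # attention distribution per token
    return weights @ v                                      # context-aware token representations

d_model = 64
x = torch.randn(2, 10, d_model)          # a batch of 2 sequences, 10 tokens each
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)   # same shape as x: (2, 10, 64)
```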

Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data. Unlike traditional neural networks, RNNs have connections that form directed cycles, allowing them to maintain a memory of previous inputs and to model temporal dependencies.

RNNs process data in a flow: the input layer takes the current input and combines it with the hidden layer, the hidden state is updated based on the current input and the previous hidden state, and the output layer then produces the output from the updated hidden state.
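
The sketch below expresses that flow as a hand-written recurrence in PyTorch rather than the built-in nn.RNN module, so the hidden-state update is explicit; the layer sizes and the random input sequence are illustrative assumptions.

```python
# Minimal recurrence sketch in PyTorch mirroring the flow above (illustrative sizes).
import torch
import torch.nn as nn

input_size, hidden_size, output_size = 10, 20, 5
w_ih = nn.Linear(input_size, hidden_size)   # input -> hidden
w_hh = nn.Linear(hidden_size, hidden_size)  # previous hidden -> hidden
w_ho = nn.Linear(hidden_size, output_size)  # hidden -> output

seq = torch.randn(7, input_size)            # a sequence of 7 time steps
h = torch.zeros(hidden_size)                # initial hidden state ("memory")

outputs = []
for x_t in seq:
    h = torch.tanh(w_ih(x_t) + w_hh(h))     # update memory from current input + previous state
    outputs.append(w_ho(h))                 # output depends on the hidden state

print(outputs[-1].shape)                    # torch.Size([5])
```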

RNNs have various use cases, including natural language processing, speech recognition and time-series prediction, but they come with limitations: gradient instability, where gradients become too large or too small during training, and short-term memory, both of which make it difficult to learn long-range dependencies. Some specific tools and libraries for RNNs are Keras, PyTorch, MXNet and Gluon, CNTK, PaddlePaddle, JAX and so on.

Despite these limitations, each GenAI model has its significance in industry, as each is robust within its own domain. As advancements continue, we can expect even more robust and handy GenAI models for building solutions with much greater ease.

Conclusion

Generative AI models like GANs, VAEs, Transformers, and RNNs represent a major leap in artificial intelligence, enabling sophisticated content creation across various media. Each model type has unique strengths: GANs for realistic images and videos, VAEs for data compression and anomaly detection, Transformers for natural language processing, and RNNs for sequential data. Despite their limitations, these models are transforming industries and promise even more advanced solutions in the future. The ongoing advancements in generative AI are set to revolutionize content creation and streamline processes across numerous fields.


AI Club

The AI Club was founded by the students of NEDUET with the primary motive of providing opportunities and a networking medium for students in the domain of AI.