Generative AI System Design Interview - Chapter 1 - Introduction and Overview

Contents

  • GenAI overview
    • [[#GenAI overview#Discriminative|Discriminative]]
    • [[#GenAI overview#Generative|Generative]]
    • [[#GenAI overview#Three pillars of GenAI|Three pillars of GenAI]]
      • [[#Three pillars of GenAI#Data|Data]]
      • [[#Three pillars of GenAI#Model capacity|Model capacity]]
      • [[#Three pillars of GenAI#Compute|Compute]]
      • [[#Three pillars of GenAI#Scaling laws|Scaling laws]]
  • Framework for ML system design interviews
    • [[#Framework for ML system design interviews#Clarifying requirements|Clarifying requirements]]
      • [[#Clarifying requirements#Functional requirements|Functional requirements]]
      • [[#Clarifying requirements#Non-functional requirements|Non-functional requirements]]
    • [[#Framework for ML system design interviews#Framing the problem as an ML task|Framing the problem as an ML task]]
    • [[#Framework for ML system design interviews#Data preparation|Data preparation]]
      • [[#Data preparation#Data collection|Data collection]]
      • [[#Data preparation#Data cleaning|Data cleaning]]
      • [[#Data preparation#Data efficiency|Data efficiency]]
    • [[#Framework for ML system design interviews#Model development|Model development]]
      • [[#Model development#Model architecture|Model architecture]]
      • [[#Model development#Model training|Model training]]
        • [[#Model training#Methodology|Methodology]]
        • [[#Model training#Data|Data]]
        • [[#Model training#Loss and ML objectives|Loss and ML objectives]]
        • [[#Model training#Task-specific challenges and mitigations|Task-specific challenges and mitigations]]
        • [[#Model training#Model inference|Model inference]]
    • [[#Framework for ML system design interviews#Evaluation|Evaluation]]
      • [[#Evaluation#Offline Evaluation|Offline Evaluation]]
      • [[#Evaluation#Online Evaluation|Online Evaluation]]
    • [[#Framework for ML system design interviews#Overall ML system design|Overall ML system design]]
    • [[#Framework for ML system design interviews#Deployment and monitoring|Deployment and monitoring]]
  • Resources

GenAI overview

  • AI >> ML: AI is a much broader field, with ML as one of its subfields
  • ML models can be discriminative or generative

Discriminative

  • Learn the conditional distribution P(y|x), i.e. boundaries that separate classes rather than the data distribution itself (e.g. classifiers and regressors)

Generative

  • Learn the distribution of the input data (e.g. for image/text generation tasks) or the joint distribution for sequence-to-sequence generation (text-to-image, translation, text-to-SQL, etc.)
  • Classical
    • Naive Bayes
    • Gaussian mixture models (GMM)
    • Hidden Markov models (HMM)
  • Modern
    • transformer-based LLMs, diffusion models, GANs (detailed under Model training > Methodology below)
  • Applications
    • chatbots
    • summarization
    • content creation
    • text-to-video/image/audio, translation into another language
    • synthesis of anything generic or very specific

Three pillars of GenAI

  • Data, model capacity, compute
  • scaling laws
    • empirically derived relationship between the number of parameters, the number of training tokens, and the best achievable performance (lowest loss)

Data

  • in discriminative ML, each specific task requires a specific labeled dataset, which is costly to acquire
  • GenAI pre-training is self-supervised (see the sketch below)
    • can employ vast amounts of unstructured data from the internet, ==no data labeling==
    • modern models like Llama 3 are pre-trained on more than 15 trillion tokens
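A minimal sketch of why no labeling is needed: in next-token prediction, the targets are just the input sequence shifted by one position. The whitespace tokenizer below is a toy, purely for illustration.

```python
# Minimal sketch: self-supervised next-token targets come "for free" from raw text.
text = "the cat sat on the mat"
vocab = {w: i for i, w in enumerate(sorted(set(text.split())))}  # toy tokenizer
token_ids = [vocab[w] for w in text.split()]

inputs  = token_ids[:-1]  # model sees:     the cat sat on the
targets = token_ids[1:]   # model predicts: cat sat on the mat
print(inputs, targets)    # no human labels involved
```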

Model capacity

  • measured by the number of model parameters (sometimes proxied in FLOPs per forward pass); higher-capacity models can capture more complex patterns

Compute

  • also measured in FLOPs, but now as the total number of floating-point operations needed to train the model (a back-of-envelope estimate follows below)
  • a modern GPU can deliver on the order of 60 teraFLOPS (trillion floating-point operations per second), yet training a modern model on a single GPU would still take years, and frontier training runs cost over $100 million
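A back-of-envelope estimate, using the common approximation from the scaling-laws literature of roughly 6 FLOPs per parameter per training token; the model and token counts below are assumptions for illustration only:

```python
# Rough training cost: FLOPs ≈ 6 * n_params * n_tokens (standard approximation).
# All numbers below are illustrative assumptions, not measurements.
n_params = 7e9                    # assumed 7B-parameter model
n_tokens = 2e12                   # assumed 2T training tokens
total_flops = 6 * n_params * n_tokens          # ≈ 8.4e22 FLOPs

gpu_flops_per_s = 60e12           # 60 teraFLOPS at (unrealistic) 100% utilization
years = total_flops / gpu_flops_per_s / (365 * 24 * 3600)
print(f"~{years:.0f} years on a single GPU")   # ≈ 44 years
```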

Scaling laws

  • improvements in performance can be predicted from increases in model size, dataset size, or computational resources (see the formula below)
  • earlier models were undertrained: too large for the amount of data used, since for compute-optimal training, parameter count and training tokens should scale roughly linearly with each other
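For reference, the compute-optimal scaling law fitted by Hoffmann et al. (2022, the "Chinchilla" paper) models loss as a function of parameter count $N$ and training tokens $D$:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $E$ is the irreducible loss and $A$, $B$, $\alpha$, $\beta$ are empirically fitted constants; minimizing $L$ under a fixed compute budget $C \approx 6ND$ implies that $N$ and $D$ should grow in roughly equal proportion.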

Framework for ML system design interviews

  1. Clarifying requirements
  2. Framing the problem as an ML task
  3. Data preparation
  4. Model development
  5. Evaluation
  6. Overall ML system design
  7. Deployment and monitoring

Clarifying requirements

Functional requirements

  • What should the system do? Which functionality should be present?
  • Shapes the high-level architecture
  • Examples
    • chatbot: Based on a user's request, the system searches internal documents, retrieves only contextually appropriate information, and generates a draft response for the user.
    • Harmful content detection system: The system must moderate generated content to filter out harmful, biased, or inappropriate output before showing it to the user
    • intent recognition: The system must interpret user intent by analyzing and extracting entities and context from the input text
    • AI use-case - Fitness app: The system must analyze video recordings of users' gym training, give advice on technique, and create exercise and meal plans

Non-functional requirements

  • focus on how the system performs, NOT what it does
  • Consider performance, scalability, business objectives, data requirements, architecture constraints, usability, security, ethics, reproducibility, etc
  • Examples
    • minimum acceptable speech generation speed
    • GDPR compliance of the whole system
    • chatbot language support
    • an average new user must be able to get their first generated image in under N minutes
    • support for peak load under X concurrent users with max degradation of response time of Y%, no output quality degradation

Framing the problem as an ML task

  • Is ML/AI needed to solve the problem?
  • Define system inputs and outputs based on the requirements
    • modality (image, text, audio, …)
    • should it be a single model or several specialized ones?
  • Choose a suitable ML approach
    • Discriminative vs Generative
      • choice defined by the system output
    • Identify the task type
    • select an algorithm
      • ability to handle the input modalities
      • What are the specific trade-offs between different algorithms in terms of quality, efficiency, and ease of use?
      • Is the chosen approach scalable and flexible enough to accommodate future changes or additions to the system’s capabilities?
      • How easily can the system adapt if new input modalities or outputs are introduced later?

Data preparation

  • Structured data
    • categorical
    • numerical
    • ordinal
  • Unstructured data
    • Emails, social media posts, multimedia data, PDFs

Data collection

  • Data scraping from the internet and other sources, such as books
  • synthetic data generation for later use in training (a sketch follows below)
    • improves data diversity, helps with imbalanced datasets
    • quality of generated data depends on the generating model; it is a GenAI project in itself and may not capture the complexity of real-world scenarios
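A minimal sketch of using a generative model to augment a minority class; `llm_paraphrase` is a hypothetical stand-in for whatever model call the system actually uses:

```python
import random

def llm_paraphrase(text: str) -> str:
    """Hypothetical helper: ask a generative model to paraphrase `text`.
    Stubbed out so the sketch stays self-contained."""
    return text  # replace with a real model call

# Oversample an underrepresented intent with paraphrased variants.
minority = ["refund was never processed", "charged twice for one order"]
synthetic = [llm_paraphrase(random.choice(minority)) for _ in range(100)]
# Caveat from above: quality is bounded by the generating model and may
# miss real-world complexity, so mix with real data and validate.
```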

Data cleaning

  • filter out:
    • data-inherent biases: demographic, geographic
    • cognitive biases
    • misinformation
    • harmful or inappropriate content
    • sensitive information
      • How sensitive is the data (e.g., personal, financial, medical)? Is anonymization necessary to protect sensitive information?
    • low-quality and noisy samples (an illustrative filtering sketch follows below)
      • What is the definition of noisy data? Are there outliers?
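An illustrative cleaning pass; the regex, length threshold, and hash-based dedup are placeholder heuristics, not production rules:

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")   # toy PII pattern
seen_hashes = set()

def keep(doc: str) -> bool:
    """True if `doc` survives the (illustrative) quality filters."""
    if len(doc.split()) < 20:                    # too short, likely noise
        return False
    h = hashlib.md5(doc.encode()).hexdigest()    # exact-duplicate removal
    if h in seen_hashes:
        return False
    seen_hashes.add(h)
    return True

def redact(doc: str) -> str:
    return EMAIL.sub("[EMAIL]", doc)             # anonymize sensitive tokens

corpus = ["too short to keep", "contact me at a@b.com " + "filler " * 30]
clean = [redact(d) for d in corpus if keep(d)]
print(clean)
```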

Data efficiency

  • How to ensure efficient storage and processing of large training corpora? (one common pattern is sketched below)
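One common pattern, sketched under the assumption that the corpus is pre-tokenized: store token ids as flat binary shards and read them with a memory map, so training never loads the full corpus into RAM.

```python
import numpy as np

# Write: tokenized corpus as a flat uint16 array (enough for vocabularies < 65k).
token_ids = np.array([101, 2009, 2003, 102], dtype=np.uint16)  # toy data
token_ids.tofile("shard_000.bin")

# Read: memory-map the shard; slices are paged in lazily from disk.
data = np.memmap("shard_000.bin", dtype=np.uint16, mode="r")
print(data[0:4])  # e.g., one training window
```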

Model development

Model architecture

  • explore different architectural options and weigh their advantages and disadvantages
  • discuss how the input is transformed into the output
  • tricky interview questions: how to modify the architecture to support a new feature?
    • image generation: how to let users control the style of generated images
    • text-to-video: control the direction of motion (e.g., left to right) during generation
  • a canonical architecture question is self-attention (a minimal implementation sketch follows)
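A minimal NumPy sketch of scaled dot-product self-attention (single head, no masking), i.e. softmax(QKᵀ/√d)·V:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no masking.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)
```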

Model training

Methodology
  • diffusion models gradually denoise data to generate high-quality samples from noise (see the forward-process sketch after this list)
  • GANs rely on adversarial training, where a generator and a discriminator compete to improve over time
  • transformers use embedding representations and the attention mechanism
  • in practice, LLMs undergo multi-stage training
    • pretraining on large datasets to learn general patterns
    • supervised finetuning to adapt to the specific task
    • alignment stage to ensure outputs align with human values or intended behaviors
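To make the diffusion bullet concrete: a sketch of the DDPM-style forward (noising) process that training teaches the model to invert, using the closed form x_t = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε; the linear beta schedule is just one common choice.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # noise schedule (one common choice)
alphas_bar = np.cumprod(1.0 - betas)   # cumulative product, i.e. alpha-bar_t

def noisy_sample(x0, t, rng):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.normal(size=x0.shape)    # eps ~ N(0, I)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))           # toy "image"
x_t, eps = noisy_sample(x0, t=500, rng=rng)  # the model learns to predict eps
```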
Data
  • data sources, volumes, and quality differ per stage
    • public data for pretraining vs carefully curated proprietary data for alignment
    • Functional requirements play a big role in identifying useful data: languages, coding knowledge, etc.
Loss and ML objectives
  • ML objective: how the chosen objective impacts performance
  • loss function
    • many ready-made options exist for different cases; creating a custom one is complex but possible
    • can differ and be influenced by business requirements or the task at hand
    • may be updated with regularization terms
  • Interview questions
    • What is the loss function that aligns with the chosen ML objective? Do you use a single loss function or multiple ones? If multiple, how do you combine them to optimize the training process? What is the purpose of each loss function? (a combined-loss sketch follows below)
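A sketch of combining multiple loss terms with weights; the auxiliary "style" term and the weight values are illustrative assumptions, not recommendations:

```python
import torch
import torch.nn.functional as F

def total_loss(logits, targets, style_pred, style_target, model,
               w_aux=0.1, w_reg=1e-4):
    """Illustrative multi-objective loss: task + auxiliary + regularization."""
    task = F.cross_entropy(logits, targets)                # main ML objective
    aux = F.mse_loss(style_pred, style_target)             # business-driven term
    reg = sum((p ** 2).sum() for p in model.parameters())  # L2 regularization
    return task + w_aux * aux + w_reg * reg

model = torch.nn.Linear(8, 4)
logits = model(torch.randn(2, 8))
loss = total_loss(logits, torch.tensor([0, 3]), torch.zeros(2), torch.zeros(2), model)
loss.backward()
```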
Task-specific challenges and mitigations
  • these arise especially in resource-heavy tasks like video generation or training LLMs from scratch
  • mixed precision training uses both 16-bit (half-precision) and 32-bit (single-precision) floating point numbers to speed up model training and reduce memory usage (see the sketch after this list)
  • latent diffusion models run the diffusion process in a compressed latent space rather than pixel space, cutting compute and memory costs
  • gradient checkpointing to reduce memory usage during model training by saving only a selected subset of activations. During the backward pass, the missing activations are recomputed.
  • parallelization techniques and distributed computing enable efficient training of large models by utilizing multiple machines or devices in parallel
    • data parallelism
    • Model parallelism
    • Hybrid parallelism
  • Interview questions
    • What are typical training challenges specific to the chosen ML algorithm? How can these challenges be mitigated to ensure effective training?
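A sketch of two of the mitigations above in PyTorch (recent versions; assumes a CUDA device): automatic mixed precision with gradient scaling, plus gradient checkpointing on one sub-module.

```python
import torch
from torch.cuda.amp import GradScaler, autocast
from torch.utils.checkpoint import checkpoint

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()                 # rescales grads to avoid fp16 underflow

x = torch.randn(32, 512, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

optimizer.zero_grad()
with autocast():                      # run ops in half precision where safe
    # checkpoint(): skip storing activations here; recompute them in backward
    hidden = checkpoint(model[0], x, use_reentrant=False)
    logits = model[2](model[1](hidden))
    loss = torch.nn.functional.cross_entropy(logits, y)
scaler.scale(loss).backward()         # backward on the scaled loss
scaler.step(optimizer)
scaler.update()
```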
Model inference
  • sampling strategy for generating new data or outputs from the trained generative model
  • See also inference optimization
  • Interview questions
    • How do different sampling strategies (e.g., top-k, top-p) work, and what are the pros and cons of each? How do they affect the quality and creativity of the model's output? What methods can you use to make the sampling process faster without compromising quality? (a sampling sketch follows below)
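A minimal NumPy sketch of top-k and top-p (nucleus) sampling combined; real implementations differ in details such as temperature handling and batching:

```python
import numpy as np

def sample_next(logits, k=50, p=0.9, temperature=1.0, rng=None):
    """Illustrative top-k + top-p sampling over a single logits vector."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())      # stable softmax
    probs /= probs.sum()

    top = np.argsort(probs)[::-1][:k]          # top-k: keep k most likely tokens
    probs_k = probs[top]

    cum = np.cumsum(probs_k)                   # top-p: smallest prefix with mass >= p
    cutoff = np.searchsorted(cum, p) + 1
    keep, keep_p = top[:cutoff], probs_k[:cutoff]
    keep_p = keep_p / keep_p.sum()             # renormalize over surviving tokens

    return rng.choice(keep, p=keep_p)

logits = np.random.default_rng(0).normal(size=1000)  # toy vocabulary of 1000
print(sample_next(logits))
```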

Evaluation

Offline Evaluation

  • model evaluation on pre-collected data, before the model goes into production
    • GenAI applications typically require evaluating generated content without a fixed ground-truth label
    • LLM metrics
  • essential to assess the generated content from multiple angles
    • text-to-image: ensure the generated image is both high-quality and aligned with the given text prompt
    • chatbot: capability across different tasks such as mathematics, common-sense reasoning, and code generation
  • Interview questions
    • Which offline metrics best evaluate the quality and accuracy of the generative model? How do these metrics measure the diversity, realism, and coherence of generated outputs?
    • For generative models, especially in creative fields (e.g., text generation, image synthesis), human feedback is vital. How can human reviewers complement automatic evaluation? What methods (surveys, A/B testing, expert reviews) will best assess the model’s performance? How can you mitigate the effects of subjectivity among different reviewers?

Online Evaluation

  • assesses how the model performs in production (i.e., after deployment)
  • choosing online metrics is more subjective and requires input from product owners and stakeholders
  • Examples of LLM metrics:
    • Click-Through Rate (CTR): Percentage of users who click on content or suggestions.
    • Conversion Rate: Percentage of users who complete a desired action (e.g., purchase, subscription) after interacting with the system.
    • Latency (inference time): Time taken by the model to generate content.
    • Engagement Rate: Measure of user interaction, such as time spent engaging with the system.
    • Revenue Per User: Average revenue generated per user.
    • Churn Rate: Percentage of users who stop using the system over a given period.
    • User Satisfaction: Direct feedback from users on their experience with AI-generated content.
    • User Retention: Percentage of users who continue to use the system over a specific period.
    • Completion Rate: Percentage of tasks (e.g., text completions, image generations) successfully completed by the model.
  • Interview questions
    • Which metrics are crucial for assessing the effectiveness of the generative model in a live production environment? How do these metrics align with the business goals, such as enhancing user creativity, boosting engagement, or driving product innovation?

Overall ML system design

Deployment and monitoring

Resources

