Tech - Tricky interview questions

scroll ↓ to Resources

Contents

LLM

Other

  • why can an LLM respond differently to the same input? And why can this still happen if the temperature parameter is set to 0?
    • NB!: LLMs are autoregressive, so a single altered output token changes all subsequent generation.
    • non-determinism on the vendor’s side
      • the underlying model version may have changed on the host’s side, caches may have been altered, or a different set of GPUs used for inference
    • fundamental issues with batching and GPU calculations
      • floating-point non-associativity and the lack of batch invariance (see the numeric sketch after this list)
    • Thinking Machines: Defeating Nondeterminism in LLM Inference
  • how does structured output work under the hood? (constrained decoding; see the logit-masking sketch after this list)
  • why can increasing the temperature parameter lead to increased latency?
    • a flatter log-probs distribution increases output token count (the end-of-sequence token is selected later, on average); see the temperature sketch after this list
    • worse speculative decoding yield: draft-verify methods accept fewer draft tokens
    • output length variance hurts continuous batching efficiency
  • I want to have a certainty metric for a classification task solved by an LLM. What approaches are there?
    • constrained decoding to output only a label, then take the log probs and convert them to probabilities (see the log-prob sketch after this list)
    • self-consistency voting: sample the model n times and use the max vote share, the margin between top-1 and top-2, or the entropy as confidence
    • use an embedding model to encode the input and train a classic classifier (e.g. logistic regression) on top; this gives direct probabilities
    • ask the LLM to self-report confidence (a bad choice: self-reported confidence is poorly calibrated)
    • if not talking about LLMs, then:
      • bootstrapping to assess confidence intervals around predicted probabilities
      • use an ensemble of multiple models; the variance among their predictions is an indicator of uncertainty
      • Bayesian methods to quantify uncertainty by treating model parameters as distributions rather than point estimates
  • how to demonstrate LLM system performance to stakeholders?
    • technical and business metrics (see the Metrics note)
    • demos
      • a chatbot demo where users are shown two responses to their query: one presents the FAQ info exactly as it appears in the docs (no summarization, no generative step), the other presents the information after the GenAI block. Ask users to vote for the better option to subjectively estimate GenAI quality and to understand whether users need a chatbot or a search bot.
      • show users your RAG system and Gemini/ChatGPT side by side, answering the exact same questions. Understandably, a bare-bones LLM doesn’t have access to internal documentation and, therefore, the comparison is not technically fair, but it may create a positive impression on non-technical stakeholders. If the demo prompts are also cached, the custom system will look lightning-fast too.
  • how to estimate the cost of a GenAI project vs its potential economic benefits?
  • why decoder-only? What are the benefits? Why are good embedding models not decoder-only?
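
A minimal numeric sketch of the floating-point non-associativity point above: summing the same float32 values in a different order gives slightly different results, which is why batched GPU reductions are not bit-for-bit reproducible, and why a single flipped logit can change an entire autoregressive generation. The data is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward = np.sum(x)                                     # one reduction order
s_chunked = sum(np.sum(c) for c in np.array_split(x, 7))  # a "batched" order

print(s_forward, s_chunked)    # values differ in the last bits
print(s_forward == s_chunked)  # typically False
```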
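
A toy sketch of how structured output typically works under the hood: a JSON schema or regex is compiled into a token-level automaton, and at every decoding step the logits of tokens that would violate the format are masked out, so only valid continuations can be sampled (libraries such as outlines and llama.cpp grammars work this way). The tiny vocabulary and the allowed_next transition table below are hypothetical stand-ins for the real compiled automaton.

```python
import numpy as np

vocab = ['{', '"label"', ':', '"spam"', '"ham"', '}', '<eos>']

# Hypothetical automaton: for each decoding step/state, the set of legal tokens.
allowed_next = {
    0: {'{'}, 1: {'"label"'}, 2: {':'},
    3: {'"spam"', '"ham"'}, 4: {'}'}, 5: {'<eos>'},
}

def constrained_greedy_decode(logits_per_step: np.ndarray) -> list[str]:
    out = []
    for state, logits in enumerate(logits_per_step):
        legal = allowed_next[state]
        masked = np.where([t in legal for t in vocab], logits, -np.inf)
        out.append(vocab[int(np.argmax(masked))])  # greedy over legal tokens only
    return out

# Fake "model" logits: 6 decoding steps, one row per step.
rng = np.random.default_rng(1)
print(constrained_greedy_decode(rng.standard_normal((6, len(vocab)))))
# always a valid object, e.g. ['{', '"label"', ':', '"ham"', '}', '<eos>']
```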
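
A temperature sketch on made-up logits: dividing by T > 1 flattens the next-token distribution, so no token (including end-of-sequence) dominates, entropy grows, and generations run longer on average.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 2.0, 1.0, 0.5])  # pretend index 0 is <eos>

for T in (0.5, 1.0, 2.0):
    p = softmax(logits / T)
    entropy = -(p * np.log(p)).sum()
    print(f"T={T}: p(<eos>)={p[0]:.3f}, entropy={entropy:.3f}")
# higher T -> lower p(<eos>) per step -> longer outputs on average
```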
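
Two of the confidence recipes above, sketched on hypothetical numbers: (a) renormalising the log probs of label tokens obtained via constrained decoding into class probabilities, and (b) self-consistency voting over n samples.

```python
import math
from collections import Counter

# (a) Suppose the API returned one log prob per constrained label token
# (the values here are hypothetical).
label_logprobs = {"positive": -0.2, "negative": -2.1, "neutral": -3.0}
z = max(label_logprobs.values())
norm = sum(math.exp(lp - z) for lp in label_logprobs.values())
probs = {k: math.exp(lp - z) / norm for k, lp in label_logprobs.items()}
print(probs)  # renormalised class probabilities

# (b) Sample the model n times at temperature > 0; vote share and the
# top-1/top-2 margin act as confidence.
samples = ["positive", "positive", "neutral", "positive", "negative"]
(top1, c1), (_, c2) = Counter(samples).most_common(2)
n = len(samples)
print(top1, "vote share:", c1 / n, "margin:", (c1 - c2) / n)
```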

Machine Learning

  • why not make decision trees of higher depth?
    • it leads to overfitting: we want to keep the variance of each tree in the ensemble low, therefore the depth is usually 5-6 (see the depth sketch after this list)
  • what happens if we remove the first or the last tree from a random forest and from xgboost?
    • base learners in a random forest are independent, so removing any one of them will not change the outcome much. In xgboost, on the contrary, base learners are sequential: removing the first one has a large effect, while deleting the last one has little effect (see the tree-removal sketch after this list).
  • your precision is high and your recall is low, and your client wants the opposite. How to achieve that very quickly?
    • lower the decision threshold (see the threshold sketch after this list)
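
A depth sketch on synthetic data: training accuracy keeps climbing with max_depth while test accuracy stalls or degrades; this is exactly the variance we want to avoid in ensemble members. Exact numbers will vary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 4, 6, 10, None):  # None = grow until pure leaves
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth}: train={tree.score(X_tr, y_tr):.2f}, test={tree.score(X_te, y_te):.2f}")
```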
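
The tree-removal question, sketched on synthetic data with scikit-learn’s GradientBoostingRegressor as a stand-in for xgboost: forest predictions are a plain average over trees, so dropping any one barely moves the error, while boosting trees are sequential corrections and the first one matters far more than the last.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error as mse

X, y = make_regression(n_samples=1000, noise=10, random_state=0)

# Random forest: prediction is the mean over independent trees.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
per_tree = np.stack([t.predict(X) for t in rf.estimators_])
full = per_tree.mean(axis=0)
print("RF  drop first:", mse(y, per_tree[1:].mean(axis=0)) - mse(y, full))
print("RF  drop last :", mse(y, per_tree[:-1].mean(axis=0)) - mse(y, full))

# Boosting: prediction is init + lr * sum of sequential corrections.
gb = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
contrib = np.stack([gb.learning_rate * t[0].predict(X) for t in gb.estimators_])
full = y.mean() + contrib.sum(axis=0)  # squared-error init is the target mean
print("GB  drop first:", mse(y, full - contrib[0]) - mse(y, full))
print("GB  drop last :", mse(y, full - contrib[-1]) - mse(y, full))
# the "drop first" delta dwarfs the others
```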
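
The threshold trick in a few lines: keep the already-trained model and lower the probability cut-off; recall rises at the cost of precision (synthetic, imbalanced data).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

for thr in (0.5, 0.3, 0.1):  # lowering the decision threshold
    pred = (proba >= thr).astype(int)
    print(f"thr={thr}: precision={precision_score(y_te, pred):.2f}, "
          f"recall={recall_score(y_te, pred):.2f}")
```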

Resources


Transclude of base---related.base


table file.inlinks, filter(file.outlinks, (x) => !contains(string(x), ".jpg") AND !contains(string(x), ".pdf") AND !contains(string(x), ".png")) as "Outlinks" from [[]] and !outgoing([[]])  AND -"Changelog"