Jeremiah Lowin on How Transformers in GPT Models Work

Source: Invest Like the Best on Spotify

Why Listen To This?

An interview with Jeremiah Lowin, a Harvard graduate and the founder and CEO of Prefect, a dataflow automation company.

‘4 — What are transformer models?
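- They transform an input sequence into a different output sequence, following a complex set of rules/heuristics that the model learns from data rather than having them written by hand.
- An analogy is translation.
  - Translating one language into another is not simply translating word by word: the right output for each word depends on the surrounding context.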
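The "not word by word" point is what self-attention, the core operation inside a transformer, captures: each output position is computed as a weighted mix over every input position. The sketch below is a minimal single-head version in pure Python, assuming toy random weights (`wq`, `wk`, `wv` and the helper names are illustrative, not from any library; real models learn these weights during training and stack many attention and feed-forward layers).

```python
import math
import random

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Multiply a matrix (stored as a list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def self_attention(xs, wq, wk, wv):
    """Single-head scaled dot-product self-attention, without a causal mask."""
    queries = [matvec(wq, x) for x in xs]
    keys = [matvec(wk, x) for x in xs]
    values = [matvec(wv, x) for x in xs]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Score this position against every position in the sequence.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # The output is a weighted mix of ALL value vectors, not just this token's own.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

rng = random.Random(0)
DIM = 4

def rand_matrix(rows, cols):
    # Hypothetical helper: a rows x cols matrix of Gaussian random numbers.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy random weights and a 3-token "sentence" of 4-dimensional embeddings.
wq, wk, wv = rand_matrix(DIM, DIM), rand_matrix(DIM, DIM), rand_matrix(DIM, DIM)
sentence = rand_matrix(3, DIM)
out_a = self_attention(sentence, wq, wk, wv)

# Replace only the last token; the output at position 0 changes as well,
# because every position attends to the whole sequence.
edited = sentence[:2] + rand_matrix(1, DIM)
out_b = self_attention(edited, wq, wk, wv)
print(out_a[0] != out_b[0])
```

Note the sequence-in, sequence-out shape: three input vectors produce three output vectors. GPT-style models also apply a causal mask so each position only attends to earlier tokens; with that mask, editing the last token would leave the output at position 0 unchanged.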
‘7 — How do Generative AI projects like Stable Diffusion, Midjourney, etc. work?
- The goal is to keep the information intact from the beginning to the end of the pipeline, even as its form changes from text to pixels.
  - This happens in the latent phase.
- In the latent phase, the concept (say, the query) is broken down into as many uniquely identifiable parts as possible.
- From there, it becomes easy to tweak individual parameters to get the required output.
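As a loose analogy for the latent phase, a lossless change of representation can split an input into a few named, independent parts, after which one part can be tweaked on its own. This toy assumes a hand-picked basis (the 2x2 Haar transform) on a four-pixel "image"; the systems discussed learn their encoders and decoders rather than using a fixed basis, and `encode`/`decode`/`BASIS` are hypothetical names.

```python
# Hand-picked orthonormal basis for a flattened 2x2 "image":
# [top-left, top-right, bottom-left, bottom-right].
# This is the 2x2 Haar transform; each coordinate has one identifiable meaning.
BASIS = {
    "brightness": [0.5, 0.5, 0.5, 0.5],
    "left_vs_right": [0.5, -0.5, 0.5, -0.5],
    "top_vs_bottom": [0.5, 0.5, -0.5, -0.5],
    "diagonal": [0.5, -0.5, -0.5, 0.5],
}

def encode(pixels):
    # Project the pixels onto each basis vector to get named latent coordinates.
    return {name: sum(b * p for b, p in zip(vec, pixels)) for name, vec in BASIS.items()}

def decode(latent):
    # Rebuild pixels as a weighted sum of basis vectors; this is lossless
    # because the basis is orthonormal and complete.
    pixels = [0.0] * 4
    for name, vec in BASIS.items():
        for i, b in enumerate(vec):
            pixels[i] += latent[name] * b
    return pixels

image = [0.2, 0.4, 0.6, 0.8]
latent = encode(image)
roundtrip = decode(latent)

# Tweak a single, meaningful knob in latent space, then decode back to pixels.
brighter = dict(latent, brightness=latent["brightness"] + 0.2)
print(decode(brighter))  # each pixel rises by 0.1 (up to floating-point rounding)
```

Because the basis is orthonormal, decoding the unmodified latent reproduces the original pixels, which mirrors the goal of keeping information intact while its form changes.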
‘15 — Diffusion in pixel space vs. latent space:
- Diffusion in pixel space is more limited than diffusion in latent space.
- E.g.:
  - DALL·E → Pixel Space
  - Stable Diffusion → Latent Space
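One commonly cited reason pixel-space diffusion is more limited is cost: every noising and denoising step touches every number in the tensor. The sketch below compares sizes and applies the standard DDPM-style forward (noising) step, assuming a 512x512 RGB image and the 4x64x64 latent used by Stable Diffusion v1; `forward_diffuse` is a hypothetical helper, and the learned denoiser that runs the process in reverse is not shown.

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Jump straight to noise level t in a DDPM-style forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1) per element."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in x0]

# Assumed shapes: a 512x512 RGB image versus the 4x64x64 latent that
# Stable Diffusion v1's autoencoder produces for it (8x downsampling per side).
pixel_dims = 512 * 512 * 3
latent_dims = 4 * 64 * 64
print(pixel_dims, latent_dims, pixel_dims // latent_dims)  # 786432 16384 48

# Noising (and the learned denoising it is paired with) touches every element,
# so each step in this latent space handles 48x fewer numbers than in pixel space.
rng = random.Random(0)
noisy = forward_diffuse([0.0] * latent_dims, alpha_bar=0.5, rng=rng)
```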
‘26 — Training a model:
- Components of a model:
  - Architecture → The structure of the model: the fixed sequence of operations an input passes through to produce an output.
  - Parameters of the architecture → Weights, which tell the architecture how to behave.
- Initially, the parameters are random, so when you run an input through the model, it doesn’t know how to process it.
- Training is the process of adjusting those weights so that they encode information and running an input through the model produces a sensible output.
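The architecture/parameters split and the training loop can be seen in a model with just two weights. This is a sketch assuming a one-input linear model y = w0 + w1·x trained with plain gradient descent on mean squared error; `predict`, `mse`, and `train` are hypothetical names. GPT-style models have vastly more weights and use optimizer variants such as Adam, but the loop (predict, measure the error, nudge the weights) is the same idea.

```python
import random

def predict(weights, x):
    # The "architecture": a fixed rule y = w0 + w1 * x.
    # What it outputs depends entirely on the weights.
    w0, w1 = weights
    return w0 + w1 * x

def mse(weights, data):
    # Mean squared error between the model's predictions and the targets.
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def train(data, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # The "parameters": they start random, so the untrained model's outputs are meaningless.
    weights = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each weight.
        g0 = sum(2 * (predict(weights, x) - y) for x, y in data) / n
        g1 = sum(2 * (predict(weights, x) - y) * x for x, y in data) / n
        # Move each weight a small step against its gradient to reduce the error.
        weights = [weights[0] - lr * g0, weights[1] - lr * g1]
    return weights

# Toy data generated by y = 1 + 2x.
data = [(i / 10, 1 + 2 * (i / 10)) for i in range(11)]
untrained = train(data, steps=0)
trained = train(data)
print(mse(untrained, data), mse(trained, data))
print(trained)
```

The untrained weights give a large error; after training they end up close to the w0 = 1, w1 = 2 that generated the data.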