Understanding GPT - Transformers
Part 2/3 - Understanding how modern LLMs work: from RNNs, to transformers, towards modern scaling laws.
Introduction

ChatGPT has taken the world by storm, and has possibly started the 6th wave. Given its importance, the rush to build new products and research on top of it is understandable. But I’ve always liked to ground myself in foundational knowledge of how things work before exploring anything additive. To gain that foundation, I believe it is important to understand the progression of techniques and models, so as to comprehend and appreciate how these LLMs work under the hood....
Loss functions

Loss functions tell the algorithm how far we are from the actual truth, and their gradients/derivatives tell us how to reduce the overall loss (by changing the parameters being trained). All losses available in Keras are defined here.

But why is the loss function expressed as a negative log? Since probabilities only lie in [0, 1], the plot of -log(p) is only relevant for p between 0 and 1. This means it penalises a low probability of success exponentially more....
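To make that penalty concrete, here is a minimal sketch (not from the original post; it assumes NumPy and a single "probability assigned to the correct answer" p) that evaluates -log(p) at a few probabilities:

```python
import numpy as np

# Negative log-likelihood for the probability p assigned to the correct class.
# As p -> 1 the loss approaches 0; as p -> 0 the loss blows up, so a
# confidently wrong model is punished far more than a mildly unsure one.
def nll(p):
    return -np.log(p)

for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:4}: loss = {nll(p):.3f}")

# Output (approximately):
# p = 0.99: loss = 0.010
# p =  0.9: loss = 0.105
# p =  0.5: loss = 0.693
# p =  0.1: loss = 2.303
# p = 0.01: loss = 4.605
```

Note how dropping p from 0.1 to 0.01 adds roughly as much loss as dropping from 0.99 all the way to 0.1: the curve steepens sharply near zero, which is exactly the "exponential" penalty described above.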