  • Lovely visualization. I like the very concrete depiction of the middle layers "recognizing features", which makes the whole machine feel more plausible. I'm also a fan of visualizing things, but I think it's important to appreciate that some things (like a 10,000-dimensional vector as the input, or even a 100-dimensional vector as the output) can't be concretely visualized, and you have to develop intuitions in more roundabout ways.

    I hope you make more of these; I'd love to see a transformer presented this clearly.

  • For the visual learners, here's a classic intro to how LLMs work: https://bbycroft.net/llm
  • This is just scratching the surface -- it's where neural networks were thirty years ago: https://en.wikipedia.org/wiki/MNIST_database

    If you want to understand neural networks, keep going.

  • Oh wow, this looks like a 3D render of the perceptron diagrams from when I started reading about neural networks. I guess neural networks are essentially built on that idea? Inputs go through weighted functions that adjust the final output toward desired values?
    • Yes, vanilla neural networks are just lots of perceptrons stacked into layers.
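
      Roughly, in Python (a minimal sketch, not the site's actual code; the AND-gate weights here are hand-picked for illustration):

        import numpy as np

        # One perceptron: weighted sum of the inputs plus a bias,
        # passed through a hard threshold.
        def perceptron(x, w, b):
            return 1 if np.dot(w, x) + b > 0 else 0

        # Illustrative weights that make it act as an AND gate.
        w = np.array([1.0, 1.0])
        b = -1.5
        print(perceptron(np.array([1, 1]), w, b))  # 1
        print(perceptron(np.array([0, 1]), w, b))  # 0

      Stack layers of these (with smooth activations instead of the hard threshold, so you can train them by gradient descent) and you get a vanilla network.
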
  • Spent 10 minutes on the site, and I think it's where I'll start my day from next week on! I just love visual-based learning.
  • I like the style of the site; it has a "vintage" look.

    I don't think it's a moiré effect, but yeah, the pattern has that kind of look.

  • This visualization reminds me of the 3Blue1Brown videos.
    • I was thinking the same thing. It's at least the same description.
  • Great explanation, but the answer to the last question is quite simple: you determine the weights via brute force, simply running a large amount of data where you have both the input and the correct output (handwriting-to-text pairs, in this case).
    • "Brute force" would be trying random weights and keeping the best performing model. Backpropagation is compute-intensive but I wouldn't call it "brute force".
      • "Brute force" here is about the amount of data you're ingesting. It's no Alpha Zero, that will learn from scratch.
        • What? Either option requires sufficient data. Brute force implies iterating over all weight combinations until you find the best ones; backprop is an optimization technique that follows the error gradient instead.
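
          For intuition, here's a toy gradient-descent step in Python (a sketch under simple assumptions; the one-weight "model" and the learning rate are made up for illustration). It never enumerates weight combinations; it follows the error gradient:

            import numpy as np

            # Toy model y = w * x, fit to data generated with true w = 2.
            xs = np.array([1.0, 2.0, 3.0])
            ys = 2.0 * xs

            w, lr = 0.0, 0.05
            for _ in range(200):
                pred = w * xs
                grad = 2 * np.mean((pred - ys) * xs)  # d(MSE)/dw
                w -= lr * grad                        # step against the gradient
            print(round(w, 3))  # ~2.0
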
  • I get 3 fps in Chrome, most likely due to disabled HW acceleration.
  • Nice visuals, but it misses the mark. Neural networks transform vector spaces and collect points into bins; this visualization only shows the structure of the computation. It's akin to displaying a matrix-vector multiplication in Wx + b notation, except that W, x, and b get more exciting displays.

    It completely misses what it means to 'weight' (linearly transform), 'bias' (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.
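
    In code, the view I mean is tiny (a minimal numpy sketch; the shapes and the ReLU choice are illustrative, not what the site uses):

      import numpy as np

      rng = np.random.default_rng(0)
      W = rng.normal(size=(3, 4))  # weight: linear transform from R^4 to R^3
      b = rng.normal(size=3)       # bias: upgrades it to an affine transform
      x = rng.normal(size=4)       # one input point

      z = W @ x + b                # affine map of the input space
      h = np.maximum(z, 0.0)       # ReLU non-linearity: "collects" every
      print(h)                     # negative coordinate into the 0 bin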

    • > but misses the mark

      It doesn't match the pictures in your head, but it nevertheless does present a mental representation the author (and presumably some readers) find useful.

      Instead of nitpicking, perhaps pointing to a better visualization (like maybe this video: https://www.youtube.com/watch?v=ChfEO8l-fas) could help others learn. Otherwise it's just frustrating to read comments like this.

  • Great visualization!
  • very cool stuff