5 year old hacking chatgpt
LLMs are compressing information in a wildly different way than I understand. If we compare a few open source LLMs to Wikipedia, they are all only 20%–25% of the size of the compressed English Wikipedia. And yet you can ask the LLM questions, it can – in a sense – reason about things, and it knows how to code.
| NAME | SIZE |
|------|------|
| gemma:7b | 5.2 GB |
| llava:latest | 4.7 GB |
| mistral:7b | 4.1 GB |
| zephyr:latest | 4.1 GB |
Contrast that with the size of the compressed English Wikipedia – 22 GB. That's without media or images.
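To make the comparison concrete, here's a quick back-of-the-envelope sketch using the sizes in the table above and the 22 GB figure for the compressed, text-only dump:

```python
# Rough comparison: local model sizes vs. the compressed English
# Wikipedia text dump (~22 GB, no media). Sizes from the table above.
model_sizes_gb = {
    "gemma:7b": 5.2,
    "llava:latest": 4.7,
    "mistral:7b": 4.1,
    "zephyr:latest": 4.1,
}
wikipedia_gb = 22.0

for name, size in model_sizes_gb.items():
    ratio = size / wikipedia_gb
    print(f"{name:15s} {size:.1f} GB -> {ratio:.0%} of compressed Wikipedia")
```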
Shannon entropy is a measure of information density, and whatever happens in training LLMs gets a lot closer to that limit than our current way of sharing information.
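As a rough illustration of what that limit means, here is a minimal sketch that estimates the Shannon entropy of a text sample in bits per character. It assumes a simple character-level model, which is a much weaker model of the text than what a real compressor (or an LLM) captures, so it only hints at the idea:

```python
import math
from collections import Counter

def shannon_entropy_bits_per_char(text: str) -> float:
    """Estimate Shannon entropy of a text sample, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = "LLMs are compressing information in a wildly different way than I understand."
bits = shannon_entropy_bits_per_char(sample)
print(f"{bits:.2f} bits/char, vs. 8 bits/char for raw ASCII")
```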