March 1, 2024 9:42 am

Why are LLMs so small?

LLMs are compressing information in a wildly different way than I understand. If we compare a few open source LLMs to Wikipedia, they are all only 20%-25% of the size of the compressed version of English Wikipedia. And yet you can ask them questions, they can, in a sense, reason about things, and they know how to code.

NAME             SIZE
gemma:7b         5.2 GB
llava:latest     4.7 GB
mistral:7b       4.1 GB
zephyr:latest    4.1 GB

Contrast that to the size of English Wikipedia: 22 GB compressed, and that's without media or images.
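A quick back-of-the-envelope check of that 20%-25% claim, using the model sizes listed above and the 22 GB figure for the compressed dump (a rough sketch; the numbers are approximate):

```python
# Model size as a fraction of compressed English Wikipedia (~22 GB, text only).
# Sizes are the ones reported above; all figures are approximate.
wikipedia_gb = 22.0

models = {
    "gemma:7b": 5.2,
    "llava:latest": 4.7,
    "mistral:7b": 4.1,
    "zephyr:latest": 4.1,
}

for name, size_gb in models.items():
    # e.g. gemma:7b comes out to roughly a quarter of the dump
    print(f"{name:15s} {size_gb:.1f} GB  ~{size_gb / wikipedia_gb:.0%} of Wikipedia")
```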

Shannon entropy is a measure of information density, and whatever happens in training LLMs gets a lot closer to that limit than our current ways of sharing information do.
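For a rough sense of what that measure looks like, here is a tiny sketch of character-level Shannon entropy over a sample string. This is a simplistic unigram estimate; Shannon's own experiments put English closer to about one bit per character once longer-range context is taken into account, which is part of why text compresses so well.

```python
import math
from collections import Counter

def shannon_entropy_bits_per_char(text: str) -> float:
    """Character-level Shannon entropy: H = -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Arbitrary sample text, just to illustrate the calculation.
sample = "the quick brown fox jumps over the lazy dog"
print(f"{shannon_entropy_bits_per_char(sample):.2f} bits per character")
```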
