Slimming Down AI Models - Pruning Layers Drastically Cuts Size While Retaining Accuracy
- Pruning the deep layers of large language models such as Llama 2 can cut memory requirements by 75% with minimal loss of accuracy, letting the models run on consumer GPUs (a minimal pruning sketch follows this list).
- The "lottery ticket hypothesis" holds that large neural networks contain much smaller subnetworks that, when trained in isolation, can match the full model's accuracy. This insight is now being used to shrink models (see the masking sketch below).
- The middle layers of models like GPT appear to store factual knowledge. Deeper layers can be removed with little performance impact up to a threshold, beyond which accuracy drops sharply (the sweep sketch after this list shows one way to probe for that threshold).
- Removing up to 50% of Llama 2's layers barely affects its accuracy on benchmarks such as question answering. More research is needed to know whether other kinds of tasks would be impacted more.
- It is unclear whether current training makes full use of the deeper layers, or whether the shallow layers play the critical role in storing knowledge. More research here could lead to markedly more efficient models.
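As a concrete illustration of the first point, here is a minimal sketch of depth pruning with Hugging Face `transformers`. The checkpoint name, the 50% cut, and the choice to simply truncate the deepest layers are illustrative assumptions, not the exact recipe from the research above; without the light finetuning researchers typically apply after pruning, output quality will be rough.

```python
# Minimal depth-pruning sketch: drop the deepest decoder layers of a
# Llama-style model. Checkpoint and layer choice are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed (gated) checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16  # fp16 alone halves memory
).to("cuda")                               # assumes a GPU with ~14 GB free

keep = len(model.model.layers) // 2             # keep the shallow half
model.model.layers = model.model.layers[:keep]  # drop the deeper half
model.config.num_hidden_layers = keep           # keep the config consistent

inputs = tok("The capital of France is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```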
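The lottery ticket hypothesis is usually demonstrated with magnitude pruning: mask out the smallest-magnitude weights, then retrain the surviving subnetwork from its original initialization. The plain-PyTorch sketch below shows only the masking step; `magnitude_prune` and the 75% sparsity level are illustrative choices.

```python
# Magnitude-pruning sketch: zero the smallest-magnitude weights of a layer
# and return the binary keep-mask that defines the surviving subnetwork.
import torch
import torch.nn as nn

def magnitude_prune(layer: nn.Linear, sparsity: float = 0.75) -> torch.Tensor:
    w = layer.weight.data
    k = int(w.numel() * sparsity)                     # how many weights to drop
    threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (w.abs() > threshold).float()              # 1 = keep, 0 = pruned
    w.mul_(mask)                                      # zero pruned weights in place
    return mask

layer = nn.Linear(512, 512)
mask = magnitude_prune(layer, sparsity=0.75)
print(f"surviving weights: {mask.mean().item():.1%}")  # roughly 25%
```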
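To see the threshold behavior described above, one can cut progressively deeper and score each truncated model. The two-question check below is only a smoke test standing in for a real benchmark harness; the checkpoint, prompts, and scoring rule are all assumptions.

```python
# Pruning-sweep sketch: truncate more and more of the deepest layers and watch
# for the point where a (toy) QA score collapses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed (gated) checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# Toy stand-in for a QA benchmark: does the greedy continuation contain the answer?
qa_pairs = [
    ("Q: What is the capital of France? A:", "Paris"),
    ("Q: What planet do we live on? A:", "Earth"),
]

def smoke_test(m) -> float:
    hits = 0
    for prompt, answer in qa_pairs:
        ids = tok(prompt, return_tensors="pt").to("cuda")
        out = m.generate(**ids, max_new_tokens=8, do_sample=False)
        hits += answer.lower() in tok.decode(out[0], skip_special_tokens=True).lower()
    return hits / len(qa_pairs)

# Because the fractions increase, the same model can be truncated further on
# each pass instead of being reloaded.
n_total = len(model.model.layers)
for frac in (0.0, 0.25, 0.5, 0.75):
    keep = int(n_total * (1 - frac))
    model.model.layers = model.model.layers[:keep]
    model.config.num_hidden_layers = keep
    print(f"pruned {frac:.0%} of layers -> toy QA score {smoke_test(model):.2f}")
```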