Study Finds Large Language Models Store Some Facts in Simple Ways, Allowing Targeted Correction
- Researchers found that large language models use simple linear functions to retrieve and decode stored facts about relations between subjects and objects.
- They developed a method to estimate these decoding functions and to test whether they still retrieve the correct fact when the subject changes.
- The decoding functions worked over 60% of the time, showing that some facts are stored in this linear way, though not all of them are.
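The idea behind these bullets can be illustrated with a toy simulation (not the authors' actual method or data): if facts for a relation are stored such that an object representation is roughly an affine function of the subject representation, `o ≈ W s + b`, then `W` and `b` can be estimated from example pairs and tested on held-out subjects. The dimensions, noise level, and nearest-neighbor "retrieval" check below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                     # hidden-state dimensionality (illustrative)
n_train, n_test = 200, 50

# Hypothetical ground truth: object vectors are an affine function
# of subject vectors, o = W s + b, plus a little noise.
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
b_true = rng.normal(size=d)

def simulate(n):
    S = rng.normal(size=(n, d))                            # subject reps
    O = S @ W_true.T + b_true + 0.01 * rng.normal(size=(n, d))
    return S, O

S_tr, O_tr = simulate(n_train)
S_te, O_te = simulate(n_test)

# Estimate the affine decoder by least squares on [S | 1].
X = np.hstack([S_tr, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(X, O_tr, rcond=None)
W_hat, b_hat = coef[:-1].T, coef[-1]

# "Retrieval" test on new subjects: the predicted object vector
# should be nearest to its own true object vector.
O_pred = S_te @ W_hat.T + b_hat
dists = np.linalg.norm(O_pred[:, None, :] - O_te[None, :, :], axis=-1)
accuracy = np.mean(dists.argmin(axis=1) == np.arange(n_test))
print(accuracy)
```

In this idealized linear setting the fitted decoder retrieves essentially every held-out fact; the paper's point is that real models behave this way for only some relations, which is why accuracy there lands above 60% rather than near 100%.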
- They created "attribute lenses" to visualize where information about a particular relation is stored across the model's layers.
- This approach could help identify and correct false information stored in models, reducing their tendency to give wrong answers.