Study Finds Large Language Models Store Some Facts in Simple Ways, Allowing Targeted Correction
- Researchers found that large language models use simple linear functions to retrieve and decode stored facts about relations between subjects and objects.
- They developed a method to estimate these decoding functions and to test whether they still retrieve the correct fact when the subject changes.
- The decoding functions worked over 60% of the time, showing that some facts are stored in this linear way, though not all of them are.
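The idea behind these bullets can be illustrated with a toy simulation (not the authors' actual method or data): if facts for a relation are stored such that an object representation is roughly an affine function of the subject representation, `o ≈ W s + b`, then `W` and `b` can be estimated from example pairs and tested on held-out subjects. The dimensions, noise level, and nearest-neighbor "retrieval" check below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                     # hidden-state dimensionality (illustrative)
n_train, n_test = 200, 50

# Hypothetical ground truth: object vectors are an affine function
# of subject vectors, o = W s + b, plus a little noise.
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
b_true = rng.normal(size=d)

def simulate(n):
    S = rng.normal(size=(n, d))                            # subject reps
    O = S @ W_true.T + b_true + 0.01 * rng.normal(size=(n, d))
    return S, O

S_tr, O_tr = simulate(n_train)
S_te, O_te = simulate(n_test)

# Estimate the affine decoder by least squares on [S | 1].
X = np.hstack([S_tr, np.ones((n_train, 1))])
coef, *_ = np.linalg.lstsq(X, O_tr, rcond=None)
W_hat, b_hat = coef[:-1].T, coef[-1]

# "Retrieval" test on new subjects: the predicted object vector
# should be nearest to its own true object vector.
O_pred = S_te @ W_hat.T + b_hat
dists = np.linalg.norm(O_pred[:, None, :] - O_te[None, :, :], axis=-1)
accuracy = np.mean(dists.argmin(axis=1) == np.arange(n_test))
print(accuracy)
```

In this idealized linear setting the fitted decoder retrieves essentially every held-out fact; the paper's point is that real models behave this way for only some relations, which is why accuracy there lands above 60% rather than near 100%.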
- They created "attribute lenses" to visualize where information about a particular relation is stored across the model's layers.
- This approach could help identify and correct false information stored in models, reducing their tendency to give wrong answers.