Meta's Logarithm System Ingests Over 100GB/s of Logs for AI Model Debugging
-
Logarithm is a hosted, serverless system that ingests, indexes, and enables querying of application and system logs at Meta. It scales to handle over 100GB/s of logs.
-
Logarithm's data model represents logs as streams of text lines with attached metadata. It supports regex-based extraction of values from the log text into that metadata, and it uses tiered storage for retention.
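As a rough illustration of this data model, the sketch below represents a log line as text plus a metadata dictionary and applies a regex rule to pull a value out of the text into the metadata. The names `LogLine`, `LATENCY_RULE`, and `extract_metadata`, and the sample log text, are hypothetical and chosen for illustration only; they are not Logarithm's actual API or schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LogLine:
    """One line of a log stream: raw text plus key/value metadata (hypothetical shape)."""
    stream: str
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical extraction rule: named groups become metadata keys.
LATENCY_RULE = re.compile(r"latency_ms=(?P<latency_ms>\d+)")


def extract_metadata(line: LogLine, rule: re.Pattern) -> LogLine:
    """Copy any named groups matched in the text into the line's metadata."""
    match = rule.search(line.text)
    if match:
        line.metadata.update(match.groupdict())
    return line


line = extract_metadata(
    LogLine(stream="web", text="GET /home latency_ms=42", metadata={"host": "h1"}),
    LATENCY_RULE,
)
```

Extracted values stay as strings here; a real system would likely apply typing rules, which this sketch leaves out.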
-
Logarithm powers AI model training debugging by ingesting continuous systems telemetry logs and model telemetry from jobs. It enables correlated log analysis.
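One simple way to picture correlated analysis is to interleave system and model telemetry on a shared timeline. The sketch below does this with Python's standard `heapq.merge`; the event tuples and their contents are invented sample data, and this is only an assumed illustration of the idea, not how Logarithm implements correlation.

```python
import heapq

# Hypothetical sample events as (timestamp, source, message), each list sorted by time.
system_logs = [
    (100, "system", "gpu memory usage high"),
    (130, "system", "network retry"),
]
model_logs = [
    (110, "model", "loss became NaN"),
    (140, "model", "checkpoint saved"),
]

# Merge the two already-sorted streams into one timeline ordered by timestamp.
timeline = list(heapq.merge(system_logs, model_logs, key=lambda event: event[0]))

for timestamp, source, message in timeline:
    print(timestamp, source, message)
```

Reading the merged timeline makes it easier to see which system event came just before a model-level anomaly.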
-
Logarithm separates its ingestion and query clusters and keeps lightweight, disaggregated indices in a distributed cache to serve low-latency queries.
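To make the idea of a lightweight index held apart from the log data more concrete, here is a minimal sketch under stated assumptions. It assumes the index stores only a time range per data block, and it uses an in-process dictionary as a stand-in for the distributed cache. `BlockRef`, `IndexCache`, `publish`, and `lookup` are hypothetical names; the source does not describe the actual index format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRef:
    """Pointer to a stored block of log lines and the time range it covers (assumed layout)."""
    block_id: str
    start_ts: int
    end_ts: int


class IndexCache:
    """In-memory stand-in for a distributed cache holding per-stream block references."""

    def __init__(self) -> None:
        self._entries: dict[str, list[BlockRef]] = {}

    def publish(self, stream: str, ref: BlockRef) -> None:
        # Ingestion side: record that a block exists, without storing its contents here.
        self._entries.setdefault(stream, []).append(ref)

    def lookup(self, stream: str, start_ts: int, end_ts: int) -> list[str]:
        # Query side: return only the blocks whose time range overlaps the query window.
        return [
            ref.block_id
            for ref in self._entries.get(stream, [])
            if ref.start_ts <= end_ts and ref.end_ts >= start_ts
        ]


cache = IndexCache()
cache.publish("trainer", BlockRef("block-1", start_ts=0, end_ts=99))
cache.publish("trainer", BlockRef("block-2", start_ts=100, end_ts=199))
matching = cache.lookup("trainer", start_ts=120, end_ts=150)
```

Because the index holds only small references, the query side can narrow down which blocks to read without touching the ingestion path.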
-
Logarithm provides strong guarantees on availability, data durability and completeness, ingestion to query freshness, and interactive query latencies.