Text-to-Video Demand Could Require $21.6B in GPUs, Straining Nvidia Production Capacity

An estimated 720,000 high-end Nvidia GPUs would be needed to support text-to-video generation for the TikTok and YouTube creator communities
Training OpenAI's Sora model alone requires 10,500 GPUs, and each GPU can generate only 5 minutes of video per hour
As adoption grows, inference (generating new videos) will require more compute than the initial model training
Nvidia shipped 550,000 H100 GPUs in 2023; its top 12 customers hold 650,000, with Meta and Microsoft holding 300,000 between them
The required GPUs would cost $21.6 billion, nearly the entire market cap of AI crypto tokens
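The figures above can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers quoted in the key points; treating the 720,000 high-end GPUs as H100s, so they can be compared with 2023 H100 shipments, is an assumption, and the helper functions are hypothetical names chosen for illustration. The implied unit price is simply total cost divided by GPU count, not an Nvidia list price.

```python
# Back-of-envelope checks on the figures quoted in the key points.
# ASSUMPTION: the 720,000 "high-end" GPUs are treated as H100s so they can be
# compared with Nvidia's 2023 H100 shipments. Helper names are hypothetical.

REQUIRED_GPUS = 720_000       # GPUs needed for TikTok + YouTube creators
TOTAL_COST_USD = 21.6e9       # quoted cost of those GPUs
SORA_TRAINING_GPUS = 10_500   # GPUs quoted for training Sora
MINUTES_PER_GPU_HOUR = 5      # minutes of video one GPU generates per hour
H100_SHIPPED_2023 = 550_000   # Nvidia H100 shipments in 2023


def implied_price_per_gpu(total_cost: float, gpus: int) -> float:
    """Unit price implied by dividing the total cost by the GPU count."""
    return total_cost / gpus


def inference_to_training_ratio(inference_gpus: int, training_gpus: int) -> float:
    """How many times larger the inference fleet is than the training cluster."""
    return inference_gpus / training_gpus


def years_of_output(required: int, shipped_per_year: int) -> float:
    """Required GPUs expressed as a multiple of one year's shipments."""
    return required / shipped_per_year


def video_hours_per_hour(gpus: int, minutes_per_gpu_hour: float) -> float:
    """Hours of video the whole fleet can generate per wall-clock hour."""
    return gpus * minutes_per_gpu_hour / 60


if __name__ == "__main__":
    price = implied_price_per_gpu(TOTAL_COST_USD, REQUIRED_GPUS)
    ratio = inference_to_training_ratio(REQUIRED_GPUS, SORA_TRAINING_GPUS)
    years = years_of_output(REQUIRED_GPUS, H100_SHIPPED_2023)
    hours = video_hours_per_hour(REQUIRED_GPUS, MINUTES_PER_GPU_HOUR)
    print(f"Implied price per GPU: ${price:,.0f}")
    print(f"Inference fleet vs. Sora training cluster: {ratio:.1f}x")
    print(f"Required GPUs vs. 2023 H100 shipments: {years:.2f} years of output")
    print(f"Fleet video output: {hours:,.0f} hours of video per hour")
```

With these inputs the implied price works out to $30,000 per GPU, the inference fleet is roughly 68.6 times the size of the Sora training cluster, and the requirement equals about 1.31 years of 2023 H100 shipments, which is the arithmetic behind the production-capacity concern in the headline.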