AMD has published an erratum for its Zen 2-based second-generation EPYC processors, stating that "A core will fail to exit CC6 after about 1044 days after the last system reset." This equates to roughly 34 months, or just under three years, of continuous uptime. However, some sysadmin sleuths on Reddit and Twitter have calculated that the actual figure is closer to 1042 days and 12 hours. By their account, the issue arises because the CPU REFCLK counts 10 ns ticks in a 54-bit signed integer, which overflows once the count reaches 2^53 (just over 9 quadrillion ticks), about 1042.4999 days after the last reset. Once this overflow occurs, the affected cores are stuck in a zombie state and will not take any external interrupt requests until the system is power-cycled.
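The arithmetic behind the 1042.4999-day figure can be checked in a few lines. The sketch below assumes the tick length (10 ns) and counter width (54-bit signed) reported by the community analysis; AMD's erratum itself does not state them, and the function and constant names are purely illustrative.

```python
# Assumptions taken from the community analysis, not from AMD documentation.
TICK_NS = 10        # length of one REFCLK tick, in nanoseconds
COUNTER_BITS = 54   # width of the signed tick counter

SECONDS_PER_DAY = 86_400


def overflow_uptime(counter_bits=COUNTER_BITS, tick_ns=TICK_NS):
    """Return (ticks, days) at which a signed counter of this width overflows."""
    # A signed N-bit integer holds values up to 2**(N-1) - 1,
    # so tick number 2**(N-1) is the first one that no longer fits.
    overflow_ticks = 2 ** (counter_bits - 1)
    total_seconds = overflow_ticks * tick_ns / 1e9
    return overflow_ticks, total_seconds / SECONDS_PER_DAY


ticks, days = overflow_uptime()
print(f"overflow after {ticks:,} ticks, about {days:.4f} days")
```

Under these assumptions the counter wraps a few seconds short of 1042 days and 12 hours, which matches the community figure rather than AMD's rounder "about 1044 days".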

While it is impressive that this problem was discovered, it also suggests that more than one system has been running for almost three years without a single restart. The bug puts EPYC "Rome" out of the running for any possible awards for longest-running systems. However, it serves as a reminder to apply system updates or patches for other vulnerabilities that have been discovered in the four years since that generation of processor was first launched. AMD does not plan to issue a fix for the CC6 bug, instead recommending that administrators either disable CC6, so that cores never enter the zombified state, or simply restart the system every once in a while before the time limit expires.
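For administrators who choose the periodic-restart route, a small check of remaining headroom may be useful. The following is a minimal sketch, assuming a Linux host where `/proc/uptime` reports seconds since boot, and assuming the community-derived limit of 2^53 ticks of 10 ns rather than any figure confirmed by AMD. The helper names and the 30-day warning margin are hypothetical. Uptime since boot is only an approximation of "time since the last system reset" as the erratum words it.

```python
# Assumed limit: 2**53 ticks of 10 ns each (community analysis, not AMD-confirmed).
LIMIT_SECONDS = 2 ** 53 * 10 / 1e9
SECONDS_PER_DAY = 86_400
WARN_MARGIN_DAYS = 30  # hypothetical safety margin, adjust to taste


def parse_uptime(text):
    """Parse the contents of /proc/uptime; the first field is uptime in seconds."""
    return float(text.split()[0])


def days_until_overflow(uptime_seconds):
    """Days left before the assumed counter overflow (negative if already past)."""
    return (LIMIT_SECONDS - uptime_seconds) / SECONDS_PER_DAY


def check(path="/proc/uptime"):
    """Print a short status line, or a notice if the file is unavailable."""
    try:
        with open(path) as f:
            uptime = parse_uptime(f.read())
    except FileNotFoundError:
        print(f"{path} not found; this sketch only works on Linux")
        return
    remaining = days_until_overflow(uptime)
    status = "schedule a restart" if remaining < WARN_MARGIN_DAYS else "ok"
    print(f"{remaining:.1f} days until assumed overflow: {status}")


if __name__ == "__main__":
    check()
```

A script like this could run from a daily scheduled job. It only reports the remaining time and does not disable CC6 or reboot anything.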