Save the Data
by Martyn Wendell Jones
When Jefferson and Juliana McMillan-Wilhoit received an alert last Friday morning that the Centers for Disease Control and Prevention’s webpage dedicated to the Youth Risk Behavior Surveillance System (YRBSS) had been taken offline, they knew they had to move quickly.
The McMillan-Wilhoits are the founders of Flourish and Thrive Labs, a consulting firm that works with state and local public health departments on technology implementation and strategic planning. They were aware that they were likely seeing the beginning of the CDC’s attempt to meet the imminent deadline of President Donald Trump’s executive orders requiring government agencies to scrub any trace of DEI or “gender ideology.” Jefferson and a colleague started searching out and downloading as many of the agency’s datasets as they could.
“At that point, the CDC website was still up and working, so we were able to download a lot of data before the outage,” he told The Bulwark. A file transfer protocol site where most of the data was stored remained up after the agency’s public-facing webpages went down, Juliana added, which extended the window for them to save the data.
“People didn’t know that FTP site exists, because if you’re not in the research world, you have no reason to know it exists,” Jefferson said. But it wasn’t long before even that site was taken down. The McMillan-Wilhoits believe around 1,000 CDC-related pages ended up going dark in the wake of Trump’s orders. The New York Times estimates around 3,000 pages disappeared.
Flourish and Thrive Labs was ultimately able to preserve about ten datasets, some of which contain decades’ worth of raw data. The recovered material includes vital statistics, immunization information, the YRBSS, the Behavioral Risk Factor Surveillance System (BRFSS), the National Health Interview Survey, and other sources, many of which still turn up “404” page-not-found error messages on the agency’s website.
“All of this data, beyond being publicly available for reasons of transparency, is essential for public health programming,” Jefferson said. Health departments across the country rely on it to do community health improvement planning, and without access to the agency’s studies and surveys, making those plans becomes virtually impossible.
Normally, Jefferson said, the CDC enables public health professionals to keep track of the progress of seasonal diseases like the flu and RSV almost in real time, with information being updated daily or weekly. That service helps them respond quickly to changes in the environment.
“We may have to pivot how we’re vaccinating, or pivot our prevention efforts,” he said. When those tools disappear, “we can’t do that with enough time to make any difference.”
“The impact for public health and healthcare providers is huge,” Juliana said. “This is really impacting local public health systems.”
Having copies of the raw datasets has also made it possible for the McMillan-Wilhoits’ firm to begin assessing the scope of changes made to the curated data files that have been restored to the CDC website since the blackout. These restored files are aggregations of raw data that make higher-level trends across populations visible. The primary change the firm has been able to detect in them so far is that variable labels for “gender” have been switched to “sex,” reflecting the demands set out in Trump’s executive order targeting “gender identity.” But the fact that these variable labels were changed means the curated files were “touched” during the blackout, which raises the possibility that the underlying data were modified as well.
“I would only trust the raw data, because during the outage, I can’t guarantee that any of the data wasn’t changed” in files where variable labels were switched, Jefferson said.
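For readers who want to run a similar check on their own copies, the comparison described above can be sketched in a few lines of Python. This is only an illustration, not the firm's actual method: it assumes both the archived and restored files are plain CSV text with columns in the same order, and the function names and the sample data are hypothetical.

```python
import csv
import hashlib
import io


def sha256_of(text: str) -> str:
    """Return the SHA-256 hex digest of a file's text content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_csv(archived: str, restored: str) -> dict:
    """Compare an archived CSV with a restored one (both passed as text).

    Assumes both files list their columns in the same order, so headers
    can be paired by position.
    """
    old_rows = list(csv.reader(io.StringIO(archived)))
    new_rows = list(csv.reader(io.StringIO(restored)))
    old_header = old_rows[0] if old_rows else []
    new_header = new_rows[0] if new_rows else []
    # Columns whose label differs between the two versions.
    renamed = [(o, n) for o, n in zip(old_header, new_header) if o != n]
    return {
        "identical": sha256_of(archived) == sha256_of(restored),
        "renamed_columns": renamed,
        # True if any data row differs, independent of the header row.
        "values_changed": old_rows[1:] != new_rows[1:],
    }


# Hypothetical sample data, not real CDC records.
archived_copy = "year,gender,count\n2021,F,10\n2021,M,12\n"
restored_copy = "year,sex,count\n2021,F,10\n2021,M,12\n"
report = compare_csv(archived_copy, restored_copy)
print(report)
```

A mismatched checksum alone only shows that a file was touched. Comparing the header row separately from the data rows is what distinguishes a relabeled variable from a change to the underlying values.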
Given the importance of the data Flourish and Thrive Labs recovered, the firm has been making it available to state and local health departments, many of which are in the middle of their health improvement planning cycles. There is no charge for access, but the McMillan-Wilhoits have refrained from making the data public or widely accessible. They were spooked by others who approached them for access seemingly with an intent to monetize it. [who else has this data? Elon has this data- Bill Z]
“This is vital data, and we want to make it available” to those who need it most, Juliana later wrote in an email. The couple plan on setting up a protected website where “anyone in state and local health” can access the recovered information, but in the meantime, they are inviting anyone who urgently needs it and works in public health to contact them at cdcdatarequest [at] fandtlabs.com.