Study Shows Clustering Algorithms Yield Inconsistent Long Covid Subgroups

Original post

Patient-Led Research Collaborative@patientled

We have a new paper published!

We asked a simple question about #LongCovid "phenotypes": if you take the same patients and the same symptoms, but run different clustering algorithms, do you get the same patient subgroups?

Short answer: no. 🧵 1/

1:06 PM · Jun 23, 2026 · 10.4K Views

Posts from X

Patient-Led Research Collaborative@patientled

Thank you to @the_tessallator for leading & analysis, @rusty_cjm and @leothesaffer for analysis, and @ahandvanish, @leticiasaurus, @ChronicResearch, Alison Cohen, @xuanalogue, and @tessfalor for your work on this paper! 16/

https://academic.oup.com/ooim/article/7/1/iqag010/8707854

Tessa Green@the_tessallator

Really proud of myself for getting this one over the finish line. Super interesting chance for me to examine cluster robustness in a new-to-me domain, and great to be part of OOI’s special issue on patient-led work.

Patient-Led Research Collaborative@patientled

Our asks for future phenotyping work: ✅ report sensitivity to algorithm choice & subsampling, not just internal scores ✅ capture the full breadth of symptoms, including severity and trajectory ✅ integrate biomarkers to define real endotypes, not just symptom boundaries 14/
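The first ask, sensitivity to subsampling, can be sketched as a small harness: recluster random subsets of the patients and check whether shared patients keep getting grouped together. This is a minimal illustration, not the paper's procedure. The agreement score (adjusted Rand index), the function names, the toy clusterer `largest_gap_split`, and the made-up data are all assumptions.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    total = comb(len(a), 2)
    if total == 0:
        return 1.0
    cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    rows = sum(comb(c, 2) for c in Counter(a).values())
    cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = rows * cols / total
    maximum = (rows + cols) / 2
    if maximum == expected:
        return 1.0
    return (cells - expected) / (maximum - expected)


def subsample_stability(rows, cluster_fn, n_rounds=10, fraction=0.8, seed=0):
    """Mean ARI between clusterings of random subsamples, scored on shared patients."""
    if n_rounds < 2:
        raise ValueError("need at least two rounds to compare")
    rng = random.Random(seed)
    size = int(fraction * len(rows))
    runs = []
    for _ in range(n_rounds):
        chosen = sorted(rng.sample(range(len(rows)), size))
        labels = cluster_fn([rows[i] for i in chosen])
        runs.append(dict(zip(chosen, labels)))
    scores = []
    for first, second in combinations(runs, 2):
        shared = sorted(first.keys() & second.keys())
        scores.append(adjusted_rand_index([first[i] for i in shared],
                                          [second[i] for i in shared]))
    return sum(scores) / len(scores)


def largest_gap_split(rows):
    """Toy stand-in clusterer: split patients at the widest gap in symptom count."""
    counts = [sum(r) for r in rows]
    ordered = sorted(set(counts))
    if len(ordered) < 2:
        return [0] * len(rows)
    _, threshold = max((hi - lo, (hi + lo) / 2)
                       for lo, hi in zip(ordered, ordered[1:]))
    return [int(c > threshold) for c in counts]


# Made-up binary symptom rows: 20 low-burden and 20 high-burden patients.
toy_rows = ([[1] * k + [0] * (40 - k) for k in (1, 2, 3, 4, 5)] * 4
            + [[1] * k + [0] * (40 - k) for k in (30, 31, 32, 33, 34)] * 4)
stability = subsample_stability(toy_rows, largest_gap_split)
```

On this deliberately clean toy data the groups are far apart, so the split is stable. A real symptom matrix would be passed through the actual clustering pipeline in place of `largest_gap_split`, and a low mean score would flag subgroups that depend on which patients happened to be sampled.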

Patient-Led Research Collaborative@patientled

Takeaways:

1) Symptoms alone are likely not the best way to define #LongCovid phenotypes. We need more phenotyping by biomarkers!

2) Studies using fewer symptoms, fewer patients, or a single clustering method are likely detecting phenotypes that aren't robust or repeatable. 7/

Patient-Led Research Collaborative@patientled

With any clustering, caution is needed to ensure that algorithmically imposed boundaries are not mistakenly interpreted as biologically discrete or clinically stable subtypes. #LongCovid 13/

Patient-Led Research Collaborative@patientled

This paper was led entirely by people with #LongCovid with machine learning backgrounds, with help from people with other IACCs or those taking care of people with Long COVID. 15/

Patient-Led Research Collaborative@patientled

That said, some patterns did recur across all 3 methods:

🔹 A high-burden, multi-systemic cluster appeared every time 🔹 Higher symptom burden tracked with more severe physical & cognitive PEM 🔹 Symptom burden tracked with demographics (see next tweet)

#LongCovid 8/

Patient-Led Research Collaborative@patientled

The data: 6,031 adults with #LongCovid from our patient-led international survey.

Each person reported presence/absence of 162 symptoms across 10 organ systems, plus post-exertional malaise (PEM) severity and demographics. 2/

Patient-Led Research Collaborative@patientled

We ran 3 different unsupervised ML methods on the exact same symptom matrix:

A) autoencoder + HDBSCAN B) ensemble UMAP + k-means consensus C) latent class analysis (LCA)

Then we asked how much they actually agreed. #LongCovid 3/

Patient-Led Research Collaborative@patientled

Even when two methods both found a "high symptom burden" cluster, they didn't contain the same patients.

A patient could land in a high-burden neurocognitive cluster under one algorithm and a generic multi-system cluster under another. 5/
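One way to check whether two similarly named clusters actually contain the same people is to compare their membership sets directly. The sketch below uses Jaccard overlap (shared patients divided by all patients in either cluster). The patient IDs, cluster names, and helper names are made up for illustration, and the thread does not say the authors used this measure.

```python
def members(labels, cluster):
    """Patient IDs assigned to `cluster` in a {patient_id: cluster_label} mapping."""
    return {pid for pid, label in labels.items() if label == cluster}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(union)


# Made-up assignments from two hypothetical methods.
method_a = {1: "high_neurocog", 2: "high_neurocog", 3: "high_neurocog", 4: "low"}
method_b = {1: "low", 2: "high_multisystem", 3: "high_multisystem", 4: "high_multisystem"}

overlap = jaccard(members(method_a, "high_neurocog"),
                  members(method_b, "high_multisystem"))
```

Here both methods report a "high burden" group of three patients, yet only two patients are shared out of four in total, so the overlap is 0.5.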

Patient-Led Research Collaborative@patientled

Each method produced clinically plausible clusters — high-burden neurocognitive groups, autonomic groups, pain-dominant groups. They all looked reasonable on their own.

But agreement between methods was low. Pairwise scores ranged from just 0.13 to 0.40. 4/
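The thread does not name the agreement score. A common choice for comparing two groupings of the same patients is the adjusted Rand index (ARI), where 1 means identical partitions and values near 0 mean roughly chance-level agreement. A minimal pure-Python sketch, assuming ARI and using made-up labels (the method keys and label lists are illustrative only):

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same patients")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of patients placed together by both methods.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together by each method on its own.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (together_both - expected) / (maximum - expected)


def pairwise_agreement(labelings):
    """ARI for every pair of methods in a {method_name: labels} mapping."""
    return {
        (m1, m2): adjusted_rand_index(labelings[m1], labelings[m2])
        for m1, m2 in combinations(labelings, 2)
    }


# Made-up cluster labels for four patients under three hypothetical methods.
scores = pairwise_agreement({
    "A": [0, 0, 1, 1],
    "B": [1, 1, 0, 0],  # same grouping as A, different label names
    "C": [0, 1, 0, 1],  # a different grouping
})
```

Because ARI ignores label names, methods A and B above agree perfectly even though their cluster numbers differ, while C disagrees with both.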

Patient-Led Research Collaborative@patientled

Consistently across methods: 🔹 Low-burden clusters → higher average age, lower proportion of women 🔹 High-burden clusters → younger, more women, and more severe physical & cognitive PEM

Women were over-represented in high-burden groups throughout. 9/
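A per-cluster demographic comparison like this one amounts to a group-by summary. A minimal sketch follows; the record fields (`cluster`, `age`, `is_woman`) and the four records are hypothetical and are not data from the study.

```python
from collections import defaultdict


def summarize_by_cluster(records):
    """Return size, mean age, and share of women for each cluster label."""
    groups = defaultdict(list)
    for record in records:
        groups[record["cluster"]].append(record)
    summary = {}
    for cluster, group in groups.items():
        n = len(group)
        summary[cluster] = {
            "n": n,
            "mean_age": sum(r["age"] for r in group) / n,
            "share_women": sum(1 for r in group if r["is_woman"]) / n,
        }
    return summary


# Made-up records, for illustration only.
records = [
    {"cluster": "low", "age": 60, "is_woman": False},
    {"cluster": "low", "age": 50, "is_woman": True},
    {"cluster": "high", "age": 40, "is_woman": True},
    {"cluster": "high", "age": 30, "is_woman": True},
]
summary = summarize_by_cluster(records)
```

Running the same summary on each method's cluster labels is one way to see whether a demographic pattern holds regardless of algorithm.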

Patient-Led Research Collaborative@patientled

"Cleaner" clusters, as seen in other phenotyping papers, may just be an artifact of measuring less.

When we dropped symptoms, the optimal cluster count fell. 6/
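One intuition for this: patients who differ only on symptoms that are no longer measured become indistinguishable. The toy below counts distinct symptom profiles before and after dropping columns. That count is only a loose stand-in for "optimal cluster count" and is not how the paper selected it; the data and function name are made up.

```python
def distinct_profiles(rows, keep_columns):
    """Number of distinct symptom patterns when only `keep_columns` are measured."""
    return len({tuple(row[c] for c in keep_columns) for row in rows})


# Made-up presence/absence data: four patients, four symptoms.
patients = [
    (1, 0, 0, 1),
    (1, 0, 1, 0),
    (0, 1, 0, 1),
    (0, 1, 1, 0),
]

full = distinct_profiles(patients, [0, 1, 2, 3])  # all four symptoms measured
reduced = distinct_profiles(patients, [0, 1])     # last two symptoms dropped
```

With all four symptoms every patient has a unique profile. With only the first two, the four patients collapse into two tidy-looking groups, which is the "measuring less" effect in miniature.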

Patient-Led Research Collaborative@patientled

Two subgroups were reproduced in 2 out of the 3 methods: one with prominent speech & cognitive-linguistic difficulty (B and C), and one reporting PEM but minimal sleep disturbance (A and C). 10/

Patient-Led Research Collaborative@patientled

Our results support using symptom clusters as exploratory tools to generate hypotheses and help communicate about patterns of #LongCovid illness, rather than as rigid labels that define eligibility or predict response to specific treatments. 12/

Patient-Led Research Collaborative@patientled

Bottom line for clinicians & researchers: symptom clusters are useful exploratory and communication tools, not fixed diagnostic types.

Don't treat single-method clusters as biologically discrete subtypes or use them to define trial eligibility without checking robustness. 11/
