Machine Learning for Proteomics
NAIAD identifies candidate novel ciliary proteins in Chlamydomonas reinhardtii using protein language models, reaching an AUC of 0.95 for predicting ciliary localization.
Chlamydomonas reinhardtii is a single-celled green alga with two flagella and a model organism for studying ciliary biology. Its genome encodes ~17,000 proteins, but only 187 are confirmed ciliary proteins.
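Those two numbers make the classes highly imbalanced. As a rough illustration, assume (for this calculation only) that every unconfirmed protein counts as non-ciliary. The page does not say how negative examples were chosen. Under that assumption, the positive rate and the accuracy of a trivial classifier that labels every protein non-ciliary are:

```latex
\frac{187}{17\,000} \approx 0.011 \;(\approx 1.1\%), \qquad
\text{accuracy}_{\text{all non-ciliary}} \approx 1 - 0.011 = 0.989
```

A classifier that never predicts a single ciliary protein would still score about 99% accuracy. A threshold-free ranking metric such as ROC AUC, which is insensitive to class prevalence, is therefore more informative here than raw accuracy.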
NAIAD combines embeddings from the 650M-parameter ESM-2 protein language model with Gene Ontology annotations to predict which of the remaining proteins localize to cilia.
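The page does not say how the embeddings and GO annotations are combined. The sketch below shows one common approach, offered only as an illustration. It mean-pools ESM-2's per-residue embeddings (1280 dimensions per residue for the 650M model) into a single protein vector, then concatenates that vector with a multi-hot vector over a fixed list of GO terms. The function names and the toy 2-dimensional embeddings are hypothetical. Producing real embeddings would require running ESM-2 itself, which is not shown.

```python
from typing import Dict, List, Sequence


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length protein vector (hypothetical helper)."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[d] for row in residue_embeddings) / n for d in range(dim)]


def go_multi_hot(go_terms: Sequence[str], vocabulary: Dict[str, int]) -> List[float]:
    """Encode a protein's GO annotations as a 0/1 vector over a fixed term vocabulary."""
    vec = [0.0] * len(vocabulary)
    for term in go_terms:
        idx = vocabulary.get(term)
        if idx is not None:  # terms outside the vocabulary are silently ignored
            vec[idx] = 1.0
    return vec


def build_features(
    residue_embeddings: Sequence[Sequence[float]],
    go_terms: Sequence[str],
    vocabulary: Dict[str, int],
) -> List[float]:
    """Concatenate the pooled embedding with the GO vector (assumed fusion strategy)."""
    return mean_pool(residue_embeddings) + go_multi_hot(go_terms, vocabulary)


# Toy usage: two residues with 2-dim embeddings; vocabulary of two GO terms
# (GO:0005929 = cilium, GO:0005634 = nucleus).
vocab = {"GO:0005929": 0, "GO:0005634": 1}
features = build_features([[1.0, 2.0], [3.0, 0.0]], ["GO:0005929"], vocab)
```

The resulting vector would then go to a downstream classifier. The page does not state which classifier NAIAD uses.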
Model Performance
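The model was trained on the 187 known ciliary proteins and evaluated with 5-fold cross-validation. It reaches an AUC of 0.95, 0.14 higher than the CilioGenics database baseline (0.81 AUC). This suggests that protein language model embeddings capture structural features predictive of ciliary localization. The gain cannot be attributed to the embeddings alone, however, because the model also uses Gene Ontology annotations.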
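The page does not publish its evaluation code. The sketch below covers the two pieces this evaluation relies on: stratified 5-fold splitting and ROC AUC. It assumes binary labels (1 = ciliary, 0 = non-ciliary) are already available. How NAIAD chose its negatives is not stated. The classifier itself is omitted, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def stratified_k_fold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall positive/negative ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (1, 0):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With 187 positives, each of the five folds receives 37 or 38 of them. The pairwise AUC loop is O(P·N), which is fine for a sketch. A rank-based version brings this down to O(n log n) for larger test sets.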
Novel Candidates
Gene Query
Enter gene IDs to retrieve ciliary localization probabilities from our model.
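The page does not describe how the query box is implemented. The minimal lookup below assumes the model's scores have been precomputed into a table keyed by gene ID. It parses the pasted IDs and reports each one's probability, flagging IDs that are missing from the table. The function names are hypothetical, and the real service may behave differently, for example in how it normalizes ID formats.

```python
from typing import Dict, List, Optional, Tuple


def parse_gene_ids(raw: str) -> List[str]:
    """Split pasted input on commas or any whitespace; drop duplicates, keep first-seen order."""
    seen = set()
    ids: List[str] = []
    for token in raw.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def query_probabilities(
    raw: str, predictions: Dict[str, float]
) -> List[Tuple[str, Optional[float]]]:
    """Return (gene_id, probability) pairs; None marks IDs absent from the
    hypothetical precomputed prediction table."""
    return [(gid, predictions.get(gid)) for gid in parse_gene_ids(raw)]
```

Returning None for unknown IDs, rather than raising an error, means one bad identifier in a long pasted list still leaves results for the rest.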