Origin — Why Reinvent the Medical Ontology?

Biomedical knowledge accumulates at an unprecedented pace, yet the way we organize it remains anchored in linguistic habits from centuries past. Classical systems like the Gene Ontology (GO) and KEGG pathways, with their static, qualitative, consensus-based terms, have given us a common language — but they fail to answer three fundamental questions:

  • How important is this concept, really? How much of the variation in disease does it explain?

  • Does this concept mean the same thing in different contexts? Is "mitochondrial dysfunction" the same gene program in a hypoxic tumor as in a neurodegenerative disease?

  • Can this concept update itself, or must it wait for the next committee meeting?

EvoSika was born at this bottleneck of knowledge. It is not another ontology database, but a computational framework that brings biomedical concepts to life. We believe that knowledge built for AI-driven science and medicine must itself be computable, evolvable, and capable of redefining itself under the pressure of a question.

Our founding purpose is to complete a paradigm shift: from concepts as entries in a qualitative dictionary to concepts as quantifiable, context-sensitive, evolving, first-class scientific citizens. Through this shift, we aim to reconnect the medical traditions of East and West, accelerate drug discovery, and decipher the complex code of aging.

Philosophy — Heisenberg's Ghost in the Genome

"What we observe is not nature itself, but nature exposed to our method of questioning." — Werner Heisenberg

This insight from physics is equally profound in biology.

The concept "mitochondrial dysfunction" does not correspond to a frozen list of genes. Interrogate it with the question "Why can tumor cells survive under hypoxia?" and it presents itself as a set of genes favoring glycolytic metabolic reprogramming. Ask instead "Why do dopaminergic neurons die prematurely in Parkinson's disease?" and it manifests as a gene network emphasizing oxidative damage and mitophagy.

EvoSika calls these context-dependent presentations of a concept "Avatars." A concept is not a single definition; it is a family of gene programs, each adapted to a different way of questioning. Our knowledge base is therefore not a static dictionary but a genealogy of avatars spanning the manifestations of life and disease.

Focusing — The Other Half Heisenberg Didn't Tell You

Heisenberg said: what you see depends on how you ask. What he didn't say is that even if you ask the right question, a blurry prism will still show you a blur.

Every biomedical concept is a prism, and the size of its gene set is its focal length. Too short (too few genes), and you miss the core signal. Too long (too many genes), and noise drowns the signal out. Only when the focus is just right does the true relationship between concept and disease come into sharp view.
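A minimal sketch of the focusing idea, under our own assumptions: a concept's genes are pre-ranked by how central they are to the concept, "focal length" is the size k of the top-k gene set, a concept score is the mean z-scored expression of those genes, and sharpness is the absolute Pearson correlation with a phenotype. The function names (`concept_score`, `focus_scan`) and the synthetic data with a planted 10-gene signal are hypothetical, not EvoSika's implementation.

```python
import numpy as np


def concept_score(expr: np.ndarray, genes: np.ndarray) -> np.ndarray:
    """Mean z-scored expression of the selected gene columns, one value per sample."""
    sub = expr[:, genes]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0)
    return z.mean(axis=1)


def focus_scan(expr, ranked_genes, phenotype, sizes):
    """Score nested top-k gene sets; return the sharpest k and the full |r| curve."""
    curve = {}
    for k in sizes:
        score = concept_score(expr, ranked_genes[:k])
        curve[k] = abs(np.corrcoef(score, phenotype)[0, 1])
    best_k = max(curve, key=curve.get)
    return best_k, curve


# Synthetic demo: 200 samples x 100 genes; only the first 10 genes carry the signal.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 100
phenotype = rng.normal(size=n_samples)
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :10] += 0.8 * phenotype[:, None]   # planted "core program"
ranked = np.arange(n_genes)                  # assume genes are already ranked by centrality
best_k, curve = focus_scan(expr, ranked, phenotype, sizes=[2, 5, 10, 20, 50, 100])
```

On this planted example the curve should peak around k = 10, where the signal genes end: smaller sets average away too little noise, larger sets dilute the signal. A real scan would also need held-out evaluation, since choosing k on the same data that scores it inflates the best association.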

Evidence: Two Focal Lengths for CRP

In a 2021 Neurology study, Conole et al. found that the same concept, chronic inflammation, showed only a weak association with brain volume when characterized by serum CRP protein levels (a snapshot: β ≈ -0.03), but an association more than six times stronger when characterized by a CRP-guided DNA methylation model (a long exposure: β ≈ -0.20). The concept didn't change; only the focal length did. Because methylation integrates signals over time, its timescale matched that of the "chronic" concept.

This points to a deeper pattern: different omics layers (metabolites, proteins, transcriptome, methylation) correspond to different "exposure times." Chronic concepts need long exposures (methylation); acute concepts need short exposures (proteins). EvoSika systematically searches for the optimal focal length and exposure time for every concept.
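The exposure-time idea can be illustrated with a toy simulation. All names and numbers below are hypothetical and do not reproduce the Conole et al. data: a latent chronic burden is measured once through a noisy snapshot layer and once through a time-integrating layer, and the layer with the larger absolute standardized β is selected.

```python
import numpy as np


def standardized_beta(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of y on x after z-scoring both (equals Pearson r for a single predictor)."""
    xz = (x - x.mean()) / x.std()
    yz = (y - y.mean()) / y.std()
    return float(np.mean(xz * yz))


def best_exposure(layers: dict, outcome: np.ndarray):
    """Pick the omics layer whose measurement of the concept has the largest |beta|."""
    betas = {name: standardized_beta(x, outcome) for name, x in layers.items()}
    best = max(betas, key=lambda name: abs(betas[name]))
    return best, betas


rng = np.random.default_rng(1)
n = 1000
chronic = rng.normal(size=n)                           # latent long-term inflammatory burden
protein = chronic + rng.normal(scale=3.0, size=n)      # snapshot: dominated by acute fluctuation
methylation = chronic + rng.normal(scale=0.5, size=n)  # long exposure: integrates over time
outcome = -0.3 * chronic + rng.normal(size=n)          # a brain-volume-like phenotype
best, betas = best_exposure({"protein": protein, "methylation": methylation}, outcome)
```

Both layers measure the same latent quantity; the snapshot simply carries more transient noise, which attenuates its association, mirroring the focal-length argument above.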

Even more importantly, some concepts are blurry at every focal length. That is not a focusing problem but a sign that the concept itself needs to be revised or redefined. EvoSika is the first systematic "concept focusing instrument": it tells you not just what you see, but whether what you see is in focus.
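One way to make "blurry at every focal length" testable, offered here as an assumption rather than EvoSika's documented rule, is a max-statistic permutation test: take the best association over all candidate settings (gene-set sizes, omics layers) and ask whether it beats the best association obtained after shuffling the phenotype. Scanning many settings inflates the observed maximum, so the null distribution must be built from the same maximum. `is_blurry`, `alpha`, and `n_perm` are hypothetical names and defaults.

```python
import numpy as np


def max_abs_corr(scores, phenotype) -> float:
    """Largest |Pearson r| between any candidate concept score and the phenotype."""
    return max(abs(np.corrcoef(s, phenotype)[0, 1]) for s in scores)


def is_blurry(scores, phenotype, n_perm=500, alpha=0.05, seed=0):
    """Max-statistic permutation test across all focal settings; returns (blurry, p_value)."""
    rng = np.random.default_rng(seed)
    observed = max_abs_corr(scores, phenotype)
    null = [max_abs_corr(scores, rng.permutation(phenotype)) for _ in range(n_perm)]
    p = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return bool(p >= alpha), float(p)


rng = np.random.default_rng(2)
n = 200
phenotype = rng.normal(size=n)
# Four hypothetical focal settings; only the third carries signal.
candidates = [rng.normal(size=n), rng.normal(size=n),
              phenotype + rng.normal(size=n), rng.normal(size=n)]
blurry, p = is_blurry(candidates, phenotype)

# All-noise candidates: a concept that may be blurry everywhere.
noise_only = [rng.normal(size=n) for _ in range(4)]
blurry_noise, p_noise = is_blurry(noise_only, phenotype)
```

A concept flagged as blurry would then be a candidate for revision rather than further tuning.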

Building on this focusing framework, we further treat every concept as a living, learning Agent. Each Agent possesses:

  • Memory: a core gene set and the history of successful past avatars.

  • Skills: quantifiable intrinsic capabilities (Causal Emergence, Parsimony, Pan-Disease Explanatory Power, Intervention Potency), along with the ability to adaptively fine-tune their weights to the task at hand.

  • Vitality: in an open evaluation arena, agents compete, survive, merge, or are eliminated based on their performance on real-world data.

This is a philosophy of knowledge as life. Precision no longer comes from static authority, but from a living process of being continuously selected and questioned by data.
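The Memory / Skills / Vitality triad could be represented as a small data structure. The sketch below is hypothetical: the class and field names, the normalized weighted sum used for task-specific skill weighting, and the exponential moving average used for vitality are illustrative choices, and the genes and skill values are placeholders rather than computed scores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConceptAgent:
    """Hypothetical container for one concept Agent."""
    name: str
    core_genes: frozenset[str]  # Memory: the core gene set
    avatar_history: list[dict[str, float]] = field(default_factory=list)  # Memory: past avatars (gene -> weight)
    skills: dict[str, float] = field(default_factory=dict)  # Skills: intrinsic scores, e.g. "parsimony"
    vitality: float = 1.0  # Vitality: running fitness in the evaluation arena

    def task_score(self, task_weights: dict[str, float]) -> float:
        """Task-specific skill profile: weights normalized to sum to 1; missing skills count as 0."""
        total = sum(task_weights.values())
        return sum(w / total * self.skills.get(k, 0.0) for k, w in task_weights.items())

    def update_vitality(self, performance: float, rate: float = 0.2) -> None:
        """Exponential moving average of arena performance (a hypothetical update rule)."""
        self.vitality = (1 - rate) * self.vitality + rate * performance


# Placeholder genes and skill values for illustration only; not computed from data.
agent = ConceptAgent(
    name="mitochondrial_dysfunction",
    core_genes=frozenset({"PINK1", "PRKN", "HIF1A"}),
    skills={"causal_emergence": 0.6, "parsimony": 0.8,
            "pan_disease_power": 0.4, "intervention_potency": 0.7},
)
score = agent.task_score({"intervention_potency": 3, "parsimony": 1})  # a drug-discovery-flavoured task
agent.update_vitality(0.5)
```

Keeping the task weights outside the agent lets the same agent be re-weighted per question without mutating its memory.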

Technology Roadmap — Constructing and Evolving Conceptual Lifeforms

EvoSika's technical architecture rests on three core pillars: Adaptive Agents, Collaborative Panels, and an Open Evolutionary Ecosystem.

1. Adaptive Hallmark Agents

The basic unit is the Agent. Each Agent encapsulates a biomedical concept (e.g., "stem cell exhaustion in aging," "Spleen-Qi deficiency" in Traditional Chinese Medicine) and internally contains:

  • Gene Set Memory: A standardized core gene set, derived through gene set characterization, together with its evolutionary history and metadata.

  • Intrinsic Skill Scores: Causal Emergence Index, Parsimony (e.g., LASSO-derived sparsity), Pan-Disease Explanatory Power, and Intervention Potency — four quantitative dimensions.

  • Avatar Generation Skills: Pluggable computational modules that let an agent automatically produce weighted avatars for specific tasks. Standardized skills include:

      • tscore directional weighting: Inspired by the positive/negative weighting approach of Xiong Jianghui et al., this skill assigns a direction and a strength to each gene for a given task.
      • Semo subnetwork extraction: Intersects the concept's gene memory with chemical target genes on a protein-protein interaction network to extract an intervention-reachable subnetwork.
      • Context conditioning: Adaptively fine-tunes gene weights according to tissue, disease, stage, and other context.

Thus, every Agent becomes a miniature model activated by a question, outputting not a static gene list but the most explanatory and actionable avatar for that context.
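The three avatar-generation skills can be sketched as follows, with loud caveats: we do not reproduce the method of Xiong Jianghui et al. or the actual Semo algorithm. As stand-ins, each gene's directional weight is a Welch t-statistic between case and control samples, scaled so the largest absolute weight is 1; the "intervention-reachable subnetwork" keeps memory genes that are chemical targets or direct PPI neighbours of a target; and context conditioning multiplies weights by a context-specific prior. All function names are hypothetical.

```python
import numpy as np


def welch_t(case: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Per-gene Welch t-statistic; rows are samples, columns are genes."""
    m1, m2 = case.mean(axis=0), control.mean(axis=0)
    v1, v2 = case.var(axis=0, ddof=1), control.var(axis=0, ddof=1)
    return (m1 - m2) / np.sqrt(v1 / len(case) + v2 / len(control))


def directional_avatar(genes, case, control) -> dict:
    """Signed weights: sign = direction, magnitude = strength (max |weight| scaled to 1)."""
    t = welch_t(case, control)
    return dict(zip(genes, (t / np.abs(t).max()).tolist()))


def reachable_subnetwork(memory: set, targets: set, ppi: dict) -> set:
    """Keep memory genes that are drug targets or one PPI hop from a target.
    ppi maps gene -> set of partners and must list each edge in both directions."""
    return {g for g in memory if g in targets or ppi.get(g, set()) & targets}


def condition(avatar: dict, context_prior: dict) -> dict:
    """Scale each weight by a context prior in [0, 1]; genes without a prior are unchanged."""
    return {g: w * context_prior.get(g, 1.0) for g, w in avatar.items()}


rng = np.random.default_rng(3)
control = rng.normal(size=(30, 3))
case = rng.normal(size=(30, 3))
case[:, 0] += 2.0   # gene A up in cases
case[:, 1] -= 2.0   # gene B down in cases
avatar = directional_avatar(["A", "B", "C"], case, control)
tissue_avatar = condition(avatar, {"C": 0.0})   # e.g. gene C not expressed in the target tissue

ppi = {"A": {"X"}, "X": {"A"}, "B": {"D"}, "D": {"B"}, "C": {"Y"}, "Y": {"C"}}
sub = reachable_subnetwork({"A", "B", "C", "D"}, targets={"B", "X"}, ppi=ppi)
```

Each function returns a new avatar or gene subset rather than editing the agent's memory, so avatars for different questions can coexist.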

2. Collaborative Panels and Inter-Agent Relationships

Single concepts are bricks; the structure among them is the architecture. EvoSika has built a quantitative language for agent relationships:

  • Panel Completeness and Non-Redundancy: A group of agents is assessed as a candidate minimal sufficient statistic for a phenotype. Its members should have low mutual redundancy yet jointly capture approximately all the information the data carry about that phenotype. The five Zang organs in TCM serve as an elegant borrowed metaphor for such a panel.

  • Synergy: Through interaction-effect models and information decomposition, EvoSika quantifies the extra "1+1>2" predictive power that arises when two Agent avatars are combined. This directly enables computational identification of synergistic interventions and drug combinations.

  • Dependence and Cascades: Using conditional mutual information and structure learning, EvoSika draws directed support/inhibition networks among agents. The Five Elements theory of TCM can be naturally represented, and tested, as a specific pattern on this dependency graph.

EvoSika thus answers: Can a set of concepts completely and non-redundantly describe a disease? Which concepts must join forces to trigger a critical pathological process?
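The three relationship measures can be approximated with simple, well-known statistics. This sketch swaps in stand-ins for the more general methods named above: redundancy as the mean absolute pairwise correlation of agent activity scores, synergy as the extra R² gained by adding an a*b interaction term to an additive linear model (not a full information decomposition), and dependence as conditional mutual information under a joint-Gaussian assumption, computed from the partial correlation ρ as -0.5 * log(1 - ρ²). Function names are hypothetical.

```python
import numpy as np


def redundancy(scores: np.ndarray) -> float:
    """Mean absolute off-diagonal correlation among agent activity scores (one column per agent)."""
    c = np.corrcoef(scores, rowvar=False)
    k = c.shape[0]
    return float((np.abs(c).sum() - k) / (k * (k - 1)))


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1 - resid.var() / y.var())


def synergy_gain(a: np.ndarray, b: np.ndarray, y: np.ndarray) -> float:
    """Extra R^2 from adding the a*b interaction term to the additive model y ~ a + b."""
    return r_squared(np.column_stack([a, b, a * b]), y) - r_squared(np.column_stack([a, b]), y)


def gaussian_cmi(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """I(X;Y|Z) in nats, assuming (X, Y, Z) are jointly Gaussian, via the partial correlation."""
    precision = np.linalg.inv(np.corrcoef(np.vstack([x, y, z])))
    rho = -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])
    return float(-0.5 * np.log(1 - rho ** 2))


# Synthetic demo: two agents that only matter jointly, and a cascade x -> z -> y.
rng = np.random.default_rng(4)
n = 2000
a, b = rng.normal(size=n), rng.normal(size=n)
outcome = a * b + 0.5 * rng.normal(size=n)
gain = synergy_gain(a, b, outcome)        # large: the pair is "1+1>2"

x = rng.normal(size=n)
z = x + rng.normal(size=n)
y = z + rng.normal(size=n)
cmi_xy_given_z = gaussian_cmi(x, y, z)     # near 0: z screens x off from y
cmi_xz_given_y = gaussian_cmi(x, z, y)     # clearly positive: direct link between x and z
red = redundancy(np.column_stack([x, x + 0.1 * rng.normal(size=n), rng.normal(size=n)]))
```

The Gaussian CMI only captures linear dependence; nonlinear links between agents would need a nonparametric estimator.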

3. Open Evolutionary Ecosystem

An ontology should not be updated by committee, but by competition. EvoSika is building an open ontology arena:

  • Anyone can submit a new concept Agent or a new avatar for an existing agent.

  • All computational candidates compete on public datasets (TCGA, GEO, UK Biobank, etc.) under unified evaluation tasks.

  • According to survival rules (improvement in explanatory power, generalization, parsimony), agents and avatars are automatically ranked, merged, or pruned.

Out of this grows a self-evolving, verifiable living medical ontology that adapts as the data drift.
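The survival rules could be encoded as a ranking-and-pruning round. The fitness formula (fit plus transfer minus a log gene-count penalty for parsimony), the margin over an incumbent baseline, and the Jaccard threshold for treating two avatars as near-duplicates are all hypothetical parameters chosen for illustration, as are the candidate names and scores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    genes: frozenset
    explanatory: float     # e.g. held-out R^2 on the unified task
    generalization: float  # e.g. mean R^2 on external cohorts


def fitness(c: Candidate, lam: float = 0.02) -> float:
    """Hypothetical survival score: reward fit and transfer, penalize gene-set size."""
    return c.explanatory + c.generalization - lam * math.log(len(c.genes))


def jaccard(a: frozenset, b: frozenset) -> float:
    return len(a & b) / len(a | b)


def arena_round(candidates, baseline, margin=0.01, merge_at=0.8):
    """Rank by fitness, prune candidates that do not beat the baseline by `margin`,
    and absorb near-duplicates (Jaccard >= merge_at) into the fitter survivor."""
    survivors = []
    for c in sorted(candidates, key=fitness, reverse=True):
        if fitness(c) < fitness(baseline) + margin:
            continue  # pruned: no improvement over the incumbent
        if any(jaccard(c.genes, s.genes) >= merge_at for s in survivors):
            continue  # absorbed into an already-kept, fitter near-duplicate
        survivors.append(c)
    return survivors


baseline = Candidate("committee_term", frozenset(f"g{i}" for i in range(50)), 0.20, 0.15)
pool = [
    Candidate("avatar_a", frozenset(f"g{i}" for i in range(10)), 0.25, 0.20),
    Candidate("avatar_b", frozenset(f"g{i}" for i in range(11)), 0.24, 0.19),   # near-duplicate of a
    Candidate("avatar_c", frozenset(f"h{i}" for i in range(200)), 0.22, 0.10),  # bloated, pruned
    Candidate("avatar_d", frozenset(f"k{i}" for i in range(20)), 0.22, 0.18),
]
kept = arena_round(pool, baseline)
```

Here avatar_b is absorbed into avatar_a, avatar_c fails to beat the baseline once its size penalty is applied, and avatar_a and avatar_d survive.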

Vision — When Concepts Become Infrastructure

EvoSika's ultimate vision is to become the concept operating system for AI4Science — a shared, computable set of "semantic fundamental particles" for biomedical researchers, AI engineers, and clinicians.

  • AI Drug Discovery: Targets will no longer be mere gene lists but contextualized agents with intervention potency scores and synergy relationships, rewriting the logic from target identification to combination therapy.

  • AI Longevity Technology: The twelve hallmarks of aging will be decomposed into a dynamic, trackable agent panel. Individual aging profiles will no longer be a single clock, but an actionable landscape of concept activities.

  • Civilization Inter-translation: For the first time, TCM syndromes and Western molecular hallmarks will be compared quantitatively within the same mathematical framework. The five Zang organs of classical TCM (Heart, Liver, Spleen, Lung, Kidney) will find their modern avatars in gene expression and protein networks, establishing a computational foundation for evidence-based integrative medicine.
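An "actionable landscape of concept activities" for one individual could, as a minimal sketch, be a vector of per-agent activity scores: each agent's signed avatar weights applied to the individual's expression z-scores relative to a cohort. The weighting scheme and the names `activity_landscape`, `gene_index`, and `agents` are assumptions for illustration, not a validated aging score.

```python
import numpy as np


def activity_landscape(expr: np.ndarray, gene_index: dict, agents: dict) -> dict:
    """Per-sample activity of each agent: weighted mean of cohort z-scores.
    expr: samples x genes; gene_index: gene -> column; agents: name -> {gene: signed weight}."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    out = {}
    for name, weights in agents.items():
        present = [g for g in weights if g in gene_index]  # skip genes absent from the assay
        cols = [gene_index[g] for g in present]
        w = np.array([weights[g] for g in present])
        out[name] = z[:, cols] @ w / np.abs(w).sum()
    return out


# Tiny cohort of three individuals and one hypothetical agent (G1 up, G2 down).
expr = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [2.0, 2.0]])
landscape = activity_landscape(expr, {"G1": 0, "G2": 1}, {"agent_x": {"G1": 1.0, "G2": -1.0}})
```

With a full panel of agents, each individual gets one score per agent, which is the "landscape" rather than a single clock value.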

EvoSika is built on a conviction: the vitality of knowledge lies not in the moment it is written down, but in the continuing process of being questioned, reshaped, and activated by different civilizations and disease contexts.

We invite you to enter the era of the living ontology.