The Proof

Published benchmark. $10. Every hard question answered.

8 min read

This is not a pitch. It is a published result.

A nine-billion-parameter open model — Qwen 3.5, fine-tuned with QDoRA on expert-quality drug interaction data and augmented with retrieval over FDA-approved drug labels — outperforms GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro on pharmacological mechanism identification.

It identifies 92% of drug interaction mechanisms. GPT-5.4 identifies 69%. Total cost: ten dollars.


The claim.

We are claiming: A small, open, expert-trained model with retrieval over authoritative FDA data identifies drug interaction mechanisms more accurately than frontier models — while citing its evidence. Not AI that replaces the pharmacist. AI the pharmacist can trust.

We are NOT claiming: A 9B model beats frontier models on general knowledge, creative writing, or any task outside its trained domain.

The claim is narrow and specific. That is what makes it credible — and generalizable, because the mechanism (expert training + structured retrieval) applies to any knowledge-intensive professional domain.


The results.

| Model | Mechanism Recall | Severity Accuracy | Management | Citations | Configuration |
|---|---|---|---|---|---|
| Qwen 3.5 9B + QDoRA + RAG | 0.917 | 0.400 | 100% | 90% | Fine-tuned, retrieval over FDA labels |
| GPT-5.4 | 0.692 | 0.900 | 100% | 0% | Default configuration |
| Claude Opus 4.6 | 0.825 | 0.700 | 100% | 50% | Default configuration |
| Gemini 3.1 Pro | 0.892 | 0.700 | 100% | 100% | Default configuration |

10 held-out drug interaction scenarios from DrugBank, evaluated against ground truth. Frontier models tested via OpenRouter API. Our model runs locally on a MacBook Air (Q4 quantized, 5.2 GB).

Our model achieves the highest mechanism recall while citing authoritative evidence in 90% of cases and providing management recommendations in 100%. GPT-5.4 wins on severity classification but never cites sources — in clinical practice, an unsourced answer is not actionable.
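For concreteness, mechanism recall as reported above can be computed as the fraction of ground-truth mechanisms the model names. This is a sketch of the metric only — the actual scoring logic in benchmark_local.py may normalize or match answers differently:

```python
def mechanism_recall(predicted: set[str], truth: set[str]) -> float:
    """Fraction of ground-truth mechanisms the model identified."""
    if not truth:
        return 1.0
    return len(predicted & truth) / len(truth)

# Example: the model names two of three ground-truth mechanisms.
truth = {"CYP3A4 inhibition", "QT prolongation", "P-gp inhibition"}
predicted = {"CYP3A4 inhibition", "QT prolongation"}
print(round(mechanism_recall(predicted, truth), 3))  # 0.667
```

Recall is the right headline metric here: in a clinical context, a missed mechanism is more costly than an extra candidate the pharmacist can rule out.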


What it cost.

| Phase | Cost |
|---|---|
| Training data (4,573 clinical interaction assessments) | ~$8 |
| Fine-tuning (QDoRA, 3 epochs, A100, 54 minutes) | ~$1.30 |
| RAG corpus (757 FDA drug labels, 5,622 passages) | free |
| Benchmarking (frontier model comparison via OpenRouter) | ~$1 |
| **Total** | **~$10** |

The fine-tuning layer — where professional expertise lives — costs ten dollars. Compare this to frontier model training costs: hundreds of millions per run.


Reproduce it.

All code is at scripts/ai-poc/ in our open-source repository:

  1. fetch_drugbank.py — download drug interaction pairs from DrugBank
  2. fetch_openfda_interactions.py — fetch FDA drug label text (free, no auth)
  3. build_retrieval_index.py — build FAISS vector index over FDA clinical text
  4. generate_raft_pairs.py — generate expert-structured training pairs (~$8)
  5. export_training.py — export as JSONL with train/validation/test split
  6. train.sh — QDoRA fine-tune on cloud GPU (~$1.30)
  7. merge_weights.py — merge adapter weights for fast inference
  8. benchmark_local.py — full comparison against frontier models

Total wall-clock time: approximately one day. Total cost: approximately ten dollars.
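The core of step 3 — retrieval over embedded FDA passages — can be illustrated with plain cosine similarity; FAISS performs the same nearest-neighbor operation at scale. The passages and two-dimensional "embeddings" below are toy stand-ins, not the real corpus or a real embedding model:

```python
import numpy as np

# Toy passages standing in for FDA label text (not the real corpus).
passages = [
    "Warfarin: concomitant fluconazole may increase INR.",
    "Simvastatin: CYP3A4 inhibition by grapefruit raises exposure.",
    "Metformin: hold before iodinated contrast due to lactic acidosis risk.",
]
# Hand-made 2-D vectors so the example stays deterministic.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    """Return the k passages most similar to the query (cosine similarity)."""
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    top = np.argsort(-(emb @ q))[:k]
    return [passages[i] for i in top]

# A query vector near the first passage retrieves it first.
print(retrieve([0.9, 0.1])[0])
```

At generation time, the top-k passages are placed in the model's context, which is what lets the fine-tuned model quote the FDA label rather than guess.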

If you can prove us wrong, do it. We publish the code because we want you to try.


Hard questions. Honest answers.

If you are reading this far, you are the kind of person who does not join things easily. Good. We are not looking for enthusiasm. We are looking for judgment.


"Why would experts contribute?"

Revenue. 95% of consumer revenue flows to experts. Guild members get full AI access for $5 instead of $20.

AI that works for them. Trained by verified experts in your field, with retrieval over the databases you actually use.

Ten minutes, not a career change. Minimum contribution: a five-second yes/no judgment.


"Why $5/month? Why not free?"

We charge $5 so we never have to take venture capital. No investor would allow 95% to flow to contributors. No investor would accept a constitution that forbids labor-replacement design. The $5 is the price of independence.

Contributing is free — anyone can register and do quick reviews. Guild membership ($5/month) unlocks full review tools, AI access, and revenue share.


"95% to experts — how is that sustainable?"

Two revenue streams. Guild fees ($5/month per expert) fund the core team. The company also receives 5% of consumer revenue. The Constitution guarantees it — the company cannot increase its share beyond 5% without a supermajority vote of Guild members. Infrastructure costs are published monthly.


"What can this do that ChatGPT cannot?"

Five things no frontier model can guarantee:

  1. Source attribution. Every claim traces to a named expert who verified it.
  2. Deterministic computation. Tax brackets, drug doses, building codes — computed, not predicted.
  3. Temporal validity. Units expire when the law changes. LLMs confidently cite last year's rules.
  4. Jurisdiction specificity. Czech tax law is not German tax law. One model cannot serve both.
  5. Consent and compensation. Every expert who contributed is named, consented, and paid.
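Point 2 deserves a concrete illustration: a progressive tax calculation is a bracket walk that always returns the same answer for the same input. The brackets below are invented for illustration, not any real jurisdiction's:

```python
# Illustrative progressive brackets: (lower bound, upper bound, marginal rate).
BRACKETS = [(0, 10_000, 0.10), (10_000, 40_000, 0.20), (40_000, None, 0.30)]

def tax_due(income: float) -> float:
    """Sum the marginal tax owed in each bracket up to the given income."""
    total = 0.0
    for lo, hi, rate in BRACKETS:
        upper = income if hi is None else min(income, hi)
        if upper > lo:
            total += (upper - lo) * rate
    return total

print(tax_due(50_000))  # 10000*0.10 + 30000*0.20 + 10000*0.30 = 10000.0
```

An LLM predicting this number token by token can be off by thousands; the function above cannot. That is the difference between predicted and computed.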

"What if frontier models just get better?"

The gap is architectural, not a performance lead. Frontier models predict what answers look like. We compile how professionals actually reason — into verified units that execute deterministically. A frontier model can get better at guessing the right tax calculation. Our system runs the tax calculation. That is a structural property, not a lead that erodes.

The base model gets better — we swap it in. The expert reasoning is the moat, not the model weights.


"What professions do you NOT cover?"

Deliberately: software engineering, data science, design, management consulting, creative work. These are where LLMs already work well. We build for the 124 professions where AI is weakest — rule-dense, jurisdiction-specific, high-stakes.


"How do you prevent gaming?"

Three layers: automated anomaly detection (duplicates, volume anomalies, plagiarism), peer review (every contribution reviewed by 2+ verified professionals), and credential verification (license, certification, or degree — reviewed by domain governance committee). Volume without quality earns nothing.
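As a sketch of the first layer, exact-duplicate detection can start with normalized content hashing; the production checks would go well beyond this (near-duplicate similarity, volume profiling, plagiarism lookup):

```python
import hashlib

def fingerprint(text: str) -> str:
    """Lowercase, collapse whitespace, then hash — a first-pass duplicate check."""
    norm = " ".join(text.lower().split())
    return hashlib.sha256(norm.encode()).hexdigest()

seen: set[str] = set()

def is_duplicate(contribution: str) -> bool:
    """True if an equivalent contribution was already submitted."""
    fp = fingerprint(contribution)
    if fp in seen:
        return True
    seen.add(fp)
    return False

print(is_duplicate("Interaction confirmed for A + B."))   # False (first time)
print(is_duplicate("interaction  confirmed for a + b."))  # True (normalized match)
```

Anything flagged here never reaches peer review, so reviewers spend their time on substance rather than spam.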


"This sounds like crypto."

No token. No blockchain. No speculation. Points are permanent, non-transferable, non-tradeable. They reflect verified professional contribution, not a position to sell. $5/month for experts. $20/month for consumers. Transparent costs. Constitutional protections.


"One person can't build this."

One person started it. The Constitution ensures no one person controls it. Expert communities govern their domains. Constitutional constraints prevent capture regardless of who runs the company.


Still not convinced? Read the Constitution — it is the shortest path to knowing whether we mean it.

Contribute free · Join the Guild — $5/month