u/MirrorEthic_Anchor

temporal-mcp: wall-clock awareness for LLMs, with OAuth

One of the small failure modes I keep hitting with agent stacks is that the model has no idea how much time passed between turns. It'll greet you with "good morning" at 11 PM, or pick up a conversation three weeks later as if no time has passed, or compute "today's data" off whatever fragment of context happens to be in scope.

Built a minimal MCP server to fix it. Two tools: temporal_tick and temporal_peek. They return elapsed-time-since-last-turn, day-rollover detection, and a fresh-thread flag, both as a human-readable header and as JSON.
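A minimal sketch of what a tick-style tool might compute, assuming the server keeps the timestamp of the previous turn. The function names `tick` and `header`, the `fresh_thread` key, and the 6-hour threshold are hypothetical; only `delta_sec` and `day_rollover` are names used in this post, and the real server may compute them differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: gaps longer than this are treated as a new thread.
FRESH_THREAD_AFTER = timedelta(hours=6)

def tick(last_turn: Optional[datetime], now: datetime) -> dict:
    """Return elapsed-time signals for the current turn (illustrative only)."""
    if last_turn is None:
        # No prior turn recorded: nothing to measure, so flag a fresh thread.
        return {"now": now.isoformat(), "delta_sec": None,
                "day_rollover": False, "fresh_thread": True}
    delta = now - last_turn
    return {
        "now": now.isoformat(),
        "delta_sec": int(delta.total_seconds()),
        # True when the calendar date changed between the two turns.
        "day_rollover": now.date() != last_turn.date(),
        "fresh_thread": delta > FRESH_THREAD_AFTER,
    }

def header(signals: dict) -> str:
    """Render the signals as a one-line human-readable header."""
    if signals["delta_sec"] is None:
        return f"[time] now={signals['now']} (first turn)"
    mins = signals["delta_sec"] // 60
    note = ", new day" if signals["day_rollover"] else ""
    return f"[time] now={signals['now']} ({mins} min since last turn{note})"

# Example: two turns 15 minutes apart that straddle midnight UTC.
prev = datetime(2025, 1, 1, 23, 50, tzinfo=timezone.utc)
curr = datetime(2025, 1, 2, 0, 5, tzinfo=timezone.utc)
signals = tick(prev, curr)
print(header(signals))
```

Returning both a dict and a rendered header mirrors the "human-readable header plus JSON" split described above: the header is for the model to read, the dict is for client code.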

Ways to use:

Local stdio: pip install temporal-mcp (works with Claude Desktop, Cursor, Cline, Zed, Claude Code)

Hosted with OAuth (claude.ai / ChatGPT): visit https://temporal-mcp.dev/connect, click "Generate OAuth Credentials", and paste the result into your custom connector. Full OAuth 2.0 with PKCE and refresh tokens, but no signup: the credential pair is the identity. (Verified working in claude.ai.)

Hosted with raw bearer (any client that supports custom headers): Authorization: Bearer <any-opaque-string> against https://temporal-mcp.dev/mcp. The token gets SHA-256'd; we never see the plaintext.

Self-host: Cloudflare Workers deploy in workers/ in the repo, free tier covers ~100k req/day.

Grok/xAI: https://temporal-mcp.dev/mcp/<string> (verified working with Grok)
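The raw-bearer option above says the token gets SHA-256'd. A minimal sketch of that idea, assuming the hex digest serves as the lookup key for per-identity state. `identity_key` and the in-memory `state` dict are hypothetical and not taken from the repo.

```python
import hashlib

def identity_key(authorization_header: str) -> str:
    """Derive a storage key from an 'Authorization: Bearer <token>' value."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("expected 'Bearer <token>'")
    # Only the digest is kept; the plaintext token is not stored.
    return hashlib.sha256(token.strip().encode("utf-8")).hexdigest()

# Hypothetical in-memory store keyed by the digest.
state: dict = {}
key = identity_key("Bearer my-opaque-string")
state[key] = {"last_turn": None}
```

Because the key is a pure function of the token, any client that sends the same opaque string lands on the same state without an account or signup step.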

MIT, ~150 lines of stdlib Python on the local side, ~400 lines of TypeScript on the hosted side (engine + OAuth provider), both with tests. Listed in the official MCP Registry. Smithery and Glama submissions in flight.

Curious to hear how folks would use the JSON day_rollover and delta_sec signals. I've been using them for context decay and resume detection but there are probably more interesting use cases.
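One way the context decay and resume detection mentioned above could look, as a sketch. The half-life, the resume threshold, and the function names are assumptions for illustration, not values from temporal-mcp.

```python
HALF_LIFE_SEC = 3600.0        # assumed: context relevance halves every hour
RESUME_AFTER_SEC = 8 * 3600   # assumed: gaps this long count as a resume

def context_weight(delta_sec: float) -> float:
    """Exponential decay weight in (0, 1] for context that is delta_sec old."""
    return 0.5 ** (max(delta_sec, 0.0) / HALF_LIFE_SEC)

def turn_hints(delta_sec: float, day_rollover: bool) -> dict:
    """Combine the two signals into hints a client could act on."""
    return {
        # How much to trust earlier context when building the next prompt.
        "weight": context_weight(delta_sec),
        # A long gap suggests the user is picking up an old conversation.
        "resume": delta_sec >= RESUME_AFTER_SEC,
        # A date change is a cue to restate today's date to the model.
        "restate_date": day_rollover,
    }

print(turn_hints(delta_sec=900, day_rollover=True))
```

Keeping `resume` and `restate_date` separate avoids treating a 15-minute gap across midnight as a resumed conversation.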

Source: github.com/MirrorEthic/temporal-mcp

u/MirrorEthic_Anchor — 10 days ago

In the spirit of open-source inspection, reproduction, and critique.

I recently released T³-124M-v36, a 124M-parameter experimental transformer checkpoint, along with a reference repo, benchmark artifacts, trace tooling, and an ablation sibling. (Literally yesterday, so the repo is still a little rough.)

Links:

GitHub: https://github.com/MirrorEthic/t3-reference

Main checkpoint: https://huggingface.co/mirrorethic/t3-124m-v36

PC-loss ablation sibling: https://huggingface.co/mirrorethic/t3-124m-v36-pcloss

Benchmarks: https://t3atlas.dev/benchmarks/

T³ is a small experimental transformer variant using a three-stage / three-clock routing structure with Clifford-algebra-coupled state. The current public checkpoint is not meant to be a production text-generation model. It is 124M parameters, English-only, not instruction-tuned, and mainly intended for research, interpretability, and architectural comparison.

Evaluation numbers are full "lm-eval-harness 0.4.x" runs, no subsets. Reproduction is through "examples/run_benchmarks.py" in the reference repo.

v36 eval snapshot:

| Task | Metric | Value |
|---|---|---|
| WikiText-103 val | perplexity | 27.76 |
| BoolQ | acc | 0.6046 |
| ARC-Easy | acc | 0.4331 |
| ARC-Challenge | acc | 0.2176 |
| PIQA | acc | 0.6050 |
| HellaSwag | acc | 0.3040 |
| WinoGrande | acc | 0.5043 |
| COPA | acc | 0.6000 |
| RTE | acc | 0.5235 |
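When reading the snapshot, it can help to compare each accuracy against a uniform-guessing floor. The sketch below uses the values from the table. The chance levels assume the nominal number of answer options per task (four for ARC and HellaSwag, two for the rest) and ignore class imbalance, so treat them as rough reference points rather than baselines.

```python
# Accuracies copied from the v36 eval snapshot above.
v36_acc = {
    "BoolQ": 0.6046, "ARC-Easy": 0.4331, "ARC-Challenge": 0.2176,
    "PIQA": 0.6050, "HellaSwag": 0.3040, "WinoGrande": 0.5043,
    "COPA": 0.6000, "RTE": 0.5235,
}
# Assumed uniform-guessing accuracy per task (nominal option counts).
chance = {
    "BoolQ": 0.5, "ARC-Easy": 0.25, "ARC-Challenge": 0.25, "PIQA": 0.5,
    "HellaSwag": 0.25, "WinoGrande": 0.5, "COPA": 0.5, "RTE": 0.5,
}

def margin_pp(task: str) -> float:
    """Accuracy minus assumed chance level, in percentage points."""
    return round((v36_acc[task] - chance[task]) * 100, 2)

for task in v36_acc:
    print(f"{task:14s} {margin_pp(task):+6.2f} pp over chance")
```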

The main comparison I’m investigating is against a vanilla GPT-2 124M baseline trained on the same 5B-token data mixture. The interesting behavior is the downstream capability profile, especially on compositional / multi-step reasoning tasks under a same-data architectural comparison.

I also released "t3-124m-v36-pcloss", a negative/neutral ablation sibling. It uses the same architecture, same data, same step count, and same configured hyperparameters as v36, but enables gradient flow through the inter-stage predictive-coding loss. I think the result is useful: the internal K-predictor learns a stronger cross-stage map, but that doesn't translate into downstream reasoning gains at 124M scale. So it serves as a mechanism probe.

What I’d most appreciate from this community:

- Reproduction attempts

- Baseline critique

- Repo/API cleanup feedback

- Eval harness suggestions

- Suggestions for cleaner architecture ablations

- People interested in testing the architecture on better-controlled corpora

I want to be better. Feedback is how I learn from my mistakes.

Limitations:

- 124M parameters, so it is not useful as a chat/generation model

- English-only

- no instruction tuning / RLHF / safety tuning

- public repo is still being cleaned into a better module split

- broader architectural interpretation is still being tested through ablations

- perplexity comparisons are only meaningful when validation corpus, tokenizer, context length, packing, and preprocessing are controlled
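To illustrate the last limitation: perplexity is the exponential of the average negative log-likelihood per unit, so the same total loss yields different numbers depending on what is counted. A sketch with made-up totals (the NLL and counts below are illustrative, not measurements from T³):

```python
import math

def perplexity(total_nll_nats: float, n_units: int) -> float:
    """exp(mean negative log-likelihood) over n_units tokens or words."""
    return math.exp(total_nll_nats / n_units)

# Made-up example: one text, one total NLL, two ways of counting units.
total_nll = 3300.0        # summed NLL in nats over the text (illustrative)
n_subword_tokens = 1300   # count under some subword tokenizer (illustrative)
n_words = 1000            # whitespace-delimited words (illustrative)

token_ppl = perplexity(total_nll, n_subword_tokens)
word_ppl = perplexity(total_nll, n_words)
print(f"token-level: {token_ppl:.2f}, word-level: {word_ppl:.2f}")
```

The model and text are identical in both cases. Only the denominator changes, which is why tokenizer and normalization need to match before two perplexity figures are compared.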

The project is Apache-2.0 for both code and weights.

I'm running a 358M v3.7 training run on the 5B corpus now. That should be a more capable substrate for testing, but it will probably take 12 days to finish. I'll post everything on t3atlas.dev when it's complete.

u/MirrorEthic_Anchor — 19 days ago

I've spent the last year independently developing T³, a transformer architecture that augments standard attention with a per-head ecology grounded in Clifford algebra. I wanted to get the public artifact out for feedback, since working in isolation can create blind spots I can't see.

The public artifact includes:

- 247 inference traces across 12 architectural lineages and 3 foundation-model substrates (GPT-2, Gemma3, Qwen2.5)

- Documented stable schema with versioning

- ~990 benchmark measurements with same-data baselines run through a single canonical eval harness

- Pareto frontier visualizations per task

- Tier-marked dataset distinguishing canonical results from probable / archival

Headline: T³ at 124M parameters trained on ~500M tokens shows +6 to +10pp over same-data vanilla GPT-2 124M at ~10× less compute on compositional reasoning benchmarks (HellaSwag, ARC-C, WinoGrande, BoolQ). Roughly tied on knowledge benchmarks (ARC-E, PIQA). The differential pattern is consistent with the architectural prediction.

The work sits at the intersection of geometric algebra transformers (GATr, Versor, CliffordNet), alternative attention architectures (Mamba, RWKV, xLSTM), and mechanistic interpretability infrastructure (SAEBench, Neuronpedia).

Built solo on consumer hardware (painstakingly😂). TMLR submission with co-author under review (just waiting on AE and review team for revisions).

Happy to answer questions about architecture, methodology, or the consolidation process. Did my best to make this as rigorous as I could while providing something interesting to interact with.

https://huggingface.co/mirrorethic/t3-124m-v36

https://github.com/MirrorEthic/t3-reference

https://t3atlas.dev

u/MirrorEthic_Anchor — 20 days ago