u/trainermade

Context window issue through Anthropic-compatible endpoint - is the practical limit lower than 204.8k?

I’m using MiniMax-M2.7-highspeed through an Anthropic-compatible endpoint inside an agent framework called Hermes.

MiniMax appears to advertise a large context window, around 204,800 tokens, but I’m seeing API failures well below that.

The error looks like this:

⚠️  API call failed (attempt 1/3): BadRequestError [HTTP 400]
   🔌 Provider: minimax  Model: MiniMax-M2.7-highspeed
   🌐 Endpoint: https://api.minimax.io/anthropic
   📝 Error: HTTP 400: invalid params, context window exceeds limit (2013)
   📋 Details: {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'invalid params, context window exceeds limit (2013)'}, 'request_id': '065b1bb0eab63a4b21e50cb78514'}
   ⏱️  Elapsed: 2.61s  Context: 106 msgs, ~134,146 tokens
Provider reported overflow amount only; keeping context_length at 204,800 tokens and compressing.
⚠️  Context length exceeded at minimum tier — attempting compression...
🗜️ Context too large (~134,146 tokens) — compressing (1/3)...
🗜️ Compacting context — summarizing earlier conversation so I can continue...
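For scale, the rejected request was estimated at about 134,146 tokens, roughly 65.5% of the advertised 204,800-token window (134,146 / 204,800). The log also shows the framework keeping `context_length` at 204,800 and compressing the history instead of lowering its limit. Below is a rough, hypothetical sketch of that retry-then-compress behavior. It is not Hermes's actual code: `ContextOverflowError`, `compress`, `call_with_compaction`, and `keep_recent` are illustrative names, and the "summary" is a placeholder string rather than a real model-generated summary.

```python
from typing import Callable, List


class ContextOverflowError(Exception):
    """Hypothetical stand-in for the HTTP 400 'context window exceeds limit' error."""


def compress(messages: List[str], keep_recent: int = 4) -> List[str]:
    """Collapse everything except the last `keep_recent` messages into one placeholder.

    A real framework would ask the model to summarize the older messages;
    this sketch only records how many messages were folded away.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [f"[summary of {len(older)} earlier messages]"] + recent


def call_with_compaction(
    messages: List[str],
    call_model: Callable[[List[str]], str],
    max_attempts: int = 3,
) -> str:
    """Call the model; on a context overflow, compress the history and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(messages)
        except ContextOverflowError:
            if attempt == max_attempts:
                raise  # out of retries: surface the provider error
            messages = compress(messages)
    raise AssertionError("unreachable")


# Usage with a fake model that rejects anything longer than 5 messages.
def fake_model(msgs: List[str]) -> str:
    if len(msgs) > 5:
        raise ContextOverflowError("invalid params, context window exceeds limit (2013)")
    return f"ok with {len(msgs)} messages"


history = [f"msg {i}" for i in range(106)]
print(call_with_compaction(history, fake_model))  # prints "ok with 5 messages"
```

Note that this loop trusts its own token estimate; if the provider counts tokens differently, the request can still be rejected below the advertised limit.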

My questions:

  1. What does the number in this error mean?
reddit.com
u/trainermade — 3 days ago

Constant compaction - why?

Why does compaction kick in at around 106k tokens when my max is 204k tokens? I've started seeing this a lot recently.
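One common pattern in agent frameworks is to trigger compaction once the estimated prompt crosses a fixed fraction of the configured context length. Whether Hermes does this, and with what fraction, is an assumption here. 106k of 204.8k is about 52%, so a threshold near 0.5 would fire at roughly that point. A minimal sketch, where `should_compact` and the 0.5 default are hypothetical:

```python
def should_compact(estimated_tokens: int, context_length: int, threshold: float = 0.5) -> bool:
    """Return True once the estimated prompt size reaches threshold * context_length.

    The 0.5 default is a hypothetical value chosen for illustration,
    not a documented Hermes setting.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return estimated_tokens >= threshold * context_length


CONTEXT_LENGTH = 204_800
for tokens in (90_000, 106_000, 150_000):
    share = tokens / CONTEXT_LENGTH
    print(f"{tokens:>7,} tokens = {share:.1%} of window -> compact: {should_compact(tokens, CONTEXT_LENGTH)}")
```

If a framework works this way, raising the threshold delays compaction but also leaves less headroom before the provider's own limit.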

u/trainermade — 3 days ago

I am looking to move my Hermes VPS implementation from a reactive chat tool to a persistent agent similar to "Felix" from the Nat Eliason interview: https://www.youtube.com/watch?v=nSBKCZQkmYw
I have a few questions on the mechanics of proactivity:

  1. The Wake-Up Trigger: How does the agent theoretically "wake itself up" to perform unprompted actions? For example, if a bug is reported or a new feature is requested via a ticket, do you use an MCP to trigger an interrupt, or a persistent heartbeat loop that polls for new tasks?

  2. Autonomous Coding: For a bot that builds its own features, how are you handling the handoff to worker sessions? I want the agent to identify a bug, spawn a persistent worker to fix it, and notify me only when the PR is ready. How do you prevent the OS from killing these long sessions?

  3. Memory Consolidation: For 24/7 operation, how do you handle nightly context cleaning? I want the agent to extract "nuggets" from the day into a knowledge graph without overwriting core system rules or hallucinating data.
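For question 3, one simple way to keep a nightly consolidation pass from overwriting rules or inventing facts is to make it append-only, refuse writes to protected subjects, and accept a nugget only if the extractor can quote supporting text from the day's log. This is a minimal sketch under those assumptions: the extraction step itself is not shown, and `Nugget`, `KnowledgeGraph`, `protected_subjects`, and the triple-set storage are all hypothetical. Verbatim substring matching is a deliberately strict stand-in for real grounding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Nugget:
    subject: str
    relation: str
    obj: str
    evidence: str  # quote the extractor claims supports this fact


@dataclass
class KnowledgeGraph:
    protected_subjects: Set[str]  # e.g. core system rules that must never change
    triples: Set[Triple] = field(default_factory=set)

    def consolidate(self, nuggets: List[Nugget], day_log: str) -> Dict[str, List[Nugget]]:
        """Add grounded, unprotected nuggets; report everything that was rejected."""
        report: Dict[str, List[Nugget]] = {
            "added": [],
            "rejected_protected": [],
            "rejected_ungrounded": [],
        }
        for n in nuggets:
            if n.subject in self.protected_subjects:
                report["rejected_protected"].append(n)
            elif not n.evidence or n.evidence not in day_log:
                # Empty evidence would trivially "match", so it is rejected too.
                report["rejected_ungrounded"].append(n)
            else:
                # Set insertion never replaces an existing triple (append-only).
                self.triples.add((n.subject, n.relation, n.obj))
                report["added"].append(n)
        return report


log = "User said the staging server moved to Frankfurt. Deploys run nightly."
kg = KnowledgeGraph(protected_subjects={"system_rules"})
report = kg.consolidate(
    [
        Nugget("staging_server", "located_in", "Frankfurt", "staging server moved to Frankfurt"),
        Nugget("system_rules", "allow", "force_push", "force push is fine"),
        Nugget("deploys", "run_on", "weekends", "Deploys run on weekends"),
    ],
    log,
)
print({k: len(v) for k, v in report.items()})
```

The rejected lists give you something to review in the morning instead of silently dropping or accepting questionable facts.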

If you have structured a "Heartbeat" skill or a specific config for proactive workflows, I would love to see how you are bridging that gap.
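The "persistent heartbeat loop that polls for new tasks" option from question 1, combined with the worker handoff from question 2, can be sketched roughly as follows. This is a POSIX-oriented sketch, not a Hermes skill: `poll` stands in for whatever ticket source you use, `notify` for your messaging channel, and `spawn_worker` and `heartbeat` are hypothetical helpers. `start_new_session=True` makes each worker call `setsid()`, so it is not tied to the terminal or SSH session that launched it. It does not protect against the OOM killer or a service manager stopping the whole unit, so the loop itself would still need to run under a supervisor.

```python
import subprocess
import sys
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


def spawn_worker(task_id: str) -> subprocess.Popen:
    """Start a worker process for one task, detached into its own session.

    The command is a placeholder; a real worker would run a coding agent
    on the task and open a PR before exiting.
    """
    cmd = [sys.executable, "-c", "import sys; print('working on', sys.argv[1])", task_id]
    return subprocess.Popen(
        cmd,
        start_new_session=True,  # setsid() in the child (POSIX only)
        stdout=subprocess.DEVNULL,  # a real worker would log to a file instead
        stderr=subprocess.DEVNULL,
    )


def heartbeat(
    poll: Callable[[], List[str]],
    notify: Callable[[str, int], None],
    interval_s: float = 60.0,
    max_beats: Optional[int] = None,
) -> Tuple[Dict[str, subprocess.Popen], Set[str]]:
    """Wake up every `interval_s` seconds, start workers for new tasks, report finished ones."""
    workers: Dict[str, subprocess.Popen] = {}
    finished: Set[str] = set()
    beats = 0
    while max_beats is None or beats < max_beats:
        # 1. Wake up and ask the task source for work; skip tasks already started.
        for task_id in poll():
            if task_id not in workers:
                workers[task_id] = spawn_worker(task_id)
        # 2. Notify exactly once per worker that has exited (poll() sets returncode).
        for task_id, proc in workers.items():
            if task_id not in finished and proc.poll() is not None:
                finished.add(task_id)
                notify(task_id, proc.returncode)
        beats += 1
        time.sleep(interval_s)
    return workers, finished
```

Polling trades a little latency (up to one `interval_s`) for simplicity; an interrupt-style trigger would replace `poll` with a push from the ticket system but keep the same spawn-and-notify steps.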

u/trainermade — 20 days ago