u/cyclebiff

▲ 9 r/oMLX

Seeking Optimization Advice: Qwen 3.6 27B Setup on M2 MacBook Pro

Happy Sunday, everyone! I'm relatively new to running local LLMs (about two weeks in), so I appreciate your patience with my questions. I'm eager to learn from this community's expertise.

Background

A few weeks ago, I discovered agentic coding through my work's GitHub Copilot account. After quickly exhausting my usage limits (lesson learned about token management!), I decided to explore running Qwen models locally on my personal laptop for hobby projects.

Hardware

  • M2 MacBook Pro Max 96GB

Models Tested

  • oMLX: Qwen 3.6 27B (oQ4/oQ5/oQ6/oQ8-fp16-mtp variants)
  • LM Studio/GGUF: Qwen 3.6 27B (Q4_K_M, Q6_K, Q8_K)
  • llama.cpp: Configured per this post

Use Case

I'm primarily doing C++ and ESP32/PlatformIO development for personal projects, including:

  • Real-time voice modulation for cosplay costumes
  • Real-time bark detection logger (courtesy of my neighbor's enthusiastic dog)

Current Configuration

After implementing MTP changes, I've settled on the following setup:

Model: oMLX Qwen 3.6 27B-oQ5-fp16-mtp

Settings:

  • Context: 262,144
  • Temperature: 0.6
  • Top P: 0.95
  • Top K: 20
  • Min P: 0
  • Repetition Penalty: 1
  • Presence Penalty: 0
  • Extended thinking: Enabled
  • Native MTP: Enabled
  • oMLX caching: Enabled

IDE Setup:

  • VS Code with Cline extension
  • OpenAI-compatible API from oMLX

Workflow:

  1. Enable PLAN mode in Cline
  2. Request feature implementation or bug research plan
  3. Switch to ACT mode and execute
  4. Wait lol

Current Performance

While the quality of Qwen 3.6 (Q4-Q8) is impressive, performance could be better:

  • Prompt processing: ~120 tok/s
  • Token generation: ~15 tok/s

Question

For those running similar hardware (especially M2 users), what combination of:

  • Software stack (oMLX, LM Studio, llama.cpp, etc.)
  • Specific Qwen 3.6 model variants
  • Inference settings

...have you found optimal? Any suggestions for improving prompt processing and token generation speeds on M2 hardware would be greatly appreciated!

reddit.com
u/cyclebiff — 4 days ago