u/msrdatha

This question is for those who have tried the MTP quants of the oQ versions of models with oMLX.

Are you seeing any compromise in output quality compared to the non-MTP versions?

Sure, the extra tokens per second help, but if tool call failures or similar issues start happening, the additional tok/sec isn't really worth it, right?

We can only assess this in real usage scenarios that we have run before and are familiar with.

So, are you seeing any such degradation in quality, or do you think it's worth going with the MTP version? What are your thoughts?

Is the MTP speed boost really helping?