Tech · 13h ago

Entropix creator _xjdr says Moonshot's Kimi K2.7-Code compares favorably to his fine-tuned K2.6 while cutting daily token usage to 592 million

The test ran inside a custom Claude Code harness

Original post

xjdr@_xjdr

2.7-Code in claude code with my custom inference harness works surprisingly well. i'm comparing it to my k2.6, which was fine tuned for this harness, as well as my recent runs of fable (at least the ones that succeeded), and it stacks up favorably. where k2.6 felt very much like opus, k2.7 is kind of its own thing (for better and for worse). it's more terse, more argumentative and overall 'smarter' (at least for the use cases i've tried it on already). Moonshot did a very very good job with the post training on this, i am very impressed with their work. While k2.7 would certainly benefit from a FT run specifically on my claude code harness and my new scm system (called ncode), i am going to continue to use it as is for a while instead of falling back to the k2.6 ft (which is about the highest endorsement i can give).

xjdr@_xjdr

Let's see how this stacks up. high hopes

10:23 AM · Jun 13, 2026 · 16.9K Views

Sentiment

Commenters praise K2.7-Code's out-of-the-box usability in a custom Claude Code inference harness, citing excellent results on coding and engineering work with no perceived drop in quality.

Positive: 100.0% · Negative: 0.0%

9 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 3K · Bookmarks: 12 · Likes: 95 · Retweets: 4 · Replies: 5

xjdr@_xjdr

here is a snapshot of a single repo, but k2.7-Code has been much more terse (much much more) without a perceived drop in quality (in fact i'd say mostly an overall increase)

same repo, same workloads, some work done in parallel for evals

2h

xjdr@_xjdr

@keennay you'd be surprised how much intelligence is in the harness and i've spent a lot of time making my claude code fork work well with the k2.* family.

13h

xjdr@_xjdr

@xlr8harder @keennay what you call out is one of the reasons i said github was the second largest risk behind anthropic / claude code

13h

xjdr@_xjdr

@teortaxesTex this is hooked into my inference system so it would be hard, but if i have time i'd love to figure out how. so far, i've been really impressed by k2.7. im probably going to abandon my k2.6 FT and focus on optimizing for 2.7

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@_xjdr Can you check Composer? I don't get why it's so much cheaper

2h

Yannick Nick@keennay

@_xjdr have you ever used OpenCode → your local Kimi inference for your workload?

13h

xlr8harder@xlr8harder

@_xjdr @keennay @secemp9 Oh I misunderstood at first. You're thinking takedown resistance?

13h

xlr8harder@xlr8harder

@_xjdr @keennay Not sure if it's the right basis for you but @secemp9 has a python port of pi that i've been working with

13h

xlr8harder@xlr8harder

@_xjdr @keennay any thoughts on open sourcing this? If it's a fork I guess you can't, but perhaps it could be ported to another framework?

13h

xjdr@_xjdr

@joseluissaorin it's hard to say because i use these models almost entirely for code and engineering so i would be unable to properly gauge how robust the pretraining is. for my purposes, it is excellent

13h

Pivi@ApplyWiseAi

@_xjdr how custom is "custom" here? like swapping model calls or is there actual orchestration around the code generation step

12h

er 𝕵𝖔𝖘𝖊𝖑𝖚@joseluissaorin

@_xjdr Do you feel it is far off in terms of pretraining? How do you feel about its "creativity" or deep knowledge and memorization?

13h

synabun.ai@SynabunAI

@_xjdr k2.7 being its own thing is the correct take. every model that tries to be opus clones itself into mediocrity. moonshot actually shipped something with a personality for once.

8h

Gregor@bygregorr

@_xjdr not sure beating a harness-specific k2.6 means 2.7-Code is genuinely stronger; it could just mean the fine-tune drifted from your current config. saw that pattern with a tool setup i optimized then updated. did your harness change much since you trained k2.6?

12h

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@_xjdr This is a pretty insane ratio

2h

xjdr@_xjdr

@bygregorr i didn't say it was stronger, simply that it worked surprisingly well out of the box and in my initial testing compared favorably to what i have been using. these models are good enough now that 'stronger' is typically reductive. most have strengths and weaknesses

12h

Bela@0xBE7A

@_xjdr @keennay Sounds pretty cool and interesting: Are you dogfooding your training stack for harness post-training or have you written or talked about what you’ve done anywhere?

Is it just SFT to get it to know tool formats or full RL?

13h

xjdr@_xjdr

@ApplyWiseAi i have both, i'm not sure what would be practical to OSS. i.e., specific k8s tooling and flows for managing gb300nvl72s is probably not the most practical to support.

12h

Casp@casperxbt

@_xjdr thank you mr oracle, this will heal the scars fable has left us

13h

Juuso Heikkinen@Juuso95

@_xjdr cool teardown. for anyone in this thread still on fable watch, leaving this here

12h