Request a new model: runs emotion-vector extraction on HF Jobs

Analyze traces: replays sessions through a model, emits per-turn emotion trajectories

Recent runs

🦊 slyfox: give Claude Code your team's prior context, when it matters

The pitch: maintainers re-explain the same things to every new contributor (and to every fresh AI session). Their Claude Code sessions are a treasure trove, encoding how this team thinks about specific files, which optimisation patterns they reach for, where the dragons live. slyfox surfaces that prior context inside any Claude Code session in the same repo.

What's in this Space: the harness for measuring whether that injected context actually helps an AI fix real bugs. 100 perf-flavoured tasks from huggingface/transformers, gold patches included, runnable both with and without slyfox.
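A minimal sketch of how a "with and without slyfox" comparison could be summarised, assuming each task gets one numeric LLM-judge score per condition. The function name, the task ids, and the scores below are hypothetical placeholders for illustration. They are not harness output and not the harness's actual code.

```python
from math import sqrt
from statistics import mean, stdev


def paired_lift(baseline, augmented):
    """Compare judge scores per task across the two conditions.

    Both arguments map task id -> judge score. Only tasks present in
    both runs are compared. Returns (per-task deltas, mean lift,
    standard error of the mean lift).
    """
    common = sorted(set(baseline) & set(augmented))
    deltas = {task: augmented[task] - baseline[task] for task in common}
    values = list(deltas.values())
    mean_lift = mean(values)
    # The standard error indicates whether the mean lift stands out
    # from run-to-run noise; it is undefined for a single task.
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
    return deltas, mean_lift, sem


# Placeholder scores, invented for this example only.
baseline = {"task-a": 4, "task-b": 6, "task-c": 5, "task-d": 7}
augmented = {"task-a": 7, "task-b": 5, "task-c": 5, "task-d": 8}

deltas, lift, sem = paired_lift(baseline, augmented)
print(deltas, lift, round(sem, 3))
```

Pairing by task keeps task difficulty out of the comparison, so the only thing that varies within a pair is whether context was injected.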

Three companion pieces, plus this harness (one repo: github.com/huggingface/slyfox):

  • plugin/: the Claude Code plugin that does the injection (UserPromptSubmit hook + retrieval).
  • tools/hf-upload-traces/: uploads filtered local Claude Code traces to a HF dataset (privacy scrubbed).
  • jobs_scripts/: HF Jobs for extracting emotion-direction vectors per model and projecting traces through them (the "persona" layer, owned by @AidanZach).
  • harness/: what you're looking at: 100-task SWE bench + LLM-judge.
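For orientation, here is a rough sketch of what a UserPromptSubmit hook that injects context might look like. It assumes the hook receives a JSON payload with a `prompt` field on stdin and returns context through `hookSpecificOutput.additionalContext` on stdout, which is my reading of the Claude Code hooks documentation and should be checked against it. `retrieve` is a hypothetical stand-in, not the plugin's real retrieval code.

```python
import json


def retrieve(prompt: str) -> list[str]:
    """Hypothetical stand-in for retrieval; returns prior-turn excerpts."""
    return [f"prior note relevant to: {prompt[:40]}"]


def handle(raw: str, retriever=retrieve) -> str:
    """Map the hook's stdin payload to its stdout JSON without raising."""
    try:
        prompt = json.loads(raw).get("prompt", "")
        excerpts = retriever(prompt) if prompt else []
    except Exception:
        # Fail-open: a broken payload or retriever must not block the prompt.
        excerpts = []
    if not excerpts:
        return "{}"  # assumed to be a no-op response
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "UserPromptSubmit",
                "additionalContext": "\n\n".join(excerpts),
            }
        }
    )


# In a real hook script this would be wired up as:
#     import sys; print(handle(sys.stdin.read()))
print(handle(json.dumps({"prompt": "speed up attention in modeling_llama.py"})))
```

Keeping the logic in a pure function of the payload makes the fail-open path easy to exercise without running Claude Code.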

The diagram below walks the full flow end-to-end and highlights the buddy and mirror ideas. Both pipelines require open weights; closed-API tokens-only models can't be substituted at any of the boxes tagged open.

🦊 slyfox: buddy, mirror, quests

Three modes of one plugin. All three need open weights, and all three are reachable from the plugin AND from the Space tab bar.

1 · SOURCE: Claude Code sessions
  • ~/.claude/projects/
  • turns · tools · files · timestamps

2 · PUBLISH: hf upload-traces
  • Filter secrets + PII + paths. Push to HF dataset.

3 · HUB: HF-slyfox/example-traces
  • 52 sessions · 40k turns
  • arthurs-beast/transformers/*

3a · EMBED JOB (open weights) [OPEN ✓]: Qwen3-Embedding-0.6B → FAISS
  • Chunk each session → embed locally on the job runner → write chunks.parquet + faiss.index → HF-slyfox/example-traces-embeddings
  • 8,908 chunks · 1024-dim · 3,848 are yours

3b · PERSONA JOB (open weights) [OPEN ✓]: Aidan's emotion-scope pipeline
  • extract_vectors.py: load open model, layer-sweep residual stream, save 20 emotion vectors.
  • analyze_traces.py: replay each session through the same model → per-turn projection → parquet.
  • Needs residual-stream access at chosen layers. Closed APIs return tokens only, so this is impossible with them. [CLOSED API: ✗]

4 · RUNTIME: slyfox plugin (UserPromptSubmit hook · install.py)
Runs inside Claude Code on your machine on every user prompt:
  a · embed the prompt (same model as index)
  b · FAISS top-K → candidate prior turns
  c · persona rerank (project prompt + match trajectory shape against candidates)
  d · inject excerpts via additionalContext
All three modes accessible from:
  • /slyfox-expert /-status /-refresh (slash commands inside Claude Code)
  • tab bar at HF-slyfox/slyfox (Buddy · Mirror · Quests tabs)
≤1500 token budget · 1400ms timeout · fail-open
Inputs: chunks.parquet + faiss.index · trajectory.parquet

🫂 Buddy: "Retrieve from someone who has already worked here." You / colleague / the whole org's pooled sessions. Uses 3a (semantic) + 3b (persona rerank).

🪞 Mirror: "Project everything through the same open model: index, query, persona. Two different X's = two activation spaces." Same-model invariant, enforced by 3b.

🎯 Quests: Persona trajectories recommend a next task that fits your current emotional arc. A Quest = {issue, base_commit, target_files, judge_rubric}, the same schema as the harness. Driven by 3b (emotion trajectories). Open inside the plugin OR from the tab bar.

5 · OUTPUT: Claude Code session, augmented
Your prompt + fenced trace excerpts ("from buddy: ArthurZ · file-expert: …") + persona-aware Quest suggestion ("decisive → ready to land") + Mirror guarantee (every projection through the same open model).
harness/run_eval.py measures the LLM-judge lift on 50 perf bugs in transformers. Result so far: signal exists per task (PR #44231 +3, #35184 +5, …) but sits inside generation noise at n=50.

⚠ Why open weights: non-negotiable
  • Residual-stream access. Persona extraction reads layer-L activations; tokens-only APIs can't do this.
  • Mirror (same model both sides). Closed APIs silently swap model versions across calls; reproducibility = open.
  • Live per-turn projection. Persona scoring runs as you type, so it must be a model you can hot-load.
Buddy works *better* with open weights too: index, query, AND rerank in the same activation space. Closed APIs degrade Buddy to text-only retrieval and disable Mirror + Quests entirely.

huggingface/slyfox · github.com/huggingface/slyfox · all three modes reachable from the plugin + the Space tab bar
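Runtime steps b to d can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's implementation. It uses a brute-force cosine search in place of FAISS. The 50/50 blend weight, the four-characters-per-token estimate, the chunk dictionary layout, and every vector below are hypothetical. Only the 1500-token budget comes from the diagram, and the 1400ms timeout is not modelled here.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = sqrt(dot(a, a)), sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def project(activation, emotion_vectors):
    """Persona projection: one dot product per emotion direction."""
    return [dot(activation, v) for v in emotion_vectors]


def top_k(query_emb, chunks, k):
    """Step b: nearest chunks by embedding similarity (FAISS stand-in)."""
    return sorted(chunks, key=lambda c: cosine(query_emb, c["emb"]), reverse=True)[:k]


def rerank(query_emb, query_persona, candidates, weight=0.5):
    """Step c: blend semantic and persona similarity (weight is assumed)."""
    def score(c):
        semantic = cosine(query_emb, c["emb"])
        persona = cosine(query_persona, c["persona"])
        return (1 - weight) * semantic + weight * persona
    return sorted(candidates, key=score, reverse=True)


def pack(candidates, budget_tokens=1500):
    """Step d: keep excerpts in rank order until the budget is spent."""
    kept, used = [], 0
    for c in candidates:
        cost = len(c["text"]) // 4 + 1  # crude token estimate (assumption)
        if used + cost > budget_tokens:
            break
        kept.append(c["text"])
        used += cost
    return kept


# Toy data, invented for illustration.
chunks = [
    {"text": "cache the rotary embeddings", "emb": [1.0, 0.0], "persona": [0.0, 1.0]},
    {"text": "avoid .item() in the hot loop", "emb": [0.9, 0.1], "persona": [1.0, 0.0]},
    {"text": "update the changelog", "emb": [0.0, 1.0], "persona": [1.0, 0.0]},
]
query_emb = [1.0, 0.0]
query_persona = project([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

candidates = top_k(query_emb, chunks, k=2)
ranked = rerank(query_emb, query_persona, candidates)
kept = pack(ranked)
print(kept)
```

In the toy data the persona term lifts the second chunk above the first even though the first is the closer embedding match, which is the kind of reordering a rerank stage exists to produce. The scores are only comparable if index, query, and persona vectors all come from the same model, as the Mirror invariant requires.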

Open the full harness Space → for the plugin / persona pipeline / results tabs.