gh-profiler is a Python CLI tool that examines a GitHub user's account age, profile fields, and recent PR/issue activity to give maintainers quick context on how much to invest in reviewing contributions, runnable via uvx, pip install, or as a generated GitHub Action workflow. It applies a focused set of heuristics around activity volume and patterns rather than simply wrapping the GitHub API like existing clients. The tool targets the narrow but recurring problem of contribution spam in open source,
PyPI Stats is a Flask web application using plotly.js that renders aggregate download analytics for PyPI packages, exposes a JSON API, and adds optional GitHub OAuth for maintainer-specific views. The approach is a conventional dashboard wrapper around existing PyPI download logs and closely resembles prior stats sites in other language ecosystems with only incremental UI and API polish. Its scope remains confined to Python package maintainers who already have access to official PyPI metrics, so
Cohesion is a Python command-line tool and flake8 plugin that statically analyzes classes to compute a cohesion score based on how instance and class variables are used across their methods. The approach is an incremental implementation of a decades-old software-engineering metric rather than a new technique or problem framing. Its narrow focus on one optional OOP quality signal limits it to a small subset of Python teams that already invest in custom linting, with little path to becoming a core
LiteReality is a Python pipeline that reconstructs graphics-ready 3D scenes with PBR materials and GLB exports from RGB-D scans captured on LiDAR iPhones, chaining GroundingDINO, Qwen-VL, SAM, DinoV2 and Blender for scene-graph parsing, object retrieval from a 200 GB material database, and final rendering. It introduces a new end-to-end framing that treats material assignment and production asset export as first-class goals rather than post-processing steps, distinguishing it from prior mesh- or
nano.py is a single-file, under-200-line, zero-dependency Python coding agent that feeds repo context and discovered skill files to an OpenAI model, executes approved shell commands in a 200-step loop, and supports one-shot CLI, interactive REPL, and session resume modes. It is not a new agent architecture but instead demonstrates that the core read-run-observe loop has become trivial by distilling it to pure stdlib while preserving human approvals and platform-aware prompting. The resulting aud
Page Agent is a TypeScript library that injects a GUI agent directly into any web page as client-side JavaScript, letting developers issue natural-language commands that manipulate the DOM via text-based element descriptions without screenshots or external runtimes. It delivers a meaningful incremental advance over prior work such as browser-use by shifting the entire agent loop into the browser and removing the need for multi-modal models or browser extensions for single-page tasks. The same in
gym-pusht is a Python Gymnasium environment that implements the PushT robotics task, letting a circular agent push a T-shaped block into a goal zone under state, keypoint, or pixel observations and rendering modes. It directly ports the benchmark introduced by the Diffusion Policy paper with only packaging and Gymnasium compatibility changes rather than any new technique. The environment targets a narrow slice of imitation-learning and diffusion-policy researchers working on planar pushing and,
Codiff is a TypeScript Electron desktop app that renders a minimal native window for viewing staged and unstaged Git diffs from any local repository, with inline comments and an optional Codex walkthrough mode invoked via the -w flag. It adds only an incremental LLM ordering layer on top of long-established diff viewers such as git-diff, Delta, or VS Code’s built-in compare view. The project addresses a common but already well-served developer workflow and therefore has limited structural reach,
MiniHack is a Python sandbox framework built on the NetHack Learning Environment (NLE) that lets users rapidly design custom Gymnasium-compatible RL environments via human-readable probabilistic des-files, a browser drag-and-drop level editor, and optional language wrappers. It extends NLE by turning NetHack's rich mechanics into an easily programmable, scalable testbed for open-ended and compositional RL rather than providing another fixed benchmark suite. The underlying idea addresses a real,
Jazz is a local-first relational database written in Rust with WASM and NAPI bindings that runs embedded in browsers, React Native, and Node backends while syncing partial tables, streams and files through a global cloud. It advances the local-first space by making a full relational model with CRDT-style sync behave like ordinary reactive application state rather than requiring separate offline queues or conflict-resolution code. The same primitive solves a recurring pain point for any team that
This repository contains the bilingual blog post source, rendered HTML, and Python reproduction artifacts for 'Learning Beyond Gradients', featuring policy scripts, trial summaries, figures, and videos for Atari (Pong, Breakout, Montezuma), MuJoCo (Ant, HalfCheetah), and VizDoom environments built on EnvPool. It frames and demonstrates non-gradient exploration and heuristic search techniques in reinforcement learning that depart from conventional gradient-based policy optimization. The narrow, RL
goal is a bash/MCP system porting Codex CLI goal tracking to Claude Code, Cursor, and OpenCode via JSON state files, hook scripts for auto-continuation, turn budgets, and MCP tools such as create_goal and get_goal. It delivers a faithful reimplementation of the established Codex architecture rather than a new technique, adding only editor-specific adapters like Claude stop hooks and a Python stdio MCP server on top of the original state model and lifecycle. Persistent objectives with automatic,
Dune is an Elixir library that provides a sandbox for safely evaluating untrusted code via allowlists, isolated processes, execution limits, atom mapping to prevent leaks, and simulated modules implemented with maps of anonymous functions. It improves on prior sandboxing work such as Luerl by adding BEAM-specific protections for atoms and module definitions that would otherwise leak memory or state globally. The underlying problem of safely running user-supplied Elixir is relevant mainly to a narrow
Phoenix is a web framework for the Elixir language that supplies routing, channels, LiveView for server-rendered real-time UIs, and an integrated asset pipeline for building scalable applications from prototype to production. Its LiveView technique of pushing minimal diffs over persistent connections offers a distinct server-centric alternative to traditional client-side JavaScript frameworks. While the framework addresses concurrency and reliability needs shared by many web teams, its tight tie
This is the source code for the Agentic Learning AI Lab website at agenticlearning.ai, built with HTML, Tailwind CSS, Handlebars templates, and Node.js build scripts that fetch arXiv papers, generate thumbnails, and produce a searchable JSON index. The implementation applies conventional static-site automation and paper-onboarding tooling already common among academic groups rather than introducing new techniques. As a single-lab promotional site rather than a reusable framework or broadly-appli
This project is a Python Streamlit dashboard that simulates long-term buy-and-invest versus rent-and-invest scenarios for households in Vaud, Switzerland, incorporating detailed federal, cantonal, and commune-level tax calculations along with stochastic inflation and salary paths. It represents an incremental specialization of standard buy-versus-rent financial models by embedding official 2026 Vaud tax tables, separate impôt foncier modeling, and household filing-status logic rather than a new,
image-blaster is a TypeScript Claude skillset that turns a single image into 3D meshes (.glb/.obj), Gaussian splats (.spz), and object-specific SFX (.mp3) by orchestrating World Labs Marble, Hunyuan-3D via FAL, and ElevenLabs models, then embedding the results in Unity, Unreal, Godot, Blender, or Three.js. It offers only incremental novelty by wrapping existing third-party generative services into a guided, step-by-step Claude workflow rather than introducing new algorithms or model training. A
STARFlow is the official Python release of Apple's transformer autoregressive flow models (3B-param STARFlow for 256x256 text-to-image and 7B-param STARFlow-V for up to 480p text-to-video), including FSDP training scripts, Jacobi-accelerated sampling, T5 conditioning, and VAE integration. It presents a new deep-shallow transformer architecture that fuses autoregressive token prediction with continuous normalizing flows rather than discrete diffusion or standard AR transformers. The underlying AR
Articraft is a Python-based agentic system that leverages LLMs to generate articulated 3D assets through automated code generation of model.py files defining semantic parts, geometry, and physical joints, aimed at large-scale dataset production. It introduces a novel programmatic workflow that bypasses manual 3D tools by treating asset creation as LLM-driven code synthesis with inspection and execution steps, extending beyond prior text-to-3D methods focused on static meshes. This targets a core
OpenAI Parameter Golf is a Python-based competition to train the smallest language model that fits inside a 16 MB artifact while training in under ten minutes on 8×H100 GPUs, scored by bits-per-byte on FineWeb validation. The challenge reframes neural scaling as explicit L(N) optimization under a hard parameter budget and surfaces new techniques such as depth recurrence, aggressive parameter tying, test-time training, and mixed low-precision quantization schemes that existing NanoGPT-style speed
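For reference, bits-per-byte is conventionally the total negative log-likelihood of the text, converted from nats to bits, divided by the number of UTF-8 bytes. The sketch below shows that standard definition; the function name and the assumption that per-token losses arrive in nats are illustrative, and the competition's own evaluation script may differ in detail.

```python
import math


def bits_per_byte(token_nll_nats, text: str) -> float:
    """Hypothetical helper: total NLL in bits divided by the UTF-8 byte length of `text`."""
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    n_bytes = len(text.encode("utf-8"))
    return total_bits / n_bytes
```

Normalizing by bytes rather than tokens makes scores comparable across models that use different tokenizers.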
Exemplar Partitioning is a Python library that builds training-free, streaming Voronoi dictionaries over centered unit-norm LLM activations via leader clustering at a cosine-distance threshold, producing directly comparable feature partitions across layers and checkpoints with no learned weights. It introduces a genuinely novel unsupervised construction that replaces the optimization step of sparse autoencoders with geometry-driven exemplar anchoring, enabling 1000x lower token budgets while, in
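The leader-clustering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the centering mean is supplied by the caller, each vector joins the nearest existing exemplar within the cosine-distance threshold (a Voronoi-style assignment), and otherwise becomes a new exemplar. The names `leader_partition` and `normalize` are hypothetical and are not the library's API.

```python
import math


def normalize(v, mean):
    """Center a vector by `mean` and scale it to unit norm."""
    centered = [x - m for x, m in zip(v, mean)]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("cannot normalize a vector equal to the mean")
    return [c / norm for c in centered]


def leader_partition(stream, mean, threshold):
    """Single-pass clustering: no learned weights, only stored exemplars."""
    exemplars, assignments = [], []
    for v in stream:
        u = normalize(v, mean)
        best, best_dist = None, threshold
        for idx, e in enumerate(exemplars):
            # Cosine distance between unit vectors is 1 minus their dot product.
            dist = 1.0 - sum(a * b for a, b in zip(u, e))
            if dist <= best_dist:
                best, best_dist = idx, dist
        if best is None:
            # No exemplar is close enough, so this vector anchors a new cell.
            exemplars.append(u)
            best = len(exemplars) - 1
        assignments.append(best)
    return exemplars, assignments
```

Because exemplars are only ever appended, the pass works on streaming input; the threshold alone controls how many cells the partition ends up with.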
open-slide is a TypeScript React framework that lets coding agents generate slide decks as arbitrary React components on a fixed 1920×1080 canvas, complete with Vite-based dev server, presenter mode, inspector for agent-applied comments, and static HTML/PDF export. Its novelty lies in the agent-native workflow with built-in skills for end-to-end deck creation and an iterative comment-to-edit loop that existing slide libraries lack. This approach could see adoption among developers and teams that
configurator is a Python repository of pre-configured tooling that wires prek/pre-commit hooks to black, prettier, toml-sort, pytest, yamllint, codespell and similar formatters/linters for consistent research-to-production workflows. It aggregates and lightly customizes long-established developer tools rather than introducing new techniques or problem framings. The underlying idea of a single strict-yet-fluent toolchain is relevant to many teams yet remains a per-repo configuration pattern with,
Pi is a TypeScript monorepo providing an AI agent harness that includes an interactive coding-agent CLI, core agent runtime with tool calling and state management, a unified multi-provider LLM API, plus TUI and web-UI libraries. It adds self-extensibility and an OSS session-sharing workflow on top of the now-standard pattern pioneered by projects such as Aider and the Vercel AI SDK, yielding only incremental differentiation rather than a new technique or problem framing. The underlying idea of a
Effect is a TypeScript monorepo whose core package supplies a functional effect system for typed side-effect management, concurrency, and error handling, extended by packages for AI provider integrations, SQL drivers, OpenTelemetry, RPC, CLI, and platform runtimes targeting Node, Bun, browser, and edge. It incrementally advances the established effect-system pattern already seen in ZIO and fp-ts by delivering a single cohesive, production-oriented TypeScript implementation with first-class plugg
This project is an Obsidian vault template containing a minimal set of core plugins, appearance snippets, and atomic note conventions for structuring academic research around knowledge graphs. It represents only an incremental refinement of existing Obsidian academic setups by reducing plugin overhead rather than introducing any new technique or problem framing. The audience is limited to researchers already committed to Obsidian, creating a structural ceiling far below mass adoption.
Jujutsu (jj) is a Git-compatible version control system written in Rust that uses Git repositories only as a storage backend while storing branches and higher-level metadata separately and exposing a simplified model with automatic working-copy snapshots. It introduces a genuinely new approach by treating the working copy itself as a commit, maintaining a full operation log for undo, and making conflicts first-class objects that propagate automatically on rebase, synthesizing ideas from Git, hg,
Obsidian Spaced Repetition is a TypeScript plugin that turns Markdown notes into decks via #flashcards and #review tags, supports single-line, multi-line, bidirectional, and cloze card formats with rich media and LaTeX, and schedules both flashcard and whole-note reviews using a standard spaced-repetition algorithm. The project applies well-known SRS mechanics (Anki-style) with tight Obsidian-specific integration rather than introducing new algorithms or problem framings. Its ceiling is bounded:
A comparative technical analysis presented as an interactive GitHub Pages HTML report and PDF covering PyTorch hardware acceleration options for NVIDIA CUDA, AMD ROCm, Google TPU/XLA, and Apple Silicon MPS in 2025. The project synthesizes publicly available benchmarks and vendor data into an overview report without introducing new techniques or frameworks, closely resembling established industry analyses such as MLPerf summaries or annual hardware surveys. Its scope remains confined to ML infra,
Valibot is a zero-dependency TypeScript library that defines executable runtime schemas for structural data validation using many small, independent functions such as v.object, v.string, v.pipe, v.email and v.minLength, delivering static type inference plus parse/safeParse/is APIs. Its novel design replaces the conventional monolithic API with a per-action modular structure explicitly engineered for bundler tree-shaking, yielding sub-kilobyte bundles and easier extension. The technique solves a
ExploitBench is a Python framework that benchmarks AI agents on real-world vulnerability exploitation by orchestrating containerized V8 environments and OpenAI-compatible model APIs to track progress across a 16-step capability ladder from reaching vulnerable code to arbitrary execution. It introduces a novel granular ladder methodology for measuring agentic browser exploitation that extends beyond generic CTF or security benchmarks. Its specialized focus on V8 bugs and advanced agent evaluation
This repository is a collaborative Markdown guide plus templates that walks through integrating Obsidian with Zotero (via the Zotero Integration plugin) or a generic .bib file (via obsidian-citations), Pandoc for citations and export, and various LaTeX/Pandoc tweaks for equations, bibliographies, and PDF output. It collects already-public plugin configurations and workflows rather than inventing any new technique, essentially documenting the same Zotero–Obsidian–Pandoc stack that has existed for
RandOpt is the official Python codebase and training scripts for the Neural Thickets paper, implementing post-training of transformers via LoRA, black-box optimization, and neuroevolution to locate diverse task-specific experts around pretrained weights, with support for custom datasets, multi-node runs, and distillation. The core novelty is the empirical demonstration and optimization procedure showing that high-performing, functionally distinct experts form a dense thicket in the immediate low
The reproducible-trajectories project is a Python package that supplies CLI commands and git hooks to capture, filter, extract, and verify structured trajectories produced by AI coding agents such as Claude Code. It introduces a verification technique that replays Write/Edit operations from a trajectory against the parent commit state to determine whether an agent-generated commit is reproducible. The tooling addresses a specialized need for agent observability and reproducibility within the AI4
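The replay check described above can be pictured as follows. This is a minimal sketch that models repository state as a dictionary from path to file content; the operation schema (`tool`, `path`, `content`, `old`, `new`) and both function names are assumptions for illustration, not the package's actual trajectory format or API.

```python
def replay(parent_files, operations):
    """Apply hypothetical Write/Edit operations to a copy of the parent commit state."""
    files = dict(parent_files)
    for op in operations:
        if op["tool"] == "Write":
            # Write replaces (or creates) the whole file.
            files[op["path"]] = op["content"]
        elif op["tool"] == "Edit":
            current = files[op["path"]]  # KeyError if the file does not exist
            if op["old"] not in current:
                raise ValueError(f"edit target not found in {op['path']}")
            # Edit replaces the first occurrence of the old snippet.
            files[op["path"]] = current.replace(op["old"], op["new"], 1)
    return files


def is_reproducible(parent_files, operations, commit_files):
    """True when replaying the trajectory on the parent state yields the commit state."""
    try:
        return replay(parent_files, operations) == commit_files
    except (KeyError, ValueError):
        return False
```

A mismatch, or an edit that cannot be applied, suggests the commit contains changes the recorded trajectory does not account for.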
agent-trace is a Python CLI, MCP proxy, and VS Code extension that captures every prompt, tool call, file operation, and response from Claude Code, Cursor, Gemini CLI, or any MCP client, then supports replay, diff, stats, export to Datadog/Honeycomb/OTLP, and rule-based pausing. It applies the classic strace model to agent sessions rather than system calls, adding causal tracing, subagent rollups, and editor-integrated live views that existing LLM-only tracers lack. The approach solves a near-un
atproto-agent-network implements a decentralized agent communication and memory network on Cloudflare edge primitives, mapping AT Protocol DIDs, relays, and firehoses onto Durable Objects, D1, R2, Vectorize, and Queues while running Pi agents with encrypted per-agent state. It introduces a novel framing that treats each agent as a first-class federated identity with typed message passing and selective knowledge sharing, extending ATProto lexicons and MST-style repos beyond social data into multi
Futuresim is a Python multi-agent simulator that runs LLM agents on free-form forecasting questions drawn from datasets like OpenForesight, optionally retrieves via LanceDB, and scores predictions inside a time-stepped environment using scripts and YAML configs. It extends existing LLM agent patterns with a specialized forecasting harness, answer-matching cache, and leakage-safe retrieval rather than introducing a fundamentally new technique. The project addresses a narrow need for reproducible,
This repository is an awesome list curating TypeScript/JavaScript extensions, hooks, tools, and skills for the pi-mono coding agent, a JavaScript-based LLM-driven development assistant. It follows the conventional structure of awesome lists without introducing novel curation techniques or frameworks beyond aggregating community contributions for a single agent. The extensibility model for AI coding agents addresses a widespread need among developers working with agentic tools, potentially scaling
goldmark is a CommonMark 0.31.2-compliant Markdown parser written in pure Go that exposes an interface-based AST for custom block/inline parsers, paragraph and AST transformers, and renderers. Its design delivers a meaningful architectural improvement over prior Go libraries by prioritizing external extensibility and full spec compliance rather than internal struct-based implementations. Any Go project that renders or transforms Markdown—from static-site generators to documentation platforms—can
EnterpriseRAG-Bench is a dataset and benchmark of 500k synthetic enterprise documents drawn from Slack, Gmail, Linear and similar sources plus 500 categorized questions for RAG evaluation on company-internal knowledge, together with code to generate equivalent corpora for arbitrary organizations. It introduces the first public benchmark built entirely around realistic internal data via a generation pipeline that enforces cross-document coherence, realistic volume distributions, injected noise, j
sqlc is a Go-based SQL compiler that parses queries and generates type-safe code and interfaces in Go, Kotlin, Python, or TypeScript. The compiler approach of statically analyzing real SQL to emit language-native structs and methods is a meaningful incremental improvement over raw query builders or traditional ORMs. The underlying pattern solves a pervasive need for safe database access without heavy runtime frameworks and therefore has a realistic path to becoming a standard tool across backend
.genome/1.0 is an open specification and Python reference implementation for a consumer genome bundle format that uses Parquet, JSON manifests, typed columns and mandatory effect-allele binding so that general-purpose LLM agents can read and query personal genomic data without external tooling or domain-specific parsers. The approach is novel in reframing genomic representation around agent-native semantics rather than sequencing-pipeline interchange, directly tackling enumerated hallucination,
This repository is a raw archive of autonomous LLM agent experiments (Claude Code and Codex) competing on the modded-nanogpt track_3_optimization benchmark to reach 3.28 validation loss in minimal steps using only optimizer, schedule, and init changes, including harnesses, plans, ~10k run logs, and generated variants across three waves. The approach applies agentic planning and novelty-constrained search to an existing ML speedrun benchmark rather than introducing a fundamentally new optimizer,
Kyvo is the official code release for a decoder-only transformer that unifies text, image patches, and structured 3D scenes (object lists carrying explicit shape, pose, position, and size tokens) inside a shared vocabulary built on Llama-3.2-1B and VQGAN codebooks, with training and evaluation scripts for CLEVR, ObjaWorld, and Objectron tasks. The work introduces a genuinely new token-by-token 3D alignment technique that lets a single autoregressive model perform rendering, recognition, and 3D-a
DriftXpress is a PyTorch implementation providing training and evaluation code for an accelerated formulation of Drifting Models that uses projected RKHS fields built from landmarks and cached summaries to enable one-step image generation on datasets such as CIFAR-10, CIFAR-100, SVHN, and ImageNet. The core novelty is the replacement of repeated exact kernel-based attraction computations against the training support with a projected RKHS field while retaining exact repulsion among generated samples, a
DwarfStar 4 (ds4) is a self-contained native inference engine written in C that loads and runs only the DeepSeek V4 Flash GGUF files on Metal (primary), CUDA, or ROCm backends, exposing a server API, CLI, tool calling, and disk-persistent KV cache. It introduces a deliberately model-specific architecture that treats compressed KV state as a first-class on-disk citizen and validates against official logits rather than attempting generic GGUF execution. Because the engine is locked to a single (e)
fastokens is a fast BPE tokenizer for popular open-weight LLMs built on a high-performance Rust backend with Python bindings, serving as a drop-in replacement for the Hugging Face tokenizers library in inference pipelines such as transformers and NVIDIA Dynamo. It delivers meaningful incremental speed gains through Rust-level optimization of an established tokenization technique rather than introducing an entirely new algorithm or problem framing. By targeting the time-to-first-token bottleneck,
A Python cookbook of step-by-step guides for Prime Intellect Lab that covers environment creation, RL training, SFT warm-starts, prompt optimization, coding sandboxes, tool-use, synthetic data pipelines, and multimodal/browser agent setups. The material repackages well-known RL-for-LLMs and agent-environment patterns already present in frameworks such as OpenAI Gym, LangChain, and DeepMind’s synthetic training literature, adding only platform-specific integration details. Because agent-training,
A Python FastAPI workshop repo that scaffolds a loan underwriting pipeline where participants implement LlamaParse calls for parsing PDFs into markdown, structured extraction via Pydantic schemas, and cross-document analysis. This is a straightforward tutorial replicating standard LlamaIndex document processing patterns without introducing new techniques or problem framings. The narrow focus on financial document workflows and workshop format limits its appeal to a small audience of developers,
Raindrop Workshop is a local TypeScript daemon and Vite UI that streams live agent traces (tokens, tool calls, spans) over HTTP/WebSocket into a browser debugger while exposing commands for coding agents to instrument, replay, and evaluate code. It adds a self-healing eval loop on top of conventional LLM tracing by letting agents like Claude Code read traces, author assertions against the repo, execute the agent, and iteratively patch failures. The approach targets the universal pain of local AI
Apache ECharts is a free charting and data visualization library written in pure JavaScript, built on the zrender canvas library, that renders interactive and highly customizable charts directly in the browser via npm, CDN, or direct download. While resembling established declarative visualization approaches such as those in D3.js and Highcharts, it adds a streamlined option-based API and extensive built-in chart types that reduce boilerplate for common interactive scenarios. Data visualization,
Highcharts JS is a JavaScript charting library based on SVG and some canvas/WebGL, distributed via npm or CDN with support for custom builds, ES modules, and native iOS wrappers. It is a mature implementation that closely resembles long-established projects such as Chart.js or D3 without introducing new rendering techniques or problem framings. The underlying need for web-based data visualization is common but already served by many competing libraries, limiting the scope for this specific tool,
Vega-Lite is a TypeScript library that provides a concise declarative grammar of interactive graphics which compiles down to complete Vega specifications for rendering. It pioneered a higher-level grammar approach that abstracts away much of the boilerplate required by lower-level visualization toolkits while preserving expressiveness for common interactive analysis tasks. The underlying idea solves a recurring problem faced by any team building data dashboards, scientific plots, or analytics U+
DSPex is an Elixir library that ships auto-generated Dspy.* bindings plus a thin SnakeBridge FFI wrapper to expose Stanford DSPy 3.2.0 signatures, predictors, ChainOfThought, optimizers, and LiteLLM-backed models inside the BEAM runtime. It adds a novel two-layer surface—mirrored Python package layout for IDE navigation plus direct concurrent-safe FFI calls—that had not previously existed for DSPy outside Python. The approach solves a real but narrow problem for the small set of Elixir teams who
Windows-MCP is a Python MCP server exposing keyboard/mouse simulation, window state capture, file navigation and UI control tools so any LLM agent can operate Windows 7–11 desktops. It adds modest novelty by offering a vision-free path that works with arbitrary models plus an optional DOM-only mode for browser automation on top of standard automation primitives. The underlying need for reliable Windows desktop control is shared by every team building or using agentic LLM workflows, giving the Mc
Langfuse is an open-source TypeScript LLM engineering platform offering observability via traces, prompt versioning with caching, evaluations including LLM-as-judge, datasets, and an interactive playground, self-hostable in minutes via Docker Compose or Kubernetes with OpenTelemetry and LangChain integrations. It consolidates multiple LLMOps capabilities into a single self-hostable package rather than extending any single prior tool like LangSmith. Every team shipping production LLM applications
Renhuai123/ziwei-doushu is a TypeScript Next.js 14 engine that implements the full Zi Wei Dou Shu charting algorithm from Ni Haixia's Tian Ji system, including star placement, Si Hua transformations, 1100+ pattern rules in patterns.ts, ancient texts, and a 518400-row JSON sample dataset released for training and RAG use. It adds scale and a ready-to-use knowledge base on top of prior open calculators such as iztro rather than introducing a new technique or problem framing. The project addresses,
URIAL is a Python toolkit that supplies a small set of constant stylistic in-context examples plus inference scripts (vLLM, Hugging Face) to align any base LLM purely through 3-shot ICL without any parameter updates. The approach reframes alignment as restyled in-context learning rather than gradient-based fine-tuning or RLHF, delivering a genuinely new controlled experimental primitive that did not exist before. Because every team that runs base models faces the same costly alignment step, the
Gremlins is a Claude Code plugin deploying autonomous AI agents with distinct personalities that survey a project then execute a pitch-critique-design workflow to output feature or content ideas as local files, PRs or issues. The structured multi-agent orchestration with editable personas and workflow stages introduces a novel framing for creative exploration that goes beyond generic LLM prompting or single-agent ideation tools. Its applicability remains limited to Claude users seeking product- or
ClawMetry is a Flask-based Python dashboard installed via one pip command that auto-detects OpenClaw workspaces and renders real-time animated flow diagrams, token/cost breakdowns, session lists, logs, and memory file browsers at localhost:8900. It applies established LLM observability patterns such as trace visualization and usage analytics to the specific multi-channel architecture of OpenClaw rather than introducing a new technique. The project solves agent debugging for developers already in
OMX is a TypeScript npm package and CLI that layers predefined skills, agent roles, tmux-based HUDs, and durable .omx/ state on top of the OpenAI Codex CLI. It offers an incremental workflow framing with reusable commands such as $deep-interview, $ralplan, $ralph, and $team rather than inventing a new agent runtime or model. The approach targets developers already committed to Codex-style CLI agents and therefore faces a structural ceiling outside that specific ecosystem.
ZenithDB is a Rust columnar database engine purpose-built for AI agent observability, ingesting and querying long sparse high-cardinality JSON traces via HTTP/gRPC/OTLP endpoints while speaking both SQL and ZenithQL. Its approach is novel through five workload-specific design choices: PAX segments with trace-locality compaction, late materialization scans, inline Tantivy FTS, offset directories for wide strings, and queryable object-storage WAL. The project addresses the storage pain of emerging
Nushell is a cross-platform shell written in Rust whose pipelines operate on typed, structured data (tables and records) instead of raw text streams, with commands such as ls, ps, and open producing queryable values. It is a meaningful incremental advance over PowerShell's object pipelines, trading Windows-centric COM integration for a lighter, Unix-native design and first-class support for common data formats. Every developer and operator runs a shell daily, so a model that turns ad-hoc text wr
Nitrobrew is a PyTorch library that fuses the unembedding matmul with KL divergence computation for knowledge distillation, iterating over vocabulary chunks with online softmax accumulators to keep memory at O(B·T·chunk_V) instead of materializing full [B,T,V] logit tensors. It applies a targeted fusion of existing online-softmax techniques to the specific on-policy distillation bottleneck for heterogeneous student/teacher hidden sizes and 100k+ vocabularies. The technique solves a painful but narrow
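The chunked computation described above can be illustrated for a single token position in plain Python. This is a sketch under stated assumptions, not the library's kernel: it computes the forward KL from teacher to student (the direction is an assumption), keeps a running max and normalizer per model plus one weighted accumulator, and only ever holds one vocabulary chunk of logits at a time. The names `fused_kl` and `chunk_logits` are hypothetical.

```python
import math


def chunk_logits(hidden, weight_rows):
    """Unembedding for one vocabulary chunk: one dot product per vocabulary row."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight_rows]


def fused_kl(h_teacher, w_teacher, h_student, w_student, chunk=2):
    """KL(teacher || student) over the vocabulary without materializing full logits."""
    m_t, z_t, acc = -math.inf, 0.0, 0.0  # teacher running max, normalizer, weighted sum
    m_s, z_s = -math.inf, 0.0            # student running max and normalizer
    for start in range(0, len(w_teacher), chunk):
        t = chunk_logits(h_teacher, w_teacher[start:start + chunk])
        s = chunk_logits(h_student, w_student[start:start + chunk])
        # Rescale the teacher accumulators whenever the running max grows.
        new_m_t = max(m_t, max(t))
        scale = math.exp(m_t - new_m_t)
        z_t = z_t * scale + sum(math.exp(x - new_m_t) for x in t)
        acc = acc * scale + sum(math.exp(x - new_m_t) * (x - y) for x, y in zip(t, s))
        m_t = new_m_t
        # The student only needs a running log-sum-exp.
        new_m_s = max(m_s, max(s))
        z_s = z_s * math.exp(m_s - new_m_s) + sum(math.exp(y - new_m_s) for y in s)
        m_s = new_m_s
    # KL = E_teacher[t - s] - logsumexp(t) + logsumexp(s)
    return acc / z_t - (m_t + math.log(z_t)) + (m_s + math.log(z_s))
```

The teacher and student hidden vectors may have different sizes because each is multiplied by its own unembedding rows; only the vocabulary dimension has to match.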
MuonWarm is a PyTorch optimizer that routes matrix-shaped parameters through a Muon-style path using momentum orthogonalization via Newton-Schulz iterations while handling biases and norms with Adam; it caches a polar factor `muon_warm_q` and refreshes it between periodic full anchors with cheap Jacobi tangent updates plus retraction. The warm-start caching technique that reuses an approximate orthogonal direction instead of recomputing Newton-Schulz every step is a concrete incremental advance,
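For readers unfamiliar with the orthogonalization step mentioned above, here is a minimal sketch of the classic cubic Newton-Schulz iteration, X <- 1.5 X - 0.5 X Xᵀ X, which drives a matrix toward its polar (orthogonal) factor. This is a textbook variant chosen for clarity; it is not MuonWarm's implementation, it does not model the `muon_warm_q` cache or the Jacobi tangent updates, and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz_orthogonalize(g, steps=20):
    """Approximate the polar factor of a non-zero, full-row-rank matrix g."""
    # Frobenius normalization puts every singular value in (0, 1],
    # inside the convergence region of the cubic iteration.
    norm = math.sqrt(sum(v * v for row in g for v in row))
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [
            [1.5 * a - 0.5 * b for a, b in zip(row_x, row_y)]
            for row_x, row_y in zip(x, xxt_x)
        ]
    return x
```

A warm-started variant, as the summary describes, would aim to skip most of these iterations by reusing a cached approximate factor between full recomputations.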
PiSwift is a Swift port of pi-mono that implements an in-process LLM agent framework where subagents are defined by Markdown files containing YAML frontmatter specifying name, tools, model, and output format, with support for single, parallel, and chained invocation via a dedicated subagent tool. It adds structured agent and prompt template discovery from user and project directories plus strict Swift concurrency guarantees. The approach refines existing Markdown-configured agent patterns from pi
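The Markdown-plus-frontmatter convention described above can be sketched as follows. The sketch is in Python rather than Swift, handles only flat `key: value` pairs instead of full YAML, and does not reflect PiSwift's actual schema or parser; the function name and the comma-separated `tools` format are assumptions.

```python
def parse_agent_markdown(text):
    """Split a Markdown agent file into (frontmatter dict, prompt body).

    Simplified illustration: only flat 'key: value' lines between the
    two '---' fences are understood; nested YAML is not supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    closing = [i for i in range(1, len(lines)) if lines[i].strip() == "---"]
    if not closing:
        return {}, text
    end = closing[0]
    meta = {}
    for line in lines[1:end]:
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    # Assumed convention: tools are given as a comma-separated list.
    if "tools" in meta:
        meta["tools"] = [t.strip() for t in meta["tools"].split(",") if t.strip()]
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```

A file whose frontmatter contains `name: reviewer` and `tools: read, grep` would yield a dict with those two keys, with `tools` as a two-element list, and the remaining Markdown as the prompt body.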
Code release implementing Multi-Stream LLMs on Qwen2.5/Qwen3 backbones, with three self-contained sections: interleaved 2-3 stream packing for efficiency on GSM8K/MATH, multi-stream Alpaca fine-tuning for security benchmarks, and 10-stream Qwen3.5 models with per-stream Gated-DeltaNet states for monitorability. The work introduces parallel streams of thoughts, inputs, and outputs via wait-k data construction and complete weight sharing between streams. Its specialized training pipelines and high
SWE Atlas is a Shell-based benchmark repository that ships curated task data and Modal/harbor run configs for three leaderboards—Codebase QnA, Test Writing, and Refactoring—used to evaluate AI coding agents on professional software-engineering workflows. It introduces a multi-leaderboard framing that explicitly decomposes the software development cycle into complementary capabilities instead of measuring a single isolated skill such as issue resolution. The benchmark therefore targets the fast-w
This is the official repository for the paper From Directions to Regions: Decomposing Activations in Language Models via Local Geometry, providing end-to-end MFA training tutorials in Jupyter notebooks, FSDP multi-GPU code, and pretrained 8k MFA checkpoints for Gemma-2-2B and Llama-3.1-8B layers downloadable from Hugging Face. The work introduces a novel shift from directional to regional decomposition of LLM activations by explicitly modeling local geometry, offering a fresh problem framing and
Howcode is a TypeScript desktop app launched via npx that provides an opinionated environment for coding with the Pi AI, including an inbox, built-in terminal, git-ops composer, comment-based diff review, local sherpa-ONNX voice input, and in-app skill/extension management. It introduces a deliberately editor-free, agent-first workflow that prioritizes rapid YOLO sessions over conventional turn-by-turn editing. The app targets a narrow slice of AI-native developers comfortable with its specific,
scrcpy is a native C application built with FFmpeg, libav and SDL2 that mirrors an Android device's screen and audio over USB or TCP/IP, forwards keyboard/mouse/gamepad input, supports recording and virtual displays, and requires no root or installed app. It popularized a lightweight, low-latency no-root mirroring technique using standard ADB and system APIs, with later extensions such as camera mirroring and HID simulation constituting incremental improvements rather than new primitives. The no
A TypeScript monorepo that runs a read-only Express API and React+Vite frontend to provide local charts, full-text search, media browsing, and contact analytics over a WaCrawl-generated WhatsApp SQLite archive. The implementation follows the established pattern of building dedicated viewers and dashboards for exported messaging databases, adding only conventional UI polish and keyboard shortcuts on top of standard SQLite queries. Its utility is narrowly tied to the small audience already using w
MLS-Bench is a Python benchmark containing 140 tasks across 12 ML research domains that supplies agents with research scaffolds and baselines then scores them on proposing single algorithmic edits (new component, loss, optimizer, or training procedure) whose gains transfer across seeds, datasets, and scales, with Docker/Apptainer/SLURM or local Conda runtimes. It introduces a genuinely new evaluation framing that measures transferable scientific innovation rather than single-instance engineering
SERV is a bit-serial RV32I RISC-V CPU core written in Verilog that fits in 125-239 LUTs on common FPGAs or 2.1 kGE in CMOS, targeting area-constrained FPGA and ASIC designs. Its serial execution technique for a full RISC-V ISA is a genuine departure from conventional parallel pipelines and yields the smallest known compliant open-source core. The approach solves a real but narrow problem of extreme resource limits, limiting mass adoption to specialized embedded, IoT, and FPGA sensor platforms.
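To make the bit-serial idea concrete, the sketch below adds two words one bit per cycle using a single full adder and a one-bit carry register. It illustrates the general technique only; it is not derived from SERV's Verilog, and the function name is hypothetical.

```python
def bit_serial_add(a, b, width=32):
    """Add two unsigned words one bit per cycle, as a bit-serial datapath would.

    Returns (sum modulo 2**width, number of cycles used).
    """
    carry = 0
    result = 0
    cycles = 0
    for i in range(width):  # least-significant bit first
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        # One full adder is reused on every cycle.
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result |= sum_bit << i
        cycles += 1
    return result, cycles
```

The trade-off is visible in the return value: a 32-bit addition takes 32 cycles, in exchange for needing only a one-bit adder instead of a 32-bit one.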
Raiden is a Python toolkit for YAM bimanual robot arms that provides the full pipeline of camera calibration, leader-follower teleoperation, multi-camera recording with heterogeneous depth backends, and conversion to synchronized policy-ready datasets. It improves on existing robot data tools through its tight integration of manipulability-aware IK via PyRoki and J-Parse plus automated hand-eye calibration for mixed ZED/RealSense setups. The project remains confined to owners of specific YAM arm
This is a PyTorch library implementing the dynamic sequence chunking mechanism from the H-Net paper, exposing a DynamicSequenceChunker module and an HNet wrapper class that performs learned hierarchical downsampling and upsampling on token embeddings. The approach introduces a genuinely new end-to-end differentiable chunking technique for hierarchical sequence modeling that did not previously exist in this form. It solves a specialized problem in advanced sequence architectures and is therefore,
T3 is a Python repository and accompanying Hugging Face datasets that generate offline thinking-trace corpora from models like Gemini-2-thinking and QwQ-32B, apply three transformations (structural normalization, reflection, semantic distillation) via smaller models, and serve the resulting passages for top-k RAG at inference time on math/code benchmarks. It introduces a new technique of rewriting raw reasoning trajectories into multiple retrieval-optimized forms that expose procedural scaffolds
oMLX is a Python-based LLM inference server for Apple Silicon that uses MLX to deliver continuous batching, a tiered hot-RAM / cold-SSD KV cache with prefix sharing and Copy-on-Write, an OpenAI-compatible API, multi-model serving, and native macOS menu-bar management. The tiered SSD-backed cache that survives restarts and reuses context across requests is a meaningful advance over existing in-memory MLX or vLLM-style servers. The project solves a real pain point for Mac developers who want fast,
MathCode is a terminal AI agent in Python that accepts plain-language math problems, formalizes them into Lean 4 theorems via an LLM planner, and attempts proofs using a persistent Lean REPL together with built-in theorem and axiom libraries. It introduces a combination of persistent Lean REPL, automatic theorem storage with reuse, tree-of-subgoals decomposition, multi-planner strategies, and Obsidian graph visualization that together create a more integrated formal-math proving workflow than ad
DeepWiki-Open is a Python FastAPI backend with Next.js frontend that clones GitHub/GitLab/Bitbucket repositories, builds embeddings, generates documentation and Mermaid diagrams via Gemini/OpenAI/Ollama/OpenRouter/Azure models, and supports RAG chat plus multi-turn DeepResearch. It offers a modest incremental improvement over prior AI documentation generators by adding flexible multi-provider switching and an Ask/DeepResearch interface rather than a new core technique. The idea of turning any codebase
Needle is a 26M-parameter encoder-decoder Simple Attention Network distilled from Gemini 3.1 for single-shot tool calling, with weights and a pure-Python training/inference stack that runs on the Cactus on-device runtime at thousands of tokens per second. The architecture replaces standard FFNs with gated residuals, shared embeddings, and ZCRMSNorm throughout an asymmetric 12-layer encoder / 8-layer decoder, representing a distinct compression and efficiency technique rather than another scaled L
Klarity is a Python toolkit for generative models that combines dual entropy analysis, reasoning step extraction, and visual attention monitoring via analyzers such as EnhancedVLMAnalyzer and ReasoningAnalyzer integrated with Hugging Face generation. It extends established uncertainty quantification by adding VLM-specific attention hooks, CoT token bracketing, and LLM judge layers for hallucination detection. The approach targets ML engineers debugging production models yet faces a ceiling as a
Lucebox is a C++/CUDA local LLM inference server with custom kernels including a persistent megakernel for Qwen 0.8B on RTX 3090, DFlash+DDTree speculative decoding for 27B GGUF models, and PFlash speculative prefill, all built on a vendored llama.cpp fork with GGUF and TurboQuant KV cache paths. The approach introduces hardware-specific single-dispatch layer execution and tree-structured SSM rollback kernels that extend recent block-diffusion and tree-verify techniques to quantized consumer-GPU
datasette-auth-tailscale is a Datasette plugin written in Python that authenticates users from the Tailscale-User-* identity headers injected by Tailscale Serve when proxying traffic to a localhost-bound Datasette instance. It offers only an incremental integration of an existing Tailscale header-based auth flow into Datasette's established plugin architecture rather than any new technique or problem framing. The requirement for a specific Datasette-plus-Tailscale-Serve deployment stack confines
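The header-to-identity mapping described above might look like the sketch below. Tailscale Serve documents the `Tailscale-User-Login`, `Tailscale-User-Name`, and `Tailscale-User-Profile-Pic` headers; the function name and the shape of the returned actor dictionary are assumptions, not the plugin's actual code. In a real plugin, logic of this kind would be called from Datasette's `actor_from_request` hook.

```python
def actor_from_tailscale_headers(headers):
    """Build an actor dict from Tailscale Serve identity headers.

    `headers` is a plain mapping of header name to value. Returns None
    when no login header is present. The actor layout is hypothetical.
    """
    # HTTP header names are case-insensitive, so normalize first.
    normalized = {key.lower(): value for key, value in headers.items()}
    login = normalized.get("tailscale-user-login")
    if not login:
        return None
    actor = {"id": login}
    name = normalized.get("tailscale-user-name")
    if name:
        actor["name"] = name
    picture = normalized.get("tailscale-user-profile-pic")
    if picture:
        actor["profile_pic"] = picture
    return actor
```

Headers like these are only trustworthy when the application listens on localhost behind the proxy, which is why the summary stresses the localhost-bound deployment: any client that can reach the app directly could forge them.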
ELF is the official JAX implementation of Embedded Language Flows, continuous diffusion language models based on flow matching that remain in continuous embedding space until a final shared-weight discretization step to tokens. This yields a genuinely new formulation for diffusion LMs that directly imports image-domain methods such as classifier-free guidance without intermediate discrete operations. The underlying technique addresses a specialized corner of generative text modeling whose path,
Pixal3D is a Python inference pipeline and Gradio demo built on the Trellis.2 backbone that generates textured 3D meshes from single images by back-projecting pixel features into 3D space. The explicit pixel-to-3D lifting technique supplies direct correspondences that prior attention-only injection methods lack, constituting a clear incremental advance in image-conditioned generation. The method targets a recurring pain point for artists and developers who need high-fidelity single-view 3D, and,
ARC-GEN is a Python library containing procedural generators for each of the original 400 ARC-AGI tasks that can emit arbitrary numbers of input-output example pairs along with parameterizable variations. It is novel in delivering a complete, mimetic generator suite that reproduces the full ARC-AGI-1 corpus while also supporting ARC-AGI-2 tasks and on-the-fly customization for program-synthesis research. The project addresses a narrow but high-value need inside the abstraction-reasoning and AI-b
Algora is an Elixir/Phoenix web app plus GitHub app and payment processor for publishing and managing OSS jobs, contracts, and bounties with self-hosting support. It adds automated OSS-contribution screening and integrated hiring workflows on top of established bounty platforms such as Gitcoin or Bountysource. The service solves a narrow problem for open-source communities and specialized recruiters, capping realistic reach to a subset of developer tooling users rather than becoming a general-a-
Muesli is a native Swift macOS app that delivers local-first dictation via hotkey and meeting transcription with simultaneous mic plus system audio capture, using on-device CoreML models such as Parakeet TDT, WhisperKit, and FluidAudio diarization plus optional local Ollama summaries. It meaningfully improves on prior local STT tools by adding VAD-driven chunk rotation, camera-triggered meeting detection, and a pure-Swift zero-Python architecture rather than simply wrapping existing Python runn.
This is TRL-based Python code that implements On-Policy Self-Distillation Fine-Tuning (SDFT) for reproducing the continual-learning method from the paper, running single-GPU experiments on tool-use and science datasets with Qwen models and LM-Eval-Harness forgetting metrics. The core technique frames in-context learning as an on-policy teacher to generate forward-KL training signals directly from demonstrations, a formulation that did not exist in prior SFT or RLHF pipelines for mitigating forget
ReplicateAnyScene is a Python framework that converts casually captured videos into compositional 3D scenes via a five-stage cascade pipeline performing textual-visual-spatial alignment in a fully automated zero-shot manner, built on top of models such as SAM3, VGGT, and Qwen3VL. It introduces a genuinely new problem framing and cross-modal cascade that systematically closes alignment gaps between the three modalities for video-to-3D composition rather than extending any single prior technique.
HumanNet is a one-million-hour human-centric video dataset released by PKU researchers, containing paired first- and third-person footage with caption labels, motion annotations, hand/body signals, and a multi-axis taxonomy, accompanied by a curation pipeline and validation code for vision-language-action post-training. It advances prior large-scale video corpora by explicitly targeting scalable egocentric human data as a cost-effective proxy for scarce robot demonstrations, demonstrated through
4DThinker is a Python framework with DIFT fine-tuning, 4DRL (GRPO) optimization, and an annotation-free video preprocessing pipeline that trains Qwen2.5-VL models to perform dynamic spatial reasoning via internally simulated 4D latent imagery tokens. It introduces the first end-to-end method for grounding VLMs in continuous hidden-space 4D dynamics instead of verbose text or external geometry modules, using joint latent-text supervision and outcome-based RL restricted to text tokens. The core 4D
MACE-Dance is a PyTorch codebase implementing a cascaded-expert pipeline that first generates 3D SMPL dance motion from music then synthesizes the final video via a motion-conditioned appearance expert. The approach introduces an explicit motion-appearance decomposition together with 3D rather than 2D intermediate representations, a framing that did not previously exist in this form for music-driven dance generation. The resulting system targets a narrow creative-AI niche whose mass-adoption path
MVSplit-DiT is a PyTorch implementation of a 1000-layer diffusion transformer that uses Mean-Variance Split residuals together with RoPE, QK-Norm, and custom Triton kernels for fused RMSNorm, SwiGLU and RoPE, paired with a Qwen3 text encoder and the FLUX.2 VAE for text-to-image sampling. The core novelty is the MVSplit residual formulation that stabilizes training at extreme depth, a technique not present in prior DiT or U-Net architectures. While the method targets a fundamental scaling barrier
Puppeteer is a Python/PyTorch system that takes static 3D meshes, predicts skeletons plus skinning weights via learned models, then optimizes animation from video guidance through a differentiable pipeline; it ships inference code, checkpoints, and an FBX exporter. The approach is novel in tightly coupling automatic rigging with video-conditioned animation in a single end-to-end differentiable framework that removes manual skeleton design. The underlying idea addresses a recurring bottleneck for
UniVidX is a unified multimodal video diffusion framework built on the Wan2.1-T2V-14B backbone that uses Stochastic Condition Masking, Decoupled Gated LoRA, and Cross-Modal Self-Attention to handle 15+ video generation and perception tasks including intrinsic decomposition and alpha matting from text or partial modalities. The approach is novel in training a single model for both generative and analytical video intrinsics with under 1k videos via its custom conditioning and attention mechanisms,
carapace-plugin-sdk is a TypeScript SDK and build-time code generator that lets developers define OpenClaw plugins via definePlugin, supplying TypeBox schemas for config and tool parameters; it emits a typed adapter, a standalone CLI with subcommands, and the openclaw.plugin.json manifest. It offers a modest incremental improvement over existing plugin frameworks by baking schema-to-CLI and schema-to-manifest generation plus reusable CI workflows into a single thin layer. Because it only solves
Odyseus Spatial VLM integrates a VLM with Depth-Anything-3 monocular metric depth to extract 2D targets from natural-language prompts on an input image, sample depths, and project labeled 3D point clouds viewable in Three.js. The combination of off-the-shelf VLM grounding and recent depth estimators is an incremental engineering integration rather than a new model architecture or training paradigm. The resulting tool serves a narrow audience of robotics and embodied-AI researchers who need quick
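The projection step mentioned above is commonly done with a pinhole camera model. The sketch below back-projects labeled pixel targets with metric depth into camera-frame 3D points. The intrinsics `fx`, `fy`, `cx`, `cy` and both function names are assumptions for illustration; the project's actual camera model is not stated in the summary.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a camera-frame 3D point.

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def labeled_point_cloud(targets, fx, fy, cx, cy):
    """targets: iterable of (label, u, v, depth_m) -> list of (label, (x, y, z))."""
    return [
        (label, backproject(u, v, depth_m, fx, fy, cx, cy))
        for label, u, v, depth_m in targets
    ]
```

A pixel at the principal point maps to a point straight ahead on the optical axis, and lateral offsets grow linearly with depth, so errors in the estimated depth translate directly into errors in all three coordinates.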
TACHIOM is a Rust library with Python bindings that builds a fast index for late-interaction multivector retrieval by applying Token-Aware Clustering to allocate centroids proportionally across token types plus hierarchical Product Quantization for reranking. The project introduces a genuinely new Token-Aware Clustering technique that directly addresses token-frequency imbalance, unlike the uniform k-means or IVF methods used by prior multivector indexes such as PLAID. Its applicability is gated
UniCorrn is a PyTorch transformer model with inference scripts and stage-wise training that performs unified dense correspondence across 2D-2D, 2D-3D and 3D-3D inputs, released with CVPR 2026 weights and benchmarks. It introduces a single architecture that fuses features from CroCo-style encoders and handles cross-dimensional matching without separate pipelines. The approach targets 3D vision researchers and robotics teams working on SfM or localization rather than offering a general-purpose SDK
Hunk is a review-first TUI diff viewer written in TypeScript on OpenTUI that renders multi-file Git or Jujutsu changesets with inline AI/agent annotations, split/stack layouts, watch mode, and pager integration. It extends conventional terminal diff tools like lumen or delta by adding first-class support for live agent workflows and contextual annotations rather than introducing a fundamentally new diffing algorithm. The tool targets the rapidly expanding set of developers who generate code via
nanochat is a minimal Python/PyTorch harness that runs the full LLM pipeline—tokenization, pretraining, finetuning, evaluation, inference, and a ChatGPT-style web UI—on a single GPU node by setting only one --depth flag that auto-derives all other hyperparameters. It is a meaningful incremental improvement over prior educational codebases like nanoGPT by adding a community GPT-2 speedrun leaderboard, automatic compute-optimal scaling, and an end-to-end reproducible script that trains a GPT-2-cap
Proxyline is a Node.js library that installs a process-global proxy router by patching http.request/get, https.request/get, global agents, the undici/fetch dispatcher, and supplying WebSocket and HTTP CONNECT helpers, with managed mode enforcing a fixed proxy URL and ambient mode honoring conventional environment variables. It represents a meaningful incremental improvement over prior proxy-agent and global-agent packages by adding caller-agent override, structured explain() decisions, visible b