# Ztorch
Pure Zig machine learning library with comptime optimization and multiple backends
Ztorch is a high-performance ML library built from the ground up in Zig, featuring:
- 🎯 Comptime graph optimization - Fuse operations and generate specialized kernels at compile time
- 🚀 Multiple backends - CPU (scalar & SIMD), CUDA, ROCm, Vulkan
- 🛡️ Tiger Style development - Safety first, benchmarked performance, zero technical debt
- 🧪 Test-driven - TDD from day 0, tested on Linux/macOS/Windows, x86_64/aarch64
- 🔧 Clean API - Define models as Zig structs, explicit and ergonomic
## Status
**v0.1-dev** - Early development. Core architecture and CPU backend in progress.

- ✅ Project architecture defined
- ✅ Tiger Style development process established
- 🚧 CPU scalar backend (in progress)
- ⏳ CUDA backend (planned)
- ⏳ ROCm backend (planned)
- ⏳ Vulkan backend (planned)
## Quick Start
```zig
const std = @import("std");
const ztorch = @import("ztorch");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Define the model at comptime
    const Model = ztorch.Sequential(.{
        ztorch.Linear(784, 128),
        ztorch.ReLU(),
        ztorch.Linear(128, 10),
        ztorch.Softmax(),
    });

    // Compile for a backend
    var model = try Model.compile(.cpu, gpa.allocator());
    defer model.deinit();

    // Train
    const input = try ztorch.Tensor.randn(.{ 32, 784 });
    const labels = try ztorch.Tensor.randint(.{32}, 10);
    const output = try model.forward(input);
    const loss = try ztorch.crossEntropy(output, labels);
    try model.backward(loss);
    try model.step(.{ .adam = .{ .lr = 0.001 } });

    std.debug.print("Loss: {d:.4}\n", .{loss.item()});
}
```
## Installation
Add to your `build.zig.zon`:
```zig
.dependencies = .{
    .ztorch = .{
        .url = "https://github.com/mattneel/ztorch/archive/refs/tags/v0.1.0.tar.gz",
        .hash = "...",
    },
},
```
In your `build.zig`:
```zig
const ztorch = b.dependency("ztorch", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("ztorch", ztorch.module("ztorch"));
```
## Why Ztorch?
**Comptime optimization:** Your model structure is known at compile time, so Ztorch can fuse operations, eliminate overhead, and generate optimal kernels for your exact architecture.
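As a rough sketch of the general idea (illustrative only, not Ztorch's actual internals), comptime type parameters let adjacent operations such as a linear layer and a ReLU be emitted as a single fused loop with all bounds known at compile time:

```zig
// Sketch: generic comptime fusion, not Ztorch's real implementation.
fn LinearReLU(comptime in: usize, comptime out: usize) type {
    return struct {
        weights: [out][in]f32,
        bias: [out]f32,

        // Matmul, bias add, and ReLU fused into one pass over the data;
        // `in` and `out` are compile-time constants, so the compiler can
        // unroll and vectorize for this exact layer shape.
        pub fn forward(self: *const @This(), x: [in]f32) [out]f32 {
            var y: [out]f32 = undefined;
            for (0..out) |o| {
                var acc: f32 = self.bias[o];
                for (0..in) |i| acc += self.weights[o][i] * x[i];
                y[o] = @max(acc, 0.0); // ReLU fused into the same loop
            }
            return y;
        }
    };
}
```

Because the fused kernel never materializes the intermediate pre-activation tensor, there is no extra allocation or memory traffic between the two ops.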
**Multiple backends:** Write once, run on CPU, CUDA, ROCm, or Vulkan. Each backend is optimized for its target.
**Proven correct:** Every operation is tested against a reference implementation. GPU backends are verified against CPU. No surprises.
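In practice such a check can be a plain Zig test that runs the optimized kernel and the scalar reference on the same input and compares within a tolerance (a self-contained sketch of the pattern, using a ReLU for brevity):

```zig
// Sketch of the reference-testing pattern; not Ztorch's actual test suite.
const std = @import("std");

fn reluScalar(x: [8]f32) [8]f32 {
    var y: [8]f32 = undefined;
    for (x, 0..) |v, i| y[i] = @max(v, 0.0);
    return y;
}

fn reluSimd(x: [8]f32) [8]f32 {
    const v: @Vector(8, f32) = x;
    const zero: @Vector(8, f32) = @splat(0.0);
    return @max(v, zero); // vector result coerces back to an array
}

test "SIMD ReLU matches scalar reference" {
    const x = [8]f32{ -2, -1, -0.5, 0, 0.5, 1, 2, 3 };
    const expected = reluScalar(x);
    const actual = reluSimd(x);
    for (expected, actual) |e, a|
        try std.testing.expectApproxEqAbs(e, a, 1e-6);
}
```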
**Benchmarked:** Every optimization is measured. We estimate with napkin math first, then prove the gains with benchmarks.
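For example, back-of-the-envelope numbers for the first layer of the Quick Start model (illustrative arithmetic only; the throughput figure is an assumed ballpark, not a measured result):

```
Linear(784, 128), batch 32:
  MACs  = 32 * 784 * 128        ≈ 3.2e6
  FLOPs ≈ 2 * MACs              ≈ 6.4e6
  At ~10 GFLOP/s (scalar CPU)   ≈ 0.64 ms per forward pass
```

If a benchmark comes in far from the napkin estimate, that gap itself is a signal worth investigating.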
**No magic:** Explicit control flow, static allocation, clear error handling. You know exactly what the machine is doing.
## Building
```sh
# Run tests
zig build test

# Run benchmarks
zig build bench

# Build examples
zig build examples

# Check formatting
zig fmt --check .
```
## Design Principles (Tiger Style)
- **Safety** - Fixed limits, static allocation, explicit errors, fail fast
- **Performance** - Napkin math first, benchmark everything, prove all gains
- **Developer Experience** - Clean API, clear names, good documentation
- **Zero Technical Debt** - TDD from day 0, do it right the first time