# Ztorch
Pure Zig machine learning library with comptime optimization and multiple backends
Ztorch is a high-performance ML library built from the ground up in Zig, featuring:
- 🎯 Comptime graph optimization - Fuse operations and generate specialized kernels at compile time
- 🚀 Multiple backends - CPU (scalar & SIMD), CUDA, ROCm, Vulkan
- 🛡️ Tiger Style development - Safety first, benchmarked performance, zero technical debt
- 🧪 Test-driven - TDD from day 0, tested on Linux/macOS/Windows, x86_64/aarch64
- 🔧 Clean API - Define models as Zig structs, explicit and ergonomic
## Status
**v0.1-dev** - Early development. Core architecture and CPU backend in progress.

- ✅ Project architecture defined
- ✅ Tiger Style development process established
- 🚧 CPU scalar backend (in progress)
- ⏳ CUDA backend (planned)
- ⏳ ROCm backend (planned)
- ⏳ Vulkan backend (planned)
## Quick Start
```zig
const std = @import("std");
const ztorch = @import("ztorch");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Define the model at comptime
    const Model = ztorch.Sequential(.{
        ztorch.Linear(784, 128),
        ztorch.ReLU(),
        ztorch.Linear(128, 10),
        ztorch.Softmax(),
    });

    // Compile for a backend
    var model = try Model.compile(.cpu, gpa.allocator());
    defer model.deinit();

    // Train
    const input = try ztorch.Tensor.randn(.{ 32, 784 });
    const labels = try ztorch.Tensor.randint(.{32}, 10);
    const output = try model.forward(input);
    const loss = try ztorch.crossEntropy(output, labels);
    try model.backward(loss);
    try model.step(.{ .adam = .{ .lr = 0.001 } });

    std.debug.print("Loss: {d:.4}\n", .{loss.item()});
}
```
## Installation
Add to your `build.zig.zon`:
```zig
.dependencies = .{
    .ztorch = .{
        .url = "https://github.com/mattneel/ztorch/archive/refs/tags/v0.1.0.tar.gz",
        .hash = "...",
    },
},
```
In your `build.zig`:
```zig
const ztorch = b.dependency("ztorch", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("ztorch", ztorch.module("ztorch"));
```
## Why Ztorch?
**Comptime optimization:** Your model structure is known at compile time, so Ztorch can fuse operations, eliminate overhead, and generate optimal kernels for your exact architecture.
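As a rough sketch of the general idea (illustrative only, not Ztorch's actual internals), comptime type parameters let adjacent operations such as a linear layer and a ReLU be emitted as a single fused loop with all bounds known at compile time:

```zig
// Sketch: generic comptime fusion, not Ztorch's real implementation.
fn LinearReLU(comptime in: usize, comptime out: usize) type {
    return struct {
        weights: [out][in]f32,
        bias: [out]f32,

        // Matmul, bias add, and ReLU fused into one pass over the data;
        // `in` and `out` are compile-time constants, so the compiler can
        // unroll and vectorize for this exact layer shape.
        pub fn forward(self: *const @This(), x: [in]f32) [out]f32 {
            var y: [out]f32 = undefined;
            for (0..out) |o| {
                var acc: f32 = self.bias[o];
                for (0..in) |i| acc += self.weights[o][i] * x[i];
                y[o] = @max(acc, 0.0); // ReLU fused into the same loop
            }
            return y;
        }
    };
}
```

Because the fused kernel never materializes the intermediate pre-activation tensor, there is no extra allocation or memory traffic between the two ops.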
**Multiple backends:** Write once, run on CPU, CUDA, ROCm, or Vulkan. Each backend is optimized for its target.
**Proven correct:** Every operation is tested against a reference implementation. GPU backends are verified against CPU. No surprises.
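In practice such a check can be a plain Zig test that runs the optimized kernel and the scalar reference on the same input and compares within a tolerance (a self-contained sketch of the pattern, using a ReLU for brevity):

```zig
// Sketch of the reference-testing pattern; not Ztorch's actual test suite.
const std = @import("std");

fn reluScalar(x: [8]f32) [8]f32 {
    var y: [8]f32 = undefined;
    for (x, 0..) |v, i| y[i] = @max(v, 0.0);
    return y;
}

fn reluSimd(x: [8]f32) [8]f32 {
    const v: @Vector(8, f32) = x;
    const zero: @Vector(8, f32) = @splat(0.0);
    return @max(v, zero); // vector result coerces back to an array
}

test "SIMD ReLU matches scalar reference" {
    const x = [8]f32{ -2, -1, -0.5, 0, 0.5, 1, 2, 3 };
    const expected = reluScalar(x);
    const actual = reluSimd(x);
    for (expected, actual) |e, a|
        try std.testing.expectApproxEqAbs(e, a, 1e-6);
}
```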
**Benchmarked:** Every optimization is measured. We estimate with napkin math first, then prove the gains with benchmarks.
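For example, back-of-the-envelope numbers for the first layer of the Quick Start model (illustrative arithmetic only; the throughput figure is an assumed ballpark, not a measured result):

```
Linear(784, 128), batch 32:
  MACs  = 32 * 784 * 128        ≈ 3.2e6
  FLOPs ≈ 2 * MACs              ≈ 6.4e6
  At ~10 GFLOP/s (scalar CPU)   ≈ 0.64 ms per forward pass
```

If a benchmark comes in far from the napkin estimate, that gap itself is a signal worth investigating.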
**No magic:** Explicit control flow, static allocation, clear error handling. You know exactly what the machine is doing.
## Building
```sh
# Run tests
zig build test

# Run benchmarks
zig build bench

# Build examples
zig build examples

# Check formatting
zig fmt --check .
```
## Design Principles (Tiger Style)
- **Safety** - Fixed limits, static allocation, explicit errors, fail fast
- **Performance** - Napkin math first, benchmark everything, prove all gains
- **Developer Experience** - Clean API, clear names, good documentation
- **Zero Technical Debt** - TDD from day 0, do it right the first time