AI & Machine Learning
TorchInductor Adds CuteDSL: SOTA GEMMs on NVIDIA GPUs
Mid-compile, TorchInductor's autotuner now fires up CuteDSL, NVIDIA's Python DSL for writing GEMM kernels. It compiles faster than CUTLASS C++ templates while aiming for comparable kernel performance, making it the backend PyTorch devs have been waiting for.