Modal is a serverless compute platform built for running Python workloads on GPUs at scale. Designed for AI engineers, Modal lets you deploy LLM inference endpoints, batch processing jobs, and long-running training pipelines without provisioning servers, writing Dockerfiles, or wrestling with Kubernetes. Cold starts measured in seconds, automatic scaling to thousands of GPUs, and a Python-native developer experience make it a favorite for teams shipping production AI applications.
- Spin up A10G, A100, H100, and H200 GPUs on demand, with sub-second container starts via Modal's custom runtime
- Deploy functions, web endpoints, and long-running jobs by decorating Python code — no YAML, no Dockerfiles required
- Scale to zero when idle and to hundreds of GPUs under load, with per-second billing for the compute actually consumed
- Use network file systems, distributed dictionaries, and queues for stateful workloads without external services
- Turn any Python function into an HTTPS API, with FastAPI integration, WebSockets, and streaming response support
- Spawn isolated containers for running untrusted user code or agentic tool execution
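The decorator-based model above can be sketched with a minimal app. This assumes the `modal` package is installed and you are authenticated (`pip install modal`, then `modal setup`); the app name and function are placeholders, and it runs against Modal's cloud, so it is not executable offline.

```python
# sketch.py — a minimal Modal app (hypothetical names)
import modal

app = modal.App("gpu-sketch")

# Request an A10G GPU for this function. Modal provisions a container
# on demand for each call and scales back to zero when idle.
@app.function(gpu="A10G")
def square(x: int) -> int:
    return x * x

# Entry point for `modal run sketch.py`
@app.local_entrypoint()
def main():
    # .remote() executes the function in a Modal container in the cloud,
    # while the local script only orchestrates the call
    print(square.remote(7))
```

Running `modal run sketch.py` builds the container image, starts it in Modal's cloud, and streams the result back; `modal deploy sketch.py` keeps the function available for repeated invocation.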
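The stateful primitives (distributed dictionaries and queues) are named objects looked up by string key; a sketch, assuming an authenticated Modal client and hypothetical object names:

```python
import modal

# Named, persisted objects shared across containers and apps;
# create_if_missing provisions them on first use
cache = modal.Dict.from_name("sketch-cache", create_if_missing=True)
jobs = modal.Queue.from_name("sketch-jobs", create_if_missing=True)

# Dict supports item assignment and lookup like a Python dict
cache["answer"] = 42

# Queue supports put/get for producer-consumer workflows
jobs.put({"job_id": 1})
item = jobs.get()
```

Because the objects live in Modal's infrastructure rather than in any one container, producers and consumers can run in separate functions without an external Redis or message broker.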
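Exposing a function as an HTTPS API is a second decorator on top of `@app.function`. A sketch, assuming a recent Modal release (the decorator has been renamed across versions; older releases used `@modal.web_endpoint`):

```python
import modal

app = modal.App("endpoint-sketch")

# The container image needs FastAPI for web endpoints
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def echo(payload: dict) -> dict:
    # Request JSON body is parsed into `payload`; the returned
    # dict is serialized back as the JSON response
    return {"you_sent": payload}
```

`modal deploy` assigns the endpoint a public HTTPS URL, and the same autoscaling rules apply: zero containers when there is no traffic.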
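The isolated-container case can be sketched with Modal's Sandbox API, which starts a fresh container, executes a command in it, and tears it down; names here are placeholders and the snippet needs an authenticated client:

```python
import modal

# Sandboxes are attached to an app; look one up or create it
app = modal.App.lookup("sandbox-sketch", create_if_missing=True)

# Start an isolated container — untrusted code run here cannot
# touch the host or other workloads
sb = modal.Sandbox.create(app=app)

# Execute a command inside the sandbox and read its output
proc = sb.exec("python", "-c", "print(1 + 1)")
print(proc.stdout.read())

# Tear the container down when finished
sb.terminate()
```

This is the pattern behind agentic tool execution: each untrusted snippet gets its own short-lived container rather than sharing an interpreter with the orchestrator.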