
First open-source instruction-tuned LLM for commercial use
Dolly is an open-source, instruction-following large language model developed by Databricks, notable as the first instruction-tuned LLM licensed for commercial use. Built on EleutherAI's Pythia architecture and fine-tuned on roughly 15,000 instruction-response pairs written by Databricks employees, Dolly demonstrated that effective instruction following can be achieved with relatively modest fine-tuning data. Available in three sizes (2.8B, 6.9B, and 12B parameters), it lets organizations deploy AI-powered chatbots, summarizers, and Q&A systems without licensing costs or API dependencies.

The model supports seven capability domains: brainstorming, classification, closed-book QA, generation, information extraction, open-book QA, and summarization. Databricks has been transparent about Dolly's limitations: it struggles with complex syntax, programming, mathematics, and factual accuracy. The project entered maintenance mode in April 2023, with Databricks shifting development to DBRX, its own foundation model trained from scratch. Dolly remains an important historical milestone in the open-source AI ecosystem.
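Instruction-tuned models like Dolly expect user input wrapped in the same template used during fine-tuning. The sketch below illustrates that idea with an instruction/response layout in the style of the databricks/dolly repository; the exact wording of the intro line and markers is an assumption here and should be verified against the repository before use.

```python
# Sketch of a Dolly-style instruction prompt template.
# The intro line and "### Instruction:" / "### Response:" markers follow the
# layout published in the databricks/dolly repo, but the exact text below is
# illustrative -- verify against the repository before relying on it.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw user instruction in the expected prompt layout."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

print(build_prompt("Summarize the plot of Moby-Dick in one sentence."))
```

The model then generates text after the final `### Response:` marker, which is how a base language model is steered into instruction-following behavior.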
Choose from 2.8B, 6.9B, or 12B parameter versions for flexible deployment
Fine-tuned on 15,000 human-generated instruction-response pairs across 7 domains
Apache 2.0 licensed with explicit commercial use permissions
databricks-dolly-15k dataset freely available for research and fine-tuning
Optimized for deployment on A100, A10, and V100 GPUs
Can be further fine-tuned on proprietary domain-specific data
Works with HuggingFace Transformers, Ollama, and vLLM
Supports brainstorming, Q&A, summarization, and information extraction
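With HuggingFace Transformers, loading Dolly reduces to a text-generation pipeline. A minimal sketch follows; the checkpoint name and dtype are taken from the dolly-v2 model cards, and the default model name here (`databricks/dolly-v2-3b`) is the smallest variant. Downloading the weights requires several GB of disk and a capable GPU, so treat this as illustrative rather than a tested deployment recipe.

```python
import torch
from transformers import pipeline

def load_dolly(model_name: str = "databricks/dolly-v2-3b"):
    """Build a text-generation pipeline for a Dolly checkpoint.

    trust_remote_code=True is needed because the dolly-v2 repos ship a
    custom instruction-following pipeline class; device_map="auto" spreads
    layers across available devices (requires the `accelerate` package).
    """
    return pipeline(
        model=model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )

if __name__ == "__main__":
    generate = load_dolly()
    # The custom pipeline accepts a bare instruction and applies Dolly's
    # prompt template internally.
    result = generate("Explain the difference between a list and a tuple in Python.")
    print(result)
```

Swapping `model_name` for `databricks/dolly-v2-7b` or `databricks/dolly-v2-12b` selects the larger variants listed above.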