
Open source text annotation tool for machine learning
doccano is an open-source, web-based annotation tool for machine learning practitioners. It supports text classification, sequence labeling, and sequence-to-sequence tasks, so it can be used to build labeled datasets for NLP tasks such as sentiment analysis, named entity recognition, and text summarization.
Label documents with categories for sentiment analysis, topic classification, and spam detection tasks
Annotate entities within text for named entity recognition (NER), part-of-speech tagging, and chunking
Create paired text annotations for machine translation, text summarization, and paraphrase generation
Multiple annotators can work concurrently on the same project with role-based access control
REST API for programmatic data upload, annotation export, and import of model predictions for pre-labeling
Supports JSON, JSONL, CSV, CoNLL, and plain-text data formats
Create a project, upload data, and start annotating in minutes with Docker or pip install
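The three task types listed above each expect a slightly different record shape at upload time. The following is a minimal sketch of preparing a JSONL file for each, assuming records carry a `text` field and a `label` field (a list of category names, a list of `[start, end, label]` character spans, or a list of target strings). These field names and the helper functions are illustrative assumptions, so check them against the import examples shown by your doccano version before uploading.

```python
import json


def classification_record(text, labels):
    # Assumed shape for text classification: label is a list of category names.
    return {"text": text, "label": list(labels)}


def sequence_labeling_record(text, entities):
    # entities: (surface string, label) pairs. Each surface string is located
    # by its first occurrence in text and stored as a character span with an
    # exclusive end offset.
    spans = []
    for surface, label in entities:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        spans.append([start, start + len(surface), label])
    return {"text": text, "label": spans}


def seq2seq_record(text, targets):
    # Assumed shape for sequence-to-sequence: label is a list of target strings.
    return {"text": text, "label": list(targets)}


def to_jsonl(records):
    # One JSON object per line, keeping non-ASCII characters readable.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    ner = sequence_labeling_record(
        "Barack Obama visited Paris.",
        [("Barack Obama", "PER"), ("Paris", "LOC")],
    )
    print(to_jsonl([ner]))
```

Locating entities by first occurrence keeps the sketch short. Text in which the same surface string appears more than once would need explicit offsets instead.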
Create labeled datasets for training custom NLP models for sentiment analysis, NER, and text classification
Annotate text corpora for linguistics, social science, and medical NLP studies in academic and research teams
Use the REST API to integrate annotation workflows into automated machine learning pipelines with pre-labeling support
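Once annotations are exported, a training pipeline usually needs them in a token-level format. The sketch below converts character spans into CoNLL-style BIO tags, assuming exported sequence-labeling records expose `[start, end, label]` spans with exclusive end offsets. The function names are hypothetical, and whitespace tokenisation is a simplification: a real pipeline would use the tokenizer of the model being trained.

```python
def spans_to_bio(text, spans):
    # Split on whitespace and tag each token by the first span it overlaps.
    # A token that starts at or before the span start gets B-, later tokens I-.
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start < e and end > s:
                tag = ("B-" if start <= s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


def to_conll(tagged):
    # One "token tag" pair per line.
    return "\n".join(f"{token} {tag}" for token, tag in tagged)


if __name__ == "__main__":
    text = "Barack Obama visited Paris yesterday"
    spans = [[0, 12, "PER"], [21, 26, "LOC"]]
    print(to_conll(spans_to_bio(text, spans)))
```

The same conversion can be run in reverse for pre-labeling: model predictions expressed as character spans can be written back into import records before annotators review them.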

AI-powered SQL client that turns natural language into database queries