LightlyStudio streamlines data curation, labeling, and management for computer vision projects. It provides a high-performance Python interface powered by a Rust backend, enabling users to index, query, and slice massive datasets whether they are stored on local drives or in cloud environments like S3 and GCS. The platform natively supports standard formats including YOLO object detection, COCO instance segmentation, and COCO captions.
The primary advantage of LightlyStudio is its automated data selection. The tool identifies the subsets of data that offer the most value by finding samples that are both representative and diverse. Labeling only those samples can reduce annotation costs, shorten training cycles, and improve the quality of the resulting model.
LightlyStudio is compatible with Windows, Linux, and macOS. It requires Python 3.8 or newer.
pip install lightly-studio
Once installed, the environment is ready for use.
To explore the platform's features, you can download the following example datasets:
git clone https://github.com/lightly-ai/dataset_examples dataset_examples
Below are examples of how to load various dataset types.
Image Folders
To manage a directory of raw images, create a file named example_image.py:
import lightly_studio as ls
dataset = ls.Dataset.create()
dataset.add_samples_from_path(path="dataset_examples/coco_subset_128_images/images")
ls.start_gui()
Running python example_image.py will launch the web interface at localhost:8001.
YOLO Object Detection
For YOLO-formatted datasets, create example_yolo.py:
import lightly_studio as ls
dataset = ls.Dataset.create()
dataset.add_samples_from_yolo(
    data_yaml="dataset_examples/road_signs_yolo/data.yaml",
)
ls.start_gui()
After running the script, images and their associated bounding boxes will appear in the application.
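For orientation, a YOLO label file conventionally stores one object per line as `class_id x_center y_center width height`, with the four coordinates normalized to the image size. The sketch below is plain Python and independent of LightlyStudio; the helper name `yolo_line_to_pixel_box` and the sample label line are hypothetical, and the code only shows how such a line maps to a pixel-space box.

```python
def yolo_line_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Hypothetical helper for illustration; not part of the LightlyStudio API.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    # Scale the normalized values back to pixel units.
    box_w = width * image_width
    box_h = height * image_height
    x_min = x_center * image_width - box_w / 2
    y_min = y_center * image_height - box_h / 2
    return class_id, x_min, y_min, x_min + box_w, y_min + box_h


# Made-up label line for a 640x480 image.
box = yolo_line_to_pixel_box("2 0.5 0.5 0.25 0.5", image_width=640, image_height=480)
print(box)
```

A check like this can help confirm that label files are well formed before loading a large dataset.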
COCO Instance Segmentation
For instance segmentation tasks, create example_coco.py:
import lightly_studio as ls
dataset = ls.Dataset.create()
dataset.add_samples_from_coco(
    annotations_json="dataset_examples/coco_subset_128_images/instances_train2017.json",
    images_path="dataset_examples/coco_subset_128_images/images",
    annotation_type=ls.AnnotationType.INSTANCE_SEGMENTATION,
)
ls.start_gui()
Upon execution, segmentation masks will be displayed alongside the images.
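A COCO annotation file is a single JSON document whose top-level `images`, `annotations`, and `categories` lists are linked by IDs. As a quick sanity check before loading, the following stdlib-only sketch counts annotations per category. `count_annotations_per_category` is a hypothetical helper, not a LightlyStudio function, and the tiny inline document is made up to stand in for a real file.

```python
import json
from collections import Counter


def count_annotations_per_category(coco):
    """Return {category_name: annotation_count} for a parsed COCO dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Minimal made-up COCO document used only for illustration.
example = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 60, 20, 20]},
    {"id": 12, "image_id": 1, "category_id": 2, "bbox": [5, 5, 100, 80]}
  ]
}
""")

category_counts = count_annotations_per_category(example)
print(category_counts)
```

For a real dataset, the parsed contents of the annotation file would take the place of the inline example.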
COCO Captions
For captioning datasets, create example_coco_captions.py:
import lightly_studio as ls
dataset = ls.Dataset.create()
dataset.add_samples_from_coco_caption(
    annotations_json="dataset_examples/coco_subset_128_images/captions_train2017.json",
    images_path="dataset_examples/coco_subset_128_images/images",
)
ls.start_gui()
The application will display the associated captions in the viewer.
The Python interface allows for programmatic indexing, querying, and manipulation of datasets.
The Dataset Object
The Dataset object is the central hub for your data. You can ingest samples from various sources at any time.
import lightly_studio as ls
dataset = ls.Dataset.create()
# Import from cloud storage
dataset.add_samples_from_path(path="s3://my-bucket/path/to/images/")
# Append data from additional sources
dataset.add_samples_from_path(path="gcs://my-bucket-2/path/to/more-images/")
dataset.add_samples_from_path(path="local-folder/some-data-not-in-the-cloud-yet")
# Load a previously saved database file
dataset = ls.Dataset.load()
The Sample Object
A sample represents an individual data point. You can retrieve or modify its attributes directly.
for sample in dataset:
    pass
samples = list(dataset)
s = samples[0]
s.sample_id # UUID
s.file_name # e.g., "img1.png"
s.file_path_abs # Absolute file path
s.tags # List of strings, e.g., ["tag1", "tag2"]
s.metadata["key"] # Access specific metadata
# Modifications
s.tags = {"tag1", "tag2"}
s.metadata["key"] = 123
s.add_tag("some_tag")
s.remove_tag("some_tag")
Dataset Queries
Queries allow you to combine filtering, sorting, and slicing through logical expressions.
from lightly_studio.core.dataset_query import AND, OR, NOT, OrderByField, SampleField
# Identify samples that require labeling or are small and unreviewed
query = dataset.match(
    OR(
        AND(
            SampleField.width < 500,
            NOT(SampleField.tags.contains("reviewed"))
        ),
        SampleField.tags.contains("needs-labeling")
    )
)
# Sort results by width in descending order
query.order_by(OrderByField(SampleField.width).desc())
# Extract a specific slice
subset = query[10:20]
# Chained operations (each ... stands for the arguments shown above)
query = dataset.match(...).order_by(...)[...]
# Apply actions to query results
query.add_tag("needs-review")
for sample in query:
    pass
samples_list = query.to_list()
# Export results for labeling in COCO format
query.export().to_coco_object_detections()
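To make the filter logic concrete, the sketch below reproduces the same boolean expression, sort, and slice on plain Python dictionaries. It does not use LightlyStudio and says nothing about how the library evaluates queries internally; the record layout and the `matches` helper are assumptions made for illustration.

```python
# Stand-in records; real samples come from a LightlyStudio dataset.
records = [
    {"file_name": "a.png", "width": 320, "tags": {"reviewed"}},
    {"file_name": "b.png", "width": 480, "tags": set()},
    {"file_name": "c.png", "width": 1024, "tags": {"needs-labeling"}},
    {"file_name": "d.png", "width": 800, "tags": set()},
]


def matches(record):
    """OR(AND(width < 500, NOT reviewed), needs-labeling) in plain Python."""
    small_and_unreviewed = record["width"] < 500 and "reviewed" not in record["tags"]
    return small_and_unreviewed or "needs-labeling" in record["tags"]


# Filter, sort by width descending, then slice: the same three steps as the query.
selected = sorted(
    (r for r in records if matches(r)),
    key=lambda r: r["width"],
    reverse=True,
)
first_match = selected[0:1]
print([r["file_name"] for r in selected])
```

Here `a.png` is excluded because it is already reviewed, and `d.png` because it is neither small nor tagged for labeling.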
Automated Sample Selection
The core strength of LightlyStudio lies in its ability to isolate the most useful samples for labeling. You can balance two primary signals: typicality (representing common cases) and diversity (representing unique or edge cases).
from lightly_studio.selection.selection_config import (
    MetadataWeightingStrategy,
    EmbeddingDiversityStrategy,
)
# Calculate the typicality of each sample
dataset.compute_typicality_metadata(metadata_name="typicality")
# Select 10 samples by balancing typicality and diversity
dataset.query().selection().multi_strategies(
    n_samples_to_select=10,
    selection_result_tag_name="multi_strategy_selection",
    selection_strategies=[
        MetadataWeightingStrategy(metadata_key="typicality", strength=1.0),
        EmbeddingDiversityStrategy(embedding_model_name="my_model_name", strength=2.0),
    ],
)
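One way to picture how a typicality signal and a diversity signal can be combined is a greedy loop that, at each step, picks the sample with the best weighted sum of its typicality score and its distance to the samples already chosen. The sketch below is a toy illustration in plain Python on 2-D points. It is not LightlyStudio's selection algorithm, and the function name `greedy_select`, the scoring formula, the weights, and the data are all assumptions.

```python
import math


def greedy_select(embeddings, typicality, n_select, w_typicality=1.0, w_diversity=2.0):
    """Greedily pick indices by weighted typicality plus distance to the picked set.

    Toy illustration only; not the algorithm LightlyStudio uses.
    """
    selected = []
    remaining = set(range(len(embeddings)))
    while remaining and len(selected) < n_select:
        def score(i):
            if selected:
                # Diversity: distance to the nearest already-selected sample.
                diversity = min(math.dist(embeddings[i], embeddings[j]) for j in selected)
            else:
                diversity = 0.0  # Nothing selected yet, so only typicality counts.
            return w_typicality * typicality[i] + w_diversity * diversity

        # Iterate in index order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Three tightly clustered points and one outlier (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
typicality_scores = [0.9, 0.8, 0.8, 0.1]
picked = greedy_select(points, typicality_scores, n_select=2)
print(picked)
```

In this toy run the most typical point is chosen first, and the outlier is chosen second because its distance to the first pick outweighs its low typicality score.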
By prioritizing the most informative data, you reduce manual labeling effort while ensuring the model gains the necessary knowledge to perform reliably.