QSV is a powerful command-line toolkit designed for data processing. It efficiently handles CSV files, spreadsheets, databases, and Parquet formats, allowing users to query, slice, index, filter, transform, validate, and join datasets directly from the terminal.
| Command | Functionality |
|---|---|
apply |
Executes string, date, mathematical, or currency operations on a column. It also supports basic NLP tasks like similarity scoring and sentiment analysis. |
count |
Counts rows and generates column width statistics using Polars-driven multithreading. |
diff |
Identifies differences between two CSV files. It can process one million rows in approximately 600ms. |
geocode |
Converts addresses into geographic coordinates using local Geonames and MaxMind databases. It processes up to 360,000 records per second and caches results. |
joinp |
Joins datasets using the Polars engine, supporting non-equi joins and "asof" joins for time-series data. |
sqlp |
Executes Polars SQL queries on CSV files that exceed available RAM, with support for multiple output formats. |
stats |
Generates comprehensive statistics, including percentiles, standard deviation, sparsity, and automated type inference. |
validate |
Validates a CSV against a JSON Schema, checking custom formats and ensuring composite key uniqueness. |
Several installation methods are available to suit different environments.
Pre-built Binaries
Binaries are available for Linux, macOS, and Windows. CPU-optimized builds are provided for x86_64 and Apple Silicon, utilizing SSE4.2, AVX2, or ARM64 NEON instructions where supported. Windows users can use an MSI installer that automatically adds QSV to the system path. On macOS, the quarantine flag must be cleared manually using: xattr -d com.apple.quarantine qsv.
The tool can be kept up to date with the qsv --update command, which automatically verifies package signatures.
Package Managers
yay -S qsvbrew install qsvscoop install qsvBuilding From Source
Building from source requires the Rust toolchain. Run the following command:
cargo install qsv --locked --features all_features
Specific features can be toggled using the --features flag. Slimmer variants are also available:
qsvlite: A stripped-down version, roughly 13% the size of the full binary.qsvdp: Specialized for DataPusher+, containing only essential commands and a lightweight applydp function.Performance
The index command creates a dedicated index file for a CSV. Once indexed, operations like slicing and counting run in constant time. For example, a 15GB dataset containing 28 million rows can be indexed in just 14 seconds.
Commands marked with 🚀 utilize multiple threads even without an index, while those marked with 🏎️ switch to multithreaded mode once an index is detected. Most commands stream data, ensuring they never load an entire file into memory. Giant files can be deduplicated using extdedup, which employs a memory-mapped hash table.
Format Support
QSV provides native support for CSV, TSV, SSV, JSON, JSONL, Excel, and ODS. When Polars features are enabled, it also supports Parquet and Arrow IPC. Additionally, Snappy-compressed files (.sz) are decompressed automatically on the fly.
Output options are equally flexible, including Parquet, PostgreSQL, SQLite, Excel, and Data Packages. The sqlp command can also output data in JSON or Avro formats. QSV expects UTF-8 encoding by default; on Windows, setting the QSV_OUTPUT_BOM environment variable adds a Byte Order Mark to prevent Excel from mangling text.
Scripting and Integration
luau: An embedded DSL for building complex data pipelines within QSV. It includes helper functions for queries and aggregation, performing faster than Python without requiring an external runtime.py: Allows the use of Python expressions for filtering rows or creating new columns (requires Python 3.8+).fetch / fetchpost: Interacts with web services. This can be combined with Jaq JSON queries and MiniJinja templates, with response caching available via Redis or local disk.geoconvert: Facilitates data conversion between GeoJSON, SHP, CSV, and SVG formats.QSV Pro
A premium tier offers a graphical user interface, an API, CKAN integration, and support for natural language queries. This version is verified against Windows Defender and is available via the official website or the Microsoft Store.
Security
Built in Rust, QSV is audited for vulnerabilities. Pre-built binaries are signed using zipsign, and data integrity is maintained through JSON Schema validation and composite key checks.
Big Data Analysis
Using the NYC 311 dataset (15GB, 28 million rows) as a benchmark, QSV completes indexing in 14 seconds. With the index active, the stats command runs in just 7.3 seconds. Commands can be chained using standard pipes; for example: qsv slice -i 5 data.csv | qsv flatten.
Data Cleaning
The dedup command performs in-memory deduplication, while extdedup is designed for files that exceed available RAM. Sorting options include alphabetical, numeric, or random ordering, and sortcheck can verify if a file is already ordered. The input command standardizes malformed CSVs, and fmt allows for the adjustment of delimiters and quoting rules.
Geocoding
The geocode command resolves addresses against a local database, supporting place names in 254 languages. Because it runs locally, it is not subject to API rate limits.
Interactive Exploration
The lens command opens an interactive viewer powered by the csvlens engine, allowing users to filter and scroll through data in real time.
ext prefix (e.g., extsort, extdedup) to avoid memory exhaustion.--features and omit the python feature if it is not required.luau for complex logic; it runs natively within QSV, has no external dependencies, and starts up faster than Python.
Tiny Qwen: A Clean PyTorch Implementation of Qwen3 and Qwen2.5-VL
BananaFace: Open Source AI Stylist for Consistent Character Design
Grey Deer VPN: Residential IPs for Secure Global Access
Lens Desktop Installation Guide: macOS, Windows, Linux
Yank Note Review: A Hackable Markdown Editor That Runs Code
ChatGPT-on-WeChat Setup Guide: Run GPT-4o, Claude & More on WeChat
Windows-Use: Enabling LLMs to Control the Windows GUI Without Vision Models
Gemini-CLI-UI: A Web Interface for the Google Gemini CLI Coding Assistant
Cogency: Build AI Agents in Python with Transparent ReAct Loops
MonkeyCode: Secure Private AI Coding with Integrated Security Scanning & Admin Controls
UTCP Explained: A Universal Tool Calling Protocol for APIs, LLMs, and Beyond
What Is a Web Accelerator? Speed Up Your Site Without the Hype