
Command-line data processing tool for CSV, TSV, JSON, and other structured formats with field-aware operations
Miller is a command-line tool for processing structured data in formats including CSV, TSV, JSON, JSON Lines, and positionally indexed data. It is a field-aware alternative to traditional Unix tools such as awk, sed, cut, join, and sort: fields are referenced by name rather than by position.
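To make the difference between positional and name-based access concrete, here is a minimal sketch. It assumes Miller is installed and available on the PATH as `mlr`; the file `example.csv` and its contents are hypothetical and exist only for illustration.

```shell
# Create a small sample file (hypothetical data, for illustration only).
cat > example.csv <<'EOF'
name,city,amount
alice,Paris,30
bob,Oslo,12
carol,Paris,25
EOF

# awk: select the third column by position and skip the header line.
# This silently picks the wrong data if the columns are ever reordered.
awk -F, 'NR > 1 { print $3 }' example.csv

# Miller: select the same column by name; the CSV header is kept.
mlr --csv cut -f amount example.csv
```

Because the Miller command names the field, it keeps working if `amount` moves to a different column.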
The tool excels at data manipulation tasks such as adding computed fields, dropping columns, sorting, statistical aggregation, format conversion, and pretty-printing. Miller handles record-heterogeneous data, in which different records can have different fields, making it well suited to NoSQL-style data. Most operations stream, holding only one record in memory at a time, so Miller can work with datasets larger than available RAM; operations that must see all records first, such as sorting, are the exception.
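The tasks listed above might look like the following. This is a sketch rather than a complete tour, assuming `mlr` is on the PATH; `sales.csv` and its field names are made up for the example.

```shell
# Sample input (hypothetical data, for illustration only).
cat > sales.csv <<'EOF'
item,region,price,quantity
pen,east,2,10
book,west,12,3
lamp,east,20,1
EOF

# Add a computed field, then sort by it in descending numeric order.
# Output is pretty-printed as aligned columns.
mlr --icsv --opprint put '$total = $price * $quantity' then sort -nr total sales.csv

# Drop a column by name (-x inverts the selection).
mlr --csv cut -x -f quantity sales.csv

# Aggregate: sum and mean of price, grouped by region.
mlr --icsv --opprint stats1 -a sum,mean -f price -g region sales.csv
```

The `then` keyword chains operations within a single `mlr` invocation, so records pass from one step to the next without intermediate files.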
Miller targets data analysts, system administrators, DevOps engineers, and developers who need to clean, transform, or analyze structured data from log files, databases, or APIs. It complements tools like R and pandas by handling data preparation and reduction, and it also serves as a converter between structured data formats. Its throughput is comparable to that of standard Unix utilities, and it runs as a single binary with no runtime dependencies.
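As a sketch of the format-conversion role, the commands below read one CSV file and write it in three other formats. They assume `mlr` is on the PATH; `points.csv` is a hypothetical file created only for this example.

```shell
# Sample input (hypothetical data, for illustration only).
cat > points.csv <<'EOF'
x,y
1,2
3,4
EOF

# CSV in, JSON out. The "cat" verb passes records through unchanged.
mlr --icsv --ojson cat points.csv

# CSV in, TSV out.
mlr --icsv --otsv cat points.csv

# CSV in, aligned pretty-printed columns out.
mlr --icsv --opprint cat points.csv
```

Input and output formats are chosen independently with the `--i...` and `--o...` flags, so the same pattern can feed cleaned data into R, pandas, or another downstream tool.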
# via Homebrew
brew install miller
# via APT
apt-get install miller
# via Yum
yum install miller
