Command-line CSV manipulation tool for indexing, slicing, analyzing, splitting, and joining CSV files
xsv is a command-line program for processing CSV files, with operations for indexing, slicing, analyzing, splitting, and joining. It provides more than 20 commands, including count, stats, search, select, sort, join, and frequency, and the output of one command can be piped into the next. Commands are designed to be simple, fast, and composable.
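Composability here means that most commands read CSV and write CSV, so they can be connected with shell pipes. The Python sketch below imitates that idea in-process: each stage consumes and yields rows, header first, so stages nest in any order. The functions `read_rows`, `select`, `search`, and `count` are hypothetical stand-ins named after xsv commands; they are not xsv's implementation.

```python
import csv
import io
import re
from typing import Iterable, Iterator, List

Row = List[str]


def read_rows(text: str) -> Iterator[Row]:
    """Parse CSV text into rows; the first row yielded is the header."""
    return csv.reader(io.StringIO(text, newline=""))


def select(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """Hypothetical stage: keep only the named columns, in the order given."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    positions = [header.index(name) for name in columns]
    yield list(columns)
    for row in rows:
        yield [row[i] for i in positions]


def search(rows: Iterable[Row], pattern: str) -> Iterator[Row]:
    """Hypothetical stage: keep records where any field matches the regex."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return
    regex = re.compile(pattern)
    yield header  # the header always passes through
    for row in rows:
        if any(regex.search(field) for field in row):
            yield row


def count(rows: Iterable[Row]) -> int:
    """Hypothetical stage: count records, excluding the header row."""
    rows = iter(rows)
    next(rows, None)  # skip the header
    return sum(1 for _ in rows)


data = "city,country,population\nBoston,us,650000\nParis,fr,2100000\nAustin,us,960000\n"
us_cities = count(search(select(read_rows(data), ["city", "country"]), r"^us$"))
```

Nesting the calls plays the role of piping select into search into count; only the two records whose country field is exactly `us` are counted.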
The tool can create an index for a CSV file, which makes slicing a constant-time operation and speeds up statistics gathering considerably. For example, with an index you can extract the last 10 records of a dataset with more than 3 million rows instantly, without parsing the entire file. Performance is a key focus: xsv can compute statistics on large datasets significantly faster than comparable tools.
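The reason an index makes slicing cheap is that it records where each record starts in the file, so a slice can seek straight to its first record instead of parsing everything before it. The Python sketch below illustrates that idea only: `build_index` and `slice_records` are hypothetical names, the offsets live in an in-memory list rather than in xsv's on-disk index format, and the scanner assumes well-formed CSV that uses `"` for quoting and a newline byte (LF) to end each record.

```python
import csv
import io
from typing import BinaryIO, List


def build_index(data: bytes) -> List[int]:
    """Hypothetical helper: byte offset where each record after the header begins.

    A newline byte ends a record only when it is outside a quoted field,
    so quoted fields may contain line breaks. CRLF endings also work,
    because the record boundary is taken to be the byte after the LF.
    """
    offsets: List[int] = []
    in_quotes = False
    for pos, byte in enumerate(data):
        if byte == ord('"'):
            # An escaped quote ("") toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        elif byte == ord("\n") and not in_quotes and pos + 1 < len(data):
            offsets.append(pos + 1)
    return offsets


def slice_records(f: BinaryIO, index: List[int], size: int,
                  start: int, length: int) -> List[List[str]]:
    """Hypothetical helper: parse only records [start, start + length)."""
    if length <= 0 or start < 0 or start >= len(index):
        return []
    begin = index[start]
    stop = start + length
    end = index[stop] if stop < len(index) else size
    f.seek(begin)  # jump directly; nothing before `begin` is parsed
    chunk = f.read(end - begin).decode("utf-8")
    return list(csv.reader(io.StringIO(chunk, newline="")))


data = b'name,note\nalice,"line one\nline two"\nbob,plain\ncarol,"say ""hi"""\n'
index = build_index(data)  # [10, 36, 46]
last_two = slice_records(io.BytesIO(data), index, len(data), len(index) - 2, 2)
```

`io.BytesIO` keeps the example self-contained; with a real file you would open it in binary mode and pass its size from `os.path.getsize`. Building the index costs one full pass, but every later slice only reads the bytes it returns.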
xsv also includes specialized commands for building frequency tables, selecting random rows with reservoir sampling, running regex searches across fields, and joining files (inner, left, right, and full outer, and cross joins) using hash indexes. The stats command reports each column's data type, minimum and maximum, mean, and standard deviation, and can also compute cardinality. The table command formats output into aligned columns using elastic tabstops.
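Two of the techniques named above are easy to show in isolation. The Python sketch below implements reservoir sampling (Algorithm R), which picks k rows uniformly at random in one pass without knowing the row count in advance, and a hash-based inner join, which indexes one input by key and probes it with the other. The function names, the choice to hash the right-hand input, and the use of column positions as join keys are assumptions for illustration, not a description of how xsv's sample or join commands are written.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, TypeVar

T = TypeVar("T")
Row = List[str]


def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: one pass, O(k) memory, each of n items kept with probability k/n."""
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item  # happens with probability k / (i + 1)
    return sample


def hash_inner_join(left: Iterable[Row], right: Iterable[Row],
                    left_key: int, right_key: int) -> List[Row]:
    """Hypothetical join: hash the right input by key, then probe with each left row."""
    table: Dict[str, List[Row]] = defaultdict(list)
    for row in right:
        table[row[right_key]].append(row)
    joined: List[Row] = []
    for row in left:
        # .get avoids inserting empty lists for keys with no match
        for match in table.get(row[left_key], []):
            joined.append(row + match)
    return joined


users = [["1", "alice"], ["2", "bob"], ["3", "carol"]]
roles = [["1", "admin"], ["1", "dev"], ["3", "ops"]]
pairs = hash_inner_join(users, roles, 0, 0)
picked = reservoir_sample(range(100), 5, random.Random(0))
```

An outer join would follow the same pattern but also emit rows without a match, padded with empty fields; a cross join needs no hash table at all, since it pairs every left row with every right row.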
Note: The original xsv project is no longer maintained; its author recommends qsv or xan as alternatives. Typical users are data analysts, researchers, and developers who need to process large CSV datasets from the command line.