All Posts programming execute SQL-like queries on CSV and TSV files

execute SQL-like queries on CSV and TSV files

ยท 320 words ยท 2 minute read

What is CSV and TSV ? ๐Ÿ”—

CSV stands for Comma-Separated Values. It is a plain-text file format.

TSV stands for Tab-Separated Values. It is a plain-text file format.

CSV is more common than TSV as a plain-text dataset format. Both are usually used for light weight data exchange. They can be used as a spreadsheet formats; as they are compatible with Microsoft Excel format (.xls, .xlsx) and LibreOffice Calc format (.ods).

What is q ? ๐Ÿ”—

q is a tool to execute SQL-like queries on CSV and TSV files.

More information: https://harelba.github.io/q .

Usage ๐Ÿ”—

Query a CSV file by specifying the delimiter as ‘,’:

q [-d|--delimiter] ',' "SELECT * from path/to/file"

for example:

q -d ',' "SELECT * from path/to/file"

Query a TSV file:

q [-t|--tab-delimited] "SELECT * from path/to/file"

for example:

q -t "SELECT * from path/to/file"

Query file with header row:

q [-d|--delimiter] delimiter [-H|--skip-header] "SELECT * from path/to/file"

for example:

q -d ',' -H "SELECT * from path/to/file"

Read data from stdin ; ‘-’ in the query represents the data from stdin:

output | q "select * from -"

Join two files (aliased as f1 and f2 in the example) on column c1, a common column:

q "SELECT * FROM path/to/file f1 JOIN path/to/other_file f2 ON (f1.c1 = f2.c1)"

Format output using an output delimiter with an output header line (Note: command will output column names based on the input file header or the column aliases overridden in the query):

q [-D|--output-delimiter] delimiter [-O|--output-header] "SELECT column as alias from path/to/file"

example:

q -D ',' -O "SELECT column as alias from path/to/file"

I hope you enjoyed reading this post as much as I enjoyed writing it. If you know a person who can benefit from this information, send them a link of this post. If you want to get notified about new posts, follow me on YouTube , Twitter (x) , LinkedIn , and GitHub .