A Different Kind Of Shell
There is a brewing renaissance in the developer shell ecosystem these days. Lots of exciting projects are maturing, attracting investment, and innovating on a decades-old part of the developer workflow. My daily driver is Alacritty under tmux; warp and nu are two other players that are fundamentally changing how we use the shell.
With so many choices it is hard to keep up. I approach these shells as alternative tools that grow my arsenal. Nu shell offers really powerful functions for working with file data, including support for Dataframes and for Excel data. This post will share insights from using nu to validate data, instead of reaching for more familiar tools like Python or Bash.
Strange Tongues
After a decade of cobbling together snippets in Bash to accomplish tasks like this, the syntax in nu felt very strange indeed. It comes with a robust command reference that takes some time to use effectively.
My task was to take a complex, large Excel document that is produced by other systems in a distributed platform and validate its data. This task was complicated by a lack of upstream testing due to technical debt in different parts of the platform. While that debt is chipped away at, we needed a short-term approach to have confidence in the Excel data.
Within 15 minutes I was able to use the REPL and produce the following:
let raw = (open --raw report.xlsx | from xlsx)
let tab = ($raw | columns | where ($it =~ "Fiscal 2016"))
(
$tab |
each {|x| $raw | get $x} |
get 0 | get column4 |
math round -p 2 |
drop 2 |
skip 2 |
to json
)
A Byte At A Time
Let's break down the above snippet and build it back up again, eating the elephant a byte at a time. In nu, there is a very clean functional style built around pipelines. Pipelines are inspired by Unix: think of them as standard shell pipes with superpowers.
open --raw report.xlsx | from xlsx
is a pipeline that opens a file called report.xlsx and pipes the raw bytes to the from command. If you check the command reference, there are a handful of from commands, all of which take some input format and output it as a table in nu.
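For example, the same pattern parses other formats into tables and records. A quick sketch, runnable in the nu REPL:

```nu
# Parse CSV text into a two-column table.
"name,total\nfiscal,42092.99" | from csv

# Parse JSON text into a record.
'{"tab": "Fiscal 2016"}' | from json
```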
By wrapping the command in parentheses, we create a subexpression that captures the result of our pipeline. The following code creates a pipeline in a subexpression, capturing its output into a variable called raw:
let raw = (open --raw report.xlsx | from xlsx)
Once we have the Excel data read into a table in nu, we can do a ton of different things with it. Using where and columns, I can easily grab a reference to the worksheet tab I care about, titled "Fiscal 2016".
let tab = ($raw | columns | where ($it =~ "Fiscal 2016"))
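Here =~ is nu's regex-match operator, and where works on any list of strings, so the filter can be tried standalone in the REPL. A small sketch with made-up sheet names:

```nu
# Keep only the sheet names matching the pattern.
["Summary" "Fiscal 2015" "Fiscal 2016"] | where ($it =~ "Fiscal 2016")
```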
We now have the name of the worksheet tab we want. Let's mangle the data to get the parts we care about (comments added for emphasis):
(
$tab | # the matching worksheet tab name(s)
each {|x| $raw | get $x} | # look up the sheet contents for each matching name
get 0 | get column4 | # first match; take the column with the values to check
math round -p 2 | # round each number to 2 decimal places
drop 2 | # drop the last 2 rows of the table
skip 2 | # skip the first 2 rows of the table
to json # output the data as json
)
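If the trimming steps look opaque, drop and skip behave the same way on a plain list, so the tail end of the pipeline can be rehearsed in isolation. A sketch:

```nu
# drop 2 removes the last two items; skip 2 removes the first two.
[1 2 3 4 5 6] | drop 2 | skip 2
# leaves [3, 4]
```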
So given an input like the following:
| 1 | 2 | 3 | | |
|---|---|---|---|---|
| desc | client | totals | | |
| Generated from fiscal | 9ab92977-4084-498a-bd75-dc56017355b0 | 42092.992 | | |
| Output by aggregate | 3546421c-ba97-4347-ab3c-8c0270e573ce | 325109.12225 | | |
| Date: 06/01/2023 | 20048102-788e-4002-ade8-ba2581a0085d | | | |
| lorem ipsum dolor sit amet | | | | |
It would produce:
> nu extract.nu | jq '.[]'
42092.99
325109.12
This example is somewhat contrived, but hopefully it shows the power of having nu in your toolchain for simple validations. There are so many things you can do from here, like piping the result to | math sum to aggregate it. In one use case, I use nu to roll up the aggregations across worksheet tabs to validate a summary tab.
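That roll-up is a one-liner on top of the extraction. With the two rounded values from the output above, it looks something like this (sketch):

```nu
# Sum the column of totals.
[42092.99 325109.12] | math sum
```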
A Realistic Example
My workflow was slightly more involved than this. I combined a more complex version of the script above with a few other tools in my shell.
First, I ran the application to generate reports locally. Then I issued requests to it using httpie. That looked something like this:
time echo '{
"key1": ["param1", "param2"],
"complex_object": {}
}' | http POST http://localhost:8000/v3/reports/$report_id \
Content-Type:application/json "Authorization: Bearer $(local_token)" --verbose
Note: local_token above refers to a zsh shell function that contacts the identity provider in a locally deployed Kubernetes cluster and gets back a valid, signed JWT.
Then I invoked a helper function in my shell that downloaded the generated report from my local cluster's blob store, which looks something like this:
# Helper function to grab the latest report in local minio.
# Usage: latest_local_report [-q] [--open]
# $1-$2 - `-q` for quiet, otherwise cat out the file
# `--open` to invoke Mac `open`, which opens Excel for xlsx documents
new_report="$(mc ls local/reports --json | jq -r -s '. | sort_by(.lastModified) | reverse | .[0] | .key')";
echo "Found report \"${new_report}\""
filetype="${new_report##*.}";
new_filename="report-$(date +%s).${filetype}";
echo "Downloading as ${new_filename}"
mc cp "local/${new_report}" "./${new_filename}" --quiet 1>/dev/null;
if [[ "$1" == "-q" ]]; then shift; else cat "${new_filename}"; fi
[[ "$1" == "--open" ]] && open "./${new_filename}";
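The filename handling above leans on shell parameter expansion, which is easy to misread. Here is the idea in isolation, with a made-up object key (plain shell, no assumptions beyond the helper above):

```shell
# Mirror the expansions used in the helper above.
new_report="reports/2023/summary.xlsx"

# ##*. strips the longest prefix ending in ".", leaving the extension.
filetype="${new_report##*.}"
echo "$filetype"        # prints: xlsx

# Stamp the download with the current epoch second, as the helper does.
new_filename="report-$(date +%s).${filetype}"
echo "$new_filename"
```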
Putting it all together:
# http call first to generate the report locally (as above, omitted)
latest_local_report -q
nu extract-derivations.nu | jq '.'
Conclusion
By continually sharpening our tools, the scripts and environments we build complement each other, each piece building on the last. The scripts and helper functions above created a really nice workflow for developing locally and interacting with artifacts in a local Kubernetes environment that mirrors production. This is great for the earliest parts of development while the test harness matures, or for a one-off check of local changes.
nu enables new possibilities, and makes certain things nicer than they otherwise are (especially in Bash). One thing to note in the above is the emphasis that nu shell puts on functional paradigms. The pipelines above are idempotent and fit nicely into ideas like category theory and functors. Once these ideas poison your mind, it is hard to go back :) If you have never explored those concepts, I encourage you to do so! They have made me a stronger programmer with a much deeper appreciation for how I solve problems.
The next time a problem comes up, see if nu is a good fit!