"lbsh" (also called Pound-Shell) is a shell
monitor. It stands for lab-book shell. It is designed
to act as an automated lab-book for people who conduct
experiments in their standard shell environments. The
high-level goal of Pound-Shell is to let users log
data-processing experiments (with little or no manual
effort or interference) in such a way that they can later
go back, recall (or rediscover), and reproduce their
steps. In short, it is designed to help us determine our
own data's provenance. Further, Pound-Shell allows users to
re-run old experiments with new data almost automatically. This
type of work includes processing files with R, using
gnuplot, working in interactive perl/python sessions, and standard
data processing with shell commands (awk, sort, join, etc.). Pound-Shell
does not attempt to trace commands if it detects
that a program has "captured the screen", as full-screen programs
such as Vim and emacs do. In such cases, Pound-Shell waits for
the sub-process to end and then resumes recording.
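
The text above does not say how Pound-Shell detects that a program has
captured the screen. One plausible approach, sketched here in Python purely
as an illustration, is to watch the terminal output for the xterm escape
sequences that switch to and from the alternate screen buffer, which
full-screen programs such as Vim commonly emit. The class name
ScreenCaptureTracker and its feed method are hypothetical and are not part
of lbsh.

```python
import re

# Assumption: a program "captures the screen" by switching the terminal to
# the xterm alternate screen buffer (private modes 47, 1047 or 1049).
# "h" enables the mode, "l" disables it again.
ALT_SCREEN = re.compile(rb"\x1b\[\?(?:47|1047|1049)([hl])")


class ScreenCaptureTracker:
    """Hypothetical helper: drops output produced while the screen is captured."""

    def __init__(self) -> None:
        self.captured = False

    def feed(self, chunk: bytes) -> bytes:
        """Return only the part of `chunk` that should be recorded.

        Limitation of this sketch: an escape sequence that is split across
        two chunks is not recognized.
        """
        kept = []
        pos = 0
        for match in ALT_SCREEN.finditer(chunk):
            if not self.captured:
                # Output before the switch still belongs to the shell.
                kept.append(chunk[pos:match.start()])
            self.captured = match.group(1) == b"h"
            pos = match.end()
        if not self.captured:
            kept.append(chunk[pos:])
        return b"".join(kept)


tracker = ScreenCaptureTracker()
recorded = tracker.feed(b"ls\n\x1b[?1049hEDITOR SCREEN\x1b[?1049ldone\n")
```

In this sketch the bytes drawn by the full-screen program are skipped, and
recording resumes as soon as the program switches back to the normal screen.
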

With Pound-Shell, you can send a colleague your data-set(s) and
your lbsh lab-book, and they can understand, reproduce, and verify
your results with Pound-Shell's automated tools.

In essence, you simply run "lbsh" and it automatically creates a
new shell (of the same type that you already use: bash, tcsh, etc.)
in a child process. Figure 1
shows how your session passes through
Pound-Shell to the real shell underneath. When you decide you
want to start logging commands for an experiment, you
tell Pound-Shell to start logging. Pound-Shell then simply
records your commands (and also traces into the above-mentioned
tools). It does not record passwords or anything else
that is not output to the screen. You can use tab completion, issue
commands that span multiple lines, use shell history, etc.
When you are done, you tell
Pound-Shell to stop logging. You may then do whatever you like, and
Pound-Shell will quietly wait. You may repeat this cycle as needed
without having to restart Pound-Shell each time.
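
The start/stop cycle described above can be pictured as a small state
machine. The following Python sketch is an illustration under stated
assumptions, not lbsh's implementation: LabBook, Experiment, and the start,
stop, and observe methods are hypothetical names, and experiment IDs are
simply numbered in order here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    exp_id: int
    commands: List[str] = field(default_factory=list)


class LabBook:
    """Hypothetical recorder that only keeps commands while logging is on."""

    def __init__(self) -> None:
        self.experiments: List[Experiment] = []
        self._current: Optional[Experiment] = None

    @property
    def is_logging(self) -> bool:
        return self._current is not None

    def start(self) -> None:
        # Begin a new experiment unless one is already being logged.
        if self._current is None:
            self._current = Experiment(exp_id=len(self.experiments) + 1)

    def stop(self) -> None:
        # Close the current experiment; the session itself keeps running.
        if self._current is not None:
            self.experiments.append(self._current)
            self._current = None

    def observe(self, command: str) -> None:
        # Commands typed while logging is off are silently ignored.
        if self._current is not None:
            self._current.commands.append(command)


book = LabBook()
book.observe("ls")                          # not logging yet: ignored
book.start()
book.observe("sort data.txt > sorted.txt")  # recorded in experiment 1
book.stop()
book.observe("cd /tmp")                     # ignored again
book.start()                                # second experiment, same session
book.observe("wc -l sorted.txt")
book.stop()
```

The point of the sketch is that the same LabBook object survives across
several start/stop cycles, mirroring the fact that Pound-Shell does not need
to be restarted between experiments.
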

Pound-Shell is intended to provide the same utility as a
scientist's lab-book. After processing, analyzing, and
producing data, it is often very desirable to know
"exactly" what one did to arrive at the current state.
Sometimes we would like to recreate an old graph with
newer data, but it's been so long we're not sure how we
got it the first time. In other fields (such as the
natural sciences) scientists take notes, but at the shell we often
type very fast and pursue numerous fruitless avenues, so manual
notes rarely keep up.
Pound-Shell allows users to log each of their sessions (even as
they enter the other programs listed above) and recreate
an entire session to arrive at the same results.

The advances that Pound-Shell offers are:

- It provides its utility without requiring the user to
  behave any differently. In a sense, users are freed from
  needing to learn the diligence practiced by other
  scientists in their labs. In other words, it should make
  people's lives easier without being painful.
- It has processing scripts (file-provenance.pl,
  exeggutor.pl, etc.) that allow one to specify a file
  name (such as a graph) and see all of the experiments
  that were needed to derive its data. The history
  (or provenance) of a file can be seen as an ordered list
  of experiment IDs or as a Dataset Derivation Graph (or
  DDG), like this one.
- It can be used to re-run experiments with new or old
  data, using a file's provenance, to update findings or
  reproduce prior results.
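
To illustrate how a file's provenance might be derived as an ordered list of
experiment IDs, here is a minimal Python sketch. It assumes (this is not
stated above) that each logged experiment is known by its ID, the files it
read, and the files it wrote, and that the log is in chronological order.
The provenance function and the file names are hypothetical; this is not the
logic of file-provenance.pl.

```python
from typing import Iterable, List, Tuple

# One log entry: (experiment ID, files read, files written).
LogEntry = Tuple[int, Iterable[str], Iterable[str]]


def provenance(target: str, log: List[LogEntry]) -> List[int]:
    """Return the IDs of the experiments needed to derive `target`, oldest first.

    The log is walked backwards in time: an experiment is kept if it wrote a
    file that is currently needed, and its inputs then become needed too.
    """
    needed = {target}
    chain = []
    for exp_id, inputs, outputs in reversed(log):
        if needed & set(outputs):
            chain.append(exp_id)
            needed |= set(inputs)
    chain.reverse()
    return chain


# Hypothetical lab-book contents, in the order the experiments were run.
log = [
    (1, ["raw.dat"], ["clean.dat"]),
    (2, ["other.dat"], ["unrelated.txt"]),
    (3, ["clean.dat"], ["stats.txt"]),
    (4, ["stats.txt"], ["graph.png"]),
]

history = provenance("graph.png", log)  # [1, 3, 4]
```

The (inputs, experiment, outputs) links followed here are the same
relationships one would draw as edges in a derivation graph. Re-running the
returned experiments in order, with new input files substituted, is one way
the re-run feature could be pictured.
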