Usage
Use case
You use Jupyter notebooks and:
- nbconvert to convert .ipynb files to .html files
- nbstripout to avoid committing (potentially sensitive) data to git and to get proper git diffs on notebooks (only showing changes in code).
If you forget to run nbconvert, or run the two tools in the wrong order (nbstripout before nbconvert), you have to re-run your notebooks before you can export HTML. This is annoying when the notebooks are long-running, and it happens quite often when you use nbstripout as a pre-commit hook.
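For reference, the manual workflow that nb_prep automates amounts to running the two tools in a fixed order. The sketch below is a hypothetical illustration, not nb_prep's code. `export_then_strip` and its injectable `run` parameter are made-up names. `jupyter nbconvert --to html <file>` and `nbstripout <file>` are the tools' standard command lines.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes one command given as a list of arguments.
Runner = Callable[[Sequence[str]], object]


def default_runner(cmd: Sequence[str]) -> object:
    # check=True makes a failing tool raise CalledProcessError.
    return subprocess.run(list(cmd), check=True)


def export_then_strip(notebook: str, run: Runner = default_runner) -> List[List[str]]:
    """Export a notebook to HTML, then strip its outputs (hypothetical helper)."""
    commands = [
        # 1. Export first, while the notebook still contains its cell outputs.
        ["jupyter", "nbconvert", "--to", "html", notebook],
        # 2. Only then strip outputs in place, so the .ipynb is safe to commit.
        ["nbstripout", notebook],
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

Injecting `run` keeps the sketch testable without Jupyter installed. Swapping the two commands reproduces the failure described above: the HTML export would contain no outputs.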
nb_prep automates this: it processes notebooks in the correct order and (optionally) stores versioned output in an output directory.
Using as a CLI
The CLI command nb_prep process takes a list of directories and/or files to find and process notebooks. For each notebook:
- nbconvert is used to create an <filename>.html export
- A date prefix is added: YYYYMMDD_<filename>.html (can be turned off)
- A placeholder for the git hash is added: YYYYMMDD_<filename>_NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER.html
- The .html file is moved to an output-dir (if specified)
- nbstripout is used to strip output from the .ipynb file
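The per-notebook steps above can be sketched in Python. This is a minimal illustration, not nb_prep's implementation. `find_notebooks`, `html_name`, `output_path` and the `add_date` flag are hypothetical names. The sketch also assumes that the hash placeholder is still appended when the date prefix is turned off.

```python
import datetime as dt
from pathlib import Path
from typing import Iterable, List, Optional

# Placeholder string as it appears in the filenames in this document.
PLACEHOLDER = "NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER"


def find_notebooks(paths: Iterable[str]) -> List[Path]:
    """Collect .ipynb files from a mix of files and directories (hypothetical)."""
    found: List[Path] = []
    for p in map(Path, paths):
        if p.is_dir():
            # Search directories recursively for notebooks.
            found.extend(sorted(p.rglob("*.ipynb")))
        elif p.suffix == ".ipynb":
            found.append(p)
    return found


def html_name(notebook: Path, date: Optional[dt.date] = None, add_date: bool = True) -> str:
    """Build YYYYMMDD_<filename>_<PLACEHOLDER>.html for a notebook."""
    stem = notebook.stem  # "my_notebook" for "my_notebook.ipynb"
    if add_date:
        date = date or dt.date.today()
        stem = f"{date:%Y%m%d}_{stem}"
    return f"{stem}_{PLACEHOLDER}.html"


def output_path(notebook: Path, output_dir: Optional[str] = None, **kwargs) -> Path:
    """Place the HTML next to the notebook, or in output_dir if one is given."""
    target_dir = Path(output_dir) if output_dir else notebook.parent
    return target_dir / html_name(notebook, **kwargs)
```

For example, `html_name(Path("my_notebook.ipynb"), date=dt.date(2022, 1, 1))` gives the placeholder filename used in the rename example that follows.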
Now you can git add and git commit the changed notebook files. You can then use nb_prep rename to insert the commit hash into the .html filenames. For example:
20220101_my_notebook_NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER.html -> 20220101_my_notebook_eac9e43.html
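nb_prep rename does this for you. The hypothetical Python sketch below only illustrates the idea and is not nb_prep's code. `short_commit_hash` and `rename_placeholders` are made-up names, and recursing into subdirectories is an assumption. `git rev-parse --short HEAD` is the standard git command for the abbreviated hash of the latest commit.

```python
import subprocess
from pathlib import Path
from typing import List, Optional

PLACEHOLDER = "NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER"


def short_commit_hash(repo: str = ".") -> str:
    """Return the abbreviated hash of HEAD, e.g. 'eac9e43'."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def rename_placeholders(directory: str, commit_hash: Optional[str] = None) -> List[Path]:
    """Replace the placeholder in every matching .html filename under `directory`."""
    # Default to the current repository's HEAD (hypothetical behavior).
    commit_hash = commit_hash or short_commit_hash()
    renamed: List[Path] = []
    # sorted() materializes the matches before any file is renamed.
    for html in sorted(Path(directory).rglob(f"*{PLACEHOLDER}*.html")):
        target = html.with_name(html.name.replace(PLACEHOLDER, commit_hash))
        html.rename(target)
        renamed.append(target)
    return renamed
```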
Tip
Add *.html to a .gitignore in the root of your repository to avoid committing HTML files to git.
Setting up as a pre-commit hook
You can set up this entire workflow once as a pre-commit hook and get an up-to-date analysis output directory for free at the specified --output-dir. Schematically:

You need to update the .pre-commit-config.yaml in your repository to include nb_prep:
repos:
  - repo: https://github.com/allianz-direct/nb_prep
    rev: v1.0.2
    hooks:
      - id: nb_prep_precommit
      - id: nb_prep_postcommit
You need to install the pre-commit and the post-commit hooks separately:
pre-commit install
pre-commit install --hook-type post-commit
When you commit a notebook, you might see something like:
git add notebook.ipynb
git commit -m "Add notebook"
# nb_prep (pre-commit; process notebooks)...................................Failed
# - hook id: nb_prep_precommit
# - files were modified by this hook
# nb_prep (post-commit; replace hash placeholder in .html filenames)........Passed
The pre-commit hook reports Failed because it modified files: nb_prep has used nbstripout to overwrite notebook.ipynb. It has also created a file in the output directory named something like 20211026_notebook_NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER.html.
Re-add and re-commit the notebook:
git add notebook.ipynb
git commit -m "Add notebook"
# nb_prep (pre-commit; process notebooks)...................................Passed
# nb_prep (post-commit; replace hash placeholder in .html filenames)........Passed
Because the output file already exists, nb_prep does not overwrite it. Converting the now-stripped notebook again would produce an HTML file without any cell outputs.
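The skip logic can be sketched as follows. This assumes the check looks for the placeholder-named export in the output directory. The document only says that the output file already exists, so the exact check nb_prep performs is an assumption. `pending_html` and `should_convert` are hypothetical names.

```python
import datetime as dt
from pathlib import Path
from typing import Optional

PLACEHOLDER = "NBCONVERT_RENAME_COMMITHASH_PLACEHOLDER"


def pending_html(notebook: Path, output_dir: Path, date: Optional[dt.date] = None) -> Path:
    """Hypothetical: where the not-yet-renamed HTML export for `notebook` lives."""
    date = date or dt.date.today()
    return output_dir / f"{date:%Y%m%d}_{notebook.stem}_{PLACEHOLDER}.html"


def should_convert(notebook: Path, output_dir: Path, date: Optional[dt.date] = None) -> bool:
    """Skip conversion when an export already exists.

    After the first (failed) commit the notebook on disk is already stripped,
    so converting it again would replace a good export with an empty one.
    """
    return not pending_html(notebook, output_dir, date).exists()
```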
Now you've committed a clean, stripped version of notebook.ipynb, and you have a local snapshot of your notebook named something like 20211026_notebook_eac9e43.html.