Jupyter notebooks and version control
Jupyter Notebook is a wonderful tool for rapid prototyping in Python and for creating quick visualisations of the data you’re working with. You can also share the resulting notebooks with other people to show what you’ve done. However, as nice as it is to work with on your own, it is hard to use collaboratively. The files themselves are stored in an arcane JSON format, which is a wonderful recipe for merge conflicts.
Fortunately, there is an extension called jupytext to remedy this. This tool allows you to save your notebooks in various alternative formats. These formats generally do not store the intermediate result data, so when you open your documents after a merge, you will need to recompute things. In this post, I will show how I set it up.
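To make the problem concrete, here is a minimal sketch using only the standard library. The notebook dictionary is a hand-written, simplified stand-in for the real file format (actual notebooks carry more metadata), and strip_results is a hypothetical helper, not part of jupytext. It illustrates that re-running the same code changes the saved JSON, while the code itself stays identical once results are left out.

```python
import copy
import json


def strip_results(notebook):
    """Return a copy of the notebook without outputs and execution counts."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# Simplified, hand-written notebook structure (an assumption for illustration).
first_run = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["0.42\n"]}
            ],
            "source": ["import random\n", "print(random.random())"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The same notebook after a colleague re-ran the cell: same code, new result.
second_run = copy.deepcopy(first_run)
second_run["cells"][0]["execution_count"] = 7
second_run["cells"][0]["outputs"][0]["text"] = ["0.87\n"]

# The serialised files differ, even though nobody touched the code.
files_differ = json.dumps(first_run) != json.dumps(second_run)
# Without the stored results, both versions are identical.
sources_match = strip_results(first_run) == strip_results(second_run)
print(files_differ, sources_match)
```

Two people who merely re-run the same cell therefore end up with diverging files, which is roughly the situation that a results-free text format avoids.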
Installing jupytext
You can install jupytext from pip or from your OS’s software repositories, whichever you prefer. After installing it, you can launch jupyter notebook and view it in your browser.
When you create a new notebook, you should see a new Jupytext entry in the File menu. Here, you have a few options, but most importantly, you can see the option to pair a notebook with a certain file type. This is where the magic happens.
Note: if this menu option does not appear, you may need to add
c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
to your jupyter config.
Pairing notebooks
Pairing a notebook effectively makes it save in multiple formats. The .ipynb file will contain your notebook as you are used to, but the other file of the pair will also contain a perfectly editable version of your work.
At the time of writing, the menu doesn’t list every available option, though; you can also save your notebooks as valid, executable Python files! This is available as what’s called a “Custom pairing”, and you can set one up by adding the following entry to your notebook’s metadata:
{
  "jupytext": {
    "encoding": "# -*- coding: utf-8 -*-",
    "formats": "py:light,ipynb"
  }
}
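If you would rather script this than edit the metadata by hand, something along these lines should work. It is only a sketch: add_jupytext_pairing and the file name demo.ipynb are hypothetical names of my own, and the code treats the notebook as plain JSON instead of going through jupytext itself. The demo at the bottom runs against a throwaway empty notebook in a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def add_jupytext_pairing(notebook_path, formats="py:light,ipynb"):
    """Write the jupytext pairing entry into an existing .ipynb file."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    # Create the top-level "metadata" object if the notebook lacks one.
    metadata = notebook.setdefault("metadata", {})
    metadata["jupytext"] = {
        "encoding": "# -*- coding: utf-8 -*-",
        "formats": formats,
    }
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return notebook


# Demo on a minimal, empty notebook (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.ipynb"
    empty_notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    demo_path.write_text(json.dumps(empty_notebook), encoding="utf-8")

    add_jupytext_pairing(demo_path)
    saved = json.loads(demo_path.read_text(encoding="utf-8"))

print(saved["metadata"]["jupytext"]["formats"])
```

It is probably safest to run this while the notebook is not open in Jupyter, so that the two do not overwrite each other’s changes.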
Now, when you save your file, a secondary .py file will appear that contains all of your comments and code.
All that remains is…
Setting up Git for your notebooks
To have your project nicely versioned by Git, you simply have to add the Python files to Git and ignore the .ipynb and related files. You can make your own life a little easier by writing a .gitignore:
.ipynb_checkpoints
*.ipynb
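If the repository already has a .gitignore, you may prefer to append these patterns without duplicating them. Below is a small sketch of that idea; ensure_ignored and NOTEBOOK_PATTERNS are hypothetical names of my own, and the demo runs in a temporary directory and not in a real repository.

```python
import tempfile
from pathlib import Path

# The two patterns from the .gitignore above.
NOTEBOOK_PATTERNS = [".ipynb_checkpoints", "*.ipynb"]


def ensure_ignored(repo_dir, patterns=NOTEBOOK_PATTERNS):
    """Append missing patterns to repo_dir/.gitignore; return the ones added."""
    gitignore = Path(repo_dir) / ".gitignore"
    if gitignore.exists():
        existing = gitignore.read_text(encoding="utf-8").splitlines()
    else:
        existing = []
    missing = [pattern for pattern in patterns if pattern not in existing]
    if missing:
        lines = existing + missing
        gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return missing


# Demo: the first call adds both patterns, the second finds nothing to add.
with tempfile.TemporaryDirectory() as tmp:
    first_call = ensure_ignored(tmp)
    second_call = ensure_ignored(tmp)
    final_lines = (Path(tmp) / ".gitignore").read_text(encoding="utf-8").splitlines()

print(first_call, second_call)
```

Note that ignoring a file does not remove it from the history: a notebook that was committed earlier stays tracked until you remove it from the index yourself.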
Combining Jupyter Notebook with version control is still not the easiest thing to do. Merges often mess up the splits between blocks, and merge conflicts are still a pain (but not impossible!) to resolve. When you and your collaborators edit unrelated parts, though, it tends to work well, so you can mostly collaborate on notebooks with Git. If you have a better idea, hit me up.