This server includes private MRI and microscopy datasets, which have been curated and organized according to the BIDS convention.
git+ssh://data.neuro.polymtl.ca has a max size of ~1TB.
It hosts BIDS datasets, version-controlled using
It is locked behind a VPN because much of our data is under medical ethics protections, and needs to be kept off the general internet.
You must have a *nix OS with
Make sure you have an ssh key.
If not, run
ssh-keygen -t ed25519 -C firstname.lastname@example.org. Your keys will be in the hidden folder
Getting an account
Send your ssh public key – that is, the contents of
~/.ssh/id_ed25519.pub (the .pub file) – to one of the server admins and ask them to create your account.
A pubkey should look like
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDE+b5vj+WvS5l6j56NF/leMpC2xT7JUCMUWDAqvWoVmNZ7UR3dGXQeTPTlmPmxPGD2Hk9/zFzxO2kYOt9o4lHQ0QQSKLUmTyuieyJE26wL1ZiLilmTgvgMxxkxvInF/Vr78V5Ll72zAmXzUxVSvuDGY2GRjnLreYheiqg1F3xTuD68uWInX8ZwA7NDtKpoZ7Aat063vD79WBrtiCfvAMbM8QhC3294zxqAjjy9fxs+TMTqAxtKdaWCA/eCs7sx9uvtFcj2Q9jxCMB3br5HyPLotgJMoIMt+fywj+vQG907LODRcqm9J0+ih+38/3Y6aqECMkHA9WWIfFywwjeA7EGr email@example.com
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJwsjlem+acuTOZGyNQKjyI7kJe9ULkhZo7N04QfC/tA firstname.lastname@example.org
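Before sending a key to an admin, it can help to confirm that the file really is a public key and not the private half. The following is a minimal sketch, not an official tool: `looks_like_pubkey` is a hypothetical helper name, and it only checks that the first line matches the two key types shown above.

```shell
# Hypothetical helper: returns 0 if the first line of a file looks like an
# OpenSSH public key of one of the two types shown above.
looks_like_pubkey() {
    # Expected shape: key type, a space, a base64 blob, then an optional comment.
    head -n 1 "$1" | grep -Eq '^(ssh-rsa|ssh-ed25519) [A-Za-z0-9+/]+=*( .*)?$'
}

# Usage (assumes the default ed25519 key location):
#   looks_like_pubkey ~/.ssh/id_ed25519.pub && echo "looks like a public key"
```

This is a shape check only; it does not prove the key is valid or that it matches your private key.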
Current server admins are:
The admins should follow Admin Guide > Add Users to create your account.
Because this server contains private medical data, you need to be on campus, connected to the VPN, or working from a server on campus, like
rosenberg to access it.
If connecting from off-campus, connect to polyvpn.
🏚️ Verify connectivity by running
ping data.neuro.polymtl.ca. If you cannot ping then you need to double-check your VPN connection; make sure it is connected, make sure you can reach
joplin, and if it still isn’t working ask the Poly network admins to unblock your account from this server.
Verify you can use the server by running
ssh email@example.com help. If it hangs, triple-check your VPN connection. If it asks for
firstname.lastname@example.org's password, double-check that
ls -la ~/.ssh shows permissions of
drwx------ for the
. folder, and that the files
id_rsa.pub) exist with exactly those names. A successful connection looks like:
$ ssh email@example.com help
Enter passphrase for key '/home/kousu/.ssh/id_ed25519.neuropoly':
hello yourusername, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0

list of remote commands available:

    D
    create
    desc
    git-annex-shell
    help
    info
    keys
    perms
    readme
    writable
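The permission check on the `~/.ssh` folder described above can be scripted. This is a rough sketch under the assumption that the `drwx------` mode string is the only thing to verify; `ssh_dir_is_private` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if the directory's mode string is exactly
# drwx------ (owner-only access), the value described above.
ssh_dir_is_private() {
    dir=${1:-"$HOME/.ssh"}
    # ls -ld prints the mode string first; keep only its first 10 characters
    # so that a trailing ACL or extended-attribute marker is ignored.
    mode=$(ls -ld "$dir" | cut -c1-10)
    [ "$mode" = "drwx------" ]
}

# Usage:
#   ssh_dir_is_private || chmod 700 ~/.ssh
```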
During daily usage, you will need to be on the polyvpn network.
You should also make sure to configure git annex for the best performance.
To see what datasets you have available, use
info, for example:
ssh firstname.lastname@example.org info
And the output would look like this:
hello yourusername, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0

 R      datasets/..*
 R      datasets/basel-mp2rage
 R W    datasets/bavaria-quebec-spine-ms
 ...
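If the list is long, the datasets you can write to can be filtered out of the `info` output. The sketch below assumes the output keeps the layout shown above (permission letters first, repository name last); `writable_datasets` and the `DATA_SERVER` variable are hypothetical names, not part of the server.

```shell
# Hypothetical helper: reads `info` output on stdin and prints the names of
# repositories whose permission columns include W.
writable_datasets() {
    awk 'NF >= 2 {
        # Every field except the last is a permission letter; the last field
        # is the repository name.
        for (i = 1; i < NF; i++) {
            if ($i == "W") { print $NF; break }
        }
    }'
}

# Usage (DATA_SERVER stands for the same user@host used in the commands above):
#   ssh "$DATA_SERVER" info | writable_datasets
```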
You are identified to the server by your ssh keys, but notice that this tells you the username you are known as.
To download an existing repository use
git clone email@example.com:datasets/sct-testing-large # download folders and metadata
cd sct-testing-large
git annex get . # download images
If you just want to explore, you can download only a portion of the image files by giving specific paths in the last step, for example:
git annex get sub-karo* # download images under any of sub-karo*/*
If you have already cloned a repository and you would like to get its latest version, do:
git pull && git annex sync --no-content && git annex get .
Despite not being hosted on GitHub, we are still using a pull-request workflow.
So, to make changes to a dataset, first ask an admin to grant you upload rights, then make a working branch for your changes. If your initials are
xy and you are working on
git checkout -b xy/some-topic
# Edit your files, add new ones, etc.
# Add all modified files to be committed
git add .
# To add specific files, do: git add path/to/new/file
# Commit and write a useful commit message
git commit
The first time before uploading, verify you have access with
info. You need “W” (for “Write”) permission, like this:
ssh firstname.lastname@example.org info datasets/uk-biobank
The output would look like:
hello yourusername, this is git@data running gitolite3 3.6.11-2 (Debian) on git 2.27.0

 R W    datasets/uk-biobank
Once you have access you can:
git annex copy --to=origin
git annex sync --no-content --only-annex
git push
Finally, ask one of that dataset’s reviewers to look at your pull request by opening an issue on neuropoly/data-management.
Reviewing Pull Requests
If someone asks you to review their changes on branch
git fetch
git checkout xy/some-topic
git annex get .
Then look at the branch to see if it looks right to you.
To investigate what changed:
git log --stat master..HEAD # to see filenames
git log -p master..HEAD # to see content, commit-by-commit
git diff master..HEAD # to see content, overall
Also, it’s a good idea to run:
git annex whereis
This checks that all the annexed files have been uploaded.
git-annex is not well-suited to a pull-request flow. It is mostly designed for a single person to share data among many computers, not for multiple people to share data between a few computers. We can make it work but it needs some patience. Have a cat to make it better: 🐈🌺
In order to join this group, someone already in it needs to grant you access:
ssh email@example.com perms datasets/my-new-repo + OWNERS yourusername
You can check if you have commit rights to a dataset “my-new-repo” by seeing if you appear in the group:
ssh firstname.lastname@example.org perms datasets/my-new-repo -l | grep OWNERS
Once a branch is finalized:
git checkout master
git merge --ff-only xy/some-topic # or use git merge --squash xy/some-topic, then git commit
git push # no need for git-annex sync here, no annex files have been moved
(Optional) Clean up the branch:
git branch -d xy/some-topic
git branch -d synced/xy/some-topic # redundancy
git push origin :xy/some-topic
git push origin :synced/xy/some-topic
To make a new repo, follow this recipe.
Then, to upload it, pick a name under
datasets/, e.g. “my-new-repo”, and do
git remote add origin email@example.com:datasets/my-new-repo
git branch -M master
git push -u origin master # initialize remote and upload metadata
git annex sync --cleanup -a --no-content # initialize remote annex
git annex copy --to origin # upload images to remote annex
# verify your .nii.gz files were annexed and uploaded
git annex whereis
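As a complement to `git annex whereis`, one rough local check that the .nii.gz files were annexed is to look for any that are still regular files. This sketch assumes the repository uses git-annex's locked mode, where annexed files appear as symlinks; in unlocked mode annexed files are regular files too, so the check would report false positives. `unannexed_niftis` is a hypothetical helper name.

```shell
# Hypothetical helper: lists .nii.gz files that are regular files rather than
# symlinks, skipping the .git folder. Assumes locked-mode git-annex.
unannexed_niftis() {
    dir=${1:-.}
    find "$dir" -path "$dir/.git" -prune -o -type f -name '*.nii.gz' -print
}

# Usage, from the root of the dataset (no output means nothing suspicious):
#   unannexed_niftis
```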
To make a release, use an annotated git tag. Use the tag name for the name of the release, and the annotation for the release notes. Our naming convention for dataset releases is “rYYYYMMDD”.
For example, if today is September 8th, 2019, then to create a release do:
git tag -a r20190908
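To avoid typing the date by hand, the tag name can be generated from today's date. This is a small sketch of the “rYYYYMMDD” convention; `release_tag` and `is_release_tag` are hypothetical helper names.

```shell
# Hypothetical helper: prints a release tag for today's date, e.g. r20190908.
release_tag() {
    date +r%Y%m%d
}

# Hypothetical helper: succeeds if the argument has the rYYYYMMDD shape.
# It checks the shape only, not whether the digits form a real date.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^r[0-9]{8}$'
}

# Usage:
#   git tag -a "$(release_tag)"
```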
To view available releases, first download a dataset, then run
git tag -l
To see the release notes for a specific release, use
git show r20190908
To use a specific release, either download the dataset and then
git checkout r20190908
or, for example in a reproducible processing script, you can use
clone -b to download only that specific release:
git clone --depth 1 -b r20190908 firstname.lastname@example.org:datasets/example.git
You can grant others permissions to your repositories with
ssh email@example.com perms datasets/my-new-repo + WRITERS someone # grant someone upload rights
ssh firstname.lastname@example.org perms datasets/my-new-repo - WRITERS someone # revoke someone's upload rights
ssh email@example.com perms datasets/my-new-repo + OWNERS researcher2 # grant someone rights to add (and remove) others and to merge to master
ssh firstname.lastname@example.org perms datasets/my-new-repo -l # view users
ssh email@example.com perms datasets/my-new-repo -lr # view access rules
ssh firstname.lastname@example.org perms -h
and see https://gitolite.com/gitolite/user#setget-additional-permissions-for-repos-you-created for full details.
If you created or own a repo and decide it is no longer necessary:
ssh email@example.com D trash repo
The “trash” is cleaned out after a week. Except it’s not, yet: https://github.com/neuropoly/data-management/issues/54
Add extra devices
Like with GitHub, you can authorize any number of secondary devices.
For example, to authorize yourself from
server2, log in to
server2 and make an ssh key if one doesn’t exist (
ssh-keygen), copy it (
~/.ssh/id_rsa.pub) to a device that is already authenticated (e.g. as
~/id_rsa.server2.pub), then authorize yourself by:
cat ~/id_rsa.server2.pub | ssh firstname.lastname@example.org keys add @server2
Test it by running, from
ssh email@example.com info
Datasets are stored as git repositories on the server, with the bulk of their data also stored on the server in each repo’s “annex” folder. Using
git-annex enables data on-demand – in our default configuration, only the data needed for the active branch is actually downloaded by a user, and it is also possible for the user to choose specific folders to focus on. Datasets are
git-annex ssh remotes.
gitolite manages users and their permissions. The repositories containing datasets are under
data.neuro.polymtl.ca:datasets/*, and the server also contains a few admin-only repositories outside of
The VM is monitored here (requires VPN to connect to the dashboard monitor).
ssh firstname.lastname@example.org keys list
To grant access to a lab member, as above, ask the lab member to generate an ssh key using
ssh-keygen and have them send you the public key. Save it to a file
firstnamelastname.pub and add them with
cat firstnamelastname.pub | ssh email@example.com keys add firstnamelastname
You can also paste the key in, followed by
ctrl-d; this looks like:
ssh firstname.lastname@example.org keys add firstnamelastname
The output looks like:
Enter passphrase for key '/home/kousu/.ssh/id_rsa.github':
please supply the new key on STDIN (e.g. cat you.pub | ssh email@example.com keys add @laptop).
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAID11N3hQpJP4Okivd5xO3N0CuO24ioMwXYv+l/1PM/+z firstname.lastname@example.org
Added SHA256:hwil2tmaw/prgIBX5odO8vOAj2i38gPrUGjGZnnkVvo : firstnamelastname.pub
You should use the person’s full name as their username, in the form
firstnamelastname, with no spaces or periods or anything. It’s essentially an arbitrary string that the user doesn’t really need to know, since everyone is authenticated using just their public/private keys without supplying a username. The only time users see them is when they run
info or use
perms. We would like to use the format email@example.com, but there is a bug, so just use
firstnamelastname. Once someone is registered they can add and remove their own keys without having to know their username.
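The username convention can be applied mechanically. The sketch below is one possible way to derive a firstnamelastname string from a full name; `make_username` is a hypothetical helper name, and dropping every character outside a-z (including accented letters) is an assumption made here, not a documented rule.

```shell
# Hypothetical helper: lowercases a full name and removes everything that is
# not a plain ASCII letter (spaces, periods, hyphens, accented characters).
make_username() {
    printf '%s' "$*" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -cd 'a-z'
    printf '\n'
}

# Usage:
#   make_username "Firstname Lastname"
```

For names with accented letters, check the result by hand, since those letters are removed rather than transliterated.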
As admin, you can add or revoke any permissions to any repo using
Unfortunately, there is no way to view permissions as another user, so you will need to rely on people sending you screenshots if they are having problems. You can at least inspect the active sets of permissions on a repo with
ssh firstname.lastname@example.org perms <repo> -l
If you need to add new namespaces or finer-grained permissions, first reconsider whether the extra complexity and the risk of locking yourself out are worth it. Everything you should need to manage the lab should be doable via
ssh email@example.com help. If you are sure, then review gitolite’s permissions model and official docs for this use case, then:
git clone firstname.lastname@example.org:gitolite-admin
cd gitolite-admin
vi conf/gitolite.conf # optional: investigate/change the repo definitions
ls -R keydir/ # optional: investigate/change who has access; this *should* be unnecessary, use `keys` as above instead.
git add -u . && git commit && git push
As an admin, you can rename a repo by connecting to the server directly:
ssh email@example.com
sudo -u git -i
cd repositories/datasets/
mv $dataset.git $new_name.git
You can also delete any repo using
You can also get rid of a dataset immediately by:
ssh firstname.lastname@example.org D unlock datasets/<dataset>
ssh email@example.com D rm datasets/<dataset>
Backups are automatically made to MIC-UNF’s servers.
Except they’re not, yet: https://github.com/neuropoly/data-management/issues/20
You can access these if you need to recover by:
If you are having a problem, please open an issue here. Please don’t be shy: if you don’t report the issue, we won’t know about it and it will never be solved 😉
If the server is doing something strange, contact someone with sysadmin access to the server.
These people can investigate by following the gitolite guide in the sysadmin docs.
Patel, Hiren - Wildrepos in Gitolite – detailing how a research lab manages their code and publications collaboratively through