Commit 0a446263 authored by Kirill Smelkov

ERP5 and Jupyter integrated together

This patch teaches the ERP5 software release to automatically instantiate the
Jupyter notebook web UI and tune it to connect to ERP5 by default. When Jupyter
is enabled, it also installs the on-server erp5_data_notebook bt5 (erp5!29),
which handles code execution requested by Jupyter.

For ERP5 - for security and backward compatibility reasons - Jupyter
instantiation and erp5_data_notebook bt5 install happen only if jupyter is
explicitly enabled in instance parameters. The default is not to have Jupyter
out of the box.

On the other hand for Wendelin SR, which inherits from ERP5 SR, the
default is to have Jupyter out of the box, because Wendelin SR is fresh
enough without lots of backward compatibility needs, and Jupyter is
usually very handy for people who use Wendelin.
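
The enable/disable rule described above can be sketched in Python roughly as follows. This is only an illustration, not code from the patch: the function name `resolve_has_jupyter` is hypothetical, and unlike the Jinja2 expression in instance-erp5.cfg.in (which calls `.lower()` on the value directly), this sketch also accepts a JSON boolean, which is an assumption about how the parameter may arrive.

```python
def resolve_has_jupyter(slapparameter_dict, jupyter_enable_default):
    """Decide whether a Jupyter instance should be requested.

    slapparameter_dict     -- instance parameters as a dict
    jupyter_enable_default -- per-SR default ('false' for ERP5,
                              'true' for Wendelin in this patch)
    """
    jupyter_dict = slapparameter_dict.get('jupyter', {})
    enable = jupyter_dict.get('enable', jupyter_enable_default)
    # Assumption: tolerate a real boolean in addition to buildout-style strings.
    if isinstance(enable, bool):
        return enable
    return str(enable).lower() in ('true', 'yes')


# ERP5 SR: nothing requested, so Jupyter stays off.
erp5_plain = resolve_has_jupyter({}, 'false')
# ERP5 SR: explicitly enabled in instance parameters.
erp5_enabled = resolve_has_jupyter({'jupyter': {'enable': 'true'}}, 'false')
# Wendelin SR: nothing requested, so the SR default applies.
wendelin_plain = resolve_has_jupyter({}, 'true')
```

The point of keeping the default outside the function is the same as in the patch: an inheriting software release only has to override one value.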

~~~~

For integration, we reuse the infrastructure already established in ERP5 for
requesting various slave instances, and request Jupyter in such a way that it
is automatically tuned and connected to the balancer of one of the Zope families.

Jupyter code itself is compiled by reusing
software/ipython_notebook/software.cfg, and Jupyter instance code is
reused by hooking software/ipython_notebook/instance.cfg.in into ERP5 SR
properly (the idea to override instance-jupyter not to render into
default template.cfg is taken from previous work by @tiwariayush).

~~~~
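
The choice of which Zope family Jupyter connects to can be sketched in Python as below. The function names are hypothetical and the code is not part of the patch. It assumes that dictionaries keep insertion order (Python 3.7+), so "first" means the first entry of `zope-partition-dict`; the Jinja2 template in the patch simply takes the first family it iterates over.

```python
def pick_jupyter_zope_family(slapparameter_dict):
    """Return the Zope family Jupyter should connect to ('' if none)."""
    jupyter_dict = slapparameter_dict.get('jupyter', {})
    explicit = jupyter_dict.get('zope-family', '')
    if explicit:
        # An explicitly configured family always wins.
        return explicit
    # Same fallback as the template: one partition with default settings.
    zope_partition_dict = slapparameter_dict.get('zope-partition-dict', {'1': {}})
    for zope_parameter_dict in zope_partition_dict.values():
        # The first family encountered becomes the default.
        return zope_parameter_dict.get('family', 'default')
    return ''


def balancer_url_key(zope_family):
    """Buildout key whose value is passed to Jupyter as erp5-url."""
    return 'request-balancer:connection-' + zope_family


family = pick_jupyter_zope_family(
    {'zope-partition-dict': {'a': {'family': 'public'},
                             'b': {'family': 'admin'}}})
key = balancer_url_key(family)
```

With no Zope families configured at all, the result is the `default` family, which matches the comment in the template.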

I tested this patch inside webrunner with create-erp5-site software type and
various configurations (whether to have or not have jupyter, to which zope
family to connect it, etc).

I have not tested frontend instantiation fully - because tests were done only
in webrunner, but I've tried to make sure generated buildout code is valid for
cases with frontend.

NOTE the code in this patch depends on the erp5_data_notebook bt5 (erp5!29), which was merged into erp5.git recently (see erp5@f662b5a2).

NOTE even when erp5_data_notebook bt5 is installed, on a freshly installed ERP5, it
is required to "check site consistency" first, so that initial bt5(s) are
actually installed and erp5 is ready to function.

/cc @vpelletier, @Tyagov, @klaus, @Camata, @tiwariayush, @Kreisel, @jerome, @nexedi
/proposed-for-review-on !43
parent 7b199c9c
......@@ -39,6 +39,7 @@ This software release assigns the following port ranges by default:
zeo 2100-2149
balancer 2150-2199
zope 2200-*
jupyter 8888
==================== ==========
Non-zope partitions are unique in an ERP5 cluster, so you shouldn't have to
......
......@@ -235,6 +235,22 @@
"type": "object"
},
"type": "array"
},
"jupyter": {
"description": "Jupyter slave instance parameters",
"properties": {
"enable": {
"description": "Whether to enable creation of associated slave Jupyter instance",
"default": false,
"type": "boolean"
},
"zope-family": {
"description": "Zope family to connect Jupyter to by default",
"default": "<first instantiated Zope family>",
"type": "string"
}
},
"type": "object"
}
}
}
......@@ -40,6 +40,11 @@
"description": "Relational database access information",
"type": "string"
}
"jupyter-url": {
  • Please take care not to write invalid JSON like this one.

  • @rafael thanks for the feedback. I'm very new to JSON. Could you please clarify what is wrong here, and how one can verify with a tool whether JSON is valid?

    Thanks beforehand,
    Kirill

  • @kirr : Hi, just from a look, I think you missed the comma (,) here after the closing bracket before jupyter-url key.

  • Yes this should be it. For validation there are tools online (e.g. http://jsonlint.com/) and of course if you try to parse this through Python (https://docs.python.org/2/library/json.html) it will crash.

  • @tiwariayush & @georgios.dagkakis thanks for the good spot, which I had missed. I've installed jsonlint[1] and ran it over the JSON files:

    kirr@teco:~/src/wendelin/slapos/slapos-master/software/erp5$ for j in *.json; do jsonlint-php $j ; done
    Valid JSON
    Valid JSON
    instance-erp5-output-schema.json: Parse error on line 42:
    ... "string"    }    "jupyter-url": {   
    --------------------^
    Expected one of: 'EOF', '}', ',', ']'
    Valid JSON
    Valid JSON
    Valid JSON
    Valid JSON

    so this is the only mistake (and thanks - next time I'll know how to validate).

    I was going to fix this, but @rafael already did in 4dea2f33.

    Thanks again,
    Kirill

    [1] https://packages.debian.org/sid/jsonlint
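
The Python route mentioned in this thread can be sketched as follows. It is a minimal illustration using only the standard `json` module; the helper name `find_invalid_json` is hypothetical, and the error text it reports differs from the jsonlint output shown above.

```python
import json


def find_invalid_json(paths):
    """Return {path: error message} for every file that fails to parse."""
    errors = {}
    for path in paths:
        try:
            with open(path) as f:
                json.load(f)
        except ValueError as e:
            # json.JSONDecodeError is a subclass of ValueError on Python 3;
            # Python 2 raises a plain ValueError.
            errors[path] = str(e)
    return errors


# Possible usage from the software/erp5 directory:
#   import glob
#   for path, message in sorted(find_invalid_json(glob.glob('*.json')).items()):
#       print('%s: %s' % (path, message))
```

A missing comma between two keys, as in the hunk above, makes `json.load` raise, so the file ends up in the returned dictionary.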

"description": "Jupyter notebook web UI access information",
"type": "string",
"optional": true
}
},
"patternProperties": {
"family-.*": {
......
......@@ -41,6 +41,10 @@ url = ${slap-connection:server-url}
key = ${slap-connection:key-file}
cert = ${slap-connection:cert-file}
# ERP5 URL to use in Jupyter by default
# default value is empty - which means no default ERP5 URL
configuration.erp5-url =
[instance-parameter]
port = 8888
host = ${slapconfiguration:ipv6-random}
......@@ -141,6 +145,7 @@ rendered = ${directory:erp5_kernel_dir}/ERP5kernel.py
# Use ipython as executable python as we'll be needing requests library in kernel
context =
raw python_executable {{ bin_directory }}/ipython
key erp5_url slapconfiguration:configuration.erp5-url
[kernel-json]
<= dynamic-jinja2-template-base
......
......@@ -43,7 +43,7 @@ md5sum = d7d4a7e19d55bf14007819258bf42100
[erp5-kernel]
<= download-file-base
filename = ERP5kernel.py.jinja
md5sum = da2f592075c414d4bb26cf7a7dfd147b
md5sum = e6d7d9cee8095fb8788af3c6d82b9b6f
[kernel-json]
<= download-file-base
......@@ -60,7 +60,7 @@ recipe = slapos.recipe.template:jinja2
template = ${:_profile_base_location_}/instance.cfg.in
rendered = ${buildout:directory}/template.cfg
mode = 0644
md5sum = 0186ead0c3596e847b69ff8040a43a2f
md5sum = c6b82a386a72ed72301302c3132ffb71
context =
key bin_directory buildout:bin-directory
key develop_eggs_directory buildout:develop-eggs-directory
......
......@@ -9,9 +9,11 @@ import requests
import json
# erp5_url from buildout
# TODO: Uncomment after adding automated installation of erp5-data-notebook bt5
# url = ""
# url = "%s/erp5/Base_executeJupyter"%url
erp5_url = "{{ erp5_url }}"
if not erp5_url:
erp5_url = None
else:
erp5_url = "%s/erp5/Base_executeJupyter" % erp5_url
class MagicInfo:
"""
......@@ -69,9 +71,12 @@ class ERP5Kernel(Kernel):
super(ERP5Kernel, self).__init__(*args, **kwargs)
self.user = user
self.password = password
# Use URL provided by buildout during initiation
# By default use URL provided by buildout during initiation
# It can later be overridden
self.url = url
if url is None:
self.url = erp5_url
else:
self.url = url
self.status_code = status_code
self.reference = None
self.title = None
......
[buildout]
versions = versions
extends =
../../software/ipython_notebook/software.cfg
../../component/fluentd/buildout.cfg
../../component/matplotlib/buildout.cfg
../../component/ipython/buildout.cfg
../../component/pandas/buildout.cfg
../../component/wendelin.core/buildout.cfg
../../component/msgpack-python/buildout.cfg
......@@ -44,6 +42,10 @@ repository_id_list += wendelin
# we need to override it
list = ${erp5:location}/bt5 ${erp5:location}/product/ERP5/bootstrap ${wendelin:location}/bt5/
# Jupyter is by default enabled in Wendelin
[erp5-defaults]
jupyter-enable-default = true
[wendelin]
<= erp5
repository = https://lab.nexedi.com/nexedi/wendelin.git
......
......@@ -49,6 +49,7 @@ extends =
../../component/findutils/buildout.cfg
../../component/userhosts/buildout.cfg
../../component/postfix/buildout.cfg
../../software/ipython_notebook/software.cfg
../../software/neoppod/software-common.cfg
# keep neoppod extends last
......@@ -123,6 +124,15 @@ parts +=
# Create instance template
template
# jupyter
ipython-notebook
instance-jupyter
monitor-eggs
# override instance-jupyter not to render into default template.cfg
[instance-jupyter]
rendered = ${buildout:directory}/template-jupyter.cfg
[download-base]
<= download-base-neo
url = ${:_profile_base_location_}/${:filename}
......@@ -220,7 +230,7 @@ recipe = slapos.recipe.template:jinja2
# XXX: "template.cfg" is hardcoded in instanciation recipe
rendered = ${buildout:directory}/template.cfg
template = ${:_profile_base_location_}/instance.cfg.in
md5sum = 540956c635acc9707045510c11f80016
md5sum = 98a4edfb18cfd810ea570f56d502a2cc
mode = 640
context =
key mariadb_link_binary template-mariadb:link-binary
......@@ -250,6 +260,7 @@ context =
key haproxy_location haproxy:location
key instance_common_cfg instance-common:rendered
key jsl_location jsl:location
key jupyter_enable_default erp5-defaults:jupyter-enable-default
key kumo_location kumo:location
key libICE_location libICE:location
key libSM_location libSM:location
......@@ -283,6 +294,7 @@ context =
key template_create_erp5_site_real template-create-erp5-site-real:target
key template_erp5 template-erp5:target
key template_haproxy_cfg template-haproxy-cfg:target
key template_jupyter_cfg instance-jupyter:rendered
key template_kumofs template-kumofs:target
key template_mariadb template-mariadb:target
key template_mariadb_initial_setup template-mariadb-initial-setup:target
......@@ -314,7 +326,7 @@ rendered = ${monitor-template-dummy:target}
[template-erp5]
<= download-base
filename = instance-erp5.cfg.in
md5sum = 977119d0b876df827c97bb64e6e98273
md5sum = 66edf64eeaecded8977459acb26f4424
[template-zeo]
<= download-base
......@@ -384,6 +396,11 @@ update-command = ${:command}
[erp5_repository_list]
repository_id_list = erp5
# ERP5 defaults, which can be overridden in inheriting recipes (e.g. wendelin)
[erp5-defaults]
# Jupyter is by default disabled in ERP5
jupyter-enable-default = false
[erp5]
recipe = slapos.recipe.build:gitclone
repository = http://git.erp5.org/repos/erp5.git
......
......@@ -5,6 +5,9 @@
{% set inituser_login = slapparameter_dict.get('inituser-login', 'zope') -%}
{% set publish_dict = {'site-id': site_id, 'inituser-login': inituser_login} -%}
{% set has_posftix = slapparameter_dict.get('smtp', {}).get('postmaster') -%}
{% set jupyter_dict = slapparameter_dict.get('jupyter', {}) -%}
{% set has_jupyter = jupyter_dict.get('enable', jupyter_enable_default).lower() in ('true', 'yes') -%}
{% set jupyter_zope_family = jupyter_dict.get('zope-family', '') -%}
[request-common]
<= request-common-base
config-use-ipv6 = {{ dumps(slapparameter_dict.get('use-ipv6', False)) }}
......@@ -119,7 +122,11 @@ name = neo-${gen-neo-cluster-base:passwd}
return =
zope-address-list
hosts-dict
config-bt5 = {{ dumps(slapparameter_dict.get('bt5', 'erp5_full_text_myisam_catalog erp5_configurator_standard erp5_configurator_maxma_demo erp5_configurator_ung erp5_configurator_run_my_doc')) }}
{% set bt5_default_list = 'erp5_full_text_myisam_catalog erp5_configurator_standard erp5_configurator_maxma_demo erp5_configurator_ung erp5_configurator_run_my_doc' -%}
{% if has_jupyter -%}
{% set bt5_default_list = bt5_default_list + ' erp5_data_notebook' -%}
  • Given the warning about security in the description of erp5_data_notebook, installing it by default is questionable, in particular now that automatic creation of site is going to be the default.

    /cc @Tyagov @luke @Thetechguy

  • Indeed questionable. Still, access from Jupyter to ERP5 is user/password protected, and in its current implementation ERP5Kernel (i.e. Jupyter) just makes HTTP requests to ERP5. The same can be done with any client, Python code, browser, etc. So the question is: is HTTP user/password good enough protection?

  • I'm not a Jupyter/ERP5 user, but I recall that Jupyter can also open shells. Are shells user/password-protected too ? Or did we disable this feature ?

  • @Nicolas, thanks, that's a good catch. I think the Python2 kernel must be disabled (or changed?), as it can execute shell commands and can also use the os module to start processes. Of course this happens under the current user running Jupyter (which itself is password protected), but it is still not safe.

  • I would like any person who is installing ERP5 to be able to use Jupyter together with ERP5 without any significant effort. The way this should be achieved can be discussed. For example, Jupyter runtime can run in a separate partition and can be instantiated only under certain conditions. ERP5 Jupyter kernel should be present by default, but it may need some steps to be activated (ex. adding an Extension in portal_components).

    The resulting situation should be that if one decides to use Wendelin, slapos request will do what has to be done to get Jupyter running, and the Wendelin configurator will also do what has to be done in terms of component installation for Jupyter to call ERP5 properly. Users should then get an "all-in-one" solution to start doing data science with Jupyter as UI and ERP5 as storage / processing.

    And if one does not want Jupyter to run around, this should also be possible.

    I view the use of Jupyter as something that we are going to do increasingly in all our work. How this will be done is still open. Luke should have good ideas about this.

    I am for example OK to run Jupyter in a separate SlapOS instance (just like a Webrunner) because there are many uses of Jupyter beyond ERP5 and also because Jupyter will eventually replace Webrunner.

    I am also OK to run Jupyter as a library inside ERP5 because it can be very useful to produce certain types of reports using activities.

    I hope this gives some hints.

{% endif -%}
config-bt5 = {{ dumps(slapparameter_dict.get('bt5', bt5_default_list)) }}
config-bt5-repository-url = {{ dumps(slapparameter_dict.get('bt5-repository-url', local_bt5_repository)) }}
config-cloudooo-url = ${request-cloudooo:connection-url}
config-deadlock-debugger-password = ${publish-early:deadlock-debugger-password}
......@@ -150,10 +157,17 @@ config-tidstorage-port = ${request-zodb:connection-tidstorage-port}
software-type = zope
{% set zope_family_dict = {} -%}
{% set jupyter_zope_family_default = [] -%}
{% for custom_name, zope_parameter_dict in slapparameter_dict.get('zope-partition-dict', {'1': {}}).items() -%}
{% set partition_name = 'zope-' ~ custom_name -%}
{% set section_name = 'request-' ~ partition_name -%}
{% do zope_family_dict.setdefault(zope_parameter_dict.get('family', 'default'), []).append(section_name) -%}
{% set zope_family = zope_parameter_dict.get('family', 'default') -%}
{# # default jupyter zope family is first zope family. -#}
{# # use list.append() to update it, because in jinja2 set changes only local scope. -#}
{% if not jupyter_zope_family_default -%}
{% do jupyter_zope_family_default.append(zope_family) -%}
{% endif -%}
{% do zope_family_dict.setdefault(zope_family, []).append(section_name) -%}
[{{ section_name }}]
<= request-zope-base
name = {{ partition_name }}
......@@ -168,6 +182,12 @@ config-port-base = {{ dumps(zope_parameter_dict.get('port-base', 2200)) }}
config-webdav = {{ dumps(zope_parameter_dict.get('webdav', False)) }}
{% endfor -%}
{# if not explicitly configured, connect jupyter to first zope family, which -#}
{# will be 'default' if zope families are not configured also -#}
{% if not jupyter_zope_family and jupyter_zope_family_default -%}
{% set jupyter_zope_family = jupyter_zope_family_default[0] -%}
{% endif -%}
{# We need to concatenate lists that we cannot read as lists, so this gets hairy. -#}
{% set zope_address_list_id_dict = {} -%}
{% set zope_family_parameter_dict = {} -%}
......@@ -190,6 +210,20 @@ config-url = ${request-balancer:connection-{{ family_name }}-v6}
{% endif -%}
{% endfor -%}
{% if has_jupyter -%}
{# request jupyter connected to balancer of proper zope family -#}
{{ request('jupyter', 'jupyter', 'jupyter', {}, key_config={'erp5-url': 'request-balancer:connection-' ~ jupyter_zope_family}) }}
{% if has_frontend -%}
[frontend-jupyter]
<= request-frontend-base
name = frontend-jupyter
config-url = ${request-jupyter:connection-url}
{# # override jupyter-url in publish_dict with frontend address -#}
{% do publish_dict.__setitem__('jupyter-url', '${frontend-jupyter:connection-site_url}') -%}
{% endif -%}
{%- endif %}
{% set balancer_dict = slapparameter_dict.get('balancer', {}) -%}
[request-balancer]
<= request-common
......
......@@ -64,6 +64,7 @@ extra-context =
import urllib urllib
[dynamic-template-erp5-parameters]
jupyter-enable-default = {{ jupyter_enable_default }}
local-bt5-repository = {{ local_bt5_repository }}
[dynamic-template-erp5]
......@@ -71,6 +72,7 @@ local-bt5-repository = {{ local_bt5_repository }}
template = {{ template_erp5 }}
filename = instance-erp5.cfg
extra-context =
key jupyter_enable_default dynamic-template-erp5-parameters:jupyter-enable-default
key local_bt5_repository dynamic-template-erp5-parameters:local-bt5-repository
import urlparse urlparse
import-list =
......@@ -177,6 +179,11 @@ filename = instance-create-erp5-site.cfg
extra-context =
section parameter_dict dynamic-template-create-erp5-site-parameters
# we need this value to be present in a section,
# for slapos.cookbook:switch-softwaretype to work
[dynamic-template-jupyter]
rendered = {{ template_jupyter_cfg }}
[switch-softwaretype]
recipe = slapos.cookbook:switch-softwaretype
override = {{ dumps(override_switch_softwaretype |default) }}
......@@ -195,3 +202,4 @@ postfix = dynamic-template-postfix:rendered
zodb-zeo = dynamic-template-zeo:rendered
zodb-neo = neo-storage-mysql:rendered
zope = dynamic-template-zope:rendered
jupyter = dynamic-template-jupyter:rendered