Commit 47644132 authored by Alain Takoudjou's avatar Alain Takoudjou

monitor: move scripts wrapper and logrotate conf to buildout, uses some new...

monitor: move scripts wrapper and logrotate conf to buildout, uses some new promises from slapos.toolbox

monitor was updated in slapos.toolbox to not generate promise launcher scripts anymore. All
generated scripts are now in buildout.
Monitor promise run script is removed from cron, slapgrid is used to run promises.
Replace some old promises by the new ones from slapos.toolbox. Cleanup monitor configuration.
Monitor report and monitor-promises directory are now obsolete.
parent 5d1d3c72
......@@ -86,33 +86,28 @@ wrapper-path = ${monitor-directory:reports}/monitor-collect-csv-dump
[monitor-check-cpu-usage]
recipe = slapos.cookbook:wrapper
command-line = ${monitor-directory:bin}/python {{ monitor_check_system_health }} cpu ${init-monitor-parameters:cpu-load-file}
wrapper-path = ${monitor-directory:promises}/system-CPU-load-check
wrapper-path = ${directory:promises}/system-CPU-load-check
[monitor-check-memory-usage]
recipe = slapos.cookbook:wrapper
command-line = ${monitor-directory:bin}/python {{ monitor_check_system_health }} mem ${init-monitor-parameters:mem-free-file} ${directory:monitor}
wrapper-path = ${monitor-directory:promises}/system-MEMORY-usage-check
command-line = {{ buildout_bin}}/check-computer-memory
-db ${monitor-instance-parameter:collector-db}
--threshold ${slap-parameter:memory-percent-threshold}
--unit percent
wrapper-path = ${directory:promises}/check-computer-memory-usage
[publish-connection-information]
recipe = slapos.cookbook:publish
monitor-setup-url = https://monitor.app.officejs.com/#page=settings_configurator&url=${monitor-publish-parameters:monitor-url}&username=${monitor-publish-parameters:monitor-user}&password=${monitor-publish-parameters:monitor-password}
server_log_url = ${monitor-publish-parameters:monitor-base-url}/${slap-configuration:private-hash}/
[monitor-instance-parameter]
instance-configuration =
file max-cpu-load-per-core ${init-monitor-parameters:cpu-load-file}
file min-free-mem-percent ${init-monitor-parameters:mem-free-file}
[init-monitor-parameters]
recipe = plone.recipe.command
cpu-load-file = ${directory:monitor}/cpu-load-tolerance
mem-free-file = ${directory:monitor}/mem-free-limit
command =
if [ ! -s "${:cpu-load-file}" ]; then
echo "1.5" > ${:cpu-load-file}
fi
if [ ! -s "${:mem-free-file}" ]; then
echo "7.0" > ${:mem-free-file}
echo ${slap-parameter:cpu-load-threshold} > ${:cpu-load-file}
fi
[slap-configuration]
......@@ -124,3 +119,7 @@ key = ${slap-connection:key-file}
cert = ${slap-connection:cert-file}
private-hash = ${pwgen:passwd}${pwgen32:passwd}
frontend-domain =
[slap-parameter]
cpu-load-threshold = 2.0
memory-percent-threshold = 96
......@@ -28,7 +28,7 @@ mode = 0644
recipe = slapos.recipe.build:download
url = ${:_profile_base_location_}/instance-monitor.cfg.jinja2
destination = ${buildout:directory}/template-base-monitor.cfg
md5sum = 5cd3fe3edb4a7fbbc9671f051845d752
md5sum = db36ed784c690be892bf1834444c1fe7
mode = 0644
[template-monitor-distributor]
......@@ -85,6 +85,8 @@ scripts =
monitor.genstatus
monitor.configwrite
is-process-older-than-dependency-set
check-free-disk
check-computer-memory
[monitor-eggs]
eggs +=
......
......@@ -84,66 +84,23 @@ NB: You should use double $ (ex: $${monitor-template:rendered}) instead of one $
Add a promise
-------------
Monitor stack will include slapos promise directory etc/promise to promise folder. All files in that directory will be considered as a promise.
This mean that all slapos promises will be checked frequently by monitor.
To learn how to write a promise in SlapOS, please read this document:
https://www.erp5.com/slapos-TechnicalNote.General.SlapOS.Monitoring.Specifications
[monitor-conf-parameters]
...
promise-folder = ${directory:promises}
...
Monitor will run each promise every minutes and save the result in a json file. Here is an exemple of promise result:
{"status": "ERROR", "change-time": 1466415901.53, "hosting_subscription": "XXXX", "title": "vnc_promise", "start-date": "2016-06-21 10:47:01", "instance": "XXXX-title", "_links": {"monitor": {"href": "MONITOR_PRIVATE_URL"}}, "message": "PROMISE_OUPT_MESSAGE", "type": "status"}
A promise will be ran during a short time and report the status: ERROR or OK, plus an ouput message which says what was good or bad.
The promise should not run for more that 20 seconds, else it will be interrupted because of time out. However this value can be modified from monitoring web interface, see parameter "promise-timeout" of your hosting subscription.
On slapos, the default timeout value is also 20 seconds, if the value is modified on monitor (ex: to 50 seconds), it will still fails when slapgrid will process instance if the promise execution exceed 20 seconds.
Promises result are published in web public folder, access URL is: MONITOR_BASE_URL/private/PROMISE_NAME.status.json
Everytime monitor will run a promise an history of result will be also updated. The promise history will be updated during one day, after that a new history will be created.
To access promise history file as JSON, use URL MONITOR_BASE_URL/private/PROMISE_NAME.history.json
Add a promise: monitor promise
------------------------------
Monitor promise is also a promise like normal promise script but it will be placed into the folder ${monitor-directory:promises}:
[monitor-promise-xxxxx]
recipe = slapos.recipe.template:jinja2
rendered = ${monitor-directory:promises}/my-custom-monitor-promise
Theses promises will be executed only by monitor (not slapos) every minutes and will report same infor as default promises. This is another way to
add more custom promises to check if server is overloaded, or if network start to be slow, etc...
Add custom scripts to monitor
-----------------------------
Custom script will be automatically run by the monitor and result will be reported in monitor private folder. To add your custom script in ${monitor-directory:reports} folder. Here is an example:
[monitor-check-webrunner-internal-instance]
recipe = slapos.recipe.template:jinja2
template = ${monitor-check-webrunner-internal-instance:location}/${monitor-check-webrunner-internal-instance:filename}
rendered = $${monitor-directory:reports}/$${:filename}
filename = monitor-check-webrunner-internal-instance
mode = 0744
Writing a promise consists of defining a class called RunPromise which inherits from GenericPromise class and defining methods: anomaly(), sense() and test(). Python promises should be placed into the folder etc/plugin of the computer partition.
New promises should be placed into the folder etc/plugin, legacy promise are into the folder etc/promise. Legacy promises are bash or other executable promises script which does not use GenericPromise class.
The script will be executed every minutes by default. To change, put periodicity in script name:
- monitor-check-webrunner-internal-instance_every_1_minute
- monitor-check-webrunner-internal-instance_every_25_minute
- monitor-check-webrunner-internal-instance_every_1_hour
- monitor-check-webrunner-internal-instance_every_3_hour
- ...
Slapgrid will run each promise every time a partition is processed (every minutes in theory), if the partition is up to date, slapgrid will only run promises anomaly check and save the result in a json file. Here is an exemple of promise result:
the script name should end with _every_XX_hour or _every_XX_minute. With this, we can set filename as:
{"result": {"date": "2018-03-22T15:35:07", "failed": false, "message": "buildout is OK", "type": "Test Result"}, "path": "PARTITION_DIRECTORY/etc/plugin/buildout-slappart0-status.py", "name": "buildout-slappart0-status.py", "execution-time": 0.1, "title": "buildout-slappart0-status"}
filename = monitor-check-webrunner-internal-instance_every_2_minute
The promise execution time should be short, by default promise-timeout in slapgrid is to 20 seconds. If a promise runs in more than the defined promise-timeout, the process is killed and a "promise timed out" message is returned.
JSON in the folder `PARTITION_DIRECTORY/.slapgrid/promise/result`, and promise logs are in `PARTITION_DIRECTORY/.slapgrid/promise/log`.
You can get custom script results files at MONITOR_BASE_URL/private/FILE_NAME.
Monitor will expose promise JSON result in web public folder, access URL is: `MONITOR_BASE_URL/public/promise/PROMISE_TITLE.status.json`. Log files are exposed in monitor private web folder,
access URL is: `MONITOR_BASE_URL/private/log/monitor/promise`
Add custom file or directory to monitor
---------------------------------------
......@@ -158,20 +115,22 @@ Log or others files can be added in monitor public or private directory:
${directory:log}
...
files in public directory are accessible at MONITOR_BASE_URL/public, and for private directory: MONITOR_BASE_URL/private.
files in public directory are accessible at `MONITOR_BASE_URL/public`, and for private directory: `MONITOR_BASE_URL/private`.
Monitor RSS and OPML Feed
-------------------------
Monitor promise history, RSS and OPML Feed
------------------------------------------
Monitor generate rss containing latest result for all promises, this feed will be upaded every minutes. The Rss fee URL is
MONITOR_BASE_URL/public/feed
Monitor read every minutes JSON promises result files to build Rss and full instance state. The Rss feed URL is
`MONITOR_BASE_URL/public/feed`
OPML Feed is used to aggregate many feed URL, this is used on monitor to link many single monitor instances. For example, a resilient
webrunner has 5 instances at least, each instance has a monitor which leads to 5 monitor instances too. One main instance (usally the root instance)
will collect rss feeds of all others monitor's in a single OPML file. This file is published and used to configure a monitor backend to the web interface.
The URL of OPML feed is: MONITOR_BASE_URL/public/feeds
webrunner has 3 instances at least, each instance has a monitor which leads to 3 monitor instances too. One main instance (usally the root instance)
will collect rss feed of all others monitor's in a single OPML file. This file is published and used to configure a monitor backend to the web interface.
The URL of OPML feed is: `MONITOR_BASE_URL/public/feeds`
Everytime monitor will also produce history of for each promise in a single JSON file.
To access promise history file as JSON, use URL `MONITOR_BASE_URL/public/PROMISE_TITLE.history.json`
Monitor Base web directory tree
-------------------------------
......@@ -184,16 +143,13 @@ Monitor Base web directory tree
(webdav) X Y
|
---------------------------------
| | | |
public private jio_public jio_private
X Y | |
.jio_documents .jio_documents
| |
public private
X Y
MONITOR_BASE_URL/public or private is for normal HTTPS.
MONITOR_BASE_URL/share is the webdav URL. public/ and private/ are linked to public and private directories.
webdav also has jio_public/.jio_documents and jio_private/.jio_documents which are linked to public and private directory and they works with jio webdav pluging.
Access to Monitor
......
......@@ -44,6 +44,7 @@ eggs =
collective.recipe.template
cns.recipe.symlink
slapos.toolbox
slapos.core
# Do no generate any scripts here as all of them are generated by extraeggs
scripts =
......@@ -72,22 +73,12 @@ md5sum = 1695c9a06a2b11ccfe893d7a224e489d
[monitor-conf]
<= monitor-template-base
filename = monitor.conf.in
md5sum = 888e2845d09bfaa59c25f56f5bcf76b1
[monitor-instance-info]
<= monitor-template-base
filename = instance-info.conf.in
md5sum = 1bdb4e05c6be04f4e5766c64467fbcec
md5sum = 91c4c9bba1f7df788b9b7a059ed89ac2
[monitor-httpd-cors]
<= monitor-template-base
filename = httpd-cors.cfg.in
md5sum = 683ea85fc054094248baf5752dd089bf
[monitor-check-free-disk-space]
<= monitor-template-base
filename = check_free_disk.in
md5sum = bab457ac4d139ed31d0b343a7d14d996
# End templates files
# XXX keep compatibility (with software/ipython_notebook/software.cfg )
......@@ -112,7 +103,6 @@ context =
raw monitor_configwrite ${buildout:directory}/bin/monitor.configwrite
raw monitor_conf_template ${monitor-conf:location}/${monitor-conf:filename}
raw monitor_https_cors ${monitor-httpd-cors:location}/${monitor-httpd-cors:filename}
raw monitor_instance_info ${monitor-instance-info:location}/${monitor-instance-info:filename}
raw curl_executable_location ${curl:location}/bin/curl
raw dash_executable_location ${dash:location}/bin/dash
raw dcron_executable_location ${dcron:location}/sbin/crond
......@@ -122,7 +112,7 @@ context =
raw python_executable ${buildout:executable}
raw python_with_eggs ${buildout:directory}/bin/${extra-eggs:interpreter}
raw template_wrapper ${monitor-template-wrapper:location}/${monitor-template-wrapper:filename}
raw template_check_disk_space ${monitor-check-free-disk-space:location}/${monitor-check-free-disk-space:filename}
raw check_disk_space ${buildout:directory}/bin/check-free-disk
raw bin_directory ${buildout:directory}/bin
depends =
${monitor-eggs:eggs}
......
......@@ -15,4 +15,4 @@
# not need these here).
[monitor2-template]
filename = instance-monitor.cfg.jinja2.in
md5sum = 75fe1b222c269e69226796bf6059a747
md5sum = 33cdaacb84bac4fc237aef369cc4cb59
#!{{ python_bin }}
import os
import sys
def free_space(path, fn):
while True:
try:
disk = os.statvfs(path)
return fn(disk)
except OSError:
pass
if os.sep not in path:
break
path = os.path.split(path)[0]
def user_free_space(path):
return free_space(path, lambda d: d.f_bsize * d.f_bavail)
def check_inode_usage(path):
max_inode_usage = 97.99 # < 98% usage
st = os.statvfs(path)
usage_output = ""
total_inode = st.f_files
free_inode = st.f_ffree
usage = round((float(total_inode - free_inode) / total_inode), 4) * 100
if usage > max_inode_usage:
return "Disk Inodes are widely used: %s%%" % usage
elif os.path.exists('/tmp'):
# check if /tmp is mounted on another disk than path
tmp_st = os.statvfs('/tmp')
if tmp_st.f_blocks != st.f_blocks:
tmp_usage = round((float(tmp_st.f_files - tmp_st.f_ffree) / tmp_st.f_files), 4) * 100
if tmp_usage > max_inode_usage:
return "Disk Inodes are widely used: %s%%" % tmp_usage
return ""
if __name__ == '__main__':
home_path = '{{ home_path }}'
config_file = '{{ config_file }}'
min_free_size = 1024*1024*1024*2 # 2G by default
if os.path.exists(config_file):
with open(config_file) as f:
min_size_str = f.read().strip()
if min_size_str == '0':
# disable check
print "Free disk space check is disabled\n set a number up to 0 to enable!"
exit(0)
if min_size_str.isdigit():
value = int(min_size_str)
if value >= 200:
# Minimum value is 200Mb, it's already low
min_free_size = int(min_size_str)*1024*1024
else:
with open(config_file, 'w') as f:
f.write(str(min_free_size/(1024*1024)))
real_free_space = user_free_space(home_path)
if real_free_space > min_free_size:
inode_usage = check_inode_usage(home_path)
if inode_usage:
print inode_usage
exit(2)
print "Disk usage: OK"
exit(0)
real_space_g = round(real_free_space/(1024.0*1024*1024), 2)
min_space_g = round(min_free_size/(1024.0*1024*1024), 2)
print 'Free disk space low: remaning %s G (threshold: %s G)' % (
real_space_g, min_space_g)
print 'You can modify minimum value on your monitor interface.'
exit(1)
[instance]
name = {{ instance_dict['name'] }}
root-name = {{ instance_dict['root-name'] }}
computer = {{ instance_dict['computer-id'] }}
ipv4 = {{ instance_dict['ipv4'] }}
ipv6 = {{ instance_dict['ipv6'] }}
software-release = {{ instance_dict['software-release'] }}
software-type = {{ instance_dict['software-type'] }}
partition = {{ instance_dict['partition-id'] }}
\ No newline at end of file
......@@ -11,4 +11,9 @@ monitor-url-list =
{% for key, value in monitor_base_urls.items() -%}
{{ ' ' ~ value }}
{% endfor -%}
{% endif -%}
{% endif %}
[promises]
{% for key, value in promise_parameter_dict.items() -%}
{{ key }} = {{ value.strip().replace("\n", "\n ") }}
{% endfor -%}
\ No newline at end of file
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment