Repository: Kirill Smelkov / slapos.core
Commit a0a34066 (parent e63e3832), authored Sep 17, 2014 by Rafael Monnerat
slapos.collect: Include collect basic documentation/information
File: slapos/collect/README.txt (new file, 223 additions and 0 deletions)
Collecting Data
================
The "slapos node collect" command collects data from a computer by taking a
few snapshots at different scopes and storing them (currently in sqlite3).
The snapshot scopes are:
- User Processes: Collects data from all processes of the users related to SlapOS (i.e. slapuser*)
- System Information: Collects data about the system usage and the computer hardware.
So on every slapos node collect call (performed by cron every minute),
slapos stores all the snapshots for future analysis.
User's Processes Snapshot
==========================
The collect command searches for all processes launched by all users related to
slapos [1]. Then, for each process, it uses psutil (or similar tools) to
collect all available information for every process pid [2].
Once collected, the information of every process is stored in sqlite3 [3]; in other
words, we have 1 line per pid for a given time. The pid number and the
process creation date are used to create a UID for the process, and the
command name is omitted in order to anonymize the data (so the risk of information
leak is reduced).
The process measurement only considers CPU, memory and IO operations (rw and
cycles); we are studying how to measure network usage (without being intrusive).
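The UID and anonymization idea described above can be sketched as follows. This is only an illustration using the standard library: the function names, the UID format and the sample values are assumptions, not the actual slapos.collect code.

```python
from datetime import datetime, timezone


def process_uid(pid, create_time):
    """Build an identifier from the pid and the creation timestamp only."""
    created = datetime.fromtimestamp(create_time, tz=timezone.utc)
    return "%s-%s" % (pid, created.strftime("%Y%m%dT%H%M%S"))


def anonymize(raw):
    """Keep the measured values and drop the command name."""
    return {
        "process": process_uid(raw["pid"], raw["create_time"]),
        "pid": raw["pid"],
        "cpu_percent": raw["cpu_percent"],
        "memory_rss": raw["memory_rss"],
    }


# Hypothetical raw measurement, as a tool like psutil could provide it.
sample = {"pid": 4242, "create_time": 0, "name": "mysqld",
          "cpu_percent": 1.5, "memory_rss": 1024}
row = anonymize(sample)
```

The resulting row can be stored as one line per pid without revealing which command was running.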
System Information Snapshot
============================
These snapshots have 2 different goals: the first is to collect the current load of the
computer (cpu, memory, disk, network...) and the second is to collect the
available resources installed on the computer [4].
We use 3 types of snapshots to determine the load and the available resources
(all mostly use psutil to collect data):
- System Snapshot [5]: It collects general computer usage like CPU, Memory
and Network IO usage.
- Computer Snapshot [6]: For now it collects the number of CPU cores and the available
memory; however, we wish to collect more details.
- Disk Snapshot [7]: It collects the information related to a disk
(1 snapshot per disk), which contains total, usage and
io information.
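As a rough illustration of the three snapshot types, here is a sketch that uses only the Python standard library as a stand-in for psutil. The function names and dictionary keys are assumptions chosen for the example; the real snapshots collect more fields.

```python
import os
import shutil


def computer_snapshot():
    """Installed resources: number of CPU cores (None if unknown)."""
    return {"cpu_num_core": os.cpu_count()}


def system_snapshot():
    """Current load: 1-minute load average (Unix only)."""
    return {"loadavg": os.getloadavg()[0]}


def disk_snapshot(mountpoint):
    """Usage of one disk, one snapshot per mountpoint."""
    usage = shutil.disk_usage(mountpoint)
    return {"mountpoint": mountpoint, "total": usage.total,
            "used": usage.used, "free": usage.free}


disk = disk_snapshot(os.sep)
```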
"Real-time" Partial dump (Dygraph)
===================================
On every run, we dump the data from the current day to csv [8] (2 axes), in order to
plot it easily with dygraph, so a few files like these will be available:
- system_cpu_percent.csv
- system_disk_memory_free__dev_sda1.csv
- system_disk_memory_free__dev_sdb1.csv
- system_disk_memory_used__dev_sda1.csv
- system_disk_memory_used__dev_sdb1.csv
- system_loadavg.csv
- system_memory_free.csv
- system_memory_used.csv
- system_net_in_bytes.csv
- system_net_in_dropped.csv
- system_net_in_errors.csv
- system_net_out_bytes.csv
- system_net_out_dropped.csv
- system_net_out_errors.csv
All of them contain only computer usage information, for global usage (for now). It
is perfectly acceptable to keep a real-time csv copy of the most recent data.
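A two-axis dump of this kind could look like the sketch below. The table layout follows the "system" columns described in this document, but the function name, the row format ("date time,value") and the sample data are assumptions.

```python
import csv
import io
import sqlite3


def dump_two_axes(conn, column, day):
    """Return CSV text with one 'date time,value' row per snapshot of `day`."""
    # `column` is interpolated into the query, so it must come from a
    # trusted list of column names, never from user input.
    rows = conn.execute(
        "SELECT date || ' ' || time, %s FROM system "
        "WHERE date = ? ORDER BY time" % column,
        (day,),
    )
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system (cpu_percent real, date text, time text)")
conn.executemany("INSERT INTO system VALUES (?, ?, ?)", [
    (12.5, "2014-09-17", "10:00:00"),
    (30.0, "2014-09-17", "10:01:00"),
    (99.0, "2014-09-16", "23:59:00"),  # previous day, not dumped
])
text = dump_two_axes(conn, "cpu_percent", "2014-09-17")
```

Writing `text` to a file such as system_cpu_percent.csv gives one file per measured column.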
Logrotate
=========
Slapos collect contains its own log rotation policy [9] and garbage collection [10].
- We dump into YYYY-MM-DD folders all data which is not from the current day.
- Every table generates 1 csv with the date from the dumped day.
- All dumped data is marked as reported in sqlite (column reported).
- All data which is older than 3 days and already reported is removed.
- All folders which contain dumped data are compressed into a tar.gz file.
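The "mark as reported" and "remove after 3 days" steps can be sketched like this. It assumes that dates are stored as YYYY-MM-DD text (so that string comparison follows date order); the function names and the reduced table are hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def mark_reported(conn, today):
    """Flag every row from before `today` as already dumped."""
    conn.execute(
        "UPDATE system SET reported = 1 WHERE date < ? AND reported = 0",
        (today,))


def garbage_collect(conn, today, keep_days=3):
    """Delete reported rows older than `keep_days` days."""
    limit = (date.fromisoformat(today) - timedelta(days=keep_days)).isoformat()
    conn.execute(
        "DELETE FROM system WHERE reported = 1 AND date < ?", (limit,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system (loadavg real, date text, reported integer)")
conn.executemany("INSERT INTO system VALUES (?, ?, 0)", [
    (0.1, "2014-09-17"),  # current day: kept, not reported
    (0.2, "2014-09-16"),  # yesterday: reported, kept
    (0.3, "2014-09-10"),  # a week old: reported, then removed
])
mark_reported(conn, "2014-09-17")
garbage_collect(conn, "2014-09-17")
remaining = conn.execute(
    "SELECT date, reported FROM system ORDER BY date").fetchall()
```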
Data Structure
===============
The headers of the CSVs are not included in the dumped files (this is probably a
mistake), but they correspond to the columns in sqlite, which can be
easily described as below [11]:
- user
  partition (text)
  pid (real)
  process (text)
  cpu_percent (real)
  cpu_time (real)
  cpu_num_threads (real)
  memory_percent (real)
  memory_rss (real)
  io_rw_counter (real)
  io_cycles_counter (real)
  date (text)
  time (text)
  reported (integer)
- computer
  cpu_num_core (real)
  cpu_frequency (real)
  cpu_type (text)
  memory_size (real)
  memory_type (text)
  partition_list (text)
  date (text)
  time (text)
  reported (integer)
- system
  loadavg (real)
  cpu_percent (real)
  memory_used (real)
  memory_free (real)
  net_in_bytes (real)
  net_in_errors (real)
  net_in_dropped (real)
  net_out_bytes (real)
  net_out_errors (real)
  net_out_dropped (real)
  date (text)
  time (text)
  reported (integer)
- disk
  partition (text)
  used (text)
  free (text)
  mountpoint (text)
  date (text)
  time (text)
  reported (integer)
Probably a more formal way to collect data can be introduced.
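Because the dumped CSVs have no header, a reader has to supply the column names. The sketch below does this for the "disk" table, assuming the CSV columns come in the same order as the listing above; the function name and the sample line are made up for the example.

```python
import csv
import io

# Column names of the "disk" table, in the order listed in this document.
DISK_COLUMNS = ["partition", "used", "free", "mountpoint",
                "date", "time", "reported"]


def read_dump(text, columns):
    """Parse headerless CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=columns))


# Hypothetical content of a dumped disk csv file.
rows = read_dump("/dev/sda1,1024,2048,/,2014-09-16,10:00:00,1\n",
                 DISK_COLUMNS)
```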
Download Collected Data
========================
Data is normally available on the server file system. We use a simple piece of
software, "slapmonitor", which can be deployed on any machine and allows us to
download the data via HTTP.
Slapmonitor can also be used to determine the availability of the machine (it
returns "OK" if accessed on its "/" address), and it serves the data on a url
like:
- https://<address>/ -> just return "OK"
- https://<address>/<secret hash>/server-log/ -> you can see all files
The slapmonitor can be easily extended to include more sensors (like
temperature, benchmarks...) which normally require more specific software
configurations.
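A client for the two URLs above could be sketched as follows. The helper names, the address and the secret hash value are placeholders; only the URL layout and the "OK" answer come from this document.

```python
import urllib.request


def monitor_urls(address, secret_hash):
    """Build the availability URL and the data URL of a slapmonitor."""
    base = "https://%s/" % address
    return base, "%s%s/server-log/" % (base, secret_hash)


def is_available(address, timeout=10):
    """Return True if the "/" address answers with "OK"."""
    status_url, _ = monitor_urls(address, "")
    try:
        with urllib.request.urlopen(status_url, timeout=timeout) as response:
            return response.read().strip() == b"OK"
    except OSError:
        # Covers connection errors, timeouts and HTTP errors.
        return False


# Placeholder values, not a real server.
status_url, data_url = monitor_urls("example.invalid", "0123abcd")
```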
Planned Non core extensions and benchmarking
=============================================
It is planned to include 4 simple benchmarks to measure machine performance
degradation over time:
- CPU benchmark with Pystone
- SQL Benchmark on SQLite (for now)
- Network Uplink Benchmark
- Network Download Benchmark
This part is not included or coded yet, but we intend to measure performance
degradation in the future, in order to stop allocating on a machine that is working but
cannot maintain a minimal Service Quality (even if it does not look
overloaded).
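Since these benchmarks are not coded yet, the following is only one possible shape for the SQLite one: it times a batch of inserts into an in-memory database. The function name, table and row count are assumptions.

```python
import sqlite3
import time


def sqlite_insert_benchmark(rows=10000):
    """Time `rows` inserts and return (inserted row count, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bench (id integer, payload text)")
    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany(
            "INSERT INTO bench VALUES (?, ?)",
            ((i, "x" * 32) for i in range(rows)))
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM bench").fetchone()[0]
    conn.close()
    return count, elapsed


count, elapsed = sqlite_insert_benchmark(1000)
```

Running the same function regularly and storing `elapsed` would give a series in which degradation over time becomes visible.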
Servers Availability
=====================
All servers contact the slapos master on a regular basis (several times a minute),
so it is possible to determine the general availability of a server by looking at
the apache log using this script:
- http://git.erp5.org/gitweb/cloud-quote.git/blob/HEAD:/py/my.py
It produces a json like this:
- http://git.erp5.org/gitweb/cloud-quote.git/blob/HEAD:/data/stats.json
However, this is a rough and rudimentary way to detect problems on the
machine, as a completely "dead" machine is rare; normally most failures are
pure network problems or human/environmental problems (which normally do not
depend on the machine load).
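The idea of reading availability from the access log can be illustrated with the sketch below. It is not the linked script: it assumes the Apache common log format, identifies a server by its client address, and counts the minutes in which at least one request was seen. All names and log lines are made up.

```python
import re
from datetime import datetime

# Client address and timestamp of a common log format line.
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def minutes_seen(lines, client):
    """Return the set of minutes in which `client` made a request."""
    seen = set()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(1) == client:
            stamp = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
            seen.add(stamp.replace(second=0))
    return seen


def availability(lines, client, total_minutes):
    """Fraction of the observed minutes with at least one contact."""
    return len(minutes_seen(lines, client)) / total_minutes


log = [
    '10.0.0.1 - - [17/Sep/2014:10:00:01 +0200] "GET / HTTP/1.1" 200 2',
    '10.0.0.1 - - [17/Sep/2014:10:00:30 +0200] "GET / HTTP/1.1" 200 2',
    '10.0.0.2 - - [17/Sep/2014:10:01:05 +0200] "GET / HTTP/1.1" 200 2',
    '10.0.0.1 - - [17/Sep/2014:10:02:10 +0200] "GET / HTTP/1.1" 200 2',
]
ratio = availability(log, "10.0.0.1", 3)
```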
[1] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/entity.py?js=1#l58
[2] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/snapshot.py?js=1#l37
[3] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/db.py?js=1#l130
[4] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/entity.py?js=1#l77
[5] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/snapshot.py?js=1#l62
[6] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/snapshot.py?js=1#l95
[7] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/snapshot.py?js=1#l81
[8] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/reporter.py?js=1#l75
[9] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/reporter.py?js=1
[10] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/db.py?js=1#l192
[11] http://git.erp5.org/gitweb/slapos.core.git/blob/HEAD:/slapos/collect/db.py?js=1#l39