Documentation for version: v0.2.0
Edge devices can collect metrics from all deployed workloads and from the device's own system. The collected metrics are sent to a metrics receiver. This document describes how that functionality can be configured.
System metrics collection is enabled by default and the Flotta agent starts gathering metrics when the device is started, with a default interval of 60 seconds. That interval can be customized by setting the desired frequency (in seconds) in an EdgeDevice CR.
For instance, the following spec snippet would instruct the device worker to collect system metrics every 5 minutes:
spec:
  metrics:
    system:
      interval: 300
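For context, here is a minimal sketch of a complete EdgeDevice CR carrying that snippet. The metadata values are illustrative, and the apiVersion assumes the management.project-flotta.io/v1alpha1 API group; verify both against the CRDs installed in your cluster.

apiVersion: management.project-flotta.io/v1alpha1  # assumption: check with 'kubectl api-resources | grep EdgeDevice'
kind: EdgeDevice
metadata:
  name: my-edge-device      # illustrative name
  namespace: devices        # illustrative namespace
spec:
  metrics:
    system:
      interval: 300         # collect system metrics every 5 minutes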
By default, the device worker collects only a pre-defined, narrow list of system metrics; the user can modify the set of collected metrics using a system metrics allow-list.
Allow-list configuration comprises two elements: a ConfigMap containing the list of metrics to be collected (exclusively), and a reference to that ConfigMap in the EdgeDevice system metrics configuration.
Sample allow-list ConfigMap (mind the metrics_list.yaml key):
apiVersion: v1
kind: ConfigMap
metadata:
  name: system-allow-list
  namespace: devices
data:
  metrics_list.yaml: |
    names:
      - node_disk_io_now
      - node_memory_Mapped_bytes
      - node_network_speed_bytes
Reference to the above ConfigMap in an EdgeDevice spec:
spec:
  metrics:
    system:
      allowList:
        name: system-allow-list
Devices can be configured to write the collected metrics to a remote server. The client on the device uses the Prometheus Remote Write API (see also Prometheus Integrations). The device writes metrics until it reaches the end of the local TSDB contents and then waits 5 minutes for more metrics to be collected.
The feature is disabled by default. It is configured via EdgeDevice/EdgeDeviceSet CRs. Example with inline documentation and defaults:
spec:
  metrics:
    receiverConfiguration:
      caSecretName: receiver-tls # secret containing CA cert. Secret key is 'ca.crt'. Optional
      requestNumSamples: 10000 # maximum number of samples in each request from device to receiver. Optional
      timeoutSeconds: 10 # timeout for requests to receiver. Optional
      url: https://receiver:19291/api/v1/receive # the receiver's URL. Used to indicate HTTP/HTTPS. Set to empty in order to disable writing to receiver
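When HTTPS is used, the secret referenced by caSecretName has to hold, under the ca.crt key, the CA certificate that signed the receiver's serving certificate. A minimal sketch of creating it, assuming the certificate is available locally as ca.crt and that the secret lives in the same namespace as the EdgeDevice (the illustrative devices namespace is used here):

# Assumption: ./ca.crt is the CA certificate of the receiver's TLS chain,
# and 'devices' is the namespace of the EdgeDevice CR.
kubectl create secret generic receiver-tls \
  --from-file=ca.crt=./ca.crt \
  --namespace devices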
We prepared an example for deploying a Thanos receiver. The example includes deployments with and without TLS. The receiver listens on port 19291 for incoming writes. The deployment's pod also includes a container that runs a Thanos querier, which you can use for querying the received metrics; it listens on port 9090.
Without TLS:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-receiver
  labels:
    app: thanos-receiver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-receiver
  template:
    metadata:
      labels:
        app: thanos-receiver
    spec:
      containers:
        - name: receive
          image: quay.io/thanos/thanos:v0.24.0
          command:
            - /bin/thanos
            - receive
            - --label
            - "receiver=\"0\""
        - name: query
          image: quay.io/thanos/thanos:v0.24.0
          command:
            - /bin/thanos
            - query
            - --http-address
            - 0.0.0.0:9090
            - --grpc-address
            - 0.0.0.0:11901
            - --endpoint
            - 127.0.0.1:10901
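The Deployment alone is not reachable from the devices; the URL used in receiverConfiguration above (https://receiver:19291/api/v1/receive) implies a Service in front of the pod. A minimal sketch, assuming the Service name receiver matches the hostname in that URL and that the devices can resolve and reach it (for devices outside the cluster you would typically expose it via a route, ingress, or NodePort instead):

apiVersion: v1
kind: Service
metadata:
  name: receiver            # assumption: matches the hostname in receiverConfiguration.url
spec:
  selector:
    app: thanos-receiver    # matches the pod labels of the Deployment above
  ports:
    - name: remote-write
      port: 19291           # port the Thanos receiver listens on for incoming writes
      targetPort: 19291
    - name: query
      port: 9090            # Thanos querier HTTP port
      targetPort: 9090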
With TLS:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-receiver
  labels:
    app: thanos-receiver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-receiver
  template:
    metadata:
      labels:
        app: thanos-receiver
    spec:
      initContainers:
        - name: http-config
          image: fedora
          command: ["/bin/sh"]
          args: ["-c", "echo -e \"tls_server_config:\\n cert_file: /etc/server-tls/tls.crt\\n key_file: /etc/server-tls/tls.key\" > /etc/shared/http.config"]
          volumeMounts:
            - name: shared
              mountPath: /etc/shared
      containers:
        - name: receive
          image: quay.io/thanos/thanos:v0.24.0
          command:
            - /bin/thanos
            - receive
            - --label
            - "receiver=\"0\""
            - --remote-write.server-tls-cert
            - /etc/server-tls/tls.crt
            - --remote-write.server-tls-key
            - /etc/server-tls/tls.key
          volumeMounts:
            - name: server-tls
              mountPath: /etc/server-tls
        - name: query
          image: quay.io/thanos/thanos:v0.24.0
          command:
            - /bin/thanos
            - query
            - --http-address
            - 0.0.0.0:9090
            - --grpc-address
            - 0.0.0.0:11901
            - --endpoint
            - 127.0.0.1:10901
            - --http.config
            - /etc/shared/http.config
          volumeMounts:
            - name: server-tls
              mountPath: /etc/server-tls
            - name: shared
              mountPath: /etc/shared
      volumes:
        - name: server-tls
          secret:
            secretName: thanos-receiver-tls
        - name: shared
          emptyDir: {}
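The TLS variant mounts a secret named thanos-receiver-tls with the receiver's serving certificate and key. A minimal sketch of creating it, assuming the certificate and key are available locally as tls.crt and tls.key and that the Deployment runs in the current namespace; the CA that signed tls.crt is the one to put into the device-side caSecretName secret described earlier:

# Assumption: ./tls.crt and ./tls.key are the receiver's serving certificate and key.
kubectl create secret tls thanos-receiver-tls \
  --cert=./tls.crt \
  --key=./tls.key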
In order to publish the metrics to OpenShift, the following YAML needs to be applied:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
EOF
In order to install Grafana in the flotta namespace, use the following script, which will install Grafana and the Grafana dashboard:
export KUBECONFIG=your-kubeconfig-file
tools/deploy_grafana.sh -d contrib/metrics/flotta-dashboard.json
To import an additional Grafana dashboard into the existing Grafana in the flotta namespace, use the following script:
export KUBECONFIG=your-kubeconfig-file
tools/import_grafana_dashboards.sh -d <dashboard file>
Specifically, it can be used to install the edge device health monitoring dashboard (flotta-operator/docs/metrics/flotta-devices-health.json):
tools/import_grafana_dashboards.sh -d contrib/metrics/flotta-devices-health.json
All these scripts are part of the flotta-operator GitHub repo.