Exploring exporter metrics
All the exporters we deployed expose metrics in Prometheus format. We can observe them by sending a simple HTTP request. Since the services do not publish any ports, the only way we can communicate with them is through the monitor network attached to those exporters.
We'll create a new utility service and attach it to the monitor network.
docker service create \
    --name util \
    --network monitor \
    --mode global \
    alpine sleep 100000000
We created a service based on the alpine image, named it util, and attached it to the monitor network so that it can communicate with the exporters we deployed. We made the service global so that it runs on every node, which guarantees that a replica runs on the node we're in. Since alpine does not have a long-running process, we told it to sleep. Without it, the container would stop as soon as it started, Swarm would reschedule it, only to detect that it stopped again, and so on, in a never-ending loop of failures and rescheduling.
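If you'd like to confirm that a replica is indeed running on every node, you can list the service's tasks. This is only a sanity check and is not required for the steps that follow.

docker service ps util

The output should show one running task per node in the cluster.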
ID=$(docker container ls -q \
    -f "label=com.docker.swarm.service.name=util")

docker container exec -it $ID \
    apk add --update curl
Next, we found the ID of the container, entered it, and installed curl.
Now we're ready to send requests to the exporters:
docker container exec -it $ID \
    curl node-exporter:9100/metrics
Partial output of the request to the node-exporter is as follows.
...
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3.05
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 7
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.6228352e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.49505618366e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.07872e+07
...
As you can see, each metric comes with a help entry that describes it and a type declaration, followed by the metric name and its value.
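In other words, every entry in the Prometheus exposition format follows the same pattern. Labels are optional, and we'll see them in action when we look at cadvisor.

# HELP <metric_name> <description>
# TYPE <metric_name> <counter|gauge|histogram|summary|untyped>
<metric_name>{<label_name>="<label_value>", ...} <value>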
We won't go into details of all the metrics provided by node-exporter. The list is quite big, and it would require a whole chapter (maybe even a book) to go through all of them. The important thing, at this moment, is to know that almost anything hardware and OS related is exposed as a metric.
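If you'd like a glimpse, you can filter the output to a single group. The command that follows greps for memory-related metrics. Keep in mind that the exact metric names vary between node-exporter versions, so treat this as a sketch rather than a definitive list.

docker container exec -it $ID \
    sh -c "curl -s node-exporter:9100/metrics | grep node_memory"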
Please note that the overlay network load-balanced our request and forwarded it to one of the replicas of the exporter. We don't know the origin of those metrics. They could come from a replica running on any of the nodes of the cluster. That should not be a problem since, at this moment, we're interested only in observing what the metrics look like. If you go back to the configuration screen, you'll notice that the targets are configured to use the tasks.[SERVICE_NAME] format for addresses. When a service name is prefixed with tasks., Swarm returns the list of all replicas (or tasks) of that service instead of a single load-balanced endpoint.
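We can see the difference with a couple of DNS lookups from inside the util container (BusyBox's nslookup ships with alpine, so nothing else needs to be installed). Assuming the exporters use the default endpoint mode, the first command resolves the service name to a single virtual IP, while the second returns one address per replica.

docker container exec -it $ID \
    nslookup node-exporter

docker container exec -it $ID \
    nslookup tasks.node-exporter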
Let's move to cadvisor metrics.
docker container exec -it $ID \
    curl cadvisor:8080/metrics
Partial output of the request to cadvisor metrics is as follows.
...
# HELP container_network_receive_bytes_total Cumulative count of bytes received
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{id="/",interface="dummy0"} 0
container_network_receive_bytes_total{id="/",interface="eth0"} 6.6461026e+07
container_network_receive_bytes_total{id="/",interface="eth1"} 1.3054141e+07
...
container_network_receive_bytes_total{container_label_com_docker_stack_namespace="proxy",container_label_com_docker_swarm_node_id="zvn1kazstoa12pu3rfre9j4sw",container_label_com_docker_swarm_service_id="gfoias8w9bf1cve5dujzzlpfh",container_label_com_docker_swarm_service_name="proxy_swarm-listener",container_label_com_docker_swarm_task="",container_label_com_docker_swarm_task_id="39hgd75s8vt051smew3ke4imw",container_label_com_docker_swarm_task_name="proxy_swarm-listener.1.39hgd75s8vt051smew3ke4imw",id="/docker/f2232d2ddf801b1ff41120bb1b95213be15767fe0e6d45266b3b8bba149b3634",image="vfarcic/docker-flow-swarm-listener:latest@sha256:d67494f08aa3efba86d5231adba8ee7281c29fd401a5f67377ee026cc436552b",interface="eth0",name="proxy_swarm-listener.1.39hgd75s8vt051smew3ke4imw"} 112764
...
The major difference, when compared to node-exporter, is that cadvisor provides a lot of labels. They help a lot when querying metrics, and we'll use them soon.
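As a preview of what's to come, the expression that follows is a hypothetical PromQL query that uses one of those labels to narrow container memory usage down to a single Swarm service. The metric and label names are taken from the output above; we'll explore queries like this one properly later on.

container_memory_usage_bytes{container_label_com_docker_swarm_service_name="proxy_swarm-listener"}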
Just like with node-exporter, we won't go into details of each metric exposed through cadvisor. Instead, as we're progressing towards creating a self-healing system, we'll gradually increase the number of metrics we're using and comment on them as they come.
Now that we have the metrics, and Prometheus is scraping and storing them in its database, we can turn our attention to the queries we can execute.