Skip to main content
Version: Self Hosted Turbo

Monitor Clickhouse DB on Instance

Overview

Clickhouse database (DB) on instances is monitored using sfAgent configured with Clickhouse plugin.

note

The Clickhouse plugin has been tested on followings versions and OS.

Clickhouse version - 23.3.8.21

OS - Ubuntu 20.04.1

Get Started

To get started with Clickhouse integration, create a user and grant permissions to collect data from your Clickhouse DB.

  1. Create a read-only user with the following code in the user section of /etc/clickhouse-server/users.xml file.

     <!—create user in <users> -->
    <new-user-name>
    <!—enter password-->
    <password>password</password>
    <!--set profile to readonly -->
    <profile>readonly</profile>
    </ new-user-name>
  2. Save and restart the server.

Configuration

Add the below-mentioned configuration to the config.yaml which is located at the following path /opt/sfagent/ directory .

metrics:
plugins:
- name: clickhouse
enabled: true
interval: 60
config:
documentsTypes:
- hostDetails
- clusterDetails
- tableStats
- osDetails
- exceptionDetails
- slowQueryDetails
password: password
port: 9000
user: username

View Database Metrics

  1. Go to the Application tab in SnappyFlow and navigate to your Project > Application > Dashboard.

  2. In the dashboard window, Clickhouse DB metrics are displayed in the Metric section.

    note

    Once the Clickhouse configuration settings are done, the Clickhouse plugin will be automatically detected within the Metrics section.

  3. To access the unprocessed data gathered from the plugins, navigate to the Browse data section and choose the Index: Metric, Instance: Endpoint, Plugin, and Document Type.

Metric List

Host Details
NameDescription
ContextLockNumber of times the lock of Context was acquired or tried to acquire. This is global lock
CompressedReadBufferBlocksNumber of compressed blocks from compressed sources (files, network)
CompressedReadBufferBytesNumber of uncompressed bytes read from compressed sources (files, network)
DiskReadElapsedMicrosecondsTotal time spent waiting for read syscall. This include reads from page cache
DiskWriteElapsedMicrosecondsTotal time spent waiting for write syscall. This include writes to page cache
DistributedConnectionFailAtAllTotal number of distributed connection failures that have occurred on the ClickHouse DB server
MergeTreeDataWriterCompressedBytesBytes written to filesystem for data INSERTed to MergeTree tables
QueryThreadNumber of query processing threads
UptimeTime from server starting (in seconds)
FileOpenNumber of files opened
ZooKeeperUserExceptionsNumber of ZooKeeper user exceptions
ZooKeeperRequestNumber of requests to ZooKeeper in fly
BackgroundDistributedSchedulePoolTaskNumber of active tasks in BackgroundDistributedSchedulePool. This pool is used for distributed sends that is done in background
BackgroundMovePoolTaskNumber of active tasks in BackgroundProcessingPool for moves
BackgroundSchedulePoolTaskNumber of active tasks in BackgroundSchedulePool. This pool is used for periodic ReplicatedMergeTree tasks, like cleaning old data parts, altering data parts, replica re-initialization, etc.
GlobalThreadActiveNumber of threads in global thread pool running a task
GlobalThreadNumber of threads in global thread pool
LocalThreadActiveNumber of threads in local thread pools running a task
LocalThreadNumber of threads in local thread pools. The threads in local thread pools are taken from the global thread pool
HttpConnectionNumber of connections to HTTP server
MaxPartCountForPartitionMaximum number of parts per partition across all partitions of all tables of MergeTree family. Values larger than 300 indicates misconfiguration, overload, or massive data loading.
ZooKeeperWaitMicrosecondsTotal time spent waiting for ZooKeeper in microseconds
PartMutationNumber of mutations (ALTER DELETE/UPDATE)
ReplicasMaxInsertsInQueueMaximum number of INSERT operations in the queue (still to be replicated) across Replicated tables
ReplicasMaxMergesInQueueMaximum number of merge operations in the queue (still to be applied) across Replicated tables
ReplicasSumInsertsInQueueSum of INSERT operations in the queue (still to be replicated) across Replicated tables.
ReplicasSumMergesInQueueSum of merge operations in the queue (still to be applied) across Replicated tables.
JemallocBackgroundThreadNumRunsAn internal metric of the low-level memory allocator (jemalloc)
Cluster Details
NameDescription
NumberOfDatabasesNumber of Databases available in server
NumberOfTablesTotal number of tables in database
ClusterNameCluster name
ReadonlyReplicaNumber of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured
ArenaAllocBytesNumber of bytes allocated for memory Arena (used for GROUP BY and similar operations)
HardPageFaultsThe number of hard page faults in query execution threads. High values indicate either that you forgot to turn off swap on your server, or eviction of memory pages of the ClickHouse binary during very high memory pressure, or successful usage of the 'mmap' read method for the tables data
UserTimeMicrosecondsTotal time spent in processing (queries and other tasks) threads executing CPU instructions in user space. This include time CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc.
ZooKeeperTransactionsTotal transactions in ZooKeeperRequest, ZooKeeperWatch, ZooKeeperSession. ZooKeeperRequest - Number of sessions (connections) to ZooKeeper. ZooKeeperWatch - Number of watches (event subscriptions) in ZooKeeper. ZooKeeperSession - Number of requests to ZooKeeper in fly.
DelayedInsertsNumber of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table.
RealTimeMicrosecondsTotal time spent in processing (queries and other tasks) threads.
LongestRunningQueryLongest running query (in ms)
QueryNumber of queries to be interpreted and potentially executed. May include internal queries initiated by ClickHouse itself. Does not count subqueries
SelectQuerySame as Query, but only for SELECT queries.
MergedUncompressedBytesUncompressed bytes (for columns as they stored in memory) that was read for background merges. This is the number before merge.
MergeNumber of launched background merges.
MergedRowsRows read for background merges. This is the number of rows before merge.
MergesTimeMillisecondsTotal time spent for background merges
DnsErrorNumber of DNS errors
InsertQueryNumber of insert Query Performed.
FailedQueryNumber of failed queries.
FailedSelectQueryNumber of failed select queries.
InsertedRowsNumber of rows INSERTed to all tables.
InsertedBytesNumber of bytes (uncompressed; for columns as they stored in memory) INSERTed to all tables.
MergeTreeDataWriterRowsNumber of rows INSERTed to MergeTree tables.
ContextLockWaitNumber of threads waiting for lock in Context. This is global lock.
Table Stats
NameDescription
TablePartsRowsNumber of rows in all table parts
TablePartitionsNumber of table partitions
TablePartsBytesSize of all table parts in bytes
TablePartsNumber of table parts
OS Details
NameDescription
OsCpuWaitMicrosecondsTotal time a thread was ready for execution but waiting to be scheduled by OS, from the OS point of view.
OsIoWaitMicrosecondsTotal time spent waiting for I/O in microseconds.
OsWriteBytesNumber of bytes written to disks or block devices.
Exception Details
NameDescription
ExceptionQueryThe query that caused the exception.
ExceptionQueryDurationThe duration of the query that caused the exception.

Slow Query Details

NameDescription
SlowQueryThe query that took more than 1000 ms to execute.
SlowqueryDurationThe duration of the slow query.