ClickHouse on Kubernetes

ClickHouse running in Kubernetes can be monitored in SnappyFlow using a Prometheus exporter.

ClickHouse monitoring with Prometheus

Refer to the Prometheus Exporter overview to understand how SnappyFlow monitors applications using Prometheus exporters.

Pre-requisites

The Prometheus exporter is deployed as a sidecar container in the application pod, and the exporter port is accessible to sfPod.
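The reachability requirement above can be checked from any pod that shares sfPod's network access. The following is a minimal sketch, not part of SnappyFlow: the function names are hypothetical, port 8888 is taken from the deployment on this page, and the `/metrics` path is an assumption about the exporter's default.

```python
import urllib.error
import urllib.request


def exporter_url(host: str, port: int = 8888, path: str = "/metrics") -> str:
    """Build the scrape URL. The /metrics path is an assumption, not from this page."""
    return f"http://{host}:{port}{path}"


def exporter_reachable(host: str, port: int = 8888, timeout: float = 3.0) -> bool:
    """Return True if the exporter answers an HTTP GET with status 200."""
    try:
        with urllib.request.urlopen(exporter_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

For example, `exporter_reachable("10.0.0.5")` (a hypothetical pod IP) returning `False` suggests checking the container port and any network policy between sfPod and the ClickHouse pod.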

Metrics list

Cluster Details

| Name | Description |
| --- | --- |
| DNSError | Total count of errors in DNS resolution. |
| DelayedInserts | Number of times the INSERT of a block to a MergeTree table was throttled due to a high number of active data parts for the partition. |
| ContextLocks | Number of times the lock of Context was acquired or tried to acquire. This is a global lock. |
| MergedUncompressedBytes | Uncompressed bytes (for columns as they are stored in memory) that were read for background merges. This is the number before merge. |
| MergesTimeMilliseconds | Total time spent on background merges. |
| DiskReadElapsedMicroseconds | Total time spent waiting for the read syscall. This includes reads from the page cache. |
| DiskWriteElapsedMicroseconds | Total time spent waiting for the write syscall. This includes writes to the page cache. |
| MergeTreeDataWriterCompressedBytes | Bytes written to the filesystem for data INSERTed to MergeTree tables. |
| MergeTreeDataWriterRows | Number of rows INSERTed to MergeTree tables. |
| NumberOfTables | Number of tables. |
| InsertedBytes | Number of bytes (uncompressed; for columns as they are stored in memory) INSERTed to all tables. |
| InsertedRows | Number of rows INSERTed to all tables. |
| Merge | Number of launched background merges. |
| Query | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
| FailedQuery | Number of failed queries. |
| SelectQuery | Same as Query, but only for SELECT queries. |
| FailedSelectQuery | Same as FailedQuery, but only for SELECT queries. |
| fileopen | Number of files opened. |
| NumberOfDatabases | Number of databases. |
| ReadonlyReplica | Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured. |
| OSCPUWaitMicroseconds | Total time a thread was ready for execution but waiting to be scheduled by the OS, from the OS point of view. |
| OSIOWaitMicroseconds | Total time a thread spent waiting for the result of an IO operation, from the OS point of view. This is real IO that doesn't include the page cache. |
| UserTimeMicroseconds | Total time spent in processing (queries and other tasks) threads executing CPU instructions in user space. This includes time the CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc. |
| OSWriteBytes | Number of bytes written to disks or block devices. Doesn't include bytes that are in page cache dirty pages. May not include data that was written by the OS asynchronously. |
| QueryDuration (Prom Metric: chi_clickhouse_event_RealTimeMicroseconds) | Total (wall clock) time spent in processing (queries and other tasks) threads (note that this is a sum). |
| MergedRows | Rows read for background merges. This is the number of rows before merge. |
| LongestRunningQuery | Longest running query time. |
| HardPageFaults | An exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. |
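Since QueryDuration maps to the Prometheus metric `chi_clickhouse_event_RealTimeMicroseconds` and is a sum across threads, it can help to see how such a value might be read from the exporter's output. The sketch below parses Prometheus text exposition lines and converts the summed microseconds to seconds. The sample payload, the `hostname` label, and the `chi_clickhouse_event_Query` name are assumptions made for illustration, and the parser assumes samples carry no trailing timestamp.

```python
def parse_samples(text):
    """Yield (metric_name, value) pairs from Prometheus text exposition lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token (no timestamp assumed).
        series, _, raw_value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop the label set, if any
        yield name, float(raw_value)


def total(text, metric):
    """Sum one metric across all of its label sets."""
    return sum(value for name, value in parse_samples(text) if name == metric)


# Hypothetical exporter output; label names and values are illustrative only.
SAMPLE = """\
# TYPE chi_clickhouse_event_RealTimeMicroseconds counter
chi_clickhouse_event_RealTimeMicroseconds{hostname="host-0"} 1500000
chi_clickhouse_event_RealTimeMicroseconds{hostname="host-1"} 2500000
chi_clickhouse_event_Query{hostname="host-0"} 42
"""

seconds = total(SAMPLE, "chi_clickhouse_event_RealTimeMicroseconds") / 1_000_000
print(seconds)  # 4.0
```

Because these event metrics are cumulative, a dashboard would normally plot the difference between two scrapes rather than the raw value.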

Host Details

| Name | Description |
| --- | --- |
| DNSError | Total count of errors in DNS resolution. |
| Httpconnection | Number of connections to the HTTP server. |
| CompressedReadBufferBlocks | Number of compressed blocks (the blocks of data that are compressed independently of each other) read from compressed sources (files, network). |
| CompressedReadBufferBytes | Number of uncompressed bytes (the number of bytes after decompression) read from compressed sources (files, network). |
| DiskReadElapsedMicroseconds | Total time spent waiting for the read syscall. This includes reads from the page cache. |
| DiskWriteElapsedMicroseconds | Total time spent waiting for the write syscall. This includes writes to the page cache. |
| MergeTreeDataWriterCompressedBytes | Bytes written to the filesystem for data INSERTed to MergeTree tables. |
| MergeTreeDataWriterRows | Number of rows INSERTed to MergeTree tables. |
| DistributedConnectionFailAtAll | Total count of distributed connection failures after all retries have finished. |
| InsertQuery | Same as Query, but only for INSERT queries. |
| NumberOfTables | Number of tables. |
| Query Threads | Number of query processing threads. |
| ZooKeeperUserExceptions | ZooKeeper user exceptions. |
| BackgroundDistributedSchedulePoolTask | Number of active tasks in BackgroundDistributedSchedulePool. This pool is used for distributed sends that are done in the background. |
| BackgroundMovePoolTask | Number of active tasks in BackgroundProcessingPool for moves. |
| BackgroundSchedulePoolTask | Number of active tasks in BackgroundSchedulePool. This pool is used for periodic ReplicatedMergeTree tasks, like cleaning old data parts, altering data parts, replica re-initialization, etc. |
| ContextLocks | Number of times the lock of Context was acquired or tried to acquire. This is a global lock. |
| ContextLockWait | Number of threads waiting for the lock in Context. This is a global lock. |
| GlobalThreadActive | Number of threads in the global thread pool running a task. |
| GlobalThreadTotal | Number of threads in the global thread pool. |
| LocalThreadActive | Number of threads in local thread pools running a task. |
| LocalThreadTotal | Number of threads in local thread pools. |
| PartMutation | Number of mutations (ALTER DELETE/UPDATE). |
| FailedQuery | Number of failed queries. |
| SelectQuery | Same as Query, but only for SELECT queries. |
| FailedSelectQuery | Same as FailedQuery, but only for SELECT queries. |
| Fileopen | Number of files opened. |
| MergedRows | Rows read for background merges. This is the number of rows before merge. |
| Merge | Number of launched background merges. |
| Query | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
| InsertedBytes | Number of bytes (uncompressed; for columns as they are stored in memory) INSERTed to all tables. |
| InsertedRows | Number of rows INSERTed to all tables. |
| MergedUncompressedBytes | Uncompressed bytes (for columns as they are stored in memory) that were read for background merges. This is the number before merge. |
| MergesTimeMilliseconds | Total time spent on background merges. |
| ReplicasMaxInsertsInQueue | |
| ReplicasMaxMergesInQueue | |
| ReplicasSumInsertsInQueue | |
| ReplicasSumMergesInQueue | |
| jemalloc_background_thread_num_runs | |
| MaxPartCountForPartition | |
| MemoryTrackingForMerges | Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when the memory was allocated in the context of the background processing pool and freed in another context, or vice versa. This happens naturally due to caches for table indexes and doesn't indicate memory leaks. |
| ZooKeeperWaitMicroseconds | |
| ArenaAllocBytes | |

TableStats

| Name | Description |
| --- | --- |
| Table | Name of the table |
| Database | Name of the database |
| NumPartitions | Number of partitions of the table |
| NumTableParts | Number of parts of the table |
| TableSize | Table size in bytes |
| NumRow | Number of rows in the table |

Configuration 

Clickhouse Deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: clickhouse-operator
spec:
  selector:
    matchLabels:
      app: clickhouse-operator
  template:
    metadata:
      annotations:
        prometheus.io/port: "8888"
        prometheus.io/scrape: "true"
      labels:
        app: clickhouse-operator
    spec:
      containers:
      - env:
        - name: OPERATOR_POD_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: altinity/clickhouse-operator:0.10.0
        imagePullPolicy: Always
        name: clickhouse-operator
        volumeMounts:
        - mountPath: /etc/clickhouse-operator
          name: etc-clickhouse-operator-folder
        - mountPath: /etc/clickhouse-operator/conf.d
          name: etc-clickhouse-operator-confd-folder
        - mountPath: /etc/clickhouse-operator/config.d
          name: etc-clickhouse-operator-configd-folder
        - mountPath: /etc/clickhouse-operator/templates.d
          name: etc-clickhouse-operator-templatesd-folder
        - mountPath: /etc/clickhouse-operator/users.d
          name: etc-clickhouse-operator-usersd-folder
      - image: altinity/metrics-exporter:0.10.0
        imagePullPolicy: Always
        name: metrics-exporter
        ports:
        - containerPort: 8888
          name: metrics
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/clickhouse-operator
          name: etc-clickhouse-operator-folder
        - mountPath: /etc/clickhouse-operator/conf.d
          name: etc-clickhouse-operator-confd-folder
        - mountPath: /etc/clickhouse-operator/config.d
          name: etc-clickhouse-operator-configd-folder
        - mountPath: /etc/clickhouse-operator/templates.d
          name: etc-clickhouse-operator-templatesd-folder
        - mountPath: /etc/clickhouse-operator/users.d
          name: etc-clickhouse-operator-usersd-folder
      volumes:
      - configMap:
          defaultMode: 420
          name: etc-clickhouse-operator-files
        name: etc-clickhouse-operator-folder
      - configMap:
          defaultMode: 420
          name: etc-clickhouse-operator-confd-files
        name: etc-clickhouse-operator-confd-folder
      - configMap:
          defaultMode: 420
          name: etc-clickhouse-operator-configd-files
        name: etc-clickhouse-operator-configd-folder
      - configMap:
          defaultMode: 420
          name: etc-clickhouse-operator-templatesd-files
        name: etc-clickhouse-operator-templatesd-folder
      - configMap:
          defaultMode: 420
          name: etc-clickhouse-operator-usersd-files
        name: etc-clickhouse-operator-usersd-folder
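
In the deployment above, the `prometheus.io/port` annotation and the `metrics-exporter` container port both use 8888. A mismatch between the two is an easy mistake when adapting the manifest, so a small consistency check may help. The sketch below is illustrative only: the function name is hypothetical, and it works on a plain dictionary that mirrors the relevant subset of the pod template rather than loading the YAML file.

```python
def scrape_port_is_exposed(pod_template: dict) -> bool:
    """Check that scraping is enabled and the annotated port is a declared containerPort."""
    annotations = pod_template.get("metadata", {}).get("annotations", {})
    if annotations.get("prometheus.io/scrape") != "true":
        return False
    annotated_port = annotations.get("prometheus.io/port")
    # Collect every containerPort in the pod as a string, since annotations are strings.
    exposed = {
        str(port["containerPort"])
        for container in pod_template.get("spec", {}).get("containers", [])
        for port in container.get("ports", [])
    }
    return annotated_port in exposed


# Subset of the pod template from the deployment above.
template = {
    "metadata": {
        "annotations": {
            "prometheus.io/port": "8888",
            "prometheus.io/scrape": "true",
        }
    },
    "spec": {
        "containers": [
            {"name": "clickhouse-operator"},
            {"name": "metrics-exporter",
             "ports": [{"containerPort": 8888, "name": "metrics"}]},
        ]
    },
}

print(scrape_port_is_exposed(template))  # True
```

Changing the annotation to a port that no container declares makes the check return `False`.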

Viewing data and dashboards

Data collected by plugins can be viewed in SnappyFlow’s browse data section:

  • Plugin = kube-prom-clickhouse
  • documentType = clusterDetails, hostDetails, tableStats
  • Dashboard template: Clickhouse_Kube_Prom