Monitor ElastiCache

ElastiCache

Overview

Amazon ElastiCache is a distributed in-memory cache service in the AWS cloud. It supports the Memcached and Redis cache engines.

Prerequisites:

CloudWatch Access for IAM Role

Grant read-only CloudWatch access to the dedicated IAM role used for APM. AWS managed policies are standalone IAM policies that are created and administered by AWS and cover many common use cases. To get read access to all of CloudWatch, attach the AWS managed policy CloudWatchReadOnlyAccess to the IAM role. Otherwise, create a custom policy with the permissions listed below and attach it to the IAM role.

Required Permissions:

  • cloudwatch:GetMetricData
  • elasticache:ListTagsForResource
  • elasticache:DescribeCacheClusters
  • elasticache:DescribeEvents
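
For the custom-policy route, the permissions above could be assembled into an IAM policy document along the following lines. This is a minimal sketch, not a policy taken from the product: the `Sid` value and the use of `"Resource": "*"` are assumptions for illustration, and you may want to scope the resource more tightly in your own account.

```python
import json

# Actions copied from the "Required Permissions" list above.
REQUIRED_ACTIONS = [
    "cloudwatch:GetMetricData",
    "elasticache:ListTagsForResource",
    "elasticache:DescribeCacheClusters",
    "elasticache:DescribeEvents",
]


def build_policy(actions=REQUIRED_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Hypothetical statement ID, chosen only for this example.
                "Sid": "ElastiCacheMonitoringReadOnly",
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: all resources; narrow this if your setup allows.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_policy(), indent=2))
```

The printed JSON can be used as the body of a custom policy and then attached to the IAM role.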

Metrics list

Host Level Stats

Host-level metrics are common to both the Redis and Memcached engines.

Metric | Description
BytesReadIntoMemcached | The number of bytes that have been read from the network by the cache node.
BytesUsedForCacheItems | The number of bytes used to store cache items.
BytesWrittenOutFromMemcached | The number of bytes that have been written to the network by the cache node.
BytesUsedForHash | The number of bytes currently used by hash tables.
CurrConfig | The current number of configurations stored.
EvictedUnfetched | The number of valid items evicted from the least recently used (LRU) cache which were never touched after being set.
ExpiredUnfetched | The number of expired items reclaimed from the LRU which were never touched after being set.
SlabsMoved | The total number of slab pages that have been moved.
NewItems | The number of new items the cache has stored.
UnusedMemory | The amount of memory not used by data.
NewConnections | The number of new connections the cache has received.
Reclaimed | The number of expired items the cache evicted to allow space for new writes.
Evictions | The number of non-expired items the cache evicted to allow space for new writes.
CurrConnections | A count of the number of connections connected to the cache at an instant in time.
CurrItems | A count of the number of items currently stored in the cache.
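
Metrics such as these are read through the CloudWatch GetMetricData API, which is why `cloudwatch:GetMetricData` appears in the required permissions. The sketch below builds a request payload for a few of the metrics above. It only illustrates the request shape and is not the poller's actual query: the `AWS/ElastiCache` namespace and `CacheClusterId` dimension are the standard CloudWatch names, while the cluster name, period, statistic, and time window are assumptions.

```python
from datetime import datetime, timedelta, timezone

NAMESPACE = "AWS/ElastiCache"


def build_metric_queries(cluster_id, metric_names, period=300, stat="Average"):
    """Build one MetricDataQueries entry per metric name."""
    queries = []
    for index, name in enumerate(metric_names):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "CacheClusterId", "Value": cluster_id},
                    ],
                },
                "Period": period,  # seconds; assumed value
                "Stat": stat,      # assumed statistic
            },
            "ReturnData": True,
        })
    return queries


def build_request(cluster_id, metric_names, minutes=15):
    """Build keyword arguments covering the last `minutes` minutes."""
    end = datetime.now(timezone.utc)
    return {
        "MetricDataQueries": build_metric_queries(cluster_id, metric_names),
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
    }


if __name__ == "__main__":
    # "my-cache-cluster" is a hypothetical cluster name.
    request = build_request("my-cache-cluster", ["CurrConnections", "Evictions"])
    for query in request["MetricDataQueries"]:
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

If boto3 is installed and credentials are configured, the resulting dictionary could be passed as `boto3.client("cloudwatch").get_metric_data(**request)`. That call is not executed here.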

Redis Node Stats

Metric | Description
ActiveDefragHits | The number of value reallocations per minute performed by the active defragmentation process.
AuthenticationFailures | The total number of failed attempts to authenticate to Redis using the AUTH command.
BytesUsedForCache | The total number of bytes allocated by Redis for all purposes, including the dataset, buffers, and so on.
BytesReadFromDisk | The total number of bytes read from disk per minute. Supported only for clusters using data tiering.
BytesWrittenToDisk | The total number of bytes written to disk per minute. Supported only for clusters using data tiering.
CommandAuthorizationFailures | The total number of failed attempts by users to run commands they do not have permission to call.
CacheHitRate | Indicates the usage efficiency of the Redis instance, calculated as cache_hits / (cache_hits + cache_misses).
CurrVolatileItems | The total number of keys in all databases that have a TTL set. This is derived from the Redis expires statistic, summing all of the keys with a TTL set in the entire keyspace.
DatabaseMemoryUsagePercentage | The percentage of the memory for the cluster that is in use. This is calculated using used_memory/maxmemory from Redis INFO.
DatabaseMemoryUsageCountedForEvictPercentage | The percentage of the memory for the cluster that is in use, excluding memory used for overhead and client output buffers (COB).
DB0AverageTTL | Exposes avg_ttl of DB0 from the keyspace statistic of the Redis INFO command. Replicas do not expire keys; instead, they wait for primary nodes to expire keys.
EngineCPUUtilization | The CPU utilization of the Redis engine thread.
GlobalDatastoreReplicationLag | The lag between the secondary Region's primary node and the primary Region's primary node. For cluster mode enabled Redis, the lag indicates the maximum delay among the shards.
IsMaster | Indicates whether the node is the primary node of the current shard/cluster. The metric can be either 0 (not primary) or 1 (primary).
KeyAuthorizationFailures | The total number of failed attempts by users to access keys they do not have permission to access.
KeysTracked | The number of keys being tracked by Redis key tracking as a percentage of tracking-table-max-keys. Key tracking is used to aid client-side caching and notifies clients when keys are modified.
MemoryFragmentationRatio | Indicates the efficiency in the allocation of memory of the Redis engine.
NumItemsReadFromDisk | The total number of items retrieved from disk per minute. Supported only for clusters using data tiering.
NumItemsWrittenToDisk | The total number of items written to disk per minute. Supported only for clusters using data tiering.
MasterLinkHealthStatus | This status has two values: 0 or 1. The value 0 indicates that data in the ElastiCache primary node is not in sync with Redis on EC2, and the value 1 indicates that the data is in sync.
ReplicationBytes | For nodes in a replicated configuration, the number of bytes that the primary is sending to all of its replicas.
ReplicationLag | Applicable only to a node running as a read replica. It represents how far behind, in seconds, the replica is in applying changes from the primary node. For Redis engine version 5.0.6 onwards, the lag can be measured in milliseconds.
SaveInProgress | A binary metric that returns 1 whenever a background save (forked or forkless) is in progress, and 0 otherwise.
TrafficManagementActive | Indicates whether ElastiCache for Redis is actively managing traffic by adjusting traffic allocated to incoming commands, monitoring, or replication.
NewConnections | The number of new connections the cache has received.
Reclaimed | The number of expired items the cache evicted to allow space for new writes.
Evictions | The number of non-expired items the cache evicted to allow space for new writes.
CurrConnections | A count of the number of connections connected to the cache at an instant in time. ElastiCache uses two to three of the connections to monitor the cluster.
CurrItems | A count of the number of items currently stored in the cache.
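
As an illustration of how DatabaseMemoryUsagePercentage relates to Redis INFO, the sketch below parses `used_memory` and `maxmemory` from INFO-style text and applies the used_memory/maxmemory ratio described in the table. The function names and the sample text are made up for the example, and the guard against a non-positive `maxmemory` is an assumption about how to handle an unset limit.

```python
def parse_info(text):
    """Parse 'key:value' lines from Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers such as "# Memory", and odd lines.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def database_memory_usage_percentage(used_memory, maxmemory):
    """Return used_memory / maxmemory expressed as a percentage."""
    if maxmemory <= 0:
        # Assumption: a ratio is undefined when no memory limit is reported.
        raise ValueError("maxmemory must be positive")
    return 100.0 * used_memory / maxmemory


if __name__ == "__main__":
    # Hypothetical INFO excerpt; real output contains many more fields.
    sample = "# Memory\r\nused_memory:1048576\r\nmaxmemory:4194304\r\n"
    info = parse_info(sample)
    pct = database_memory_usage_percentage(
        int(info["used_memory"]), int(info["maxmemory"])
    )
    print(f"{pct:.1f}%")
```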

Redis Operational Stats

Metric | Description
ClusterBasedCmds | The total number of cluster-based commands.
ClusterBasedCmdsLatency | Latency of cluster-based commands.
EvalBasedCmds | The total number of eval-based commands.
EvalBasedCmdsLatency | Latency of eval-based commands.
GeoSpatialBasedCmds | The total number of geospatial-based commands.
GeoSpatialBasedCmdsLatency | Latency of geospatial-based commands.
GetTypeCmds | The total number of read-only type commands.
GetTypeCmdsLatency | Latency of read commands.
HashBasedCmds | The total number of hash-based commands.
HashBasedCmdsLatency | Latency of hash-based commands.
HyperLogLogBasedCmds | The total number of HyperLogLog-based commands.
HyperLogLogBasedCmdsLatency | Latency of HyperLogLog-based commands.
JsonBasedCmds | The total number of JSON-based commands.
JsonBasedCmdsLatency | The aggregate latency (server-side CPU time), calculated as Delta[Usec]/Delta[Calls], of all commands that act upon one or more JSON document objects.
KeyBasedCmds | The total number of key-based commands.
KeyBasedCmdsLatency | Latency of key-based commands.
ListBasedCmds | The total number of list-based commands.
ListBasedCmdsLatency | Latency of list-based commands.
PubSubBasedCmds | The total number of commands for pub/sub functionality.
PubSubBasedCmdsLatency | Latency of pub/sub-based commands.
SetBasedCmds | The total number of set-based commands.
SetBasedCmdsLatency | Latency of set-based commands.
SetTypeCmds | The total number of write type commands.
SetTypeCmdsLatency | Latency of write commands.
SortedSetBasedCmds | The total number of sorted set-based commands.
SortedSetBasedCmdsLatency | Latency of sorted set-based commands.
StringBasedCmds | The total number of string-based commands.
StringBasedCmdsLatency | Latency of string-based commands.
StreamBasedCmds | The total number of stream-based commands.
StreamBasedCmdsLatency | Latency of stream-based commands.
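
The Delta[Usec]/Delta[Calls] calculation mentioned for JsonBasedCmdsLatency can be sketched as the difference between two cumulative samples. The `CommandSample` type and the sample numbers below are hypothetical, and returning `None` when no new calls were made (or when counters appear to have reset) is an assumption about edge-case handling rather than documented behavior.

```python
from typing import NamedTuple, Optional


class CommandSample(NamedTuple):
    """Cumulative counters for a command group at one point in time."""
    calls: int  # total calls so far
    usec: int   # total server-side CPU time so far, in microseconds


def average_latency_usec(previous: CommandSample,
                         current: CommandSample) -> Optional[float]:
    """Return Delta[Usec] / Delta[Calls] between two samples."""
    delta_calls = current.calls - previous.calls
    delta_usec = current.usec - previous.usec
    # No new calls, or counters went backwards (for example after a restart).
    if delta_calls <= 0 or delta_usec < 0:
        return None
    return delta_usec / delta_calls


if __name__ == "__main__":
    earlier = CommandSample(calls=1000, usec=50_000)
    later = CommandSample(calls=1500, usec=80_000)
    # 30,000 microseconds spread over 500 calls.
    print(average_latency_usec(earlier, later))
```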

Memcached System Stats

Metric | Description
BytesReadIntoMemcached | The number of bytes that have been read from the network by the cache node.
BytesUsedForCacheItems | The number of bytes used to store cache items.
BytesWrittenOutFromMemcached | The number of bytes that have been written to the network by the cache node.
BytesUsedForHash | The number of bytes currently used by hash tables.
CurrConfig | The current number of configurations stored.
EvictedUnfetched | The number of valid items evicted from the least recently used (LRU) cache which were never touched after being set.
ExpiredUnfetched | The number of expired items reclaimed from the LRU which were never touched after being set.
SlabsMoved | The total number of slab pages that have been moved.
NewItems | The number of new items the cache has stored.
UnusedMemory | The amount of memory not used by data.
NewConnections | The number of new connections the cache has received.
Reclaimed | The number of expired items the cache evicted to allow space for new writes.
Evictions | The number of non-expired items the cache evicted to allow space for new writes.
CurrConnections | A count of the number of connections connected to the cache at an instant in time.
CurrItems | A count of the number of items currently stored in the cache.

Memcached Operational Stats

Metric | Description
DecrHits | The number of decrement requests the cache has received where the requested key was found.
DecrMisses | The number of decrement requests the cache has received where the requested key was not found.
DeleteHits | The number of delete requests the cache has received where the requested key was found.
DeleteMisses | The number of delete requests the cache has received where the requested key was not found.
GetHits | The number of get requests the cache has received where the requested key was found.
GetMisses | The number of get requests the cache has received where the requested key was not found.
IncrHits | The number of increment requests the cache has received where the requested key was found.
IncrMisses | The number of increment requests the cache has received where the requested key was not found.
TouchHits | The number of keys that have been touched and were given a new expiration time.
TouchMisses | The number of items that have been touched, but were not found.
CasHits | The number of CAS requests the cache has received where the requested key was found and the CAS value matched.
CasMisses | The number of CAS requests the cache has received where the requested key was not found.
CmdFlush | The number of flush commands the cache has received.
CmdGet | The number of get commands the cache has received.
CmdSet | The number of set commands the cache has received.
CmdConfigGet | The cumulative number of config get requests.
CmdConfigSet | The cumulative number of config set requests.
CmdTouch | The cumulative number of touch requests.
CasBadval | The number of CAS (check and set) requests the cache has received where the CAS value did not match the CAS value stored.
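
The hit and miss counters above are often combined into a ratio when reading a dashboard. The source does not define such a ratio for Memcached, so the following is only a sketch of one common derived figure, hits / (hits + misses), with a hypothetical function name and made-up sample values.

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no requests."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


if __name__ == "__main__":
    # Example: GetHits = 75 and GetMisses = 25 over the same period.
    print(hit_ratio(75, 25))
```

The same helper applies to any hit/miss pair in the table, such as DeleteHits and DeleteMisses, as long as both values cover the same time window.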

sfPoller Configuration

In Add Endpoints, select the ElastiCache endpoint type and add the cluster name:

  • Add Endpoint

  • Select ElastiCache Endpoint

  • Enter the ClusterName

  • Select the plugin from the dropdown under the Plugins tab and configure the polling interval. The plugins available for ElastiCache are cloudwatch-elasticache-redis and cloudwatch-elasticache-memcached. You can enable or disable either plugin based on your needs and instance support.

  • cloudwatch-elasticache-redis:

    Provides monitoring support for an AWS ElastiCache Redis cluster. It collects all host-level, node-level, and operational stats of the Redis cluster.

  • cloudwatch-elasticache-memcached:

    Provides monitoring support for an AWS ElastiCache Memcached cluster. It collects all host-level, system-level, and operational stats of the Memcached cluster.

View Data and Dashboards

All CloudWatch metrics are collected and tagged by ElastiCache type so that they are displayed in the corresponding dashboard template. Use the ElastiCache_Redis or ElastiCache_Memcached template for data visualization, depending on the engine.