How Kubernetes Works and Why It’s So Complicated

Written By

Asaf Liveanu

Co-Founder & CPO

Asaf is the CPO and co-founder of Finout. He has more than 12 years of experience in software engineering, QA and product management at companies like Taboola and Intel. In his last position at Logz.io, he met Roi, and together they decided to embark on the Finout journey.

In a 2021 survey, 88% of organizations claimed to be already using Kubernetes for container orchestration. On account of its fundamental features for abstracting the provisioning of cluster resources, Kubernetes has now become the standard platform for the orchestration of microservices and container-based workloads. However, although Kubernetes simplifies deployment, its distributed ecosystem also introduces challenges in terms of cost management and the tracking of consumption metrics for clusters.

In this article, we explore the complexities of a Kubernetes cluster, the challenges of managing costs due to such innate complexities, and best practices to improve cost optimization.

Why Managing Kubernetes Costs Is So Complicated

While Kubernetes offers enhanced agility, superior fault tolerance, improved velocity, and increased productivity, the platform comes with inherent complexities when it comes to managing and monitoring costs. As containers in a Kubernetes ecosystem are ephemeral, observing costs and resource usage patterns over a period of time is challenging.

Kubernetes clusters often run in distributed environments (disparate on-premises and Cloud environments) with different resource deployment and pricing options. A cluster is typically characterized by immutable resources that are frequently spun up or terminated. A cluster may even be spread across different Cloud providers and services. This makes cost management, allocation, and analysis an arduous undertaking.

How Kubernetes Works

Kubernetes enables dynamic resource provisioning by abstracting machine resources and presenting them to workloads using API objects. Machines running containerized workloads in Kubernetes are referred to as nodes. The platform follows a client-server pattern, with server machines called master nodes (collectively known as the control plane) and client machines called worker nodes.

Figure 1: Components of a Kubernetes cluster (Source: Kubernetes)

Master Nodes (Control Plane)

A control plane is responsible for running and managing an entire cluster. Components of the control plane include:

API server: Serves as the entry point to the cluster by exposing the Kubernetes API, which clients such as the dashboard UI and the kubectl CLI use to interact with the cluster
Controller manager: Runs the controllers that watch the cluster state and work to move the current state toward the desired state
Scheduler: Assigns newly created pods to an appropriate worker node based on pod requirements
Etcd: Serves as the key-value store that holds the cluster state

Each operating cluster needs at least one control plane; however, for production clusters, a common approach is to host the control plane across different nodes to ensure high availability and fault tolerance.

Worker Nodes

These are the cluster machines that host the pods – the Kubernetes objects that encapsulate a containerized application. The primary components of the worker node include:

Kubelet: Runs as an agent on each node, receives pod specifications from the API server, and ensures that the containers described in them are running
Kube-proxy: Maintains network rules on the node so that traffic addressed to a service reaches the pods behind it
Container runtime: Runs the containers that make up each pod

Kubernetes Objects & Workload Implementation

Kubernetes uses various objects to represent the state of a cluster. These are persistent entities used for almost all fundamental operations of a cluster, including deployment, scaling, and maintenance. Several of these Kubernetes objects are discussed below.

Pods, ReplicaSets, and Deployments

These comprise the deployment objects that are used to host containers on worker nodes:

A pod is the smallest deployable unit in Kubernetes and runs one or more containers. Each pod has a unique ID and IP address and represents a single instance of a containerized process.
A ReplicaSet keeps a specified number of identical pods (also known as replicas) running at any given time, creating them from a pod template.
A deployment describes the desired state of an application and provides declarative updates for ReplicaSets and pods.
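To make the relationship between these three objects concrete, here is a minimal sketch that builds a Deployment manifest as a plain Python dictionary and prints it as JSON. The helper name `make_deployment`, the `app` label, and the `nginx:1.25` image are illustrative assumptions; only the field layout follows the `apps/v1` Deployment schema.

```python
import json


def make_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment manifest as a plain dict.

    make_deployment is a hypothetical helper for illustration only.
    """
    labels = {"app": name}  # assumed label convention
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # The Deployment's ReplicaSet keeps this many pods running.
            "replicas": replicas,
            # The selector must match the labels in the pod template.
            "selector": {"matchLabels": labels},
            # Pod template: every replica is created from this.
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


manifest = make_deployment("web", "nginx:1.25", replicas=3)
print(json.dumps(manifest, indent=2))
```

The deployment declares the desired state (three replicas of one pod template), and the ReplicaSet it manages is what creates and replaces the individual pods.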

Volumes, PVs, and PVCs

These are volume abstractions used to allocate storage resources to applications within pods:

A volume is a data directory accessible to the containers in a pod; ephemeral volumes share the lifetime of the pod and are deleted once the pod using them terminates.
A Persistent Volume (PV) is a piece of storage in the cluster whose lifecycle is independent of any pod that uses it.
A Persistent Volume Claim (PVC) is a request for PV storage by an application/process.

Namespaces and Services

A namespace enables the isolation of resources within the cluster by partitioning a Kubernetes cluster into multiple virtual clusters; these are logically separated but can communicate with each other. Namespaces help manage resources across multiple environments, teams, or projects since a resource name only needs to be unique within its own namespace, not across the whole cluster.

Services, on the other hand, enable networking by defining a set of pods and a policy for accessing them.

Challenges in Managing Costs for Kubernetes Clusters and Applications

In this section, we discuss some of the challenges in managing Kubernetes costs.

“Set and forget” Autoscaling Policies

While autoscaling is a powerful feature for high-availability cluster management, one common scenario is that developers set an autoscaling policy but fail to monitor it. If recommended cost optimization techniques, such as rightsizing, are not enforced, Kubernetes will spin up additional resources as soon as the current resource needs are not met. This often results in over-resourcing, i.e., in provisioning unused resources within a cluster. More on this below.
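The effect of an unmonitored policy is easier to see with numbers. The sketch below follows the core calculation documented for the Kubernetes Horizontal Pod Autoscaler, desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured minimum and maximum. It deliberately ignores the autoscaler's tolerance and stabilization behavior, and the function name and sample values are assumptions made for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified horizontal autoscaling calculation (illustrative only)."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # The policy's min/max bounds are the only brake on growth.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% average CPU against a 60% target.
normal = desired_replicas(4, 90, 60, min_replicas=2, max_replicas=50)

# Same policy during a spike: a generous max lets the cluster grow fivefold.
spike_loose = desired_replicas(4, 300, 60, min_replicas=2, max_replicas=50)

# A tighter, reviewed max caps the spend.
spike_capped = desired_replicas(4, 300, 60, min_replicas=2, max_replicas=10)

print(normal, spike_loose, spike_capped)
```

The point of the comparison is that the maximum replica count is effectively a budget decision, so it deserves the same periodic review as any other spending limit.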

Overprovisioning of Resources

While prioritizing resource availability and workload performance, administrators often set resource requests and limits that are considerably higher than the workload actually requires. This results in a cluster with overprovisioned resources that are partially or rarely consumed. Overprovisioning also makes it harder to estimate the actual resource cost incurred by each workload.
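One way to quantify overprovisioning is to compare what a workload reserves with what it actually uses. The following is a minimal sketch under the assumption that cost is driven by the CPU a workload requests; the `WorkloadSample` type, the workload names, and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSample:
    name: str
    cpu_requested_cores: float  # CPU reserved for the workload
    cpu_used_cores: float       # average CPU actually consumed


def idle_fraction(sample: WorkloadSample) -> float:
    """Share of the reserved CPU that sits unused (0.0 to 1.0)."""
    if sample.cpu_requested_cores <= 0:
        return 0.0
    used_share = sample.cpu_used_cores / sample.cpu_requested_cores
    return max(0.0, 1 - used_share)


samples = [
    WorkloadSample("api", cpu_requested_cores=2.0, cpu_used_cores=0.5),
    WorkloadSample("worker", cpu_requested_cores=1.0, cpu_used_cores=1.0),
]

for s in samples:
    print(f"{s.name}: {idle_fraction(s):.0%} of requested CPU is idle")
```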

Multi-tenant & Multi-cloud: Challenges to Cost Allocation

When applications are hosted on multi-tenant, multi-cloud clusters, resources are shared across teams and providers, which limits visibility into which application consumed what. This makes it particularly complex to link an application to specific resources and allocate costs.

Inadequate Cost Management Tooling

Kubernetes does not offer native tools that provide a standard approach to cost management. It is usually up to the cluster administrators and developers to connect Kubernetes APIs to metric monitoring and visualization tools that help determine costs at the object level. This means that efficient cluster cost management depends on the quality of the cloud cost observability toolset selected.

Beyond tooling, organizations managing complex multi-tenant and multi-cloud clusters often benefit from working with a Kubernetes consulting partner, particularly one with certified engineers (CKA/CKAD/CKS) and hands-on production experience, to implement cost governance frameworks and avoid common pitfalls like overprovisioning and unmonitored autoscaling policies.

Complexities of a Hybrid Setup

While running clusters across cloud instances helps with the high availability of workloads, cloud providers typically offer varying structures of cost determination and billing reports. Generating billing calculations and cost data from multiple providers in a hybrid infrastructure complicates the tracking of usage costs.

Need for Specialized Accounting Mechanisms

Being ephemeral, a container’s lifespan may be short, terminating after running an intended process. As a result, accounting for containers requires specialized cost management systems that can log ephemeral entities along with the processes they run and their associated costs.

Best Practices for Kubernetes Cost Monitoring and Optimization

While Kubernetes aids complex Cloud-native deployments, the dynamic nature of production clusters makes cost allocation, optimization, and management a persistent challenge. However, with the adoption of the right FinOps tools and best practices, organizations can bring financial discipline to optimize and manage their Cloud expenses.

Develop Allocation Budgets Using Unit Costs

To enable effective budgets, all Kubernetes tenant cost calculations should begin with evaluating the cost at the unit level (for instance, the cost of operating a container). A unit cost is typically determined using the consumed resource units, the operating cost of the resource, and the duration for which the resource is consumed by a Kubernetes workload.

This calculated data can be tallied into hourly, daily, or monthly durations with supplementary data points to help administrators assess usage costs at the most granular level.
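As a rough illustration of the calculation described above, the sketch below multiplies consumed units by a unit price and a duration, then tallies the result over a day and a 30-day month. The function name and the prices are hypothetical placeholders, not real provider rates.

```python
def workload_cost(units: float, price_per_unit_hour: float, hours: float) -> float:
    """Unit cost = consumed resource units x operating cost x duration."""
    return units * price_per_unit_hour * hours


# Hypothetical rates, for illustration only.
CPU_PRICE_PER_CORE_HOUR = 0.04
MEM_PRICE_PER_GIB_HOUR = 0.005

# A container consuming 0.5 cores and 2 GiB of memory.
hourly = (workload_cost(0.5, CPU_PRICE_PER_CORE_HOUR, hours=1)
          + workload_cost(2, MEM_PRICE_PER_GIB_HOUR, hours=1))
daily = hourly * 24
monthly = daily * 30

print(f"hourly={hourly:.4f} daily={daily:.4f} monthly={monthly:.2f}")
```

Starting from the hourly figure keeps the calculation granular: the same number can be rolled up per container, per namespace, or per tenant without recomputing it.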

Label All Cluster Resources

Labels and tags help establish transparency since they enable the efficient identification of resources across distributed deployment environments. Labels further enable precise documentation that makes it easy to reproduce and audit cost allocation figures. Consistent labeling also supports accurate billing of short-lived, yet expensive, processes that run on ephemeral containers.
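Once resources carry labels, allocating cost becomes a grouping exercise. The sketch below assumes a list of cost records that each carry a `labels` mapping and a `cost` value; the record shape, the `team` label key, and the amounts are assumptions for illustration, not the output format of any particular tool.

```python
from collections import defaultdict


def cost_by_label(records: list[dict], label_key: str) -> dict[str, float]:
    """Sum cost per value of one label; unlabeled resources are bucketed together."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        owner = record["labels"].get(label_key, "unlabeled")
        totals[owner] += record["cost"]
    return dict(totals)


# Hypothetical per-pod cost records.
records = [
    {"pod": "checkout-7d9f", "labels": {"team": "payments"}, "cost": 1.5},
    {"pod": "checkout-a1b2", "labels": {"team": "payments"}, "cost": 2.5},
    {"pod": "search-55c1", "labels": {"team": "search"}, "cost": 1.0},
    {"pod": "batch-tmp", "labels": {}, "cost": 4.0},
]

print(cost_by_label(records, "team"))
```

The size of the `unlabeled` bucket is itself a useful signal: it shows how much spend cannot yet be attributed to an owner.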

Use Monitoring Tools and Dashboards to Enforce Visibility

As resource requirements keep changing, it is recommended that organizations monitor demand in their workloads as an ongoing process to determine average resource consumption over a specific time interval.

This approach is massively simplified by deploying monitoring solutions with intuitive UIs that help visualize the relationship between resource consumption and overall Cloud spend. Monitoring tools also recommend areas where resource consumption can be reduced for cost optimization. While Cloud service providers offer billing summaries for resources consumed, monitoring tools enable the correlation of these bills across processes and objects consuming the resources, thus helping with cost observability.

Rightsize Workloads

Rightsizing is the process of provisioning Cloud instances with adequate resources for optimal workload performance at the lowest possible cost. Rightsizing workloads is an effective mechanism to reduce resource wastage since it minimizes overprovisioning and promotes cost optimization.
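A common way to approach rightsizing is to derive a resource request from observed usage rather than from a guess. The sketch below takes a high percentile of usage samples and adds headroom. The nearest-rank percentile, the 95th-percentile choice, and the 20% headroom are illustrative assumptions and should be tuned per workload.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(samples: list[float], pct: int = 95,
                      headroom: float = 1.2) -> float:
    """Suggest a request: a high usage percentile plus a safety margin."""
    return percentile(samples, pct) * headroom


# Hypothetical CPU usage samples, in cores, for one container.
cpu_usage = [0.2, 0.3, 0.25, 0.4, 0.35, 0.3, 0.5, 0.45, 0.3, 0.25]
current_request = 2.0

suggested = recommend_request(cpu_usage)
print(f"current={current_request} cores, suggested={suggested:.2f} cores")
```

Using a percentile instead of the average protects against sizing for typical load only, while the headroom factor leaves room for short bursts.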

Identify and Terminate Unused Resources

Organizations often end up deploying objects that remain unused and add to resource costs. It is important to practice regular cleanups and terminate resources that are no longer required. As a recommended practice, organizations should also baseline the Total Cost of Ownership (TCO) and adopt longer-term strategies to keep TCO to a minimum.
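One example of such a cleanup is finding Persistent Volume Claims that no pod mounts. The sketch below works on plain dictionaries shaped like the objects the Kubernetes API returns (for example, parsed from `kubectl get ... -o json`); the function name and the sample data are hypothetical, and a real cleanup should confirm that a claim is truly abandoned before deleting it.

```python
def unused_pvcs(pvcs: list[dict], pods: list[dict]) -> list[tuple[str, str]]:
    """Return (namespace, name) for every PVC not referenced by any pod."""
    in_use = set()
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        for volume in pod["spec"].get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                # A pod can only mount claims from its own namespace.
                in_use.add((namespace, claim["claimName"]))
    return [
        (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        for pvc in pvcs
        if (pvc["metadata"]["namespace"], pvc["metadata"]["name"]) not in in_use
    ]


# Hypothetical cluster contents.
pvcs = [
    {"metadata": {"namespace": "shop", "name": "orders-data"}},
    {"metadata": {"namespace": "shop", "name": "old-export"}},
]
pods = [
    {
        "metadata": {"namespace": "shop", "name": "orders-0"},
        "spec": {"volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "orders-data"}},
            {"name": "tmp", "emptyDir": {}},
        ]},
    },
]

print(unused_pvcs(pvcs, pods))
```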

Employ a Cloud Cost Monitoring and Optimization (CCMO) Tool

CCMO tools go beyond monitoring and visualization to offer recommendations on optimizing Cloud spend. Such tools also offer out-of-the-box optimization and comprehensive multi-cloud observability, thereby simplifying the allocation and management of costs in Kubernetes environments.

With the right CCMO tool, organizations can align the goals of development and financial management teams by offering accurate cost visibility through IT Showback.

Conclusion

Kubernetes abstracts distributed resources for simpler deployment operations but, by doing so, reduces visibility into how each process affects total Cloud spend.

In most Kubernetes clusters, the complexities of an underlying infrastructure are often ignored during the initial stages of deployment. However, as a cluster matures, organizations are increasingly forced to deal with challenges that have been present since day one. As presented here, there are several strategies that may be applied to reduce costs and improve visibility over where those costs arise.
