DevOps Command Cheat Sheets

118 handy commands across 19 operational areas, grouped from simple checks to complex production moves.

Kubernetes / kubectl

Pods, deployments, events, logs, rollouts, and cluster debugging.

7 commands

Simple

List pods with status

kubectl get pods -n payments -o wide

Shows pod status, node placement, restarts, and pod IPs. This is the first quick scan during most Kubernetes incidents.

Intermediate

Describe a failing pod

kubectl describe pod payments-api-7c9df -n payments

Use this for events, image pull failures, probe failures, scheduling errors, resource pressure, and volume mount problems.

Advanced

Debug with an ephemeral container

kubectl debug -n payments -it pod/payments-api-7c9df --image=nicolaka/netshoot --target=app

Attaches a temporary troubleshooting container to inspect DNS, TCP, routes, certificates, and process-level symptoms without rebuilding the app image.

Complex

Find pods using the most memory

kubectl top pods -A --sort-by=memory

Use during node pressure or OOM investigations to identify noisy workloads before checking limits, requests, JVM heap, or sidecars.

Intermediate

Show recent namespace events

kubectl get events -n payments --sort-by=.lastTimestamp

Events explain scheduling, image pull, probe, OOM, and admission failures that may not appear in application logs.

Advanced

Check rollout history and rollback

kubectl rollout history deployment/payments-api -n payments && kubectl rollout undo deployment/payments-api -n payments

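Review the history output to confirm the bad revision before rolling back when a deployment is clearly causing user impact. Because && runs the undo as soon as the history command succeeds, run the two commands separately when you need to choose a specific revision with --to-revision.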

Complex

Inspect service endpoints

kubectl get endpointslices -n payments -l kubernetes.io/service-name=payments-api -o wide

Confirms whether ready pods are actually behind the Service. This catches selector mismatches and readiness-gate failures.

Docker

Images, containers, logs, runtime inspection, and local troubleshooting.

7 commands

Simple

See running containers

docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

Gives a clean status table without noisy columns. Useful when checking whether a local service actually started.

Intermediate

Follow recent logs

docker logs --since=15m -f api

Streams only recent logs so you do not waste time scrolling through old startup output.

Advanced

Inspect container network settings

docker inspect api --format "{{json .NetworkSettings.Networks}}"

Shows networks, aliases, IP addresses, and gateway information when containers cannot reach each other.

Complex

Check filesystem changes inside a container

docker diff api

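Lists files and directories added, changed, or deleted in the container's writable layer relative to its image. Helpful for debugging unexpected writes, generated config, or data landing in the container because a volume was not mounted.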

Intermediate

Run a container with an override shell

docker run --rm -it --entrypoint sh api:latest

Bypasses the default entrypoint so you can inspect files, permissions, environment expectations, and installed tools.

Advanced

See container resource usage

docker stats --no-stream

Shows CPU, memory, network, and block I/O usage for quick local saturation checks.

Complex

Prune dangling build cache

docker builder prune --filter until=24h

Safely cleans older build cache while keeping recent layers available. Useful when local Docker builds fill disk.

Docker Compose

Multi-container local stacks, service dependencies, health checks, and logs.

7 commands

Simple

Start services in the background

docker compose up -d

Creates networks, volumes, and containers from compose YAML without blocking your terminal.

Intermediate

Validate resolved config

docker compose config

Renders the final compose file after variable interpolation and merges. Use this before blaming Docker for a YAML/env problem.

Advanced

Restart one service after rebuild

docker compose up -d --build app

Rebuilds and recreates only the app service, keeping databases and supporting services running.

Complex

Run a one-off diagnostic shell

docker compose run --rm --entrypoint sh app

Starts a temporary container using the app service config so you can inspect env vars, DNS, mounted files, and binaries.

Intermediate

Follow logs for one service

docker compose logs -f --tail=100 app

Keeps database and helper service logs out of the way while debugging the app container.

Advanced

Recreate without dependencies

docker compose up -d --no-deps --build app

Rebuilds only the changed service and avoids bouncing databases, queues, or observability services.

Complex

Check container health status

docker inspect --format "{{json .State.Health}}" thedevopsproject-app-1

Reads Docker health-check state directly when compose says a service is running but the app is not ready.

Terraform

Plan review, state inspection, drift checks, imports, and safe infrastructure changes.

7 commands

Simple

Initialize providers and backend

terraform init

Downloads providers and configures backend state. Run this first in a new workspace or after provider/backend changes.

Intermediate

Save a plan file

terraform plan -out=tfplan

Creates an exact plan artifact. Applying this file prevents accidental differences between review and execution.

Advanced

Show state for one resource

terraform state show aws_instance.api

Inspects the current state values Terraform believes exist, useful when provider drift or imports are confusing.

Complex

Move resource address safely

terraform state mv aws_instance.old aws_instance.new

Renames a resource in state without recreating infrastructure. Use during refactors after reviewing a plan carefully.

Intermediate

Validate and format modules

terraform fmt -recursive && terraform validate

Catches syntax, provider schema, and formatting issues before plan review.

Advanced

Plan with variables file

terraform plan -var-file=prod.tfvars -out=prod.tfplan

Keeps environment inputs explicit and saves the exact plan that should be reviewed and applied.

Complex

Import existing infrastructure

terraform import aws_s3_bucket.logs company-prod-logs

Brings an existing resource under Terraform state. Always follow with plan review to align configuration.

AWS CLI

Identity, EC2, IAM, EKS, S3, CloudWatch, RDS, and production triage.

7 commands

Simple

Confirm active identity

aws sts get-caller-identity

Always verify the account, role, and user before making production changes.

Intermediate

Find recent CloudWatch log errors

aws logs filter-log-events --log-group-name /aws/eks/payments --filter-pattern "ERROR" --limit 20

Quickly searches managed logs without opening the console. Add start time filters during real incidents.

Advanced

Check RDS events

aws rds describe-events --source-type db-instance --duration 60

Shows recent RDS failovers, maintenance, backups, parameter changes, and availability events.

Complex

Inventory S3 buckets before public-access checks

aws s3api list-buckets --query "Buckets[].Name" --output text

Start with inventory, then inspect each bucket policy, ACL, block public access, encryption, lifecycle, and access logs.

Intermediate

Update kubeconfig for EKS

aws eks update-kubeconfig --region us-east-1 --name prod-platform

Writes the cluster context locally so kubectl can authenticate through AWS IAM.

Advanced

Find failed ECS tasks

aws ecs list-tasks --cluster prod --desired-status STOPPED --query "taskArns[0:10]"

Starts ECS incident triage by locating recently stopped tasks before describing exit codes and stopped reasons.

Complex

Query CloudTrail for risky IAM actions

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=AttachRolePolicy --max-results 20

Helps investigate privilege changes, emergency access, and unexpected role policy attachments.

Linux Ops

CPU, memory, disk, processes, networking, cron, rsync, and systemd.

7 commands

Simple

Check disk pressure

df -h

Shows filesystem usage. Full root, log, or data partitions commonly cause outages and failed deploys.
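For automation, the same check can be sketched in Python with the standard library. The function name and the 90 percent threshold are illustrative assumptions, and the percentage can differ slightly from df because df excludes blocks reserved for root.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    # The 90 percent threshold is an assumption for illustration, not a standard.
    flag = "WARN" if pct >= 90 else "ok"
    print(f"/ is {pct:.1f}% used [{flag}]")
```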

Intermediate

Find large files under logs

find /var/log -type f -size +100M -exec ls -lh {} \;

Use when disk fills unexpectedly. Pair with logrotate or application log settings.

Advanced

Inspect listening ports

ss -lntp

Shows TCP listeners and owning processes. Faster and more reliable than guessing whether a service bound correctly.

Complex

Trace slow syscalls for a process

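strace -tt -T -p 1234

Attaches to a live process to see blocking file, network, DNS, or permission calls. -tt timestamps each call and -T prints the time spent inside it, which is what exposes the slow ones. Use carefully in production because tracing slows the target process.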

Intermediate

Check systemd service health

systemctl status nginx --no-pager

Shows active state, recent logs, restart behavior, and unit-file hints in one place.

Advanced

Find top memory processes

ps aux --sort=-%mem | head -15

Quickly identifies processes consuming memory before deeper heap, cache, or leak investigation.

Complex

Capture network packets for one host

tcpdump -i eth0 host 10.0.2.15 -w incident.pcap

Creates a packet capture for DNS, TLS, retransmission, or protocol analysis. Use with care on busy hosts.

Cron and rsync

Scheduled jobs, backup syncs, lock files, permissions, and missed executions.

7 commands

Simple

List user cron entries

crontab -l

Shows the current user schedule. Remember cron has a smaller environment than your shell.

Intermediate

Check cron service logs

journalctl -u cron --since "1 hour ago"

Use this to confirm whether cron triggered the job at all before debugging the script. On RHEL-family systems the unit is named crond.

Advanced

Rsync with deletion and dry run

rsync -avz --delete --dry-run /data/ app@backup:/data/

Preview destructive sync behavior before deleting files on the destination.

Complex

Prevent overlapping cron runs

flock -n /tmp/backup.lock rsync -avz /data/ app@backup:/data/

Uses a lock file so long-running jobs do not overlap and corrupt backups or saturate I/O.
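When the scheduled job is itself a Python script, a similar non-blocking lock can be taken inside the script. This is a minimal sketch for Linux and other Unix systems; try_lock is a hypothetical helper name, and the lock lasts only while the returned file object stays open.

```python
import fcntl


def try_lock(path: str):
    """Return an open, exclusively locked file, or None if the lock is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting, like flock -n.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    lock = try_lock("/tmp/backup.lock")
    if lock is None:
        print("previous run still active, exiting")
    else:
        print("lock acquired, running job")
        lock.close()  # closing the file releases the lock
```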

Intermediate

Install a cron file safely

crontab backup.cron && crontab -l

Replaces the current user's entire crontab with a reviewed cron file and immediately confirms what cron will run. Any existing entries that are not in the file are lost.

Advanced

Preserve permissions with rsync

rsync -aHAX --numeric-ids /srv/data/ backup:/srv/data/

Preserves hard links, ACLs, extended attributes, and numeric ownership for system-style backups.

Complex

Bandwidth-limit a production sync

rsync -az --partial --bwlimit=50000 /data/ app@backup:/data/

Keeps a large sync from saturating links and preserves partial files if the transfer is interrupted. --bwlimit is expressed in KiB per second, so 50000 caps the transfer at roughly 50 MB/s.

SSL/TLS and mTLS

Certificate expiry, SANs, chains, trust stores, handshakes, and mutual auth.

7 commands

Simple

Show certificate expiry

openssl x509 -in server.crt -noout -dates

Quickly confirms notBefore and notAfter dates for a certificate file.
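To turn the notAfter line into a number an alert can use, a small Python sketch follows. It assumes the default OpenSSL date format (for example "Jun 26 12:00:00 2026 GMT") and an English locale; days_until_expiry is a hypothetical helper name.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after_line: str, now: datetime) -> int:
    """Parse a `notAfter=...` line from openssl and return whole days remaining."""
    value = not_after_line.strip().split("=", 1)[1]
    expires = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    expires = expires.replace(tzinfo=timezone.utc)
    # A negative result means the certificate has already expired.
    return (expires - now).days


if __name__ == "__main__":
    line = "notAfter=Jun 26 12:00:00 2026 GMT"  # sample value, not real output
    print(days_until_expiry(line, datetime.now(timezone.utc)))
```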

Intermediate

Inspect remote certificate chain

openssl s_client -connect api.example.com:443 -servername api.example.com -showcerts

Checks SNI, served certificates, chain order, and handshake errors from the client point of view.

Advanced

Verify certificate against CA

openssl verify -CAfile ca.pem server.crt

Confirms whether a certificate chains to the expected CA bundle.

Complex

Test mTLS with client certificate

curl --cert client.crt --key client.key --cacert ca.pem https://api.example.com/health

Validates both server trust and client authentication. Use when service mesh or gateway mTLS fails.

Intermediate

Print SANs from a certificate

openssl x509 -in server.crt -noout -text

Look for Subject Alternative Name entries when hostname validation fails despite an unexpired certificate.

Advanced

Check TLS protocol negotiation

openssl s_client -connect api.example.com:443 -tls1_2 -servername api.example.com

Confirms whether a service still accepts or rejects a specific TLS version.

Complex

Inspect Kubernetes TLS secret

kubectl get secret api-tls -n ingress -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -subject -issuer -dates

Decodes the certificate stored in a Kubernetes secret and prints its subject, issuer, and validity dates.

Java / JVM Ops

Thread dumps, heap dumps, GC, native memory, non-heap, and container memory limits.

7 commands

Simple

List Java processes

jps -lv

Shows JVM process IDs and startup arguments so you can target the right process.

Intermediate

Capture a thread dump

jstack -l 1234 > thread-dump.txt

Use for deadlocks, blocked threads, high CPU, stuck requests, or pool starvation.

Advanced

Create a heap dump

jcmd 1234 GC.heap_dump /tmp/heap.hprof

Captures heap for memory leak analysis. Ensure enough disk space before running in production.

Complex

Inspect native memory

jcmd 1234 VM.native_memory summary

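Helps explain memory outside Java heap: metaspace, threads, code cache, direct buffers, and native allocations. The JVM must have been started with -XX:NativeMemoryTracking=summary (or detail) for this command to return data.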

Intermediate

Show heap and non-heap usage

jcmd 1234 GC.heap_info

Shows heap layout and usage. Pair with metaspace and native memory checks when container RSS is high.

Advanced

Capture class histogram

jcmd 1234 GC.class_histogram > class-histogram.txt

Counts live objects by class and helps identify suspicious growth before taking a full heap dump.

Complex

Enable Java Flight Recorder

jcmd 1234 JFR.start name=incident settings=profile duration=120s filename=/tmp/incident.jfr

Captures CPU, allocation, lock, GC, and thread events with lower overhead than many ad hoc profilers.

Kafka

Topics, partitions, consumer lag, replication, ISR, retention, and broker health.

7 commands

Simple

List topics

kafka-topics.sh --bootstrap-server broker:9092 --list

Confirms the cluster is reachable and shows available topics.

Intermediate

Check consumer lag

kafka-consumer-groups.sh --bootstrap-server broker:9092 --describe --group payments-consumer

Shows current offset, log end offset, lag, and partition assignment for a consumer group.
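The lag column is simply log end offset minus current offset for each partition. A short Python sketch of that arithmetic, using made-up offsets and hypothetical dictionary keys rather than real tool output:

```python
def partition_lag(current_offset: int, log_end_offset: int) -> int:
    """Messages written to a partition that the group has not yet committed."""
    return max(log_end_offset - current_offset, 0)


def total_lag(partitions) -> int:
    """Sum lag across all partitions assigned to a consumer group."""
    return sum(
        partition_lag(p["current_offset"], p["log_end_offset"]) for p in partitions
    )


if __name__ == "__main__":
    # Sample values for illustration only.
    sample = [
        {"partition": 0, "current_offset": 1200, "log_end_offset": 1500},
        {"partition": 1, "current_offset": 980, "log_end_offset": 980},
    ]
    print(total_lag(sample))
```

A total that keeps growing between runs suggests consumers are falling behind, while a stable non-zero value is often just in-flight work.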

Advanced

Describe topic partitions

kafka-topics.sh --bootstrap-server broker:9092 --describe --topic payments

Use to inspect partition count, leader broker, replicas, and in-sync replicas.

Complex

Reset consumer offsets after review

kafka-consumer-groups.sh --bootstrap-server broker:9092 --group payments-consumer --topic payments --reset-offsets --to-earliest --execute

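Replays messages from the earliest offset. The consumer group must be inactive, and swapping --execute for --dry-run first previews the new offsets. Treat as a controlled operation because it can duplicate processing.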

Intermediate

Produce a test message

kafka-console-producer.sh --bootstrap-server broker:9092 --topic payments

Useful for validating producer connectivity, ACLs, and topic availability during setup or incident triage.

Advanced

Consume from the beginning

kafka-console-consumer.sh --bootstrap-server broker:9092 --topic payments --from-beginning --max-messages 10

Samples stored messages to verify serialization, routing, headers, and whether data is arriving.

Complex

Check broker API versions

kafka-broker-api-versions.sh --bootstrap-server broker:9092

Confirms protocol compatibility between clients and brokers after upgrades.

Prometheus and Grafana

PromQL, scrape targets, alerts, dashboards, SLI/SLO, and burn-rate analysis.

7 commands

Simple

Check Prometheus targets

curl -s http://prometheus:9090/api/v1/targets

Shows which scrape targets are up or down and why metrics may be missing.

Intermediate

Query API error rate

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

Basic SLI query for server error ratio over five minutes.

Advanced

Check p99 latency

histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))

Calculates p99 latency from histogram buckets grouped by service.
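The estimate comes from linear interpolation inside the bucket that contains the target rank. The Python sketch below is a simplified illustration of that idea, not Prometheus's actual implementation: it works on made-up cumulative counts instead of per-second rates and skips several edge cases.

```python
import math


def histogram_quantile(q: float, buckets) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with a math.inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one):
                # report the highest bound seen so far.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.99, sample))
```

Because the result is interpolated, its accuracy depends on bucket boundaries: a p99 can never be more precise than the width of the bucket it lands in.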

Complex

SLO burn-rate style query

sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 14.4 * 0.001

Example fast-burn alert for a 99.9 percent availability SLO. Tune windows and multiplier to your policy.
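The 14.4 multiplier is commonly derived from spending 2 percent of a 30-day error budget within one hour. The Python sketch below works through that arithmetic; the 30-day period and the function names are assumptions for illustration.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than sustainable the error budget is being spent."""
    return error_ratio / (1 - slo_target)


def budget_consumed(rate: float, window_hours: float,
                    slo_period_hours: float = 30 * 24) -> float:
    """Fraction of the whole period's budget spent during the window."""
    return rate * window_hours / slo_period_hours


if __name__ == "__main__":
    # An error ratio of 1.44 percent against a 99.9 percent SLO.
    rate = burn_rate(0.0144, 0.999)
    print(f"burn rate: {rate:.1f}")
    print(f"budget spent in 1h: {budget_consumed(rate, 1):.1%}")
```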

Intermediate

List active alerts

curl -s http://prometheus:9090/api/v1/alerts

Shows currently firing and pending alerts, including labels and annotations.

Advanced

Query SLI from Prometheus API

curl -G http://prometheus:9090/api/v1/query --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m]))'

Runs a PromQL query from automation or CI without opening the Prometheus UI.
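The same instant query can be issued from a script. This sketch uses only the Python standard library; build_query_url and instant_query are hypothetical helper names, and the host is the placeholder used in the curl example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an /api/v1/query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url: str, promql: str, timeout: float = 10.0):
    """Run an instant query and return the result list from the response."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return body["data"]["result"]


if __name__ == "__main__":
    expr = 'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    print(build_query_url("http://prometheus:9090", expr))
```

Encoding the expression matters because braces, quotes, and brackets in PromQL are not safe in a raw URL.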

Complex

Check multi-window burn rate

(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 14.4 * 0.001) and (sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 14.4 * 0.001)

Combines short and longer windows to reduce noisy SLO alerts while still catching fast user-impacting burn.
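The logic of the combined condition can be sketched in Python: page only when both windows exceed the same threshold. The function name and default values are assumptions that mirror the query above.

```python
def should_page(short_window_ratio: float, long_window_ratio: float,
                slo_target: float = 0.999, factor: float = 14.4) -> bool:
    """Return True only when both error ratios exceed the burn-rate threshold."""
    threshold = factor * (1 - slo_target)
    return short_window_ratio > threshold and long_window_ratio > threshold


if __name__ == "__main__":
    # A brief spike: the 5m window is hot but the 1h window is still healthy.
    print(should_page(0.05, 0.001))
    # A sustained burn: both windows are above the threshold.
    print(should_page(0.05, 0.02))
```

The long window filters out short blips, and the short window lets the alert clear quickly once errors stop.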

MongoDB and MySQL

Connection checks, slow queries, indexes, replication, backups, and production safety.

7 commands

Simple

MongoDB ping

mongosh --eval "db.adminCommand({ ping: 1 })"

Confirms the client can connect and the server can answer a basic command.

Intermediate

MySQL process list

mysql -e "SHOW FULL PROCESSLIST;"

Shows active queries, locks, long-running sessions, and blocked connections.

Advanced

MongoDB query plan

db.orders.find({status: "pending"}).explain("executionStats")

Use in mongosh to identify collection scans, bad indexes, and high document examination counts.

Complex

MySQL replication health

mysql -e "SHOW REPLICA STATUS\G"

Checks replication lag, IO thread, SQL thread, last error, and failover readiness.

Intermediate

MongoDB current operations

db.currentOp({ "secs_running": { $gt: 5 } })

Run in mongosh to find long-running operations, blocked writes, and expensive reads.

Advanced

MySQL index inspection

mysql -e "SHOW INDEX FROM orders;" appdb

Shows existing indexes so you can compare them with slow query predicates and joins.

Complex

MongoDB replica status

rs.status()

Run in mongosh to inspect primary/secondary health, election state, replication lag hints, and member errors.

DevSecOps: Trivy and Checkmarx

Container scanning, SAST gates, policy exceptions, and remediation evidence.

7 commands

Simple

Scan an image with Trivy

trivy image api:latest

Finds OS and application dependency vulnerabilities in a container image.

Intermediate

Fail on high and critical findings

trivy image --severity HIGH,CRITICAL --exit-code 1 api:latest

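Turns scanning into a CI gate that fails only on HIGH or CRITICAL findings. The --severity filter also drops lower-severity findings from this report, so run an unfiltered scan separately if you want them visible.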

Advanced

Scan IaC config

trivy config ./terraform

Finds Terraform, Kubernetes, and other infrastructure misconfigurations before deployment.

Complex

Run Checkmarx CLI scan

cx scan create --project-name payments-api --source . --branch main

Starts a SAST scan from CI/CD. Pair with policy gates and a triage workflow for false positives.

Intermediate

Generate Trivy JSON report

trivy image --format json --output trivy-report.json api:latest

Creates machine-readable evidence for CI artifacts, exception review, or vulnerability dashboards.
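A report like this can be summarized in a few lines of Python. The sketch assumes the report keeps findings under Results[].Vulnerabilities[].Severity; check a real report from your Trivy version before relying on it, and treat severity_counts as a hypothetical helper name.

```python
import json
from collections import Counter


def severity_counts(report: dict) -> Counter:
    """Count vulnerabilities by severity across all scan targets in a report."""
    counts = Counter()
    for result in report.get("Results", []):
        # Targets with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    with open("trivy-report.json") as handle:
        print(dict(severity_counts(json.load(handle))))
```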

Advanced

Scan filesystem dependencies

trivy fs --scanners vuln,secret,misconfig .

Checks the working tree for dependency vulnerabilities, leaked secrets, and configuration issues.

Complex

Run Checkmarx with policy gate

cx scan create --project-name payments-api --source . --branch main --threshold "high=0;medium=10"

Example CI-style SAST gate that blocks new high-risk findings and limits medium-risk accumulation.

Python DevOps Automation

Operational scripts, APIs, JSON/YAML, subprocess safety, retries, and cloud SDKs.

7 commands

Simple

Create a virtual environment

python3 -m venv .venv && source .venv/bin/activate

Keeps automation dependencies isolated from system Python.

Intermediate

Pretty-print JSON from an API

python3 -m json.tool response.json

Quickly validates and formats JSON output from scripts or curl captures.

Advanced

Run a script with warnings visible

python3 -Wd scripts/smoke_api.py

Surfaces deprecation warnings that can break automation during future runtime upgrades.

Complex

Profile a slow automation script

python3 -m cProfile -o profile.out scripts/sync_inventory.py

Captures timing data so you can identify slow API calls, inefficient parsing, or accidental loops.

Intermediate

Install exact dependencies

python3 -m pip install -r requirements.txt

Installs automation dependencies from a reviewed manifest so scripts behave consistently across hosts. Versions are exact only when the file pins them with ==.

Advanced

Run unit tests verbosely

python3 -m unittest discover -s scripts/tests -v

Discovers and runs tests with the standard-library unittest runner, so pytest is not required. Useful for lightweight automation repos.

Complex

Trace imports and startup time

python3 -X importtime scripts/sync_inventory.py

Finds slow imports and startup overhead in automation that runs frequently from cron or CI.

OpenSearch

Cluster health, shards, indexes, snapshots, search latency, and log analytics.

4 commands

Simple

Check cluster health

curl -s "http://opensearch:9200/_cluster/health?pretty"

Shows green/yellow/red status, active shards, initializing shards, relocating shards, and unassigned shards.
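For scripted checks, the JSON response can be reduced to a one-line summary. This Python sketch reads the shard fields named above from a parsed response; summarize_health is a hypothetical helper, and the sample values are invented.

```python
def summarize_health(health: dict) -> str:
    """Condense a parsed _cluster/health response into one status line."""
    status = health.get("status", "unknown")
    return (
        f"{status}: {health.get('unassigned_shards', 0)} unassigned, "
        f"{health.get('initializing_shards', 0)} initializing, "
        f"{health.get('relocating_shards', 0)} relocating"
    )


if __name__ == "__main__":
    # Sample response fragment for illustration only.
    sample = {"status": "yellow", "unassigned_shards": 3,
              "initializing_shards": 0, "relocating_shards": 1}
    print(summarize_health(sample))
```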

Intermediate

List indexes by size

curl -s "http://opensearch:9200/_cat/indices?v&s=store.size:desc"

Finds large indexes that may be driving disk pressure, slow snapshots, or retention issues.

Advanced

Explain unassigned shards

curl -s "http://opensearch:9200/_cluster/allocation/explain?pretty"

Explains why a shard cannot allocate, such as disk watermark, missing node attributes, or replica constraints.

Complex

Create a snapshot

curl -X PUT "http://opensearch:9200/_snapshot/prod_repo/snap-2026-06-26?wait_for_completion=true"

Runs a cluster snapshot before risky maintenance. Repository must already be configured and healthy.

Logstash and Filebeat

Log pipelines, grok parsing, backpressure, outputs, registry state, and delivery checks.

4 commands

Simple

Test Logstash config

logstash --path.settings /etc/logstash -t

Validates pipeline syntax before restarting Logstash.

Intermediate

Run Filebeat config test

filebeat test config -e

Checks YAML, modules, inputs, and output configuration with logs printed to stderr.

Advanced

Test Filebeat output

filebeat test output -e

Confirms connectivity and authentication to OpenSearch, Elasticsearch, Logstash, or another configured output.

Complex

Debug Logstash pipeline events

logstash -f pipeline.conf --log.level debug

Runs a pipeline with verbose logs so you can troubleshoot grok failures, conditionals, and output retries.

Harness

CI/CD pipelines, deployments, connectors, delegates, environments, and rollback evidence.

4 commands

Simple

Check Harness CLI auth

harness login

Confirms CLI access before triggering pipelines or inspecting project resources.

Intermediate

List pipelines

harness pipeline list --project-id payments --org-id platform

Shows available pipelines and identifiers needed for automation.

Advanced

Run a pipeline

harness pipeline run --project-id payments --org-id platform --pipeline-id deploy-api

Triggers a deployment or CI workflow from the CLI using known Harness identifiers.

Complex

Inspect delegate health

harness delegate list --account-id ACCOUNT_ID

Checks whether delegates are available to execute Kubernetes, cloud, or artifact operations.

ZooKeeper

Quorum health, znodes, Kafka metadata legacy mode, sessions, watches, and latency.

4 commands

Simple

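Check server liveness

echo ruok | nc zookeeper 2181

Returns imok when ZooKeeper is reachable and responding to four-letter commands. Recent ZooKeeper releases only answer commands listed in 4lw.commands.whitelist, so an empty reply can mean the command is not allowed rather than that the server is down.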

Intermediate

Show server stats

echo stat | nc zookeeper 2181

Shows leader/follower mode, connections, latency, packets, and znode count.

Advanced

List root znodes

zkCli.sh -server zookeeper:2181 ls /

Confirms namespace contents and whether clients are writing expected znodes.

Complex

Check watches summary

echo wchs | nc zookeeper 2181

Shows watch counts and helps diagnose clients creating too many watches or sessions.

SRE: SLI, SLO, CSAT, SSAT

Reliability indicators, error budgets, customer satisfaction, support satisfaction, and incident review.

4 commands

Simple

Availability SLI formula

good_events / total_events

The core SLI shape: define good user-visible events, divide by total valid events, and track over the SLO window.

Intermediate

Error budget remaining

1 - ((1 - current_availability) / (1 - slo_target))

Shows how much reliability budget remains for a target such as 99.9 percent monthly availability.
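A worked example of the formula in Python, using illustrative availability figures: at 99.95 percent measured availability against a 99.9 percent target, half the budget remains, and a negative result means the budget is overspent.

```python
def error_budget_remaining(current_availability: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    return 1 - (1 - current_availability) / (1 - slo_target)


if __name__ == "__main__":
    print(f"{error_budget_remaining(0.9995, 0.999):.0%}")
    print(f"{error_budget_remaining(0.998, 0.999):.0%}")
```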

Advanced

CSAT percentage

(positive_survey_responses / total_survey_responses) * 100

Measures customer satisfaction from survey responses. Useful as a business-facing reliability companion metric.

Complex

SSAT support signal

(satisfied_support_responses / total_support_responses) * 100

Support satisfaction helps detect reliability pain that may not be visible in service-level telemetry alone.