Storing Grafana Loki log data in a Google Cloud Storage bucket

I am a big fan of Grafana Loki for log collection in Kubernetes because it's simple and lightweight, and makes both collecting and querying logs with Grafana very easy.
Until recently I always configured Loki to use persistent volumes to store log data. It works well, but the problem with persistent volumes is that it's difficult to predict how large the volumes need to be, since that depends on the amount of logs and the configured retention. So, to simplify things and stop worrying about volume sizes, I decided to switch the storage to a Google Cloud Storage bucket instead.
Since our apps are hosted in Google Kubernetes Engine in the same location, performance is still pretty good and we can store an unlimited amount of logs indefinitely if we want.
In this post I am going to describe how to configure Loki to use a Google Cloud Storage bucket to store log data.
In a terminal, set the following environment variables so we can remove some duplication in the commands we need to run:
```sh
ENVIRONMENT=production
PROJECT=brella-$ENVIRONMENT
BUCKET_NAME=$PROJECT-loki
SA_NAME=loki-logging
NS=logging
SA=${SA_NAME}@${PROJECT}.iam.gserviceaccount.com
```
These basically set the GCP project name (we at Brella have a project for each environment, named with the convention `brella-<environment>`, so for production the project is simply `brella-production`). You can of course customize the names of the project, the bucket, the service account (which we'll use to manipulate objects in the bucket), and the Kubernetes namespace in which we'll install Loki.
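To make sure the variables point at the right project, a quick sanity check like the following should print the project ID back (this is just an optional verification, not required for the setup):

```sh
# Should print e.g. "brella-production" if the project exists and you have access to it
gcloud projects describe ${PROJECT} --format="value(projectId)"
```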
Creating the storage bucket
To create the storage bucket from the terminal run this command:
```sh
gsutil mb -b on -l eu -p ${PROJECT} gs://${BUCKET_NAME}/
```
In my case I chose "eu" as the location so that the log data is stored across multiple regions within Europe; if you prefer, you can choose a single region to save on costs. To double check that the bucket has been created:
```sh
gsutil ls gs://${BUCKET_NAME}/
```
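For reference, if you'd rather go with the single-region option mentioned above, the command would look something like this (`europe-west1` is just an example; pick the region your GKE cluster runs in):

```sh
# Single-region bucket: cheaper, but without multi-region redundancy
gsutil mb -b on -l europe-west1 -p ${PROJECT} gs://${BUCKET_NAME}/
```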
Creating the service account
Once installed in Kubernetes, Loki's components need to be able to access the bucket and manipulate the objects inside of it. The best way to authenticate deployments in GKE and grant them the required permissions to modify the contents of the bucket is Workload Identity. This basically links a GCP service account that has the required permissions with a regular Kubernetes service account, so that the latter "inherits" the permissions of the former. This makes things easier than, for example, mounting a service account key file in the pods.
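Note that Workload Identity must be enabled on the GKE cluster for this to work. If you're not sure whether it is, you can check with something like the following (the cluster name and location are placeholders for your own; use `--zone` instead of `--region` for a zonal cluster):

```sh
# Prints the workload pool (e.g. "brella-production.svc.id.goog") if Workload Identity is enabled
gcloud container clusters describe my-cluster \
  --project ${PROJECT} \
  --region europe-west1 \
  --format="value(workloadIdentityConfig.workloadPool)"
```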
To create the service account run:
```sh
gcloud iam service-accounts create ${SA_NAME} \
  --project ${PROJECT} \
  --display-name="Service account for Loki"

gcloud projects add-iam-policy-binding ${PROJECT} \
  --member="serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com" \
  --project ${PROJECT} \
  --role="roles/storage.objectAdmin"
```
As you can see, here we assign a role to the service account that allows it to modify the content of storage buckets.
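If you want to double check that the role was assigned correctly, you can list the roles bound to the service account in the project (again, just an optional verification):

```sh
# Should list roles/storage.objectAdmin for the Loki service account
gcloud projects get-iam-policy ${PROJECT} \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:${SA}" \
  --format="table(bindings.role)"
```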
Deploying Loki
We'll deploy Loki with Helm. There are various official charts from Grafana for this, but the one I prefer is `loki-distributed` because it's simple to manage and allows for a scalable Loki deployment where each component can be scaled separately according to your needs.
First, we need to create a file containing the configuration options for the Helm chart. Run the following command to create a file named /tmp/loki.yaml (or wherever you prefer) with the following content:
```sh
cat > /tmp/loki.yaml <<EOF
fullnameOverride: loki
enabled: false
distributor:
  replicas: 3
  maxUnavailable: 1
gateway:
  replicas: 3
  maxUnavailable: 1
  basicAuth:
    enabled: false
customParams:
  gcsBucket: loki
ingester:
  replicas: 3
  maxUnavailable: 1
  persistence:
    enabled: true
querier:
  replicas: 3
  maxUnavailable: 1
  persistence:
    enabled: true
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: ${SA}
loki:
  config: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    distributor:
      ring:
        kvstore:
          store: memberlist
    memberlist:
      join_members:
        - {{ include "loki.fullname" . }}-memberlist
    schema_config:
      configs:
        - from: 2020-09-07
          store: boltdb-shipper
          object_store: gcs
          schema: v11
          index:
            prefix: loki_index_
            period: 24h
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 10m
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      max_cache_freshness_per_query: 10m
      retention_period: 2160h
      split_queries_by_interval: 15m
    storage_config:
      gcs:
        bucket_name: {{ .Values.customParams.gcsBucket }}
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: gcs
    chunk_store_config:
      max_look_back_period: 0s
    table_manager:
      retention_deletes_enabled: true
      retention_period: 2160h
    query_range:
      align_queries_with_step: true
      max_retries: 5
      cache_results: true
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
    frontend_worker:
      frontend_address: loki-query-frontend:9095
    frontend:
      log_queries_longer_than: 5s
      compress_responses: true
      tail_proxy_url: http://loki-querier:3100
    compactor:
      shared_store: gcs
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      compaction_interval: 10m
    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
EOF
```
Most of these settings should be straightforward if you are familiar with how Loki works. If not, please read about its architecture as well as its configuration.
I just want to highlight one small bit:
```yaml
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: ${SA}
```
This annotation on the Kubernetes service account used by Loki's components is what actually links it to the GCP service account, which is required for Workload Identity to work (this is the Kubernetes-to-GCP side of the link).
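Once the chart is installed (a few steps below), you can verify that the annotation actually landed on the ServiceAccount, which should be named `loki` thanks to the `fullnameOverride`, with a quick check like this:

```sh
# Should print the GCP service account email set in the values file
kubectl get serviceaccount loki -n ${NS} \
  -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
```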
Next steps:
- Create the namespace:
```sh
kubectl create ns ${NS}
```
- Enable Workload Identity for the k8s service account:
```sh
gcloud iam service-accounts add-iam-policy-binding --project ${PROJECT} ${SA} \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:$PROJECT.svc.id.goog[logging/loki]"
```
- Add the Grafana Helm repository:
```sh
helm repo add grafana https://grafana.github.io/helm-charts/
helm repo update
```
- Install Loki distributed:
```sh
helm upgrade --install loki grafana/loki-distributed \
  -f /tmp/loki.yaml \
  --set customParams.gcsBucket=${BUCKET_NAME} \
  --version 0.67.1 \
  --namespace ${NS}
```
- We also need to install Promtail, which is a DaemonSet that will send the logs from the nodes to Loki:
```sh
helm upgrade --install \
  --namespace ${NS} \
  --set "loki.serviceName=loki-query-frontend" \
  --set "tolerations[0].operator=Exists,tolerations[0].effect=NoSchedule,tolerations[0].key=cloud.google.com/gke-spot" \
  --set "config.clients[0].url=http://loki-gateway/loki/api/v1/push" \
  promtail grafana/promtail
```
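Before moving on, it's worth making sure everything started up correctly; all of Loki's components and the Promtail pods should reach the Running state within a minute or two:

```sh
# Distributor, ingester, querier, query-frontend, gateway, compactor and Promtail pods should all be Running
kubectl get pods -n ${NS}
```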
That's really it for a basic setup that stores Loki logs in a Google Cloud Storage bucket. Once all the pods are up and running, add a new Loki data source to Grafana with the URL http://loki-query-frontend.logging:3100 and confirm that the logs are coming through.
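If you manage Grafana declaratively rather than through the UI, the data source can also be provisioned from a file. A minimal sketch could look like the following; where the file needs to go depends on how your Grafana is deployed (for example its provisioning/datasources directory, or the equivalent value in the Grafana Helm chart):

```sh
cat > /tmp/loki-datasource.yaml <<EOF
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-query-frontend.logging:3100
EOF
```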
To confirm that the logs are being stored in the bucket, wait 10-15 minutes and then run:
```sh
gsutil ls gs://${BUCKET_NAME}/
```
You should see a json file as well as a couple of directories that store the log chunks and the index.
You can also wait for a while to collect some logs, then restart all the Loki components (you'll see a few Deployments as well as a couple of StatefulSets and the Promtail DaemonSet) and confirm that you can still search the old logs; this confirms that the data is retrieved from the bucket and not from local storage. In my case, since it was the first time I set up Loki with a Google bucket, I wanted to be 100% sure about this, so I tested by completely uninstalling Loki and ensuring I could still search through old logs after reinstalling it. It works nicely, and so far I haven't seen any difference in query speed compared to persistent volumes. Perhaps this might change as the amount of logs grows; I don't know yet.
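If you want to run the same restart test, a rough way to do it (assuming everything lives in the logging namespace as above) is:

```sh
# Restart every Loki component and Promtail; old logs should still be searchable afterwards
kubectl -n ${NS} rollout restart deployment
kubectl -n ${NS} rollout restart statefulset
kubectl -n ${NS} rollout restart daemonset
```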
If you already use Loki and have the same problem with persistent volumes that I had, or if you are looking to use Loki as a lightweight log collector in Kubernetes and also use GCP, then I hope this post was useful. Please let me know in the comments if you run into any issues.