
Storing Grafana Loki log data in a Google Cloud Storage bucket


I am a big fan of Grafana Loki for log collection in Kubernetes because it's simple and lightweight, and makes both collecting and querying logs with Grafana very easy.

Until recently I always configured Loki to use persistent volumes to store log data. That works well, but the problem with persistent volumes is that it's hard to predict how large they need to be, since that depends on the amount of logs and the configured retention. So, to simplify things and stop worrying about volume sizes, I decided to switch the storage to a Google Cloud Storage bucket instead.

Since our apps are hosted in Google Kubernetes Engine in the same location, performance is still pretty good and we can store an unlimited amount of logs indefinitely if we want. 

In this post I am going to describe how to configure Loki to use a Google Cloud Storage bucket to store log data.

In a terminal, set the following environment variables so we can remove some duplication in the commands we need to run:
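The exact values aren't shown here, but based on how the variables are used later in this post they would look something like the following; all of the values below are examples, so substitute your own names:

```shell
# Hypothetical values; adjust the project, bucket, service account,
# and namespace names to your own environment.
export PROJECT=brella-production
export BUCKET_NAME=brella-production-loki
export SA_NAME=loki
# Full email of the GCP service account, derived from the two values above
export SA=${SA_NAME}@${PROJECT}.iam.gserviceaccount.com
# Kubernetes namespace in which we'll install Loki
export NS=logging
```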


These basically set the GCP project name (we at Brella have a project for each environment, named with the convention `brella-<environment>`, so for production the project is simply `brella-production`). You can of course customize the names of the project, the bucket, the service account (which we'll use to manipulate objects in the bucket), and the Kubernetes namespace in which we'll install Loki.

Creating the storage bucket

To create the storage bucket from the terminal run this command:

gsutil mb -b on -l eu -p ${PROJECT} gs://${BUCKET_NAME}/ 

In my case I chose "eu" as the location to have the log data stored across multiple regions within Europe; if you prefer, you can choose a single region to save on costs. To double check that the bucket has been created:

gsutil ls gs://${BUCKET_NAME}/


Creating the service account

Once installed in Kubernetes, Loki's components need to be able to access the bucket and manipulate the objects inside it. The best way to authenticate deployments in GKE and grant them the required permissions on the bucket is Workload Identity (do read that page, because you need to enable Workload Identity for your cluster before proceeding). It basically links a GCP service account that has the required permissions with a regular Kubernetes service account, so that the latter "inherits" the permissions of the former. This is easier than, for example, mounting a service account key file into the pods.

To create the service account run:

gcloud iam service-accounts create ${SA_NAME} --project ${PROJECT} --display-name="Service account for Loki"

gcloud projects add-iam-policy-binding ${PROJECT} \
    --member="serviceAccount:${SA_NAME}@${PROJECT}.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
As you can see, here we assign a role to the service account that allows it to modify the content of storage buckets.
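If you want to double check that the binding is in place, you can inspect the project's IAM policy filtered by the service account (this uses standard gcloud flags and assumes the PROJECT and SA variables set earlier):

```shell
# List the roles granted to the Loki service account on the project
gcloud projects get-iam-policy ${PROJECT} \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${SA}" \
    --format="table(bindings.role)"
```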

Deploying Loki

We'll deploy Loki with Helm. There are various official charts from Grafana for this, but the one I prefer is loki-distributed, because it's simple to manage and allows for a scalable Loki deployment where each component can be scaled separately according to your needs.

First, we need to create a file containing the configuration options for the Helm chart. Run the following command to create a file named /tmp/loki.yaml (or wherever you prefer) with the content below:

cat > /tmp/loki.yaml <<EOF
fullnameOverride: loki

serviceMonitor:
  enabled: false

customParams:
  gcsBucket: loki

serviceAccount:
  create: true
  annotations:
    iam.gke.io/gcp-service-account: ${SA}

distributor:
  replicas: 3
  maxUnavailable: 1

queryFrontend:
  replicas: 3
  maxUnavailable: 1

ingester:
  replicas: 3
  maxUnavailable: 1
  persistence:
    enabled: true

querier:
  replicas: 3
  maxUnavailable: 1
  persistence:
    enabled: true

compactor:
  enabled: true

loki:
  config: |
    auth_enabled: false
    server:
      http_listen_port: 3100
    distributor:
      ring:
        kvstore:
          store: memberlist
    memberlist:
      join_members:
        - {{ include "loki.fullname" . }}-memberlist
    schema_config:
      configs:
        - from: 2020-09-07
          store: boltdb-shipper
          object_store: gcs
          schema: v11
          index:
            prefix: loki_index_
            period: 24h
    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 10m
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      max_cache_freshness_per_query: 10m
      retention_period: 2160h
      split_queries_by_interval: 15m
    storage_config:
      gcs:
        bucket_name: {{ .Values.customParams.gcsBucket }}
      boltdb_shipper:
        active_index_directory: /var/loki/boltdb-shipper-active
        cache_location: /var/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: gcs
    chunk_store_config:
      max_look_back_period: 0s
    table_manager:
      retention_deletes_enabled: true
      retention_period: 2160h
    query_range:
      align_queries_with_step: true
      max_retries: 5
      cache_results: true
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
    frontend_worker:
      frontend_address: loki-query-frontend:9095
    frontend:
      log_queries_longer_than: 5s
      compress_responses: true
      tail_proxy_url: http://loki-querier:3100
    compactor:
      shared_store: gcs
      retention_enabled: true
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      compaction_interval: 10m
    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
EOF

Most of these settings should be straightforward if you are familiar with how Loki works.  If not, please read about its architecture as well as its configuration.

I just want to highlight one small bit:

    iam.gke.io/gcp-service-account: ${SA}

This annotation for the Kubernetes service account used by Loki's components is what actually enables the link between this service account and the GCP service account for Workload Identity to work (Kubernetes to GCP side).
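Once the chart is installed (next steps below), you can verify that the annotation landed on the Kubernetes service account; this assumes the service account is named `loki`, matching the release name:

```shell
# Show the service account created by the chart, including its annotations
kubectl get serviceaccount loki --namespace ${NS} -o yaml
```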

Next steps:

- Create the namespace:

kubectl create ns ${NS}

- Enable Workload Identity for the Kubernetes service account:

gcloud iam service-accounts add-iam-policy-binding --project ${PROJECT} ${SA} \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${PROJECT}.svc.id.goog[${NS}/loki]"

- Add the Grafana Helm repository:

helm repo add grafana https://grafana.github.io/helm-charts/
helm repo update

- Install Loki distributed:

helm upgrade --install loki grafana/loki-distributed \
-f /tmp/loki.yaml \
--set customParams.gcsBucket=${BUCKET_NAME} \
--version 0.67.1 \
--namespace ${NS} 

- We also need to install Promtail, which is a DaemonSet that will send the logs from the nodes to Loki:

helm upgrade --install \
--namespace ${NS} \
--set "loki.serviceName=loki-query-frontend" \
--set "tolerations[0].operator=Exists,tolerations[0].effect=NoSchedule,tolerations[0].key=cloud.google.com/gke-spot" \
--set "config.clients[0].url=http://loki-gateway/loki/api/v1/push" \
promtail grafana/promtail
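At this point you can also sanity-check the Workload Identity link end to end by running a one-off pod that uses the `loki` service account and asking which GCP identity it gets (this is the verification pattern from the GKE Workload Identity docs; the image is just one that ships with gcloud):

```shell
# Runs a temporary pod with Loki's Kubernetes service account and prints
# the GCP identity it is mapped to via Workload Identity
kubectl run wi-test --rm -it --restart=Never \
    --namespace ${NS} \
    --image=google/cloud-sdk:slim \
    --overrides='{"spec": {"serviceAccountName": "loki"}}' \
    -- gcloud auth list
```

If everything is wired up correctly, the active account should be the GCP service account's email rather than the node's default identity.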

That's really it for a basic setup that stores Loki logs in a Google Cloud Storage bucket. Once all the pods are up and running, add a new Loki data source to Grafana with the URL http://loki-query-frontend.logging:3100 and confirm that the logs are coming through. 
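If you provision Grafana declaratively, the equivalent data source definition would look roughly like this (a sketch, assuming Grafana runs in the same cluster and Loki is installed in the `logging` namespace as in this post):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-query-frontend.logging:3100
```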

To confirm that the logs are being stored in the bucket, wait 10-15 minutes, then run:

gsutil ls gs://${BUCKET_NAME}/

You should see a json file as well as a couple of directories that store the log chunks and the index.

You can also wait a while to collect some logs, then restart all the Loki components (you'll see a few Deployments as well as a couple of StatefulSets and the Promtail DaemonSet) and confirm that you can still search the old logs, which proves the data is retrieved from the bucket and not from local storage. In my case, since it was the first time I set up Loki with a Google bucket, I wanted to be 100% sure about this, so I tested by completely uninstalling Loki and ensuring I could still search through old logs after reinstalling it. It works nicely, and so far I haven't seen any difference in query speed compared to persistent volumes. Perhaps this might change as the amount of logs grows; I don't know yet.

If you already use Loki and have the same problem with persistent volumes that I had, or if you are looking to use Loki as a lightweight log collector in Kubernetes and also use GCP, then I hope this post was useful. Please let me know in the comments if you run into any issues.

© Vito Botta