As GitLab’s database hosts sensitive information, using it unfiltered for analytics implies high security requirements. To help alleviate this constraint, the Pseudonymizer service is used to export GitLab’s data in a pseudonymized way.

Warning: This process is not impervious. If the source data is available, it’s possible for a user to correlate data to the pseudonymized version.

The Pseudonymizer currently uses HMAC(SHA256) to mutate fields that shouldn’t be textually exported. This ensures that:

  • the end-user of the data source cannot infer/revert the pseudonymized fields
  • the referential integrity is maintained
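
The two properties above can be illustrated with a minimal Ruby sketch. This is not the Pseudonymizer's own code: the `pseudonymize` helper, the `SECRET_KEY` value, and the sample records are assumptions made for illustration only.

```ruby
require 'openssl'

# Hypothetical secret; in practice this would be a long random key kept private.
SECRET_KEY = 'example-secret-key'

# Returns the hex-encoded HMAC-SHA256 of a field value.
def pseudonymize(value, key = SECRET_KEY)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s)
end

# Sample records (made up): an issue references its author by user id.
users  = [{ id: 1, email: 'alice@example.com' }]
issues = [{ id: 10, author_id: 1 }]

pseudo_users = users.map do |u|
  { id: pseudonymize(u[:id]), email: pseudonymize(u[:email]) }
end
pseudo_issues = issues.map do |i|
  { id: pseudonymize(i[:id]), author_id: pseudonymize(i[:author_id]) }
end

# The same input always yields the same digest, so the foreign key
# still matches the pseudonymized primary key.
puts pseudo_issues.first[:author_id] == pseudo_users.first[:id] # true
```

Because the digest depends on the key, someone holding only the exported data cannot recompute or reverse the values, while joins between exported tables keep working.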


To configure the pseudonymizer, you need to:

  • Provide a manifest file that describes which fields should be included or pseudonymized (example manifest.yml file). A default manifest is provided with the GitLab installation. A relative file path is resolved from the Rails root. Alternatively, you can use an absolute file path.
  • Use object storage and specify the connection parameters in the pseudonymizer.upload.connection configuration option.

Read more about using object storage with GitLab.
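
The path rule for the manifest can be sketched in Ruby as follows. The `resolve_manifest_path` helper is hypothetical, and the Rails root used here is assumed to be the source-installation path `/home/git/gitlab`; an Omnibus installation uses a different root.

```ruby
# Hypothetical helper, not part of GitLab: resolves a manifest path the way
# the text describes (relative paths are taken from the Rails root).
def resolve_manifest_path(path, rails_root = '/home/git/gitlab')
  # File.expand_path leaves absolute paths untouched and resolves
  # relative ones against the given directory.
  File.expand_path(path, rails_root)
end

puts resolve_manifest_path('config/pseudonymizer.yml')
puts resolve_manifest_path('/etc/gitlab/pseudonymizer.yml')
```

On a Unix-like system the first call prints `/home/git/gitlab/config/pseudonymizer.yml`, and the second prints the absolute path unchanged.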

For Omnibus installations:

  1. Edit /etc/gitlab/gitlab.rb and add the following lines, replacing the values with your own:

    gitlab_rails['pseudonymizer_manifest'] = 'config/pseudonymizer.yml'
    gitlab_rails['pseudonymizer_upload_remote_directory'] = 'gitlab-elt' # bucket name
    gitlab_rails['pseudonymizer_upload_connection'] = {
      'provider' => 'AWS',
      'region' => 'eu-central-1',
      'aws_access_key_id' => 'AWS_ACCESS_KEY_ID',
      'aws_secret_access_key' => 'AWS_SECRET_ACCESS_KEY'
    }

    Note: If you are using AWS IAM profiles, omit the AWS access key and secret access key/value pairs:

    gitlab_rails['pseudonymizer_upload_connection'] = {
      'provider' => 'AWS',
      'region' => 'eu-central-1',
      'use_iam_profile' => true
    }

  2. Save the file and reconfigure GitLab for the changes to take effect.

For installations from source:

  1. Edit /home/git/gitlab/config/gitlab.yml and add or amend the following lines:

    pseudonymizer:
      manifest: config/pseudonymizer.yml
      upload:
        remote_directory: 'gitlab-elt' # bucket name
        connection:
          provider: AWS
          aws_access_key_id: AWS_ACCESS_KEY_ID
          aws_secret_access_key: AWS_SECRET_ACCESS_KEY
          region: eu-central-1

  2. Save the file and restart GitLab for the changes to take effect.


You can optionally run the pseudonymizer using the following environment variables:

  • PSEUDONYMIZER_OUTPUT_DIR - where to store the output CSV files (defaults to /tmp)
  • PSEUDONYMIZER_BATCH - the batch size when querying the DB (defaults to 100000)

    ## Omnibus
    sudo gitlab-rake gitlab:db:pseudonymizer

    ## Source
    sudo -u git -H bundle exec rake gitlab:db:pseudonymizer RAILS_ENV=production

This produces CSV files that can be very large, so make sure the directory set in PSEUDONYMIZER_OUTPUT_DIR has sufficient free space. As a rule of thumb, allow at least 10% of the database size.
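
As a rough illustration of the defaults and the 10% rule of thumb, the following Ruby sketch reads the two environment variables and estimates the free space to reserve. Both helper names are hypothetical, and the 50 GiB database size is an example figure, not a measurement.

```ruby
# Hypothetical helper: reads the two documented variables, falling back
# to the defaults stated in this section.
def pseudonymizer_settings(env = ENV)
  {
    output_dir: env.fetch('PSEUDONYMIZER_OUTPUT_DIR', '/tmp'),
    batch_size: Integer(env.fetch('PSEUDONYMIZER_BATCH', '100000'))
  }
end

# Hypothetical helper: 10% of the database size, rounded up to a whole byte.
def recommended_free_bytes(database_bytes)
  (database_bytes + 9) / 10
end

settings = pseudonymizer_settings({}) # empty environment, so defaults apply
puts settings[:output_dir]            # /tmp
puts settings[:batch_size]            # 100000

example_db_bytes = 50 * 1024**3       # assumed 50 GiB database
puts recommended_free_bytes(example_db_bytes) # 5368709120 (5 GiB)
```

Comparing that figure with the free space reported for the output directory before starting the export helps avoid a failed run on a full disk.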

After the pseudonymizer has run, the output CSV files are uploaded to the configured object storage and deleted from the local disk.