
Implementing a Version Control System for AWS QuickSight

24 Oct, 2024

Introduction

In today’s data-driven world, business intelligence tools are indispensable for organizations aiming to make informed decisions. Among the myriad BI tools available, AWS QuickSight stands out as a scalable and cost-effective solution that allows users to create visualizations, perform ad-hoc analysis, and generate business insights from their data. However, as with any data analytics platform, managing changes to reports, dashboards, and datasets is a critical concern. Implementing a version control system for AWS QuickSight can significantly enhance collaboration, streamline development processes, and improve the overall governance of BI projects.

Version control systems (VCS) are essential tools in modern software development, offering a structured way to manage changes, track history, and facilitate collaboration among teams. In the context of AWS QuickSight, a VCS can help maintain different versions of analyses and dashboards, provide rollback capabilities in case of errors, and offer a clear audit trail of modifications over time.

Solution Overview

(Diagram: solution architecture overview)

Our solution for QuickSight resource version control comprises two main parts:

1. Publish Dashboard Pipeline

This Azure DevOps pipeline can be triggered by dashboard authors. When an author is satisfied with their work and wants to publish a dashboard, they must trigger the pipeline with the dashboard ID as a parameter. The pipeline then:

  1. Uses the QuickSight API to fetch the dashboard definition, including all underlying datasets, and saves them as JSON files.
  2. Automatically creates a pull request to the repository that contains dashboard and dataset definitions.

(Diagram: Publish Dashboard Pipeline)

2. QuickSight Resources Repository Pipeline

Using custom Terraform modules, this pipeline manages the full lifecycle (creation, update, and deletion) of QuickSight resources. Let’s dive into the details of these steps.

Detailed Implementation

Fetching Dashboard Definitions

To fetch the dashboard definition, we use a Python script with the Boto3 package. After receiving the dashboard definition, the script traverses the list of related datasets and fetches their definitions as well. The JSON files are saved in two directories: dashboards and datasets. Each resource is organized by its display name and object ID, ensuring that name duplicates do not cause overwrites. The script also ensures that related permission files are created, maintaining both resource definitions and access policies in the repository.

def main(parameters):
    # Authorize against the AWS account that hosts the dashboard.
    boto3_session = authorize_dashboard_account(parameters)

    # Save the dashboard definition and collect the datasets it references.
    dashboard_name_sanitized, dashboard_data = save_dashboard_definition(parameters, boto3_session)
    datasets = get_datasets(dashboard_data)

    # Save every referenced dataset definition and keep the related permissions file next to it.
    for dataset_id in datasets:
        save_dataset_definition(parameters, dataset_id, parameters["dashboardsDirectory"], boto3_session)
        copy_permissions_file(parameters, dashboard_name_sanitized)

    return dashboard_name_sanitized
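The helper functions referenced above are part of our internal tooling. As an illustration, below is a minimal sketch of what save_dashboard_definition could look like with Boto3; the parameter keys and the file-layout details are assumptions rather than the exact production code.

import json
import re
from pathlib import Path

def save_dashboard_definition(parameters, boto3_session):
    # Illustrative sketch: fetch the dashboard definition via the QuickSight API and write it to disk.
    quicksight = boto3_session.client("quicksight")

    # DescribeDashboardDefinition returns a definition that can later be used to re-create the dashboard.
    response = quicksight.describe_dashboard_definition(
        AwsAccountId=parameters["awsAccountId"],
        DashboardId=parameters["dashboardId"],
    )

    # Organize files by display name and object ID so that duplicate names cannot overwrite each other.
    name_sanitized = re.sub(r"[^A-Za-z0-9_-]+", "_", response["Name"]).lower()
    target_dir = Path(parameters["dashboardsDirectory"]) / f"{name_sanitized}_{parameters['dashboardId']}"
    target_dir.mkdir(parents=True, exist_ok=True)

    payload = {
        "DashboardId": response["DashboardId"],
        "Name": response["Name"],
        "Definition": response["Definition"],
    }
    (target_dir / "definition.json").write_text(json.dumps(payload, indent=2, default=str))

    return name_sanitized, payload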

Creating a Pull Request

The next step in the publish dashboard pipeline is to create a pull request with the changes made to the repository. A Bash script uses Git commands to commit the changes to a new branch. The Azure CLI (az command-line tool) then creates the pull request and returns a link for the user to review.

# Create a pull request for the branch that contains the exported dashboard and dataset definitions.
PR_CREATE_RESPONSE=$(az repos pr create \
 --project $(System.TeamProject) \
 --repository $(Build.Repository.Name) \
 --source-branch $BRANCH \
 --squash true \
 --delete-source-branch true \
 --merge-commit-message "$DESCRIPTION" \
 --detect true \
 --title "$DESCRIPTION" \
 --output json)

echo "Pull request created successfully"

Merge and Deploy

After creating the pull request, the user can review the changes, modify dashboard access policies, and approve the pull request. Once the changes are merged, the deploy pipeline runs Terraform, which creates the revised, production-ready visualizations. We have developed three separate modules: dashboard, dataset, and role_custom_permission. These modules use the ScottWinkler/shell provider to execute scripts based on the desired state of the object.

resource "shell_script" "dashboard" {
  # Each lifecycle state is handled by a dedicated Python script.
  lifecycle_commands {
    create = "python ${path.module}/scripts/dashboard_create.py"
    read   = "python ${path.module}/scripts/dashboard_read.py"
    update = "python ${path.module}/scripts/dashboard_update.py"
    delete = "python ${path.module}/scripts/dashboard_delete.py"
  }

  # The definition checksum triggers an update whenever the committed JSON changes.
  environment = {
    AWS_ACCOUNT_ID = data.aws_caller_identity.current.account_id
    DASHBOARD_MD5  = md5(file(var.dashboard_path))
    DASHBOARD_PATH = var.dashboard_path
    DASHBOARD_ID   = jsondecode(file(var.dashboard_path)).DashboardId
  }
}
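The lifecycle scripts themselves are thin Boto3 wrappers around the QuickSight API. Below is a minimal sketch of what dashboard_create.py might look like; the state fields printed at the end are assumptions, since the shell provider reads the resource state from the JSON the script writes to stdout.

import json
import os

import boto3

def main():
    # Illustrative sketch: create a QuickSight dashboard from the JSON definition committed to the repository.
    account_id = os.environ["AWS_ACCOUNT_ID"]
    dashboard_path = os.environ["DASHBOARD_PATH"]
    dashboard_id = os.environ["DASHBOARD_ID"]

    with open(dashboard_path) as definition_file:
        dashboard = json.load(definition_file)

    quicksight = boto3.client("quicksight")
    quicksight.create_dashboard(
        AwsAccountId=account_id,
        DashboardId=dashboard_id,
        Name=dashboard["Name"],
        Definition=dashboard["Definition"],
    )

    # The shell provider stores the JSON printed to stdout as the resource state.
    print(json.dumps({"dashboard_id": dashboard_id, "definition_md5": os.environ["DASHBOARD_MD5"]}))

if __name__ == "__main__":
    main()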

Enforcing the Solution

To ensure users adopt our solution rather than publishing dashboards manually, we blocked manual publishing by creating a custom permission named PreventDashboardSharing and applying it to the authors’ IAM role.
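One way to attach such a profile to every author is QuickSight’s role-level custom permissions. Below is a minimal Boto3 sketch (assuming the PreventDashboardSharing profile already exists and the default namespace is used); our exact setup may differ.

import boto3

def block_manual_publishing(account_id: str, namespace: str = "default"):
    # Illustrative sketch: attach the PreventDashboardSharing custom permissions profile to the AUTHOR role.
    quicksight = boto3.client("quicksight")
    quicksight.update_role_custom_permission(
        AwsAccountId=account_id,
        Namespace=namespace,
        Role="AUTHOR",
        CustomPermissionsName="PreventDashboardSharing",
    )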

Caveats and Considerations

While implementing this solution, we encountered some caveats:

  • Dashboard Thumbnails: Dashboards created via the API do not have thumbnails, which might confuse users. This should be considered before deploying such a solution.

  • Unsupported Datasets: For some datasets, such as the Jira dataset, we could not retrieve the JSON definition. AWS does not provide a comprehensive list of supported dataset types.

  • API Limitations: In some cases, the dashboard definition retrieved from the API cannot be used directly as a payload to create a duplicate of the dashboard. Fortunately, the QuickSight CreateDashboard API allows setting the ValidationStrategy mode to LENIENT, which relaxes validation (see the snippet below).
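For reference, the relaxed validation is a single extra argument on the CreateDashboard call. A minimal Boto3 sketch, where the surrounding arguments are placeholders:

quicksight.create_dashboard(
    AwsAccountId=account_id,
    DashboardId=dashboard_id,
    Name=name,
    Definition=definition,
    # LENIENT relaxes validation so that definitions exported from the API can be re-created as-is.
    ValidationStrategy={"Mode": "LENIENT"},
)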

Conclusion

Implementing a version control system for AWS QuickSight dashboards ensures their reliability and provides a straightforward way to revert changes. By leveraging Azure DevOps and Terraform, we aligned the solution with our existing workflows and infrastructure. Despite some caveats, this approach offers a robust mechanism for managing QuickSight resources, ensuring that accidental deletions and changes can be easily managed and mitigated.

Grzegorz Bach
Data Engineer based in Gdańsk, Poland, with 10 years of software development experience and over 4 years specializing in Big Data tools. Skilled in AWS, Snowflake, and Terraform, he excels in designing scalable data platforms and enjoys team collaboration and knowledge sharing.