Introduction
In today’s data-driven world, business intelligence tools are indispensable for organizations aiming to make informed decisions. Among the myriad BI tools available, AWS QuickSight stands out as a scalable and cost-effective solution that allows users to create visualizations, perform ad-hoc analysis, and generate business insights from their data. However, as with any data analytics platform, managing changes to reports, dashboards, and data sets is a critical concern. Implementing a version control system for AWS QuickSight can significantly enhance collaboration, streamline development processes, and improve the overall governance of BI projects.
Version control systems (VCS) are essential tools in modern software development, offering a structured way to manage changes, track history, and facilitate collaborative efforts among teams. In the context of AWS QuickSight, a VCS can help maintain different versions of analysis and dashboards, provide rollback capabilities in case of errors, and offer a clear audit trail of modifications over time.
Solution Overview
Our solution for QuickSight resource version control comprises two main parts:
1. Publish Dashboard Pipeline
This Azure DevOps pipeline can be triggered by dashboard authors. When an author is satisfied with their work and wants to publish a dashboard, they must trigger the pipeline with the dashboard ID as a parameter. The pipeline then:
- Uses the QuickSight API to fetch the dashboard definition, including all underlying datasets, and saves them as JSON files.
- Automatically creates a pull request to the repository that contains dashboard and dataset definitions.
2. QuickSight Resources Repository Pipeline
Using custom Terraform modules, this pipeline manages the full lifecycle (creation, update, and deletion) of QuickSight resources. Let’s dive into the details of these steps.
Detailed Implementation
Fetching Dashboard Definitions
To fetch the dashboard definition, we use a Python script with the Boto3 package. After receiving the dashboard definition, the script traverses the list of related datasets and fetches their definitions as well. The JSON files are saved in two directories: dashboards and datasets. Each resource is organized by its display name and object ID, ensuring that name duplicates do not cause overwrites. The script also ensures that related permission files are created, maintaining both resource definitions and access policies in the repository.
def main(parameters):
    boto3_session = authorize_dashboard_account(parameters)
    dashboard_name_sanitized, dashboard_data = save_dashboard_definition(parameters, boto3_session)
    datasets = get_datasets(dashboard_data)
    for dataset_id in datasets:
        save_dataset_definition(parameters, dataset_id, parameters["dashboardsDirectory"], boto3_session)
    copy_permissions_file(parameters, dashboard_name_sanitized)
    return dashboard_name_sanitized
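The helpers called by `main` are not shown above. As a rough sketch (the helper names, parameter keys, and file-naming scheme are assumptions, not the exact implementation), `save_dashboard_definition` can use the Boto3 QuickSight client’s `describe_dashboard_definition` call, and `get_datasets` can pull the dataset IDs out of the definition’s `DataSetIdentifierDeclarations`:

```python
import json
import re


def sanitize_name(name: str) -> str:
    # Replace characters that are unsafe in file names (hypothetical scheme).
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")


def save_dashboard_definition(parameters, boto3_session):
    qs = boto3_session.client("quicksight")
    resp = qs.describe_dashboard_definition(
        AwsAccountId=parameters["awsAccountId"],
        DashboardId=parameters["dashboardId"],
    )
    name = sanitize_name(resp["Name"])
    # Combine display name and ID so duplicate names never overwrite each other.
    path = f"{parameters['dashboardsDirectory']}/{name}__{resp['DashboardId']}.json"
    payload = {
        "DashboardId": resp["DashboardId"],
        "Name": resp["Name"],
        "Definition": resp["Definition"],
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2, default=str)
    return name, resp


def get_datasets(dashboard_data):
    # Dataset ARNs look like arn:aws:quicksight:<region>:<account>:dataset/<id>
    declarations = dashboard_data["Definition"]["DataSetIdentifierDeclarations"]
    return [d["DataSetArn"].rsplit("dataset/", 1)[1] for d in declarations]
```

Each dataset ID extracted this way can then be passed to `describe_data_set` and saved the same way as the dashboard definition.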
Creating a Pull Request
The next step in the publish dashboard pipeline is to create a pull request with the changes made to the repository. A Bash script utilizes Git commands to create a branch with the changes. The Azure CLI (az command line tool) then creates the pull request and provides a link to the user for review.
PR_CREATE_RESPONSE=$(az repos pr create \
--project $(System.TeamProject) \
--repository $(Build.Repository.Name) \
--source-branch $BRANCH \
--squash true \
--delete-source-branch true \
--merge-commit-message "$DESCRIPTION" \
--detect true \
--title "$DESCRIPTION" \
--output json \
)
echo "Pull request created successfully"
Merge and Deploy
After creating the pull request, the user can review changes, modify dashboard access policies, and approve the pull request. Once the changes are merged, the Deploy Pipeline runs Terraform scripts that create the revised, production-ready visualizations. We have developed three separate modules: dashboard, dataset, and role_custom_permission. These modules use the ScottWinkler/shell provider to execute scripts based on the desired state of the object.
resource "shell_script" "dashboard" {
  lifecycle_commands {
    create = "python ${path.module}/scripts/dashboard_create.py"
    read   = "python ${path.module}/scripts/dashboard_read.py"
    update = "python ${path.module}/scripts/dashboard_update.py"
    delete = "python ${path.module}/scripts/dashboard_delete.py"
  }
  environment = {
    AWS_ACCOUNT_ID = data.aws_caller_identity.current.account_id
    DASHBOARD_MD5  = md5(file(var.dashboard_path))
    DASHBOARD_PATH = var.dashboard_path
    DASHBOARD_ID   = jsondecode(file(var.dashboard_path)).DashboardId
  }
}
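The per-resource scripts themselves are not shown. Assuming the saved JSON keeps `DashboardId`, `Name`, and `Definition` at the top level (as the `DASHBOARD_ID` expression above implies), `dashboard_create.py` might look roughly like the sketch below; the exact error handling and output shape are assumptions:

```python
# dashboard_create.py -- invoked by the shell provider's "create" command.
# Environment variable names match the Terraform `environment` block above.
import json
import os
import sys


def build_create_params(account_id: str, dashboard_path: str) -> dict:
    with open(dashboard_path) as f:
        definition = json.load(f)
    return {
        "AwsAccountId": account_id,
        "DashboardId": definition["DashboardId"],
        "Name": definition["Name"],
        "Definition": definition["Definition"],
        # Relax validation: some API-exported definitions fail STRICT checks.
        "ValidationStrategy": {"Mode": "LENIENT"},
    }


def main():
    import boto3  # imported here so the pure helper stays testable offline

    params = build_create_params(
        os.environ["AWS_ACCOUNT_ID"], os.environ["DASHBOARD_PATH"]
    )
    resp = boto3.client("quicksight").create_dashboard(**params)
    # The shell provider reads the resource state as JSON from stdout.
    json.dump({"arn": resp["Arn"], "dashboard_id": resp["DashboardId"]}, sys.stdout)


if __name__ == "__main__":
    main()
```

The `LENIENT` validation mode is what works around the API limitation described in the caveats: definitions exported from the API do not always pass strict validation when replayed into `create_dashboard`.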
Enforcing the Solution
To make sure users opt for our solution rather than manual dashboard publishing, we blocked manual dashboard publishing by creating a custom permission named PreventDashboardSharing and applying it to the authors’ IAM role.
Caveats and Considerations
While implementing this solution, we encountered some caveats:
- Dashboard thumbnails: Dashboards created via the API do not have thumbnails, which might confuse users. This should be considered before deploying such a solution.
- Unsupported datasets: For some datasets, such as the Jira dataset, we could not retrieve the JSON definition. AWS does not provide a comprehensive list of supported dataset types.
- API limitations: In some cases, the dashboard definition retrieved from the API cannot be used as a payload to create its duplicate. Fortunately, the QuickSight create dashboard API allows setting the ValidationStrategy mode to LENIENT to relax validation.
Conclusion
Implementing a version control system for AWS QuickSight dashboards ensures their reliability and provides a straightforward way to revert changes. By leveraging Azure DevOps and Terraform, we aligned the solution with our existing workflows and infrastructure. Despite some caveats, this approach offers a robust mechanism for managing QuickSight resources, ensuring that accidental deletions and changes can be easily managed and mitigated.