Background
On December 2nd, AWS announced the general availability of CDK version 2. The main reason the CDK team released version 2 was to deal with the so-called dependency hell. After upgrading, CDK and all stable constructs are combined in one package/module. Experimental modules still need to be installed one by one.
When you work as a cloud consultant for an enterprise in the financial sector, being secure and patching your software is a must (and yes, CDK can be seen as software too). So upgrading CDK to version 2 is inevitable, especially as CDK version 1 will be retired as of the 1st of June 2023; see the AWS maintenance policy.
So it is time to upgrade our code base as well. As there are multiple guides on the internet, including the official one from AWS, this blog will not go into detail on how a migration to CDK version 2 should take place. Instead, it describes the road I took to move from CDK version 1 to version 2.
Prerequisites for upgrading CDK
As this blog describes the patching of and upgrading to CDK, knowledge of CDK is required. Luckily AWS created workshops on this topic. Please check them out if you want to try out CDK and CDK pipelines.
Access to an AWS Account with proper rights to deploy resources is needed.
Installation
Just grab a cup of coffee and keep on reading…
Real World Scenario
At my current assignment a data platform is being built in AWS. The platform ingests data from several sources, processes the data and makes it available to end users. The services used are DataSync, S3, Glue, EMR, Athena and Managed Workflows for Apache Airflow (MWAA). Besides these services, a lot of other so-called supporting services are used as well.
At the start of the project, CDK version 1 was the only major version available. When working with CDK version 1, every package/module that the CDK app needs has to be installed separately. For this application, which is a Python CDK app, a requirements.txt is used.
The data platform itself is created in a so-called restricted AWS environment, which means that there is no public internet connectivity available. Every connection to the internet is routed over Direct Connects to on-premise firewalls, where the traffic is scanned and inspected.
Because there is no direct internet connectivity to retrieve the build packages configured in the requirements.txt file, an on-premise package manager has to be configured. Due to the restrictive policies in place, every package listed in requirements.txt, and each of its dependencies, can potentially be blocked by the package manager. As this data platform is built for an enterprise, getting such a package released takes time: a ticket needs to be raised, the Security team has to be consulted, and the package has to be approved.
With CDK version 2 the downloading of packages is limited to a single package, the aws-cdk-lib package. OK, actually you need the constructs package as well. This simplifies the requirements.txt file and the potential whitelisting of the aws-cdk-lib package.
Go Build
Old CDK version 1
When the data platform application was built, it used CDK version 1.136.0. The requirements.txt file looked like this:
aws-cdk.aws-athena==1.136.0
aws-cdk.aws-certificatemanager==1.136.0
aws-cdk.aws-codecommit==1.136.0
aws-cdk.aws-codebuild==1.136.0
aws-cdk.aws-codepipeline==1.136.0
aws-cdk.aws-codepipeline-actions==1.136.0
aws-cdk.custom-resources==1.136.0
aws-cdk.aws-datasync==1.136.0
aws-cdk.aws-dynamodb==1.136.0
aws-cdk.aws-ec2==1.136.0
aws-cdk.aws-elasticloadbalancingv2==1.136.0
aws-cdk.aws-elasticloadbalancingv2-targets==1.136.0
aws-cdk.aws-emr==1.136.0
aws-cdk.aws-glue==1.136.0
aws-cdk.aws-iam==1.136.0
aws-cdk.aws-lambda-event-sources==1.136.0
aws-cdk.aws-lambda-python==1.136.0
aws-cdk.aws-lambda==1.136.0
aws-cdk.aws-logs==1.136.0
aws-cdk.aws-mwaa==1.136.0
aws-cdk.aws-route53==1.136.0
aws-cdk.aws-route53-targets==1.136.0
aws-cdk.aws-s3==1.136.0
aws-cdk.aws-s3-deployment==1.136.0
aws-cdk.aws-s3-notifications==1.136.0
aws-cdk.aws-secretsmanager==1.136.0
aws-cdk.aws-ssm==1.136.0
aws-cdk.pipelines==1.136.0
jsonschema<=3.2.0
boto3
pytest
-e .
As you can see above, each package is installed separately and could potentially be blocked by the package manager.
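To get a feeling for how many separate approval requests such a file implies, the per-module CDK v1 pins can be split from the rest. This is a minimal sketch using only the standard library; the function name split_requirements is my own and the sample lines are a small subset of the file above.

```python
def split_requirements(lines):
    """Separate per-module CDK v1 pins (aws-cdk.*) from all other entries."""
    cdk_v1, other = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("aws-cdk."):
            cdk_v1.append(line)
        else:
            other.append(line)
    return cdk_v1, other


# A few lines taken from the CDK v1 requirements.txt shown above.
sample = [
    "aws-cdk.aws-athena==1.136.0",
    "aws-cdk.aws-s3==1.136.0",
    "aws-cdk.pipelines==1.136.0",
    "jsonschema<=3.2.0",
    "boto3",
    "pytest",
    "-e .",
]

cdk_v1, other = split_requirements(sample)
print(f"{len(cdk_v1)} CDK v1 modules, {len(other)} other entries")
```

Every entry in the first list is a package that could be blocked on its own, which is exactly the pain point described here.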
Our cdk.json file which holds information on CDK looked like this:
{
  "app": "python app.py",
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "aws-cdk:enableDiffNoFail": "true",
    "@aws-cdk/core:stackRelativeExports": "true",
    "@aws-cdk/aws-ecr-assets:dockerIgnoreSupport": true,
    "@aws-cdk/aws-secretsmanager:parseOwnedSecretName": true,
    "@aws-cdk/aws-kms:defaultKeyPolicies": true,
    "@aws-cdk/aws-s3:grantWriteWithoutAcl": true,
    "@aws-cdk/aws-ecs-patterns:removeDefaultDesiredCount": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-efs:defaultEncryptionAtRest": true,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true,
    "@aws-cdk/core:newStyleStackSynthesis": true
  }
}
Lastly our setup.py file:
import setuptools

with open("README.md") as fp:
    long_description = fp.read()

setuptools.setup(
    name="hashnode",
    version="0.0.1",
    description="Data Platform Application",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="Yvo van Zee",
    author_email="yvo@yvovanzee.nl",
    package_dir={"": "hashnode"},
    packages=setuptools.find_packages(where="hashnode"),
    install_requires=[
        "aws-cdk.core==1.136.0",
    ],
    python_requires=">=3.6",
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Programming Language :: JavaScript",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Topic :: Software Development :: Code Generators",
        "Topic :: Utilities",
        "Typing :: Typed",
    ],
)
Move to CDK version 2
As this is a Python-based CDK project, packages can be installed in a virtual environment. A virtual environment is part of the standard setup when you first initialize a CDK project. For the migration to CDK version 2, an extra virtual environment was created to install our CDK version 2 packages in.
➜ hashnode git:(main) python3 -m venv .cdkv2
➜ hashnode git:(main) source .cdkv2/bin/activate
(.cdkv2) ➜ hashnode git:(main)
To make changes easier, a separate branch was created to track all changes.
(.cdkv2) ➜ hashnode git:(main) git checkout -b feature/cdk_version_2
Switched to a new branch 'feature/cdk_version_2'
(.cdkv2) ➜ hashnode git:(feature/cdk_version_2)
As everything is now set, the fun stuff begins. The CDK version 2 packages need to be configured. I’ve chosen to move these packages to the setup.py, which is installed with the -e . option in the requirements.txt. So the requirements.txt file can be cleaned. The new version looks like:
boto3
pytest
-e .
In the setup.py file the aws-cdk-lib and constructs packages are added, and aws-cdk.core==1.136.0 is removed. The version of the aws-cdk-lib package is pinned to 2.2.0. The reason for this was that the lib package was initially put in quarantine, because the PyYAML package had a security vulnerability. Once the PyYAML package had been assessed by the security team, it got whitelisted.
import setuptools

with open("README.md") as fp:
    long_description = fp.read()

setuptools.setup(
    name="hashnode",
    version="0.0.1",
    description="Data Platform Application",
    long_description=long_description,
    long_description_content_type="text/markdown",
    author="Yvo van Zee",
    author_email="yvo@yvovanzee.nl",
    package_dir={"": "hashnode"},
    packages=setuptools.find_packages(where="hashnode"),
    install_requires=[
        "aws-cdk-lib==2.2.0",
        "constructs>=10.0.0,<11.0.0",
    ],
    python_requires=">=3.6",
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "Programming Language :: JavaScript",
        "Programming Language :: Python :: 3 :: Only",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Topic :: Software Development :: Code Generators",
        "Topic :: Utilities",
        "Typing :: Typed",
    ],
)
As described in the official AWS migration guide, the cdk.json needs to be adjusted as well. A lot of options used in CDK version 1 are now obsolete, which simplifies our cdk.json to:
{
  "app": "python app.py",
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": false,
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": false,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": false,
    "@aws-cdk/core:stackRelativeExports": false
  }
}
With the configuration part done, it is time to install the requirements.txt in our virtual environment:
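To see which context keys disappear in such a clean-up, the old context can be compared against the keys that remain in the version 2 file. The sketch below is only an illustration: the set V2_CONTEXT_KEYS is copied from my own resulting cdk.json, so treat it as an assumption for any other project, and the function name obsolete_keys is made up for this example.

```python
# Keys that remain in the CDK v2 cdk.json of this post (an assumption
# for other projects; check the official migration guide for your case).
V2_CONTEXT_KEYS = {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId",
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021",
    "@aws-cdk/aws-rds:lowercaseDbIdentifier",
    "@aws-cdk/core:stackRelativeExports",
}


def obsolete_keys(context, keep=V2_CONTEXT_KEYS):
    """Return the sorted context keys that are not in the keep set."""
    return sorted(key for key in context if key not in keep)


# A few entries from the CDK v1 context shown earlier.
old_context = {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": True,
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "@aws-cdk/aws-kms:defaultKeyPolicies": True,
    "@aws-cdk/core:stackRelativeExports": "true",
}

dropped = obsolete_keys(old_context)
for key in dropped:
    print("can be removed:", key)
```

The sketch only reports keys; which value each remaining flag should get is a decision to take with the migration guide at hand.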
(.cdkv2) ➜ hashnode git:(main) pip install -r requirements.txt
Obtaining file:///Users/yvthepief/Code/hashnode (from -r requirements.txt (line 3))
  Preparing metadata (setup.py) ... done
Collecting boto3
  Downloading boto3-1.20.35-py3-none-any.whl (131 kB)
     |████████████████████████████████| 131 kB 3.8 MB/s
Collecting pytest
  Using cached pytest-6.2.5-py3-none-any.whl (280 kB)
Collecting botocore<1.24.0,>=1.23.35
  Downloading botocore-1.23.35-py3-none-any.whl (8.5 MB)
     |████████████████████████████████| 8.5 MB 6.9 MB/s
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting s3transfer<0.6.0,>=0.5.0
  Using cached s3transfer-0.5.0-py3-none-any.whl (79 kB)
Collecting pluggy<2.0,>=0.12
  Using cached pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting toml
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting py>=1.8.2
  Using cached py-1.11.0-py2.py3-none-any.whl (98 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting attrs>=19.2.0
  Downloading attrs-21.4.0-py2.py3-none-any.whl (60 kB)
     |████████████████████████████████| 60 kB 9.9 MB/s
Collecting iniconfig
  Using cached iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting aws-cdk-lib==2.2.0
  Using cached aws_cdk_lib-2.2.0-py3-none-any.whl (57.6 MB)
Collecting constructs<11.0.0,>=10.0.0
  Downloading constructs-10.0.33-py3-none-any.whl (50 kB)
     |████████████████████████████████| 50 kB 10.1 MB/s
Collecting publication>=0.0.3
  Using cached publication-0.0.3-py2.py3-none-any.whl (7.7 kB)
Collecting jsii<2.0.0,>=1.47.0
  Downloading jsii-1.52.1-py3-none-any.whl (382 kB)
     |████████████████████████████████| 382 kB 11.6 MB/s
Collecting python-dateutil<3.0.0,>=2.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting urllib3<1.27,>=1.25.4
  Downloading urllib3-1.26.8-py2.py3-none-any.whl (138 kB)
     |████████████████████████████████| 138 kB 8.9 MB/s
Collecting pyparsing!=3.0.5,>=2.0.2
  Using cached pyparsing-3.0.6-py3-none-any.whl (97 kB)
Collecting typing-extensions<5.0,>=3.7
  Using cached typing_extensions-4.0.1-py3-none-any.whl (22 kB)
Collecting cattrs<1.11,>=1.8
  Using cached cattrs-1.10.0-py3-none-any.whl (29 kB)
Collecting six>=1.5
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six, attrs, typing-extensions, python-dateutil, cattrs, urllib3, publication, jsii, jmespath, pyparsing, constructs, botocore, toml, s3transfer, py, pluggy, packaging, iniconfig, aws-cdk-lib, pytest, boto3, aac-cdp
  Running setup.py develop for aac-cdp
Successfully installed aac-cdp-0.0.1 attrs-21.4.0 aws-cdk-lib-2.2.0 boto3-1.20.35 botocore-1.23.35 cattrs-1.10.0 constructs-10.0.33 iniconfig-1.1.1 jmespath-0.10.0 jsii-1.52.1 packaging-21.3 pluggy-1.0.0 publication-0.0.3 py-1.11.0 pyparsing-3.0.6 pytest-6.2.5 python-dateutil-2.8.2 s3transfer-0.5.0 six-1.16.0 toml-0.10.2 typing-extensions-4.0.1 urllib3-1.26.8
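A quick way to confirm that the virtual environment really holds the pinned versions is to ask Python itself. This is a small sketch, assuming Python 3.8 or newer (importlib.metadata is not available in older versions); the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages that CDK version 2 needs.
for name in ("aws-cdk-lib", "constructs"):
    print(name, installed_version(name) or "not installed")
```

Run inside the activated .cdkv2 environment, this should report the versions from the pip output above.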
Configuring code to use new upgraded CDK version 2 libraries
As we now have set up our environment, it is time to actually change our imports as well. These are now still pointing to the aws_cdk.core libraries. The app.py file for example was using the following imports (CDKv1):
#!/usr/bin/env python3
import os

from aws_cdk import core as cdk

from pipeline_resources.cdkpipeline import CdkPipelineStack
from pipeline_resources.repository import RepositoryStack
from utilities.permission_boundary import PermissionBoundaryAspect
from utilities.tagging import add_tags
To make use of CDK version 2, change the imports to:
#!/usr/bin/env python3
import os

from aws_cdk import (
    App,
    Aspects,
    Aws,
    Environment,
)

from pipeline_resources.cdkpipeline import CdkPipelineStack
from pipeline_resources.repository import RepositoryStack
from utilities.permission_boundary import PermissionBoundaryAspect
from utilities.tagging import add_tags
This looks like more lines of code, so why is that? Previously, in CDK version 1, the core module was imported as a complete module. This meant that you could access everything inside the core module, such as App, Stack and Construct, through that one import. With CDK version 2 in place, a bit of code cleaning was done as well. So where we previously referred to the App class with cdk.App(), it is now done with just App().
For the pipeline stack the updated imports look like the following:
from aws_cdk import (
    Aspects,
    Stack,
    aws_codecommit as codecommit,
    aws_codebuild as codebuild,
    aws_ec2 as ec2,
    aws_iam as iam,
    aws_kms as kms,
    aws_secretsmanager as secretsmanager,
    pipelines,
)
from constructs import (
    Construct,
)


class CdkPipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, named_environments: dict, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        <SNIPPET>
As you can see above, we basically import two classes (Aspects and Stack) which were previously in the core module. Furthermore, the Construct class from the constructs package is used as well; it is the type of the scope parameter in the stack's __init__. Basically everything which was cdk.xxx or core.xxx, depending on your type of import, is replaced by a direct import from the aws_cdk module. All other imports, such as aws_ec2 as ec2, remained untouched.
Some handy documentation to check what should end up where is the RFC 0192 of CDK.
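To find out which names a file needs to import directly, the old cdk.Xxx references can be collected with a regular expression. The following is a rough sketch and not a full migration tool: the function name suggest_imports is made up, the sample source is a hypothetical CDK v1 fragment, and the only special case it knows is the one from this post, namely that Construct comes from the constructs package.

```python
import re

# Per this post, Construct now lives in the separate constructs package.
FROM_CONSTRUCTS = {"Construct"}


def suggest_imports(source, alias="cdk"):
    """Collect names used as alias.Name and split them by target package.

    Returns (names for aws_cdk, names for constructs), both sorted.
    """
    pattern = rf"\b{re.escape(alias)}\.([A-Z]\w*)"
    names = set(re.findall(pattern, source))
    return sorted(names - FROM_CONSTRUCTS), sorted(names & FROM_CONSTRUCTS)


# Hypothetical CDK v1 source, only used as input text here.
v1_source = '''
app = cdk.App()
env = cdk.Environment(account=cdk.Aws.ACCOUNT_ID, region=cdk.Aws.REGION)

class MyStack(cdk.Stack):
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
'''

core_names, construct_names = suggest_imports(v1_source)
print("from aws_cdk import", ", ".join(core_names))
print("from constructs import", ", ".join(construct_names))
```

The references themselves (cdk.App() becoming App()) still have to be changed by hand or with a search and replace; the sketch only tells you what the new import lines should contain.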
Testing the CDK version 2, synthesizing
The first try at synthesizing the templates failed. This was because with all the SecureBucket resources (a demo on how to create a secure bucket construct can be found in this blog), a KMS key was created as well. For this KMS key we had set trust_account_identities to True, but this parameter is no longer supported for the KMS key in CDK version 2.
The second try was yet another failure, this time on the VPC selection of the isolated subnet. In the Glue stack, one of the security requirements is that we create a CfnConnection for Glue so that it uses the VPC. We select the first private subnet ID:
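Since the argument is no longer supported, every place where it is passed has to be found and cleaned up. A tiny helper like the one below can list those places before running a synth; it is a plain text scan under my own naming (find_keyword), and the sample source is a hypothetical key definition, not the enterprise code.

```python
def find_keyword(source, keyword="trust_account_identities"):
    """Return 1-based line numbers where keyword is passed as an argument."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Ignore spaces so both "name=True" and "name = True" are found.
        if f"{keyword}=" in line.replace(" ", ""):
            hits.append(number)
    return hits


# Hypothetical CDK v1 snippet, used here only as text to scan.
sample = """key = kms.Key(
    self, "BucketKey",
    enable_key_rotation=True,
    trust_account_identities=True,
)"""

print(find_keyword(sample))
```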
subnet_id=vpc.select_subnets(
    subnet_type=ec2.SubnetType.ISOLATED
).subnet_ids[0]
This fails with an AttributeError:
  File "/Users/yvthepief/Code/Hashnode/devtest/repository/application_resources/Ingestion/glue.py", line 262, in __init__
    private_subnets = vpc.vpc.select_subnets(subnet_type=ec2.SubnetType.ISOLATED, one_per_az=True)
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/enum.py", line 429, in __getattr__
    raise AttributeError(name) from None
AttributeError: ISOLATED
Subprocess exited with error 1
What this actually means is that ISOLATED no longer exists. It has been renamed to PRIVATE_ISOLATED in CDK version 2. The other options are PUBLIC and PRIVATE_WITH_NAT.
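When the old name is used in many stacks, the rename can be applied with a small search and replace. The sketch below only covers the ISOLATED to PRIVATE_ISOLATED rename described here; the mapping is a plain dict so other renames can be added once you have verified them, and the function name rename_subnet_types is my own.

```python
import re

# Old enum member -> new enum member, as encountered in this migration.
RENAMED_SUBNET_TYPES = {"ISOLATED": "PRIVATE_ISOLATED"}


def rename_subnet_types(source, renames=RENAMED_SUBNET_TYPES):
    """Replace SubnetType.<old> with SubnetType.<new> in source text."""
    old_names = "|".join(re.escape(name) for name in renames)
    pattern = rf"SubnetType\.({old_names})\b"
    return re.sub(pattern, lambda m: f"SubnetType.{renames[m.group(1)]}", source)


old = "vpc.select_subnets(subnet_type=ec2.SubnetType.ISOLATED).subnet_ids[0]"
new = rename_subnet_types(old)
print(new)
```

Because the pattern requires the old name directly after "SubnetType.", running it a second time on already migrated code leaves the text unchanged.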
Finally, the third time’s a charm! CDK synthesises correctly, which means all templates in the cdk.out folder are now rendered via CDK version 2.
The last thing to do was to check the code in to the newly created branch with a proper commit message, followed by a pull request. This pull request was reviewed by a colleague (the so-called four-eyes principle) and merged into the main branch, after which CDK pipelines could work its magic.
Try it Yourself
As the code is from an enterprise repository, I am not allowed to share it here.
If you want to try it yourself, you can use my cdkpipeline_with_cfn_nag repository on GitHub. This is a CDK version 1 application with CDK pipelines; a blog on this can be found here. Clone or fork it, and try upgrading CDK to version 2 by following the steps in this blog.