When you first encounter a new codebase, it can be a daunting task to figure out where to start. What do you look for to understand it? To help you in this process, we have compiled a non-exhaustive list of questions that you can consider asking when starting out. Within each domain the questions are ordered by level of sophistication. The level to which a domain should be developed will depend on the maturity of the project or the organization.
Documentation
Documentation is vital for any codebase. It helps developers understand the code and its purpose.
- Is the repository’s goal clearly defined, describing its purpose and objectives?
- Are there comments within the code, including docstrings if applicable to the language, to provide additional context and explanations?
- Is there a README file that provides an overview of the project and its setup instructions?
- Does the README have a self-service onboarding section for new developers? It should help them get started quickly.
- Does the documentation include a description of the deployment process?
- Is there a style guide and/or naming convention document to ensure consistency in the codebase?
- Is documentation stored alongside the code in the repository?
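For Python projects, the question about docstrings can be roughly quantified. The sketch below is one possible approach, not a standard tool: it uses the standard-library `ast` module to count how many functions and classes in a piece of source code carry a docstring. The function name `docstring_coverage` and the sample source are made up for illustration.

```python
import ast


def docstring_coverage(source: str) -> tuple[int, int]:
    """Return (documented, total) for functions and classes in the source."""
    tree = ast.parse(source)
    documented = 0
    total = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            total += 1
            # ast.get_docstring returns None when the definition has no docstring.
            if ast.get_docstring(node) is not None:
                documented += 1
    return documented, total


# Hypothetical sample source with one documented and one undocumented function.
sample = '''
def documented():
    """Explain what this does."""
    return 1

def undocumented():
    return 2
'''

documented, total = docstring_coverage(sample)
print(f"{documented}/{total} definitions have a docstring")
```

Running the same function over every `.py` file in a repository gives a quick first impression, although a high ratio says nothing about whether the docstrings are actually useful.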
CI/CD
Continuous Integration and Continuous Deployment (CI/CD) ensure a smooth, efficient, and stable development process.
- Does the codebase have a CI pipeline that automates the build, test, and deployment processes?
- Is pre-commit used to enforce code quality and naming standards before committing code changes?
- Is linting in place to ensure consistent code formatting and detect potential errors or bugs?
- Is the deployment to production automated, allowing for quick and reliable releases?
- How is the release process managed? Are releases tagged and well-documented to track changes and ease rollbacks if needed?
- Are dbt-checkpoint hooks in place to enforce naming conventions and consistency in data transformations?
Security and Secrets
Security is a critical aspect of any codebase, and it is essential to assess the codebase’s practices regarding secrets management and access control.
- Are any hard-coded secrets, such as passwords or API keys, present in the code repository?
- Is the code repository connected to a secure secret manager, such as AWS Secrets Manager or HashiCorp Vault, to store and manage sensitive information?
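A first pass at spotting hard-coded secrets can be automated. The following is a minimal sketch under stated assumptions: the two regular expressions are illustrative only (dedicated secret scanners ship far more rules), `find_secrets` is a hypothetical helper name, and the sample text uses the example access key ID from the AWS documentation rather than a real credential.

```python
import re

# Illustrative patterns only; they will miss many kinds of secrets
# and may flag harmless lines.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_secret": re.compile(
        r"\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# Hypothetical file content used to demonstrate the scan.
sample = (
    'DB_HOST = "localhost"\n'
    'password = "correct-horse-battery"\n'
    'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
)

for line_number, name in find_secrets(sample):
    print(f"line {line_number}: possible {name}")
```

Keep in mind that secrets removed from the current files may still be present in the Git history, so a scan of the working tree alone is not conclusive.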
Regarding access management:
- Is the project managed through Terraform, which provides infrastructure-as-code and allows for secure provisioning and management of resources?
- Is the principle of least privilege followed, ensuring that each user or service has only the necessary permissions required for their tasks?
- Are security groups implemented?
Main Contributors
Understanding the main stakeholders of a codebase is crucial for effective communication, decision-making, and identifying potential risks. By assessing the ownership and activity within the codebase, you can determine the key individuals responsible for its management and maintenance.
- Who has contributed the most code to the repository? The CLI tool git-fame helps identify individuals with significant knowledge and ownership of the codebase.
- Is there recent activity in the repository, and what is the frequency of commits? Assessing the frequency of commits helps gauge the level of ongoing development and maintenance efforts.
- Can someone from the client walk you through the codebase? This gives you their perspective on the codebase’s intended purpose, potential challenges, and any specific considerations.
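The questions about contributors and recent activity can also be answered with plain Git. The sketch below is an alternative to git-fame, not a reimplementation of it: it counts commits per author with `git shortlog -sn`, whereas git-fame reports on lines of code. The helper names `parse_shortlog` and `top_contributors`, the default time window, and the sample output are assumptions made for illustration.

```python
import subprocess


def parse_shortlog(output: str) -> list[tuple[str, int]]:
    """Parse `git shortlog -sn` output into (author, commit_count) pairs."""
    authors = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line looks like "<count><tab><author name>".
        count, author = line.strip().split(None, 1)
        authors.append((author, int(count)))
    return authors


def top_contributors(repo_path: str = ".", since: str = "1 year ago") -> list[tuple[str, int]]:
    """Run git in repo_path and return authors ordered by commit count."""
    result = subprocess.run(
        # HEAD is passed explicitly so git does not wait for input on stdin.
        ["git", "shortlog", "-sn", f"--since={since}", "HEAD"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_shortlog(result.stdout)


# Hypothetical shortlog output, so the parsing can be tried without a repository.
sample_output = "    12\tAlice\n     3\tBob Builder\n"
print(parse_shortlog(sample_output))
```

Calling `top_contributors()` inside a checked-out repository returns the same structure for real data. Changing the `since` argument shows whether the people who wrote most of the code are still the ones committing today.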
Code quality and structure
Evaluating the quality and structure of code is essential for maintaining an efficient codebase. By considering various aspects, such as clarity of goals, folder structure, code functionality, package management, code smells, and testing, you can gain insights into the overall health and maintainability of the codebase.
- Does the repository title accurately reflect its purpose?
- Does the project’s folder structure make sense, allowing for easy navigation and organization of code files?
- Can the code be successfully executed and produce the expected results?
- Is there a well-structured main.py file (for Python projects) that serves as the entry point and provides a clear overview of the code’s structure?
- Are the used packages up to date, and how are dependencies managed to ensure compatibility and security?
- Does the code exhibit any code smells, such as duplicated code, long methods, or excessive complexity?
- Are notebooks run in production? This practice is best avoided.
- Are multiple languages used within a notebook, potentially leading to confusion and maintainability issues?
- Are there tests in place to verify the correctness of the code, and what percentage of the codebase is covered by tests?
- Is the code properly packaged, allowing for easy distribution and deployment?
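Some code smells lend themselves to a quick automated check. As a rough sketch for Python code, the snippet below flags long functions by measuring their line span with the standard-library `ast` module. The threshold of 50 lines and the helper name `long_functions` are arbitrary choices for illustration, not an established rule.

```python
import ast


def long_functions(source: str, max_lines: int = 50) -> list[tuple[str, int]]:
    """Return (function_name, length_in_lines) for functions above max_lines."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available from Python 3.8 onwards.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders


# Hypothetical source: one 2-line function and one 7-line function.
sample = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "\n"
    "def sprawling():\n"
    + "    x = 1\n" * 5
    + "    return x\n"
)

print(long_functions(sample, max_lines=5))
```

Line count is only a proxy for complexity, so treat the result as a list of places to read first rather than as a verdict.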
Branching
Effective branching and version control practices are essential for collaboration, code stability, and maintaining a well-structured codebase. Is the codebase properly version controlled, allowing for easy tracking of changes and collaboration among developers?
- Are there any policies or rules regarding branching, such as commit permissions, required approvals, or specific approvers?
- Can developers commit directly to the main/default branch, or is a separate branch required for changes?
- Is approval from someone else required before merging a branch, and are there specific approvers?
- Are pull requests (PRs) focused on a single purpose, or are they large and comprehensive, potentially leading to difficulties in reviewing and merging?
- Have PRs been approved before the CI checks (such as automated tests) succeeded? These checks are meant to safeguard code quality.
- Is it possible to merge branches before all CI checks have succeeded, potentially compromising the stability and quality of the codebase?
Separation of environments
Proper separation of environments and effective dependency management are crucial for maintaining a stable and efficient development process. By assessing the codebase’s environment setup and dependency practices, you can ensure that developers have the necessary tools and resources while minimizing the risk of issues in production.
- Are different environments defined, such as development, staging, and production? Do these environments align with the project’s requirements and make sense regarding their purpose?
- Is the supported environment clearly defined, including the operating system, Python versions, and other relevant dependencies? Is there support for development containers to ensure consistent and reproducible environments?
- Can developers directly run code on the production environment, or are there proper deployment pipelines and processes to ensure controlled releases?
- Are the dependencies for development purposes separated from those used in production, allowing for efficient management and ensuring that only necessary dependencies are included in the production environment?
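Whether development dependencies leak into production can be checked mechanically when a project uses `requirements.txt` files. The sketch below assumes that setup and relies on a hand-picked, hypothetical set of development-only tools (`DEV_ONLY_PACKAGES`); both the set and the helper names are illustrative and should be adapted to the project at hand.

```python
import re
from typing import Optional

# Hypothetical list of tools that normally belong to development only.
DEV_ONLY_PACKAGES = {"pytest", "black", "ruff", "mypy", "pre-commit", "ipython"}


def requirement_name(line: str) -> Optional[str]:
    """Extract the normalized package name from one requirements.txt line."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line or line.startswith("-"):  # skip blanks and options such as -r
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0).lower().replace("_", "-") if match else None


def dev_packages_in_production(requirements_text: str) -> list[str]:
    """Return development-only packages found in production requirements."""
    names = (requirement_name(line) for line in requirements_text.splitlines())
    return sorted(name for name in names if name in DEV_ONLY_PACKAGES)


# Hypothetical production requirements file.
sample = (
    "pandas==2.2.0\n"
    "pytest>=8.0  # should not ship\n"
    "-r base.txt\n"
    "Black\n"
)

print(dev_packages_in_production(sample))
```

Projects that declare dependencies in `pyproject.toml` or use a lock file need a different parser, but the question stays the same: does the production environment install anything it does not need?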
dbt projects
Depending on the technology used in the repository, there are additional, technology-specific things to look for. Here are some for dbt projects.
- Is Slim CI used to optimize the CI pipeline for data transformation processes?
- Is the data used in the CI pipeline limited to avoid unnecessary processing and reduce execution time?