Keeping microservices documentation up to date is a challenge every team knows too well. With competing priorities—building, deploying, and keeping up with the pace of innovation—documentation often falls by the wayside or gets done in sporadic bursts. Clear, current architecture diagrams are critical for productivity, yet creating and maintaining them often feels like an uphill battle—too time consuming and always a step behind your system’s reality.
To address these challenges, vFunction builds on its recently released architecture governance capabilities with new functionality that simplifies documenting, visualizing, and managing microservices throughout their lifecycle.
Real-time documentation for modern development
Seamless integration with existing tools and workflows, including CI/CD pipelines and documentation platforms, is essential for effective application development and management. New functionalities in vFunction’s architectural observability platform enable you to integrate and export vFunction’s architectural insights into your current workflows for greater efficiency and alignment.
Sequence diagrams: From static images to dynamic architecture-as-code diagrams
One of the standout features of our latest release is the ability to generate sequence diagrams based on runtime production data. We now track multiple alternative paths, loops, and repeated calls in a single flow—simplifying and speeding the detection and resolution of hard-to-identify bugs. Behind the scenes, we use Mermaid, a JavaScript-based diagramming and charting tool selected for its simplicity and compatibility with other programs. Mermaid uses Markdown-inspired text definitions to render and modify complex diagrams. The latest release also provides the option for the user to export these diagrams as code, specifically as Mermaid script.
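For illustration, a Mermaid sequence diagram is just plain text. The sketch below is a generic example with invented service names, not actual vFunction output, but it shows how Mermaid expresses alternative paths and loops in a single flow:

sequenceDiagram
    participant Client
    participant OrderService
    participant PaymentService
    Client->>OrderService: POST /orders
    loop retry up to 3 times
        OrderService->>PaymentService: authorize payment
    end
    alt payment approved
        PaymentService-->>OrderService: approved
    else payment declined
        PaymentService-->>OrderService: declined
    end
    OrderService-->>Client: order status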
Mermaid-based sequence diagram in vFunction. Export architecture-as-code written in Mermaid syntax from vFunction. Bring the architecture-as-code into your documentation tool of choice to share live sequence flows captured by vFunction.
Flows exported as Mermaid script retain all the architectural details teams need to embed, modify, and visualize diagrams directly within tools like Confluence or other documentation platforms. This ensures teams can reflect live architecture effortlessly, maintaining dynamic, up-to-date documentation that evolves alongside their systems.
Quickly identify flows with errors
vFunction simplifies troubleshooting complex microservices by using sequence diagrams to surface flows with errors. Flows are marked green (OK) or red (error), so developers can spot problems at a glance. To see the number of calls and errors, developers can toggle between flows and drill into the sequence that caused the issue. Flows with an error rate above 10% are highlighted, making it easy to sort and prioritize areas that need attention. This streamlines debugging, helping teams find and fix issues quickly and improving overall application reliability.
Support for the C4 model and workflows
The C4 model is a widely used framework that improves collaboration by describing the structures and interactions within software systems at four levels of detail: context, container, component, and code (C4). Teams can now export and import C4 container diagrams with vFunction to help detect architectural drift and support compliance, enhancing how engineers conceptualize, communicate, and manage distributed architectures throughout their lifecycle.
This diagram illustrates our Order Management System (OMS) demo application, exported as a C4 diagram and visualized as a graph with PlantUML. Support for C4 enables seamless export of live architecture into widely used formats for broader insights and collaboration.
Using "architecture as code," vFunction aligns live application architecture with existing diagrams, acting as a real-time system of record that ensures consistency, detects drift, and keeps architecture and workflows in sync as systems evolve. Beyond measuring drift within vFunction, teams can compare real-time architectural flows against C4 reference diagrams to pinpoint where drift has occurred, understand its impact, and prioritize resolution.
Grouping services for better microservices management
If you have many microservices, you can now group services and containers using attributes like names, tags, and annotations, including those imported from C4 diagrams. This feature enhances governance by allowing you to apply architecture rules that enforce standards, prevent drift, and maintain consistency across your microservices. It helps teams organize, manage, and monitor their microservices architecture more effectively, ensuring alignment with best practices and reducing complexity.
Distributed application dashboard
The new dashboard for distributed applications provides a centralized and actionable overview of architectural complexity and technical debt across your distributed portfolio of apps, empowering teams to make informed, data-driven decisions to maintain system health.
vFunction’s new portfolio view tracks architectural complexity and technical debt across distributed applications, providing a clear overview of application health and actionable insights in the form of related TODOs.
The related technical debt report helps teams track changes in technical debt across distributed applications, providing valuable insights to prioritize remediation efforts and enhance architectural integrity. For example, Service X may have a higher technical debt of 8.3 due to circular dependencies and multi-hop flows, while Service Y scores 4.2, indicating fewer inefficiencies. Teams can focus remediation efforts on Service X, prioritizing areas where architectural debt has the most significant impact.
The tech debt score delivers actionable clarity, enabling teams to understand newly added debt and streamline the prioritization of technical debt reduction.
The accompanying technical debt score offers a clear, quantified metric based on all open tasks (TODOs), including inefficiencies such as circular dependencies, multi-hop flows, loops, and repeated service calls. Developers use this score to focus on resolving issues like multi-hop flows; for example, resolving redundant calls in a service can bring the debt score down from 7.5 to 4.5.
Additionally, CI/CD integration for debt management takes this functionality a step further. Triggered learning scripts allow teams to incorporate technical debt insights directly into their CI/CD pipelines. By comparing the latest system measurements with baseline data, teams can approve or deny pull requests based on technical debt changes, ensuring alignment with architectural goals and mitigating risks before they escalate.
Sync architectural tasks with Jira
vFunction brings architecture observability and workflow management closer together by syncing TODO tasks directly with Jira. Engineering leaders can integrate architectural updates into sprints and bake related tasks into existing workflows.
TODOs identified in vFunction can now be opened as tickets in Jira in order to incorporate architecture modifications into the development lifecycle.
Architecture audit log
The new logs section tracks key architectural decisions made around TODOs. The log captures impactful actions such as adding or removing TODOs, uploading references, changing baselines, and setting the latest measurements, while excluding routine auto-generated or completed TODOs. Each entry records the affected entities, the context for the action, the user responsible for the decision, and a timestamp. Tracking architectural decisions this way ensures alignment with organizational standards, aids compliance, and reduces the risk of miscommunication or errors by providing a clear historical record of changes.
Why it matters: Efficiency, alignment, and agility
With these new capabilities, vFunction empowers teams to reclaim valuable engineering time, improve productivity, and maintain alignment. By bridging the gap between architecture and workflow, this release makes it easier than ever to document, update, and share architectural insights. Teams can focus on building resilient, scalable systems without the overhead of disconnected tools and outdated diagrams.
In the fast-paced world of microservices, maintaining clear, actionable architecture diagrams is no longer a luxury—it’s a necessity. vFunction equips your team to stay agile, aligned, and ahead of complexity.
Ready to transform your management of microservices?
As applications grow more complex to meet increasing demands for functionality and performance, understanding how they operate is critical. Modern architectures, built on microservices, APIs, and cloud-native principles, often form a tangled web of interactions, making it challenging to pinpoint bottlenecks and resolve latency issues. OpenTelemetry tracing gives developers and DevOps teams the visibility to understand the performance of their services and quickly diagnose issues and bottlenecks. The same data can also serve other critical purposes, which we'll incorporate into the discussion.
This post will cover all the fundamental aspects of OpenTelemetry tracing, including best practices for implementation and hands-on examples to get you started. Whether you’re new to distributed tracing or looking to improve your existing approach, this guide will give you the knowledge and techniques to monitor and troubleshoot your applications with OpenTelemetry. Let’s get started!
What is OpenTelemetry tracing?
OpenTelemetry, from the Cloud Native Computing Foundation (CNCF), has become the standard for instrumenting cloud-native applications. It provides a vendor-neutral framework and tools to generate, collect, and export telemetry consisting of traces, metrics, and logs. This data gives you visibility into your application’s performance and behavior.
Key concepts of OpenTelemetry
At its core, OpenTelemetry revolves around a few fundamental concepts, which include:
Signals: OpenTelemetry deals with three types of signals: traces, metrics, and logs. Each gives you a different view of your application’s behavior.
Traces: The end-to-end journey of a single request through the many services within your system.
Spans: These are individual operations within a trace. For example, a single trace might have spans for authentication, database access, and external API calls.
Context propagation: The mechanism for linking spans across services to give you a single view of the request’s path.
Within this framework, OpenTelemetry tracing focuses on understanding the journey of requests as they flow through a distributed system. Conceptually, this is like creating a map of each request path, encapsulating every single step, service interaction, and potential bottleneck along the way.
Why distributed tracing matters
Debugging performance issues in legacy, monolithic applications was relatively easy compared to today’s microservice applications. You could often find the bottleneck within these applications by looking at a single codebase and analyzing a single process; but with the rise of microservices, where a single user request can hit multiple services, finding the source of latency or errors is much more challenging.
For these more complex systems, distributed tracing cuts through this noise. This type of tracing can be used for:
Finding performance bottlenecks: Trace where requests slow down in your system.
Error detection: Quickly find the root cause of errors by following the request path.
Discovering service dependencies: Understand how your services interact and where to improve.
Capacity planning: Get visibility into resource usage and plan for future scaling.
Together, these building blocks provide the core functionality and insights of OpenTelemetry. Next, let's examine how all of these components work together.
How OpenTelemetry tracing works
OpenTelemetry tracing captures the flow of a request through your application by combining instrumentation, context propagation, and data export. Here’s a breakdown:
Spans and traces
A user clicks a button on your website. This triggers a request that hits multiple services: authentication, database lookup, payment processing, etc. OpenTelemetry breaks this down into spans, or units of work in distributed tracing, representing a single operation or a piece of a process within a larger, distributed system. Spans help you understand how requests flow through a system by capturing critical performance and context information. In the example below, we can see how this works, including a parent span (the user action) and a child span corresponding to each sub-operation (user authentication) originating from the initial action.
Spans help you understand how requests flow through a system by capturing critical performance and context information. Credit: Hackage.haskell.org
Each span records several key pieces of information, including the following:
Operation name (e.g. “database query”, “API call”)
Start and end timestamps
Status (success or failure)
Attributes (additional context like user ID, product ID)
One or multiple spans are then linked together to form a trace, giving you a complete end-to-end view of the request’s path. The trace shows you how long each operation took, where the latency was, and the overall flow of execution.
Context propagation
So, how does OpenTelemetry link spans across services? This is where context propagation comes in. Think of it as a relay race: each service is handed a "baton" of trace information when it receives a request. This baton, metadata carried in request headers, allows the service to create spans linked to the overall trace. As the request moves from one service to the next, the context is propagated, correlating all the spans.
To implement this, OpenTelemetry uses the W3C Trace Context standard for context propagation. This standard allows the trace context to be used across different platforms and protocols. By combining spans and traces with context propagation, OpenTelemetry gives users a holistic and platform-agnostic way to see the complex interactions within a distributed system.
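In practice, the standard defines a traceparent HTTP header that carries the trace ID, the parent span ID, and trace flags between services. A representative value, following the version-traceid-spanid-flags format from the specification, looks like this:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01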
Getting started with OpenTelemetry tracing
With the core concepts covered, let’s first look at auto-instrumentation. Auto-instrumentation refers to the ability to add OpenTelemetry tracing to your service without having to modify your code or with very minimal changes. While it’s also possible to implement OpenTelemetry by leveraging the OpenTelemetry SDK and adding tracing to your code directly (with some added benefits), the easiest way to get started is to leverage OpenTelemetry’s auto-instrumentation for a “zero-code” implementation.
What is auto-instrumentation?
For those who don’t require deep customization or want to get OpenTelemetry up and running quickly, auto-instrumentation should be considered. Auto-instrumentation can be implemented in a number of ways, including through the use of agents that can automatically instrument your application without code changes, saving time and effort. The way it’s implemented depends on your specific development language / platform.
The benefits of running auto-instrumentation include:
Quick setup: Start tracing with minimal configuration.
Comprehensive coverage: Automatically instrument common libraries and frameworks.
Reduced maintenance: Less manual code to write and maintain.
To show how easy it is to configure, let’s take a brief look at how you would implement auto-instrumentation in Java.
For Java, auto-instrumentation is delivered as a Java agent JAR (opentelemetry-javaagent.jar) that you download and attach to your application at startup. Once the agent is downloaded, you need to identify the parameters to pass to the agent so that the application's OpenTelemetry data is exported properly. The most basic of these are:
Export endpoint – The server endpoint where all the telemetry data will be sent for analysis.
Protocol – The protocol used for exporting the telemetry data. OpenTelemetry supports several protocols, but we'll use http/protobuf for this example.
Exporting will send your telemetry data to an external service for analysis and visualization. Some of the popular platforms include Jaeger and Zipkin and managed services such as Honeycomb or Lightstep.
While these platforms are great for visualizing traces and finding performance bottlenecks, vFunction complements them by using the same tracing data to give you a deeper understanding of your application architecture. vFunction automatically analyzes your application traces and generates a real-time architecture map showing service interactions and dependencies as they relate to the application's architecture, not just its performance. This helps developers and architects identify architectural issues that may underlie the performance problems surfaced by the other OpenTelemetry tools in use.
vFunction analyzes applications to clearly identify unnecessary dependencies between services, reducing complexity and technical debt that can degrade performance.
Once you have the settings needed for exporting, you can run your application with the agent. A representative command, with placeholder paths and endpoint, looks like this:
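java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://localhost:4318 \
     -Dotel.exporter.otlp.protocol=http/protobuf \
     -jar myapp.jar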
In the command, you’ll need to replace path/to/ with the actual path to the agent JAR file and myapp.jar with your application’s JAR file. Additionally, the endpoint would need to be changed to an actual endpoint capable of ingesting telemetry data. For more details, see the Getting Started section in the instrumentation page’s readme.
As you can see, this is a simple way to add OpenTelemetry to your individual services without modification of the code. If you choose this option, ensure that you understand and have confirmed the compatibility between the auto-instrumentation agent and your application and that the agent supports your application’s libraries. Another consideration revolves around customization. Sometimes auto-instrumentation does not support a library you’re using or is missing data that you require. If you need this kind of customization, then you should consider updating your service to use the OpenTelemetry SDK directly.
Updating your service
Let’s look at what it takes to manually implement OpenTelemetry tracing in a simple Java example. Remember, the principles apply across any of the different languages within which you could implement OpenTelemetry.
Setting up a tracer
First, you’ll need to add the OpenTelemetry libraries to your project. For Java, you can include the following dependencies in your pom.xml (Maven) or build.gradle (Gradle) file:
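A minimal Maven sketch using the official OpenTelemetry artifacts (the BOM version below is a placeholder; use the current release, and the Gradle coordinates are the same):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.40.0</version> <!-- placeholder: use the current release -->
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <!-- Tracing API and SDK -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
  </dependency>
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
  </dependency>
  <!-- OTLP exporter used in the snippet below -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
  </dependency>
</dependencies>

With the dependencies in place, initialize the SDK and obtain a tracer: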
public static void main(String[] args) {
    // Configure the OTLP exporter to send data to your backend
    OtlpGrpcSpanExporter otlpExporter = OtlpGrpcSpanExporter.builder().build();

    // Register the exporter with a tracer provider and build the SDK
    // (imports from io.opentelemetry.sdk.* and io.opentelemetry.exporter.otlp.trace assumed)
    OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
        .setTracerProvider(SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(otlpExporter).build())
            .build())
        .buildAndRegisterGlobal();

    // Create a tracer
    Tracer tracer = openTelemetrySdk.getTracer("my-instrumentation-library");

    // ... your application code ...
}
This code snippet initializes the OpenTelemetry SDK with an OTLP exporter. Just like with auto-instrumentation, this data will be exported to an external system for analysis. The main difference is that this is configured here in code, rather than through command-line parameters.
Instrumentation basics
With the tracer set up, it’s time to get some tracing injected into the code. For this, we’ll create a simple function that simulates a database query, as follows:
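// A reconstruction of the example described in the text below; the sleep
// duration is an arbitrary stand-in for real database work. Assumes the
// Tracer from the setup above and imports from io.opentelemetry.api.trace
// (Span, StatusCode) and io.opentelemetry.context (Scope).
public void queryDatabase(String query) {
    Span span = tracer.spanBuilder("database query").startSpan();
    try (Scope scope = span.makeCurrent()) {
        // Simulate a database command
        Thread.sleep(100);
        // Record the query string and mark the span successful
        span.setAttribute("db.statement", query);
        span.setStatus(StatusCode.OK);
    } catch (Exception e) {
        // Record the failure on the span
        span.setStatus(StatusCode.ERROR, e.getMessage());
    } finally {
        // Complete the span
        span.end();
    }
}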
This code does a few things relevant to our OpenTelemetry configuration. First, we use tracer.spanBuilder("database query").startSpan() to create a new span named "database query." Then we use span.makeCurrent() to ensure that this span is active within the try block.
Within the try block, we use Thread.sleep() to simulate a database command. We then call span.setAttribute("db.statement", query) to record the query string and set the span status to OK if the operation succeeds. If the operation throws an error, the catch block calls span.setStatus again, passing it an error status and message to be recorded. Finally, span.end() completes the span.
This basic instrumentation captures the execution time of the database query and provides context through the query string and status. You can use the same pattern to manually instrument other operations in your application, such as HTTP requests, message queue interactions, and function calls.
Leveraging vFunction for architectural insights
Combining detailed traces from an OpenTelemetry implementation like the one above with vFunction's architectural analysis gives you a complete view of your application's performance and architecture. For example, if you find a slow database query through OpenTelemetry, vFunction can help you understand the service dependencies around that database and potentially reveal architectural bottlenecks causing the latency.
To integrate OpenTelemetry tracing data with vFunction:
1. Configure the OpenTelemetry collector:
NOTE: If you're not using a collector, you can skip this step and send the data directly to vFunction.
– Configure the OpenTelemetry Collector to export traces to vFunction.
2. Configure your service to include the required vFunction trace headers:
Each service that should be included in the analysis needs to send its trace data to either vFunction or the collector. As part of this, a trace header must be added so vFunction knows which distributed application the service is associated with. After you create the distributed application in the vFunction server UI, it provides instructions for exporting telemetry data to the collector or to vFunction directly. One way this can be done is via the Java command line:
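Sketch only: the exact header key and endpoint come from the installation instructions in the vFunction server UI; x-vfunction-app-key below is a hypothetical placeholder.

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=<collector or vFunction endpoint> \
     -Dotel.exporter.otlp.headers="x-vfunction-app-key=<app header UUID>" \
     -jar myapp.jar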
Other examples are provided in the server UI. The <app header UUID> above refers to a unique ID that’s provided in the vFunction server UI to associate a service with the application. The ID can be easily found by clicking on the “Installation Instructions” in the server UI and following the instructions.
3. Verify the integration:
– Generate some traces in your application.
– Check the vFunction server UI, and you should see the number of agents and tags increase as more services begin to send telemetry information. Click on “START” at the bottom of the interface to begin the analysis of the telemetry data. As data is received, vFunction’s UI will begin to visualize and analyze the incoming trace data.
By following these steps and integrating vFunction with your observability toolkit, you can effectively instrument your application and gain deeper insights into its performance and architecture.
Best practices for OpenTelemetry tracing
Implementing OpenTelemetry tracing effectively involves more than just adding instrumentation. As with most technologies, there are good, better, and best ways to implement OpenTelemetry. To get the full value from your tracing data, consider the following best practices when implementing and using OpenTelemetry in your applications:
Semantic conventions
Adhere to OpenTelemetry’s semantic conventions for naming spans, attributes, and events. Consistent naming ensures interoperability and makes it easier to analyze traces.
For example, if you are creating a span for an HTTP call, attach the relevant details as key-value attributes on the span. This might look something like this:
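// A sketch using current OpenTelemetry HTTP semantic convention keys
// (older releases use http.method and http.status_code); the URL and
// values are invented for illustration.
Span span = tracer.spanBuilder("GET /api/orders")
        .setSpanKind(SpanKind.CLIENT)
        .startSpan();
span.setAttribute("http.request.method", "GET");
span.setAttribute("url.full", "https://example.com/api/orders?limit=10");
span.setAttribute("server.address", "example.com");
span.setAttribute("http.response.status_code", 200);
span.end();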
The documentation on the OpenTelemetry website provides a detailed overview of the recommended semantic conventions, which are certainly worth exploring.
Efficient context propagation
Use efficient context propagation mechanisms to link spans across services with minimal overhead. OpenTelemetry supports various propagators, such as W3C TraceContext, the default propagator specification used with OpenTelemetry.
To configure the W3C TraceContext propagator in Java, do the following:
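// Explicitly register the W3C TraceContext propagator when building the SDK
// (it is also the default in most distributions). Imports assumed from
// io.opentelemetry.sdk, io.opentelemetry.sdk.trace,
// io.opentelemetry.context.propagation, and io.opentelemetry.api.trace.propagation.
OpenTelemetrySdk sdk = OpenTelemetrySdk.builder()
        .setTracerProvider(SdkTracerProvider.builder().build())
        .setPropagators(ContextPropagators.create(
                W3CTraceContextPropagator.getInstance()))
        .build();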
If you want to go beyond the default, the OpenTelemetry docs on context propagation have extensive information to review, including the Propagators API.
Tail-based sampling
When it comes to tracing, be aware of different sampling strategies that can help to manage data volume while retaining valuable traces. One method to consider is tail-based sampling, which makes sampling decisions after a trace is completed, allowing you to keep traces based on specific characteristics like errors or high latency.
To implement tail-based sampling, you can configure it in the OpenTelemetry Collector or directly in the backend. More information on the exact configuration can be found within the OpenTelemetry docs on tail-based sampling.
Adhering to these best practices and incorporating auto-instrumentation as appropriate can enhance the efficiency and effectiveness of your OpenTelemetry tracing, yielding valuable insights into your application’s performance.
Troubleshooting and optimization
Although OpenTelemetry provides extensive data, effectively troubleshooting and optimizing your application requires understanding how to leverage this information. Here are strategies for using traces to identify and resolve issues:
Recording errors and events
When an error occurs, you need to capture relevant context. OpenTelemetry allows you to record exceptions and events within your spans, providing more information for debugging.
For example, in Java you can add tracing so that error conditions, such as those caught in a try-catch statement, are captured correctly. In your code, it may look something like this:
try {
    // ... operation that might throw an exception ...
} catch (Exception e) {
    span.setStatus(StatusCode.ERROR, e.getMessage());
    span.recordException(e);
}
This code snippet sets the span status to ERROR, records the exception message, and attaches the entire exception object to the span. Thus, you can see not only that an error occurred but also the specific details of the exception, which can be extremely helpful in debugging and troubleshooting. You can also use events to log important events within a span, such as the start of a specific process, a state change, or a significant decision point within a branch of logic.
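Recording an event is a single call on the span. A minimal sketch follows; the event names and attribute key are invented, and Attributes/AttributeKey come from io.opentelemetry.api.common:

// Record notable moments on the current span
span.addEvent("cache miss");
span.addEvent("retrying payment call",
        Attributes.of(AttributeKey.stringKey("retry.reason"), "timeout"));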
Performance monitoring with traces
Traces are also invaluable for identifying performance bottlenecks. By examining the duration of spans and the flow of requests, you can pinpoint slow operations or services causing performance issues and latency within the application. Most tracing backends that work with OpenTelemetry already provide tools for visualizing traces, filtering by various criteria (e.g., service, duration, status), and analyzing performance metrics.
vFunction uses OpenTelemetry tracing to reveal the complexity behind a single user request, identifying overly complex flows and potential bottlenecks, such as the OMS service (highlighted in red), where all requests are routed through a single service.
vFunction goes beyond performance analysis by correlating trace data with architectural insights. For example, if you identify a slow service through OpenTelemetry, vFunction can help you understand its dependencies, resource consumption, and potential architectural bottlenecks contributing to the latency, providing deep architectural insights that traditional performance-based observability tools don't reveal.
Pinpoint and resolve issues faster
By combining the detailed information from traces with vFunction's architectural analysis, you can reveal hidden dependencies, overly complex flows, and architectural anti-patterns that impact the resiliency and scalability of your application. Pulling tracing data into vFunction to support deeper architectural observability empowers you to:
Isolate the root cause of errors: Follow the request path to identify the service or operation that triggered the error.
Identify performance bottlenecks: Pinpoint slow operations or services that are causing delays.
Understand service dependencies: Visualize how services interact and identify potential areas for optimization.
Verify fixes: After implementing a fix, use traces to confirm that the issue is resolved and performance has improved.
OpenTelemetry tracing, combined with the analytical capabilities of platforms like vFunction, empowers you to troubleshoot issues faster and optimize your application’s performance more effectively.
Next steps with vFunction
OpenTelemetry tracing provides a powerful mechanism for understanding the performance and behavior of your distributed applications. By instrumenting your code, capturing spans and traces, and effectively analyzing the data, you can identify bottlenecks, troubleshoot errors, and optimize your services.
Discover how vFunction can transform your application development.
We’re excited to have Nenad Crncec, founder of Architech, writing this week’s blog post. With extensive experience in addressing architectural challenges, Nenad shares valuable insights and highlights how vFunction plays a pivotal role in overcoming common stumbling blocks. Take it away, Nenad!
In my journey through various modernization projects, one theme that consistently emerges is the challenge of managing complexity—whether in microservices and distributed systems or monolithic applications. Complexity can be a significant barrier to innovation, agility, and scalability, impacting an organization’s ability to respond to changing market demands.
Complexity can also come in many forms: complex interoperability, complex technology implementation (and maintenance), complex processes, and so on.
"Complex" describes something we can't clearly understand: it is unpredictable and hard to manage because of its multifaceted nature and the interactions between its components.
Imagine trying to assemble flat-pack furniture without instructions, in the dark, while wearing mittens. That’s complexity for you.
What is complexity in software architecture?
Complexity, in the context of software architecture and system design, refers to the degree of intricacy and interdependence within a system’s components and processes. It encompasses how difficult it is to understand, modify, and maintain the system. Complexity arises from various factors, including the number of elements in the system, the nature of their interactions, the technologies used, and the clarity of the system’s structure and documentation.
Complexity also arises from two additional factors, even more impactful – people and time – but that is for another article.
Complexity creates all sorts of challenges across different types of architectures.
The double-edged sword of microservices
I recently assisted a company in transitioning from a monolithic architecture to microservices. The promise of microservices—greater flexibility, scalability, and independent deployability—was enticing. Breaking down the application into smaller, autonomous services allowed different teams to work concurrently, accelerating development.
Allegedly.
While this shift offered many benefits, it also led to challenges such as:
Operational overhead: Managing numerous services required advanced orchestration and monitoring tools. The team had to invest in infrastructure and develop new skill sets to handle containerization, service discovery, and distributed tracing. DevOps and SRE roles were spawned as part of the agile transformation, and a once complex environment… remained complex.
Complex inter-service communication: Ensuring reliable communication between services added layers of complexity. Network latency, message serialization, and fault tolerance became daily concerns. Add to that the communication (or lack thereof) between the teams building services that need to work together, and you have a recipe for disaster if not managed and governed properly.
Data consistency issues: Maintaining consistent data across distributed services became a significant concern. Without clear data governance, the simplest of tasks can become epic sagas of “finding and understanding data.”
And then there were the people—each team responsible for their own microservice, each with their own deadlines, priorities, and interpretations of “RESTful APIs.” Time pressures only added to the fun, as stakeholders expected the agility of microservices to translate into instant results.
Despite these challenges, the move to microservices was essential for the company’s growth. However, it was clear that without proper management, the complexity could outweigh the benefits.
The hidden complexities of monolithic applications
On the other hand, monolithic applications, often the backbone of legacy systems, tend to accumulate complexity over time. I recall working with an enterprise where the core application had evolved over years, integrating numerous features and fixes without a cohesive architectural strategy. The result was a massive codebase where components were tightly coupled, making it difficult to implement changes or updates without unintended consequences.
This complexity manifested in several ways:
Slower development cycles: Even minor changes required extensive testing across the entire application.
Inflexibility: The application couldn’t easily adapt to new business requirements or technologies.
High risk of errors: Tightly coupled components increased the likelihood of bugs when making modifications.
But beyond the code, there were people and time at play. Teams had changed over the years, with knowledge lost as developers, business analysts, sysadmins, software architects, engineers, and leaders moved on. Institutional memory was fading, and documentation was, well, let's say "aspirational." Time had turned the once sleek application into a relic, and people, each with their unique coding styles and architectural philosophies, had added layers of complexity that no one fully understood anymore.
As people leave organizations, institutional memory fades and teams are left with apps no one understands.
Adding people and time to the complexity equation
It’s often said that technology would be simple if it weren’t for people and time. People bring creativity, innovation, and, occasionally, chaos. Time brings evolution, obsolescence, and the ever-looming deadlines that keep us all on our toes.
In both monolithic and microservices environments, people and time contribute significantly to complexity:
Knowledge silos: As teams change over time, critical knowledge can be lost. New team members may not have the historical context needed to make informed decisions, leading to the reinvention of wheels—and occasionally square ones.
Diverging priorities: Different stakeholders have different goals, and aligning them is like trying to synchronize watches in a room full of clocks that all think they’re the master timekeeper.
Technological drift: Over time, technologies evolve, and what was cutting-edge becomes legacy. Keeping systems up-to-date without disrupting operations adds another layer of complexity.
Cultural differences: Different teams may have varying coding standards, tools, and practices, turning integration into an archaeological expedition.
Addressing complexity with vFunction
Understanding the intricacies of both monolithic and microservices architectures led me to explore tools that could aid in managing and reducing complexity. One such tool is vFunction, an AI-driven architectural observability platform designed to facilitate the decomposition of monolithic applications into microservices and to observe the behavior and architecture of distributed systems.
Optimizing microservices architectures
In microservice environments (distributed systems), vFunction plays an important role in deciphering complexity:
Identifying anti-patterns: The tool detects services that are overly chatty, indicating that they might be too granular or that boundaries were incorrectly drawn. Think of it as a polite way of saying, “Your services need to mind their own business a bit more.”
Performance enhancement: By visualizing service interactions, we could optimize communication paths and reduce latency. It’s like rerouting traffic to avoid the perpetual construction zone that is Main Street.
Streamlining dependencies: vFunction helps us clean up unnecessary dependencies, simplifying the architecture. Less is more, especially when “more” equals “more headaches.”
vFunction helps teams understand and structure their microservices, reducing unnecessary dependencies.
How vFunction helps with monolithic complexity
When dealing with complex monolithic systems, vFunction can:
Automate analysis: vFunction scans the entire system while running, identifying dependencies and clustering related functionalities. This automated analysis saved countless hours that would have been spent manually tracing code. It was like having a seasoned detective sort through years of code crimes.
Define service boundaries: The platform suggested logical partitions based on actual usage patterns, helping us determine where natural service boundaries existed. No more debates in meeting rooms resembling philosophical symposiums.
Prioritize refactoring efforts: By highlighting the most critical areas for modernization, vFunction allowed us to focus on the components that would deliver the most significant impact first. It's amazing how a clear priority list can turn "we'll never finish" into "we're making progress."
Bridging the people and time gap with vFunction
One of the unexpected benefits of using vFunction is its impact on the people side of the equation:
Knowledge transfer: The visualizations and analyses provided by the tool help bring new team members up to speed faster than you can say “RTFM.”
Unified understanding: With a common platform, teams have a shared reference point, reducing misunderstandings that usually start with “I thought you meant…”
Accelerated timelines: By adopting vFunction in the modernization process, we met tight deadlines without resorting to the classic solution of adding more coffee to the project.
Practical use case and lessons learned
Now that this is said and done, there are real-life lessons that you should take to heart (and brain…)
Does your organisation manage architecture? How are things built, maintained, planned for the future? How does your organisation treat architecture? Is it part of the culture?
Every tool is useless if it is not used.
In the project where we transitioned a large European bank to microservices, using vFunction (post-reengineering) provided teams with fine-tuned architecture insights (see video at the top of this blog). We analyzed both "monolithic" apps and "distributed" apps with microservices. We identified multi-hop and cyclic calls between services, god classes, dead code, high-complexity classes… and much more.
We used the initial measurements to create a target architecture. vFunction showed us where complexity and coupling lie and how they impact the architecture.
vFunction creates a comprehensive list of TODOs that serve as a guide for tackling the identified issues.
One major blocker is failing to treat architecture as a critical, team-owned artifact. Taking care of architecture "later" is like building a house, walls and all, and only afterward deciding where the living room goes, where the bathroom is, and how many doors and windows we need. That kind of approach will not make a family happy or a home safe.
Personal reflections on using vFunction
“What stands out to me about vFunction is how it brings clarity to complex systems. It’s not just about breaking down applications but understanding them at a fundamental level. This comprehension is crucial for making informed decisions during modernization.”
In both monolithic and microservices environments, vFunction's architectural observability provided:
Visibility: A comprehensive view of the application’s structure and interdependencies.
Guidance: Actionable insights that informed our architectural strategies.
Efficiency: Streamlined processes that saved time and resources.
Conclusion: Never modernize again
Complexity in software architecture is inevitable, but it doesn’t have to be an insurmountable obstacle. Whether dealing with the entanglement of a monolith or the distributed nature of microservices, tools like vFunction offer valuable assistance.
By leveraging platforms such as vFunction, organizations can:
Reduce risk: Make changes with confidence, backed by data-driven insights.
Enhance agility: Respond more quickly to business needs and technological advancements.
Promote innovation: Free up resources to focus on new features and improvements rather than wrestling with complexity.
From my experiences, embracing solutions that tackle architectural complexity head-on is essential for successful modernization. And more than that, it is a tool that should help us never modernize again, by continually monitoring architectural debt and drift, helping us to always keep our systems modern and fresh. It’s about empowering teams to understand their systems deeply and make strategic decisions that drive growth.
Take control of your microservices, macroservices, or distributed monoliths with vFunction
Major alert: vFunction was just named a 2024 Gartner Cool Vendor in AI-Augmented Development and Testing. We’re incredibly grateful and proud of this recognition. We’re also excited about the opportunity to share our platform on a larger scale and bring architectural observability to more enterprises.
According to the "Gartner Cool Vendors™ in AI Augmented Development and Testing for Software Engineering" report, "As the codebase and architectural complexity grow, the processing power required to handle local builds escalates significantly. Many organizations struggle to equip their software engineers with the necessary tools to meet the increasing demand for faster delivery from idea to production, impacting overall productivity and efficiency." Organizations have long struggled to fully grasp the complexity of their application architectures as they evolve throughout the SDLC. Enterprises juggle all types of application architectures, including modular monoliths, distributed monoliths, miniservices, and microservices, and must make tradeoffs between agility and complexity. Traditional approaches, which rely on manual code reviews, fragmented documentation, and institutional knowledge, have proven largely inadequate for identifying architectural risks and prioritizing fixes at today's speed of business. The result is an architectural blind spot that has significantly impeded modernization efforts and led to mounting technical debt, particularly architectural technical debt, as well as unrealized revenue potential in the billions.
To remediate technical debt effectively, Gartner recommends that “organizations use architectural observability tools to thoroughly analyze software architecture, identify inconsistencies, and gain deeper insights.”
At vFunction, we see software architecture as a critical but often underutilized driver of business success. We believe being recognized as a Gartner Cool Vendor validates our innovative approach to empowering engineering teams to innovate faster, address resiliency earlier, build smarter, and create scalable applications that change the trajectory of their business. With our AI-driven architectural observability platform, teams are equipped with valuable insights to find and fix unnecessary complexity and technical debt across large, complex applications and modern, highly distributed microservices throughout the organization. Software teams use the platform to understand their applications, identify the sources of technical debt, and find refactoring opportunities to enhance scalability, resiliency, and engineering velocity.
Five reasons why we believe vFunction was recognized as a Cool Vendor
Architectural observability plays a key role in managing the complexities of modern software development. Gartner states that, “By 2027, 80% of software engineering groups will monitor software architecture complexity and architecture technical debt in near real time, up from less than 10% today.” We feel we’re at the forefront of this trend, providing the tools necessary to meet this growing need.
By vigilantly monitoring architectural technical debt and drift across the entire application portfolio, our solution equips software engineering leaders and their teams with the insights necessary to make informed decisions. Here’s why we believe vFunction stands out:
AI-powered. vFunction’s architectural observability platform understands and visualizes application architecture to reduce technical debt and complexity.
Find and fix technical debt. vFunction uses extensive data to identify and remediate architectural technical debt across the entire application portfolio.
Shift left. Address the root causes of technical debt to prevent performance issues before they arise using vFunction’s patented methods of static and dynamic analysis.
Prioritize and alert. vFunction incorporates a prioritized task list into every sprint to fix key technical debt issues, based on your unique business goals.
Any architecture. The platform relies on OpenTelemetry to support a wide spectrum of programming languages in the distributed world, plus Java and .NET for monolithic architectures, so you can apply it to a variety of use cases: decomposing monoliths, refining distributed microservices, and evaluating modular monoliths.
Organizations face immense pressure to deliver high-quality software rapidly, stay competitive, and pivot quickly in response to market demands. The rapid accumulation of technical debt exacerbates these challenges, hampering engineering velocity, limiting application scalability, and impacting resiliency. This often results in increased risks of outages, delayed projects, and missed opportunities.
Ready to put the freeze on software complexity and mounting technical debt? Let us partner with you to unlock the full potential of your software architecture. Contact us today to learn how vFunction can be an indispensable asset in transforming your software development practices.
Gartner, Inc. Cool Vendors in AI-Augmented Development and Testing for Software Engineering. Tigran Egiazarov, Philip Walsh, et al. 8 August 2024.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Cool Vendors is a registered trademark of Gartner, Inc. and/or its affiliates and is used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Adam Safran, Senior Engineering Manager at Turo, took the stage with me at the 2024 Gartner Application Innovation & Business Solutions Summit to discuss how vFunction’s architectural observability platform supports Turo’s journey to 10x scale.
Turo, the world’s largest car-sharing marketplace, is on a mission to put the world’s 1.5 billion cars to better use. Since 2010, Turo has consistently grown to amass over 3.5 million active guests and over 6.5 billion miles driven. This growth is great, but it also introduces challenges.
Turo’s growth pushed the limits of its monolith
Turo’s growth presented new challenges for its twelve-year-old monolithic application and rapidly expanding engineering team.
To address these concerns, Turo CTO Avinash Gangadharan proposed a mandate to achieve 10x scale. He developed two new engineering domains: a platform domain focused on developer experience and reliability and a core services domain focused on scale.
Adam’s API services team acts as a bridge between the two domains. However, the proposal faced several challenges:
Justifying the initiative to leadership: When things are going well, it can be challenging to explain to leadership the necessity of investing in scale. Turo’s consistent growth required investing in scalability without sacrificing new feature development.
“The challenge is that there’s sort of a sense of if ‘it ain’t broke, don’t fix it.’”
Adam Safran, Senior Engineering Manager at Turo
System and organizational scale challenges: Turo’s monolithic application faced several issues due to growth:
The number of developers contributing to Turo’s codebase more than doubled from 28 contributors in 2017 to 72 in 2024
Turo’s engineering team grew from 80 engineers in 2020 to over 200 in 2024
The number of tables grew from 62 in 2011 to 491 in 2023
Their reservation table went from 13 in 2011 to 36 in 2019
Turo began experiencing deployment issues, with releases taking 5-10 hours to get code into production. Organizational silos emerged, leading to increased ambiguity around domain ownership.
Loss of modularity: As new domains were added to the application, classes written for a single purpose took on other responsibilities, leading to entanglement. Database tables previously called from one logical domain were now called from multiple places, mixing the data layer with business logic. As a result of over a decade of ad hoc development, Turo's architecture lost its modularity, impacting the application's scalability and resiliency.
Turo’s monolithic application as shown in vFunction. “This is what happens over twelve years of ad hoc development to build a world-class vehicle marketplace without pausing to consider scale,” said Adam Safran.
To address the challenges posed by their monolithic application, Turo chose first to extract its conversation service. The decision was based on the service’s frequency of use and overall latency issues. The goal was to improve engineering velocity by reducing complexity through distinct microservices that can scale separately and provide clarity on domain ownership.
Once the conversation service was modernized to a “lean, mean microservices machine,” Turo achieved the following results:
Faster average response times, which went from half a second to 19 milliseconds, with 99th percentile response times improving from about seven seconds to just under one second
25-100x better sync times
Improved code deployment from 5-10 hours to just five minutes
Challenges following microservice extraction
However, soon after creating the microservice, Turo realized that the application’s architecture could quickly become entangled without complete visibility, with teams adding new code and dependencies to new microservices. These changes might not be discovered until weeks or months later, leading to Adam’s API services team having to untangle “spaghetti code.”
vFunction — visualizing architecture, paving the way to scalability
To ensure vigilance in observing the application’s software architecture, Turo turned to vFunction to provide continuous architectural observability with dynamic and static code analysis and a real-time view of the application’s dependencies. vFunction helps the microservices architecture remain resilient and scalable while monitoring for technical debt accumulation and reducing software complexity as the team adds new features.
“We’re making this investment now in scale while the wheels are moving so that our product teams can continue to focus on the features that our users want.”
Adam Safran, Senior Engineering Manager at Turo
With vFunction, Turo identified a repeatable process to extract microservices and achieve scalability goals. Here’s what Adam describes as the best practices when modernizing an application:
Use vFunction to understand your domain, the interconnections between services, and how to maintain service boundaries to avoid technical debt
Over communicate with your teams about what the organization is doing and how they are doing it
Break down organizational silos to reduce domain pollution
Share your journey to help uplevel the organization
Document successes to demonstrate ROI
Rinse and repeat
As Turo’s business continues its growth trajectory, vFunction architectural observability enables the team to visualize its application and continuously find and fix technical debt before it has a chance to take root.
Organizations that deploy architectural observability experience improved application resiliency and scalability while increasing their engineering velocity. This ensures continued innovation and sharpens their competitive edge.
“I wish we had vFunction when I started at Turo. This kind of architectural observability gives us a much better understanding into our application and helps us with decision making as we move forward.”
Adam Safran, Senior Engineering Manager at Turo
If you’d like to learn more about how vFunction can help your organization, contact us. Tell us the big goals for your application and we’ll show you how architectural observability gives you a clear path to get there faster.
To meet the challenges posed by customers and competitors in today’s rapidly changing marketplace, you must regularly update and modernize the software applications on which your business operations depend. In such an environment, technical debt is inevitable and highly detrimental. Knowing how to measure and manage technical debt effectively is essential.
Understanding how much technical debt a company has is crucial for setting up accurate metrics and realizing the extent of the debt. Getting a complete picture of the technical debt metrics within your organization’s applications makes it easier to manage and track.
According to Gartner, companies that manage technical debt “will achieve at least 50% faster service delivery times to the business.” On the other hand, organizations that fail to manage their technical debt properly can expect higher operating expenses, reduced performance, and a longer time to market. As a report from McKinsey makes clear, “Poor management of tech debt hamstrings companies’ ability to compete.” With so much riding on managing and remedying technical debt, it’s a topic that architects, developers, and technical leaders must know well.
What is technical debt?
The term "technical debt" was coined in 1992 by computer scientist Ward Cunningham to vividly illustrate the long-term consequences of the short-term compromises and workarounds developers often incorporate into their code. Much like financial debt, where borrowing money now leads to interest payments later, technical debt accumulates "interest" through increased development time, decreased system stability, and the potential for future bugs or failures. As TechTarget explains, technical debt is an inevitable consequence of the "build now, fix later" mentality that sometimes pervades software development projects. With tight deadlines, limited resources, or evolving requirements, developers may opt for quick-and-dirty solutions rather than investing the time and effort to build robust, scalable, and maintainable code.
In essence, technical debt is the result of prioritizing speed over quality. While these shortcuts may seem beneficial in the short term, allowing teams to meet deadlines or deliver features faster, they can significantly impact reliability down the line. Just as ignoring financial debt can lead to financial ruin, neglecting technical debt can crush a software project, making it increasingly difficult and expensive to maintain, modify, or extend.
The snowball effect
Technical debt doesn’t just remain static; it accumulates over time. As teams implement more quick fixes and workarounds, the codebase becomes increasingly convoluted, complex, and difficult to understand. This, in turn, makes it harder for developers to identify and fix bugs, add new features, or refactor the code to improve its quality. The result is a vicious cycle where technical debt begets more technical debt, leading to a gradual decline in the software’s overall health and performance. Growing technical debt signifies that the complexity of the code is increasing, which will eventually require untangling and negatively impact code quality.
Understanding the nature of technical debt and its potential consequences is the first step toward managing it effectively. Although the impacts of technical debt can be gradual, they can result in massive disadvantages in the long run.
Disadvantages of technical debt
Since no software development project ever has all the time or resources required to produce a perfect codebase, some technical debt is unavoidable. That’s not necessarily bad if an application’s technical debt is promptly “paid off.” Otherwise, just as with financial debt, the costs of repaying the “principal” plus the “interest” on the debt can eventually reach crippling proportions.
The “principal” portion of technical debt is the cost of fixing the original code, dependencies, and frameworks to enable it to function in today’s technology environment. The “interest” is the added cost of maintaining such applications, which continues to compound over time. The challenge is keeping an aging and inflexible legacy application running as it becomes increasingly incompatible with the rapidly changing modern infrastructure it operates on top of.
Technical debt can significantly hinder a company’s ability to innovate. According to a recent U.S. study, more than half of respondents dedicate at least a quarter of their annual budget to technical debt. Poorly written code is a common form of technical debt, often leading to increased maintenance costs and reduced code quality.
And other costs of technical debt are, perhaps, even worse than the financial ones:
Less innovation: The time developers devote to dealing with technical debt is time taken away from developing the innovations that can propel the business forward in its marketplace.
Slow test and release cycles: Technical debt makes legacy apps brittle (easy to break), opaque (hard to understand), and challenging to upgrade safely. That means teams must devote more time to understanding the potential impact of changes and testing them to ensure they don’t cause unexpected disruptions in the app’s operation.
Inability to meet business goals: This is the inevitable result of the previous two issues. In today’s environment of rapid evolution in technology and market requirements, the inability to quickly release and deploy innovative new applications can impede a company’s ability to meet its goals.
Security exposures: Because modern security concerns were typically unknown or disregarded when older apps were designed or patched, security-related technical debt often constitutes a significant vulnerability for legacy code.
Poor developer morale: For many developers, dealing with technical debt can be mind-numbing and frustrating. In one survey, 76% of respondents affirmed that “paying down technical debt” negatively impacted their morale.
Ward Cunningham explains the destructive potential of technical debt this way:
“The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation.”
While mounting technical debt can feel like business as usual, it’s essential to understand its impact so that managing it becomes a priority. Teams must also be aware of the various types of tech debt.
What are the types of technical debt?
Technical debt manifests in various forms, each with unique characteristics and potential consequences. Understanding these types is crucial for effectively identifying and managing technical debt within the applications that an organization owns and maintains. Here are examples of tech debt that can arise.
Code debt: This is the most common and easily recognizable type of technical debt. It refers to accumulating poorly written, overly complex, or outdated code. Code debt can result from rushed development, lack of adherence to coding standards, or simply the evolution of technology and best practices over time. Symptoms of code debt include excessive bugs, difficulty understanding and modifying the code, and slow performance. Additionally, changing existing code can be particularly challenging, often requiring significant time and effort to ensure stability and maintainability.
Design debt: Design debt arises when the software’s design is suboptimal, leading to challenges in implementing new features or modifying existing functionality. This can occur due to a lack of upfront design, changes in requirements, or a failure to adapt the design as the software evolves. Design debt can manifest as tightly coupled components, hard-coded dependencies, or a lack of modularity (a short sketch of one such case follows this list).
Testing debt: Insufficient or inadequate testing can lead to testing debt. This can include a lack of automated tests, outdated tests that no longer reflect the current state of the software, or simply a culture of neglecting testing in favor of rapid development. Testing debt increases the risk of bugs and regressions, making it harder to ensure the software’s reliability.
Documentation debt: Outdated, incomplete, or inaccurate documentation constitutes documentation debt. This can make it difficult for new developers to understand the codebase, for existing developers to remember how things work, or for stakeholders to understand the software’s capabilities and limitations. Documentation debt can lead to misunderstandings, errors, and delays in development.
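To make the design debt item above concrete, here is a minimal, hypothetical sketch of a hard-coded dependency and the small refactor that removes it. The class names are invented for illustration, not drawn from any real codebase.

```python
# Illustrative sketch of design debt: a hard-coded dependency left in place
# for speed, versus a small refactor that injects the dependency instead.
# All class names here are hypothetical.

class PostgresClient:
    def query(self, sql: str) -> list:
        return []  # stand-in for a real database query


# Before: ReportGenerator constructs its own database client, so swapping the
# storage backend or unit-testing the class requires editing this code.
class ReportGenerator:
    def __init__(self):
        self.db = PostgresClient()          # hard-coded dependency


# After: the dependency is injected, so the class no longer dictates which
# backend is used and can be exercised in tests with a fake client.
class ReportGeneratorV2:
    def __init__(self, db):
        self.db = db                        # dependency provided by the caller
```

Small as it looks, this is the shape of debt that compounds: every hard-coded dependency is one more place that must be touched when requirements or infrastructure change.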
Architectural technical debt
At vFunction, we also focus on an additional aspect of technical debt that can accumulate: architectural technical debt. While sharing similarities with the types mentioned above, architectural technical debt is more deeply ingrained in the software’s structure. It refers to compromises made in the overall architecture or design of the system for short-term gains, such as meeting a tight deadline or delivering a specific feature. These compromises may involve:
Architectural drift: Deviation from the intended architecture over time due to ad-hoc changes or lack of governance.
Intentional violations: Deliberately violating best practices or established architectural principles due to time constraints or other pressures.
Unstable shortcuts: Using temporary or unreliable solutions that provide a quick fix but may not be sustainable in the long run.
While incurring architectural technical debt can sometimes be a strategic decision, it’s essential to know the associated costs and drawbacks and to keep metrics in place to measure and monitor changes. Over time, architectural debt can lead to increased complexity, reduced flexibility, and even system instability.
Understanding the different types of technical debt and their potential impact is a critical step in managing this hidden cost of software development. In the next section, we’ll explore how to measure technical debt so you can understand its extent and make informed decisions about how to address it.
How to measure technical debt
As we’ve seen, companies need to understand how to manage technical debt. Yet, according to an article in Forbes, technical debt is difficult to measure. The article quotes Sven Blumberg and Björn Münstermann of McKinsey, saying, “Technical debt is like dark matter: you know it exists, you can infer its impact, but you can’t see or measure it.” Blumberg and Münstermann list some informal indicators of technical debt, such as product delays, out-of-control costs, and low developer morale. But are there any formal methods available to quantify the amount of technical debt that characterizes a particular application or an entire application portfolio?
Some have proposed using metrics such as cyclomatic complexity (the number of linearly independent execution paths through the code) and cognitive complexity (a measure of how difficult the code and all its possible execution paths are for a human to understand). Code quality metrics can also be used to calculate the remediation cost, which helps determine the technical debt ratio (TDR). TDR expresses the future cost of technical debt in terms of time or resources, providing a clearer understanding of the effort needed for quality improvement. The problem with such indicators is the difficulty of measuring them in a large monolithic codebase with millions of lines of code.
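To make the technical debt ratio concrete, here is a minimal sketch that computes TDR from estimated remediation and development effort. The function name and figures are hypothetical; in practice, the remediation estimate typically comes from an analysis tool and the development cost from a size-based estimate.

```python
# Illustrative sketch: computing a technical debt ratio (TDR).
# Inputs are assumptions for the example; real estimates would come from
# a static-analysis tool (remediation) and a sizing model (development).

def technical_debt_ratio(remediation_cost_hours: float,
                         development_cost_hours: float) -> float:
    """Return TDR as a percentage: remediation cost / development cost * 100."""
    if development_cost_hours <= 0:
        raise ValueError("development cost must be positive")
    return (remediation_cost_hours / development_cost_hours) * 100


# Hypothetical example: 400 hours of estimated fixes against an application
# whose development effort is estimated at 12,000 hours.
tdr = technical_debt_ratio(remediation_cost_hours=400,
                           development_cost_hours=12_000)
print(f"TDR: {tdr:.1f}%")  # ~3.3%, below the commonly cited 5% threshold
```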
Architectural technical debt consistently appears as the most damaging and far-reaching type of technical debt in surveys, analyst reports, and academic studies.
Why knowing how to measure technical debt is crucial
Many companies today depend on traditional monolithic applications for business-critical processing. Due to their age and development over time, such apps typically have substantial technical debt that limits their ability to integrate and take advantage of today’s cloud-based technological ecosystem.
The solution is modernizing those legacy apps to give them essentially cloud-native capabilities. And that means dealing with the technical debt that’s holding them back. But, as management guru Peter Drucker famously said, “You can’t improve what you don’t measure.” Measuring the technical debt of legacy apps is critical to bringing them into the modern technological age. Tracking technical debt over time is crucial for continuous improvement and ensuring long-term code quality. One recent application modernization white paper explains it this way: “For any application modernization strategy to be successful, organizations need to first understand the complexity, risk, and technical debt of their current application estate. From there, they can prioritize and make the appropriate substantial investments into technology, resources, and the time it takes to implement the strategy.”
However, technical debt is notoriously difficult to identify and measure. Luckily, there are tools that can help detect and monitor it so that teams can stay on top of it. Next, let’s look at a few of the most popular options.
“Poor management of tech debt hamstrings companies’ ability to compete.”
McKinsey & Company
Five best tools for measuring technical debt
The first step to tackling technical debt is understanding its extent and nature within your codebase, infrastructure, and overall architecture. Various tools can automate this process, providing valuable insights and metrics to guide your debt management strategy. Clear code ownership also helps: when a well-defined, smaller group of developers owns and contributes to each area of the code, accountability improves and unreliable code is less likely to accumulate.
According to Gartner, technical debt analysis tools can fall into a few categories. Below is a diagram that explains the types of tools, their capabilities, and where they reside within the SDLC.
Based on these categories, here are five of the best tools available for measuring different types of technical debt:
vFunction
This AI-powered platform tackles architectural technical debt in large, complex legacy systems and modern, cloud-based microservices. vFunction statically and dynamically analyzes applications, identifying hidden dependencies, outdated architectures, and potential risks. It then provides actionable insights and recommendations for refactoring, modernizing, and systematically reducing technical debt.
CAST Software (Cast Imaging)
Cast Imaging takes a comprehensive approach to technical debt assessment, analyzing code quality, architecture, and security vulnerabilities. It provides a detailed view of the technical debt landscape, including metrics for code complexity, design violations, and potential risks. This holistic approach helps teams prioritize their remediation efforts based on the most critical areas of debt.
SonarQube
This popular open-source platform is a versatile code quality and security analysis tool. While not explicitly focused on technical debt, SonarQube provides valuable insights into code smells, bugs, vulnerabilities, and code duplication, often indicative of technical debt. By regularly using SonarQube, teams can proactively identify and address code-level issues contributing to technical debt.
Snyk (Snyk Code)
While primarily known for its security focus, Snyk Code also offers features for analyzing code quality and maintainability. It can identify issues like code complexity, potential bugs, and security vulnerabilities, often intertwined with technical debt. By addressing these issues, teams can improve code quality and reduce the overall technical debt burden.
CodeScene
This unique tool goes beyond static code analysis by analyzing the evolution of your codebase over time. It identifies hotspots—areas of the code that are frequently changed and prone to accumulating technical debt. It also analyzes social aspects of code development, such as team dynamics and knowledge distribution, to identify potential bottlenecks and risks. This behavioral code analysis provides valuable insights into the root causes of technical debt, helping teams address it more effectively.
By leveraging these tools, you can comprehensively understand your technical debt landscape. This knowledge empowers you to make informed decisions about which areas of debt to prioritize and how to allocate resources for remediation. Many of these tools can be embedded directly into your development pipelines, with automated scans and monitoring keeping your teams informed as a project evolves. As we have discussed, monitoring and managing technical debt is crucial, and these tools help keep it front and center. Next, let’s review some tips on monitoring and managing technical debt.
Monitoring and managing technical debt
A significant portion of technical debt in legacy applications stems from their monolithic architecture and reliance on outdated technologies. These applications often have a complex codebase with hidden dependencies, making it challenging to assess and address technical debt effectively.
One modern approach to measuring that debt leverages machine learning (ML) to analyze the dependency graph between classes within an application. The dependency graph, a directed graph representing dependencies between entities in a system, provides valuable insights into the complexity and risk associated with the application’s architecture.
By applying ML algorithms, we can extract three key metrics that represent the level of technical debt in the application (a simplified, illustrative sketch follows this list):
Complexity: This metric reflects the effort required to add new features to the application. A higher complexity score indicates a greater likelihood of encountering challenges and potential issues during development.
Risk: This metric relates to the probability that adding new features may disrupt the operation of existing functionalities. A higher risk score suggests a greater vulnerability to bugs, regressions, and unintended consequences.
Overall debt: This metric quantifies the additional work required when adding new features. It provides an overall assessment of the technical debt burden associated with the application.
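As a toy illustration of the kind of graph-level signals such an analysis can start from, the sketch below computes naive complexity and risk proxies over a small, hypothetical class dependency graph using the networkx library. This is not vFunction’s ML model, and the class names are invented; it simply shows how a dependency graph exposes coupling and cycles.

```python
# Illustrative only: crude proxies for complexity and risk derived from a
# class dependency graph. Requires networkx (pip install networkx).
import networkx as nx

# Hypothetical class-level dependency graph: an edge A -> B means class A
# depends on class B.
deps = nx.DiGraph()
deps.add_edges_from([
    ("OrderService", "PaymentClient"),
    ("OrderService", "InventoryDao"),
    ("PaymentClient", "HttpUtil"),
    ("InventoryDao", "DbConnection"),
    ("ReportingJob", "InventoryDao"),
    ("InventoryDao", "OrderService"),   # creates a cyclic dependency
])

# Complexity proxy: how densely connected the classes are overall.
complexity = nx.density(deps)

# Risk proxy: classes caught in dependency cycles are harder to change safely.
classes_in_cycles = {c for cycle in nx.simple_cycles(deps) for c in cycle}
risk = len(classes_in_cycles) / deps.number_of_nodes()

print(f"complexity proxy (graph density): {complexity:.2f}")
print(f"risk proxy (share of classes in cycles): {risk:.2f}")
```

In a real analysis these crude proxies would be replaced by trained models, but even this version makes cycles and dense coupling visible at a glance.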
By training ML models on manually analyzed data that incorporates expert knowledge, we can accurately assess the technical debt level of an application even without prior knowledge of its codebase. This enables organizations to comprehensively understand technical debt across their legacy software portfolio. With this information, IT leaders can make data-driven decisions about which applications to prioritize for modernization and how to allocate resources for technical debt reduction.
In addition to this ML-based approach, continuous monitoring of critical metrics, such as code complexity, code churn, and test coverage, can help identify potential hotspots where technical debt accumulates. By proactively addressing these issues, organizations can prevent technical debt from spiraling out of control and ensure the long-term health and maintainability of their legacy applications. Tracking the technical debt ratio (TDR) is also crucial, as it measures the effort spent fixing software relative to the effort spent developing it. Keeping TDR below five percent is generally considered healthy and can help demonstrate to executives the value of proactively addressing technical debt.
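As a rough illustration of the code churn signal mentioned above, the sketch below counts how often each file has changed recently using Git history. The six-month window and the idea of treating the most frequently changed files as hotspot candidates are assumptions for the example, not a prescribed methodology.

```python
# Sketch: a rough "hotspot" signal from version-control history. Files that
# change most often are candidates for accumulating debt. Assumes the script
# runs inside a Git repository.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--since=6 months ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

churn = Counter(line for line in log.splitlines() if line.strip())
for path, changes in churn.most_common(10):
    print(f"{changes:4d}  {path}")
```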
Conclusion
For machine learning to be a practical solution to measuring technical debt, it must be embodied in an intelligent, AI-driven, automated analysis tool that delivers comprehensive technical debt metrics and allows users to build a data-driven business case for modernizing a company’s suite of legacy apps. These metrics should identify the level of technical debt, complexity, and risk for each app and for the legacy app portfolio as a whole.
vFunction’s architectural observability platform is purpose-built on those principles. It uses AI and machine learning to provide accurate measures of technical debt and can also help automate the refactoring of legacy apps to eliminate it. To see vFunction’s answer to how to measure technical debt in action, schedule a demo today.
Many teams turn to microservice architectures hoping to leave behind the complexity of monolithic applications. However, they soon realize that the complexity hasn’t disappeared — it has simply shifted to the network layer in the form of service dependencies, API interactions, and data flows between microservices. Managing and maintaining these intricate distributed systems can feel like swimming against a strong current: you may be making progress, but it’s a constant struggle that leaves you exhausted. The new distributed applications capability in vFunction provides a life raft, offering much-needed visibility and control over your distributed architecture.
In this post, we’ll dive into how vFunction can automatically visualize the services comprising your distributed applications and highlight important architectural characteristics like redundancies, cyclic dependencies, and API policy violations. We’ll also look at the new conversational assistant powered by advanced AI that acts as an ever-present guide as you navigate vFunction and your applications.
At the heart of vFunction’s new distributed applications capability is the Service Map – an intuitive visualization of all the services within a distributed application and their interactions. Each node represents a service, with details like name, type, tech stack, and hosting environment. The connections between nodes illustrate dependencies like API calls and shared resources.
OpenTelemetry
This architectural diagram is automatically constructed by vFunction during a learning period in which it observes traffic flowing through your distributed system. For applications instrumented with OpenTelemetry, vFunction can ingest the telemetry data directly, supporting a wide range of languages including Java, .NET, Node.js, Python, Go, and more. This OpenTelemetry integration extends vFunction’s ability to monitor distributed applications across numerous modern language stacks beyond traditional APM environments.
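If your services aren’t instrumented yet, the snippet below shows roughly what a minimal OpenTelemetry setup looks like in Python. The service name and OTLP endpoint are placeholders, and how the resulting telemetry reaches vFunction depends on your collector setup; treat this as a sketch of standard OpenTelemetry SDK usage, not vFunction-specific configuration.

```python
# Minimal OpenTelemetry tracing setup in Python (sketch).
# Requires: pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "payments-service"})  # placeholder name
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))  # placeholder collector endpoint
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Each traced operation becomes a span; downstream calls appear as child
# spans, which is what allows a service map to be reconstructed from traces.
with tracer.start_as_current_span("charge-card"):
    pass  # business logic would go here
```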
Unlike traditional APM tools that simply display service maps based on aggregated traces, vFunction applies intelligent analysis to pinpoint potential architectural issues and surface them as visual cues on the Service Map. This guidance goes beyond displaying nodes and arrows on the screen: the analysis flags potential areas of concern (a brief illustrative sketch follows below), such as:
Redundant or overlapping services, like multiple payment processors, that could be consolidated.
Circular dependencies or multi-hop chains, where a chain of calls increases complexity.
Tightly coupled components, such as separate services sharing the same database, which make changes difficult.
Services that don’t adhere to API policies, such as test environments accessing production data.
These potential issues are flagged as visual cues on the Service Map and listed as actionable to-do’s (TODOs) that architects can prioritize and assign. You can filter the map to drill into specific areas, adjust layouts, and plan how services should be merged or split through an intuitive interface.
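To give a flavor of what such checks look like in practice, here is a simplified sketch over a hypothetical service map. The data structures and service names are assumptions for illustration only and do not reflect vFunction’s internal model; the sketch covers just two of the checks listed above.

```python
# Sketch: two of the checks described above, run over a toy service map.

calls = {                       # service -> services it calls
    "checkout": ["payments-v1", "payments-v2", "inventory"],
    "payments-v1": ["ledger"],
    "payments-v2": ["ledger"],
    "inventory": ["checkout"],  # forms a cycle with checkout
}
databases = {                   # service -> databases it uses
    "inventory": ["orders-db"],
    "checkout": ["orders-db"],  # two services sharing one database
}

# Tightly coupled components: services that read/write the same database.
shared = {}
for service, dbs in databases.items():
    for db in dbs:
        shared.setdefault(db, []).append(service)
coupled = {db: svcs for db, svcs in shared.items() if len(svcs) > 1}

# Circular dependencies: a simple depth-first search over the call graph.
def find_cycle(start, graph, path=()):
    for nxt in graph.get(start, []):
        if nxt in path or nxt == start:
            return (*path, start, nxt)
        cycle = find_cycle(nxt, graph, (*path, start))
        if cycle:
            return cycle
    return None

print("shared databases:", coupled)                 # {'orders-db': ['inventory', 'checkout']}
print("cycle via checkout:", find_cycle("checkout", calls))
```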
Your AI virtual architect
vFunction now includes an AI-powered assistant to guide you through managing your architecture every step of the way. Powered by advanced language models customized for the vFunction domain, the vFunction Assistant can understand and respond to natural language queries about your applications while incorporating real-time context.
Need to understand why certain domains are depicted a certain way on the map? Ask the assistant. Wondering about the implications of exclusivity on a class? The assistant can explain the reasoning and suggest the next steps. You can think of it as an ever-present co-architect sitting side-by-side with you.
You can query the assistant about any part of the vFunction interface and your monitored applications. Describing the intent behind a change in natural language, the assistant can point you in the right direction. No more getting lost in mountains of data and navigating between disparate views — the assistant acts as a tailored guide adapted to your specific needs.
Of course, the assistant has safeguards in place. It operates only on the context and data already accessible to you within vFunction, respecting all existing privacy, security, and access controls. Conversations are ephemeral, and you can freely send feedback to improve the assistant’s responses over time.
An elegant architectural management solution
Together, the distributed applications visualization and conversational assistant give architects and engineering teams an elegant way to manage the complexity of distributed applications. The Service Map provides a comprehensive yet intuitive picture of your distributed application at a glance, automatically surfacing areas that need attention. The assistant seamlessly augments this visualization, understanding your architectural intent and providing relevant advice in real time.
These new capabilities build on vFunction’s existing architectural analysis strengths, creating a unified solution for designing, implementing, observing, and evolving software architectures over time. By illuminating and streamlining the management of distributed architectures, vFunction empowers architects to embrace modern practices without being overwhelmed by their complexity.
Want to see vFunction in action? Request a demo today to learn how our architectural observability platform can keep your applications resilient and scalable, whatever their architecture.