Keeping microservices documentation up to date is a challenge every team knows too well. With competing priorities—building, deploying, and keeping up with the pace of innovation—documentation often falls by the wayside or gets done in sporadic bursts. Clear, current architecture diagrams are critical for productivity, yet creating and maintaining them often feels like an uphill battle—too time consuming and always a step behind your system’s reality.
To address these challenges, vFunction builds on its recently released architecture governance capabilities with new functionality that simplifies documenting, visualizing, and managing microservices throughout their lifecycle.
Real-time documentation for modern development
Seamless integration with existing tools and workflows, including CI/CD pipelines and documentation platforms, is essential for effective application development and management. New functionalities in vFunction’s architectural observability platform enable you to integrate and export vFunction’s architectural insights into your current workflows for greater efficiency and alignment.
Sequence diagrams: From static images to dynamic architecture-as-code diagrams
One of the standout features of our latest release is the ability to generate sequence diagrams based on runtime production data. We now track multiple alternative paths, loops, and repeated calls in a single flow—simplifying and speeding the detection and resolution of hard-to-identify bugs. Behind the scenes, we use Mermaid, a JavaScript-based diagramming and charting tool selected for its simplicity and compatibility with other programs. Mermaid uses Markdown-inspired text definitions to render and modify complex diagrams. The latest release also provides the option for the user to export these diagrams as code, specifically as Mermaid script.
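For illustration, a Mermaid sequence diagram is just plain text. The sketch below is a generic example with invented service names, not actual vFunction output, but it shows how Mermaid expresses alternative paths and loops in a single flow:

sequenceDiagram
    participant Client
    participant OrderService
    participant PaymentService
    Client->>OrderService: POST /orders
    loop retry up to 3 times
        OrderService->>PaymentService: authorize payment
    end
    alt payment approved
        PaymentService-->>OrderService: approved
    else payment declined
        PaymentService-->>OrderService: declined
    end
    OrderService-->>Client: order status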
Mermaid-based sequence diagram in vFunction. Export architecture-as-code written in Mermaid syntax from vFunction. Bring the architecture-as-code into your documentation tool of choice to share live sequence flows captured by vFunction.
Flows exported as Mermaid script retain all the architectural details teams need to embed, modify, and visualize diagrams directly within tools like Confluence or other documentation platforms. This ensures teams can reflect live architecture effortlessly, maintaining dynamic, up-to-date documentation that evolves alongside their systems.
Quickly identify flows with errors
vFunction simplifies troubleshooting complex microservices by using sequence diagrams to surface flows with errors. Flows are marked green (OK) or red (error), so developers can spot problems at a glance. To see the number of calls and errors, developers can toggle between flows and drill into the sequence that caused the issue. Flows with an error rate above 10% are highlighted, making it easy to sort and prioritize areas that need attention. This streamlines debugging, helping teams find and fix issues quickly and improving overall application reliability.
Support for the C4 model and workflows
The C4 model is a widely used framework that improves collaboration by describing the structures and interactions within software systems at four levels of detail: context, container, component, and code (C4). Teams can now export and import C4 container diagrams with vFunction to help detect architectural drift and support compliance, enhancing how engineers conceptualize, communicate, and manage distributed architectures throughout their lifecycle.
This diagram illustrates our Order Management System (OMS) demo application, exported as a C4 diagram and visualized as a graph with PlantUML. Support for C4 enables seamless export of live architecture into widely used formats for broader insights and collaboration.
Using "architecture as code," vFunction aligns live application architecture with existing diagrams, acting as a real-time system of record that ensures consistency, detects drift, and keeps architecture and workflows in sync as systems evolve. Beyond measuring drift within vFunction, teams can compare real-time architectural flows against C4 reference diagrams to pinpoint where drift has occurred, understand its impact, and prioritize resolution.
Grouping services for better microservices management
If you have many microservices, you can now group services and containers using attributes like names, tags, and annotations, including those imported from C4 diagrams. This feature enhances governance by allowing you to apply architecture rules that enforce standards, prevent drift, and maintain consistency across your microservices. It helps teams organize, manage, and monitor their microservices architecture more effectively, ensuring alignment with best practices and reducing complexity.
Distributed application dashboard
The new dashboard for distributed applications provides a centralized and actionable overview of architectural complexity and technical debt across your distributed portfolio of apps, empowering teams to make informed, data-driven decisions to maintain system health.
vFunction’s new portfolio view tracks architectural complexity and technical debt across distributed applications, providing a clear overview of application health and actionable insights in the form of related TODOs.
The related technical debt report helps teams track changes in technical debt across distributed applications, providing valuable insights to prioritize remediation efforts and enhance architectural integrity. For example, Service X may have a higher technical debt of 8.3 due to circular dependencies and multi-hop flows, while Service Y scores 4.2, indicating fewer inefficiencies. Teams can focus remediation efforts on Service X, prioritizing areas where architectural debt has the most significant impact.
The tech debt score delivers actionable clarity, enabling teams to understand newly added debt and streamline the prioritization of technical debt reduction.
The accompanying technical debt score offers a clear, quantified metric based on all open tasks (TODOs), including inefficiencies such as circular dependencies, multi-hop flows, loops, and repeated service calls. Developers use this score to focus on resolving issues like multi-hop flows; for example, resolving redundant calls in a service can bring the debt score down from 7.5 to 4.5.
Additionally, CI/CD integration for debt management takes this functionality a step further. Triggered learning scripts allow teams to incorporate technical debt insights directly into their CI/CD pipelines. By comparing the latest system measurements with baseline data, teams can approve or deny pull requests based on technical debt changes, ensuring alignment with architectural goals and mitigating risks before they escalate.
Sync architectural tasks with Jira
vFunction brings architecture observability and workflow management closer together by syncing TODO tasks directly with Jira. Engineering leaders can integrate architectural updates into sprints and bake related tasks into existing workflows.
TODOs identified in vFunction can now be opened as tickets in Jira in order to incorporate architecture modifications into the development lifecycle.
Architecture audit log
The new logs section tracks key architectural decisions made around TODOs. The log captures impactful actions such as adding or removing TODOs, uploading references, changing baselines, and setting the latest measurements, while excluding routine auto-generated or completed TODOs. Each entry records the affected entities, the context for the action, the user responsible for the decision, and a timestamp. Tracking architectural decisions this way ensures alignment with organizational standards, aids compliance, and reduces the risk of miscommunication or errors by providing a clear historical record of changes.
Why it matters: Efficiency, alignment, and agility
With these new capabilities, vFunction empowers teams to reclaim valuable engineering time, improve productivity, and maintain alignment. By bridging the gap between architecture and workflow, this release makes it easier than ever to document, update, and share architectural insights. Teams can focus on building resilient, scalable systems without the overhead of disconnected tools and outdated diagrams.
In the fast-paced world of microservices, maintaining clear, actionable architecture diagrams is no longer a luxury—it’s a necessity. vFunction equips your team to stay agile, aligned, and ahead of complexity.
Ready to transform your management of microservices?
As applications grow more complex to meet increasing demands for functionality and performance, understanding how they operate is critical. Modern architectures, built on microservices, APIs, and cloud-native principles, often form a tangled web of interactions, making it challenging to pinpoint bottlenecks and resolve latency issues. OpenTelemetry tracing gives developers and DevOps teams the visibility to understand the performance of their services and quickly diagnose issues and bottlenecks. The same data can also serve other critical purposes, which we'll incorporate into the discussion.
This post will cover all the fundamental aspects of OpenTelemetry tracing, including best practices for implementation and hands-on examples to get you started. Whether you’re new to distributed tracing or looking to improve your existing approach, this guide will give you the knowledge and techniques to monitor and troubleshoot your applications with OpenTelemetry. Let’s get started!
What is OpenTelemetry tracing?
OpenTelemetry, from the Cloud Native Computing Foundation (CNCF), has become the standard for instrumenting cloud-native applications. It provides a vendor-neutral framework and tools to generate, collect, and export telemetry consisting of traces, metrics, and logs. This data gives you visibility into your application’s performance and behavior.
Key concepts of OpenTelemetry
At its core, OpenTelemetry revolves around a few fundamental concepts, which include:
Signals: OpenTelemetry deals with three types of signals: traces, metrics, and logs. Each gives you a different view of your application’s behavior.
Traces: The end-to-end journey of a single request through the many services within your system.
Spans: These are individual operations within a trace. For example, a single trace might have spans for authentication, database access, and external API calls.
Context propagation: The mechanism for linking spans across services to give you a single view of the request’s path.
Within this framework, OpenTelemetry tracing focuses on understanding the journey of requests as they flow through a distributed system. Conceptually, this is like creating a map of each request path, encapsulating every single step, service interaction, and potential bottleneck along the way.
Why distributed tracing matters
Debugging performance issues in legacy, monolithic applications was relatively easy compared to today’s microservice applications. You could often find the bottleneck within these applications by looking at a single codebase and analyzing a single process; but with the rise of microservices, where a single user request can hit multiple services, finding the source of latency or errors is much more challenging.
For these more complex systems, distributed tracing cuts through this noise. This type of tracing can be used for:
Finding performance bottlenecks: Trace where requests slow down in your system.
Error detection: Quickly find the root cause of errors by following the request path.
Discovering service dependencies: Understand how your services interact and where to improve.
Capacity planning: Get visibility into resource usage and plan for future scaling.
Together, these building blocks provide the core functionality and insights of OpenTelemetry. Next, let's examine how all of these components work together.
How OpenTelemetry tracing works
OpenTelemetry tracing captures the flow of a request through your application by combining instrumentation, context propagation, and data export. Here’s a breakdown:
Spans and traces
A user clicks a button on your website. This triggers a request that hits multiple services: authentication, database lookup, payment processing, etc. OpenTelemetry breaks this down into spans, or units of work in distributed tracing, representing a single operation or a piece of a process within a larger, distributed system. Spans help you understand how requests flow through a system by capturing critical performance and context information. In the example below, we can see how this works, including a parent span (the user action) and a child span corresponding to each sub-operation (user authentication) originating from the initial action.
Spans help you understand how requests flow through a system by capturing critical performance and context information. Credit: Hackage.haskell.org
Each span records several key pieces of information, including the following:
Operation name (e.g. “database query”, “API call”)
Start and end timestamps
Status (success or failure)
Attributes (additional context like user ID, product ID)
One or multiple spans are then linked together to form a trace, giving you a complete end-to-end view of the request’s path. The trace shows you how long each operation took, where the latency was, and the overall flow of execution.
Context propagation
So, how does OpenTelemetry link spans across services? This is where context propagation comes in. Think of it as a relay race: each service is handed a "baton" of trace information when it receives a request. This baton, metadata carried in request headers, allows the service to create spans linked to the overall trace. As the request moves from one service to the next, the context is propagated, correlating all the spans.
To implement this, OpenTelemetry uses the W3C Trace Context standard for context propagation. This standard allows the trace context to be used across different platforms and protocols. By combining spans and traces with context propagation, OpenTelemetry gives users a holistic and platform-agnostic way to see the complex interactions within a distributed system.
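In practice, the standard defines a traceparent HTTP header that carries the trace ID, the parent span ID, and trace flags between services. A representative value, following the version-traceid-spanid-flags format from the specification, looks like this:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01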
Getting started with OpenTelemetry tracing
With the core concepts covered, let’s first look at auto-instrumentation. Auto-instrumentation refers to the ability to add OpenTelemetry tracing to your service without having to modify your code or with very minimal changes. While it’s also possible to implement OpenTelemetry by leveraging the OpenTelemetry SDK and adding tracing to your code directly (with some added benefits), the easiest way to get started is to leverage OpenTelemetry’s auto-instrumentation for a “zero-code” implementation.
What is auto-instrumentation?
For those who don’t require deep customization or want to get OpenTelemetry up and running quickly, auto-instrumentation should be considered. Auto-instrumentation can be implemented in a number of ways, including through the use of agents that can automatically instrument your application without code changes, saving time and effort. The way it’s implemented depends on your specific development language / platform.
The benefits of running auto-instrumentation include:
Quick setup: Start tracing with minimal configuration.
Comprehensive coverage: Automatically instrument common libraries and frameworks.
Reduced maintenance: Less manual code to write and maintain.
To show how easy it is to configure, let’s take a brief look at how you would implement auto-instrumentation in Java.
For Java, auto-instrumentation is delivered as a Java agent JAR (opentelemetry-javaagent.jar) that you download and attach to your application at startup. Once the agent is downloaded, you need to identify the parameters to pass to the agent so that the application's OpenTelemetry data is exported properly. The most basic of these are:
Export endpoint – The server endpoint where all the telemetry data will be sent for analysis.
Protocol – The protocol used for exporting the telemetry data. OpenTelemetry supports several protocols, but we'll use http/protobuf for this example.
Exporting will send your telemetry data to an external service for analysis and visualization. Some of the popular platforms include Jaeger and Zipkin and managed services such as Honeycomb or Lightstep.
While these platforms are great for visualizing traces and finding performance bottlenecks, vFunction complements them by using the same tracing data to give you a deeper understanding of your application architecture. vFunction automatically analyzes your application traces and generates a real-time architecture map showing service interactions and dependencies as they relate to the application's architecture, not just its performance. This helps developers and architects identify architectural issues that may underlie the performance problems surfaced by the other OpenTelemetry tools in use.
vFunction analyzes applications to clearly identify unnecessary dependencies between services, reducing complexity and technical debt that can degrade performance.
Once you have the settings needed for exporting, you can run your application with the agent. A representative command, with placeholder paths and endpoint, looks like this:
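java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://localhost:4318 \
     -Dotel.exporter.otlp.protocol=http/protobuf \
     -jar myapp.jar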
In the command, you’ll need to replace path/to/ with the actual path to the agent JAR file and myapp.jar with your application’s JAR file. Additionally, the endpoint would need to be changed to an actual endpoint capable of ingesting telemetry data. For more details, see the Getting Started section in the instrumentation page’s readme.
As you can see, this is a simple way to add OpenTelemetry to your individual services without modification of the code. If you choose this option, ensure that you understand and have confirmed the compatibility between the auto-instrumentation agent and your application and that the agent supports your application’s libraries. Another consideration revolves around customization. Sometimes auto-instrumentation does not support a library you’re using or is missing data that you require. If you need this kind of customization, then you should consider updating your service to use the OpenTelemetry SDK directly.
Updating your service
Let’s look at what it takes to manually implement OpenTelemetry tracing in a simple Java example. Remember, the principles apply across any of the different languages within which you could implement OpenTelemetry.
Setting up a tracer
First, you’ll need to add the OpenTelemetry libraries to your project. For Java, you can include the following dependencies in your pom.xml (Maven) or build.gradle (Gradle) file:
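A minimal Maven sketch using the official OpenTelemetry artifacts (the BOM version below is a placeholder; use the current release, and the Gradle coordinates are the same):

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-bom</artifactId>
      <version>1.40.0</version> <!-- placeholder: use the current release -->
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependencies>
  <!-- Tracing API and SDK -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
  </dependency>
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk</artifactId>
  </dependency>
  <!-- OTLP exporter used in the snippet below -->
  <dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
  </dependency>
</dependencies>

With the dependencies in place, initialize the SDK and obtain a tracer: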
public static void main(String[] args) {
    // Configure the OTLP exporter to send data to your backend
    OtlpGrpcSpanExporter otlpExporter = OtlpGrpcSpanExporter.builder().build();

    // Register the exporter with a tracer provider and build the SDK
    // (imports from io.opentelemetry.sdk.* and io.opentelemetry.exporter.otlp.trace assumed)
    OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
        .setTracerProvider(SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(otlpExporter).build())
            .build())
        .buildAndRegisterGlobal();

    // Create a tracer
    Tracer tracer = openTelemetrySdk.getTracer("my-instrumentation-library");

    // ... your application code ...
}
This code snippet initializes the OpenTelemetry SDK with an OTLP exporter. Just like with auto-instrumentation, this data will be exported to an external system for analysis. The main difference is that this is configured here in code, rather than through command-line parameters.
Instrumentation basics
With the tracer set up, it’s time to get some tracing injected into the code. For this, we’ll create a simple function that simulates a database query, as follows:
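// A reconstruction of the example described in the text below; the sleep
// duration is an arbitrary stand-in for real database work. Assumes the
// Tracer from the setup above and imports from io.opentelemetry.api.trace
// (Span, StatusCode) and io.opentelemetry.context (Scope).
public void queryDatabase(String query) {
    Span span = tracer.spanBuilder("database query").startSpan();
    try (Scope scope = span.makeCurrent()) {
        // Simulate a database command
        Thread.sleep(100);
        // Record the query string and mark the span successful
        span.setAttribute("db.statement", query);
        span.setStatus(StatusCode.OK);
    } catch (Exception e) {
        // Record the failure on the span
        span.setStatus(StatusCode.ERROR, e.getMessage());
    } finally {
        // Complete the span
        span.end();
    }
}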
This code does a few things relevant to our OpenTelemetry configuration. First, we use tracer.spanBuilder("database query").startSpan() to create a new span named "database query." Then we use span.makeCurrent() to ensure that this span is active within the try block.
Within the try block, we use Thread.sleep() to simulate a database command. We then call span.setAttribute("db.statement", query) to record the query string and set the span status to OK if the operation succeeds. If the operation throws an error, the catch block calls span.setStatus again, passing it an error status and message to be recorded. Finally, span.end() completes the span.
This basic instrumentation captures the execution time of the database query and provides context through the query string and status. You can use the same pattern to manually instrument other operations in your application, such as HTTP requests, message queue interactions, and function calls.
Leveraging vFunction for architectural insights
Combining detailed traces from an OpenTelemetry implementation like the one above with vFunction's architectural analysis gives you a complete view of your application's performance and architecture. For example, if you find a slow database query through OpenTelemetry, vFunction can help you understand the service dependencies around that database and potentially reveal architectural bottlenecks causing the latency.
To integrate OpenTelemetry tracing data with vFunction:
1. Configure the OpenTelemetry collector:
NOTE: If you're not using a collector, you can skip this step and send the data directly to vFunction.
– Configure the OpenTelemetry Collector to export traces to vFunction.
2. Configure your service to include the required vFunction trace headers:
Each service that should be included in the analysis needs to send its trace data to either vFunction or the collector. As part of this, a trace header must be added so vFunction knows which distributed application the service is associated with. After you create the distributed application in the vFunction server UI, it provides instructions for exporting telemetry data to the collector or to vFunction directly. One way this can be done is via the Java command line:
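Sketch only: the exact header key and endpoint come from the installation instructions in the vFunction server UI; x-vfunction-app-key below is a hypothetical placeholder.

java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=<collector or vFunction endpoint> \
     -Dotel.exporter.otlp.headers="x-vfunction-app-key=<app header UUID>" \
     -jar myapp.jar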
Other examples are provided in the server UI. The <app header UUID> above refers to a unique ID that’s provided in the vFunction server UI to associate a service with the application. The ID can be easily found by clicking on the “Installation Instructions” in the server UI and following the instructions.
3. Verify the integration:
– Generate some traces in your application.
– Check the vFunction server UI, and you should see the number of agents and tags increase as more services begin to send telemetry information. Click on “START” at the bottom of the interface to begin the analysis of the telemetry data. As data is received, vFunction’s UI will begin to visualize and analyze the incoming trace data.
By following these steps and integrating vFunction with your observability toolkit, you can effectively instrument your application and gain deeper insights into its performance and architecture.
Best practices for OpenTelemetry tracing
Implementing OpenTelemetry tracing effectively involves more than just adding instrumentation. As with most technologies, there are good, better, and best ways to implement OpenTelemetry. To get the full value from your tracing data, consider the following best practices when implementing and using OpenTelemetry in your applications:
Semantic conventions
Adhere to OpenTelemetry’s semantic conventions for naming spans, attributes, and events. Consistent naming ensures interoperability and makes it easier to analyze traces.
For example, if you are creating a span for an HTTP call, attach the relevant details as key-value attributes on the span. This might look something like this:
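// A sketch using current OpenTelemetry HTTP semantic convention keys
// (older releases use http.method and http.status_code); the URL and
// values are invented for illustration.
Span span = tracer.spanBuilder("GET /api/orders")
        .setSpanKind(SpanKind.CLIENT)
        .startSpan();
span.setAttribute("http.request.method", "GET");
span.setAttribute("url.full", "https://example.com/api/orders?limit=10");
span.setAttribute("server.address", "example.com");
span.setAttribute("http.response.status_code", 200);
span.end();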
The documentation on the OpenTelemetry website provides a detailed overview of the recommended semantic conventions, which are certainly worth exploring.
Efficient context propagation
Use efficient context propagation mechanisms to link spans across services with minimal overhead. OpenTelemetry supports various propagators, such as W3C TraceContext, the default propagator specification used with OpenTelemetry.
To configure the W3C TraceContext propagator in Java, do the following:
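// Explicitly register the W3C TraceContext propagator when building the SDK
// (it is also the default in most distributions). Imports assumed from
// io.opentelemetry.sdk, io.opentelemetry.sdk.trace,
// io.opentelemetry.context.propagation, and io.opentelemetry.api.trace.propagation.
OpenTelemetrySdk sdk = OpenTelemetrySdk.builder()
        .setTracerProvider(SdkTracerProvider.builder().build())
        .setPropagators(ContextPropagators.create(
                W3CTraceContextPropagator.getInstance()))
        .build();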
If you want to go beyond the default, the OpenTelemetry docs on context propagation have extensive information to review, including the Propagators API.
Tail-based sampling
When it comes to tracing, be aware of different sampling strategies that can help to manage data volume while retaining valuable traces. One method to consider is tail-based sampling, which makes sampling decisions after a trace is completed, allowing you to keep traces based on specific characteristics like errors or high latency.
To implement tail-based sampling, you can configure it in the OpenTelemetry Collector or directly in the backend. More information on the exact configuration can be found within the OpenTelemetry docs on tail-based sampling.
Adhering to these best practices and incorporating auto-instrumentation as appropriate can enhance the efficiency and effectiveness of your OpenTelemetry tracing, yielding valuable insights into your application’s performance.
Troubleshooting and optimization
Although OpenTelemetry provides extensive data, effectively troubleshooting and optimizing your application requires understanding how to leverage this information. Here are strategies for using traces to identify and resolve issues:
Recording errors and events
When an error occurs, you need to capture relevant context. OpenTelemetry allows you to record exceptions and events within your spans, providing more information for debugging.
For example, in Java you can add tracing so that error conditions, such as those caught in a try-catch statement, are captured correctly. In your code, it may look something like this:
try {
    // ... operation that might throw an exception ...
} catch (Exception e) {
    span.setStatus(StatusCode.ERROR, e.getMessage());
    span.recordException(e);
}
This code snippet sets the span status to ERROR, records the exception message, and attaches the entire exception object to the span. Thus, you can see not only that an error occurred but also the specific details of the exception, which can be extremely helpful in debugging and troubleshooting. You can also use events to log important events within a span, such as the start of a specific process, a state change, or a significant decision point within a branch of logic.
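Recording an event is a single call on the span. A minimal sketch follows; the event names and attribute key are invented, and Attributes/AttributeKey come from io.opentelemetry.api.common:

// Record notable moments on the current span
span.addEvent("cache miss");
span.addEvent("retrying payment call",
        Attributes.of(AttributeKey.stringKey("retry.reason"), "timeout"));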
Performance monitoring with traces
Traces are also invaluable for identifying performance bottlenecks. By examining the duration of spans and the flow of requests, you can pinpoint slow operations or services causing performance issues and latency within the application. Most tracing backends that work with OpenTelemetry already provide tools for visualizing traces, filtering by various criteria (e.g., service, duration, status), and analyzing performance metrics.
vFunction uses OpenTelemetry tracing to reveal the complexity behind a single user request, identifying overly complex flows and potential bottlenecks, such as the OMS service (highlighted in red), where all requests are routed through a single service.
vFunction goes beyond performance analysis by correlating trace data with architectural insights. For example, if you identify a slow service through OpenTelemetry, vFunction can help you understand its dependencies, resource consumption, and potential architectural bottlenecks contributing to the latency, providing deep architectural insights that traditional performance-based observability tools don't reveal.
Pinpoint and resolve issues faster
By combining the detailed information from traces with vFunction's architectural analysis, you can reveal hidden dependencies, overly complex flows, and architectural anti-patterns that impact the resiliency and scalability of your application. Pulling tracing data into vFunction to support deeper architectural observability empowers you to:
Isolate the root cause of errors: Follow the request path to identify the service or operation that triggered the error.
Identify performance bottlenecks: Pinpoint slow operations or services that are causing delays.
Understand service dependencies: Visualize how services interact and identify potential areas for optimization.
Verify fixes: After implementing a fix, use traces to confirm that the issue is resolved and performance has improved.
OpenTelemetry tracing, combined with the analytical capabilities of platforms like vFunction, empowers you to troubleshoot issues faster and optimize your application’s performance more effectively.
Next steps with vFunction
OpenTelemetry tracing provides a powerful mechanism for understanding the performance and behavior of your distributed applications. By instrumenting your code, capturing spans and traces, and effectively analyzing the data, you can identify bottlenecks, troubleshoot errors, and optimize your services.
Discover how vFunction can transform your application development.
We’re excited to have Nenad Crncec, founder of Architech, writing this week’s blog post. With extensive experience in addressing architectural challenges, Nenad shares valuable insights and highlights how vFunction plays a pivotal role in overcoming common stumbling blocks. Take it away, Nenad!
In my journey through various modernization projects, one theme that consistently emerges is the challenge of managing complexity—whether in microservices and distributed systems or monolithic applications. Complexity can be a significant barrier to innovation, agility, and scalability, impacting an organization’s ability to respond to changing market demands.
Complexity can also come in many forms: complex interoperability, complex technology implementation (and maintenance), complex processes, and so on.
"Complex" describes something we can't clearly understand: it is unpredictable and hard to manage because of its multifaceted nature and the interactions between its components.
Imagine trying to assemble flat-pack furniture without instructions, in the dark, while wearing mittens. That’s complexity for you.
What is complexity in software architecture?
Complexity, in the context of software architecture and system design, refers to the degree of intricacy and interdependence within a system’s components and processes. It encompasses how difficult it is to understand, modify, and maintain the system. Complexity arises from various factors, including the number of elements in the system, the nature of their interactions, the technologies used, and the clarity of the system’s structure and documentation.
Complexity also arises from two additional factors, even more impactful – people and time – but that is for another article.
Complexity creates all sorts of challenges across different types of architectures.
The double-edged sword of microservices
I recently assisted a company in transitioning from a monolithic architecture to microservices. The promise of microservices—greater flexibility, scalability, and independent deployability—was enticing. Breaking down the application into smaller, autonomous services allowed different teams to work concurrently, accelerating development.
Allegedly.
While this shift offered many benefits, it also led to challenges such as:
Operational overhead: Managing numerous services required advanced orchestration and monitoring tools. The team had to invest in infrastructure and develop new skill sets to handle containerization, service discovery, and distributed tracing. DevOps and SRE roles were spawned as part of the agile transformation, and a once complex environment… remained complex.
Complex inter-service communication: Ensuring reliable communication between services added layers of complexity. Network latency, message serialization, and fault tolerance became daily concerns. Add to that the communication (or lack thereof) between the teams building services that need to work together, and you have a recipe for disaster if not managed and governed properly.
Data consistency issues: Maintaining consistent data across distributed services became a significant concern. Without clear data governance, the simplest of tasks can become epic sagas of “finding and understanding data.”
And then there were the people—each team responsible for their own microservice, each with their own deadlines, priorities, and interpretations of “RESTful APIs.” Time pressures only added to the fun, as stakeholders expected the agility of microservices to translate into instant results.
Despite these challenges, the move to microservices was essential for the company’s growth. However, it was clear that without proper management, the complexity could outweigh the benefits.
The hidden complexities of monolithic applications
On the other hand, monolithic applications, often the backbone of legacy systems, tend to accumulate complexity over time. I recall working with an enterprise where the core application had evolved over years, integrating numerous features and fixes without a cohesive architectural strategy. The result was a massive codebase where components were tightly coupled, making it difficult to implement changes or updates without unintended consequences.
This complexity manifested in several ways:
Slower development cycles: Even minor changes required extensive testing across the entire application.
Inflexibility: The application couldn’t easily adapt to new business requirements or technologies.
High risk of errors: Tightly coupled components increased the likelihood of bugs when making modifications.
But beyond the code, there were people and time at play. Teams had changed over the years, with knowledge lost as developers, business analysts, sysadmins, software architects, engineers, and leaders moved on. Institutional memory was fading, and documentation was, well, let's say "aspirational." Time had turned the once sleek application into a relic, and people, each with their unique coding styles and architectural philosophies, had added layers of complexity that no one fully understood anymore.
As people leave organizations, institutional memory fades and teams are left with apps no one understands.
Adding people and time to the complexity equation
It’s often said that technology would be simple if it weren’t for people and time. People bring creativity, innovation, and, occasionally, chaos. Time brings evolution, obsolescence, and the ever-looming deadlines that keep us all on our toes.
In both monolithic and microservices environments, people and time contribute significantly to complexity:
Knowledge silos: As teams change over time, critical knowledge can be lost. New team members may not have the historical context needed to make informed decisions, leading to the reinvention of wheels—and occasionally square ones.
Diverging priorities: Different stakeholders have different goals, and aligning them is like trying to synchronize watches in a room full of clocks that all think they’re the master timekeeper.
Technological drift: Over time, technologies evolve, and what was cutting-edge becomes legacy. Keeping systems up-to-date without disrupting operations adds another layer of complexity.
Cultural differences: Different teams may have varying coding standards, tools, and practices, turning integration into an archaeological expedition.
Addressing complexity with vFunction
Understanding the intricacies of both monolithic and microservices architectures led me to explore tools that could aid in managing and reducing complexity. One such tool is vFunction, an AI-driven architectural observability platform designed to facilitate the decomposition of monolithic applications into microservices and to observe the behavior and architecture of distributed systems.
Optimizing microservices architectures
In microservice environments (distributed systems), vFunction plays an important role in deciphering complexity:
Identifying anti-patterns: The tool detects services that are overly chatty, indicating that they might be too granular or that boundaries were incorrectly drawn. Think of it as a polite way of saying, “Your services need to mind their own business a bit more.”
Performance enhancement: By visualizing service interactions, we could optimize communication paths and reduce latency. It’s like rerouting traffic to avoid the perpetual construction zone that is Main Street.
Streamlining dependencies: vFunction helps us clean up unnecessary dependencies, simplifying the architecture. Less is more, especially when “more” equals “more headaches.”
vFunction helps teams understand and structure their microservices, reducing unnecessary dependencies.
How vFunction helps with monolithic complexity
When dealing with complex monolithic systems, vFunction can:
Automate analysis: vFunction scans the entire system while running, identifying dependencies and clustering related functionalities. This automated analysis saved countless hours that would have been spent manually tracing code. It was like having a seasoned detective sort through years of code crimes.
Define service boundaries: The platform suggested logical partitions based on actual usage patterns, helping us determine where natural service boundaries existed. No more debates in meeting rooms resembling philosophical symposiums.
Prioritize refactoring efforts: By highlighting the most critical areas for modernization, vFunction allowed us to focus on the components that would deliver the most significant impact first. It's amazing how a clear priority list can turn "we'll never finish" into "we're making progress."
Bridging the people and time gap with vFunction
One of the unexpected benefits of using vFunction is its impact on the people side of the equation:
Knowledge transfer: The visualizations and analyses provided by the tool help bring new team members up to speed faster than you can say “RTFM.”
Unified understanding: With a common platform, teams have a shared reference point, reducing misunderstandings that usually start with “I thought you meant…”
Accelerated timelines: By adopting vFunction in the modernization process, we met tight deadlines without resorting to the classic solution of adding more coffee to the project.
Practical use case and lessons learned
Now that this is said and done, there are real-life lessons that you should take to heart (and brain…)
Does your organisation manage architecture? How are things built, maintained, planned for the future? How does your organisation treat architecture? Is it part of the culture?
Every tool is useless if it is not used.
In the project where we transitioned a large European bank to microservices, using vFunction (post-reengineering) provided teams with fine-tuned architecture insights (see video at the top of this blog). We analyzed both "monolithic" apps and "distributed" apps with microservices. We identified multi-hop and cyclic calls between services, god classes, dead code, high-complexity classes… and much more.
We used the initial measurements to create a target architecture. vFunction showed us where complexity and coupling lie and how they impact the architecture.
vFunction creates a comprehensive list of TODOs that serve as a guide for tackling the identified issues.
One major blocker is failing to treat architecture as a critical, team-owned artifact. Taking care of architecture "later" is like building a house, walls and all, and only afterward deciding where the living room goes, where the bathroom is, and how many doors and windows we need. That kind of approach will not make a family happy or a home safe.
Personal reflections on using vFunction
“What stands out to me about vFunction is how it brings clarity to complex systems. It’s not just about breaking down applications but understanding them at a fundamental level. This comprehension is crucial for making informed decisions during modernization.”
In both monolithic and microservices environments, vFunction's architectural observability provided:
Visibility: A comprehensive view of the application’s structure and interdependencies.
Guidance: Actionable insights that informed our architectural strategies.
Efficiency: Streamlined processes that saved time and resources.
Conclusion: Never modernize again
Complexity in software architecture is inevitable, but it doesn’t have to be an insurmountable obstacle. Whether dealing with the entanglement of a monolith or the distributed nature of microservices, tools like vFunction offer valuable assistance.
By leveraging platforms such as vFunction, organizations can:
Reduce risk: Make changes with confidence, backed by data-driven insights.
Enhance agility: Respond more quickly to business needs and technological advancements.
Promote innovation: Free up resources to focus on new features and improvements rather than wrestling with complexity.
From my experiences, embracing solutions that tackle architectural complexity head-on is essential for successful modernization. And more than that, it is a tool that should help us never modernize again, by continually monitoring architectural debt and drift, helping us to always keep our systems modern and fresh. It’s about empowering teams to understand their systems deeply and make strategic decisions that drive growth.
Take control of your microservices, macroservices, or distributed monoliths with vFunction
Major alert: vFunction was just named a 2024 Gartner Cool Vendor in AI-Augmented Development and Testing. We’re incredibly grateful and proud of this recognition. We’re also excited about the opportunity to share our platform on a larger scale and bring architectural observability to more enterprises.
According to the "Gartner Cool Vendors™ in AI Augmented Development and Testing for Software Engineering" report, "As the codebase and architectural complexity grow, the processing power required to handle local builds escalates significantly. Many organizations struggle to equip their software engineers with the necessary tools to meet the increasing demand for faster delivery from idea to production, impacting overall productivity and efficiency." Organizations have long struggled to fully grasp the complexity of their application architectures as they evolve throughout the SDLC. Enterprises juggle all types of application architectures, including modular monoliths, distributed monoliths, miniservices, and microservices, and must make tradeoffs between agility and complexity. Traditional approaches, which rely on manual code reviews, fragmented documentation, and institutional knowledge, have proven largely inadequate for identifying architectural risks and prioritizing fixes at today's speed of business. The result is an architectural blind spot that has significantly impeded modernization efforts and led to mounting technical debt, particularly architectural technical debt, as well as unrealized revenue potential in the billions.
To remediate technical debt effectively, Gartner recommends that “organizations use architectural observability tools to thoroughly analyze software architecture, identify inconsistencies, and gain deeper insights.”
At vFunction, we see software architecture as a critical but often underutilized driver of business success. We believe being recognized as a Gartner Cool Vendor validates our innovative approach to empowering engineering teams to innovate faster, address resiliency earlier, build smarter, and create scalable applications that change the trajectory of their business. With our AI-driven architectural observability platform, teams are equipped with valuable insights to find and fix unnecessary complexity and technical debt across large, complex applications and modern, highly distributed microservices throughout the organization. Software teams use the platform to understand their applications, identify the sources of technical debt, and find refactoring opportunities to enhance scalability, resiliency, and engineering velocity.
Five reasons why we believe vFunction was recognized as a Cool Vendor
Architectural observability plays a key role in managing the complexities of modern software development. Gartner states that, “By 2027, 80% of software engineering groups will monitor software architecture complexity and architecture technical debt in near real time, up from less than 10% today.” We feel we’re at the forefront of this trend, providing the tools necessary to meet this growing need.
By vigilantly monitoring architectural technical debt and drift across the entire application portfolio, our solution equips software engineering leaders and their teams with the insights necessary to make informed decisions. Here’s why we believe vFunction stands out:
AI-powered. vFunction’s architectural observability platform understands and visualizes application architecture to reduce technical debt and complexity.
Find and fix technical debt. vFunction uses extensive data to identify and remediate architectural technical debt across the entire application portfolio.
Shift left. Address the root causes of technical debt to prevent performance issues before they arise using vFunction’s patented methods of static and dynamic analysis.
Prioritize and alert. vFunction incorporates a prioritized task list into every sprint to fix key technical debt issues, based on your unique business goals.
Any architecture. The platform relies on OpenTelemetry to support a wide spectrum of programming languages in the distributed world, plus Java and .NET for monolithic architectures, so you can apply it to a variety of use cases: decomposing monoliths, refining distributed microservices, and evaluating modular monoliths.
Organizations face immense pressure to deliver high-quality software rapidly, stay competitive, and pivot quickly in response to market demands. The rapid accumulation of technical debt exacerbates these challenges, hampering engineering velocity, limiting application scalability, and impacting resiliency. This often results in increased risks of outages, delayed projects, and missed opportunities.
Ready to put the freeze on software complexity and mounting technical debt? Let us partner with you to unlock the full potential of your software architecture. Contact us today to learn how vFunction can be an indispensable asset in transforming your software development practices.
Gartner, Inc. Cool Vendors in AI-Augmented Development and Testing for Software Engineering. Tigran Egiazarov, Philip Walsh, et al. 8 August 2024.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Cool Vendors is a registered trademark of Gartner, Inc. and/or its affiliates and is used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Adam Safran, Senior Engineering Manager at Turo, took the stage with me at the 2024 Gartner Application Innovation & Business Solutions Summit to discuss how vFunction’s architectural observability platform supports Turo’s journey to 10x scale.
Turo, the world’s largest car-sharing marketplace, is on a mission to put the world’s 1.5 billion cars to better use. Since 2010, Turo has consistently grown to amass over 3.5 million active guests and over 6.5 billion miles driven. This growth is great, but it also introduces challenges.
Turo’s growth pushed the limits of its monolith
Turo’s growth presented new challenges for its twelve-year-old monolithic application and rapidly expanding engineering team.
To address these concerns, Turo CTO Avinash Gangadharan proposed a mandate to achieve 10x scale. He developed two new engineering domains: a platform domain focused on developer experience and reliability and a core services domain focused on scale.
Adam’s API services team acts as a bridge between the two domains. However, the proposal faced several challenges:
Justifying the initiative to leadership: When things are going well, it can be challenging to explain to leadership the necessity of investing in scale. Turo’s consistent growth required investing in scalability without sacrificing new feature development.
“The challenge is that there’s sort of a sense of if ‘it ain’t broke, don’t fix it.’”
Adam Safran, Senior Engineering Manager at Turo
System and organizational scale challenges: Turo’s monolithic application faced several issues due to growth:
The number of developers contributing to Turo’s codebase more than doubled from 28 contributors in 2017 to 72 in 2024
Turo’s engineering team grew from 80 engineers in 2020 to over 200 in 2024
The number of tables grew from 62 in 2011 to 491 in 2023
Their reservation table went from 13 in 2011 to 36 in 2019
Turo began experiencing deployment issues, with releases taking 5-10 hours to get code into production. Organizational silos emerged, leading to increased ambiguity around domain ownership.
Loss of modularity: As new domains were added to the application, classes written for a single purpose took on other responsibilities, leading to entanglement. Database tables previously called from one logical domain were now called from multiple places, mixing the data layer with business logic. As a result of over a decade of ad hoc development, Turo's architecture lost its modularity, impacting the application's scalability and resiliency.
Turo’s monolithic application as shown in vFunction. “This is what happens over twelve years of ad hoc development to build a world-class vehicle marketplace without pausing to consider scale,” said Adam Safran.
To address the challenges posed by their monolithic application, Turo chose first to extract its conversation service. The decision was based on the service’s frequency of use and overall latency issues. The goal was to improve engineering velocity by reducing complexity through distinct microservices that can scale separately and provide clarity on domain ownership.
Once the conversation service was modernized to a “lean, mean microservices machine,” Turo achieved the following results:
Faster average response times, which went from half a second to 19 milliseconds, with 99th percentile response times improving from about seven seconds to just under one second
25-100x better sync times
Improved code deployment from 5-10 hours to just five minutes
Challenges following microservice extraction
However, soon after creating the microservice, Turo realized that the application’s architecture could quickly become entangled without complete visibility, with teams adding new code and dependencies to new microservices. These changes might not be discovered until weeks or months later, leading to Adam’s API services team having to untangle “spaghetti code.”
vFunction — visualizing architecture, paving the way to scalability
To ensure vigilance in observing the application’s software architecture, Turo turned to vFunction to provide continuous architectural observability with dynamic and static code analysis and a real-time view of the application’s dependencies. vFunction helps the microservices architecture remain resilient and scalable while monitoring for technical debt accumulation and reducing software complexity as the team adds new features.
“We’re making this investment now in scale while the wheels are moving so that our product teams can continue to focus on the features that our users want.”
Adam Safran, Senior Engineering Manager at Turo
With vFunction, Turo identified a repeatable process to extract microservices and achieve scalability goals. Here’s what Adam describes as the best practices when modernizing an application:
Use vFunction to understand your domain, the interconnections between services, and how to maintain service boundaries to avoid technical debt
Over communicate with your teams about what the organization is doing and how they are doing it
Break down organizational silos to reduce domain pollution
Share your journey to help uplevel the organization
Document successes to demonstrate ROI
Rinse and repeat
As Turo’s business continues its growth trajectory, vFunction architectural observability enables the team to visualize its application and continuously find and fix technical debt before it has a chance to take root.
Organizations that deploy architectural observability experience improved application resiliency and scalability while increasing their engineering velocity. This ensures continued innovation and sharpens their competitive edge.
“I wish we had vFunction when I started at Turo. This kind of architectural observability gives us a much better understanding into our application and helps us with decision making as we move forward.”
Adam Safran, Senior Engineering Manager at Turo
If you’d like to learn more about how vFunction can help your organization, contact us. Tell us the big goals for your application and we’ll show you how architectural observability gives you a clear path to get there faster.
To meet the challenges posed by customers and competitors in today’s rapidly changing marketplace, you must regularly update and modernize the software applications on which your business operations depend. In such an environment, technical debt is inevitable and highly detrimental. Knowing how to measure and manage technical debt effectively is essential.
Understanding how much technical debt a company has is crucial for setting up accurate metrics and realizing the extent of the debt. Getting a complete picture of the technical debt metrics within your organization’s applications makes it easier to manage and track.
According to Gartner, companies that manage technical debt “will achieve at least 50% faster service delivery times to the business.” On the other hand, organizations that fail to manage their technical debt properly can expect higher operating expenses, reduced performance, and a longer time to market. As a report from McKinsey makes clear, “Poor management of tech debt hamstrings companies’ ability to compete.” With so much riding on managing and remedying technical debt, it’s a topic that architects, developers, and technical leaders must know well.
What is technical debt?
The term "technical debt" was coined in 1992 by computer scientist Ward Cunningham to vividly illustrate the long-term consequences of the short-term compromises and workarounds developers often incorporate into their code. Much like financial debt, where borrowing money now leads to interest payments later, technical debt accumulates "interest" through increased development time, decreased system stability, and the potential for future bugs or failures. As TechTarget explains, technical debt is an inevitable consequence of the "build now, fix later" mentality that sometimes pervades software development projects. With tight deadlines, limited resources, or evolving requirements, developers may opt for quick-and-dirty solutions rather than investing the time and effort to build robust, scalable, and maintainable code.
In essence, technical debt is the result of prioritizing speed over quality. While these shortcuts may seem beneficial in the short term, allowing teams to meet deadlines or deliver features faster, they can significantly impact reliability down the line. Just as ignoring financial debt can lead to financial ruin, neglecting technical debt can crush a software project, making it increasingly difficult and expensive to maintain, modify, or extend.
The snowball effect
Technical debt doesn’t just remain static; it accumulates over time. As teams implement more quick fixes and workarounds, the codebase becomes increasingly convoluted, complex, and difficult to understand. This, in turn, makes it harder for developers to identify and fix bugs, add new features, or refactor the code to improve its quality. The result is a vicious cycle where technical debt begets more technical debt, leading to a gradual decline in the software’s overall health and performance. Growing technical debt signifies that the complexity of the code is increasing, which will eventually require untangling and negatively impact code quality.
Understanding the nature of technical debt and its potential consequences is the first step toward managing it effectively. Although the impacts of technical debt can be gradual, they can result in massive disadvantages in the long run.
Disadvantages of technical debt
Since no software development project ever has all the time or resources required to produce a perfect codebase, some technical debt is unavoidable. That’s not necessarily bad if an application’s technical debt is promptly “paid off.” Otherwise, just as with financial debt, the costs of repaying the “principal” plus the “interest” on the debt can eventually reach crippling proportions.
The “principal” portion of technical debt is the cost of fixing the original code, dependencies, and frameworks to enable it to function in today’s technology environment. The “interest” is the added cost of maintaining such applications, which continues to compound over time. The challenge is keeping an aging and inflexible legacy application running as it becomes increasingly incompatible with the rapidly changing modern infrastructure it operates on top of.
Technical debt can significantly hinder a company’s ability to innovate. According to a recent U.S. study, more than half of respondents dedicate at least a quarter of their annual budget to technical debt. Poorly written code is a common form of technical debt, often leading to increased maintenance costs and reduced code quality.
And other costs of technical debt are, perhaps, even worse than the financial ones:
Less innovation: The time developers devote to dealing with technical debt is time taken away from developing the innovations that can propel the business forward in its marketplace.
Slow test and release cycles: Technical debt makes legacy apps brittle (easy to break), opaque (hard to understand), and challenging to upgrade safely. That means teams must devote more time to understanding the potential impact of changes and testing them to ensure they don’t cause unexpected disruptions in the app’s operation.
Inability to meet business goals: This is the inevitable result of the previous two issues. In today’s environment of rapid evolution in technology and market requirements, the inability to quickly release and deploy innovative new applications can impede a company’s ability to meet its goals.
Security exposures: Because modern security concerns were typically unknown or disregarded when older apps were designed or patched, security-related technical debt often constitutes a significant vulnerability for legacy code.
Poor developer morale: For many developers, dealing with technical debt can be mind-numbing and frustrating. In one survey, 76% of respondents affirmed that “paying down technical debt” negatively impacted their morale.
Ward Cunningham explains the destructive potential of technical debt this way:
“The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation.”
While mounting technical debt can feel like business as usual, it’s essential to understand its impact so that managing it becomes a priority. Teams must also be aware of the various types of tech debt.
What are the types of technical debt?
Technical debt manifests in various forms, each with unique characteristics and potential consequences. Understanding these types is crucial for effectively identifying and managing technical debt within the applications that an organization owns and maintains. Here are examples of tech debt that can arise.
Code debt: This is the most common and easily recognizable type of technical debt. It refers to accumulating poorly written, overly complex, or outdated code. Code debt can result from rushed development, lack of adherence to coding standards, or simply the evolution of technology and best practices over time. Symptoms of code debt include excessive bugs, difficulty understanding and modifying the code, and slow performance. Additionally, changing existing code can be particularly challenging, often requiring significant time and effort to ensure stability and maintainability.
Design debt: Design debt arises when the software’s design is suboptimal, leading to challenges in implementing new features or modifying existing functionality. This can occur due to a lack of upfront design, changes in requirements, or a failure to adapt the design as the software evolves. Design debt can manifest as tightly coupled components, hard-coded dependencies, or a lack of modularity (a short sketch of one such case follows this list).
Testing debt: Insufficient or inadequate testing can lead to testing debt. This can include a lack of automated tests, outdated tests that no longer reflect the current state of the software, or simply a culture of neglecting testing in favor of rapid development. Testing debt increases the risk of bugs and regressions, making it harder to ensure the software’s reliability.
Documentation debt: Outdated, incomplete, or inaccurate documentation constitutes documentation debt. This can make it difficult for new developers to understand the codebase, for existing developers to remember how things work, or for stakeholders to understand the software’s capabilities and limitations. Documentation debt can lead to misunderstandings, errors, and delays in development.
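To make the design debt item above concrete, here is a minimal, hypothetical sketch of a hard-coded dependency and the small refactor that removes it. The class names are invented for illustration, not drawn from any real codebase.

```python
# Illustrative sketch of design debt: a hard-coded dependency left in place
# for speed, versus a small refactor that injects the dependency instead.
# All class names here are hypothetical.

class PostgresClient:
    def query(self, sql: str) -> list:
        return []  # stand-in for a real database query


# Before: ReportGenerator constructs its own database client, so swapping the
# storage backend or unit-testing the class requires editing this code.
class ReportGenerator:
    def __init__(self):
        self.db = PostgresClient()          # hard-coded dependency


# After: the dependency is injected, so the class no longer dictates which
# backend is used and can be exercised in tests with a fake client.
class ReportGeneratorV2:
    def __init__(self, db):
        self.db = db                        # dependency provided by the caller
```

Small as it looks, this is the shape of debt that compounds: every hard-coded dependency is one more place that must be touched when requirements or infrastructure change.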
Architectural technical debt
At vFunction, we also focus on an additional aspect of technical debt that can accumulate: architectural technical debt. While sharing similarities with the types mentioned above, architectural technical debt is more deeply ingrained in the software’s structure. It refers to compromises made in the overall architecture or design of the system for short-term gains, such as meeting a tight deadline or delivering a specific feature. These compromises may involve:
Architectural drift: Deviation from the intended architecture over time due to ad-hoc changes or lack of governance.
Intentional violations: Deliberately violating best practices or established architectural principles due to time constraints or other pressures.
Unstable shortcuts: Using temporary or unreliable solutions that provide a quick fix but may not be sustainable in the long run.
While incurring architectural technical debt can sometimes be a strategic decision, it’s essential to know the associated costs and drawbacks and to keep metrics in place to measure and monitor changes. Over time, architectural debt can lead to increased complexity, reduced flexibility, and even system instability.
Understanding the different types of technical debt and their potential impact is a critical step in managing this hidden cost of software development. In the next section, we’ll explore how to measure technical debt so you can understand its extent and make informed decisions about how to address it.
How to measure technical debt
As we’ve seen, companies need to understand how to manage technical debt. Yet, according to an article in Forbes, technical debt is difficult to measure. The article quotes Sven Blumberg and Björn Münstermann of McKinsey, saying, “Technical debt is like dark matter: you know it exists, you can infer its impact, but you can’t see or measure it.” Blumberg and Münstermann list some informal indicators of technical debt, such as product delays, out-of-control costs, and low developer morale. But are there any formal methods available to quantify the amount of technical debt that characterizes a particular application or an entire application portfolio?
Some have proposed using metrics such as cyclomatic complexity (the number of linearly independent execution paths through the code) and cognitive complexity (a measure of how difficult the code and all its possible execution paths are for a human to understand). Code quality metrics can also be used to calculate the remediation cost, which helps determine the technical debt ratio (TDR). TDR expresses the future cost of technical debt in terms of time or resources, providing a clearer understanding of the effort needed for quality improvement. The problem with such indicators is the difficulty of measuring them in a large monolithic codebase with millions of lines of code.
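To make the technical debt ratio concrete, here is a minimal sketch that computes TDR from estimated remediation and development effort. The function name and figures are hypothetical; in practice, the remediation estimate typically comes from an analysis tool and the development cost from a size-based estimate.

```python
# Illustrative sketch: computing a technical debt ratio (TDR).
# Inputs are assumptions for the example; real estimates would come from
# a static-analysis tool (remediation) and a sizing model (development).

def technical_debt_ratio(remediation_cost_hours: float,
                         development_cost_hours: float) -> float:
    """Return TDR as a percentage: remediation cost / development cost * 100."""
    if development_cost_hours <= 0:
        raise ValueError("development cost must be positive")
    return (remediation_cost_hours / development_cost_hours) * 100


# Hypothetical example: 400 hours of estimated fixes against an application
# whose development effort is estimated at 12,000 hours.
tdr = technical_debt_ratio(remediation_cost_hours=400,
                           development_cost_hours=12_000)
print(f"TDR: {tdr:.1f}%")  # ~3.3%, below the commonly cited 5% threshold
```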
Architectural technical debt consistently appears as the most damaging and far-reaching type of technical debt in surveys, analyst reports, and academic studies.
Why knowing how to measure technical debt is crucial
Many companies today depend on traditional monolithic applications for business-critical processing. Due to their age and development over time, such apps typically have substantial technical debt that limits their ability to integrate and take advantage of today’s cloud-based technological ecosystem.
The solution is modernizing those legacy apps to give them essentially cloud-native capabilities. And that means dealing with the technical debt that’s holding them back. But, as management guru Peter Drucker famously said, “You can’t improve what you don’t measure.” Measuring the technical debt of legacy apps is critical to bringing them into the modern technological age. Tracking technical debt over time is crucial for continuous improvement and ensuring long-term code quality. One recent application modernization white paper explains it this way: “For any application modernization strategy to be successful, organizations need to first understand the complexity, risk, and technical debt of their current application estate. From there, they can prioritize and make the appropriate substantial investments into technology, resources, and the time it takes to implement the strategy.”
However, technical debt is notoriously difficult to identify and measure. Luckily, there are tools that can help detect and monitor it so that teams can stay on top of it. Next, let’s look at a few of the most popular options.
“Poor management of tech debt hamstrings companies’ ability to compete.”
McKinsey & Company
Five best tools for measuring technical debt
The first step to tackling technical debt is understanding its extent and nature within your codebase, infrastructure, and overall architecture. Various tools can automate this process, providing valuable insights and metrics to guide your debt management strategy. Clear code ownership also helps: when a well-defined, smaller group of developers owns and contributes to each area of the code, accountability improves and unreliable code is less likely to accumulate.
According to Gartner, technical debt analysis tools can fall into a few categories. Below is a diagram that explains the types of tools, their capabilities, and where they reside within the SDLC.
Based on these categories, here are five of the best tools available for measuring different types of technical debt:
vFunction
This AI-powered platform tackles architectural technical debt in large, complex legacy systems and modern, cloud-based microservices. vFunction statically and dynamically analyzes applications, identifying hidden dependencies, outdated architectures, and potential risks. It then provides actionable insights and recommendations for refactoring, modernizing, and systematically reducing technical debt.
CAST Software (Cast Imaging)
Cast Imaging takes a comprehensive approach to technical debt assessment, analyzing code quality, architecture, and security vulnerabilities. It provides a detailed view of the technical debt landscape, including metrics for code complexity, design violations, and potential risks. This holistic approach helps teams prioritize their remediation efforts based on the most critical areas of debt.
SonarQube
This popular open-source platform is a versatile code quality and security analysis tool. While not explicitly focused on technical debt, SonarQube provides valuable insights into code smells, bugs, vulnerabilities, and code duplication, often indicative of technical debt. By regularly using SonarQube, teams can proactively identify and address code-level issues contributing to technical debt.
Snyk (Snyk Code)
While primarily known for its security focus, Snyk Code also offers features for analyzing code quality and maintainability. It can identify issues like code complexity, potential bugs, and security vulnerabilities, often intertwined with technical debt. By addressing these issues, teams can improve code quality and reduce the overall technical debt burden.
CodeScene
This unique tool goes beyond static code analysis by analyzing the evolution of your codebase over time. It identifies hotspots—areas of the code that are frequently changed and prone to accumulating technical debt. It also analyzes social aspects of code development, such as team dynamics and knowledge distribution, to identify potential bottlenecks and risks. This behavioral code analysis provides valuable insights into the root causes of technical debt, helping teams address it more effectively.
By leveraging these tools, you can comprehensively understand your technical debt landscape. This knowledge empowers you to make informed decisions about which areas of debt to prioritize and how to allocate resources for remediation. Many of these tools can be embedded directly into your development pipelines, with automated scans and monitoring keeping your teams informed as a project evolves. As we have discussed, monitoring and managing technical debt is crucial, and these tools help keep it front and center. Next, let’s review some tips on monitoring and managing technical debt.
Monitoring and managing technical debt
A significant portion of technical debt in legacy applications stems from their monolithic architecture and reliance on outdated technologies. These applications often have a complex codebase with hidden dependencies, making it challenging to assess and address technical debt effectively.
One modern approach to measuring that debt leverages machine learning (ML) to analyze the dependency graph between classes within an application. The dependency graph, a directed graph representing dependencies between entities in a system, provides valuable insights into the complexity and risk associated with the application’s architecture.
By applying ML algorithms, we can extract three key metrics that represent the level of technical debt in the application (a simplified, illustrative sketch follows this list):
Complexity: This metric reflects the effort required to add new features to the application. A higher complexity score indicates a greater likelihood of encountering challenges and potential issues during development.
Risk: This metric relates to the probability that adding new features may disrupt the operation of existing functionalities. A higher risk score suggests a greater vulnerability to bugs, regressions, and unintended consequences.
Overall debt: This metric quantifies the additional work required when adding new features. It provides an overall assessment of the technical debt burden associated with the application.
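As a toy illustration of the kind of graph-level signals such an analysis can start from, the sketch below computes naive complexity and risk proxies over a small, hypothetical class dependency graph using the networkx library. This is not vFunction’s ML model, and the class names are invented; it simply shows how a dependency graph exposes coupling and cycles.

```python
# Illustrative only: crude proxies for complexity and risk derived from a
# class dependency graph. Requires networkx (pip install networkx).
import networkx as nx

# Hypothetical class-level dependency graph: an edge A -> B means class A
# depends on class B.
deps = nx.DiGraph()
deps.add_edges_from([
    ("OrderService", "PaymentClient"),
    ("OrderService", "InventoryDao"),
    ("PaymentClient", "HttpUtil"),
    ("InventoryDao", "DbConnection"),
    ("ReportingJob", "InventoryDao"),
    ("InventoryDao", "OrderService"),   # creates a cyclic dependency
])

# Complexity proxy: how densely connected the classes are overall.
complexity = nx.density(deps)

# Risk proxy: classes caught in dependency cycles are harder to change safely.
classes_in_cycles = {c for cycle in nx.simple_cycles(deps) for c in cycle}
risk = len(classes_in_cycles) / deps.number_of_nodes()

print(f"complexity proxy (graph density): {complexity:.2f}")
print(f"risk proxy (share of classes in cycles): {risk:.2f}")
```

In a real analysis these crude proxies would be replaced by trained models, but even this version makes cycles and dense coupling visible at a glance.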
By training ML models on manually analyzed data that incorporates expert knowledge, we can accurately assess the technical debt level of an application even without prior knowledge of its codebase. This enables organizations to comprehensively understand technical debt across their legacy software portfolio. With this information, IT leaders can make data-driven decisions about which applications to prioritize for modernization and how to allocate resources for technical debt reduction.
In addition to this ML-based approach, continuous monitoring of critical metrics, such as code complexity, code churn, and test coverage, can help identify potential hotspots where technical debt accumulates. By proactively addressing these issues, organizations can prevent technical debt from spiraling out of control and ensure the long-term health and maintainability of their legacy applications. Tracking the technical debt ratio (TDR) is also crucial, as it measures the effort spent fixing software relative to the effort spent developing it. Keeping TDR below five percent is generally considered healthy and can help demonstrate to executives the value of proactively addressing technical debt.
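As a rough illustration of the code churn signal mentioned above, the sketch below counts how often each file has changed recently using Git history. The six-month window and the idea of treating the most frequently changed files as hotspot candidates are assumptions for the example, not a prescribed methodology.

```python
# Sketch: a rough "hotspot" signal from version-control history. Files that
# change most often are candidates for accumulating debt. Assumes the script
# runs inside a Git repository.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--since=6 months ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

churn = Counter(line for line in log.splitlines() if line.strip())
for path, changes in churn.most_common(10):
    print(f"{changes:4d}  {path}")
```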
Conclusion
For machine learning to be a practical solution to measuring technical debt, it must be embodied in an intelligent, AI-driven, automated analysis tool that delivers comprehensive technical debt metrics and allows users to build a data-driven business case for modernizing a company’s suite of legacy apps. These metrics should identify the level of technical debt, complexity, and risk for each app and for the legacy app portfolio as a whole.
vFunction’s architectural observability platform is purpose-built on those principles. It uses AI and machine learning to provide accurate measures of technical debt and can also help automate the refactoring of legacy apps to eliminate it. To see vFunction’s answer to how to measure technical debt in action, schedule a demo today.
Many teams turn to microservice architectures hoping to leave behind the complexity of monolithic applications. However, they soon realize that the complexity hasn’t disappeared — it has simply shifted to the network layer in the form of service dependencies, API interactions, and data flows between microservices. Managing and maintaining these intricate distributed systems can feel like swimming against a strong current: you may be making progress, but it’s a constant struggle that leaves you exhausted. The new distributed applications capability in vFunction provides a life raft, offering much-needed visibility and control over your distributed architecture.
In this post, we’ll dive into how vFunction can automatically visualize the services comprising your distributed applications and highlight important architectural characteristics like redundancies, cyclic dependencies, and API policy violations. We’ll also look at the new conversational assistant powered by advanced AI that acts as an ever-present guide as you navigate vFunction and your applications.
At the heart of vFunction’s new distributed applications capability is the Service Map – an intuitive visualization of all the services within a distributed application and their interactions. Each node represents a service, with details like name, type, tech stack, and hosting environment. The connections between nodes illustrate dependencies like API calls and shared resources.
OpenTelemetry
This architectural diagram is automatically constructed by vFunction during a learning period in which it observes traffic flowing through your distributed system. For applications instrumented with OpenTelemetry, vFunction can ingest the telemetry data directly, supporting a wide range of languages including Java, .NET, Node.js, Python, Go, and more. This OpenTelemetry integration extends vFunction’s ability to monitor distributed applications across numerous modern language stacks beyond traditional APM environments.
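If your services aren’t instrumented yet, the snippet below shows roughly what a minimal OpenTelemetry setup looks like in Python. The service name and OTLP endpoint are placeholders, and how the resulting telemetry reaches vFunction depends on your collector setup; treat this as a sketch of standard OpenTelemetry SDK usage, not vFunction-specific configuration.

```python
# Minimal OpenTelemetry tracing setup in Python (sketch).
# Requires: pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "payments-service"})  # placeholder name
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))  # placeholder collector endpoint
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Each traced operation becomes a span; downstream calls appear as child
# spans, which is what allows a service map to be reconstructed from traces.
with tracer.start_as_current_span("charge-card"):
    pass  # business logic would go here
```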
Unlike traditional APM tools that simply display service maps based on aggregated traces, vFunction applies intelligent analysis to pinpoint potential architectural issues and surface them as visual cues on the Service Map. This guidance goes beyond displaying nodes and arrows on the screen: the analysis flags potential areas of concern (a brief illustrative sketch follows below), such as:
Redundant or overlapping services, like multiple payment processors, that could be consolidated.
Circular dependencies or multi-hop chains, where a chain of calls increases complexity.
Tightly coupled components, such as separate services sharing the same database, which make changes difficult.
Services that don’t adhere to API policies, such as test environments accessing production data.
These potential issues are flagged as visual cues on the Service Map and listed as actionable to-do’s (TODOs) that architects can prioritize and assign. You can filter the map to drill into specific areas, adjust layouts, and plan how services should be merged or split through an intuitive interface.
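To give a flavor of what such checks look like in practice, here is a simplified sketch over a hypothetical service map. The data structures and service names are assumptions for illustration only and do not reflect vFunction’s internal model; the sketch covers just two of the checks listed above.

```python
# Sketch: two of the checks described above, run over a toy service map.

calls = {                       # service -> services it calls
    "checkout": ["payments-v1", "payments-v2", "inventory"],
    "payments-v1": ["ledger"],
    "payments-v2": ["ledger"],
    "inventory": ["checkout"],  # forms a cycle with checkout
}
databases = {                   # service -> databases it uses
    "inventory": ["orders-db"],
    "checkout": ["orders-db"],  # two services sharing one database
}

# Tightly coupled components: services that read/write the same database.
shared = {}
for service, dbs in databases.items():
    for db in dbs:
        shared.setdefault(db, []).append(service)
coupled = {db: svcs for db, svcs in shared.items() if len(svcs) > 1}

# Circular dependencies: a simple depth-first search over the call graph.
def find_cycle(start, graph, path=()):
    for nxt in graph.get(start, []):
        if nxt in path or nxt == start:
            return (*path, start, nxt)
        cycle = find_cycle(nxt, graph, (*path, start))
        if cycle:
            return cycle
    return None

print("shared databases:", coupled)                 # {'orders-db': ['inventory', 'checkout']}
print("cycle via checkout:", find_cycle("checkout", calls))
```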
Your AI virtual architect
vFunction now includes an AI-powered assistant to guide you through managing your architecture every step of the way. Powered by advanced language models customized for the vFunction domain, the vFunction Assistant can understand and respond to natural language queries about your applications while incorporating real-time context.
Need to understand why certain domains are depicted a certain way on the map? Ask the assistant. Wondering about the implications of exclusivity on a class? The assistant can explain the reasoning and suggest the next steps. You can think of it as an ever-present co-architect sitting side-by-side with you.
You can query the assistant about any part of the vFunction interface and your monitored applications. Describing the intent behind a change in natural language, the assistant can point you in the right direction. No more getting lost in mountains of data and navigating between disparate views — the assistant acts as a tailored guide adapted to your specific needs.
Of course, the assistant has safeguards in place. It operates only on the context and data already accessible to you within vFunction, respecting all existing privacy, security, and access controls. Conversations are ephemeral, and you can freely send feedback to improve the assistant’s responses over time.
An elegant architectural management solution
Together, the distributed applications visualization and conversational assistant give architects and engineering teams an elegant way to manage the complexity of distributed applications. The Service Map provides a comprehensive yet intuitive picture of your distributed application at a glance, automatically surfacing areas that need attention. The assistant seamlessly augments this visualization, understanding your architectural intent and providing relevant advice in real time.
These new capabilities build on vFunction’s existing architectural analysis strengths, creating a unified solution for designing, implementing, observing, and evolving software architectures over time. By illuminating and streamlining the management of distributed architectures, vFunction empowers architects to embrace modern practices without being overwhelmed by their complexity.
Want to see vFunction in action? Request a demo today to learn how our architectural observability platform can keep your applications resilient and scalable, whatever their architecture.