Category: Featured

AMA: Ask your Monolith Anything. Introducing the query engine for monoliths.

What if you could talk to your app’s architecture the way you talk to your favorite LLM? “Show me the top 5 classes by CPU usage in domain A,” “Find all classes from package B that sneak into other domains.” That’s exactly what we’ve built: a query engine that lets you ask your monolith questions—no custom scripts, no guesswork.

vFunction’s new GenAI-powered query engine lets architects and developers run natural language prompts against the structure of their monolithic application. Just ask a question, and we’ll handle the rest: translating it to safe, validated internal queries, running it against our database, and returning results in a readable table. All you need to do is type.

Why build a query engine?

Monoliths are famously opaque, and do you really want to spend precious hours of your day trying to decode them? The information you need to understand how the system behaves, what calls what, where coupling occurs, and how methods evolve is often buried under layers of code.

Customers asked us, “Can we export what we see in the call tree?” They wanted to include it in architecture reviews, technical documentation and diagrams. Screenshots weren’t cutting it. That’s when we realized the architectural graph powering vFunction should be queryable with natural language. That got us thinking—what else could we do?

Here are some examples of queries users can run that would previously have required exporting and manually filtering a full call graph:

  1. Show me all classes used in more than four domains that aren’t common.
    → Reveals architectural coupling or candidates for shared libraries.
  2. Find all methods in the call tree under the domain ProductController that use beans.
    → Useful for mapping data access patterns, often buried in complex trees.
  3. Which domain shares the most static classes with the domain InventoryService?
    → Helps determine which domains could be merged with the current domain.

How does the query engine work?

The query engine is not just a search box. It’s a full-blown architectural Q&A powered by GenAI, tied into your application’s live architectural state.

Here’s how it works:

  1. You write a prompt like “Show me the classes using the SalesOrderRepository across domains.”
  2. We send only your prompt to the GenAI provider: no application data, no context, just the natural language text.
  3. The GenAI translates the prompt into a query tailored to vFunction's internal schema and returns it.
  4. vFunction validates and sanitizes the generated query.
  5. We run the query locally against your vFunction server's architecture data and display the results in a table or CSV format.
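To make the flow concrete, here is a minimal, purely hypothetical sketch of that pipeline in Java. None of these types or method names come from vFunction's actual product; they only illustrate the separation between prompt translation, validation, and local execution.

// Hypothetical sketch of the prompt-to-query flow; all names are illustrative, not vFunction's API
interface LlmClient { String translateToQuery(String prompt); }        // GenAI provider: sees only the prompt text
interface QueryValidator { String sanitize(String query); }            // rejects anything outside the internal schema
interface ArchitectureStore { java.util.List<String[]> execute(String query); } // local architecture data only

public final class QueryEngine {
    private final LlmClient llm;
    private final QueryValidator validator;
    private final ArchitectureStore store;

    public QueryEngine(LlmClient llm, QueryValidator validator, ArchitectureStore store) {
        this.llm = llm;
        this.validator = validator;
        this.store = store;
    }

    public java.util.List<String[]> ask(String prompt) {
        String generated = llm.translateToQuery(prompt); // 1. only the natural language prompt leaves the environment
        String safe = validator.sanitize(generated);     // 2. the returned query is validated and sanitized
        return store.execute(safe);                      // 3. the query runs locally; rows feed the results table
    }
}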

Security first

LLMs can hallucinate. We don’t let them.

vFunction never sends your application data to the GenAI provider. Only the user’s natural language prompt is shared. Nothing else. The GenAI is used strictly to translate the prompt into a query tailored for vFunction’s internal schema. At no point is your measurement data exposed outside your environment.

After generating the query, vFunction validates and sanitizes it, then runs it locally on your server. You get the benefits of natural language interfaces with complete data privacy and protection.

The result: conversational architecture analysis

With the new GenAI-powered query engine, you don’t need to dig through call trees or guess how classes relate. Just ask.

Want to explore stack traces, track class reuse across domains or filter down a call path for documentation? Open vFunction’s query engine, describe what you’re looking for, and get the answer. Even the most complex monolith is now an open book—saving you hours of effort digging through code, tracing dependencies, and assembling documentation.

Curious how vFunction helps teams tackle technical debt and turn monoliths into modular, cloud-ready apps? Explore the platform and see what architectural modernization looks like in action.

Top 10 software observability tools of 2025

In 2025, the massive wave of changes in the software landscape continues to grow. Cloud native architectures, microservices, serverless functions, and AI have created huge shifts and unprecedented opportunities, complexity, and risk. Understanding what’s happening inside these intricate systems when things go wrong, or even when they operate as expected, is harder than ever. Traditional monitoring, which relies on predefined dashboards and alerts, can tell you that a problem exists, but struggles to tell you why.

This is where software observability comes in. More than just monitoring 2.0, observability is the ability to infer the internal state and health of complex systems by analyzing the data they produce in the form of logs, metrics, and traces. In this blog, we will cover everything you need to know about software observability tools and the best ones to add to your stack. Let’s get started by digging a bit further into what observability is.

What is observability in software systems?

At its core, software observability is the ability to measure and infer the internal state of a complex system based solely on the data it produces. The term comes from control theory, where it describes understanding a system by observing its external signals. In the context of modern software, especially distributed systems like microservices running in the cloud, observability means having the tools and data to understand why something is happening, not just that it’s happening (a staple of more traditional monitoring).

Observability is more than just collecting data from within an application; it’s about implementing high-quality, contextualized telemetry that allows you to explore behavior and performance effectively. Traditionally, observability has “three pillars”:

  1. Logs: These are discrete, timestamped records of events that occurred over time. Logs provide detailed, context-rich information about specific occurrences, such as errors, warnings, application lifecycle events, or individual transaction details. They are essential in most apps for troubleshooting issues and tracking the steps that lead to a problem. 
  2. Metrics: Metrics are numerical representations of system health and performance measured over time. Think CPU utilization, memory usage, request latency, error rates, or queue depth. Metrics are usually aggregated so they can be easily assessed in dashboards, used for alerting on predefined thresholds, and understanding trends and overall system behavior.
  3. Traces: Traces track the end-to-end journey of a single request or transaction as it flows through multiple services in a distributed system. Each step in the journey (a “span”) contains timing and metadata. Traces are key for visualizing request flows, identifying bottlenecks, and understanding inter-service dependencies. They can also be very helpful in diagnosing latency issues in microservices architecture and highly complex systems with a lot of moving parts.

While these three pillars (metrics, logs, and traces) are the foundation, the ultimate goal of combining them is to give teams the visibility to ask any question about their system’s behavior, especially the “unknown unknowns” or emergent issues that teams couldn’t have predicted, and get answers quickly.

Observability vs. traditional monitoring

While related, observability and traditional monitoring serve distinct purposes in understanding software systems. Monitoring typically involves tracking predefined metrics to check system health, whereas observability enables deeper exploration to understand why systems behave the way they do, especially when encountering unexpected issues. Monitoring is often a component that feeds into a broader observability strategy.

Here’s a breakdown of the key differences:

Comparison aspect | Traditional monitoring | Software observability
Primary goal | Health/status checking; alerting on known thresholds | Deep understanding; debugging unknown and complex issues
Approach | Uses predefined metrics, dashboards, and alerts | Exploratory analysis using rich, correlated telemetry
Question focus | Answers predefined questions ("Is CPU usage high?") | Enables asking arbitrary questions ("Why is this slow?")
Problem handling | Addresses "known unknowns" (anticipated failures) | Uncovers "unknown unknowns" (unpredictable failures)
Core data types | Primarily metrics, basic logs | Correlated logs, metrics, traces (and often more)
System suitability | Effective for simpler, monolithic systems | Essential for complex, distributed systems (microservices)
Outcome | Identifies that a problem exists | Helps understand why a problem exists and its context
Nature | Often reactive (responds to threshold breaches) | Enables proactive investigation and hypothesis testing

When comparing the two, you can think of monitoring as the dashboard warning lights in your car: they tell you if something pre-determined is wrong (low oil, engine hot). On the other hand, observability goes a step further, providing a comprehensive diagnostics toolkit, enabling the identification of root causes such as a sensor failure affecting fuel mix, and insights into how different systems interact. Observability gives a deeper look at issues that traditional monitoring cannot provide.

Limitations of traditional runtime observability

Traditional observability and application performance monitoring (APM) tools are great for monitoring runtime performance – identifying latency, errors, and resource usage (the “what” and “when”) – but often fall short in explaining the deeper “why” rooted in application architecture. They don’t have visibility into the structural design, complex dependencies, and accumulated architectural technical debt that causes recurring runtime problems. They highlight the symptoms of poor architecture (slow transactions or cascading failures) rather than the underlying structural issues. This means you can’t leverage these insights to fix root causes or proactively plan for modernization.

Emerging areas in observability: beyond runtime

To fill the gaps left by runtime-focused tools, several purpose-built observability areas are emerging that complement standard observability tools, offering deeper insights into specific domains:

  • Architectural observability: Focuses on understanding the application’s static and dynamic structure, component dependencies, architectural drift, and technical debt. Tools like vFunction analyze how the application is built and identify structural issues in the business logic, guiding modernization or refactoring efforts and supporting software architecture governance.
  • Data observability: Concentrates on the health and reliability of data pipelines and data assets. Monitors data quality, freshness, schema changes, and lineage so you can trust the data used for analytics and operations.
  • API observability: Provides deep visibility into the performance, usage, and compliance of APIs, which are the communication points in modern distributed systems. Helps track API behavior, identify errors, and understand consumer interactions. Some platforms, such as Moesif, can also use the observability data for monetization and API governance.

These emerging areas complement runtime observability and give you a more complete picture of complex software systems.

Gartner® Hype Cycle for Monitoring and Observability, 2024 showcases a range of technologies along the observability spectrum.

The strategic role of OpenTelemetry

Underpinning much of the progress in both traditional and emerging observability areas is OpenTelemetry (OTel). Its rapid adoption across the industry marks a major shift toward standardized, vendor-neutral instrumentation. OTel provides a common language and set of tools (APIs, SDKs, Collector) to generate and collect logs, metrics, and traces across diverse technology stacks. By decoupling instrumentation from specific backend tools, OTel prevents vendor lock-in, keeps observability future-ready, and captures the rich telemetry data needed to power every form of observability, from runtime APM to architectural, data, and API analysis.
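As a small illustration of what vendor-neutral instrumentation looks like in practice, here is a minimal manual-instrumentation sketch using the OpenTelemetry Java API. The class, span, and attribute names are illustrative, and it assumes the OpenTelemetry SDK (or an agent/Collector pipeline) is configured separately to export the data.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CheckoutService {
    // Instrumentation scope name; any stable identifier for your module works here
    private static final Tracer tracer = GlobalOpenTelemetry.getTracer("com.example.checkout");

    public void placeOrder(String orderId) {
        Span span = tracer.spanBuilder("placeOrder").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("order.id", orderId); // context that later correlates with logs and metrics
            // ... business logic ...
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end(); // the configured exporter ships the span to your chosen backend
        }
    }
}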

Why observability matters in modern software

With traditional monitoring being around for so long, why is observability so important in modern software? As software systems evolve into complex webs of microservices, APIs, cloud infrastructure, and third-party integrations, simply knowing if something is “up” or “down” is no longer sufficient. The dynamic, distributed nature of modern applications demands deeper insights. Observability has shifted from a ‘nice-to-have’ to a necessity for building, running, and innovating in 2025 and beyond.

Here’s why observability is so important for modern development teams and stakeholders:

Taming complexity and speeding up incident resolution

Modern systems can fail in countless ways. When something goes wrong, pinpointing the root cause across dozens or hundreds of services is impossible with traditional monitoring. Observability gives you the correlated telemetry (traces, logs, metrics) to follow the path of a failing request, understand component interactions, and find the source of the problem, reducing Mean Time To Detection (MTTD) and Mean Time To Resolution (MTTR). It lets you debug the “unknown unknowns”, a.k.a. the things you couldn’t have anticipated.

Building more reliable and resilient systems

By giving you a deep understanding of how systems behave under different loads and conditions, observability helps you identify potential bottlenecks, cascading failure risks, and performance degradation before they cause user-facing outages. This allows you to target improvements and architectural changes that make the system more stable and resilient in a data-driven manner.

Boosting developer productivity and enabling faster innovation

When developers can see the impact of their code changes in production-like environments, this visibility accelerates the whole development lifecycle. Integrating observability into your workflow early (known as “shifting left”) lets engineers debug more efficiently, gain confidence in their releases, and understand performance implications of code changes. The end result is faster and safer deployments within DevOps and SRE frameworks.

Better end-user experience

It’s no surprise that system performance impacts user satisfaction. Slow load times, intermittent errors, or feature failures can drive users away. Observability lets you identify and troubleshoot issues affecting specific user cohorts or transactions, even subtle ones that won’t trigger standard alerts, for a better, more consistent customer experience.

Optimizing performance and cost

Cloud environments offer scalability and potential cost savings. However, on the flip side, they can also lead to runaway costs if not managed. Observability helps you identify inefficient resource usage, redundant service calls, or performance bottlenecks that waste compute power and inflate cloud bills. Knowing where time and resources are spent by leveraging observability tools allows you to target optimization efforts, improve efficiency, and reduce operational expenses.

Enhanced security posture

There is an increasing overlap between operational monitoring and security, with observability as a major factor in these blurring lines. Observing system behavior, network traffic, and API interactions can reveal anomalies that indicate security threats, such as unusual access patterns, data exfiltration attempts, or compromised services. This allows observability data to provide context for security investigations and detection of exploits. The best defense in terms of security is always being proactive; however, when vulnerabilities do slip through, the early detection that observability can provide is critical.

Overall, observability is part of the fabric of modern applications. Beyond simply writing and publishing code, applications are expected to be scalable, secure, and performant. Observability is a major part of ensuring that applications are able to hit these expectations.

How to choose the right software observability tool

Once the necessity of observability is clear, the next crucial step is selecting the right tools. Evaluate the key features and capabilities offered by each platform, focusing on:

Data coverage and integration

When selecting an observability tool, ensure that the tool supports the essential data types (logs, metrics, traces) needed for your specific application and use case. Assess the tool’s ability to efficiently ingest, store, and correlate data. Understand costs and data storage impact, if relevant. Check the tool’s compatibility with key technologies in your stack, including:

  • Cloud providers such as AWS, Azure, and GCP
  • Container platforms like Kubernetes
  • Programming languages and frameworks: Java, Python, Go, Node.js, .NET
  • Databases, message queues, and serverless infrastructure

Additionally, look for support for OpenTelemetry (OTel), which offers vendor-neutral instrumentation, ensuring there are no lock-ins and allowing for future flexibility.

Correlation and contextualization

Isolated data points don’t offer much value on their own; it’s the connected insights that truly matter. The standout feature for observability tools is their capability to automatically link related logs, metrics, and traces from a single user request, transaction, or system event. Attempting to manually combine this data across various systems is not only slow but is also likely to lead to mistakes. Moreover, consider how well the tool can enrich telemetry data with additional context and automate subsequent actions. Can the tool associate performance data with specific actions like code deployments, changes in feature flags, infrastructure updates, or details of user sessions? This extra layer of context is vital for effective problem-solving and debugging. 

Analysis, querying, and visualization

The freedom to explore your data is essential for effective observability. Evaluate the platform’s query capabilities—are they powerful and flexible enough to handle complex, ad-hoc queries, especially for high cardinality data? Also, the depth of visualization features should be considered. Do the dashboards provide intuitive, customizable, and effective tools for displaying complex system interactions? Key visualization features to look for include service maps showing dependencies, distributed trace views like flame graphs, and clear metric charting.

It’s also crucial to ensure the platform supports analysis and querying at your required scale, as telemetry data from modern applications often demands substantial storage and processing resources.

Beyond manual exploration, many platforms now incorporate AI and machine learning (AI/ML) features for tasks such as automated anomaly detection, noise filtering to reduce alert fatigue, root cause analysis, and even predictive insights. While AI features are becoming standard, it’s important to assess their maturity and usefulness in practice.

Ease of use and learning curve

A tool is only as effective as its usability. The platform should offer an intuitive interface tailored to the needs of developers, engineers, SREs, and operations teams. Evaluate the effort required to set up the tool, including application instrumentation (automatic vs. manual), alert configuration, and ongoing support and maintenance. Strong documentation, responsive vendor support, and a user community are equally critical—they can significantly impact both the ease of adoption and the long-term success of the tool within your organization. 

Cost and pricing model

Observability can quickly become a big operational expense. Make sure you fully understand the pricing model and available options. Is the pricing model based on data volume (ingested or stored), number of hosts or nodes monitored, active users, specific features, a combination of these, or other pricing variables for emerging solutions? Ensure that the pricing model is transparent and predictable so that you can forecast costs as you scale. Before committing, calculate the Total Cost of Ownership (TCO), including data egress fees and storage costs as your applications scale. Also be aware of any professional services or training that your team will need in order to deploy and use the tool. Your best bet is to look for vendors with flexible models that match your usage patterns and expected scale.

Specific needs and future goals

Ultimately, ensure the tool's capabilities align with your specific goals. Are you primarily focused on APM, or do you also require features like in-depth infrastructure monitoring, log aggregation and analysis, or security event correlation? Critically, do your goals go beyond runtime monitoring? Do you need to understand the underlying application architecture, identify technical debt hotspots, or gain visibility for modernization initiatives? Some tools are great at performance monitoring, while others, such as vFunction, are more specialized, focusing on deeper architectural observability and analysis.

Start by prioritizing your key requirements, then shortlist 2-3 tools for Proof of Concept (POC) testing using realistic workloads. Involve the daily users—developers, SREs, or operations teams—to gather actionable feedback. Use POC results to make an informed decision on the best tool for your needs. 

Top 10 software observability tools (2025)

Selecting the right software observability tool is complex, with each tool offering unique strengths in data gathering, analysis, and application. Here, we spotlight the top ten tools and frameworks of 2025, showcasing both established solutions and innovative newcomers that tackle the critical issues faced in contemporary software development and operations. Let’s take a look:

vFunction

We’ll kick off our list of observability tools with vFunction, an emerging solution founded in 2017. Purpose-built to manage complexity and technical debt in modern software systems, it also complements traditional APM tools by providing deep architectural insight. As business logic becomes more distributed—and harder to trace—vFunction helps teams improve system understanding to regain architectural control, reduce risk, and accelerate decision-making. Using architectural observability, vFunction continuously visualizes application architecture, detects issues like redundancies and circular dependencies, and automates the identification and resolution of complex service interactions across both monoliths and microservices.

Key observability features:

  1. vFunction provides real-time architectural insight across both monoliths and distributed services—using patented static and dynamic analysis to uncover hidden dependencies, dead code, and complex flows in Java and .NET monoliths, and leveraging runtime observation with OpenTelemetry to automatically visualize service interactions in distributed environments (supporting Node.js, Python, Go, and more).
  2. Intelligent analysis is applied across the application architecture to address surface architectural concerns such as service redundancies, circular dependencies, tight coupling, overly complex flows, performance anomalies, and API policy violations, enabling teams to act before issues escalate.
  3. Correlation of OpenTelemetry traces with AI-driven architectural analysis to uncover structural bottlenecks contributing to performance issues, going beyond traditional APM trace visualization.
  4. Architectural insights from the vFunction analysis produce a list of ‘To Do’ development tasks that automatically feed generative AI code assistants like Copilot and Amazon Q, which generate the actual code fixes for the architectural issues identified by vFunction.

vFunction’s observability features are best suited for engineering teams needing visibility and control over complex distributed applications or those modernizing monolithic systems. It helps teams who want to go beyond surface-level service maps to understand the why behind architectural complexity, identify hidden dependencies impacting performance and resilience, and proactively manage architectural drift and technical debt.

Image courtesy of Datadog

Datadog

Datadog, founded in 2010, is a leading SaaS observability platform designed for cloud-scale applications and infrastructure. It is known for providing a unified view across diverse monitoring domains, consolidating data from hundreds of integrations into a single interface. Datadog helps teams monitor infrastructure, applications, logs, security, and user experience in complex, dynamic environments.

Key observability features:

  1. A unified platform integrates infrastructure, APM, logs, real user monitoring (RUM), synthetics, security, and network monitoring.
  2. Extensive library of pre-built integrations for cloud providers, services, and technologies.
  3. Real-time dashboards, alerting, and collaboration features enable teams to track key metrics like request latency, CPU utilization, and error rates.

Datadog’s observability features are best suited for DevOps teams, SREs, and developers who need comprehensive, end-to-end visibility across their entire cloud or hybrid stack within a single platform. Datadog excels at correlating data from different sources to provide context during troubleshooting.

Image courtesy of New Relic

New Relic

New Relic, established in 2008, pioneered the APM market as a SaaS solution. New Relic is known for its deep insights into application performance and has expanded into a full-stack observability platform. New Relic helps engineering teams understand application behavior, troubleshoot issues, and optimize performance throughout the software lifecycle.

Key observability features:

  1. Deep code-level APM diagnostics and distributed tracing.
  2. Full-stack monitoring including infrastructure, logs, network, and serverless functions.
  3. Digital experience monitoring (Real User Monitoring (RUM), and Synthetics).

New Relic’s observability features are best suited for teams prioritizing application performance, reliability, and understanding the root cause of issues within the application code itself. It provides developers and SREs with the detailed data needed to optimize complex applications.

Image courtesy of Grafana

Grafana & Prometheus

Prometheus (started 2012) is a Cloud Native Computing Foundation (CNCF) open-source project focused on time-series metric collection and alerting, while Grafana is the leading open-source platform for visualization and analytics, often used together. They are known as a de facto standard for metrics monitoring and dashboarding, especially in Kubernetes and cloud-native ecosystems.

Key observability features:

  1. Prometheus: Efficient time-series database, powerful PromQL query language, service discovery, and alerting via Alertmanager.
  2. Grafana: Highly customizable dashboards, support for numerous data sources (including Prometheus, Loki, Tempo), extensive plugin ecosystem.
  3. Often combined with Loki for logs and Tempo for traces to build a full open-source observability stack (PLG/LGTM stack).

Grafana & Prometheus observability features are best suited for teams seeking powerful, flexible, and often self-managed open-source solutions for monitoring and visualization. They excel in metrics-driven monitoring and alerting, providing deep customization for technical teams managing cloud-native environments.

Image courtesy of Elastic

Elastic Observability (ELK Stack)

Elastic Observability evolved from the widely adopted ELK Stack (Elasticsearch, Logstash, Kibana), initially known for its powerful open-source log aggregation and search capabilities. Elastic Observability now integrates metrics, APM, and security analytics (SIEM) into a unified platform, available both as a self-managed infrastructure and via Elastic Cloud.

Key observability features:

  1. Robust log aggregation, storage, search, and analysis powered by Elasticsearch.
  2. Integrated APM with distributed tracing and service maps.
  3. Infrastructure and metrics monitoring using Elastic Agent (integrating capabilities previously in Beats).

Elastic Observability’s features are best suited for teams requiring strong log analytics as a core capability, often starting with logging use cases and expanding into APM and infrastructure monitoring. Elastic Observability is valuable for operations, security, and development teams needing integrated insights across logs, metrics, and traces.

Image courtesy of Splunk

Splunk

Splunk, founded in 2003, is a market leader in analyzing machine-generated data, renowned for its powerful log management and SIEM capabilities. It has extended its platform into the Splunk Observability Cloud, integrating APM and infrastructure monitoring with its core data analysis strengths.

Key observability features:

  1. Industry-leading log data indexing, searching using Splunk Search Processing Language (SPL), and analysis capabilities.
  2. Full-fidelity APM with NoSample tracing.
  3. Real-time infrastructure monitoring, RUM, and synthetic monitoring.

Splunk’s observability features are best suited for organizations, often large enterprises, that need powerful data investigation capabilities across IT operations and security. Teams benefit from its ability to correlate observability data (metrics, traces, logs) with deep log insights and security events.

Image courtesy of Dynatrace

Dynatrace

Dynatrace, with origins in 2005, provides a highly automated, AI-powered observability and security platform. It is known for its OneAgent technology for automatic full-stack instrumentation and its Davis AI engine for automated root cause analysis and anomaly detection across complex enterprise environments.

Key observability features:

  1. Automated discovery, instrumentation, and topology mapping via OneAgent.
  2. AI-driven analysis (Davis) for automatic root cause detection, anomaly detection, and predictive insights.
  3. Full-stack visibility including infrastructure components, applications, logs, user experience (RUM/Synthetics), and application security.

Dynatrace’s observability features are best suited for medium-to-large enterprises seeking a high degree of automation and AI-driven insights to manage complex hybrid or multi-cloud environments. It reduces manual effort in configuration and troubleshooting for IT Ops, SRE, and DevOps teams.

AppDynamics image courtesy of the Splunk community

AppDynamics

AppDynamics, founded in 2008 and now part of Cisco, is a leading APM platform, particularly known for its ability to connect application performance to business outcomes. It helps organizations monitor critical applications and understand the business impact of performance issues.

Key observability features:

  1. Deep APM with code-level visibility.
  2. Business transaction monitoring, mapping user journeys, and critical workflows.
  3. Correlation of IT performance metrics with business KPIs (Business IQ).

AppDynamics’ observability features are best suited for enterprises where understanding the direct link between application performance and key business metrics (like revenue or conversion rates) is crucial. It’s ideal for application owners, IT and business analysts focused on business-critical systems.

vFunction using OpenTelemetry tracing data to inform real-time sequence flows of applications

OpenTelemetry

OpenTelemetry (OTel) is not a vendor platform but an open-source observability framework stewarded by the CNCF, created from the merger of OpenTracing and OpenCensus around 2019. It is known for standardizing the way applications and infrastructure are instrumented to produce telemetry data (logs, metrics, traces).

Key observability features:

  1. Vendor-neutral APIs and SDKs for code instrumentation across multiple languages.
  2. OpenTelemetry Collector for receiving, processing, and exporting telemetry data to various backends.
  3. Standardized semantic conventions for telemetry data, ensuring consistency.

OpenTelemetry is best suited for any organization building or operating modern software that wants to avoid vendor lock-in for instrumentation. It empowers developers and platform teams to instrument once and send data to their choice of observability backends, ensuring portability and flexibility.

Image courtesy of AWS fundamentals

AWS CloudWatch

AWS CloudWatch is the native monitoring and observability service integrated within Amazon Web Services, evolving significantly since its initial launch in 2009. It is known for providing seamless monitoring for resources and applications running on the AWS platform.

Key observability features:

  1. Automatic collection of metrics and logs from dozens of AWS services.
  2. Customizable dashboards and alarms based on metrics or log patterns.
  3. Integration with AWS X-Ray for distributed tracing within the AWS ecosystem.

AWS CloudWatch’s observability features are best suited for teams whose operations are primarily in the AWS cloud. It offers convenient, built-in monitoring for AWS services, making it ideal for administrators and developers managing AWS infrastructure and applications.

Conclusion

As we advance into 2025, software becomes even more complex, especially with the widespread use of microservices and cloud-native approaches. Now, having a clear, full view of your systems is crucial. But achieving this insight goes beyond simple monitoring—it’s about observability.

Observability allows us to see not just what’s happening in our systems but also why issues like slow performance or errors arise. It sheds light on the hidden issues like bottlenecks and technical debt that can compromise system efficiency and growth. Combining insights from both the operational side and the architectural perspective helps teams identify and tackle root causes rather than just patching up symptoms.

vFunction empowers teams to go beyond runtime monitoring by providing deep architectural insights. With patented tools that identify hidden dependencies, structural bottlenecks, and technical debt, vFunction enables you to fix root causes, not just symptoms. Simplify modernization, boost resilience, and scale with confidence. Ready to take your observability to the next level? Discover vFunction today!

Rethinking architecture in the age of AI: Findings from our latest research report

The rise of AI-driven code development is fundamentally reshaping the demands on software architecture. As AI accelerates the creation of new features, services, and applications, it also accelerates the growth of application complexity, often without a system-wide view. Without strong architectural oversight, AI-generated code can lead to service duplication, unwanted dependencies, and microservices sprawl.

Architecture has a critical role to play in keeping systems resilient, scalable, and secure as they evolve. But today, architecture is often poorly documented, disconnected from day-to-day development, and left to drift. That gap isn’t just a technical problem; it’s a business risk, leading to project delays, security vulnerabilities, and performance challenges that organizations can’t afford to ignore.

That’s why we conducted a new research study with over 600 senior technology leaders, including architects, engineering leaders, and CTOs across the U.S. and U.K. To understand where architecture stands today, and where it must evolve to meet the demands of the AI era, we surveyed experts from organizations ranging from $100M in revenue to enterprises exceeding $10B.

Critical disconnect

Survey results expose a critical disconnect between architectural intent and implementation reality.

The findings are striking:

  • 93% of organizations report negative business outcomes tied to architectural misalignment.
  • Approximately 50% cite misalignment as a cause of project delays, security and compliance risks, and unexpected operational costs.
  • Only 43% say their architecture documentation fully reflects production reality.
  • 90% agree that architectural insights should be integrated into observability tools to address application issues before they become outages.

While architecture is recognized as essential, many organizations still struggle to keep it aligned with fast-changing production environments. Smaller companies ($100M–$999M) maintain better alignment compared to enterprises above $1B, suggesting that architectural control decreases as organizations and complexity scale.

Why acting now matters

AI isn’t slowing down and neither is the pressure to deliver faster, more complex systems. Waiting to integrate architectural practices risks locking in technical debt and instability that become harder to unravel later. Without action, organizations risk scaling complexity instead of innovation.

This report explores how new technologies like OpenTelemetry, AI, and architectural observability are reshaping how we build and govern modern systems, and why organizations must rethink architecture not just as a design artifact, but as a continuously managed process embedded in the SDLC and security practices of the organization, supported by real-time observability tools.

I invite you to dive into the full research report to see where the gaps are, why they matter, and how organizations can close them.

Java Architecture: Components with Examples

Overview of Java architecture and examples

Java, introduced by Sun Microsystems in 1995, remains a dominant programming language in the tech industry. Its enduring popularity is attributed to the robust architectural framework that facilitates the development of scalable and maintainable enterprise applications. The core components of Java’s architecture – JVM, JRE, and JDK – establish a foundation for platform independence, while application architecture patterns and design principles enhance Java’s effectiveness in enterprise settings.


Java appears consistently at the top of RedMonk’s programming language rankings. Source: RedMonk Language Rankings 2024

Modern enterprise Java applications have evolved to incorporate distributed systems, microservices, and cloud-native architectures, replacing traditional monolithic structures with more intricate designs. The shift towards complexity requires a comprehensive understanding of both the foundational Java platform architecture and the evolving application architecture for successful modernization, refactoring, and new development projects.

In this blog series, we delve into Java architecture at both the platform and application levels, exploring key concepts of this object-oriented programming language and its architecture. By examining the high-level principles within Java architecture, readers will gain valuable insights for navigating the complexities of modern enterprise software development.

What is Java architecture?

The architecture of a Java application can encompass both the underlying Java platform components that execute Java code and the higher-level application design patterns used to structure Java applications. Understanding these two facets of Java architecture helps to explain the unique advantages Java brings to enterprise software development.

Platform architecture

At its core, Java’s platform architecture consists of several interconnected components that enable its “Write Once, Run Anywhere” (WORA) philosophy:

  1. Java Virtual Machine (JVM): The runtime engine that executes Java bytecode, providing platform independence and memory management
  2. Java Runtime Environment (JRE): Contains the JVM and standard libraries needed to run Java applications
  3. Java Development Kit (JDK): Includes development tools along with the JRE for creating Java applications

Using this platform architecture allows applications built with Java to have a clean separation between the application itself and the underlying hardware/operating system. Because of this, Java was one of the first languages and platforms to enable scalable cross-platform compatibility, allowing apps to run anywhere while being performant and secure.

Application architecture

Building on the foundation of the underlying platform architecture, Java application architecture refers to the organization of components, classes, and modules within Java applications. Although Java applications can use a wide variety of software design patterns, common Java application architecture patterns include:

  1. Layered architecture: Organizing code into horizontal layers (presentation, business logic, data access)
  2. Model-View-Controller (MVC): Separating application concerns into data models, user interface views, and controller logic
  3. Microservices architecture: Decomposing applications into loosely coupled, independently deployable services
  4. Event-driven architecture: Building systems around the production, detection, and consumption of events
  5. Domain-Driven Design (DDD): Structuring code to reflect the business domain

Java’s unparalleled flexibility has solidified its position as the preferred language for enterprise applications. Numerous Java frameworks facilitate the implementation of best practices derived from common design patterns. In upcoming discussions, we will delve deeper into how Java embeds structure and efficiency into the development process.
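For instance, the layered pattern in its simplest form separates a controller, a service, and a repository into classes that only talk to the layer directly below them. A minimal sketch (class names are illustrative):

// Layered architecture in miniature: presentation -> business -> data access
class Product {                                       // domain object shared across layers
    final String id;
    final String name;
    Product(String id, String name) { this.id = id; this.name = name; }
}

interface ProductRepository {                         // data access layer
    Product findById(String id);
}

class ProductService {                                // business layer
    private final ProductRepository repository;
    ProductService(ProductRepository repository) { this.repository = repository; }
    Product getProduct(String id) { return repository.findById(id); }
}

class ProductController {                             // presentation layer
    private final ProductService service;
    ProductController(ProductService service) { this.service = service; }
    String show(String id) { return service.getProduct(id).name; }
}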

Architectural principles in Java

Regardless of the specific software design pattern chosen, several core architectural principles guide Java application design. Developers generally adhere to well-established best practices, such as:

  1. Modularity: Breaking down applications into cohesive, loosely coupled modules
  2. Separation of concerns: Isolating distinct aspects of the application
  3. Dependency injection: Providing dependencies externally rather than creating them internally
  4. Interface-based programming: Programming to interfaces rather than implementations
  5. Testability: Designing components that can be easily tested in isolation

These principles combined with Java’s platform architecture are Java’s secret recipe (or maybe not so secret!) for building and deploying enterprise software applications that are maintainable, extensible, and scalable. 
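Two of these principles, interface-based programming and dependency injection, are easiest to see side by side: a class depends only on an interface, and the concrete implementation is supplied from outside, which also keeps the class testable. A small sketch with illustrative names:

// Programming to an interface with constructor-based dependency injection
interface PaymentGateway {
    boolean charge(String accountId, long amountCents);
}

class LivePaymentGateway implements PaymentGateway {      // hypothetical production implementation
    public boolean charge(String accountId, long amountCents) {
        // call out to a real payment provider here
        return true;
    }
}

class FakePaymentGateway implements PaymentGateway {      // test double injected in unit tests
    public boolean charge(String accountId, long amountCents) { return true; }
}

class BillingService {
    private final PaymentGateway gateway;                 // depends on the abstraction, not a concrete class
    BillingService(PaymentGateway gateway) { this.gateway = gateway; }
    boolean bill(String accountId, long amountCents) { return gateway.charge(accountId, amountCents); }
}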

Java architecture components

As discussed, Java architecture can be understood at two levels: the platform components that execute Java code and the application components that structure Java applications. Within these two levels, various components exist that create the overall architecture. Let’s take a deeper look at each of these components.

Platform architecture components

Although many subcomponents exist within the Java platform, they can generally be grouped under three high-level categories we touched on earlier: the JDK, JRE, and JVM. 

Java Development Kit (JDK)

The JDK provides the tools needed for developing Java applications, including:

  1. Java Compiler (javac): Converts Java source code into bytecode
  2. Development tools: Including javadoc (documentation generator), jar (archiving tool), and debugging tools
  3. Java Runtime Environment (JRE): For executing Java applications

Java Runtime Environment (JRE)

The JRE provides the runtime environment for executing Java applications, including:

  1. Java Virtual Machine (JVM): The execution engine
  2. Java class libraries: Standard libraries for common functionality
  3. Integration libraries: For database connectivity, XML processing, etc.

Java Virtual Machine (JVM)

The JVM, the cornerstone that makes Java platform independent, includes:

  1. Class loader subsystem: Loads, links, and initializes Java classes
  2. Runtime data areas: Memory areas for execution (heap, stack, method area)
  3. Execution engine: Interprets and compiles bytecode to machine code
  4. Garbage collector: Automatically manages memory

It also supports the Java Native Interface (JNI), which allows Java code to interact with native applications and libraries written in other programming languages like C or C++. These integrations are often achieved using native methods, which are declared in Java but implemented in non-Java code via JNI. To make it a bit easier to comprehend, Oracle created this great visual breakdown of how each component and subcomponent exists within the platform.


Source: https://www.oracle.com/java/technologies/platform-glance.html
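As a quick illustration of the JNI support mentioned above, a native method is declared in Java but implemented in C or C++ and loaded at class-initialization time. The library name below is hypothetical:

// Minimal JNI sketch; "sysclock" stands in for a native library (libsysclock.so / sysclock.dll)
public class SystemClock {
    static {
        System.loadLibrary("sysclock");   // binds the native implementation when the class loads
    }

    public static native long uptimeNanos(); // no Java body; implemented in native code via JNI
}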

While platform-level architecture is crucial for understanding Java’s framework components, the application architecture is paramount when it comes to scalability. Developers and architects wield direct control at this level, making it essential to grasp for creating scalable Java applications.

Application architecture components

Modern Java applications typically consist of a layered approach to architectural components. Although this may vary slightly depending on the framework used or the design patterns being implemented, many use these paradigms as the building blocks. Looking at the image below, you can see how these layers tend to interact with each other.


Source: https://docs.oracle.com/cd/E76310_01/pdf/141/html/operations_guide/reim-og-architecture.htm

Digging in a bit further, you’ll see three distinct layers that developers have direct control over: the presentation, business, and data-access layers.

Presentation layer

This layer handles user interaction and generally consists of:

  1. Controllers: Process user input and coordinate responses
  2. Views/UI components: Display information to users
  3. Data transfer objects (DTOs): Carry data between layers

Business layer

This layer contains the core business logic of the application, consisting of:

  1. Service classes: Implement business operations and workflows
  2. Domain objects: Represent business entities and their behavior
  3. Business rules: Encapsulate company policies and regulations

Data access layer

Lastly, at the lowest level in our hierarchy, the data access layer manages data persistence and retrieval, including:

  1. Repositories: Provide methods for database operations
  2. Data access objects (DAOs): Encapsulate data access logic
  3. Object-relational mapping (ORM): Maps between objects and relational databases

Cross-cutting concerns

Of course, shared between these layers are various cross-cutting concerns that need to be thought of holistically. Within the code and overall application architecture, aspects that span multiple layers include:

  1. Security: Make sure that authentication, authorization, and encryption are handled and applied where needed through these layers. Generally, these mechanisms are applied at multiple or all layers throughout the application.
  2. Logging: Ensure that all application activities and decisions are logged for easier debugging and auditability.
  3. Error handling: The application should effectively manage and report exceptions, in conjunction with the previously mentioned point on logging. 
  4. Transaction management: Data consistency is dependent on how transactions are handled throughout the application. Although most critical at the data-access layer, the other layers must also make sure that data is synchronized to minimize any risk of discrepancy.
  5. Caching: Each layer may benefit from improved performance by storing frequently used data within a cache. This can help with API requests, database response times, and many other areas where having caching in place can make the application more performant.
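As a tiny illustration of the caching concern, a map-based wrapper around a slow lookup is often enough within a single JVM. This is only a sketch with illustrative names; production systems usually reach for a dedicated cache library or a distributed cache instead.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Simple in-process cache: the loader is only invoked on a cache miss
class CachingLookup<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CachingLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        return cache.computeIfAbsent(key, loader); // thread-safe load-once semantics per key
    }
}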

Java execution process

Having discussed the architectural components, it is now important to understand how Java applications are built and run. Understanding the Java execution process is essential for optimizing application performance and troubleshooting issues. Because there are multiple steps involved, things can get a little confusing for the uninitiated, so at a high level we will break it into compilation, class loading, and execution.

Compilation process

The Java compilation process converts human-readable source code written by developers into machine-executable instructions. Overall, this consists of three steps:

  1. Source code writing: Developers create .java files containing Java code
  2. Compilation: The javac compiler converts source code to bytecode which is output as .class files
  3. Packaging: Then, the related class files are typically bundled into WAR (Web Application Archive), EAR (Enterprise Archive), or JAR files

To demonstrate what this looks like, let’s look at the code for a very simple “Hello World” application:

// Example HelloWorld.java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, Java Architecture!");
    }
}

Next, we would open a terminal pointed to the directory of our source code file and run:

javac HelloWorld.java

This would compile our code. During compilation, the compiler performs:

  • Syntax checking
  • Type checking
  • Optimization of the code

With the code compiled, we would then run by executing the java command in the same terminal, using:

java HelloWorld
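If both commands succeed, the program prints "Hello, Java Architecture!" to the terminal.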

Class loading

When a Java application runs, classes are loaded into memory through a relatively sophisticated loading mechanism. First, the ClassLoader reads .class files and creates binary data representations. It then moves on to linking where it performs:

  • Verification: Ensures bytecode follows proper format and security constraints
  • Preparation: Allocates memory for static fields and initializes with default values
  • Resolution: Replaces symbolic references with direct references

Lastly, things move to the initialization phase, where the JVM executes static initializers and assigns static fields their declared initial values.

Within this process, Java employs three main class loaders:

  • Bootstrap ClassLoader: Loads core Java API classes
  • Extension ClassLoader: Loads classes from extension directories (replaced by the Platform ClassLoader in Java 9 and later)
  • Application ClassLoader: Loads classes from the application classpath

Once the classes are loaded, the next step is for the JVM to actually execute that application.
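A two-line program makes this hierarchy visible. The exact class-loader names in the output vary by JVM version, but core classes report the bootstrap loader (shown as null) while application classes report the application class loader:

// Inspecting which loader is responsible for a class
public class LoaderDemo {
    public static void main(String[] args) {
        // Core API classes come from the bootstrap loader, which is represented as null
        System.out.println("String loaded by: " + String.class.getClassLoader());
        // Classes on the application classpath come from the application (system) class loader
        System.out.println("LoaderDemo loaded by: " + LoaderDemo.class.getClassLoader());
    }
}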

Runtime execution

The JVM executes the application utilizing a few different mechanisms. Initially, the JVM interprets bytecode instructions one by one and uses JIT (Just-in-time) compilation for frequently executed code, compiling it into native machine code.

While the application is running, there is also automated garbage collection at work, allowing automatic memory management mechanisms to reclaim unused objects. On top of this, thread management is also in play, allowing the JVM to handle concurrent execution through thread scheduling.

Just-In-Time (JIT) vs. Ahead-Of-Time (AOT) compilation in Java

When a Java application runs, the JVM doesn’t hand bytecode to the hardware as-is. Instead, it uses compilation strategies that convert bytecode into native code at the optimal time—either during execution or in advance. These strategies are called Just-In-Time (JIT) and Ahead-of-Time (AOT) compilation.

Just-In-Time (JIT) compilation

By default, Java uses JIT compilation, where bytecode is compiled into native machine code during runtime. The JVM starts off interpreting the code, but as it detects “hot” (frequently executed) methods, it compiles those into optimized native code on the fly using the JIT compiler. This allows the JVM to apply runtime optimizations based on actual program behavior.

Pros:
  • Adaptive optimization based on real usage (e.g., method inlining, loop unrolling)
  • No separate ahead-of-time build step
  • Works well for long-running applications where performance improves over time

Cons:
  • May introduce small runtime pauses during compilation
  • Slower warm-up performance, especially for serverless or short-lived applications

Ahead-of-time (AOT) compilation

Introduced in Java 9 and extended in later versions (e.g., via GraalVM), AOT compilation allows you to compile Java bytecode into native binaries before runtime. This is especially useful in cloud native and microservices environments where fast startup and low memory overhead are critical.

Pros:
  • Much faster startup time—ideal for CLI tools, serverless functions, and microservices
  • Predictable memory usage and reduced warm-up overhead
  • Smaller runtime footprint in some cases

Cons:
  • Fewer runtime optimizations compared to JIT
  • Larger binary sizes (depending on the app and runtime)
  • More complex build pipeline (e.g., native image generation with GraalVM)

When to use what?

For most traditional, long-lived Java applications (like backend services), JIT is the default and works great. But for modern deployment models, like containerized apps, cold-start-sensitive APIs, or serverless functions, AOT is worth considering for reducing latency and memory usage.
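As a rough example, assuming GraalVM and its native-image tool are installed, the HelloWorld class compiled earlier could be turned into a standalone executable ahead of time:

native-image -cp . HelloWorld

The resulting binary (typically named helloworld by default) starts without JVM warm-up, which is exactly the trade-off described above: faster startup in exchange for fewer runtime optimizations.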

Additional steps in the application startup flow

For enterprise Java applications, the startup process includes additional steps:

  1. Container initialization: For applications running in application servers or containers
  2. Configuration loading: Reading properties files, environment variables, and other configuration
  3. Dependency injection: Wiring application components together
  4. Database connection: Establishing connections to databases
  5. Service initialization: Starting various application services

Understanding the execution process helps developers optimize their applications, diagnose performance issues, and enables both developers and architects to make informed architectural decisions.

Memory management in Java

Memory management is one of Java’s defining strengths, making it easier for developers to focus on building features rather than worrying about manual allocation and deallocation. But even though Java automates most of the work through garbage collection (GC), understanding how memory is structured and managed under the hood is critical for building scalable, high-performance applications.

JVM memory structure

The Java Virtual Machine (JVM) divides memory into multiple regions, each with a distinct role in how Java applications are executed.

The heap is the main area where objects are created. It’s divided into:

  • Young generation: Where most objects start their life. It consists of:
    • Eden space: New objects are created here.
    • Survivor spaces (S0/S1): Objects that survive a few GC cycles are moved here temporarily.
  • Old generation (tenured): Objects that live long enough in the young generation are promoted here. These tend to be core application-level objects like caches or services.

Outside the heap, the JVM manages:

  • Metaspace: Holds class metadata, such as method definitions and bytecode. This replaced the older PermGen space in Java 8+.
  • Thread stacks: Each thread has its own stack, which contains method frames, local variables, and call information.
  • Code cache: Stores native machine code compiled from bytecode by the JIT compiler for improved performance.

Garbage collection explained

Garbage collection in Java works by automatically detecting and reclaiming memory used by unreachable objects—those with no active references in the application.

The process usually follows three steps:

  1. Mark: GC identifies which objects are still accessible by tracing from GC roots like thread stacks and static references.
  2. Sweep: Unreachable objects are removed, freeing up memory.
  3. Compact (in some collectors): The heap may be defragmented to consolidate remaining objects and free space.

Java supports several garbage collection algorithms tailored to different needs:

  • Serial GC – Simple and suitable for small applications.
  • Parallel GC – Uses multiple threads to speed up collection; good for throughput.
  • G1 GC – Breaks the heap into regions and collects them incrementally to reduce pause times.
  • ZGC and Shenandoah – Advanced collectors that aim for ultra-low pause times, even on massive heaps.

You can specify the GC algorithm via JVM flags (e.g., -XX:+UseG1GC).
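For example, a service could be started with an explicit heap range and the G1 collector; app.jar is just a placeholder for your application:

java -Xms512m -Xmx4g -XX:+UseG1GC -jar app.jar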

What about manual garbage collection?

Although Java provides System.gc() to suggest a garbage collection cycle, it’s rarely needed—and generally discouraged. Modern collectors are highly optimized, and manually forcing GC often does more harm than good (e.g., performance pauses, CPU spikes).

If your application relies on System.gc() to remain stable, it’s usually a red flag indicating deeper problems, such as memory leaks, unbounded caches, or excessive object churn.

While Java abstracts memory management away from the developer, a deep understanding of how memory is structured and reclaimed remains critical, especially for teams working on large, data-intensive, or low-latency systems.

By knowing how memory is allocated and how GC behaves, developers can design applications that are not only functional, but also performant, scalable, and resource-efficient under pressure.

Java security and performance considerations

Security and performance are the twin pillars on which any enterprise-grade Java application is built. Java offers built-in security mechanisms like bytecode verification and class loading isolation at the platform level. However, the real responsibility for security and performance lies with how developers structure and implement their applications.

Security in Java starts with the basics: protect user data, enforce access control, and validate every point of input. One of the most common mistakes developers make is embedding user input directly into SQL queries. That’s a recipe for disaster. SQL injection attacks are just one of the many risks you’ll face if you do that. Here’s an example of what a vulnerable statement would look like:

String sql = "SELECT * FROM users WHERE username = '" + username + "'";

Statement stmt = connection.createStatement();

ResultSet rs = stmt.executeQuery(sql);

On the flip side, here’s one way to handle this safely. A PreparedStatement treats the user input as a bound parameter rather than as part of the SQL text, which closes the door on this kind of injection attack.

String sql = "SELECT * FROM users WHERE username = ?";

PreparedStatement pstmt = connection.prepareStatement(sql);

pstmt.setString(1, username);

That’s just one of the many ways to handle input safely. Beyond that, secure applications use encryption libraries, validate JWTs for authentication, follow the principle of least privilege when interacting with files, networks, or databases, and integrate logging and monitoring early on to detect unauthorized access or unusual behavior across services.
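
Parameterized queries pair well with input validation at the edge of the application. Here is a minimal, hypothetical sketch of allow-list validation for a username field; the pattern and length limits are illustrative, not a standard:

import java.util.regex.Pattern;

public class UsernameValidator {
    // Allow-list: letters, digits, and underscores, 3 to 32 characters (illustrative policy)
    private static final Pattern USERNAME = Pattern.compile("^[A-Za-z0-9_]{3,32}$");

    public static String requireValidUsername(String input) {
        if (input == null || !USERNAME.matcher(input).matches()) {
            throw new IllegalArgumentException("Invalid username");
        }
        return input;
    }
}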

Performance is where Java really shines. If you know how to read the signs, you can tune the JVM for low-latency or high-throughput workloads. Modern garbage collectors like G1 or ZGC can help minimize pause times when you know how to use them. However, most performance wins come from the application layer.

Take connection management, for example. Opening a new database connection on every request is expensive. In practice, code that does this generally looks like this:

try (Connection conn = DriverManager.getConnection(...)) {
    // do work
}

The better approach is connection pooling, where a small set of long-lived connections is reused across requests instead of a new one being spun up every time. Here’s roughly what that may look like with a pooling library such as HikariCP:

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost/appdb"); // placeholder URL
DataSource ds = new HikariDataSource(config);
Connection conn = ds.getConnection(); // borrows a connection from the pool

Observability plays a critical role in verifying that an application is performing as expected. Tools like JVisualVM, JFR (Java Flight Recorder), or distributed tracing frameworks let you see how your code behaves under pressure. You can then look for clues that performance is taking a hit, answering questions like: Are memory spikes happening after a specific API call? Are threads getting blocked unnecessarily? Seeing these metrics in a dashboard makes optimization easier and helps stop poorly performing components from reaching production.
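
As one concrete example of that kind of visibility, JFR (available in the JDK since Java 11) lets you emit custom application events that show up in a recording alongside GC and thread data. The sketch below uses the standard jdk.jfr API; the event and class names are made up for illustration, and you would typically start a recording with a flag such as -XX:StartFlightRecording=filename=recording.jfr:

import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("demo.OrderProcessed")   // hypothetical event name
@Label("Order Processed")
class OrderProcessedEvent extends Event {
    @Label("Order Id")
    String orderId;
}

public class OrderService {
    void processOrder(String orderId) {
        OrderProcessedEvent event = new OrderProcessedEvent();
        event.orderId = orderId;
        event.begin();   // start timing this unit of work
        // ... the actual business logic goes here ...
        event.commit();  // record duration and attributes into the flight recording
    }
}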

Building secure, high-performance Java applications is not just about checking boxes. It’s about being aware of the risks, using the right tools, and paying attention to the details. When you do that well, Java becomes a platform you can scale with confidence.

Real-world applications of Java

Java has been proven in many real-world scenarios, especially in large-scale enterprise applications. Its platform independence, mature ecosystem, and performance have made it the language of choice in industries that require reliability, scalability, and maintainability.

In financial services, Java is used to build trading platforms, portfolio management tools, and real-time analytics systems. The language’s focus on performance, memory safety, and concurrency support makes it perfect for applications that need speed and accuracy, like algorithmic trading and risk assessment engines.

E-commerce platforms use Java to manage high traffic, secure transactions, and complex product catalogs. Its ability to support modular application structures—combined with frameworks like Spring—makes it a good choice for teams that want to build scalable backend services that can evolve over time.

For enterprise resource planning (ERP) and business process management (BPM) systems, Java’s modularity and support for multi-layered architecture allow businesses to integrate different functions like HR, finance, and supply chain into one platform. Java-based platforms have been widely adopted in these domains because of their extensibility and long-term support.

When it comes to mobile development, Java is the primary language used to build Android applications. While Kotlin is now the official language for Android, Java is still used in existing apps and is fully supported by the Android software development kit (SDK), so it’s part of the Android ecosystem.

In big data and analytics, Java powers many of the foundational technologies used for distributed processing. Frameworks like Apache Hadoop, Apache Kafka, and Apache Spark either support or are written in Java, so it’s the natural choice for building scalable data pipelines and processing engines.

Finally, cloud-native applications use Java frameworks like Spring Boot to build microservices that can be deployed using container orchestration platforms like Kubernetes. Java’s ecosystem has evolved to support cloud requirements like observability, fault tolerance, and seamless CI/CD integration.

Whether in banking, retail, logistics, or analytics, Java is the backbone of applications that need to scale, stay secure, and be maintainable over time. Its ecosystem of tools and frameworks and large community of developers ensures it remains relevant in the ever-changing technology landscape.

Java architecture examples

Java architecture becomes more meaningful when applied in practice. Let’s look at a few examples that show how different architectural patterns are implemented using Java in enterprise applications.

A classic example of Java’s layered and MVC architecture is a standard web application built with the Spring Framework. In this setup, the presentation layer is Spring MVC controllers that handle HTTP requests and route them to the appropriate service methods. The business logic is in service classes that encapsulate workflows and orchestrate actions between layers. The data access layer is typically Spring Data JPA (Java Persistence API) and provides a clean and abstracted interface to the database. Applications like internal HR portals or CRM systems use this model because of the clear separation of concerns and maintainability.
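
As a rough sketch of what those layers can look like in code (class names here are illustrative, the entity is stripped down, and the jakarta.persistence imports assume Spring Boot 3; older versions use javax.persistence):

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@Entity
class Product {
    @Id
    Long id;
    String name;
}

// Data access layer: Spring Data JPA generates the implementation at runtime
interface ProductRepository extends JpaRepository<Product, Long> {}

// Business logic layer: encapsulates the workflow
@Service
class ProductService {
    private final ProductRepository repository;

    ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    Product findProduct(Long id) {
        return repository.findById(id).orElseThrow();
    }
}

// Presentation layer: handles HTTP requests and delegates to the service
@RestController
class ProductController {
    private final ProductService service;

    ProductController(ProductService service) {
        this.service = service;
    }

    @GetMapping("/products/{id}")
    Product getProduct(@PathVariable Long id) {
        return service.findProduct(id);
    }
}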

In more complex environments, Java is often the backbone of microservices-based architectures. For example, an e-commerce platform might be decomposed into independent services like Product, Order, and Payment. Each microservice is its own Java application built using Spring Boot and communicates with others via REST APIs or messaging systems like Kafka. These services are deployed in containers (e.g., Docker) and orchestrated using Kubernetes for horizontal scalability and resilience. Companies like Netflix and Amazon have popularized this approach and show how Java can power large-scale globally distributed systems.

Java is also used in data processing and analytics. Tools like Apache Spark, written in Scala but fully compatible with Java, allow developers to write Java-based Spark jobs for processing huge amounts of data. For example, a logistics company might use Java with Spark to analyze real-time delivery data and optimize routes. In this case, Java’s ability to handle concurrent processing and its deep ecosystem of libraries makes it well suited for high-throughput computing environments.

These examples show the versatility of Java’s architecture, whether it’s handling the front-end of a web app, powering independently scalable services, or crunching massive datasets in real-time. Regardless of the use case, Java’s architecture provides the modularity, reliability, and performance that modern enterprises demand.

How vFunction can help refactor and support microservices design in Java

When it comes to architecting a Java application, many organizations are opting to move towards microservices. The choice to refactor existing services into microservices or to build them net new can be challenging. Refactoring code, rethinking architecture, and migrating to new technologies can be complex and time-consuming. vFunction is a powerful tool for modernizing and managing Java applications. By helping developers and architects simplify and understand their architecture as they adopt microservices or refactor monolithic systems, vFunction’s architectural observability provides the visibility and control needed to scale efficiently and adapt to future demands. 


vFunction analyzes and assesses applications to identify and fix application complexity so monoliths can become more modular or move to a microservices architecture.

Let’s break down how vFunction aids in this process:

1. Automated analysis and architectural observability: vFunction begins by deeply analyzing your application’s codebase, including its structure, dependencies, and underlying business logic. This automated analysis provides essential insights and creates a comprehensive understanding of the application, which would otherwise require extensive manual effort to discover and document. Once the application’s baseline is established, vFunction kicks in with architectural observability, allowing architects to actively observe how the architecture is changing and drifting from the target state or baseline. With every new change in the code, such as the addition of a class or service, vFunction monitors the application, informs architects, and lets them observe the overall impact of the change.

2. Identifying microservice boundaries: One crucial step in the transition is determining how to break down an application into smaller, independent microservices. vFunction’s analysis aids in intelligently identifying domains, a.k.a. logical boundaries, based on functionality and dependencies within the overall application, suggesting optimal points of separation.

3. Extraction and modularization: vFunction helps extract identified components and package them into self-contained microservices. This process ensures that each microservice encapsulates its own data and business logic, allowing for an assisted move towards a modular architecture. Architects can use vFunction to modularize a domain and leverage the Code Copy feature to accelerate microservices creation by automating code extraction. The result is a more manageable application that is moving towards your target-state architecture.

4. Bring clarity and control to your Java microservices architecture: Once applications have been broken into microservices, maintaining architectural integrity in Java environments may become challenging as different teams run through rapid release cycles. vFunction helps teams govern and manage these distributed systems by continuously analyzing service interactions, detecting architectural drift, and identifying anti-patterns like circular dependencies or overly complex flows. With real-time visualization, automated rule enforcement, and deep insights powered by OpenTelemetry, vFunction ensures your Java microservices architecture stays resilient, scalable, and aligned with best practices.

Key advantages of using vFunction

  • Engineering velocity: vFunction dramatically speeds up the process of creating microservices and moving monoliths to microservices, if required. By streamlining the Java architecture, vFunction helps modernize legacy applications and increase deployment velocity, making it easier for teams to deliver updates faster, with fewer delays and less risk.
  • Increased scalability: By helping architects view their existing architecture and observe it as the application grows, scalability becomes much easier to manage. With insights into service interactions, modularity, and system efficiency, teams can identify bottlenecks, improve component design, and ensure their applications scale smoothly as demand grows.
  • Improved application resiliency: vFunction’s comprehensive analysis and intelligent recommendations increase your application’s resiliency by supporting a more modular architecture. By seeing how each component is built and how it interacts with the others, teams can make informed decisions in favor of resilience and availability.

Conclusion

Java’s architecture, spanning platform and application aspects, underpins its enduring success with features like “Write Once, Run Anywhere,” strong memory management, and security. By grasping the JVM, JRE, JDK, and key application patterns, such as layered architecture and microservices, developers can craft scalable, secure Java apps. Understanding the execution process and memory management (including garbage collection), and knowing how to optimize performance, also contribute significantly. Real-world applications in varied sectors showcase Java’s adaptability.

By prioritizing security and performance, developers create applications tailored for modern enterprise needs. Java’s evolving nature is complemented by tools like vFunction and frameworks like Spring Boot for managing and evolving complex systems. This foundation equips developers to ensure their Java applications stay resilient, efficient, and ready for what’s next.

Software dependencies: Types, management & solutions

If you’re building software, chances are you have various software dependencies within your code base. These dependencies are external components, such as libraries, frameworks, or modules, that are part of almost every modern application. Just as each puzzle piece is crucial to completing the picture, every dependency is essential for building and running your software application efficiently.

The modern software landscape is built upon a vast ocean of reusable code, enabling developers to speed up development and leverage existing high-quality software components maintained by a community of experts. However, these dependencies may also introduce complexity and require maintenance over time. Managing dependencies well is critical for application stability, security, and performance. A single poorly managed dependency can bring down the entire system.

This blog will explore the essentials of software dependencies, including their types, management techniques, and tools. Whether you’re a seasoned developer or a newcomer, mastering dependency management is crucial for efficient and secure software development.

What is a software dependency?

At its core, a software dependency describes a relationship where one piece of software relies on another to function properly. Think of them as components added to your application to enable its functionality. Modern software applications rarely exist in isolation, typically building on existing code, libraries, and frameworks, incorporating essential functionality to avoid starting from scratch.

To illustrate how this can work, consider a web application built using a framework like React. The use of React is a dependency; without it, the application won’t work. Another example would be a Python script that performs complex mathematical operations using the NumPy library. NumPy provides optimized computation capabilities. Developers don’t need to build these components from scratch. Instead, they include the dependency and use its functionality within their app.

Dependencies allow developers to focus on their application’s unique parts rather than re-invent standard and common functionality. They enable code reuse, speed up development cycles, and promote standardization. However, using dependencies introduces a chain of interconnected components that need to be managed.

Dependencies include:

  • Libraries: Collections of pre-written code that provide specific functionality.
  • Frameworks: Structural foundations that provide a template for building applications.
  • Modules: Self-contained units of code that provide specific features.
  • APIs (Application Programming Interfaces): Interfaces that allow different software components to talk to each other.

Essentially, any external component an application relies on to work is a software dependency. Understanding this fundamental concept is the first step to managing dependencies effectively.

How do you identify software dependencies?

The first step in dependency management is identifying them. Without this knowledge, you risk version conflicts, security vulnerabilities, and runtime errors. Understanding your dependencies—their uses and potential issues—is crucial for efficiency and stability.

Identifying dependencies can vary depending on the programming language, development environment, and tools used. Several common ways exist, from highly straightforward to less standard approaches. Let’s take a look at some of them.

Package and build managers

Most modern programming languages have package and build managers that automate the process of installing and managing dependencies. For example, Node.js has npm (Node Package Manager) or yarn, Python has pip, Java has Maven or Gradle, and .NET has NuGet. These tools use manifest or build files (like package.json, requirements.txt, or pom.xml) that specify all the project’s direct dependencies. By looking at this file, developers can quickly see the libraries and frameworks their application relies on. Some IDEs also visualize these dependencies and list the indirect dependencies pulled in by the direct ones.

Software Composition Analysis

Software Composition Analysis (SCA) tools identify and manage security risks, outdated dependencies, and licensing issues in your software. Modern applications rely heavily on open source components, but without oversight, these can introduce vulnerabilities and legal risks. SCA tools scan your projects to find all direct and transitive dependencies, cross-check them against vulnerability databases, and highlight the risks. They also ensure open source license compliance and recommend secure, up-to-date versions of libraries. By hooking SCA into your development workflow using tools like Snyk, Black Duck, Sonatype Nexus, and GitHub Dependabot, you can secure your applications proactively and reduce risk with minimal overhead to your development effort.

Manual inspection

While automated tools are helpful, manual inspection of the codebase is still essential. Reviewing import statements, library calls, and project documentation can provide valuable insights into the dependencies your application relies on and the context in which they are used. This is especially important for identifying unnecessary dependencies that can be removed to simplify the implementation. It also helps verify which declared dependencies are actually used: in Node, for instance, a component must explicitly import a dependency to use it, so a manifest file may list packages the code never touches.

Build tool outputs

Build tools typically list resolved dependencies in their output, revealing the direct and transitive dependencies used in your application’s construction. However, this method can be unreliable, as builds may omit dependencies included in prior iterations, complicating the identification of newly installed dependencies. It is the least advisable approach and should be considered a last resort, such as when source code is inaccessible but build logs are available.

Developers can use one or more of these methods to get a complete picture of their application’s dependencies. That said, not all dependencies are equal or straightforward. 

Static and dynamic analysis

vFunction, the pioneer of architectural observability, can visualize the dependencies between software components within Java and .Net monolithic applications, including the details of which classes use the dependencies and whether there are any circular dependencies (a design flaw that must be corrected).

Dynamic analysis identifies dependencies at runtime, while static analysis composes a full dependency graph between the classes. The two methods complement each other: some components might never be exercised at runtime, while other dependencies only surface at runtime, for example when binaries are missing from the static analysis input or when software elements used by the application are generated at runtime.

Below is an extreme example from a highly complex Java monolithic application that is partitioned into many JAR (Java ARchive) libraries. Every node in the graph is a JAR file and every line is a dependency between JAR files. The orange lines highlight circular dependency paths, which in this case are nontrivial bidirectional relationships (as seen in the graph). A circular dependency is a critical architectural design flaw: it creates complexity that can lead to build and runtime malfunctions that are hard to diagnose and fix, and it becomes a maintenance burden due to the tight coupling between the library components.

Hovering over a line in the graph below lists the dependencies between classes across the two JAR files.


Graph in vFunction highlighting dependencies between JAR files.

Types of software dependencies

There’s a reason why they call it “dependency hell.” Modern software generally contains a complex web of dependencies, where each dependency can recursively rely on others, creating a multi-tiered structure that is far from ideal. They come in various forms, each with its own characteristics and implications for your project. Understanding these distinctions is crucial for management and anticipating how dependencies may impact your project.

Direct dependencies

Direct dependencies are the libraries or packages your project explicitly declares and imports. They are the components you’ve consciously chosen to include in your application. For example, if you’re building a web application with Node.js and you use the Express package for routing, Express is a direct dependency. Direct dependencies are the easiest to identify and manage in most modern languages or frameworks, as they are usually listed in the project’s manifest file (e.g., package.json, requirements.txt).

Transitive dependencies

Transitive dependencies, or indirect dependencies, are the libraries that your direct dependencies rely on to function (a direct reference to dependency hell that we discussed earlier). For instance, if Express relies on the debug package for logging, debug is a transitive dependency of your application. Transitive dependencies can create a complex web, making it hard to understand the full scope of your application’s dependencies. They can introduce security vulnerabilities or version conflicts that slip under the radar if not managed carefully. This is where tooling can help determine if transitive dependencies introduce risk or security issues.

Development dependencies

Development dependencies are the tools and libraries required in the development process but not for the application to run in production. Examples include testing frameworks (e.g., Jest, JUnit), linters (e.g., ESLint, PyLint), and build tools (e.g., Webpack, Gradle). These dependencies help improve code quality, automate testing, and streamline the development workflow. They are usually separated from production dependencies to minimize the size and complexity of the deployed application.

Runtime dependencies

Runtime dependencies are the libraries and packages required for the application to run in the production environment. These dependencies provide the core functionalities the application relies on. Examples include database drivers, web frameworks, and networking libraries. Managing runtime dependencies is critical for application stability and performance.

Optional dependencies

Optional dependencies enhance the application’s functionality but are not strictly required for it to run. They provide additional features or capabilities that users can choose to enable. For example, a library might offer optional support for a specific file format, database, or operating system. Optional dependencies let developers offer a more flexible, customizable application that pulls in these extras only when a specific use case or build requires them.

Platform dependencies

Platform dependencies are specific to your application’s operating system or environment. These dependencies may include system libraries, device drivers, or platform-specific APIs. Managing platform dependencies can be challenging as they often require careful configuration and testing across different environments. Modern portable languages and containerization reduce, but do not completely eliminate, these issues.

Most dependencies fall somewhere within this spectrum. Understanding the different types of dependencies allows developers to make informed decisions about dependency management, ensuring the dependencies used are needed but also ensuring they are stable, secure, and performant. Now let’s look at some common ways to manage them within an application’s code base.

How do you manage software dependencies?

Managing software dependencies is not just about installing libraries and adding anything and everything to your project; it calls for a more pragmatic approach. That pragmatism means having a process to ensure the dependencies used within your application are stable, secure, and maintainable throughout its life. Effective dependency management combines best practices, tools, and vigilance (especially regarding security and performance). Let’s look at some common best practices for managing dependencies.

Use a package manager

Package managers are essential tools for managing dependencies. They automate the installation, update, and removal of dependencies, as well as maintain consistency across different development environments. Package managers also resolve version conflicts and have a centralized repository for dependencies. Most languages have a preferred package manager, so getting started is generally not too hard; choose a package manager that fits your language and project (e.g., npm for Node.js, pip for Python, Maven for Java) and begin using it.

Pin versions

Version pinning specifies the exact version of each dependency your application requires. This prevents changes due to automatic updates, which can introduce breaking changes or compatibility issues. By pinning versions, you ensure your application always uses the tested and compatible versions of its dependencies. That said, review and update pinned versions periodically to pick up bug fixes and security patches, bumping the pinned version and running regression tests to confirm compatibility.

Scan dependencies

Use security scanning tools to regularly scan your dependencies for known vulnerabilities. These tools check your project’s dependencies against vulnerability databases and alert you to potential security risks. This proactive approach helps identify and fix security issues before they can be exploited. Integrate dependency scanning into your CI/CD pipeline, ideally on every commit, so issues are detected early and developers can address them before they reach production. This is a big component of the “shift-left” movement and mindset.

Keep dependencies up-to-date

Keeping your dependencies up-to-date is crucial for getting bug fixes, performance improvements, and security patches. Tools like Dependabot can be really helpful in automating this. Remember that updating dependencies can also introduce risks, as new versions may introduce breaking changes. Have a clear process for updating dependencies, including testing and rollback mechanisms, to minimize the risk of downtime. You likely also want to use semantic versioning to understand the impact of updates as you roll out newer versions of your app with updated dependencies.

Isolate dependencies

Use virtual environments or containers to isolate dependencies for different projects. This prevents conflicts between dependencies that may have different versions or requirements. Virtual environments create isolated spaces where each project has its own set of dependencies, so changes to one project won’t affect the others. Containers provide a more comprehensive isolation mechanism, packaging the application and its dependencies into a portable unit.

Document everything

Document your project’s dependencies, including versions, purposes, and specific configuration requirements. This will be a valuable resource for developers to understand the application’s dependencies and troubleshoot issues. To make things even easier, you can usually pull up certain kinds of documentation right from your package manager itself. For instance, npm docs opens a package’s documentation page and pip show prints a package’s metadata, straight from the command line interface (CLI).

vFunction provides a report of standard Java and .Net libraries to detect the usage of aging frameworks, as seen in the table below. A library or framework is marked as aging if the version in use is behind by more than a minor version or is more than two years old.


vFunction generates a report identifying aging Java and .Net libraries.

Do a dependency audit

Lastly, regularly audit your project’s dependencies to remove unused or outdated ones. Unused dependencies increase the size and complexity of your application, while outdated dependencies can introduce security vulnerabilities. Tools like depcheck (for Node.js) help find unused dependencies so you can remove them, and pip-check (for Python) highlights outdated ones.

Following these best practices and using the right tools will give you a well-rounded dependency management process. Since dependencies are crucial to how your software functions, it makes sense to closely monitor which dependencies are used and how. Want a bit more clarity on what dependencies are? Let’s examine some examples next.

Software dependencies examples

Let’s see some concrete examples of software dependencies across different programming ecosystems:

Web application (Node.js)

Imagine a modern web application built with Node.js. To build such an application, we would most likely use one or more of the following dependencies:

  • Express.js: A web framework for routing, middleware, and HTTP requests.
  • MongoDB Driver: A library that interacts with a MongoDB database.
  • React: A JavaScript library to build user interfaces with a component-based approach.
  • Axios: A library to make HTTP requests to external APIs or services.
  • JWT (JSON Web Token): A library to implement authentication and authorization.

Data analysis script (Python)

When using Python to analyze data, most developers and data scientists use various dependencies to help them. Here are a few common ones you’d most likely see in a data analysis script:

  • NumPy: A fundamental library for numerical computing with array objects and mathematical functions.
  • Pandas: A data manipulation and analysis library with DataFrames for efficient data handling.
  • Matplotlib: A library to create static, interactive, and animated visualizations.
  • Scikit-learn: A machine learning library with tools for classification, regression, clustering, and dimensionality reduction.

Mobile application (Android – Java/Kotlin)

For mobile apps, using a vast amount of dependencies is also the norm. For instance, in Android mobile app development, you might find:

  • Retrofit: A type-safe HTTP client.
  • Gson: A library to convert Java objects to JSON and vice versa.
  • Glide: An image loading and caching library.
  • Room Persistence Library: An abstraction layer over SQLite.

These examples show how dependencies are the foundation of various software projects. Although these are very simple examples, they demonstrate how many of the core functions we bring into our applications are handled by dependencies. So, how do you bring these dependencies into your project? As mentioned before, this will likely require some dependency management tools.

Software dependency management tools

If your project uses dependencies (and almost every project does), managing them is key to any software project. Luckily, there are many tools to help with that, automating tasks and providing insights and order to the complex world of dependencies. Here are the most popular and widely used dependency management tools across a variety of common languages:

npm (Node Package Manager)

Language: JavaScript

Description: The default package manager for Node.js, npm, gives you access to a massive registry of JavaScript packages to easily find and install the dependencies your project needs.

Features:

  • Simple package installation and management.
  • Version management (specifying ranges or pinning to specific versions).
  • Automatic dependency resolution.
  • Ability to publish your own packages.

pip (Python Package Installer)

Language: Python

Description: Pip is the standard package manager for Python, and it simplifies the installation and management of Python packages from the Python Package Index (PyPI) and other repositories.

Features:

  • Straightforward package and dependencies installation.
  • Tools for virtual environments to isolate project dependencies.
  • Supports different versioning schemes.
  • Ability to install from source code or wheels.

Maven

Language: Java

Description: A build automation tool that excels at dependency management, primarily used for Java projects. It uses a declarative approach with dependencies documented within a pom.xml file.

Features:

  • Central repository (Maven Central) for easy access to dependencies.
  • Standardized build lifecycle with phases for compile, test and package.
  • Extensive plugin ecosystem.
  • Support for multi-module projects.

Gradle

Language: Java, Kotlin, Groovy, and others

Description: A flexible and highly customizable build automation tool that also provides dependency management. It uses a Groovy- or Kotlin-based DSL to define builds and dependencies.

Features:

  • Incremental builds for performance.
  • Support for many languages and platforms.
  • Powerful dependency management with support for multiple repositories.
  • Extensible with plugins and custom tasks.

Chances are that if you work in these languages, you are already using these tools. Each has its own strengths and focus, and their approach to dependency management makes modern software development much easier. Although dependencies can be added to a project manually, these tools make the process far more manageable and scalable.

vFunction: Dependency management at the architectural level

Traditional dependency management tools typically focus on individual packages and libraries. In contrast, vFunction takes a broader approach, managing dependencies at the architectural level. To build quality software, clean architecture is essential—it has far-reaching impacts across the entire application. This means reducing unnecessary architectural dependencies. As an architectural observability platform, vFunction enables teams to visualize, analyze, and manage dependencies within the context of the overall application architecture.

vFunction’s key features for dependency management are:

  • Visualizing architectural dependencies: vFunction generates interactive diagrams that map out the dependencies between different components and services in your application. This gives you a clear and complete view of how different parts of your system interact.
  • Detecting architectural drift: As applications evolve, their architecture can drift from its original design, often because of new planned or unplanned dependencies or changes in how components interact. vFunction detects this drift and helps you maintain architectural integrity.
  • Analyzing the impact of changes: Before making changes to your application, vFunction allows you to analyze the potential effect of changes on dependencies and the architecture. This helps you avoid unintended consequences and make changes safely.
  • Managing technical debt: vFunction identifies and helps manage technical debt related to dependencies, including outdated libraries and complex issues like circular dependencies between services and components. This insight allows you to prioritize refactoring and improve your application’s long-term maintainability.

vFunction goes beyond simply showing the dependencies within your code. It illustrates how individual modules and internal dependencies are connected and function within the broader system. This holistic approach gives teams a complete view of code and architectural dependencies, highlighting their impact on the overall application architecture. As a result, vFunction empowers teams to make more informed decisions, reduce risk, and enhance the health and maintainability of their applications.

Conclusion

Software dependencies are the building blocks of modern software development. They let developers reuse code, speed up development, and build complex applications more efficiently. Managing these dependencies is crucial to application stability, security, and maintainability.

Throughout this blog, we’ve covered the different aspects of software dependencies, from their definition and types to the challenges and best practices for managing them. We’ve looked at traditional package managers like npm, pip, Maven, and Gradle, as well as vFunction, which offers an architectural perspective on your projects’ dependencies.

Ready to take control of your software architecture and dependencies?

Try vFunction for free and experience the power of architectural observability. Gain a deeper understanding of your application’s dependencies, identify potential risks, and make informed decisions to improve the health and maintainability of your software. 

The true measure of software quality

How to measure software quality: Architecture or code

This piece originally appeared in AWS ComSum Quarterly, an independent publication dedicated to knowledge-sharing within the AWS community. For this edition, Amir Rapson, CTO of vFunction, guest-edited the issue to highlight a critical truth: software quality isn’t just about code—it’s about architecture.

The principles of good software are a popular discussion topic at vFunction. Improving software quality is at the heart of our mission—whether for cloud migration, cost reduction, or simply building better software. Our focus, architectural observability, centers on improving applications via software architecture. You can have bad software built entirely of good code, because software quality isn’t just about clean syntax or following patterns. Software architecture is the crucial element of software quality.

Yet, we still encounter software architectures that make us question our principles and spark discussions like “But why is this bad?” or “How would you fix this?”

This article sets the groundwork for what makes software truly good—not just at the code level but at the architectural level. We hope sharing our perspective provides valuable insights and meaningful discussions on the essential elements of good software.

What makes one piece of software better than another?

To define software quality, let’s start with some common ground. Imagine that every piece of software meets its current requirements and satisfies its current user needs—it’s easy if you try. With that assumption, we can easily differentiate between good software and better software. Now, suppose this software is reliable under the current conditions—it isn’t hard to do—and even performs well with the current resources—I wonder if you can...

Alas, software is never static. Requirements, users, and usage patterns and conditions are always subject to change. What is considered “ok” in terms of operational costs and performance today may be the cause of a major headache tomorrow.

Assuming no software is future-proof—if it were, why wasn’t it released sooner?—then, software quality in this imaginary world can be defined by how easily it adapts:

  • Functionality & usability: Can the software be easily modified, updated, and repaired to meet new requirements and usage patterns?  
  • Security & portability: Can updates to security vulnerabilities be made quickly without risking the stability of the software? Can the software run in new environments and platforms with minimal changes?
  • Reliability & quality: Can the software perform reliably despite constant changes, with little impact on other components? Will it deploy efficiently with changes?

Good architecture, good code, and their contribution to good software

A fabulous article from 2012, “In Search of a Metric for Managing Architectural Technical Debt,” describes a simple model to calculate architectural technical debt. By defining this metric, the authors imply that good architecture is one with minimal architectural technical debt. The article states that the “cost of reworking” a software element is the sum of the cost of reworking all its dependent elements. This means good architecture minimizes dependencies—whether code classes, modules, or libraries. Fewer dependencies ensure when we modify a piece of software, we only need to rework a small set of elements vs. a cascade of rework across the system.
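
Put loosely as a formula (a loose paraphrase of that model, not the article’s exact notation), the rework cost of an element grows with the set of elements that depend on it:

\[ \mathrm{Rework}(e) \;\approx\; c(e) + \sum_{d \,\in\, \mathrm{dependents}(e)} c(d) \]

where c(x) stands for the local cost of changing element x. The fewer dependents an element has, the smaller that sum, which is exactly why minimizing dependencies keeps rework contained.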

Returning to the definition of better software—we can now say that better architecture makes for better software.

Good code is another matter. A class with good code can minimize the rework effort for that specific class. With readability, coding standards, and code reviews, a class can be maintained more easily. The effect of good code is primarily limited to its class. In other words, you can have bad software built with good code.

For further exploration, you may want to read “Is your software architecture as clean as your code?” which explains the principles of good architecture that lead to minimizing dependencies, and “Measure It? Manage It? Ignore It? Software Practitioners and Technical Debt” which emphasizes the importance of architectural technical debt as a source of technical debt.

AWS Comsum Quarterly image

AWS ComSum is an independent, community-driven initiative that fosters collaboration among AWS users, helping them navigate the ever-evolving cloud landscape.

Quality practices that support software quality

Besides the key element of having good software architecture, other practices contribute to the quality of software and reduce the cost of reworking or adding new capabilities to an existing software application:

  1. Clear requirements & design: Clear and precise requirements and a detailed blueprint of the potential dependent elements ensure that engineers understand the task at hand. Bad requirements and design will lead to confusion and possible rework.
  2. Robust testing & QA: Conducting various levels of testing (unit, integration, system, load, and acceptance testing) ensures the software functions correctly and meets quality standards. The more automated and complete the testing, the easier it is to minimize the time to release software. Fewer dependencies for the changed element allow QA to focus on the specific functionality rather than conducting acceptance testing on the entire system.
  3. Automated CI/CD, security testing and tooling: Automating the process of code integration, testing, vulnerability testing, and deployment to ensure rapid and reliable delivery of software updates.

With these elements, software engineers can develop high-quality software that meets user expectations and performs reliably. However, while these practices support a good software development lifecycle, they do not ensure that the software itself is of good quality.

How to measure software quality

If you’ve made it this far, then you may agree that evaluating software quality predominantly involves assessing its architecture. Here are some approaches and metrics to evaluate it:

  • Architectural peer reviews: Systematically evaluate architectural decisions and trade-offs by conducting reviews with architects and engineers. Assess the architecture’s quality, feasibility, and alignment with requirements. 
  • Documentation quality: Ensure the architecture is well-documented, including diagrams and design decisions to support understanding, maintenance, and evolution as the software changes. Strive, through the use of tools, to keep your documentation accurate and up to date with minimal overhead.
  • Coupling & cohesion: Measure the degree of dependency between components (coupling) and the degree to which components are related (cohesion). At vFunction, we call this metric “exclusivity,” which measures the percentage of classes or resources required solely for the component.
  • Technical debt & architectural complexity: Measure the amount of work required to fix issues in the architecture. For instance, you can count story points in your refactoring backlog or sum up weighted scores on your to-dos.
  • Modularity: Assess the degree to which the architecture supports modularization and reuse of components. Look for ways to ensure and monitor modularity through compile-time boundaries and runtime monitoring.
  • Code complexity & code churn: Use existing tools to measure cyclomatic complexity and maintainability. Although certain parts of the code need to be complex, if all your code is complex, you have a problem. Code churn is the percentage of code that’s added and modified over time. If the same code changes all the time, or too many classes get checked in at once, there are probably too many dependencies in the code.

By employing these approaches and metrics, you can effectively assess the quality of your software architecture and ensure it can support future changes to the application.

Other measures that support software quality

As you consider how to measure software architecture quality in your organization, the following measures serve as a safety net for software quality. Ironically, these are measured more often than software quality itself. Here are some common metrics and methods:

  • Test coverage: The percentage of code executed by the test cases, ensuring that all parts of the code are tested. Think about mimicking production behavior and not just covering every line in your code, since the same code can be executed in different contexts by your real users. In vFunction, we call this “Live Flow Coverage.”
  • Reliability metrics (MTBF, MTTR): The average time between failure and the time taken to repair the software following a failure. Application performance monitoring (APM) tools are built to monitor this.
  • Performance issues and error rates: The frequency of performance glitches and user errors while interacting with the software. This, too, can be monitored with APM tools.

GenAI, code quality, and its contribution to software quality

GenAI tools enhance code quality by automating and assisting with various aspects of coding, including:

  • Code generation: Generating boilerplate code, repetitive code patterns, and even complex algorithms, which reduces human error and improves consistency.
  • Code reviews: Assisting in code review processes by identifying potential issues, code smells, and suggesting improvements.
  • Testing: Generating unit tests, integration tests, and other forms of automated testing to ensure code correctness and reliability.

GenAI also plays a lesser role in improving overall software quality by supporting higher-level aspects of software development, such as:

  • Requirements analysis & design: Analyzing user requirements and generating relevant documentation.
  • Performance optimization: Suggesting optimizations to enhance software performance.

While GenAI tools are particularly effective in enhancing code quality by automating and optimizing coding tasks, their contribution to software quality is limited. If GenAI tools can create a lot more code, then measuring and maintaining good quality and good architecture becomes even more critical.

Software architecture: The foundation of software quality

Good software architecture is essential for building high-quality applications that adapt easily to changing requirements and support future enhancements. By measuring and assessing the quality of your architecture, you can ensure that your software will be able to meet the changing needs of your users and business. 

Consider how your organization currently measures software quality, and the growing importance of doing so in a world increasingly driven by GenAI tools. Take proactive steps to ensure your software is reliable, maintainable, and scalable, which is what engineering excellence is all about.

What is software architecture? Check out our guide.

Bridging the gap in microservices documentation with vFunction


Keeping microservices documentation up to date is a challenge every team knows too well. With competing priorities—building, deploying, and keeping up with the pace of innovation—documentation often falls by the wayside or gets done in sporadic bursts. Clear, current architecture diagrams are critical for productivity, yet creating and maintaining them often feels like an uphill battle—too time consuming and always a step behind your system’s reality.

To address these challenges, vFunction builds on its recently released architecture governance capabilities with new functionality that simplifies documenting, visualizing, and managing microservices throughout their lifecycle.

Real-time documentation for modern development

Seamless integration with existing tools and workflows, including CI/CD pipelines and documentation platforms, is essential for effective application development and management. New functionalities in vFunction’s architectural observability platform enable you to integrate and export vFunction’s architectural insights into your current workflows for greater efficiency and alignment.

Sequence diagrams: From static images to dynamic architecture-as-code diagrams

One of the standout features of our latest release is the ability to generate sequence diagrams based on runtime production data. We now track multiple alternative paths, loops, and repeated calls in a single flow—simplifying and speeding the detection and resolution of hard-to-identify bugs. Behind the scenes, we use Mermaid, a JavaScript-based diagramming and charting tool selected for its simplicity and compatibility with other programs. Mermaid uses Markdown-inspired text definitions to render and modify complex diagrams. The latest release also provides the option for the user to export these diagrams as code, specifically as Mermaid script.

mermaid based sequence diagram
Mermaid-based sequence diagram in vFunction.
export architecture as code
Export architecture-as-code written in Mermaid syntax from vFunction.
architecture as code into documentation
Bring the architecture-as-code into your documentation tool of choice to share live sequence flows captured by vFunction.

Exporting flows as Mermaid script retains all the architectural details needed for teams to embed, modify, and visualize diagrams directly within tools like Confluence or other documentation platforms. This support ensures teams can reflect live architecture effortlessly, maintaining dynamic, up-to-date documentation that evolves alongside their systems.

Quickly identify flows with errors

vFunction simplifies troubleshooting complex microservices by using sequence diagrams to identify flows with errors. These flows are marked in green (ok) or red (error), making it easy for developers to identify issues quickly. To see the number of calls and errors, developers can toggle between flows and dive into the sequence that caused the issue. Flows with an error rate above 10% are highlighted, making it easy to sort and prioritize areas that need attention. This streamlines the debugging process, helping teams quickly identify and fix issues, improving overall application reliability.

identify flows with errors

Support for the C4 model and workflows

The C4 model is a widely used framework that improves collaboration by describing a software system at four levels of detail: context, container, component, and code (C4). Now teams can export and import C4 container diagrams with vFunction to help detect architectural drift and support compliance, enhancing how engineers conceptualize, communicate, and manage distributed architectures throughout their lifecycle.

oms exported as C4 diagram
This diagram illustrates our Order Management System (OMS) demo application, exported as a C4 diagram and visualized as a graph with PlantUML. Support of C4 enables seamless export of live architecture into widely used formats for broader insights and collaboration.

Using “architecture as code,” vFunction aligns live application architecture with existing diagrams, acting as a real-time system of record. This ensures consistency, detects drift, and keeps architecture and workflows in sync as systems evolve. By moving beyond drift measurements within vFunction and enabling teams to compare real-time architectural flows against C4 reference diagrams, this capability ensures that teams can identify where drift has occurred and have the context to understand its impact and prioritize resolution.

Grouping services for better microservices management

If you have many microservices, you can now group services and containers using attributes like names, tags, and annotations, including those imported from C4 diagrams. This feature enhances governance by allowing you to apply architecture rules that enforce standards, prevent drift, and maintain consistency across your microservices. It helps teams organize, manage, and monitor their microservices architecture more effectively, ensuring alignment with best practices and reducing complexity.

grouping services for better microservices management

Distributed application dashboard

The new dashboard for distributed applications provides a centralized and actionable overview of architectural complexity and technical debt across your distributed portfolio of apps, empowering teams to make informed, data-driven decisions to maintain system health.

vfunction new portfolio dashboard
vFunction’s new portfolio view tracks architectural complexity and technical debt across distributed applications, providing a clear overview of application health and actionable insights in the form of related TODOs.

The related technical debt report helps teams track changes in technical debt across distributed applications, providing valuable insights to prioritize remediation efforts and enhance architectural integrity. For example, Service X may have a higher technical debt score of 8.3 due to circular dependencies and multi-hop flows, while Service Y scores 4.2, indicating fewer inefficiencies. Teams can focus remediation efforts on Service X, prioritizing areas where architectural debt has the most significant impact.

vfunction tech debt score
The tech debt score delivers actionable clarity, enabling teams to understand newly added debt and streamline the prioritization of technical debt reduction.

The accompanying technical debt score offers a clear, quantified metric based on all open tasks (TODOs), including inefficiencies such as circular dependencies, multi-hop flows, loops, and repeated service calls. Developers use this score to focus on resolving issues like multi-hop flows. For example, resolving redundant calls in a service can bring the debt down from 7.5 to 4.5, giving teams actionable clarity on newly added debt and streamlining the prioritization of technical debt reduction.

Additionally, CI/CD integration for debt management takes this functionality a step further. Triggered learning scripts allow teams to incorporate technical debt insights directly into their CI/CD pipelines. By comparing the latest system measurements with baseline data, teams can approve or deny pull requests based on technical debt changes, ensuring alignment with architectural goals and mitigating risks before they escalate.

Sync architectural tasks with Jira

vFunction brings architecture observability and workflow management closer together by syncing TODO tasks directly with Jira. Engineering leaders can integrate architectural updates into sprints and bake related tasks into existing workflows.

vfunction todos dashboard
export todos into jira
TODOs identified in vFunction can now be opened as tickets in Jira in order to incorporate architecture modifications into the development lifecycle.

Architecture audit log

The new logs section tracks key architectural decisions made relative to TODOs. This log captures impactful actions such as adding or removing TODOs, uploading references, changing baselines, and setting the latest measurements, while excluding routine auto-generated or completed TODOs. Each entry includes details about the affected entities and the context for the action, along with the user responsible for the decision and the timestamp of when it was made. Tracking architectural decisions in this way ensures alignment with organizational standards, aids in meeting compliance requirements, and reduces the risk of miscommunication or errors by providing a clear historical record of changes.

architecture audit log

Why it matters: Efficiency, alignment, and agility

With these new capabilities, vFunction empowers teams to reclaim valuable engineering time, improve productivity, and maintain alignment. By bridging the gap between architecture and workflow, this release makes it easier than ever to document, update, and share architectural insights. Teams can focus on building resilient, scalable systems without the overhead of disconnected tools and outdated diagrams.

In the fast-paced world of microservices, maintaining clear, actionable architecture diagrams is no longer a luxury—it’s a necessity. vFunction equips your team to stay agile, aligned, and ahead of complexity.

Ready to transform your management of microservices?

Explore how vFunction can help your team tackle microservices complexity. Contact us today to learn more or start a free trial.

OpenTelemetry tracing guide + best practices

opentelemetry tracing

As applications grow more complex to meet increasing demands for functionality and performance, understanding how they operate is critical. Modern architectures—built on microservices, APIs, and cloud-native principles—often form a tangled web of interactions, making it challenging to pinpoint bottlenecks and resolve latency issues. OpenTelemetry tracing gives developers and DevOps teams the visibility to understand the performance of their services and quickly diagnose issues and bottlenecks. The same data can also serve other critical purposes, which we’ll incorporate into our discussion.

This post will cover all the fundamental aspects of OpenTelemetry tracing, including best practices for implementation and hands-on examples to get you started. Whether you’re new to distributed tracing or looking to improve your existing approach, this guide will give you the knowledge and techniques to monitor and troubleshoot your applications with OpenTelemetry. Let’s get started!

What is OpenTelemetry tracing?

OpenTelemetry, from the Cloud Native Computing Foundation (CNCF), has become the standard for instrumenting cloud-native applications. It provides a vendor-neutral framework and tools to generate, collect, and export telemetry consisting of traces, metrics, and logs. This data gives you visibility into your application’s performance and behavior.

dzone trend report observability survey
79% of organizations that use observability tools use or are considering using OpenTelemetry. Get the latest in observability from the DZone Trend Report.

Key concepts of OpenTelemetry

At its core, OpenTelemetry revolves around a few fundamental concepts, which include: 

  • Signals: OpenTelemetry deals with three types of signals: traces, metrics, and logs. Each gives you a different view of your application’s behavior.
  • Traces: The end-to-end journey of a single request through the many services within your system.
  • Spans: These are individual operations within a trace. For example, a single trace might have spans for authentication, database access, and external API calls.
  • Context propagation: The mechanism for linking spans across services to give you a single view of the request’s path.

Within this framework, OpenTelemetry tracing focuses on understanding the journey of requests as they flow through a distributed system. Conceptually, this is like creating a map of each request path, encapsulating every single step, service interaction, and potential bottleneck along the way.

Why distributed tracing matters

Debugging performance issues in legacy, monolithic applications was relatively easy compared to today’s microservice applications. You could often find the bottleneck within these applications by looking at a single codebase and analyzing a single process; but with the rise of microservices, where a single user request can hit multiple services, finding the source of latency or errors is much more challenging.

For these more complex systems, distributed tracing cuts through this noise. This type of tracing can be used for:

  • Finding performance bottlenecks: Trace where requests slow down in your system.
  • Error detection: Quickly find the root cause of errors by following the request path.
  • Discovering service dependencies: Understand how your services interact and where to improve.
  • Capacity planning: Get visibility into resource usage and plan for future scaling.

These building blocks combine to provide OpenTelemetry’s core functionality and insights. Next, let’s examine how all of these components work together.

How OpenTelemetry tracing works

OpenTelemetry tracing captures the flow of a request through your application by combining instrumentation, context propagation, and data export. Here’s a breakdown:

Spans and traces

A user clicks a button on your website. This triggers a request that hits multiple services: authentication, database lookup, payment processing, etc. OpenTelemetry breaks this down into spans, the units of work in distributed tracing, each representing a single operation or piece of a process within a larger distributed system. Spans help you understand how requests flow through a system by capturing critical performance and context information. In the example below, we can see how this works, including a parent span (the user action) and a child span corresponding to each sub-operation (such as user authentication) originating from the initial action.

opentelemetry tracing spans
Spans help you understand how requests flow through a system by capturing critical performance and context information. Credit: Hackage.haskell.org

Each span captures several key pieces of information, including the following:

  • Operation name (e.g. “database query”, “API call”)
  • Start and end timestamps
  • Status (success or failure)
  • Attributes (additional context like user ID, product ID)

One or more spans are then linked together to form a trace, giving you a complete end-to-end view of the request’s path. The trace shows you how long each operation took, where the latency was, and the overall flow of execution.

Context propagation

So, how does OpenTelemetry link spans across services? This is where context propagation comes in. Conceptually, we can relate this to a relay race. Each service is handed a “baton” with trace information when it receives a request. This baton, metadata contained in headers, allows the service to create spans linked to the overall trace. As the request moves from one service to the next, the context is propagated, correlating all the spans.

To implement this, OpenTelemetry uses the W3C Trace Context standard for context propagation. This standard allows the trace context to be used across different platforms and protocols. By combining spans and traces with context propagation, OpenTelemetry gives users a holistic and platform-agnostic way to see the complex interactions within a distributed system.

Getting started with OpenTelemetry tracing

With the core concepts covered, let’s first look at auto-instrumentation. Auto-instrumentation refers to the ability to add OpenTelemetry tracing to your service without having to modify your code or with very minimal changes. While it’s also possible to implement OpenTelemetry by leveraging the OpenTelemetry SDK and adding tracing to your code directly (with some added benefits), the easiest way to get started is to leverage OpenTelemetry’s auto-instrumentation for a “zero-code” implementation. 

What is auto-instrumentation?

For those who don’t require deep customization or want to get OpenTelemetry up and running quickly, auto-instrumentation is worth considering. It can be implemented in a number of ways, including agents that instrument your application automatically without code changes, saving time and effort. How it’s implemented depends on your development language and platform.

The benefits of running auto-instrumentation include:

  • Quick setup: Start tracing with minimal configuration.
  • Comprehensive coverage: Automatically instrument common libraries and frameworks.
  • Reduced maintenance: Less manual code to write and maintain.

To show how easy it is to configure, let’s take a brief look at how you would implement auto-instrumentation in Java.

How to implement auto-instrumentation in Java

First, download the OpenTelemetry Agent, grabbing the opentelemetry-javaagent.jar from the OpenTelemetry Java Instrumentation page.

Once the agent is downloaded, you will need to identify the parameters needed to pass to the agent so that the application’s OpenTelemetry data can be exported properly. The most basic of these would be:

  • Export endpoint – The server endpoint where all the telemetry data will be sent for analysis.
  • Protocol – The protocol to be used for exporting the telemetry data. OpenTelemetry supports several different protocols, but we’ll use http/protobuf for this example.

Exporting will send your telemetry data to an external service for analysis and visualization. Popular options include open-source tools such as Jaeger and Zipkin, as well as managed services such as Honeycomb or Lightstep.

While these platforms are great for visualizing traces and finding performance bottlenecks, vFunction complements them by using the same tracing data to give you a deeper understanding of your application architecture. vFunction automatically analyzes your application traces and generates a real-time architecture map showing service interactions and dependencies as they relate to the application’s architecture, not just its performance. This helps developers and architects identify architectural issues that may underlie the performance problems surfaced by the other OpenTelemetry tools in use.

analyze app dependencies with vFunction
vFunction analyzes applications to clearly identify unnecessary dependencies between services, reducing complexity and technical debt that can degrade performance.

Once you have the settings needed for exporting, you can run your application with the agent. That command will look like this:
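
Here is a minimal sketch of what that command might look like, assuming an OTLP/HTTP endpoint on localhost; the paths, file names, endpoint, and service name are placeholders:

    # Run the application with the OpenTelemetry Java agent attached
    java -javaagent:path/to/opentelemetry-javaagent.jar \
         -Dotel.service.name=myapp \
         -Dotel.exporter.otlp.endpoint=http://localhost:4318 \
         -Dotel.exporter.otlp.protocol=http/protobuf \
         -jar myapp.jar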

In the command, you’ll need to replace path/to/ with the actual path to the agent JAR file and myapp.jar with your application’s JAR file. Additionally, the endpoint would need to be changed to an actual endpoint capable of ingesting telemetry data. For more details, see the Getting Started section in the instrumentation page’s readme.

As you can see, this is a simple way to add OpenTelemetry to your individual services without modifying their code. If you choose this option, confirm that the auto-instrumentation agent is compatible with your application and supports the libraries you use. Another consideration is customization: sometimes auto-instrumentation does not support a library you’re using or is missing data that you require. If you need that kind of customization, you should consider updating your service to use the OpenTelemetry SDK directly.

Updating your service

Let’s look at what it takes to manually implement OpenTelemetry tracing in a simple Java example. Remember, the same principles apply in any of the languages in which you can implement OpenTelemetry.

Setting up a tracer

First, you’ll need to add the OpenTelemetry libraries to your project. For Java, you can include the following dependencies in your pom.xml (Maven) or build.gradle (Gradle) file:
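
As a rough Maven sketch (the Gradle coordinates are the same), assuming the ${opentelemetry.version} property is set to the current OpenTelemetry Java release:

    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-api</artifactId>
      <version>${opentelemetry.version}</version>
    </dependency>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-sdk</artifactId>
      <version>${opentelemetry.version}</version>
    </dependency>
    <dependency>
      <groupId>io.opentelemetry</groupId>
      <artifactId>opentelemetry-exporter-otlp</artifactId>
      <version>${opentelemetry.version}</version>
    </dependency>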

Next, your code will need to be updated to initialize the OpenTelemetry SDK within your application and create a tracer:
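
A minimal sketch of that setup, assuming an OTLP/HTTP collector on localhost; the class name and instrumentation scope name are illustrative:

    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter;
    import io.opentelemetry.sdk.OpenTelemetrySdk;
    import io.opentelemetry.sdk.trace.SdkTracerProvider;
    import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

    public class TracingSetup {
        public static Tracer initTracer() {
            // Send spans over OTLP/HTTP; replace the endpoint with your collector's address
            OtlpHttpSpanExporter exporter = OtlpHttpSpanExporter.builder()
                    .setEndpoint("http://localhost:4318/v1/traces")
                    .build();

            // Batch spans in memory before exporting to keep overhead low
            SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                    .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                    .build();

            OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
                    .setTracerProvider(tracerProvider)
                    .buildAndRegisterGlobal();

            // The tracer is what application code uses to create spans
            return openTelemetry.getTracer("com.example.myapp");
        }
    }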

This code snippet initializes the OpenTelemetry SDK with an OTLP exporter. Just like with auto-instrumentation, this data will be exported to an external system for analysis. The main difference is that this is configured here in code, rather than through command-line parameters.

Instrumentation basics

With the tracer set up, it’s time to get some tracing injected into the code. For this, we’ll create a simple function that simulates a database query, as follows:
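
One way that function might look is sketched below; the class name, query parameter, and sleep duration are illustrative:

    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.StatusCode;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public class OrderRepository {
        private final Tracer tracer;

        public OrderRepository(Tracer tracer) {
            this.tracer = tracer;
        }

        public void runQuery(String query) {
            // Create a new span named "database query"
            Span span = tracer.spanBuilder("database query").startSpan();
            try (Scope scope = span.makeCurrent()) {
                // Simulate the time a real database call would take
                Thread.sleep(100);
                // Record the query string and mark the span as successful
                span.setAttribute("db.statement", query);
                span.setStatus(StatusCode.OK);
            } catch (InterruptedException e) {
                // Record the failure on the span
                span.setStatus(StatusCode.ERROR, "database query interrupted");
                Thread.currentThread().interrupt();
            } finally {
                // Complete the span so it can be exported
                span.end();
            }
        }
    }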

This code does a few things that are applicable to our OpenTelemetry configuration. First, we use tracer.spanBuilder(“database query”).startSpan() to create a new span named “database query.” Then, we use span.makeCurrent() to ensure that this span is active within the try block.

Within the try block, we use Thread.sleep() to simulate a database command. Then we call span.setAttribute(“db.statement”, query) to record the query string and set the span status to OK if successful. If the operation causes an error, you’ll see in the catch block that we call span.setStatus again, this time passing an error status and description to be recorded. Finally, span.end() completes the span.

In our example above, this basic instrumentation captures the execution time of the database query and provides context through the query string and status. You can use this pattern to manually instrument various operations in your application, such as HTTP requests, message queue interactions, and function calls.

Leveraging vFunction for architectural insights

Combining the detailed traces from an OpenTelemetry implementation like the one above with vFunction’s architectural analysis gives you a complete view of your application’s performance and architecture. For example, if you find a slow database query through OpenTelemetry, vFunction can help you understand the service dependencies around that database and potentially reveal architectural bottlenecks causing the latency.

To integrate OpenTelemetry tracing data with vFunction:

1. Configure the OpenTelemetry collector:

NOTE: If you’re not using a collector, you can skip this step and send the data directly to vFunction.

  • Configure the OpenTelemetry Collector to export traces to vFunction.
  • Example collector configuration:
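
A hedged sketch of what such a configuration might look like, using an OTLP receiver and the Collector’s otlphttp exporter; the vFunction endpoint shown is a placeholder, so use the address from your vFunction server’s installation instructions:

    receivers:
      otlp:
        protocols:
          grpc:
          http:

    exporters:
      otlphttp:
        # Placeholder endpoint - use the value provided by your vFunction server UI
        endpoint: http://<vfunction-server>:<port>

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]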

2. Configure your service to include the required vFunction trace headers:

Each service that should be included in the analysis needs to send its trace data to either vFunction or the collector. As part of this, a trace header must be added so vFunction knows which distributed application the service is associated with. After you create the distributed application in the vFunction server UI, it provides instructions on how to export telemetry data to the collector or to vFunction directly. One way this can be done is via the Java command line, as in the sketch below:
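
For illustration only, such a command might use the standard OpenTelemetry agent properties for the endpoint and headers; the host, header name, and <app header UUID> are placeholders that come from the vFunction server UI’s installation instructions:

    java -javaagent:path/to/opentelemetry-javaagent.jar \
         -Dotel.exporter.otlp.endpoint=http://<collector-or-vfunction-host>:4318 \
         -Dotel.exporter.otlp.headers=<vfunction-header-name>=<app header UUID> \
         -jar myservice.jar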

Other examples are provided in the server UI. The <app header UUID> above refers to a unique ID that’s provided in the vFunction server UI to associate a service with the application. The ID can be easily found by clicking on the “Installation Instructions” in the server UI and following the instructions.

3. Verify the integration:

  • Generate some traces in your application.
  • Check the vFunction server UI, and you should see the number of agents and tags increase as more services begin to send telemetry information. Click on “START” at the bottom of the interface to begin the analysis of the telemetry data. As data is received, vFunction’s UI will begin to visualize and analyze the incoming trace data.

By following these steps and integrating vFunction with your observability toolkit, you can effectively instrument your application and gain deeper insights into its performance and architecture.

Best practices for OpenTelemetry tracing

Implementing OpenTelemetry tracing effectively involves more than just adding instrumentation. As with most technologies, there are good, better, and best ways to implement OpenTelemetry. To get the full value from your tracing data, consider the following key points when implementing and using OpenTelemetry in your applications:

Semantic conventions

Adhere to OpenTelemetry’s semantic conventions for naming spans, attributes, and events. Consistent naming ensures interoperability and makes it easier to analyze traces.

For example, if you are creating a span for an HTTP call, add the relevant details as span attributes, using the key-value pairs the conventions define. This might look something like this:
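
As a sketch (the span name and values are illustrative, and the exact attribute keys depend on the semantic-conventions version you target; older releases used http.method, http.url, and http.status_code):

    Span span = tracer.spanBuilder("GET /api/orders").startSpan();
    try (Scope scope = span.makeCurrent()) {
        // HTTP details recorded as key-value span attributes
        span.setAttribute("http.request.method", "GET");
        span.setAttribute("url.full", "https://shop.example.com/api/orders");
        span.setAttribute("http.response.status_code", 200);
    } finally {
        span.end();
    }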

The documentation on the OpenTelemetry website provides a detailed overview of the recommended semantic conventions, which are certainly worth exploring.

Efficient context propagation

Use efficient context propagation mechanisms to link spans across services with minimal overhead. OpenTelemetry supports various propagators, such as W3C TraceContext, the default propagator specification used with OpenTelemetry.

To configure the W3C TraceContext propagator in Java, do the following:
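
A minimal sketch, reusing the tracerProvider from the earlier SDK setup; W3C TraceContext is already the default in the Java SDK, so this simply makes the choice explicit:

    import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
    import io.opentelemetry.context.propagation.ContextPropagators;
    import io.opentelemetry.sdk.OpenTelemetrySdk;

    // Part of your SDK initialization code
    OpenTelemetrySdk sdk = OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)  // from the SDK setup shown earlier
        .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
        .buildAndRegisterGlobal();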

If you need to go beyond the default, the OpenTelemetry docs on context propagation have extensive information to review, including the Propagators API.

Tail-based sampling

When it comes to tracing, be aware of different sampling strategies that can help to manage data volume while retaining valuable traces. One method to consider is tail-based sampling, which makes sampling decisions after a trace is completed, allowing you to keep traces based on specific characteristics like errors or high latency.

tail sampling processor

To implement tail-based sampling, you can configure it in the OpenTelemetry Collector or directly in the backend. More information on the exact configuration can be found within the OpenTelemetry docs on tail-based sampling.
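
As a rough illustration, a tail-sampling setup in the Collector (this processor ships in the contrib distribution, and the thresholds here are arbitrary examples) might keep only failed or slow traces:

    processors:
      tail_sampling:
        decision_wait: 10s
        policies:
          - name: keep-errors
            type: status_code
            status_code: { status_codes: [ERROR] }
          - name: keep-slow-requests
            type: latency
            latency: { threshold_ms: 500 }

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [tail_sampling]
          exporters: [otlphttp]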

Adhering to these best practices and incorporating auto-instrumentation as appropriate can enhance the efficiency and effectiveness of your OpenTelemetry tracing, yielding valuable insights into your application’s performance.

Troubleshooting and optimization

 Although OpenTelemetry provides extensive data, effectively troubleshooting and optimizing your application requires understanding how to leverage this information. Here are strategies for using traces to identify and resolve issues:

Recording errors and events

When an error occurs, you need to capture relevant context. OpenTelemetry allows you to record exceptions and events within your spans, providing more information for debugging.

In Java, for example, you can add tracing so that error conditions, such as those caught in a try-catch block, are captured correctly. In your code, it may look something like this:
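
A minimal sketch, where processOrder() and orderId stand in for any business logic that might throw:

    Span span = tracer.spanBuilder("process order").startSpan();
    try (Scope scope = span.makeCurrent()) {
        processOrder(orderId);  // hypothetical operation that may fail
    } catch (Exception e) {
        // Mark the span as failed and attach the exception details to it
        span.setStatus(StatusCode.ERROR, e.getMessage());
        span.recordException(e);
    } finally {
        span.end();
    }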

This code snippet sets the span status to ERROR, records the exception message, and attaches the entire exception object to the span. Thus, you can see not only that an error occurred but also the specific details of the exception, which can be extremely helpful in debugging and troubleshooting. You can also use events to log important events within a span, such as the start of a specific process, a state change, or a significant decision point within a branch of logic.

Performance monitoring with traces

Traces are also invaluable for identifying performance bottlenecks. By examining the duration of spans and the flow of requests, you can pinpoint slow operations or services causing performance issues and latency within the application. Most tracing backends that work with OpenTelemetry already provide tools for visualizing traces, filtering by various criteria (e.g., service, duration, status), and analyzing performance metrics.

opentelemetry tracing complexity
vFunction uses OpenTelemetry tracing to reveal the complexity behind a single user request, identifying overly complex flows and potential bottlenecks, such as the OMS service (highlighted in red), where all requests are routed through a single service.

vFunction goes beyond performance analysis by correlating trace data with architectural insights. For example, if you identify a slow service through OpenTelemetry, vFunction can help you understand its dependencies, resource consumption, and potential architectural bottlenecks contributing to the latency, providing deep architectural insights that traditional performance-based observability tools don’t reveal.

Pinpoint and resolve issues faster

By combining the detailed information from traces with vFunction’s architectural analysis, you can reveal hidden dependencies, overly complex flows, and architectural anti-patterns that impact the resiliency and scalability of your application. Pulling tracing data into vFunction to support deeper architectural observability empowers you to:

  • Isolate the root cause of errors: Follow the request path to identify the service or operation that triggered the error.
  • Identify performance bottlenecks: Pinpoint slow operations or services that are causing delays.
  • Understand service dependencies: Visualize how services interact and identify potential areas for optimization.
  • Verify fixes: After implementing a fix, use traces to confirm that the issue is resolved and performance has improved.

OpenTelemetry tracing, combined with the analytical capabilities of platforms like vFunction, empowers you to troubleshoot issues faster and optimize your application’s performance more effectively.

Next steps with vFunction

OpenTelemetry tracing provides a powerful mechanism for understanding the performance and behavior of your distributed applications. By instrumenting your code, capturing spans and traces, and effectively analyzing the data, you can identify bottlenecks, troubleshoot errors, and optimize your services.

Discover how vFunction can transform your application development.
Start Free Trial

Navigating complexity: Overcoming challenges in microservices and monoliths with vFunction

nenad session presentation

We’re excited to have Nenad Crncec, founder of Architech, writing this week’s blog post. With extensive experience in addressing architectural challenges, Nenad shares valuable insights and highlights how vFunction plays a pivotal role in overcoming common stumbling blocks. Take it away, Nenad!


In my journey through various modernization projects, one theme that consistently emerges is the challenge of managing complexity—whether in microservices and distributed systems or monolithic applications. Complexity can be a significant barrier to innovation, agility, and scalability, impacting an organization’s ability to respond to changing market demands.

Complexity can also come in many forms: complex interoperability, complex technology implementation (and maintenance), complex processes, and more.

“Complex” is something we can’t clearly understand – it is unpredictable and unmanageable because of its multifaceted nature and the interactions between components.

Imagine trying to assemble flat-pack furniture without instructions, in the dark, while wearing mittens. That’s complexity for you.

complexity in software architecture

What is complexity in software architecture? 

Complexity, in the context of software architecture and system design, refers to the degree of intricacy and interdependence within a system’s components and processes. It encompasses how difficult it is to understand, modify, and maintain the system. Complexity arises from various factors, including the number of elements in the system, the nature of their interactions, the technologies used, and the clarity of the system’s structure and documentation. 

Complexity also arises from two additional factors, even more impactful – people and time – but that is for another article.

complexity different architectures
Complexity creates all sorts of challenges across different types of architectures.

The double-edged sword of microservices

I recently assisted a company in transitioning from a monolithic architecture to microservices. The promise of microservices—greater flexibility, scalability, and independent deployability—was enticing. Breaking down the application into smaller, autonomous services allowed different teams to work concurrently, accelerating development. 

Allegedly.

While this shift offered many benefits, it also led to challenges such as:

  • Operational overhead: Managing numerous services required advanced orchestration and monitoring tools. The team had to invest in infrastructure and develop new skill sets to handle containerization, service discovery, and distributed tracing. DevOps and SRE roles were spawned as part of the agile transformation, and a once complex environment…remained complex.
  • Complex inter-service communication: Ensuring reliable communication between services added layers of complexity. Network latency, message serialization, and fault tolerance became daily concerns. Add to that the communication (or lack thereof) between the teams building services that need to work together, and you have a recipe for disaster if not managed and governed properly.
  • Data consistency issues: Maintaining consistent data across distributed services became a significant concern. Without clear data governance, the simplest of tasks can become epic sagas of “finding and understanding data.”

And then there were the people—each team responsible for their own microservice, each with their own deadlines, priorities, and interpretations of “RESTful APIs.” Time pressures only added to the fun, as stakeholders expected the agility of microservices to translate into instant results.

Despite these challenges, the move to microservices was essential for the company’s growth. However, it was clear that without proper management, the complexity could outweigh the benefits.

The hidden complexities of monolithic applications

On the other hand, monolithic applications, often the backbone of legacy systems, tend to accumulate complexity over time. I recall working with an enterprise where the core application had evolved over years, integrating numerous features and fixes without a cohesive architectural strategy. The result was a massive codebase where components were tightly coupled, making it difficult to implement changes or updates without unintended consequences.

This complexity manifested in several ways:

  • Slower development cycles: Even minor changes required extensive testing across the entire application.
  • Inflexibility: The application couldn’t easily adapt to new business requirements or technologies.
  • High risk of errors: Tightly coupled components increased the likelihood of bugs when making modifications.

But beyond the code, there were people and time at play. Teams had changed over the years, with knowledge lost as developers, business analysts, sysadmins, software architects, engineers, and leaders moved on. Institutional memory was fading, and documentation was, well, let’s say “aspirational.” Time had turned the once sleek application into a relic, and people—each with their unique coding styles and architectural philosophies—had added layers of complexity that no one fully understood anymore.

people in complexity equation
As people leave organizations, institutional memory fades and teams are left with apps no one understands.

Adding people and time to the complexity equation

It’s often said that technology would be simple if it weren’t for people and time. People bring creativity, innovation, and, occasionally, chaos. Time brings evolution, obsolescence, and the ever-looming deadlines that keep us all on our toes.

In both monolithic and microservices environments, people and time contribute significantly to complexity:

  • Knowledge silos: As teams change over time, critical knowledge can be lost. New team members may not have the historical context needed to make informed decisions, leading to the reinvention of wheels—and occasionally square ones.
  • Diverging priorities: Different stakeholders have different goals, and aligning them is like trying to synchronize watches in a room full of clocks that all think they’re the master timekeeper.
  • Technological drift: Over time, technologies evolve, and what was cutting-edge becomes legacy. Keeping systems up-to-date without disrupting operations adds another layer of complexity.
  • Cultural differences: Different teams may have varying coding standards, tools, and practices, turning integration into an archaeological expedition.

Addressing complexity with vFunction

Understanding the intricacies of both monolithic and microservices architectures led me to explore tools that could aid in managing and reducing complexity. One such tool is vFunction, an AI-driven architectural observability platform designed to facilitate the decomposition of monolithic applications into microservices and to observe the behaviour and architecture of distributed systems.

Optimizing microservices architectures

In microservice environments (distributed systems), vFunction plays an important role in deciphering complexity:

  • Identifying anti-patterns: The tool detects services that are overly chatty, indicating that they might be too granular or that boundaries were incorrectly drawn. Think of it as a polite way of saying, “Your services need to mind their own business a bit more.”
  • Performance enhancement: By visualizing service interactions, we could optimize communication paths and reduce latency. It’s like rerouting traffic to avoid the perpetual construction zone that is Main Street.
  • Streamlining dependencies: vFunction helps us clean up unnecessary dependencies, simplifying the architecture. Less is more, especially when “more” equals “more headaches.”
understand and structure microservices
vFunction helps teams understand and structure their microservices, reducing unnecessary dependencies.

How vFunction helps with monolithic complexity

When dealing with complex monolithic systems, vFunction can:

  • Automate analysis: vFunction scans the entire system while running, identifying dependencies and clustering related functionalities. This automated analysis saved countless hours that would have been spent manually tracing code. It was like having a seasoned detective sort through years of code crimes.
  • Define service boundaries: The platform suggested logical partitions based on actual usage patterns, helping us determine where natural service boundaries existed. No more debates in meeting rooms resembling philosophical symposiums.
  • Prioritize refactoring efforts: By highlighting the most critical areas for modernization, vFunction allowed us to focus on the components that would deliver the most significant impact first. It’s amazing how a clear priority list can turn “we’ll never finish” into “we’re making progress.”
architecture is important
Does your organisation manage architecture? How are things built, maintained, planned for the future? How does your organisation treat architecture? Is it part of the culture?

Bridging the people and time gap with vFunction

One of the unexpected benefits of using vFunction is its impact on the people side of the equation:

  • Knowledge transfer: The visualizations and analyses provided by the tool help bring new team members up to speed faster than you can say “RTFM.”
  • Unified understanding: With a common platform, teams have a shared reference point, reducing misunderstandings that usually start with “I thought you meant…”
  • Accelerated timelines: By adopting vFunction in the modernization process, we met tight deadlines without resorting to the classic solution of adding more coffee to the project.

Practical use case and lessons learned

Now that this is said and done, there are real-life lessons that you should take to heart (and brain…).

Does your organisation manage architecture? How are things built, maintained, planned for the future? How does your organisation treat architecture? Is it part of the culture?

Every tool is useless if it is not used.

In the project where we transitioned a large European bank to microservices, using vFunction (post-reengineering) provided teams with fine-tuned architecture insights (see video at the top of this blog). We analyzed both “monolithic” apps and “distributed” apps with microservices. We identified multi-hop and cyclic calls between services, god classes, dead code, high-complexity classes… and much more.

We used the initial measurements and created a target architecture based on them. vFunction showed us where complexity and coupling lie and how they impact the architecture.

vfunction todos to tackle issues
vFunction creates a comprehensive list of TODOs which are a guide to start tackling identified issues.

One major blocker is not treating architecture as a critical artifact under team ownership. Taking care of architecture “later” is like building a house, walls and everything, and deciding after the fact where the living room is, where the bathroom goes, and how many doors and windows we need. That kind of approach will not make a family happy or a home safe.

unsafe house

Personal reflections on using vFunction

“What stands out to me about vFunction is how it brings clarity to complex systems. It’s not just about breaking down applications but understanding them at a fundamental level. This comprehension is crucial for making informed decisions during modernization.”

In both monolithic and microservices environments, vFunction’s architectural observability provided:

  • Visibility: A comprehensive view of the application’s structure and interdependencies.
  • Guidance: Actionable insights that informed our architectural strategies.
  • Efficiency: Streamlined processes that saved time and resources.

Conclusion: Never modernize again

Complexity in software architecture is inevitable, but it doesn’t have to be an insurmountable obstacle. Whether dealing with the entanglement of a monolith or the distributed nature of microservices, tools like vFunction offer valuable assistance.

By leveraging platforms such as vFunction, organizations can:

  • Reduce risk: Make changes with confidence, backed by data-driven insights.
  • Enhance agility: Respond more quickly to business needs and technological advancements.
  • Promote innovation: Free up resources to focus on new features and improvements rather than wrestling with complexity.

From my experiences, embracing solutions that tackle architectural complexity head-on is essential for successful modernization. And more than that, it is a tool that should help us never modernize again, by continually monitoring architectural debt and drift, helping us to always keep our systems modern and fresh. It’s about empowering teams to understand their systems deeply and make strategic decisions that drive growth.

Take control of your microservices, macroservices, or distributed monoliths with vFunction
Request a Demo

vFunction recognized as a 2024 Gartner® Cool Vendor

gartner cool vendor

Major alert: vFunction was just named a 2024 Gartner Cool Vendor in AI-Augmented Development and Testing. We’re incredibly grateful and proud of this recognition. We’re also excited about the opportunity to share our platform on a larger scale and bring architectural observability to more enterprises. 

According to the “Gartner Cool Vendors™ in AI Augmented Development and Testing for Software Engineering” report, “As the codebase and architectural complexity grow, the processing power required to handle local builds escalates significantly. Many organizations struggle to equip their software engineers with the necessary tools to meet the increasing demand for faster delivery from idea to production, impacting overall productivity and efficiency.” Organizations have long struggled to fully grasp the complexity of their application architectures as they evolve throughout the SDLC. Enterprises juggle all types of application architectures: modular monoliths, distributed monoliths, miniservices, microservices, and more, forcing tradeoffs between agility and complexity. Traditional approaches—relying on manual code reviews, fragmented documentation, and institutional knowledge—have proven largely inadequate for identifying and addressing architectural risks and prioritizing necessary fixes at today’s speed of business. The result is an architectural blind spot that has significantly impeded modernization efforts and led to mounting technical debt – particularly architectural technical debt – as well as unrealized revenue potential in the billions.

To remediate technical debt effectively, Gartner recommends that “organizations use architectural observability tools to thoroughly analyze software architecture, identify inconsistencies, and gain deeper insights.”

At vFunction, we see software architecture as a critical but often underutilized driver of business success. We believe being recognized as a Gartner Cool Vendor validates our innovative approach to empowering engineering teams to innovate faster, address resiliency earlier, build smarter, and create scalable applications that change the trajectory of their business. With our AI-driven architectural observability platform, teams are equipped with valuable insights to find and fix unnecessary complexity and technical debt across large, complex applications and modern, highly distributed microservices throughout the organization. Software teams use the platform to understand their applications, identify the sources of technical debt, and find refactoring opportunities to enhance scalability, resiliency, and engineering velocity.

Five reasons why we believe vFunction was recognized as a Cool Vendor.

Architectural observability plays a key role in managing the complexities of modern software development. Gartner states that, “By 2027, 80% of software engineering groups will monitor software architecture complexity and architecture technical debt in near real time, up from less than 10% today.” We feel we’re at the forefront of this trend, providing the tools necessary to meet this growing need.

By vigilantly monitoring architectural technical debt and drift across the entire application portfolio, our solution equips software engineering leaders and their teams with the insights necessary to make informed decisions. Here’s why we believe vFunction stands out:

  1. AI-powered. vFunction’s architectural observability platform understands and visualizes application architecture to reduce technical debt and complexity.
  2. Find and fix technical debt. vFunction uses extensive data to identify and remediate architectural technical debt across the entire application portfolio.
  3. Shift left. Address the root causes of technical debt to prevent performance issues before they arise using vFunction’s patented methods of static and dynamic analysis.
  4. Prioritize and alert. vFunction incorporates a prioritized task list into every sprint to fix key technical debt issues, based on your unique business goals.
  5. Any architecture. The platform relies on OpenTelemetry to support a wide spectrum of programming languages in the distributed world, plus Java and .NET for monolithic architectures, so you can use it for a variety of use cases, from monoliths to distributed microservices, including refining microservices and considering modular monoliths.

Organizations face immense pressure to deliver high-quality software rapidly, stay competitive, and pivot quickly in response to market demands. The rapid accumulation of technical debt exacerbates these challenges, hampering engineering velocity, limiting application scalability, and impacting resiliency. This often results in increased risks of outages, delayed projects, and missed opportunities. 

Ready to put the freeze on software complexity and mounting technical debt? Let us partner with you to unlock the full potential of your software architecture. Contact us today to learn how vFunction can be an indispensable asset in transforming your software development practices.

Gartner, Inc. Cool Vendors in AI-Augmented Development and Testing for Software Engineering. Tigran Egiazarov, Philip Walsh, et al. 8 August 2024.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Cool Vendors is a registered trademark of Gartner, Inc. and/or its affiliates and is used herein with permission. All rights reserved.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.