Soft vs. Hard Dependency
A Better Way to Think About Dependencies for More Reliable Systems
Hello! Today, we’re exploring a key aspect of distributed systems: how to think about dependencies between components and why it matters for reliability.
Introduction
When we build a system composed of multiple components (e.g., databases, services, caches), it’s important to understand the dependency graph. For example, a service might depend on:
A database to store data
A messaging layer to exchange information
A cache to reduce latency
Having a clear understanding of the dependencies in a system helps us maintain it more efficiently. But there's one question we often overlook: Are these dependencies soft or hard?
Soft dependency: One that is non-critical for the service to operate properly.
Hard dependency: One that is critical for the service to operate properly.
“Operate properly” in this context means, for example, that a service responds to requests, doesn’t lose data, and maintains an acceptable level of performance. In short, the service works reliably.
Two examples to illustrate the concept of soft and hard dependencies:
A recommendation service is a soft dependency for a video platform. If it’s down, users can still watch videos, just without personalized suggestions.
An authentication service is a hard dependency for a system that requires users to log in. If authentication is down, users can’t access the system.
Why It Matters
Understanding the type of dependency helps us make the right decisions:
Reliability expectations:
Soft: A high reliability target may not be necessary. Going back to the recommendation service of a video platform, it doesn’t need five-nines availability (99.999%) if it isn’t on the critical user journey.
Hard: A hard dependency must match or even exceed the reliability of the dependent service. Without a fallback, our service can’t be more available than a dependency it requires. If a critical backend is only available 99.5% of the time but our own SLO is 99.9%, we have a structural problem. Setting the right expectation for a hard dependency is critical.
Fault-tolerance strategy:
Soft: If the dependency is unavailable, we don’t need an elaborate fallback. We can let the related feature degrade gracefully and wait for the dependency to come back, as long as its failure can’t block our own requests.
Hard: If the dependency is unavailable, so is our service, unless we have planned for it. We need a strategy, such as a well-designed fallback, to keep our service running.
Observability and alerting:
Soft: Observability is still important, but alerts can often have a lower priority or be routed differently.
Hard: The dependency must be tightly monitored. Failures and even minor degradations, such as latency spikes, rising error rates, or availability dips, must be tracked continuously.
Rollout and change management:
Soft: Changes can be managed with more flexibility. Rollout may not require tight coordination or strict sequencing, and temporary failures might be acceptable.
Hard: Rollouts become delicate operations. We often need tight orchestration between teams, version compatibility checks, gradual rollouts with validation at each step, and well-tested rollback mechanisms. Any mistake could trigger a production incident.
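To put numbers on the reliability expectation discussed above, here is a small back-of-the-envelope sketch. It assumes failures are independent and that we have no fallback, so it's an upper bound rather than a precise model. The function names and the 30-day window are choices made for this example.

```python
def max_availability(hard_dependency_availabilities):
    """Upper bound on our availability if every hard dependency must be up.

    Assumes independent failures and no fallback, so the bound is the
    product of the dependencies' availabilities.
    """
    bound = 1.0
    for availability in hard_dependency_availabilities:
        bound *= availability
    return bound


def allowed_downtime_minutes(availability, days=30):
    """Downtime budget, in minutes, over a window of `days` days."""
    return (1.0 - availability) * days * 24 * 60


slo = 0.999      # our own objective
backend = 0.995  # availability of the critical backend

print(f"Our SLO budget:   {allowed_downtime_minutes(slo):.1f} min / 30 days")
print(f"Backend downtime: {allowed_downtime_minutes(backend):.1f} min / 30 days")
print(f"Best case for us: {max_availability([backend]):.3%}")
```

Over 30 days, a 99.9% SLO leaves us roughly 43 minutes of downtime, while a 99.5% backend can be down for about 216 minutes. The backend alone can spend our budget about five times over.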
Soft or Hard Dependency?
Classifying a dependency isn’t always obvious.
In some cases, it’s fairly straightforward. For example, if a REST endpoint requires a database query, that database is a hard dependency. But gray areas are fairly common, for example:
A service can run without a certain dependency at runtime, but it still needs that dependency at startup to initialize. In this case, the dependency is hard from an operational point of view. If it’s down during a deploy or a scale-out, we can’t even get the service running.
A service calls a soft dependency, but the RPC call has no timeout or fallback. If the dependency becomes unresponsive, the latency of our service spikes, possibly exhausting thread pools or request queues. What was supposed to be a soft dependency now puts the entire system at risk.
These are examples of soft dependencies not handled correctly, turning into hard ones in practice. Whether a dependency is technically optional doesn't matter if the failure of this dependency ends up blocking our service.
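As an illustration of the second case, here is a minimal sketch of a call to a soft dependency that is bounded by a timeout and falls back to a default value. It only uses the Python standard library. `call_soft_dependency`, `slow_recommendations`, and `video_page` are hypothetical names, and the timeout values are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time

# A small, bounded pool: if the dependency hangs, at most 4 threads are
# stuck, and later calls time out while still waiting in the queue.
_pool = ThreadPoolExecutor(max_workers=4)


def call_soft_dependency(fetch, fallback, timeout_s=0.2):
    """Call `fetch()` but never wait longer than `timeout_s` seconds.

    On timeout or error, return `fallback` instead of failing the request.
    """
    future = _pool.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # only effective if the call has not started yet
        return fallback
    except Exception:
        return fallback


def slow_recommendations():
    time.sleep(0.5)  # simulates an unresponsive dependency
    return ["personalized"]


def video_page():
    recs = call_soft_dependency(slow_recommendations, fallback=[], timeout_s=0.05)
    return {"video": "ok", "recommendations": recs}


print(video_page())
```

The page is served without recommendations instead of waiting on the slow call. Note that the timeout only bounds how long we wait: the underlying call keeps running in its worker thread, which is why the pool itself is kept small.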
In many systems, identifying these cases is not trivial. Approaches like deliberately breaking dependencies or introducing hazardous conditions (e.g., random network delays) can help reveal which dependencies are truly non-critical and which ones only appear to be.
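Here is a minimal sketch of what such an experiment could look like in a test environment: a wrapper that randomly fails or delays calls to a dependency, so we can check whether our service still answers. Everything here (`with_faults`, `InjectedFault`, `handle_request`) is hypothetical. Real setups usually inject faults at the network or proxy level rather than in application code.

```python
import random
import time


class InjectedFault(Exception):
    """Raised instead of calling the real dependency."""


def with_faults(call, failure_rate=0.0, max_delay_s=0.0, rng=None):
    """Wrap `call` so it randomly fails or is delayed.

    failure_rate: probability in [0, 1] of raising InjectedFault.
    max_delay_s:  upper bound of a uniform random delay added before the call.
    """
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        if max_delay_s > 0:
            time.sleep(rng.uniform(0, max_delay_s))
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return call(*args, **kwargs)

    return wrapped


def handle_request(get_recommendations):
    """Hypothetical handler: recommendations are meant to be optional."""
    try:
        recs = get_recommendations()
    except Exception:
        recs = []
    return {"video": "ok", "recommendations": recs}


# Experiment: break the dependency entirely and check that we still answer.
broken = with_faults(lambda: ["a", "b"], failure_rate=1.0)
assert handle_request(broken)["video"] == "ok"
```

If the assertion fails, or the handler hangs once delays are injected, the dependency we believed to be soft is hard in practice.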
Evolution Over Time
To make things even more complex, we need to keep in mind that the type of a dependency is not set in stone. A dependency that starts as soft can easily turn into a hard one over time.
Let’s consider a service that reads data from a database. We introduce a cache to reduce latency. Initially, this cache is a soft dependency. If it goes down, we fall back to the database, which results in an acceptable latency increase.
Yet, as traffic grows, the service begins to rely on the cache not just for latency but for throughput. At some point, if the cache becomes cold and every request hits the database, the database may no longer be able to handle the load.
In this example, the cache was a soft dependency, but it became a hard one due to changes in system conditions (more traffic).
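A rough way to spot this shift is to compare the traffic the database would receive with a cold cache against what it can sustain. The sketch below assumes a simple read-through cache where every miss results in one database query. All the numbers are made up for illustration.

```python
def database_load(request_rate, cache_hit_ratio):
    """Requests per second reaching the database behind a read-through cache."""
    return request_rate * (1.0 - cache_hit_ratio)


def cache_is_hard_dependency(request_rate, database_capacity):
    """True if the database can't absorb the traffic with a cold cache."""
    cold_load = database_load(request_rate, cache_hit_ratio=0.0)
    return cold_load > database_capacity


DATABASE_CAPACITY = 2_000  # requests per second the database can sustain

# At launch: even with a cold cache, the database copes. The cache is soft.
print(cache_is_hard_dependency(1_000, DATABASE_CAPACITY))

# After growth: with a 90% hit ratio the database only sees about 1,000
# requests per second, but a cold cache would send it all 10,000.
print(database_load(10_000, cache_hit_ratio=0.9))
print(cache_is_hard_dependency(10_000, DATABASE_CAPACITY))
```

Nothing changed in the code of the service between the two scenarios. Only the traffic did, which is why this kind of check is worth repeating as the system grows.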
This evolution (from soft to hard) is, unfortunately, much more common than the reverse. Without active and continuous maintenance effort, it’s fairly common for a soft dependency to silently turn into a hard one.
Improving Reliability
On the other hand, with active and continuous effort, it’s possible to turn a hard dependency into a soft one. One effective approach is to design a fallback strategy that makes the dependency’s downtime essentially invisible.
Designing a solid fallback is anything but simple (we’ll explore this in a future post). However, one principle stands out: fallbacks need to be tested, and they need to be tested continuously. A fallback that hasn’t been exercised in months isn’t a fallback. It’s dead code.
Once we’ve reached a point where the dependency can go down and users don’t notice, then the dependency is soft. Turning hard dependencies into soft ones is one of the most effective ways to improve the reliability of a system.
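One possible way to keep a fallback exercised, sketched below, is to deliberately route a small fraction of calls through it even when the primary dependency is healthy. This is only an illustration, not a recommendation for every system: the `WithFallback` class, its `exercise_ratio` parameter, and the "last good value" strategy are all assumptions made for this example, and serving a stored value is only acceptable when slightly stale data is.

```python
import random


class WithFallback:
    """Serve from `primary`, fall back to the last good value on failure.

    `exercise_ratio` forces the fallback path on a fraction of calls, so the
    fallback keeps being exercised even while the primary is healthy.
    """

    def __init__(self, primary, default, exercise_ratio=0.01, rng=None):
        self._primary = primary
        self._last_good = default
        self._exercise_ratio = exercise_ratio
        self._rng = rng or random.Random()
        self.fallback_calls = 0  # would be exported as a metric in practice

    def get(self):
        # Deliberately take the fallback path on a fraction of calls.
        if self._rng.random() < self._exercise_ratio:
            return self._fallback()
        try:
            value = self._primary()
        except Exception:
            return self._fallback()
        self._last_good = value
        return value

    def _fallback(self):
        self.fallback_calls += 1
        return self._last_good


config = WithFallback(lambda: {"theme": "dark"}, default={"theme": "light"})
print(config.get(), config.fallback_calls)
```

Tracking `fallback_calls` matters as much as the fallback itself: if the counter stays at zero for months, we have no evidence that the fallback still works.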
Conclusion
To manage dependencies effectively, we need to classify them as either soft or hard.
To avoid surprises, we must understand that soft dependencies can turn hard without warning, especially as systems scale.
To improve reliability, we should actively turn hard dependencies into soft ones using strategies like efficient fallbacks.
💬 Have you seen a soft dependency quietly become critical over time?