Today I’m excited to announce beta availability of WhiteSource Merge Confidence, a feature designed to save time and reduce risk when keeping dependencies up-to-date. Merge Confidence identifies and flags undeclared breaking releases based on analysis of test and release adoption data across WhiteSource Renovate’s early-adopting user base. The new feature was created to help users avoid the pain of un-mergeable Pull Requests or worse -- a broken dependency in production.
What Problem Are We Solving?
Keeping Open Source dependencies up-to-date is ideal, but until now many have found this challenging:
There’s a risk that a regression error in a dependency update could cause production problems
Manually reviewing each update to decrease that risk requires a massive amount of work
That’s why most projects today still don’t automate dependency updates, which leaves them open to another type of risk. Outdated dependencies become a liability when a vulnerability fix requires a major version leap, and in the meantime they carry unfixed bugs that impact user experience.
How Does Merge Confidence Change Things?
Merge Confidence makes choosing dependency automation a no-brainer, turning low-risk and low-touch dependency updating into a reality. It harnesses crowd intelligence about open source updates to provide us with more clarity about the reliability of a release, far exceeding what we normally get from individual projects, even with great test coverage. Better yet -- since Merge Confidence encourages wider adoption of dependency automation, the increased user base will in turn improve our ability to identify good and bad updates.
How Does It Work?
The WhiteSource Renovate App has enabled a diverse user base across github.com and gitlab.com to keep dependencies up-to-date since 2018, and generated millions of Pull Requests in the process. We found that by aggregating and analyzing metrics we already had, such as release age, release adoption, and Pull Request test results, we can accurately identify releases of open source packages that show signs of having undeclared breaking changes.
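To make the aggregation idea concrete, here is a minimal sketch of how per-release metrics could be rolled up from Pull Request records. The record fields, the function name, and the definition of adoption as "share of update Pull Requests that were merged" are all assumptions made for illustration, not Renovate's actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PullRequestRecord:
    """One update Pull Request for a given release (hypothetical shape)."""
    tests_passed: bool
    merged: bool


def summarize_release(
    records: List[PullRequestRecord], released: date, today: date
) -> Dict[str, Optional[float]]:
    """Aggregate release age, adoption, and test pass rate for one release."""
    age_days = float((today - released).days)
    total = len(records)
    if total == 0:
        # No Pull Requests seen yet, so the percentages are unknown.
        return {"age_days": age_days, "adoption_pct": None, "passing_pct": None}
    passing = sum(1 for r in records if r.tests_passed)
    merged = sum(1 for r in records if r.merged)
    return {
        "age_days": age_days,
        # Assumption: adoption = share of update Pull Requests that were merged.
        "adoption_pct": 100.0 * merged / total,
        "passing_pct": 100.0 * passing / total,
    }
```

The three numbers this returns correspond to the metrics named above: release age, release adoption, and Pull Request test results.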
Example 1: Broken minor release
Once we started analyzing package update data, we were surprised and delighted to find that the vast majority of Open Source releases did not fail any of our users’ project tests. While it was almost disappointing to not have broken examples jumping out at us instantly, that's great news for the npm ecosystem and for Merge Confidence as a concept. If we can accurately identify the bad releases, everyone can benefit from the 99.9% of updates that are good.
Here’s an example of one that jumped out, because it failed essentially everyone:
Sure enough, release 1.5.1 fixed it:
While this is satisfying to see, a better example is a release that fails only some tests. In such a case, knowing that a release failed a significant number of other users’ tests would be very useful, even if it passes your own. Here’s an example from “postcss”:
In the above example, it’s obvious that release 8.1.5 should be avoided. Checking in the repo reveals an issue confirming a fix in release 8.1.6. This is an example of the exact type of release that Merge Confidence was created to identify. It is, in theory, a non-breaking patch release, and it passes tests for most users. However, there is definitely something wrong, and everyone would be better off waiting until a subsequent release reaches high confidence.
Although we’ve enjoyed digging into some of the low-confidence releases above, keep in mind that the entire idea of this new capability is that you actually don’t need to do detailed research like we did in the above examples when deciding whether to upgrade -- you can simply trust Merge Confidence.
Example 2: A Passing Major Release
Without a doubt, reducing the risk of regression errors in non-major releases is the greatest gain from Merge Confidence. Another nice side effect has been identifying major updates that don’t break most projects. It’s very common for projects to assume major updates might require a lot of work and put them off indefinitely, so being able to update major releases that haven’t broken anybody is a great gain too. Take this example from a dependency we had put off updating even in the WhiteSource Renovate App’s source code:
Looking at the release notes, it was easy to see why it had passed everybody’s tests -- the major release was created to drop support for long-deprecated Node.js versions. Another way we can confirm this is by looking at the percentage of cases where projects needed to add commits to Renovate’s Pull Request before merging it. If nobody is adding commits, it reinforces our finding that the major update was compatible for them. If one or more projects needed additional commits, it’s a sign that changes were necessary, even if the original tests had passed.
Based on analysis of recent releases in the npm ecosystem, more than one-third of all major updates passed in excess of 90% of people’s tests. In fact 15% of all major updates even passed 100% of tests amongst Renovate users.
Edge Cases and Other Challenges
Merge Confidence will continue to improve and evolve over time. Initially, we will be conservative before we declare an update “high” or “low” confidence. The starting point we’ve chosen for updates is “neutral”, which covers scenarios like:
We don’t have enough users using this package to be able to be sure about it, OR
The release is not old enough to derive a confidence yet (for example, did you know that npm releases can be deleted/withdrawn within the first 72 hours?), OR
The percentage of test failures is enough to miss high confidence, but not so bad that we want to declare it “low” confidence.
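The three "neutral" scenarios above can be sketched as a small classification function. Every threshold below is a made-up placeholder chosen for illustration (only the 72-hour window comes from the text), and the function name is hypothetical. The real cut-offs are not stated in this post.

```python
# Hypothetical thresholds, for illustration only.
MIN_USERS = 20            # below this, not enough users to be sure
MIN_AGE_DAYS = 3          # npm releases can be withdrawn within the first 72 hours
HIGH_PASSING_PCT = 95.0   # at or above this, call it "high"
LOW_PASSING_PCT = 70.0    # below this, call it "low"


def confidence_level(users: int, age_days: int, passing_pct: float) -> str:
    """Return "low", "neutral", or "high" for one release."""
    if users < MIN_USERS:
        return "neutral"  # not enough users using this package
    if age_days < MIN_AGE_DAYS:
        return "neutral"  # release is not old enough yet
    if passing_pct >= HIGH_PASSING_PCT:
        return "high"
    if passing_pct < LOW_PASSING_PCT:
        return "low"
    # Too many failures for "high", but not bad enough for "low".
    return "neutral"
```

Checking the sample size and age first keeps the function conservative: a release with a perfect pass rate still stays "neutral" until enough users have tried it and enough time has passed.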
Another challenge is deriving confidence scores for large jumps of dependency versions, e.g. when you are a year out of date. First of all, we don’t have data stretching back years, but there is also always going to be some additional uncertainty when an update is this large. There may be one or more releases in between that “broke” users (i.e. with a low confidence score), but if they were fixed in the very next patch, then the low confidence release you skip over is immaterial. Our experience is that most package authors today don’t make breaking changes in non-major releases, and if they accidentally do, they usually correct the problem quickly.
For now, we will treat the Merge Confidence of a non-major update as being dependent on the “to version” and not on the “from version”. In other words, if a release is bad then it’s going to be bad for everyone updating to it, regardless of whether the update is a single patch or spans multiple minor versions, and vice versa. We will be looking for a way to identify when packages release a non-major update that includes breaking changes which are subsequently never reverted.
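As a rough sketch of what "dependent on the to version" means in practice, the lookup below ignores where an update starts from. The package name, versions, and confidence values are invented for illustration, and the function is hypothetical.

```python
# Hypothetical confidence data keyed by (package, target version).
CONFIDENCE = {
    ("example-pkg", "1.2.0"): "low",
    ("example-pkg", "1.2.1"): "high",
}


def update_confidence(package: str, from_version: str, to_version: str) -> str:
    """Look up confidence by target version only.

    from_version is accepted but deliberately unused: a bad release is
    treated as bad for everyone updating to it, whatever they start from.
    """
    return CONFIDENCE.get((package, to_version), "neutral")
```

With this data, updating from 1.0.0 to 1.2.1 and from 1.2.0 to 1.2.1 both get the same answer, and the low confidence 1.2.0 release in between does not affect either.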
In A World of Merge Confidence, Are Tests Still Necessary?
It’s true that projects without any tests can benefit from using Merge Confidence to reliably update dependencies, although I’d recommend being as conservative as you can and waiting for very high confidence. However, the reality is that you should have great tests in a project regardless of whether you are using Merge Confidence, because your project is going to need more testing than just its dependencies.
Beyond badges, we believe that the ultimate value of Merge Confidence will be realized once we support it as part of workflow decision making. For example:
Do not create an update Pull Request unless confidence is high
Automerge Pull Requests if confidence is very high
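The two workflow rules above could look something like the following sketch. The level names follow the wording in this post, but the enum, the action strings, and the function are assumptions, since these workflows are not yet available.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Ordered confidence levels, lowest to highest."""
    LOW = 0
    NEUTRAL = 1
    HIGH = 2
    VERY_HIGH = 3


def decide_action(confidence: Confidence) -> str:
    """Map a confidence level to a hypothetical workflow action."""
    if confidence >= Confidence.VERY_HIGH:
        return "automerge"   # automerge if confidence is very high
    if confidence >= Confidence.HIGH:
        return "create_pr"   # only create a Pull Request once confidence is high
    return "skip"            # low or neutral: wait for more data
```

Using an ordered enum keeps each rule a simple "at least this level" comparison.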
Even better, the addition of Merge Confidence will make aggressive grouping of updates better. Today, if you group updates together (e.g. weekly or monthly), there’s a risk that one bad update means the PR fails tests and you can’t update anything without spending precious time debugging. However, if your weekly or monthly updates contain only high confidence updates, the chances of that are significantly reduced. Naturally, we are also planning to support a filter where bundled PRs can be configured to only contain updates that both (a) pass your tests, and (b) have a high confidence score. This combination should reduce both the workload and the risk of staying up-to-date significantly.
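Here is a minimal sketch of that planned filter for bundled Pull Requests, keeping only updates that satisfy both (a) and (b). The `Update` shape and the function name are hypothetical, and treating "very high" as also satisfying "high" is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Update:
    """One candidate dependency update (hypothetical shape)."""
    package: str
    to_version: str
    tests_pass: bool   # (a) passes your own tests
    confidence: str    # (b) "low", "neutral", "high", or "very high"


def select_for_group(updates: List[Update]) -> List[Update]:
    """Keep only updates that pass tests and have high confidence."""
    accepted = {"high", "very high"}
    return [u for u in updates if u.tests_pass and u.confidence in accepted]
```

An update that fails either condition is simply left out of the group, so one bad update no longer blocks the rest.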
Beta Availability and Roadmap
In the first iteration of Merge Confidence, we’re starting with Merge Confidence badges like these:
Although we love the look of the full badge, screen width in Pull Requests is limited so we think people are more likely to use the slim badges within a table with a “Confidence” heading, as can be seen in earlier screenshots.
We’re also making badges available for age, test percent, and adoption percent. While we think long term users will migrate to simply trusting the confidence score in a pull request, many will appreciate the additional detail early on when they start out and are still experimenting with this new capability.
Enabling and Disabling
We plan to automatically enable Merge Confidence badges for all users of the WhiteSource Renovate App over the next 1-2 weeks, but any app or self-hosted users (including WhiteSource Remediate) can opt into the beta immediately.
For Renovate users of all types, if you add
to your Renovate config then you can experience it right away.
Similarly, if you’d prefer to disable it, add
to your config instead.
For WhiteSource Remediate users, please follow the instructions here.
Initially, anyone clicking on these badges in a Pull Request will be directed to this very long blog post, but we plan to soon put up a site that badges will deep link to, showing users all the details we have about an upgrade, including age, test results, and adoption.
We aim to introduce Merge Confidence workflows within the next 1-2 months and have them in beta during Q1/2021, at which time they will be free for all WhiteSource Renovate App users. After that, they’ll remain free for public and open source repositories as well as for all WhiteSource customers.
Platform and Language Support
Merge Confidence is enabled using Renovate’s existing presets capability, so it is available for all platforms (GitHub, GitLab, etc.) immediately. Initially, badges will be enabled only for npm packages, while others will be enabled once we are confident that everything’s going smoothly with npm.