The midScale project has recently finished. The project aimed to increase midPoint scalability, performance and manageability to support large and complex midPoint deployments. The project was a success! Yet, it was far from easy.
When midPoint started a decade ago, the primary target was a mid-size enterprise with thousands of identities to manage. It made perfect sense back then: that was the scale we could handle, from both a business and a technology perspective. However, the world is a different place now. Deployments reaching millions of managed identities and beyond are much more common. As our customers have changed, we have changed as well. MidPoint had to adapt to the new environment.
We have been working on midPoint performance improvements for years. The first results were delivered in 2018 when “Watt” was released. However, at that time, we fully realized that there was a component limiting midPoint's potential. The midPoint data storage layer (which we call the “repository”) was built in a generic way, supporting several database engines. However, every abstraction has its cost. Supporting many databases with the same code meant that we were doomed to mediocrity. It was very difficult to take advantage of any database-specific features. Every improvement we made had to be implemented and tested for all the supported databases. The effort was prohibitively high, and the results were somewhat disappointing. We realized that this was not the way to go.
The way forward was quite clear. As the support for many databases dragged us down, we had to specialize in a single database. The choice of the database engine was quite clear as well. MidPoint is an open source platform, therefore we had to choose an open source database. PostgreSQL was an obvious choice. The approach was clear as well. A decade ago, when midPoint was designed, we anticipated that we might need to rework our “repository” code. In fact, that had already happened once. Therefore, the plan was to do it again. This time, we would take full advantage of PostgreSQL features. We had everything we needed. Except for two little things, those two notorious troublemakers: time and money.
Fortune favors the prepared. In 2019 we came across NGI_TRUST. We had very little experience with European community funding, and coming from Eastern Europe, most of our experiences were quite negative. Therefore, we did not know what to expect. However, NGI_TRUST looked good, and we decided to submit a proposal. The proposal was accepted, and the MidPrivacy: Data Provenance Prototype project started. The project went well, and it was a success. After that, we were prepared for a bigger challenge. We took the chance and submitted a proposal for midScale. The proposal was not accepted immediately, and the committee kept us in suspense for quite some time. Fortunately, the proposal was accepted at last, and the project took off.
The project was a challenge from the beginning. For various reasons, we got the green light a month later than originally planned. This was a complication, as the original plan was to synchronize the project with the midPoint development cycle. Also, midScale was meant to be the very last project of the funding program, therefore our project had to be finished exactly on time, not a day later. This shook up the project plan at the very beginning of the project. Yet, due to the funding rules, we were not able to change the plan. This created a challenge that rolled through the entire project, from milestone to milestone. We added a few more people (including myself) to the project, completely funded by Evolveum, on top of the original budget. This helped to smooth out the project progress, and we were back on track. With a good deal of flexibility, management acrobatics, and a dash of personal heroism, we managed to keep things going according to plan.
Of course, the repository replacement was the most challenging part of the project. We never expected this to be easy. However, the amount of work still surprised us; it was more than we expected. More flexibility, management acrobatics and heroism did it, and in the end we had a brand-new, lemon-scented, native PostgreSQL repository implementation.
While the repository was a crucial part, it would not boost midPoint scalability just by itself. We have significantly improved (read: reworked beyond recognition) the management of distributed tasks, improving horizontal scalability. There were performance improvements in almost every part of midPoint, from the low-level data representation libraries all the way to the user interface. Error detection and handling were improved and many bugs were fixed, including those nasty multi-threading issues, improving robustness. MidPoint is a much more scalable, faster and more reliable system now.
However, much more than raw power is needed to run a large-scale identity management and governance deployment. Identity management is, quite obviously, all about management of identities. Therefore, we had to improve the manageability and overall visibility of midPoint. There are numerous diagnostic improvements in many parts of the system, most notably in the task management subsystem. A brand-new Axiom Query Language was designed and implemented, providing the ability to construct complex queries in a (reasonably) human-friendly way. The user interface was improved, providing a much better user experience. On top of the original project plan, there are improved dashboards and native reports. New connectors can be auto-loaded now, reducing downtime. Large midPoint deployments are much easier to manage than they were a year ago.
None of this would be possible without testing. We have had automated tests for ages. However, the tests mostly focused on functionality. There was only a handful of performance-oriented tests, and we could not do much more in our rudimentary testing environment. The design and buildup of the new testing environment was an essential activity in the midScale project. The environment turned out to be much better than we expected, yet it was also much harder to build. It took a lot of time, with several improvement rounds. This was supplemented with major improvements to Schrödinger, the framework for automated testing of the user interface. The midPoint user interface is quite a big and complicated piece, and Schrödinger was a crucial component to keep it in working condition. In the end, we got excellent testing results. It is officially confirmed that midPoint is much better now and ready for the future.
The midScale project was finished on time and with excellent results. Due to the management acrobatics, the project did not end with a midPoint release; the last milestone was a release candidate. There were still some bugfixes to do before midPoint could be released.
At last, midPoint 4.4 “Tesla” has been released. Tesla follows up on the Faraday release, which brought some results of the midScale project to the community. MidPoint 4.4 “Tesla” will be a major milestone in midPoint history. It is also a long-term support release, therefore Tesla will be with us for quite a long time.
The midScale project has been completed, yet the work continues. This is only the start. We will further improve midPoint in the following releases. There is also a lot of work on the business side, documentation, practices, and a lot of other things. Software development never ends.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement no 825618.