Data Provenance Prototype Is Finished

Development of the data provenance prototype is finished. The prototype will be part of the midPoint 4.2 release. This concludes the first phase of the midPrivacy initiative. There are interesting results, both practical and theoretical.

Metadata are data about data. That is the core of data provenance. However, metadata have a structure of their own, similar to the structure of ordinary data. The first problem was how to express that structure. None of the existing popular data modeling languages had any support for metadata. Therefore we had to invent our own language: Axiom. Creating a new language is a major task, and we considered all the options to avoid reinventing the wheel. But in the end, Axiom was the right way to go.

We have used Axiom to create metadata schemas. We have updated all of the midPoint core to support metadata. Metadata are stored in the repository, there are metadata mappings, and the value consolidation and reconciliation algorithms are fully metadata-aware. The midPoint user interface was extended to display value metadata.
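To make the idea of metadata-aware consolidation a bit more concrete, here is a minimal sketch in Python. It is not midPoint code and it is not Axiom. The names `Provenance`, `Value` and `consolidate` are hypothetical, made up for this illustration only. The sketch assumes that each value carries a list of metadata records, and that consolidating two equal values from different sources keeps the metadata records of both.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    """Hypothetical metadata record: where a value came from."""
    source: str    # name of the source system (illustrative)
    acquired: str  # ISO date when the value was acquired (illustrative)


@dataclass
class Value:
    """A data value that carries its own list of metadata records."""
    real_value: str
    metadata: list = field(default_factory=list)


def consolidate(values):
    """Merge values with an equal real_value, keeping every distinct metadata record."""
    merged = {}
    for v in values:
        # One consolidated Value per distinct real value, in order of first appearance.
        target = merged.setdefault(v.real_value, Value(v.real_value))
        for record in v.metadata:
            if record not in target.metadata:
                target.metadata.append(record)
    return list(merged.values())


incoming = [
    Value("Jack Sparrow", [Provenance("HR", "2020-09-01")]),
    Value("Jack Sparrow", [Provenance("CRM", "2020-10-15")]),
    Value("J. Sparrow", [Provenance("CRM", "2020-10-15")]),
]
result = consolidate(incoming)
```

In this sketch the value "Jack Sparrow" ends up with two metadata records, one for each source. A single value can therefore have more than one set of metadata, which is why the sketch stores them in a list rather than in a single field.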

If you want to see the results of our work, there is a recording from our workshop (and slides) that also includes a demo of the metadata functionality. All the other details can be found on the project page under the midPrivacy initiative. If the concept of metadata is new to you, then perhaps the Identity Metadata In A Nutshell story is a good place to start.

This project was really interesting and enlightening. Metadata are one of the fundamental building blocks for data protection functionality. But it is also an area that has not been completely explored yet. We have encountered a lot of challenges during the project. Some of them were expected, such as the difficulty of designing Axiom. But other challenges came entirely out of the blue, such as the metadata multiplicity problem. Some of these challenges may perhaps even be classified as discoveries. Anyway, we have dealt with them in one way or another. The prototype was a success in both ways: it uncovered hidden problems, and we have working code in the end.

The prototype code is now an integral part of midPoint. It will be released in midPoint 4.2, which is planned to happen soon. However, this is still a prototype. The entire metadata functionality is marked as experimental. The new implicit value metadata live alongside the old explicit metadata. The old metadata as we know them from midPoint 3.x are still there and they are fully supported. We have preferred compatibility and decided not to use the new experimental code until it is sufficiently stable. The new metadata functionality is part of midPoint, but it is turned off by default.

Most of the costs of this project were covered by European community funding, in the form of the NGI_TRUST initiative. We are more than thankful for this opportunity. I would like to thank the mentors, who were very helpful, especially given that this was our first “Europroject”. However, we felt that we had to go beyond the scope of the original project proposal, and therefore we have also invested our own resources into the project.

Phase 1 of the midPrivacy initiative is done. But we are still far from our ultimate goal. There is still a lot of work to do to develop the data protection and privacy functionality that we need. However, data protection is quite a special field in many ways. One of its characteristics is that it is very difficult to secure commercial funding for data protection and privacy features. We all know that data protection is needed, but it is hard to get anyone to actually pay for it. Therefore the major obstacle to continuing the midPrivacy initiative is, of course, funding. We have tried to follow up by submitting several proposals for European community funding. But sadly, none of the proposals to continue midPrivacy was successful. Therefore the future of midPrivacy is not certain yet. But one thing is certain: data protection and privacy are absolutely necessary and we are not giving up!

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the NGI_TRUST grant agreement no 825618.
