Among those who care about and share open data on international aid and humanitarian activities, a long-running ambition has been to use the data to discover networks: places where related activities can be identified and the relationships between them analysed.

This networked view of the interactions between organisations enables people using IATI data to get a better understanding of how cooperation between organisations takes place and how organisations are dependent on each other. For example, it allows the discovery of alliances of organisations working together towards a common goal. A networked view can show the link between funding sources and the realisation of the intended positive changes, and all related actors.

The standard supports this: the participating-org, transaction/provider-org, transaction/receiver-org, and related-activity elements each let a publisher refer to another IATI activity by its identifier.

Use of this provision is not widespread, and this limits the networks that are discoverable. We’re starting work to support those who wish to address this. Part of this work is to add a function to the IATI Validator that checks, wherever another activity is referenced, that it actually exists in the corpus of already-published data. This helps improve data quality by prompting publishers to check for typos or errors when a referenced activity doesn’t exist. For publishers who don’t provide references at all, it also signals that these references are valuable enough to be checked.
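To make the idea concrete, here is a minimal sketch of such a check. It is not the Validator’s implementation. It assumes the attribute names from version 2.03 of the standard (activity-id, provider-activity-id, receiver-activity-id and ref), and the function name, the sample identifiers and the in-memory set of known identifiers are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# (element path, attribute) pairs that can carry an activity identifier.
# Assumption: attribute names as in version 2.03 of the activity standard.
REFERENCE_LOCATIONS = [
    ("participating-org", "activity-id"),
    ("transaction/provider-org", "provider-activity-id"),
    ("transaction/receiver-org", "receiver-activity-id"),
    ("related-activity", "ref"),
]


def find_unresolved_references(xml_text, known_identifiers):
    """Return (source activity, element path, referenced id) for every
    reference whose target is not in known_identifiers."""
    root = ET.fromstring(xml_text.strip())
    unresolved = []
    for activity in root.iter("iati-activity"):
        source_id = (activity.findtext("iati-identifier") or "").strip()
        for path, attribute in REFERENCE_LOCATIONS:
            for element in activity.findall(path):
                referenced = (element.get(attribute) or "").strip()
                # An absent or empty attribute is not a reference at all.
                if referenced and referenced not in known_identifiers:
                    unresolved.append((source_id, path, referenced))
    return unresolved


# Hypothetical sample data: the identifiers below are invented.
SAMPLE = """
<iati-activities version="2.03">
  <iati-activity>
    <iati-identifier>XM-EXAMPLE-A-1</iati-identifier>
    <participating-org ref="XM-EXAMPLE-B" role="1" activity-id="XM-EXAMPLE-B-9"/>
    <transaction>
      <provider-org ref="XM-EXAMPLE-B" provider-activity-id="XM-EXAMPLE-B-7"/>
    </transaction>
    <related-activity ref="XM-EXAMPLE-A-2" type="1"/>
  </iati-activity>
</iati-activities>
"""

# Stand-in for the corpus of already-published activity identifiers.
known = {"XM-EXAMPLE-A-1", "XM-EXAMPLE-A-2", "XM-EXAMPLE-B-7"}

for source, path, referenced in find_unresolved_references(SAMPLE, known):
    print(f"{source} refers to {referenced} via {path}, which was not found")
```

In this sample only the participating-org reference is reported, because the other two identifiers are in the known set. A real check would need a much larger lookup than an in-memory set built by hand, but the logic would be similar.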

In our initial research, we’ve detected that approximately 10% of the activity references specified in published IATI data don’t resolve to activities that are present in the wider IATI corpus. This is a bit like a Page Not Found on a website: if you try to follow the link, you won’t find anything.

We believe that this figure can be improved significantly over time by providing better validation for publishers. There are approximately 150,000 references to activities in the corpus; out of over 1 million published activities this is a relatively small proportion, which we hope to see increase.

We want to alert publishers to this situation, but we’re aware that in some cases they won’t be able to do anything about it: if the organisation that publishes the referenced activity has removed it, or hasn’t published yet, or has made an error in its publication, then there’s nothing that the publisher can do apart from alert the other organisation to the problem. Therefore, we are clear that this is not an “error” like other messages from the validator, but a new type of information. In the Secretariat, we can monitor the results of this new check and look for cases where we’re able to intervene to improve the quality of the network.

There are, of course, trade-offs and design decisions to be made. Do we check against the data that’s in the datastore? That is all data without critical errors, so it guarantees that someone trying to follow the links between activities gets data that uses the IATI standard, but it excludes data whose errors might not matter to a particular data user. Or do we take the opposite approach and check against every activity identifier that we have ever seen in the data, even if it has critical errors or disappeared a long time ago?
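One way to see what each option would report is to look a reference up in both sets and note where it is found. The sketch below is purely illustrative: the function name, the result labels and the two sets are hypothetical, and no decision has been made to implement the check this way.

```python
def classify_reference(referenced_id, datastore_ids, ever_seen_ids):
    """Say where, if anywhere, a referenced activity identifier is known.

    datastore_ids: identifiers of activities without critical errors
                   (the stricter option).
    ever_seen_ids: every identifier ever seen in published data
                   (the broader option).
    """
    if referenced_id in datastore_ids:
        return "in-datastore"
    if referenced_id in ever_seen_ids:
        return "seen-but-not-in-datastore"
    return "never-seen"


# Hypothetical identifiers, for illustration only.
datastore_ids = {"XM-EXAMPLE-A-1"}
ever_seen_ids = {"XM-EXAMPLE-A-1", "XM-EXAMPLE-B-7"}

for reference in ["XM-EXAMPLE-A-1", "XM-EXAMPLE-B-7", "XM-EXAMPLE-C-3"]:
    print(reference, classify_reference(reference, datastore_ids, ever_seen_ids))
```

Under the stricter option the second reference would be flagged as missing; under the broader option it would pass. Only the third would be flagged either way.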

We are also aware that there are other limitations: there may be timing issues if an activity identifier appears in a very recently-published file, for example. 

Even with this in mind, we believe this will be a positive step and will introduce a useful prompt to the publication process. 

We’ll share more details as we get underway, but we’d be keen to hear your feedback on this work. How important do you think it is to have a better network of data? Would the specific validation on activity-identifiers in the corpus be helpful to your work, and how?

We’ve also made a post to solicit input on the technical implementation; input on those details would be welcome as well. 

Comments (3)

leo stolk

Dear Rob
Thanks for this initiative and post. In my view linking upwards to provider activity ID is an important ingredient to contribute to a networked view of the complexity of relationships, delivery trails and other forms of collaboration. 
Building up and maintaining this networked view requires a continuous effort from all involved in the network: publishers and data users, those that refer upwards and those that are referred to.
It would be good if we could list and share ideas or good practices that support this effort. For instance: 
- include the actual correct IATI provider activity ID in any outgoing funding contract;
- provide a regular report to publishers (like the Dutch MFA) pointing out the inaccurate or incorrect use of provider activity identifiers;
- train smaller publishers and CSOs so the network expands and includes local actors;
- acknowledge and reward the use of correct provider activity IDs.
Looking forward to more suggestions and IDs.

Maaike Blom

Thanks Rob for sharing your thoughts and progress on this topic. I strongly support this line of thinking. When we want to follow the money, correct referencing up and down the aid flow is essential, also to get a proper understanding of what is actually happening in the field. And rewarding the use of correct provider & receiver IDs is a great idea, Leo.
I would focus first on the data that has actually passed the first validation to build up a solid model of reference material, before venturing out to the more rough and unstructured data with critical errors. The likelihood of finding relevant information regarding IDs there is lower anyway, since it is probably old or misplaced data to start with.
Besides this, I would like to plead for a simplified version of the standard with fewer fields, which would lower the threshold for publishing your data while keeping a strong focus on the correct IDs. The more networked and expanded the data becomes, the more interesting the insights will be into who does what, when and where.

Herman van Loon

Dear Rob,

Thank you very much for this initiative. Validation of links to activities is i.m.o. one of the most important improvements you can make to the IATI data validator for the reasons you so eloquently described above. No one actively engaged in development or humanitarian aid works in splendid isolation and the data should reflect this. Your proposal will i.m.o. contribute to that goal.

Since one of the key assumptions of the IATI standard is that every publisher should publish a complete set of all activities including all historical closed activities, i.m.o. the corpus of all valid IATI activity identifiers can be found by taking all published IATI activities in XML files referred to in the IATI registry into account. The validity of every link referring to an IATI activity should be checked against that corpus. 

The data validator message should be a warning, something like "you are publishing an IATI activity identifier <IATI identifier> which does not (yet) seem to exist". 
