For publishers that create and publish their own XML files (i.e. they don’t use a tool like IATI Publisher or Aidstream), it is important to publish those files on an open website where everybody can get them. This post describes a common problem we are seeing and the approach we will take to help publishers solve it.
A web server is a computer on the internet that serves data to people who request it. A request might come from a person using a web browser or from some other computer program. These other programs are referred to as bots and may be run for several different purposes: they may collect information for a search engine, for a data pipeline, or for people researching the web.
They can also be used for malicious purposes, such as trying to attack a website or steal information from it.
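To make the idea of a bot concrete, here is a minimal Python sketch of a program requesting a file from a web server without a web browser. The URL and the bot name are made up for illustration, and this is not how any particular IATI tool is implemented.

```python
import urllib.request

# Hypothetical values, for illustration only.
EXAMPLE_URL = "https://example.org/iati/activities.xml"
EXAMPLE_USER_AGENT = "example-data-bot/1.0"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Prepare an HTTP GET request that openly identifies itself as a bot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw bytes the web server responds with."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch(EXAMPLE_URL, EXAMPLE_USER_AGENT)` against a real address would return the file contents, much as a browser would receive them before displaying them.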
For this reason, it is very common to put a security product in front of a website that tries to let in real people using web browsers while stopping bad bots. The market leader in this area is Cloudflare.
Some publishers publish IATI data on their own website and then add security to the website to try to protect themselves from bad bots. But this often has the side effect of also blocking the legitimate users and tools that want to get the data.
In this case, it may look like the data is available. It is on the publisher's website and when you open it in a web browser you can see the data. However, the data cannot be used in various tools such as the Dashboard, Validator, Datastore, and more.
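One rough way to see the difference between what a browser shows and what a tool receives is to check whether the response to an automated request is actually an IATI XML file. The Python sketch below assumes that a blocked request comes back either with a non-200 status code or with an HTML page instead of XML; real security products vary, so treat this as an illustration and not as a complete test. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root element names used by IATI activity and organisation files.
IATI_ROOT_TAGS = {"iati-activities", "iati-organisations"}


def looks_like_iati_xml(status: int, body: bytes) -> bool:
    """Return True if an HTTP response appears to contain an IATI XML file."""
    # Assumption: a blocked request often has a status code other than 200.
    if status != 200:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # The body is not well-formed XML, for example a typical HTML page.
        return False
    # A challenge page may still parse as XML, so check the root element too.
    return root.tag in IATI_ROOT_TAGS
```

If this returns `False` for a file that opens normally in a web browser, that suggests automated requests are being treated differently from browser requests.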
Recently we have seen increasing problems as people use these security products more. It’s understandable that people would want security and it’s easy to not realise that this will cause problems for publishing IATI data. However, we have to address these problems.
There is an approach based on allow-listing that many people take to try to solve this problem. Unfortunately, that does not work.
Instead, the approach has to be to publish data in a way that does not block bots by default (although blocking malicious bots is still OK).
In this post and the linked guidance we have tried to lay out our thinking in a way that is understandable; this is a technical topic so please let us know if it’s still unclear.
We will continue to update and improve the new guidance. We will also work on pages to describe detailed technical solutions in common products like Cloudflare.
We will offer help via our support desk to any publisher who would like the issue and our approach explained further. We can also help publishers configure particular security products or create a new website on a sub-domain.
Do please share your questions and comments via the comment box below. We are curious to hear your thoughts!