Dear IATI Community, 

Last week we released some changes to the IATI Datastore, specifically to how our search engine, Solr, organises its data. These changes were necessary for our work on enabling an improved view of networked and linked data. As a result, two new fields are available for Datastore searches. At the moment, these fields are only accessible via the Datastore API, and not via the Datastore website. The two new fields are:


iati_activities_document_hash

This allows you to search by, or retrieve, the document hash for any given dataset. The hash is a unique value for a dataset which changes every time the dataset is updated (unlike the dataset id, which remains the same and is available through the iati_activities_document_id field). See this page for more information on why the document hash is helpful.
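
As a rough illustration of why a hash that changes on every update is useful, the sketch below compares a previously stored mapping of dataset id to document hash against a freshly retrieved one, and reports which datasets are new or have changed. The function name and the example ids and hashes are hypothetical; how you obtain the two mappings from the API is left out.

```python
def changed_datasets(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the ids of datasets that are new or whose hash differs from the stored one."""
    return sorted(
        dataset_id
        for dataset_id, doc_hash in current.items()
        if previous.get(dataset_id) != doc_hash
    )


# Hypothetical values standing in for iati_activities_document_id -> iati_activities_document_hash.
stored = {"dataset-a": "hash-1", "dataset-b": "hash-2"}
latest = {"dataset-a": "hash-1", "dataset-b": "hash-3", "dataset-c": "hash-4"}

print(changed_datasets(stored, latest))
```

Because the dataset id stays the same across updates, it serves as the key, while the hash signals whether the content behind that key has changed.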


iati_identifier_exact

This field is a duplicate of the `iati_identifier` field, but the search engine (Solr) always treats it as a single complete value (in technical terms, it is a non-tokenised field). This makes searching for specific IATI Identifiers more reliable. If your query is iati_identifier_exact:SOME-IDENTIFIER, you will receive only activities whose IATI Identifier exactly matches SOME-IDENTIFIER (double quotes are unnecessary unless the Identifier contains characters that Solr uses in its query syntax, such as ':'). By contrast, if your query is iati_identifier:SOME-IDENTIFIER, you will receive all activities that contain either SOME or IDENTIFIER in the IATI Identifier. It has always been possible to enclose the search term in double quotes, e.g. iati_identifier:"SOME-IDENTIFIER", but this search returns results that match SOME-IDENTIFIER exactly as well as results that contain SOME-IDENTIFIER as part of a longer IATI Identifier. The new `iati_identifier_exact` field therefore allows more precise searching based on IATI Identifiers.
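
The following is a minimal sketch of how such a query could be assembled in Python. It adds double quotes only when the Identifier contains characters that Solr treats as query syntax. The base URL is a placeholder and not the real Datastore API endpoint, the helper names are hypothetical, and any authentication the API requires is omitted; `q`, `fl` and `rows` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; consult the Datastore API
# documentation for the real URL and any required API key.
BASE_URL = "https://example.org/datastore/activity/select"

# Characters that Solr's standard query parser treats as syntax (plus a space).
SOLR_SPECIAL = set('+&|!(){}[]^"~*?:\\/ ')


def exact_identifier_query(identifier: str) -> str:
    """Build a q= value that matches one IATI Identifier exactly."""
    # A hyphen is only treated as an operator at the start of a term.
    needs_quotes = identifier.startswith("-") or any(
        ch in SOLR_SPECIAL for ch in identifier
    )
    if not needs_quotes:
        return f"iati_identifier_exact:{identifier}"
    # Inside a quoted term, only backslashes and double quotes need escaping.
    escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
    return f'iati_identifier_exact:"{escaped}"'


def build_url(identifier: str, rows: int = 10) -> str:
    """Return a request URL with the query URL-encoded."""
    params = {
        "q": exact_identifier_query(identifier),
        "fl": "iati_identifier,iati_activities_document_hash",
        "rows": rows,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_url("SOME-IDENTIFIER"))
```

Always quoting the Identifier would also work; the conditional quoting here simply mirrors the behaviour described above.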

 

Before making this functionality available via the Datastore website, we will carefully review how the Datastore is currently used, so as to cause only minimal disruption.


If you have any questions about these changes, please feel free to share your question in the comment box below.

Comments (3)

Sylvan Ridderinkhof

Thanks for the comprehensive post. I think both fields make sense to have available. Curious if there is some Solr configuration for the behaviour of the iati_identifier field, as in IATI.cloud the field behaves as you described for the iati_identifier_exact field. Either way, having the fields available is definitely valuable!

I am curious what use cases you have that require searching for the dataset hash rather than the dataset ID? We use it for update management, but I have not seen a use case for searching with the hash.

Simon Kittle (IATI Secretariat)

Dear Sylvan,

IATI.cloud is a different product, and it may be that the iati_identifier field is treated there in a way that allows for more precise searching. On the IATI Datastore (https://datastore.iatistandard.org/), the field functions as described above, so the additional field is helpful.

We have added the file hash to the Datastore to allow us to improve the speed and efficiency of the Solr update process (Solr is the search engine which runs the IATI Datastore). At the moment, when a dataset is updated, there is a brief period where the data is inaccessible while it is re-indexed by Solr. Adding this hash will allow us to reduce that period further, meaning that the data in the Datastore is more reliable.

Simon

Herman van Loon

Dear Simon,
This is a very useful extension of the datastore functionality. We are checking the validity of references on a regular basis. The tokenization of the iati-identifier causes some technical challenges for exact matching which are now addressed.

Thanks for your update and all the work you have done!

