I’ve noticed a few threads about machine-readable, version-controlled OECD DAC codelists.

I’ve had a go at a DAC CRS codelist scraper (built from earlier work by Mark Brough) that automatically sends pull requests to a GitHub repo, with an Octopub frontend.
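The pull-request model only gives small, readable diffs if each scraped sheet is written out identically on every run. A minimal sketch of that idea, assuming each codelist arrives as a list of dicts with a `code` key; the function name `write_codelist_csv` and the column names are hypothetical and not taken from the actual scraper:

```python
import csv
import io


def write_codelist_csv(rows, fieldnames):
    """Serialise codelist rows to CSV text in a stable, diff-friendly form.

    Rows are sorted by code and written with a fixed column order and '\n'
    line endings, so re-running on unchanged source data produces
    byte-identical output (and therefore no spurious git diff).
    """
    def sort_key(row):
        code = str(row["code"])
        # Numeric codes sort numerically (3 before 20); others sort after them, lexically.
        return (0, int(code), code) if code.isdigit() else (1, 0, code)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=sort_key):
        # Missing columns are written as empty strings rather than raising.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

Sorting and fixing the line terminator matter because spreadsheet exports do not guarantee row order, and a reshuffled but otherwise identical file would still trigger a pull request.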

Demo here:

(NB: there are a few “csv invalid” errors there that I need to iron out!)

The scraper runs daily and lives on morph.io.

I wonder whether this could be useful to the secretariat as a tool to help manage the non-embedded codelists that come from the OECD DAC?

Tagging some people from related threads:
Bill Anderson Steven Flower Wendy Rogers Yohanna Loucheur Herman van Loon


Mark Brough

Great work, Andy!

Just to flag that the CSV files also contain both English and French versions and are quite nicely structured. So they might be useful for a number of other users, especially those building French or multilingual tools.
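As an illustration of how a French or multilingual tool might use these files, here is a sketch that picks each code’s label in the requested language and falls back to English when that label is blank. The column names `code`, `name_en` and `name_fr` are assumptions; the real CSV headers may differ:

```python
import csv
import io


def load_labels(csv_text, lang="en", fallback="en"):
    """Map each code to its label in `lang`, falling back to `fallback`.

    Assumes hypothetical columns named 'code' and 'name_<lang>'.
    """
    labels = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = (row.get(f"name_{lang}") or "").strip()
        if not label:
            # Blank or missing translation: use the fallback language instead.
            label = (row.get(f"name_{fallback}") or "").strip()
        labels[row["code"]] = label
    return labels
```

For example, `load_labels(text, "fr")` returns French labels where they exist and English ones elsewhere.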

Steven Flower

Thanks for sharing this, Andy Lulham.

Alongside the need to get these lists in a machine readable version, there’s the other need of “what has changed?”

Andy Lulham

> Alongside the need to get these lists in a machine readable version, there’s the other need of “what has changed?”

Absolutely! So that’s why the scraper sends pull requests – to benefit from git’s version control.

For instance, the automatic pull request sent on Friday already shows a change to a DAC CRS code: OOF was removed from the flow types. You can see this in the pull request diff:

If you check the ‘type of flow’ sheet of the DAC CRS codelist xls, you can see that is the case – code 20 (Other Official Flows) has gone.
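Because the codelists are stored as plain-text CSV, a removed code shows up as a single deleted line. A sketch of the same kind of line diff using Python’s standard `difflib`; the file name `FlowType.csv` and the two-row sample are illustrative and abbreviated, not the full flow-type sheet:

```python
import difflib


def codelist_diff(old_csv, new_csv, name="codelist.csv"):
    """Return a unified diff between two versions of a codelist CSV."""
    return "".join(
        difflib.unified_diff(
            old_csv.splitlines(keepends=True),
            new_csv.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


# Illustrative sample rows only.
old = "code,name\n10,Official Development Assistance\n20,Other Official Flows\n"
new = "code,name\n10,Official Development Assistance\n"
print(codelist_diff(old, new, "FlowType.csv"))
```

The printed diff ends with the line `-20,Other Official Flows`, which is essentially what GitHub renders in the pull request.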

Note that I’m following the same model here as mySociety’s EveryPoliticianBot. One improvement would be to create human-readable descriptions of the pull requests, as that bot does – rather than having to read diffs. But in general, the diffs here are likely to be small and relatively easy to understand.
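One way to produce such descriptions is to compare the two versions by code rather than by line, then render the result as sentences for the pull request body. This is a sketch, assuming hypothetical `code` and `name` columns; it is not how EveryPoliticianBot itself is implemented:

```python
import csv
import io


def read_codes(csv_text):
    """Map code -> name from a CSV with 'code' and 'name' columns."""
    return {row["code"]: row["name"] for row in csv.DictReader(io.StringIO(csv_text))}


def describe_changes(old_csv, new_csv, list_name):
    """Summarise added, removed and renamed codes as human-readable lines."""
    old, new = read_codes(old_csv), read_codes(new_csv)
    lines = []
    for code in sorted(new.keys() - old.keys()):
        lines.append(f"{list_name}: added code {code} ({new[code]})")
    for code in sorted(old.keys() - new.keys()):
        lines.append(f"{list_name}: removed code {code} ({old[code]})")
    for code in sorted(old.keys() & new.keys()):
        # A code present in both versions but with a different label.
        if old[code] != new[code]:
            lines.append(f"{list_name}: renamed code {code} from {old[code]!r} to {new[code]!r}")
    return "\n".join(lines) or f"{list_name}: no changes"
```

Keying on the code means a reordered sheet produces no message at all, while a removal like code 20 reads as “removed code 20 (Other Official Flows)”.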

The next step here would be to build on Ben Webb’s work, to pull these codes into the non-embedded codelists and maintain a list of withdrawn codes:
github.com/IATI/IATI-Codelists-NonEmbedded: “Pull external codes automatically including withdrawn (historic) codes” (IATI:master ← IATI:9-historical-codes), opened 03:02PM, 04 Feb 15 UTC by Bjwebb, +3213 -546
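A sketch of keeping withdrawn codes rather than deleting them: merge the previously published list with the fresh scrape, and mark any code that has disappeared as withdrawn instead of dropping it. The `status` field and its `active`/`withdrawn` values are assumptions for illustration, not necessarily the schema used in Ben’s pull request:

```python
def merge_with_history(previous, scraped):
    """Merge a previously published codelist with a fresh scrape.

    `previous` maps code -> {"name": ..., "status": ...}; `scraped` maps
    code -> name for the codes currently published by the DAC. Codes missing
    from the scrape are kept and marked 'withdrawn'; codes present in the
    scrape (including ones that reappear) are marked 'active'.
    """
    merged = {}
    for code, entry in previous.items():
        if code not in scraped:
            # Keep the last known name, but flag the code as withdrawn.
            merged[code] = {"name": entry["name"], "status": "withdrawn"}
    for code, name in scraped.items():
        merged[code] = {"name": name, "status": "active"}
    return merged
```

With this approach, the removal of code 20 would leave it in the published list with status `withdrawn`, so existing data that still uses it can be validated against a known historic code.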
