I’ve noticed a few threads about machine-readable, version-controlled OECD DAC codelists.
I’ve had a go at a DAC CRS codelist scraper (built from earlier work by Mark Brough) that automatically sends pull requests to a GitHub repo, with an Octopub frontend.
Demo here:
https://andylolz.github.io/dac-crs-codes/
(NB there are a few “csv invalid” errors there that I need to iron out!)
The scraper runs daily and lives on morph.io.
I wonder if this is something that could be useful to the secretariat, as a tool to help manage non-embedded codelists from OECD DAC?
Tagging some people from related threads:
Bill Anderson
Steven Flower
Wendy Rogers
Yohanna Loucheur
Herman van Loon
Absolutely! So that’s why the scraper sends pull requests – to benefit from git’s version control.
For instance, the auto pull request sent on Friday already shows a change to a DAC CRS code – OOF was removed from flow types. You can see this in the pull request diff:
https://github.com/andylolz/dac-crs-codes/pull/4/files#diff-cd40d8ab
If you check the ‘type of flow’ sheet of the DAC CRS codelist xls, you can see that is the case – code 20 (Other Official Flows) has gone.
Note that I’m following the same model here as mySociety’s EveryPoliticianBot. One improvement would be to create human-readable descriptions of the pull requests, as that bot does – rather than having to read diffs. But in general, the diffs here are likely to be small and relatively easy to understand.
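To make the “human-readable descriptions” idea a bit more concrete, here’s a rough sketch of how a summary could be generated by comparing two versions of a codelist. This is not how the scraper or EveryPoliticianBot actually works. The `code` and `name` column names, the function names and the sample rows are all assumptions for illustration (only code 20, Other Official Flows, comes from the real example above).

```python
import csv
import io


def load_codes(csv_text):
    """Parse codelist CSV text into a {code: name} dict.

    Assumes hypothetical 'code' and 'name' columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row["name"] for row in reader}


def describe_changes(old, new):
    """Return one human-readable line per removed, added or renamed code."""
    lines = []
    for code in sorted(old.keys() - new.keys()):
        lines.append(f"Removed code {code} ({old[code]})")
    for code in sorted(new.keys() - old.keys()):
        lines.append(f"Added code {code} ({new[code]})")
    for code in sorted(old.keys() & new.keys()):
        if old[code] != new[code]:
            lines.append(f"Renamed code {code}: {old[code]!r} -> {new[code]!r}")
    return lines


# Illustrative data only, not the real codelist contents.
old_csv = "code,name\n10,Example flow A\n20,Other Official Flows\n"
new_csv = "code,name\n10,Example flow A\n"

summary = describe_changes(load_codes(old_csv), load_codes(new_csv))
print("\n".join(summary))
```

Something like the resulting lines could then be used as the pull request description, so nobody has to read the raw diff.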
The next step here would be to build on Ben Webb’s work, to pull this stuff into non-embedded codelists and maintain a list of withdrawn codes:
github.com/IATI/IATI-Codelists-NonEmbedded pull request by Bjwebb: “Pull external codes automatically including withdrawn (historic) codes” (IATI:master ← IATI:9-historical-codes, opened 03:02PM - 04 Feb 15 UTC, +3213 -546)
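As a very rough sketch of what “maintain a list of withdrawn codes” might look like: when a code disappears from a fresh scrape, keep it in the list and stamp it with a withdrawal date rather than deleting it. This is only my guess at the general shape, not what the pull request above does. The `update_codelist` function, the `withdrawn` field and the dates are hypothetical.

```python
def update_codelist(previous, scraped, today):
    """Merge a fresh scrape into the previous codelist, keeping withdrawn codes.

    previous: {code: {"name": str, "withdrawn": str or None}}  (hypothetical shape)
    scraped:  {code: name} as found in the latest scrape
    today:    date string to record against newly withdrawn codes
    """
    merged = {}
    # Every code present in the latest scrape is treated as active.
    for code, name in scraped.items():
        merged[code] = {"name": name, "withdrawn": None}
    # Codes that have vanished are kept, marked as withdrawn.
    for code, entry in previous.items():
        if code not in scraped:
            merged[code] = {
                "name": entry["name"],
                # Keep the original date if it was already withdrawn.
                "withdrawn": entry["withdrawn"] or today,
            }
    return merged


# Illustrative data only; the dates are placeholders.
previous = {
    "10": {"name": "Example flow A", "withdrawn": None},
    "20": {"name": "Other Official Flows", "withdrawn": None},
}
scraped = {"10": "Example flow A"}

result = update_codelist(previous, scraped, today="2000-01-01")
print(result["20"])
```

Running it again on a later date leaves the original withdrawal date untouched, so the list stays a stable historical record.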