Looks like a seamless way to extract the text of news documents. It seems simple and easy to use.
An NLP toolkit from the same team that built the Java Wikipedia database indexer/API. Looks pretty good.
I don't understand how this differs from plain wdiff, but I like wdiff a lot and have heard that this software is great.