  1. Looks great, but it's not a mixture model: apparently it assumes that each document contains a single topic.
  2. Looks like a pretty seamless way to extract the text of news documents. Very simple and easy to use.
  3. NLP toolkit by the same team that built the Java Wikipedia database indexer/API. Looks pretty good.
    updated: 2011-11-08, original: 2011-11-07
  4. Another cool-looking tool from The King.
  5. Take an arbitrary set of regexes and combine them into one monster, optimized regex that merges their overlapping parts and matches them all. Never loop over regexes again.
  6. Stet is a cool piece of software that was used in the GPLv3 process. I think there are better tools now, but it's nice that the code is easily available online.
  7. I don't understand how this is different from regular wdiff, but I like wdiff a lot and have heard that this software is great.
  8. "The policy turnaround faces stiff opposition in Congress, which twice authorized Constellation with bipartisan support. Even in today’s polarized political environment on Capitol Hill, opposition to the Obama plan last week also was bipartisan."
  9. Beautiful.
  10. Cute.
    2009-08-31
  11. Nice example of the ? replacing the smart quotes.
  2008-10-17
  2007-12-19
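The single-topic versus mixture distinction in item 1 can be written out as two generative assumptions. This is a sketch of the standard textbook contrast, not the bookmarked tool's own formulation. The notation is an assumption: $K$ topics, document $d$ with words $w_{d1},\dots,w_{dN_d}$, topic-word distributions $\beta$, and a Dirichlet prior $\alpha$. Under the single-topic assumption (a "mixture of unigrams"), one topic $z_d$ is drawn for the whole document. In a mixed-membership model such as LDA, each word draws its own topic from document-specific proportions $\theta_d$:

```latex
% Single topic per document (mixture of unigrams)
p(\mathbf{w}_d) = \sum_{k=1}^{K} p(z_d = k) \prod_{n=1}^{N_d} p(w_{dn} \mid z_d = k)

% One topic per word, mixed within each document (LDA)
p(\mathbf{w}_d \mid \alpha, \beta) = \int p(\theta_d \mid \alpha)
  \prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{dk} \, p(w_{dn} \mid z_{dn} = k, \beta) \, d\theta_d
```

In the first model the sum over topics sits outside the product over words, so every word in a document shares one topic. In the second the sum moves inside the product, so a single document can blend several topics.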
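Item 5's trick of folding many patterns into one regex can be sketched with a trie: insert every pattern, then emit a regex that factors out shared prefixes. This is a minimal illustration under a simplifying assumption, namely that the inputs are literal strings rather than full regexes. The bookmarked tool is not named here, so this is not its implementation, and `build_trie`, `trie_to_regex` and `assemble` are hypothetical names.

```python
import re


def build_trie(words):
    """Insert each literal string into a nested-dict trie; the key '' marks end of word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return trie


def trie_to_regex(node):
    """Emit a regex for this subtree, with shared prefixes factored out."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""  # leaf: the word ends here
    if len(branches) == 1:
        body = branches[0]
        # A word also ends here, so the remaining suffix is optional.
        return "(?:" + body + ")?" if is_end else body
    body = "(?:" + "|".join(branches) + ")"
    return body + "?" if is_end else body


def assemble(words):
    """Combine literal strings into one regex; an empty list yields a never-matching pattern."""
    if not words:
        return "(?!)"
    return trie_to_regex(build_trie(words))


if __name__ == "__main__":
    pattern = assemble(["foobar", "foobaz", "food"])
    print(pattern)  # foo(?:ba(?:r|z)|d)
    rx = re.compile(pattern)
    print(bool(rx.fullmatch("foobaz")), bool(rx.fullmatch("foo")))  # True False
```

A real assembler would go further, for example collapsing single-character alternatives like `(?:r|z)` into a class `[rz]` and accepting regex syntax in its inputs. This sketch keeps only the prefix-sharing step so the shape of the combined pattern stays easy to read.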
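For context on item 7, the core idea of a word-level diff like wdiff can be sketched by splitting both texts into words and marking changes with wdiff-style `[-deleted-]` and `{+inserted+}` markers. This is an assumption-laden illustration: it uses Python's `difflib.SequenceMatcher`, not GNU wdiff's own approach, and it collapses all whitespace to single spaces. `word_diff` is a hypothetical helper name.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Word-level diff rendered with wdiff-style [-deleted-] and {+inserted+} markers."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
            continue
        # 'replace' emits both a deletion and an insertion.
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the slow brown fox"))
    # the [-quick-] {+slow+} brown fox
```

Working on word tokens instead of lines is the whole point. A one-word edit in a long reflowed paragraph shows up as one small marked change rather than a rewritten line.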

