This idea reflects a certain reality: at the end of the day, people are going to write about what interests them, and a clinical treatment of a topic under rigid constraints isn't necessarily all that appealing. So my challenge instead is to automate the connection-building between articles, which, given the state of information retrieval, is certainly doable without having to invent any new IR algorithms.
The ultimate goal is to start scouring the Internet for game development resources, which are so commonly scattered across blogs in varying formats, and reproduce them in a common article format for everyone to use freely. If we are able to expose all the connections between articles, it should make for a nice information archive with some permanence, unlike articles that just drop off the net once their authors close their WordPress accounts.
My first tests have involved using Apache Solr to create an article index. Out of the box it has turned out to be quite versatile for this purpose. Over the past few days I did get a little crazy, though: I extracted all the document term vectors from Solr's Lucene index and exhaustively ran a cosine similarity check across them. I had to write a job for Amazon's Elastic MapReduce service to turn a problem that was going to take roughly a week to process into one that would take roughly six hours.
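For anyone curious what that exhaustive check looks like, here is a minimal sketch in plain Python. It is not the MapReduce job I actually ran; it assumes each document's term vector has already been pulled out into a dict of term to weight, and the names `cosine_similarity`, `all_pairs_similarity` and the sample `docs` are made up for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term vectors (term -> weight dicts)."""
    # Iterate over the smaller vector so the dot product touches fewer terms.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def all_pairs_similarity(vectors):
    """Compare every document against every other one (the exhaustive part)."""
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(sorted(vectors), 2)
    }


# Hypothetical term vectors; real ones would come out of the index.
docs = {
    "article-a": {"entity": 2.0, "component": 1.0, "system": 1.0},
    "article-b": {"entity": 1.0, "component": 2.0, "render": 1.0},
    "article-c": {"shader": 3.0, "render": 1.0},
}

for pair, score in sorted(all_pairs_similarity(docs).items()):
    print(pair, round(score, 3))
```

The catch is that n documents mean n(n-1)/2 comparisons, so the work grows quadratically with the size of the index. That is the kind of growth that pushes a job like this onto a cluster.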
The funny thing is, after doing all that I ended up discovering a feature in Solr that, when turned on, would meet about 75% of my immediate needs. The next challenge is going to be finding a way to build document clusters.
The approach I'm currently shooting for may involve the community training some sort of machine learning model so that it has an ideal picture of the articles for a particular subject; it would then classify other articles that match up closely. That way, if you create a class of documents called "Component Entity Systems", you would hopefully be able to find all the articles that match up.
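To make that a bit more concrete, here is a rough sketch of one very simple way it could work. This is an assumption on my part, not something I have built: the community-picked example articles get summed into a single "centroid" term vector, and any other article whose cosine similarity to that centroid clears a threshold is treated as a match. The function names, the whitespace tokenizer and the 0.3 threshold are all placeholders.

```python
import math
from collections import Counter


def term_vector(text):
    """Naive bag-of-words vector: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroid(examples):
    """Sum the term vectors of the hand-picked example articles.

    Cosine similarity ignores vector length, so summing works as well
    as averaging here.
    """
    centroid = Counter()
    for text in examples:
        centroid.update(term_vector(text))
    return centroid


def matches_class(text, centroid, threshold=0.3):
    """True if the article is close enough to the class centroid.

    The threshold is an arbitrary placeholder that would need tuning.
    """
    return cosine(term_vector(text), centroid) >= threshold


# Hypothetical training examples for a "Component Entity Systems" class.
examples = [
    "entity component system design",
    "component entity systems in games",
]
centroid = build_centroid(examples)

print(matches_class("entity component architecture", centroid))
print(matches_class("shader lighting tutorial", centroid))
```

A real version would want proper tokenizing, stop-word removal and term weighting, all of which the index already handles, but the shape of the idea is the same: a handful of labeled articles define the class, and everything else gets scored against them.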
At the end of the day, a single well-organized list of articles can still be very useful. That kind of organization is challenging to produce automatically, though.
In my ideal world, as you use the site it would begin to morph to show you more stuff that matches up with your tastes. Apache Mahout offers something like that, but I haven't experimented with it much yet.