I am frequently asked about all kinds of programming problems, new technologies, or simply good ideas for a thesis or prototype. And I've noticed that colleagues and students often set out on information-hunting expeditions. Now, I am a heavy Google user as well, but for ongoing topics I have switched to newsletters and portals: information should find me automatically. Below I list my sources of information (with comments) for the benefit of others. At the end I'd like to speculate on automatic information-gathering methods (the Google API, autonomous agents, products like Autonomy). I mostly use seven kinds of sources (you will find links and references in the section below):
I do not use RSS feeds yet, but this might change soon (I found a good introductory article with examples in one of my newsletters). I might even integrate some of my sources into my site via RSS.

Weblogs are a fairly new communication platform with very interesting social features. How they work is being investigated by collecting statistics on linking and citing: how new messages spread, whether small clusters of cross-linking bloggers show up, and so on. A lot of blogs are just ego dumps, but some are very interesting because they are written by real visionaries; take a look at the one on social software below, for example. Other blogs I like are Lisa Rein's OnLisaReinsRadar and Meg Hourihan's blog, Megnut. Another very good blog is run by my friend Andreas Kapp on Concentrator, where you will find ideas on the future of the Internet. Andreas is one of the few people who really understand what "being digital" means. One of my favourites is Gunter Dueck, the former math professor and now data mining guru at IBM. He thinks against mainstream nonsense. Take a look at his blog on the Omnisophie site.

A newsletter is a short textual summary of the new pages on a site. In very concentrated form it tells you about new material, and one click brings you to the full article. A newsletter author needs to write useful abstracts of the new content: short enough to let readers browse through them quickly, but long enough to give a faithful description of the full source. Done well, a newsletter is an extremely useful way to inform readers.

Newsgroups used to be more important than they are now. In many cases they have been replaced by mailing lists because of spam problems. But when I started working on a new topic (e.g. when I had to port a framework from OS/2 to NT), I first went through the relevant comp.sys.... newsgroups to learn about existing problems and get some good hints. Sometimes there are more than 2000 postings available.
I usually started with the last 1000-2000 messages and went through them quickly. Now this may sound like a big waste of time. Fact is: not knowing about a well-known problem can cause you much longer delays and is very frustrating. I know that some colleagues still favor the trial-and-error approach, but working with alpha/beta releases of huge software packages like application servers or operating systems made me realize that I simply need the collective experience others have gathered with those beasts. Once you have gone through the past postings, you are pretty much up to date on a technology and its problems. Always read the FAQ of a group before posting anything. But don't be too shy to post: chances are that you will find help. Do use a separate e-mail address for posting, though, to keep spam away from your main address. If your company blocks newsgroups at the firewall, use Google, for example, to read them (look at the "Groups" link on Google).

Mailing lists are the centralized version of newsgroups. The better ones are usually moderated, and spam is blocked. One example is the xml-dev mailing list. Traffic on such lists can be very high. Still, if you start with a new technology (e.g. the Eclipse IDE), subscribe to the associated mailing lists to learn about the latest bugs and problems or to meet interesting participants.

List archives are good for high-traffic mailing lists which are only occasionally relevant to your work. For example, I receive the xml-dev mailings directly, but for XML Schema or XSL questions I go to the respective archives and read the postings there. Otherwise my inbox would overflow within a couple of days.

Somewhere between the complete mailing list and the archives are digests: basically an extract of the mailing list with sender and subject information. They are good for high-volume lists. I found digests hard to read through because many subjects do not really reveal much about the content.
One must know the senders a bit to handle digests.

Portals are the most important sites for a certain topic. A good example is TheServerSide.com, which covers J2EE-related things. Other examples are openp2p for peer-to-peer topics, eclipse.org for the Eclipse IDE, etc. I noticed that I use some portals heavily for a certain time and then not at all for a much longer time. Still, you should know the important portals for your areas of work (I guess I should ask about those in tests (;-)).

Directories are hierarchically organized collections of information, compiled by humans. I use the Open Directory Project (DMOZ) sometimes, usually far too late, after spending too much time with search engines. A directory search will turn up a lot of information directly relevant to your topic, and directories are an excellent way to get a quick overview of a certain field. So why use search engines if there is DMOZ? As I said, I use Google too much and DMOZ not enough. But search engines also let you find partially relevant information, and sometimes they retrieve things which a category-based search in a directory would not have brought together: search engines are an association tool as well and help you get new ideas or find new associations. This process is by necessity vague and time-consuming, but also exciting. Just don't get lost too much (the dictionary effect: you look up Z, stumble over B, D and G, and in the end you've forgotten that you were looking for Z).

I just said something about search engines above. I mostly use Google, but there are others as well, like kartoo.com, an associative engine. I just bought the book with 100 Google hacks to optimize and possibly automate some of my searches.

I am quite sceptical about most conferences. Most of them seem to be an expensive way to waste your time. But I do like the OOPSLA conferences, even though I've never been there.
If I need to get into a new area of development quickly, I usually try to find last year's OOPSLA workshop on the topic. My latest example is the workshop on Model-Driven Architecture, where I downloaded about 15 short papers that gave me a fast introduction to the current state of affairs.
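RSS feeds and newsletters, both discussed above, carry much the same information: a title, a link and a short abstract for each new article. As a minimal sketch, assuming an RSS 2.0 feed and Python's standard `xml.etree.ElementTree`, the following reads the items of a feed and renders them as newsletter-style entries. The sample feed, its example.org URLs and the helper names `read_items` and `newsletter_lines` are made up for illustration; real feeds come in several RSS and Atom variants that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed; the example.org URLs are placeholders.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>http://example.org/</link>
    <description>New articles on example.org</description>
    <item>
      <title>Model-Driven Architecture in practice</title>
      <link>http://example.org/mda</link>
      <description>A short report on a workshop about MDA and what it means for tool builders.</description>
    </item>
    <item>
      <title>Porting a framework from OS/2 to NT</title>
      <link>http://example.org/porting</link>
      <description>Lessons learned.</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link, description) tuples for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        )
        for item in root.iter("item")
    ]


def newsletter_lines(items, max_len=40):
    """Format items as newsletter entries: title, abstract cut to max_len characters, link."""
    lines = []
    for title, link, description in items:
        if len(description) > max_len:
            # Leave room for the "..." marker so the abstract stays within max_len.
            description = description[: max_len - 3].rstrip() + "..."
        lines.append(f"{title}: {description} <{link}>")
    return lines


for line in newsletter_lines(read_items(SAMPLE_FEED)):
    print(line)
```

The hard part a real newsletter author faces, writing a faithful abstract, is reduced here to plain truncation; that is a deliberate simplification.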
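Digests, mentioned above in the discussion of mailing lists, reduce each posting to its sender and subject. Here is a minimal sketch of that reduction using Python's standard `email` package. The raw messages and addresses are invented, `build_digest` is a hypothetical helper, and real list software formats its digests in its own way. The sample also shows why a digest is quick to scan but hard to judge: a subject such as "Re: schema question" says little about the content.

```python
from email import message_from_string

# Invented raw postings; real list traffic would come from a mailbox or an archive.
RAW_MESSAGES = [
    "From: alice@example.org\nSubject: Re: schema question\n\nI think the problem is the namespace.\n",
    "From: bob@example.org\nSubject: XSLT keys and grouping\n\nHere is a stylesheet that groups by key.\n",
    "From: carol@example.org\n\nA posting without a subject line.\n",
]


def build_digest(raw_messages):
    """Reduce each raw message to a numbered 'sender: subject' line."""
    entries = []
    for number, raw in enumerate(raw_messages, start=1):
        msg = message_from_string(raw)
        # Fall back to placeholders when a header is missing.
        sender = msg.get("From", "(unknown sender)")
        subject = msg.get("Subject", "(no subject)")
        entries.append(f"{number}. {sender}: {subject}")
    return "\n".join(entries)


print(build_digest(RAW_MESSAGES))
```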
What are you going to do with all this information? I'd like to show you my production line for downloaded information. As I am a frequent traveller, simply printing out books, articles, papers etc. from the Internet as loose sheets does not work for me: I cannot read through large stacks of paper on the train without losing some pages and getting lost. Here is what I do. I bought a fast Lexmark Optra S1850 laser printer through eBay, plus a duplex unit for it. It runs as a network printer on our network and wasn't expensive. Cartridges are good for 17600 pages and can be bought cheaply in the aftermarket. Duplex printing is essential because it halves the number of sheets and therefore the weight. I also bought a ring binding system (a punch) large enough to bind books of 500 sheets. These are available from Ibico, Renz and others and cost about 250 euros/dollars. Supplies for it (plastic ring binders, cover papers and transparent covers) are available for little money on eBay. I use white plastic ring binders because I usually write the book title on the back with a permanent marker. A big advantage of plastic ring binders over all glue-based techniques is that you can change or add to the content at any time, something I do frequently when I find new information about a specific topic. Some of my "books" start with only one article and then grow over a term. You can switch to a larger binder at any time.

As you can see, the paperless office has not quite reached me: I just hate reading online, not least because I usually have a text marker and pen at hand to highlight text or write down my comments (e.g. "slide" to denote that I will use a picture in one of my lecture slides).

As I said above, I do not use automated information finding yet, except for simple searches via search engines. eBay offers a nice automated feature through stored searches. Once you've defined a useful search on eBay, you can store it, and eBay will run the search when new items are posted.
If your pattern matches, you will get a message from eBay telling you which new items fit your query. I used this a couple of times when I was looking for parts for my Yamaha FJ1100.

RSS will be an important topic for me and others in the future. I'd like to offer an RSS service for my site and also integrate other sites with mine through RSS. Sounds like a thesis to me (;-). Another thesis idea would be autonomous agents that use the web-service-based Google API to harvest information. Autonomy is a product that uses Bayesian networks and other technologies to filter relevant information according to your personal profile, which it also creates itself through usage analysis. I once saw a demonstration of it but haven't been able to use it in a project yet.

Last but not least, there is also the battle between the markup people (Semantic Web, topic maps), who believe in users defining the meta-data for their documents, and the statistics- or collaboration-based people (Google, Autonomy, Amazon), who believe in automated ways to create the meta-data from existing documents. This is of course a rather hot debate. Personal experience makes me believe that in most areas the statistical or collaborative filtering approaches will prevail, simply because tagging your documents with meta-data is a rather laborious process. This does not, of course, call the value of topic maps into question. The real question is what it costs to create those maps. Another thesis?
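The stored eBay search described above boils down to a saved query that is re-run against each batch of new listings. The sketch below models that idea with a keyword list and an optional price limit. It is not eBay's API or matching logic; the `Listing` and `StoredSearch` classes, their fields and the sample listings are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Listing:
    """A newly posted item (hypothetical fields, not eBay's data model)."""
    title: str
    price: float


@dataclass
class StoredSearch:
    """A saved query: every keyword must appear in the title; the price limit is optional."""
    keywords: List[str]
    max_price: Optional[float] = None

    def matches(self, listing: Listing) -> bool:
        title = listing.title.lower()
        if not all(keyword.lower() in title for keyword in self.keywords):
            return False
        return self.max_price is None or listing.price <= self.max_price


def run_stored_search(search: StoredSearch, new_listings: List[Listing]) -> List[str]:
    """Re-run a stored search on new listings and return one notification per match."""
    return [
        f"New match: {item.title} ({item.price:.2f})"
        for item in new_listings
        if search.matches(item)
    ]


search = StoredSearch(keywords=["FJ1100", "fairing"], max_price=200.0)
new_listings = [
    Listing("Yamaha FJ1100 fairing, used", 150.0),
    Listing("Yamaha FJ1100 tank", 90.0),
    Listing("FJ1100 fairing, new", 350.0),
]
for notification in run_stored_search(search, new_listings):
    print(notification)
```

An agent built on this idea would fetch the new listings periodically and mail out the resulting notifications; that scheduling part is left out here.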
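Autonomy's internals are not described here beyond Bayesian networks and usage analysis, so the following is a deliberately simpler stand-in: a two-class naive Bayes filter with add-one smoothing. It learns a personal profile from documents the user has marked as relevant or not, then scores new documents. Naive Bayes is a very restricted special case of a Bayesian network, not Autonomy's method, and the `ProfileFilter` class, its labels and the training texts are hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("relevant", "other")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ProfileFilter:
    """Two-class naive Bayes filter with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {name: Counter() for name in LABELS}
        self.doc_counts = Counter()

    def learn(self, text, label):
        """Record a document the user judged 'relevant' or 'other'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(word | label) over the words of text."""
        if any(self.doc_counts[name] == 0 for name in LABELS):
            raise ValueError("train at least one document per label first")
        vocabulary = set().union(*(self.word_counts[name] for name in LABELS))
        words_in_label = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]  # 0 for words never seen with this label
            score += math.log((count + 1) / (words_in_label + len(vocabulary)))
        return score

    def is_relevant(self, text):
        return self.log_score(text, "relevant") > self.log_score(text, "other")


# The "usage analysis" is simulated by explicit judgements on a few made-up documents.
profile = ProfileFilter()
profile.learn("eclipse plugin development with java", "relevant")
profile.learn("xml schema and xslt tips for java developers", "relevant")
profile.learn("motorcycle tyres for sale cheap", "other")
profile.learn("holiday offers cheap flights", "other")
print(profile.is_relevant("new eclipse release for java developers"))
print(profile.is_relevant("cheap motorcycle holiday"))
```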
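To make the statistical side of the meta-data debate above concrete: one classic way to derive descriptive terms automatically, without anyone tagging the documents, is tf-idf weighting. The sketch below picks, for each document, the terms that are frequent in it but rare across the collection. The function `top_keywords` and the three sample documents are hypothetical, and this only illustrates the statistical approach in general; it is not what Google, Autonomy or Amazon actually do.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(documents, k=3):
    """For each document, return its k terms with the highest tf-idf weight."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    result = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest weight first; ties are broken alphabetically for stable output.
        ranked = sorted(weights, key=lambda term: (-weights[term], term))
        result.append(ranked[:k])
    return result


SAMPLE_DOCS = [
    "topic maps and topic map standards",
    "rss feeds and rss readers",
    "bayesian filters and rss",
]
print(top_keywords(SAMPLE_DOCS, k=2))
```

Note that a word like "and", which occurs in every document, gets weight zero: the collection itself tells the algorithm which words carry no information, which is exactly the kind of meta-data nobody had to enter by hand.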