Books: wisdom of crowds & swarms

Tuesday, November 06, 2007

tagged: algorithms, artificialintelligence, collaborativefiltering, randomness, wisdomofcrowds

Need a book? I am surprised by this website: http://www.whatshouldireadnext.com/ The place where I read about it called it "last.fm for books." The idea is to start from a book that you like (or multiple books - I didn't try the service: I'll get to that in a minute), run through their current database of users' books, and suggest a book for you. Like last.fm, Pandora... it's collaborative filtering put to a specific use (or maybe they don't use an algorithm, they use people - dunno). It's wisdom of crowds.

Sounds reasonable, but here's my big fat question: Who needs this? There are "problems" where I could use the help and wisdom of crowds might be useful to me: what email tool to use (if, say, I wanted to switch from Gmail at home & Outlook 07 at work), or whether to install the new Mac OS (Dave Winer would answer "no," what about everyone else?)...

I don't need this service: I have stacks and stacks of books that I want to read and haven't. I keep Amazon lists of them (86). I also have a boss's office with several more books not on my lists that I want to read. Further, if I were to zero out those lists, I still wouldn't have a problem: it's far easier and faster to add books than remove them.

I've been on a bit of a spree lately: in the last 4 weeks, I've finished Know How, Freakonomics, and Peopleware. This has been a good year generally, but those were relatively easy and interesting reads and I had a plane trip in there to help (Freakonomics was my YYC-->SFO flight). I'll probably slow down over the next while, at least in terms of # of volumes, since my current book is the 3.8-pound, 800-page Designing Interactions. Great so far, but seriously, 800 pages? That's 3-4 books - I guess there are about 30 authors, so that's forgivable (and the quality is so high in the first 2 chapters that it's more than forgivable), but still it's unusual - like a 6-hour movie would be.

My point in all this: I am a fast reader and enjoy reading, but I can easily outpace my reading with additions to the list. For example, the references in Peopleware alone added 2 more books to the list: so I'm actually moving backward! I don't plan on ever completing my reading list. Amazon lets you rank them and, honestly, the ones ranked "low" or, poor souls, "lowest" will never get read: it's just my list of books that could possibly be interesting. But seriously, are there people who read a book and then wonder what to read next? Is the problem really "I'd read more if only I knew where to find more interesting content to read?" I don't get it.

Prey

My current "fun" book (read: fiction helps me sleep) is Prey, by Michael Crichton. Interesting, but I think it's a poor choice: I don't think it's helping me sleep. It inspired some thoughts about swarm behavior and I think I'll do a few fun programming things to emulate emergent behavior, independent randomized agent behavior, artificial learning, and learning networks (in that order). Some of the stuff in the book is a bit "out there," but some of it is actually true. I think his most insightful piece (by research or luck) is the use of randomness in programming learning, or at least programming analysis of data (analyzing & understanding data = learning).

But first: most programming is very much not like what is described in this book or in movies. Why not? Because programming isn't really terribly interesting to watch or read about (not to non-coders, anyways: it's too abstract). What programs do, on the other hand, is very interesting. But for some reason, fiction about anything technical feels the need to include the fictitious technology behind it. I'm guessing it, ironically, helps the audience feel it is real. However, techniques used in collaborative filtering (see link above) sometimes do... Hmmm...

Collaborative Filtering

I've mentioned some of this before, but let's take the "what should I read next?"
question. You have a stack of users' books. And, presumably, they rate them. They might get fancy, but let's just say that they rate them from 1 - 5. So you have a bunch of books that Joe rated 3, some he rated 4, some 5 (cuz, really, unless you really disliked a book, are you going to take the time to enter it into the system just to rate it "1?" - maybe some people). Anyways, if you look at all of Joe's books, you can find patterns: lots of books that are westerns get 5s and sometimes 4s. Good, you learned something: Joe likes westerns. Then you wonder, what about those western 4s - why didn't he like them as much? You notice that most of the western 4s are not written by Louis L'Amour. Good, you learned something else: he really likes Louis L'Amour. You can now predict a few things about Joe's book preferences. Now, say John taps in his first book: Valley of the Sun, by Louis L'Amour (you can see where I'm going with this). Which book would you suggest to him? Maybe one of the ones that Joe rated a 5. That's collaborative filtering.

Except computers do that many, many more times (though not too many times - seriously, learning too many rules from the same data is called "overfitting") with each user to learn many different "rules" to predict behavior; they look at many more users; and they don't know what the rules really are. You can articulate the rule: it's "Louis L'Amour," but the computer just knows that this group of books goes together - it doesn't know it's the author (and it doesn't matter for functional purposes). That brings up a very interesting point about machine learning: it can't necessarily articulate what it has learned or what it means in our language. A human would be able to (for relatively simple rules, anyways).

Let's say you wanted to map these on a page: say you put a dot for each book, and then put dots closer to or further away from each other depending on how much they relate to each other. You could end up with an interesting "map."
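For the curious, the Joe-and-John example can be sketched in a few lines of Python. This is only a toy under loud assumptions: the ratings, the second user "Ann," the placeholder titles, and the `suggest` function are all made up for illustration, and real services surely do something fancier. Note that the code never looks at authors: it only knows which books the same people rated highly.

```python
from collections import defaultdict

# Hypothetical ratings (1-5). Users and most titles are invented for illustration.
ratings = {
    "Joe": {
        "Valley of the Sun": 5,
        "Hondo": 5,
        "A Non-L'Amour Western": 4,
        "Some Mystery": 3,
    },
    "Ann": {
        "Some Mystery": 5,
        "Another Mystery": 5,
        "Hondo": 2,
    },
}

def suggest(liked_book, ratings, min_rating=4):
    """Rank other books by how highly fans of `liked_book` rated them."""
    scores = defaultdict(float)
    for user, books in ratings.items():
        weight = books.get(liked_book, 0)
        if weight < min_rating:
            continue  # not a fan of the starting book, so this user is ignored
        for title, stars in books.items():
            if title != liked_book:
                # A bigger fan's high ratings count for more.
                scores[title] += weight * stars
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda title: (-scores[title], title))

# John taps in his first book and asks what to read next.
suggestions = suggest("Valley of the Sun", ratings)
print(suggestions)
```

With this made-up data, Joe is the only fan of John's book, so his 5-star "Hondo" comes out on top and Ann's mysteries don't count at all.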
(The map in the link is of movies, not books, but the concept is the same.) Why? Because it helps you visualize what is going on: if you are dealing with thousands of these, it's hard to see in a list how, say, 18,000 movies cluster in groups, like in this image:

But where would you put the first dot/book? You'd put it in the center maybe. The second? Nearby, if it was related, but to which side? Now push those questions out thousands of times... How do you figure out where to put dots based on their relationships and, more importantly, if you want to have them closer to or further from each other, how do you know where to put them if there are thousands of related dots/books all "pulling" on that dot with different weights?

Back to Prey and randomness. It turns out that randomness is a good way to go. You introduce some guessing ("dot 2 goes left this time") and, once you've put dots down, you take more guesses ("what if we picked this dot up and moved it over here: is that better or worse?"). That may seem inefficient, but remember, a computer can make hundreds of guesses in the time it took me to write this sentence - and it would take a human a long time to program a "better" way. It's sometimes described in terms of measuring "energies" or "forces" (push/pull from other dots) and getting stuck in "local minima": really nerdy academic pdf article here (check 2.4.1 for escaping local minima with random "jumps").

Aside: Fun quote from the article: "However, the energy 'surface' for thousands of vertices[aka "dots"] is so chaotic (both spatially and temporally), that, in practice, we have found the simpler method performs better." Read: we tried other ways cuz you would think they would work but, honestly, it was too hard and the random thing worked better than what we tried.

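Here is roughly what that guess-and-check idea could look like in Python. To be clear, this is my own toy sketch of random hill climbing, not the method from the article: the four book names, the "desired distance" numbers, the 5% jump chance, and the function names are all assumptions I picked for illustration. The "energy" is just how far the current layout is from the distances we want, and a guess is kept only if it lowers that energy.

```python
import math
import random

def energy(pos, target):
    """Sum of squared gaps between actual and desired distance for each pair."""
    total = 0.0
    for (a, b), want in target.items():
        total += (math.dist(pos[a], pos[b]) - want) ** 2
    return total

def layout(names, target, steps=5000, seed=0):
    """Place dots by random guessing, keeping only guesses that lower the energy."""
    rng = random.Random(seed)
    # Start with every dot at a random spot on the page.
    pos = {n: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in names}
    start = best = energy(pos, target)
    for _ in range(steps):
        name = rng.choice(names)
        old = pos[name]
        # Mostly small nudges; occasionally (5% of the time) a big random "jump".
        scale = 1.0 if rng.random() < 0.05 else 0.1
        pos[name] = (old[0] + rng.uniform(-scale, scale),
                     old[1] + rng.uniform(-scale, scale))
        trial = energy(pos, target)
        if trial < best:
            best = trial        # better: keep the dot where we moved it
        else:
            pos[name] = old     # worse (or no better): put the dot back
    return pos, start, best

# Hypothetical books: related pairs should sit close (0.2), unrelated far (1.0).
names = ["western A", "western B", "mystery A", "mystery B"]
target = {
    ("western A", "western B"): 0.2,
    ("mystery A", "mystery B"): 0.2,
    ("western A", "mystery A"): 1.0,
    ("western A", "mystery B"): 1.0,
    ("western B", "mystery A"): 1.0,
    ("western B", "mystery B"): 1.0,
}

positions, start, final = layout(names, target)
print(f"energy before: {start:.3f}, after: {final:.3f}")
```

Recomputing the whole energy on every guess is wasteful and would crawl with thousands of dots, but it keeps the sketch short. The occasional big jump is the toy version of the random "jumps" idea: it gives a dot a chance to hop out of a spot that only looks good compared to its nearest alternatives.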
Sometimes you don't really understand something until you explain it: your brain was churning away in the background (while you slept) and had the answer ready: you just had to ask for it in a different way (instead of "how does this work?", "how do I explain this?"). But the mind working while you sleep is a whole nother discussion. And not relevant for this evening, clearly. I'm guessing the evil "eat people" behavior of the agent swarms in Prey is probably why I'm still awake right now. I'll go try method #2 for sleeping: football.