w: an algorithmic design (or two) for (distributional) social justice

This is a lecture note designed primarily for those interested in "algorithmics as a second language" but also may contain a few nuggets of interest for those of us that like to think algorithmically.

An early draft describing some of these ideas was sent to P, who I think is on his way to SIGGRAPH 2B0ST0N6.

I tried to tell my mom about some of these ideas, with some distributional success. Thanks mom, for many things and almost everything.

-----

"an algorithmic design, or two, for (distributional) social justice"

Have you ever played this game called "pin the tail on the donkey"?

I wasn't sure if I did, so I asked my mom about it, and she told me yesterday that I did, in fact, play this game at one of my earlier birthday parties, when I was perhaps a bit less complex, organically speaking, than I am as I write this note on my warm-as-a-brownie MacBook Pro.

I don't remember playing it, but perhaps the donkey has come back to haunt me. Either that, or the dizzying experience made me subceptible to the algorithmic flu at a fairly young age.

I'm going to re-roll this game a little bit, as "immunize the donkey".

You see, it doesn't seem very nice to stick sharp objects into (admittedly synthetic and/or Shrek-like, on average) donkeys for no apparent reason than for the glee of our little ones.

Instead, in this game, you play the role of an immunobiologist trying to take care of wild donkeys in the land of your proper ancestors. The local CDC has suggested that donkeys not far from your grandfather's old village may be subceptible to a recently discovered bug that has come to be known as wild donkey flu. (Symptoms may include non-deterministic braying and loose stools, but in the distributional worst case, stroke and/or cardiac arrest.)

As one of the better known immunobiologists in the area, you feel that you are called to help fight this disease, even if you wake up at night, sweating, drenched in nightmares of a common donkey/human business interchange format based on donkey kong's latest markup language (DKML). In other words, you really do not want to inspect that donkey up close!

Your engineerd friends have come up with two different solutions. One, if you wait until midnight, all the wild donkeys will have fallen asleep, if and only if they remain in (almost) complete darkness. If you can find the donkey, you can stick it with a hypodermic needle containing the immunizing agents, which you hope will contain the spread of WDF.

The other option is an aerial attack. Learn how to fly a helicopter (no sane pilot agrees to come with you, given the jumping ability of beasts infected with wild donkey flu, and drop the immunizing dart on the right part of the donkey's body.

(Note that you don't want to wake up the donkey using infrared or visible light illumination, as the wild donkeys near your hometown are fairly sensitive to a wide range of spectral frequencies.)

You happen to have a nice boss, so she asks you to estimate the probability that your first immunizing (blow or rocket-propelled) dart will hit donkey in the head, calming donkey down, and then in the tail, calming him down, and finally in the trunk, for complete immunobiological protection (supposedly).

What're you gonna do?

Luckily for you, last quarter you took algorithmics as a second language, and so you have some understanding of how be an algorithmic bad-ass in a way that's not, to borrow a phrase from a QA lead I once knew, "assy".

In fact, the very first algorithm that you learned, if you recall...

(which you probably wouldn't until some time in the near future, as this algorithm is new to me, even if I Don't Know It's Not Russian)

...was called donkey sort.

Yes, good old (new) donkey sort. Way cooler than bubble (bobble) sort, faster to implement offline than quick as-a-firefox sort, and easier to explain than triple-u sorting (also known as triple-you ranking).

To implement donkey sort (not to be confused with the non-agile Donkey Kong Markup Language aka DKML), offline, one simply needs a picture of a Donkey, some tracing paper, and some magnetic darts.

Other materials may be substituted at your own risk! Don't come crying to me if you poke your eye out with something sharp. Be careful and remember that donkey sort is not yet approved for use with newborns in most boring parts of your second favorite country.

You may print out a so-called Donkey by using an early 21st century search engine (as they were known back then) such as Windows Live (yes, I know it's a funny name. No need to laugh).

For example, some of you may be able to access the URL, http://www.live.com/?q=donkey&FORM=LVSP&scope=images

As of this writing, following this "hyperlink" is often accomplished through non-breathy pointyclicks. Funny, don't you think? Yea, I know, those of us who lived to see Y2K come and go ended up pretty weirdly boring, no?

Anyway, after printing out your favorite Donkey, use the tracing paper (and your second favorite writing instrument) to trace several closed Jordan curves that delineate the bodyparts that interest you.

For example, you may outline the donkey's head, body, and tail, disjointly (or not, which I leave as an exercise to the reader).

Now, throw the "immunizing darts" (magnetic and terribly less pointy, one might hope) at the outline of the donkey, with some xen composure.

To compute a donkey sort of these three components, head, body, and tail, simply throw magnetic darts at a suitably magnet-friendly surface where you have safely attached tracing paper (or some form of electrons scattered using more modern equipment known to us peons as a projector).

Keep throwing darts, like a reasonable Feller, until you hit any one body part. Write down the name of that body part in your YAML-based notebook. If you have enough mana, you might be able to make the body part you first hit demagnetize, but otherwise, from that point forward, ignore any further "collisions" with that first body part.

Recurse, like the sophisticated recursion fairy of your dreams.

By that I mean keep on throwing darts until you hit some other body part for the first time, given a suitable definition of the English adjective "other". Write the Mandarin name of that body part in your YAML notebook, below (or above, to the left or right, etc. -- just be consistent!) the name of the first body part that you serialized.

Continue as desired, until all three body parts are described in some distributional order. Recursing on additional body parts may or may not earn you extra credit. Please talk to your k-nearest advisers.

More concisely put, to donkey sort a set of elements in R3, you throw immunizing darts at them according to some uniform or non-uniform distribution (you get to pick, will it be van der Corput or MersenneT?), until all elements have been immunized or the universe ends, whichever comes first.

Given that discrete (?) distribution of elements, you may chose to recycle all but the first appearance of every element.

Another way to think of this involves books or roughly 2d playing cards. Take these pulp-based books and stick them on the wall. Note that some have a greater surface area, and some have a smaller such area.

Throw dart-equivalent safespace projectiles at the wall, and put the first book you hit at the bottom of a new pile, where the pile is not attached to the wall using crazy glue or other adhesives. Throw more projectiles (or point your low-discrepancy sampling laser, possibly) until you hit a second book (quite likely smaller than the first, don't you think?) and put that above the first book.

Recurse until your stack can grow no more!

For extra credit, run this experiment a (un)reasonable number of times and try to build a data-based intuition about what it means to perform low-discrepancy sampling in a (possibly high-d) permutation space, in the limit, over a large number of disjoint universes.

(You may also use BillG or BillC flavored pancakes, but I'm not sure this is a good idea.)

You may also send me a picture of your version of Frank the Goat if he had a Donkey-based relative, and I may be able to post your visual aids for many to see. I was going to do this, but I might as well try to finish this single entry by midnight EST.

What, by golly, have we done? This is not the bubbly bobbly sort of yore, this is some distributional ranking. Madness!

When I think about it, I believe that we are trying to accomplish something that may or may not be known as "approximation sampling". In the same spirit of its androgynous sibling, "approximation clustering", approximation sampling tries to be conscious of some underlying probability distribution function as it samples and samples and samples.

What kind of crazy error metrics should we use to bound distances when making deterministic proof certificates wrt approximation sampling? Hausdorff? L_blabla? This is another exercise for you or your third best friend.

Thinking about distributions, I hope, gets us around false dichotomies such as trying to pit The Long Tail against the stubborn Head of some mythical men month distribution. We simply apply our dee-falsedichotomizer and realize that head and tail are part of some larger beast, perhaps one that shall be known as the Bubbly Bobble.

It doesn't take a particle physicist from CERN to see that something akin to donkey sort might be rather useful to a large number of folks, let's say a soi-disant or unsuspecting data ho from the Fungible Empire (to be or not to be confused with the Frugal Empire, which may or may not have a relation to any legal entity id in the known integer universe), or more simply, someone who wants to contribute their little piece to a larger, distributional intelligence.

(As an aside, distributional intelligence tries to take the best of individual intelligence and combine it with the best of collective intelligence, to synthesize a more peaceful and wholesome Universe. As an aside to this aside, note that distributional algorithmics are a proper superset of distributed algorithms. All distributed algorithms are inherently distributional, but not versa vice, at least in the world I now help to define.)

Donkey sort is a lob, a request for diversity and sociocombinatorial justice in the base cased language of Time.now algorithmics.

To implement donkey sort more concisely in a language such as Ruby, consider the following pseudo-algorithm.

Instead of books and connected donkey components, we just consider integers for the simplicity of modeling and explication. Sort these integers in descending order, and think of putting them side by side.

For example, the array [8, 7, 2, 1] can be visualized as the numbers eight, seven, two, and one, side by side. Now, replace each number with a virtual stack of bricks. Going left to right, you have a stack of eight bricks, then seven bricks, two bricks, and one solo brick.

Organize these bricks more neatly into a two-dimensional grid (also known as a matrix or 2d array). This grid is eight blocks high -- [8,7,2,1].max -- and four blocks wide -- [8,7,2,1].size.

Gently move your stacks of bricks into the grid, so that the first column has eight blocks with some coloring, the next column has seven, and so on.

Now, pick a low-discrepancy (or pseudo-random) sample, by "randomly" throwing a dart into your 2d grid. Look in the column that your dart has landed, and see whether or not said dart has intersected above the top brick in that column or below the bottom brick. If so continue. Otherwise, congrats, you have your "first" distributional element.

You may either remove the stack of bricks, or transpose the firstly-intersected stack with the leftmost stack for speed of intersection. You may also want to downsize your grid as you do this, to avoid unnecessary "head count" and return a smaller matrix.

I leave it as an exercise to me or you to write this more simply in Ruby, wet noodles, or electrons, your pick. You might also want to prove something about how long it takes to donkey sort using this concrete brick model of computation.

Does any of this make sense?

I know that I've been taking it slowly, but I wanted to make this entry, in particular, accessible to folks such as my mother.

If you're more familiar with the literature, you might realize already that this work is somewhat indirectly influenced by folks such as Cecilia Aragon and Raimund Seidel and of course the now "long" history of Monte Carlo methods, Metropolis, and quasi-randomized algorithms that you might use for robot motion planning.

One of us going to have to try to show that donkey sort is (?) equivalent to triple-u ranking, as described in a previous blog entry.

Historians may want to note that I derived triple-you sampling first, and then donkey sort shortly afterwards, to make distributional algorithmics slightly more accessible to six year olds and their parents.

Apologies to fans of Riemann, centroidal voronoi diagrams, Dr. BlaBla and future hobbyists interested in VLSD and scalable methods for human-computer I/O.

I'll try to explain my speculative visions about those topics in another post.

Class dismissed.

That means you!

Okay, for those of you that stayed, chew on this until we meet again.

arr.sort! { |x,y| (x <=> y) * (rand <=> [x,y].max/(x+y)) }

Good night, good luck, go public ^_^

w:

Sunday, July 30, 2006

an algorithmic design (or two) for (distributional) social justice

1 Comments:

About Me

Advertising

Previous Posts

Search