24 March 2006

More Squiggly Lines

I realized that someone could use spectrograms to compose as well as recognize sounds. That is, with enough training, I could think of a sound (virtually any sound), then imagine what the picture of the sound must look like and draw it. It's like mastering an instrument: The Pickettheremin Musical Whiteboard.

With this in mind, I threw together a quick script so that I could point my webcam at a whiteboard, press a button on Lappidactus, then listen to the (...ahem) beautiful muzak [sic] produced.
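For anyone who wants to play along at home, here is a rough sketch of the "picture to sound" direction. It is not the script mentioned above: it assumes the whiteboard image has already been reduced to a small grid of brightness values between 0 and 1, and the function name, frequency range, and sample counts are all made up for illustration. Each row is treated as one sine wave, and each column as a short slice of time.

```python
import math

def synthesize(image, sample_rate=8000, samples_per_column=400,
               f_low=200.0, f_high=2000.0):
    """Turn a grid of brightness values (0..1) into a list of audio samples.

    image[r][c]: row 0 is the highest frequency (top of the picture),
    column 0 is the earliest moment in time.
    """
    rows = len(image)
    cols = len(image[0])
    # Spread the rows linearly between f_high (top) and f_low (bottom).
    if rows > 1:
        freqs = [f_high - r * (f_high - f_low) / (rows - 1) for r in range(rows)]
    else:
        freqs = [f_low]
    samples = []
    for c in range(cols):
        for n in range(samples_per_column):
            t = (c * samples_per_column + n) / sample_rate
            # Sum one sine wave per row, weighted by how bright that pixel is.
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(rows))
            samples.append(value / rows)  # keep the result within [-1, 1]
    return samples

# A tiny "drawing": a line that falls from high to low frequency.
falling_line = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
audio = synthesize(falling_line)
```

The samples could then be written out with Python's wave module. A real version would need to grab a webcam frame and threshold it first, which this sketch skips.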

Below is my first composition (both the sound and its spectrogram). It's called "Squiggly Lines I: John Cage Shooting a Laser Shotgun".



19 March 2006

A Story of Predator Drones, Signal Intelligence, GPS, and 20th Century Human Intervention

I was reading an article in the April 2006 issue of The Atlantic on the NSA and their purported ability to snoop on all information, everywhere. The piece is a bit of hype given the recent press about domestic "wiretapping" (they mean to say signal intelligence gathering, apparently). One page of the article had an inset on the 2002 Predator drone strike in Marib, Yemen, that killed Qaed Salim Sinan al-Harethi. Harethi, the article says, was a high-priority target. Below is an image of a Predator drone like the one in the attack.





The story goes on to say that when a cell phone was used, a monitoring team saw the alert, located the call using GPS, and began listening. The people involved made the decision to attack the target after listening to the speakers on the phone. This time, Harethi's driver was using the phone, and only when Harethi began giving directions to his driver did the staffer realize the target was present.

Let's consider this problem. Harethi was using any number of cell phones, and swapping cards between them, to avoid eavesdropping or tracking. The NSA was alerted whenever any one of the phones or cards was used and could immediately listen to the conversation. The NSA located the cell phone using GPS. The CIA launched an attack from an unmanned Predator drone carrying Hellfire missiles. There's much to be said about these technological feats, but there's more to be said and done about the 20th-century approaches to the rest of the operation.

A human being, trained to speak the language of the enemy, had to be listening in real time in order for this mission to succeed. The human had to listen to a conversation containing at least three people, separate who was saying what and when, and determine that Harethi was one of the speakers. There is no way, today, to get a computer to do these human tasks, but they make a nice playground for research.

18 March 2006

Math For Programmers

I know this site is supposed to cover issues of Cognition, Robotics, and Learning (as they apply to Artificial Intelligence), but I thought this blog post was particularly valuable anyway. Steve Yegge did a nice evaluation of the current state of mathematics teaching in the US and why the modern programmer uses very little of what they learned in school. He proposes an alternative methodology for learning math and suggests everyone try to learn a bit more in their free time.

15 March 2006

Graceful robots

Here's a video of four QRIO robots dancing. They have been programmed beautifully.

Speaking of robots - jabberwacky.com has chatbots that you could chat up. They are kind of slow but worth a try! The conversation could get a bit edgy though - the chatbots are afflicted with severe ADD. :)

More on artificially generated music

I came across GenJam a couple of years ago. It is a piece of software that learns to play jazz solos. Its creator is Al Biles. As the name indicates, it is essentially a Genetic Algorithm.

The website has a few snippets from a jam session between the creator and GenJam. Some of the results are actually very listenable (i.e., if you are "into" jazz music). I recommend "LadyBug" because the pieces played by the software and by the creator are distinct, and it is a good showcase of the software's "talent".

He actually plays regular gigs using that software as a part of his Virtual Quintet.
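If you haven't met genetic algorithms before, here is a toy sketch of the general idea applied to short melodies. To be clear, this is not how GenJam itself works. The fitness function below is a made-up stand-in that simply prefers small steps between neighboring notes, and every name and parameter is an assumption chosen for illustration.

```python
import random

SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, C major

def fitness(melody):
    # Hypothetical stand-in for "sounds good": smaller jumps score higher.
    return -sum(abs(a - b) for a, b in zip(melody, melody[1:]))

def evolve(rng, length=8, population_size=20, generations=30):
    """Evolve a melody with selection, one-point crossover, and mutation."""
    population = [[rng.choice(SCALE) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]       # crossover builds a new list
            if rng.random() < 0.2:              # occasional mutation
                child[rng.randrange(length)] = rng.choice(SCALE)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = evolve(random.Random(1))
```

Because the better half of each generation survives untouched, the best score can never get worse from one generation to the next.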

13 March 2006

Learning Elmospeak

Toys are fun and toys that you can modify or extend (e.g. LEGO) expand everyone's imaginations. After seeing someone's attempt to hack Elmo, I had to try myself.





What interests me is the language Elmo speaks (I call it Elmospeak). Casey points toward the innocuous file "temp.inf", in which we can see a grammar, of sorts, for Elmospeak. After stripping out the extraneous bits, we're left with a bunch of productions of the form PL_# -> SOUND1 SOUND2 ... SOUNDN, where # is an integer between 1 and 620 and each SOUND is a terminal associated with a particular sound (generally a word or generic phrase). Also, an [NMSEC] terminal "plays" silence for N milliseconds. Here is an example production:

PL_059 -> ONE [150MSEC] TWO [100MSEC] THREE
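A production like this one is easy to pick apart in code. Here is a minimal sketch, assuming each production sits on one line and that tokens are separated by whitespace. The function name and the (kind, value) tuples are my own invention, not anything found in temp.inf.

```python
import re

# Matches a silence terminal such as [150MSEC] and captures the number.
SILENCE = re.compile(r"^\[(\d+)MSEC\]$")

def parse_production(line):
    """Split 'HEAD -> SYM1 SYM2 ...' into the head and a list of symbols."""
    head, body = line.split("->", 1)
    symbols = []
    for token in body.split():
        match = SILENCE.match(token)
        if match:
            symbols.append(("silence", int(match.group(1))))  # milliseconds
        else:
            symbols.append(("sound", token))
    return head.strip(), symbols

head, symbols = parse_production("PL_059 -> ONE [150MSEC] TWO [100MSEC] THREE")
```

Here head comes back as "PL_059", and symbols alternates between sounds and silences.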


Further, there are GAME and STORY non-terminals that expand to any number of non-terminals and terminals. But this has all been said before. Let's instead consider learning Elmospeak. Overall, Elmospeak is a regular language, though I think it is regular only because it appears to be finite. In my experience with Elmo, the same utterance is heard each time a conversation begins. From there, with some probability, a non-terminal is selected from a subset of non-terminals. The problem is that there doesn't appear to be any nondeterminism in Elmo by default. Right now I'm experimenting with adding nondeterminism to Elmo and seeing just how productive Elmo can be.
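To make "adding nondeterminism" concrete, here is a small sketch of choosing randomly among alternative expansions. The toy grammar is hypothetical: only PL_059 comes from the example above, and the other symbols (PL_001, PL_002, HELLO, BYE) and the GAME alternatives are invented for illustration.

```python
import random

# Each non-terminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Anything not listed as a key is treated as a terminal.
TOY_GRAMMAR = {
    "GAME": [["PL_001", "PL_059"], ["PL_001", "PL_002"]],
    "PL_001": [["HELLO"]],
    "PL_002": [["BYE"]],
    "PL_059": [["ONE", "[150MSEC]", "TWO", "[100MSEC]", "THREE"]],
}

def expand(symbol, grammar, rng):
    """Expand a symbol into terminals, picking one alternative at random."""
    if symbol not in grammar:
        return [symbol]
    alternative = rng.choice(grammar[symbol])
    terminals = []
    for part in alternative:
        terminals.extend(expand(part, grammar, rng))
    return terminals

utterance = expand("GAME", TOY_GRAMMAR, random.Random(0))
```

Every utterance from this grammar opens with the same greeting, which matches what I hear from Elmo, and then the rest depends on the random choice.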

12 March 2006

Squiggly Lines

I claim that The Speech Recognition Problem is fundamentally The Vision Problem, and both are really just The Squiggly Line Problem. My hypothesis is that with a few weeks' training, I'll be able to translate spectrograms (no background noise) into the English phrases they represent.



Tim and Bill each have $10 that says that I can't do this. Is anyone else interested in giving me their money (i.e., betting that I can't do this)? Unless your name is Bill or Tom, I'm gonna cap the amount that you give me at $1. Just to warn you, I've trained myself to read Suetterlin handwriting, which *I* once thought was impossible.

Here is a sample spectrogram of me saying "Chicky check, microphone check, chicky check-a, sibilance sibilance.", one of the coolest sounds ever (check out its spectrogram), and my Java code that gives you a real-time spectrogram.
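For readers wondering what a spectrogram actually is, here is a bare-bones sketch of how one can be computed. This is not the Java code mentioned above. It is a slow, plain-Python version that slides a window along the signal and takes a discrete Fourier transform of each chunk, with the frame size and hop picked arbitrarily.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of magnitudes per bin."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Only bins 0..frame_size/2 are needed for a real-valued signal.
        for k in range(frame_size // 2 + 1):
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

# A pure tone that completes exactly 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = spectrogram(tone)
loudest_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

Plot the frames side by side with brightness for magnitude and you get the squiggly lines. For the pure tone here, the picture is a single horizontal stripe at bin 8.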

(After I lose my hearing from my iPod, I'll be using this software so that I can understand people.)

09 March 2006

Generating Music

At Playing the Market (via /.), music is generated from some physical phenomenon (e.g. the stock market). The result need not follow classic musical styling; rather, it conforms to the phenomenon. The goal is not to mimic, but to be artistic. For an example of this approach, take a listen to Fibonacci's Random Walk (part 1), which is based on the Fibonacci sequence [ F(n) = F(n-1) + F(n-2) ] augmented with deterministically generated noise. The result is eminently listenable, but other works on the album lack that quality.
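As a rough illustration of turning a number sequence into notes, here is a small sketch that maps Fibonacci numbers onto a scale. I have no idea how the piece above was actually produced, so treat the mapping (each number modulo the scale length), the choice of C major, and the function names as assumptions. The deterministic noise is left out.

```python
def fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting 0, 1, 1, 2, ..."""
    a, b = 0, 1
    numbers = []
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b  # F(n) = F(n-1) + F(n-2)
    return numbers

C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI note numbers, C4 up to B4

def fibonacci_melody(count, scale=C_MAJOR):
    # Wrap each Fibonacci number around the scale to choose a pitch.
    return [scale[f % len(scale)] for f in fibonacci(count)]

melody = fibonacci_melody(8)
# melody is [60, 62, 62, 64, 65, 69, 62, 71]
```

Since the Fibonacci numbers modulo 7 eventually repeat, a melody built this way is periodic, which may be part of why such pieces sound structured rather than random.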



Veering away from data-inspired music, we look at music generated by the approach in Grammar Based Music Composition (see the example above). Jon McCormack transforms "string rewriting grammars based on L-Systems into a system for music composition" that seeks to aid in the composition of works. Although it does not use the L-System approach, a sample of his work generated from human speech offers a taste of his music.
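For a feel of what a string rewriting grammar does, here is a minimal L-system sketch. It uses the textbook "algae" rules (A becomes AB, B becomes A) rather than anything from McCormack's paper, and the mapping from symbols to notes is purely an assumption for illustration.

```python
def rewrite(axiom, rules, generations):
    """Apply the rewriting rules to every symbol, in parallel, repeatedly."""
    current = axiom
    for _ in range(generations):
        # Symbols without a rule are copied through unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current

ALGAE_RULES = {"A": "AB", "B": "A"}
NOTE_FOR_SYMBOL = {"A": 60, "B": 67}  # hypothetical mapping: C4 and G4

pattern = rewrite("A", ALGAE_RULES, 4)        # "ABAABABA"
notes = [NOTE_FOR_SYMBOL[symbol] for symbol in pattern]
```

The string grows with each generation while keeping a self-similar structure, which is the property that makes L-systems appealing for composition.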

06 March 2006

amazing video game with evolution and AI

Will Wright, of SimCity fame, has created a new video game named Spore, where players start with a single-celled organism and help evolve it over many generations (using a "creature editor") through a sequence of more complex organisms that walk on land, mate to produce offspring, eat other creatures to survive, and so on. Eventually, the creatures form villages and cities, and leave their home planet to meet creatures evolved by other players. There is a fantastic video available where the developer shows off what the game can do. Text on the video's web page says "Will Wright talking at the Game Developer's Conference about 'Spore', which looks like it could possibly be the best video game ever". That may not be hyperbole.



The video contains interesting comments about "procedural verbs", like "bite" and "walk" and how they can be combined to produce "drag" = "bite and walk". It's unclear whether there is significant NPC AI, or whether the creatures are all the result of the work of other players.

A bit of cognition woo-hoo

Here's a nice bit of cognition-related food-for-thought I found on my friend's blog. Before you click on the link below, read the following. (If you don't do so, you'll miss the point!) There are two teams of people, one wearing white tees and the other wearing black tees. Each team has a ball that they throw to another member of the same team. Your job is to count the number of throws made by the white team members.




Did you find anything weird? Not really? Please go back and watch the video again (trust me!). This time around, just relax and watch the video.

How on earth did you miss that before? Are we missing out on some eye doctor appointments? :)

01 March 2006

Music To My Ears

According to this story on PR Leap, an Israeli company (could it be the people putting porn on Sprocket?) has announced the release of a new product to stream mp3s to 3G cell phones. "MusicGenome, the leading expert in Artificial Intelligence (AI) and Music Cognition, announced today the release of Smart DJ – a new innovative fun product for the global cellular market. Smart DJ offers 3G mobile phone users the ultimate in digital music experience, by creating a new online entertainment service that accurately analyzes listeners’ music preferences and then delivers a stream of mp3 songs totally in-sync with the listener’s taste with 80% success rate." The story does not discuss the specifics of the technology employed, but it does claim the product is better than those that employ Collaborative Filtering. The company's website does not appear to be compatible with Firefox for Mac OS X.

Speech Accent Archive

I just happened across an interesting data source: the George Mason Speech Accent Archive. Accent and dialect classification can improve speech recognition approaches, and automatically labeling native tongues would make our friends by the airport happy. The archive, as of 1 March 2006, contains 502 recordings of individuals reciting the same paragraph of English text. Here's an excerpt:

"Ask her to bring these things with her from the store: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob."


One of my favorites is a recording made by a native Wolof speaker from Senegal. At UMBC, we offer language courses in Wolof that I am considering auditing.

Watch out for that port-scanner half a planet away...

It was a cold Wednesday morning. The inhabitants of Coral and Maple were busy with their daily bustle of intense research activity. Dr. Marie desJardins walks into the lab and says "Could someone shut this window on sprocket?" Curious, we assemble around our good ol' server (sprocket). Lo and behold! There is a Safari window open with a fairly graphic picture from a porn website in Israel!

A whirl of activity ensues. Disturbing questions are asked - Who did this? How do we prevent it? How do we track this down? Was it just a prank or a malicious attack? Are Mac servers (gulp! gulp!) vulnerable? (Incidentally, Slashdot carried a post on this two days back.)

The obvious suggestions are all made and shot down in turn. Concerned lookers-on (self included) throw in their tuppence-worth nonetheless - check Safari's history, check network usage, check VNC's logs... After a frantic morning of grepping through logs and other thingamajigs, our sys-admins track the attack down to an IP address in Israel. They confirm it wasn't malicious. (Why would someone want to remotely log in to a computer and watch porn on it?)

Moral of the story: Log out of your computer when you leave.