You are currently browsing the monthly archive for April 2008.
Philip M. Parker currently has more than 86,000 books listed on Amazon. Most of them are written by an algorithmic process, and the one about Japanese air conditioners is generally recognized as one of his best. One Amazon reader has the following to say about this apparent masterpiece:
I thought that Philip M. Parker had reached his peak with his outlook on Electric Tea Kettles, but he has reached a new summit with “The 2007-2012 Outlook for Year-Round Unitary Single Package and Remote-Condenser Air Conditioners with at Least 640,000 BTU Per Hour Excluding Heat Pumps in Japan”, even though I wish that he would’ve included the Japanese heat pumps.
There is so much to be said for the joys of unitary single packages, but that is better left for you to read and to experience on your own.
I look forward to reading more by Parker!
Read this interesting article on the man from the International Herald Tribune. I really like the concept…
Check out the video of French band The Outrunners. Love the ’80s vibe, very Miami Vice-like.
Muxtape is a website that lets users share mixtapes online, which is pretty cool, as is its minimalist design. This Ruby tutorial shows how to use Mechanize (together with Hpricot) and rb-appscript to scrape the site, download the MP3s and add them automatically to an iTunes playlist. The last part only works on Mac OS X (because of rb-appscript).
First, install the gems:
gem install mechanize rb-appscript
For clarity, I will do it in 4 steps: crawling (i.e. downloading the HTML pages), analysis (i.e. extracting the URLs of the MP3s), download (i.e. fetching the MP3s) and iTunes integration (i.e. creating playlists and adding the files).
Crawling
require 'rubygems'
require 'fileutils'
require 'mechanize'

# Download the front page and up to n featured mixtape pages.
# Returns a hash mapping mixtape name => saved HTML file.
def crawl(n = 10)
  FileUtils.mkdir_p("crawl")
  agent = WWW::Mechanize.new
  page = agent.get("http://muxtape.com/")
  page.save_as "crawl/main.html"
  new_mixtapes = {}
  (page/"ul.featured li a").each_with_index do |mixtape,i|
    link = mixtape.attributes['href']
    name = mixtape.inner_text
    file_name = "crawl/" + name + "_mixtape.html"
    break if i + 1 > n
    next if File.exist?(file_name) # don't crawl/analyze... if already crawled
    new_mixtapes[name] = file_name
    agent.get(link).save_as(file_name)
  end
  new_mixtapes
end
What I did here was instantiate a Mechanize object to download the first page. Once I have that page, I use the Mechanize integration with Hpricot to search for all links that match the CSS expression “ul.featured li a”, which I got by looking at the HTML content of the Muxtape page. These links point to individual mixtape pages (which in turn contain the links to the MP3s). Then I download the pages pointed to by the links and return a mapping between each mixtape name and the location of the downloaded HTML file.
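One thing the crawl step glosses over is that the mixtape name goes straight into a file name, so a name containing a slash or other awkward characters could end up outside the crawl directory or fail to save. A minimal sketch of one way to guard against that, using only the standard library; the helper name `mixtape_file_name` and the replacement rule are my own assumptions, not part of the original script:

```ruby
# Hypothetical helper (not in the original script): build a file name under
# dir from a mixtape name, replacing characters that are awkward in paths.
def mixtape_file_name(name, dir = "crawl")
  # Collapse every run of characters other than letters, digits,
  # dot, underscore or dash into a single underscore.
  safe = name.strip.gsub(/[^A-Za-z0-9._-]+/, "_")
  safe = "untitled" if safe.empty?
  File.join(dir, "#{safe}_mixtape.html")
end

puts mixtape_file_name("AC/DC best of")
```

In crawl, this would stand in for the string concatenation that builds file_name. Note that two different names can map to the same file with this rule, so it is a trade-off rather than a complete fix.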
Analysis
Once I have the individual mixtape pages, I can look for the locations of the MP3s. One caveat is that the static HTML does not contain them: the “embed” elements that reference the MP3s are created by JavaScript after the page has loaded, and Mechanize does not execute JavaScript. One way around this would be something like Watir, which drives an actual browser instead of only emulating one, but I won’t go that route, since the JavaScript to parse is not all that hard. For the parsing, I briefly considered RKelly, which looks very interesting, but there is no gem yet, so I passed. Instead, I use Hpricot to parse the page and give me the list of script elements, whose content I parse myself.
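require 'rubygems'
require 'hpricot'
require 'ostruct'

# For each crawled mixtape page, extract the song ids and access codes
# from the inline JavaScript and build the list of MP3 URLs.
def analyze(mixtapes)
  mixtape_songs = {}
  mixtapes.each do |name,file_name|
    page = Hpricot(open(file_name))
    songs = []
    (page/"script").each do |script|
      src = script.inner_text
      # Kettle takes two arrays: the song ids and the matching codes
      if src =~ /new\s+Kettle\(\[([^\]]+)\],\[([^\]]+)\]/
        ids, codes = [$1, $2].map {|a| a.gsub("'",'').split(",") }
        ids.zip(codes).each do |ic|
          songs << OpenStruct.new(:sid =>"#{ic[0]}.mp3",
            :url => "http://muxtape.s3.amazonaws.com/songs/#{ic[0]}?#{ic[1]}")
        end
      end
    end
    mixtape_songs[name] = songs
  end
  mixtape_songs
end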
Here again, I looked at a mixtape page to learn how the URLs for the MP3s are formed. It turns out the files are stored on Amazon S3. The JavaScript that creates the “embed” element for the Flash MP3 player is a one-liner that looks like this:
var x480bc8459f377 = new Kettle(['62f2c1f39e0'.......],[...])
After using a regexp to parse this, I have all the info needed to determine the URLs of the MP3s for the mixtape. A list of songs is returned for each mixtape considered at the crawling step.
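To see the regexp at work without Hpricot or a saved page, here is a small standalone sketch that runs the same pattern against a sample line. The ids and codes in the sample are made up for illustration, the helper name `songs_from` is hypothetical, and I use a plain Struct in place of OpenStruct to keep it dependency-free:

```ruby
# Same pattern as in analyze: capture the two bracketed lists given to Kettle.
KETTLE = /new\s+Kettle\(\[([^\]]+)\],\[([^\]]+)\]/

# Plain Struct standing in for the OpenStruct used in analyze.
Song = Struct.new(:sid, :url)

# Hypothetical helper: turn one script body into a list of Song entries.
def songs_from(script_source)
  return [] unless script_source =~ KETTLE
  # Strip the quotes and split each captured list on commas.
  ids, codes = [$1, $2].map { |list| list.delete("'").split(",").map(&:strip) }
  ids.zip(codes).map do |id, code|
    Song.new("#{id}.mp3", "http://muxtape.s3.amazonaws.com/songs/#{id}?#{code}")
  end
end

# Made-up ids and codes, only to show the shape of the input.
sample = "var x1 = new Kettle(['aaa111','bbb222'],['code1','code2'])"
songs_from(sample).each { |song| puts "#{song.sid} -> #{song.url}" }
```

A script element that does not call Kettle simply yields an empty list, which matches what analyze does when the regexp fails to match.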
Download
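require 'fileutils'
require 'open-uri'

# Download every song of every mixtape into dl/.
# Returns a hash mapping mixtape name => list of local MP3 files.
def download(mixtapes)
  FileUtils.mkdir_p("dl")
  mixtape_songs = {}
  mixtapes.each do |mixtape, songs|
    dl_songs = []
    songs.each do |song|
      song_file = "dl/#{song.sid}"
      # open-uri lets open read from a remote URL
      open(song.url) do |f|
        open(song_file,"wb") {|mp3| mp3.write f.read }
      end
      dl_songs << song_file
    end
    mixtape_songs[mixtape] = dl_songs
  end
  mixtape_songs
end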
This step is pretty straightforward: just go through all the URLs determined at the previous step and download them. This uses the “open-uri” standard library to open the remote URL. A mapping between each mixtape and the locations of its downloaded MP3s is returned.
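For anyone running this on a recent Ruby: since Ruby 3.0, a bare open no longer accepts URLs, and open-uri is used through URI.open instead. The sketch below shows a single-song download written that way, and it also skips files that are already on disk. The helper name `download_song` and the injectable `fetch` argument are my own assumptions, added so the logic can be tried without touching the network:

```ruby
require "open-uri"
require "fileutils"

# Default fetcher: read the whole response body through open-uri.
DEFAULT_FETCH = lambda { |url| URI.open(url) { |remote| remote.read } }

# Hypothetical helper: save url to path unless the file is already there.
# fetch is any callable that takes a URL and returns the body as a String.
def download_song(url, path, fetch = DEFAULT_FETCH)
  return path if File.exist?(path) # already downloaded, nothing to do
  FileUtils.mkdir_p(File.dirname(path))
  File.open(path, "wb") { |mp3| mp3.write(fetch.call(url)) }
  path
end

# Usage, assuming song responds to url and sid as in analyze:
#   download_song(song.url, "dl/#{song.sid}")
```

Reading the whole body into memory is fine for files the size of a song; for much larger downloads, copying in chunks would be the safer choice.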
iTunes integration
While the previous code could be used on other platforms, this step is restricted to Mac OS X. It uses the rb-appscript library, which is a bridge between AppleScript and Ruby. There are other libraries that do this (Apple’s own Scripting Bridge, or Ruby OSA), but this one is well documented, so I use it. This step creates an iTunes playlist for each considered mixtape, then adds the MP3s to the iTunes library and to the corresponding playlist.
require 'rubygems'
require 'appscript'
include Appscript

# Create one iTunes playlist per mixtape and add the downloaded MP3s to it.
def itunes(mixtapes)
  i_tunes = app('iTunes')
  mixtapes.each do |mixtape,song_files|
    next if i_tunes.playlists[its.name.eq(mixtape)].exists # skip if exists
    pl = i_tunes.make(:new => :user_playlist, :with_properties => {:name => mixtape})
    song_files.each do |sf|
      # iTunes needs an absolute path to the file
      i_tunes.add(MacTypes::FileURL.path(File.expand_path(File.dirname(__FILE__) + "/#{sf}")), :to => pl)
    end
  end
end
The first line gets a reference to the iTunes application. If iTunes is not running, it will be launched as soon as it is asked for info or an action is called on it. Then, for each mixtape/playlist, I first test whether a playlist of the same name already exists, using a filter expression (the “its” refers to each considered playlist in turn). If there isn’t one, I create it with the “make” method. I then add all the mixtape songs to the newly created playlist. For this, I use “add” on the iTunes reference, to which I pass a URL created with the “MacTypes::FileURL.path” method, which takes as argument an absolute path to the MP3 I want to add (a relative one does not seem to work). And that’s it! New playlists have been created in iTunes, with an iPod synchronization for offline listening just a step away.
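Since the absolute path is the part that tripped me up, here is that piece on its own, runnable on any platform because it does not touch iTunes. The helper name `absolute_song_path` is hypothetical; it relies on File.expand_path accepting a base directory as its second argument:

```ruby
# Hypothetical helper: resolve a path such as "dl/abc.mp3" against a base
# directory (by default the directory of this script) instead of relying
# on the current working directory.
def absolute_song_path(relative, base = __dir__)
  File.expand_path(relative, base)
end

puts absolute_song_path("dl/abc.mp3")
```

The result is what would be handed to MacTypes::FileURL.path in the itunes method.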
Code
Here is the file with the complete code. It will only download the first mixtape. Run it as:
ruby crawler.rb
I was in Chicago on Saturday. What a great town! It was just a day trip so I just walked around for a few hours, but I really liked the feel of the city.
Update: More photos of Chicago on my Flickr page.








