An OSX Service to get a web page title

The issue: I have a bunch of services that I use to drop URLs into a journal-type text file that lives in Dropbox, which I then go through to write blog posts, newsletters and the like.

Going through each link (opening it up in a web browser, then copying the relevant details from the web page back into the text file) is a boring task. But the real problem is that it's a boring task that I only do when I'm in the right mood to be doing the more creative task of writing up whatever it is that I'm writing.

The idea: I want a service where I can just select a URL and automatically convert it to a (Markdown) link, looking up the web page behind the URL to get the title of the page.

Turns out it's pretty simple. I set up a Service in Automator that receives selected text, with "Output replaces selected text" checked.

All the Service does is run the following Ruby shell script:

require 'open-uri'
require 'nokogiri'

# Each line of the selected text is expected to be a URL.
ARGF.each do |line|
  url = line.strip
  next if url.empty?

  # Fetch the page and pull out its <title>.
  # (On Ruby 3 and later, use URI.open(url) instead of open(url).)
  doc = Nokogiri::HTML(open(url))
  title = doc.at_css("title").content.gsub(/\s+/, " ").strip
  print "[#{title}](#{url})"
end

To make it work, you will need the Nokogiri gem installed in your System Ruby. (Nokogiri can be straightforward to install, but it can also be a complicated mess, so the instructions are outside the scope of this blog post.)
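
If you save the same script to a file, you can sanity-check it outside Automator by piping a URL in on stdin. The file name, URL and title below are just placeholders:

echo "https://example.com/some-post" | ruby url_to_markdown.rb
# => [Some Post Title](https://example.com/some-post)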

Obviously, there is room for improvement here. For starters, it seems like overkill to pull down a whole page of HTML and then run it through a full HTML/XML parsing tool like Nokogiri just to get a page title. (Something like readline, reading the response only until the title appears, seems like it could be useful here.) It would also be nice to extract URLs from a selected piece of text, turning it into a service that could be used on a selection containing multiple URLs. And it would also be nice to detect URLs that are already HTML or Markdown links and ignore them.
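
As a rough sketch of the multi-URL idea (not what the Service does today), the same title lookup could be wrapped in a gsub over the whole selection. The URL_PATTERN name and the deliberately naive regex here are just illustrative assumptions:

require 'open-uri'
require 'nokogiri'

# A deliberately rough pattern for bare http/https URLs.
URL_PATTERN = %r{https?://[^\s"'\)\]]+}

text = ARGF.read

# Replace each URL in the selected text with a Markdown link,
# using the page's <title> as the link text.
result = text.gsub(URL_PATTERN) do |url|
  doc = Nokogiri::HTML(open(url))   # URI.open(url) on Ruby 3+
  title = doc.at_css("title").content.gsub(/\s+/, " ").strip
  "[#{title}](#{url})"
end

print result

This still would not skip URLs that are already part of a Markdown or HTML link, so the third idea above would need an extra check.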

But as a starting point, it does the job.