Backing Up Your Pinboard Bookmarks
I’m a bookmark kinda guy.
I’ve heard people say they just use search whenever they want to get back to a page they remember seeing earlier that might be useful now. But I don’t want to rely on search. I’ve been burned too many times when a site I know exists no longer shows up in the search results, no matter what combination of search terms I use. So now I’m a fiend for bookmarking.
All browsers allow bookmarks to be created and stored locally – for that user in that browser on that machine. Storing the bookmarks in the local browser does not scale. I’ve got my personal computer, my phone, and many work computers, and on each there are several different browsers. There is no good way to keep all those browser bookmarks synced up, or to remember which browser on which machine holds the bookmark I want.
What if I’m at work and I need that bookmark I created at home last night? What if I’m on vacation and using a friend’s computer, or accessing the internet in an internet cafe or a library? What help would my browser bookmarks be then?
The search for a solution
I recognized the problem at least 10 years ago. Occasionally I looked for a web hosted bookmarking service that would solve my bookmark access and availability problems. Many of these services are browser extensions or browser synchronizers; Xmarks is one such service. But browser sync services are tied to specific browser implementations; as such, they cannot fully satisfy my availability requirements.
Two other services were primarily web bookmarking services: Delicious and Pinboard. Delicious is free but supported by advertisements. Delicious has a full complement of staff but has been through three different corporate incarnations, nearly dying at least once.
Pinboard is a fantastic web hosted bookmarking service that lets you organize links using tags. The company that owns the service is essentially composed of one guy, Maciej Ceglowski, who makes his living off the service.
Both Delicious and Pinboard use a tagging model (similar to Gmail labels) to provide organization to your bookmark collection.
I tried Delicious in the summer of 2011. After about a month using the service I confirmed that such a service would solve the access and availability gaps. But rather quickly I got concerned about the dramatically evolving UI, the business direction (Delicious is apparently trying to turn bookmarking into a social networking service), and the lack of obvious revenue stream (other than selling my attention). [1]
Pinboard, on the other hand, has an obvious revenue stream. You buy into the service for a one-time charge (right now this is still under 10 dollars US). You can purchase a subscription to access additional services (something I’ve not found necessary). Pinboard doesn’t search your bookmarks for advertising context; it doesn’t have ads!
After trying Delicious for a while, I switched to Pinboard and have never regretted it.
Backing it up
I expect Pinboard to be around for a long time, and Maciej Ceglowski is an attentive administrator and developer, but what if he gets hit by a bus? What if there is an extended outage? My bookmarks are really my field notes for the web, my map of the important parts of the web. I rely on them to help me navigate the web. They are my data. So, I want to maintain backups of my Pinboard bookmarks.
Pinboard has a RESTful web-service API (based on the original del.icio.us API) that allows me to write my own apps against my store of bookmarks. In particular, it allows me to use curl to capture all my bookmarks (including the tags assigned to each).
As a first step, I’ve created a Bash script that runs each evening and downloads a copy of all my bookmarks. I keep the last five backups.
I’m doing this on OS X 10.7, but it could easily be ported to Linux or other Unix variants. Without too much difficulty it could also be ported (in concept) to Windows.
Overview
I’ve posted the backup script I use to GitHub as part of my collection of useful Bash scripts. (Yes, at the moment (2012-06-16) there is only one script in the collection.) The script has an installer and uninstaller.
The script is scheduled via a LaunchAgent to run each evening. It is installed into my home directory and executes as my user account. That means it will only run when I’m logged in. For me that is not a problem because I stay logged into the Mac all the time. I could, of course, configure it to run when I’m not logged in, but that’s not a priority for me (see Research Topics).
Each time the script runs, it pulls all my bookmarks twice: once in JSON format and once in XML format. The script makes a copy of each, giving it a date-stamped filename, and then clears out the oldest backups (to avoid filling up the disk with backups).
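The date stamping and pruning steps described above can be sketched roughly as follows. This is not the published script: the function names and the pinboard-YYYY-MM-DD filename pattern are assumptions made for illustration. Only the idea of keeping the last five backups comes from this post.

```shell
# Sketch only: function names and the filename pattern are assumptions,
# not taken from the published script.

# Copy a freshly downloaded file to a date-stamped name in the backup directory.
# $1 = downloaded file, $2 = backup directory, $3 = extension (xml or json)
save_backup() {
    local src="$1" dir="$2" ext="$3"
    local stamp
    stamp=$(date +%Y-%m-%d)
    cp "$src" "$dir/pinboard-$stamp.$ext"
}

# Delete all but the newest $3 backups with extension $2 in directory $1.
# Date-stamped names sort chronologically, so a reverse sort lists the
# newest first and tail skips past the ones to keep.
prune_backups() {
    local dir="$1" ext="$2" keep="$3"
    ls -1 "$dir"/pinboard-*."$ext" 2>/dev/null \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r file; do
              rm -f "$file"
          done
}
```

For example, `prune_backups "$HOME/Documents/Backups" json 5` would leave only the five most recent JSON backups and would not touch the XML ones.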
As they are, these backups are only useful to search in a text editor. I intend to convert them to an HTML format that can be imported by the browser, and also to make a web page that will directly read the backup files and search by tag. See Research Topics.
Installation
To install the script, just cd into the source directory and run the command:
./install.sh pbuser pbpwd "backupdir"
pbuser is your Pinboard user ID, pbpwd is your Pinboard password, and backupdir is the path to the directory that will hold the backups; that path is relative to your home directory (e.g. "Documents/Backups" corresponds to ~/Documents/Backups).
To uninstall the script, cd to the source directory and run the command:
./uninstall.sh "backupdir"
backupdir is the path to the directory that holds the backups; that path is relative to your home directory.
The installation will put your Pinboard user ID and password in plain text in the backup-pinboard.sh script, which it stores in the backup directory; it will set the permissions on the script file so that only your user account can read it. Still, it would be nicer to put the password in the Keychain, but I don’t know how to do that (yet – see Research Topics).
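One possible direction for the Keychain idea is the security command-line tool that ships with OS X. The sketch below is untested against the backup script, and the service name pinboard-backup and both function names are hypothetical. It assumes a version of the tool that supports the -w option on find-generic-password.

```shell
# Sketch only: the service name and function names are hypothetical labels.
KEYCHAIN_SERVICE="pinboard-backup"

# Store the password for a Pinboard user in the default keychain.
# -U updates the item if it already exists.
store_pinboard_password() {
    local user="$1" password="$2"
    security add-generic-password -a "$user" -s "$KEYCHAIN_SERVICE" -w "$password" -U
}

# Print the stored password for a Pinboard user on stdout.
read_pinboard_password() {
    local user="$1"
    security find-generic-password -a "$user" -s "$KEYCHAIN_SERVICE" -w
}
```

The backup script could then call `read_pinboard_password pbuser` at run time instead of carrying the password in plain text. Note that passing the password as a command argument when storing it still exposes it briefly in the process list.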
To check that the script will be scheduled, run the command:
launchctl list | grep net
You should see net.localhost.PinboardBackup listed there. If you then enter the command:
launchctl list net.localhost.PinboardBackup
you should see some details about the launch agent, including when it is scheduled to run. If you want to change the time at which the script is to be run, just uninstall the script, edit net.localhost.PinboardBackup.plist, change the Hour and Minute values to correspond to the start time you want, and then install the script again.
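To show where the Hour and Minute values live, here is a minimal sketch that writes a LaunchAgent plist with a daily StartCalendarInterval. It is not the plist shipped with the script: the write_plist function, the script path, and the chosen time are placeholders, and only the label net.localhost.PinboardBackup comes from this post.

```shell
# Sketch only: write_plist and its arguments are illustrative placeholders.

# Write a LaunchAgent plist that runs a script once a day.
# $1 = output plist path, $2 = script path, $3 = hour (0-23), $4 = minute (0-59)
write_plist() {
    local plist="$1" script="$2" hour="$3" minute="$4"
    cat > "$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>net.localhost.PinboardBackup</string>
    <key>ProgramArguments</key>
    <array>
        <string>$script</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>$hour</integer>
        <key>Minute</key>
        <integer>$minute</integer>
    </dict>
</dict>
</plist>
EOF
}
```

Calling `write_plist net.localhost.PinboardBackup.plist "$HOME/Documents/Backups/backup-pinboard.sh" 21 30` would describe a run at 21:30 each day. Per-user agents normally live in ~/Library/LaunchAgents and are loaded with launchctl load.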
Uninstalling the script does not remove any backups; it only removes the script file, removes the Launch Agent plist files, and unloads the Launch Agent from launchd.
How the script works
Pinboard provides a RESTful web API that allows applications to be built to use Pinboard. Anything that can act as a web client can interact with Pinboard. The script uses curl to interact with the Pinboard web service.
The script retrieves the XML formatted list of bookmarks by doing an HTTP GET with the URI api.pinboard.in/v1/posts/all, and then again to get the JSON formatted list using the URI api.pinboard.in/v1/posts/all?format=json.
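The two requests can be sketched with curl along these lines. The function names are illustrative, and the use of HTTPS with HTTP basic authentication (curl's --user option) is an assumption about how the script authenticates, not a copy of it.

```shell
# Sketch only: function names are illustrative, and basic auth over HTTPS
# is an assumption about how the script talks to Pinboard.
PINBOARD_API="https://api.pinboard.in/v1"

# Build the posts/all URL; "json" selects JSON, anything else the default XML.
posts_all_url() {
    if [ "$1" = "json" ]; then
        echo "$PINBOARD_API/posts/all?format=json"
    else
        echo "$PINBOARD_API/posts/all"
    fi
}

# Download every bookmark in the given format to a file.
# $1 = user, $2 = password, $3 = format (xml or json), $4 = output file
fetch_all_bookmarks() {
    local user="$1" password="$2" format="$3" out="$4"
    curl --silent --show-error --fail \
         --user "$user:$password" \
         --output "$out" \
         "$(posts_all_url "$format")"
}
```

With --fail, curl returns a non-zero exit status on an HTTP error, so the caller can tell a failed download from a good one.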
To keep misbehaving clients from swamping the servers and bandwidth, Pinboard implements a throttle that requires a minimum 5 minute window between api.pinboard.in/v1/posts/all requests. Therefore the script sleeps for a bit more than 5 minutes prior to making each request. The script also permits curl one retry, in case of errors, and imposes the same delay between retries.
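The sleep-then-request pattern with a single retry can be sketched as a small wrapper. This is an illustration, not the script's actual code: the 310 second delay is a guess at "a bit more than 5 minutes", and the names are hypothetical. The published script may instead rely on curl's own --retry and --retry-delay options.

```shell
# Sketch only: names and the 310 second default are assumptions.

# A little over Pinboard's 5 minute window; can be overridden by the caller.
THROTTLE_SECONDS="${THROTTLE_SECONDS:-310}"

# Wait out the throttle window, run the given command, and retry once
# (after the same wait) if the first attempt fails.
run_throttled() {
    local attempt
    for attempt in 1 2; do
        sleep "$THROTTLE_SECONDS"
        if "$@"; then
            return 0
        fi
    done
    return 1
}
```

Any command can be wrapped this way, for example `run_throttled curl --fail ...`. The wrapper returns non-zero only if both attempts fail.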
launchd notes
Launch Daemons and Launch Agents have a session type. It is a bit unclear what the session types mean. When I first created this utility, I used a session type of Background because I wanted the utility to run in the background and not require any UI services or support. But I ran into a problem in which the LaunchAgent would not get loaded when I restarted the system and logged back in. I could not see any errors in the system logs and couldn’t find any documentation that clearly explained the problem.
I switched the session type to Aqua and now the LaunchAgent is always loaded when I log in.
I’ve come to believe that SessionType is used by the system to mean a kind of load context. The system will invoke launchctl to load agents and daemons, and it will do so multiple times in different contexts. For example, when the system starts up there is a point when launchctl is used to load agents and daemons that have a session type that corresponds to system startup; when the login window is displayed, there is a session type that corresponds to login or prelogin that is used to load other agents or daemons. Once a user is logged in, I believe that the system loads agents and daemons that have the Aqua session type.
However, I don’t know when the system loads agents and daemons with the Background session type. I’d like to find out.
Final word
I hope you find the script useful, and that you let me know of any improvements you make yourself, or bugs you find. It is probably best to use GitHub to communicate on either topic.
Research topics
These are some next steps I’d like to take to improve the utility of the backups, and to improve my utilization of Pinboard.
- Run the launch agent whenever the system is running, not just when I’m logged on.
- Wake the system up if it is asleep when it’s time to do the backup.
- Translate the XML into HTML that a browser can import.
- Create a local webpage with basic bookmark lookup by tag that uses the JSON data.
- Understand what the subscription to extra Pinboard services would buy me.
- Store Pinboard password in the Keychain.
[1] Delicious might have changed its business model and revenue strategy since I last looked at them. If you are interested, I recommend you take a look and assess them for yourself.