Creating a Google Custom Search Engine (CSE) from ReadItLater bookmarks

Photo by See-ming Lee on Flickr licensed under Creative Commons
The following is a 10-step process for creating a Google Custom Search Engine (CSE) from your personal list of saved bookmarks at ReadItLater.

This lets you search the full text of articles you have previously read and bookmarked (assuming the web page has not been removed), and it automatically picks up future articles.

It takes about 20 minutes and requires only a good familiarity with Office apps; no programming skill is needed.

The process should also be fairly similar if you are using another social bookmarking service. 

  1. Obtain a free API key from ReadItLater: fill in the API request form on ReadItLater and the key will be sent to you by email. You only need to do this to get your complete archive.
  2. Export your entire bookmark archive from ReadItLater using the request URL https://readitlaterlist.com/v2/get?username=name&password=123&apikey=yourapikey. After the first export you will normally just export the bookmarks added since a certain date, using the additional parameter &since=1245638446 where the number is a Unix-style date-time. For example, 1290795659 is 2010-11-26 18:20:59Z.
  3. Save the result as a text file on your desktop (it will be one continuous string with no line breaks).
  4. Insert line breaks by loading the exported file into Word and using search and replace to swap “item_id” for ^l (the special line-break character in Word).
  5. Load the resulting file into Excel as CSV, using the commas in the file as separators.
  6. Fix up the URL slash characters in Excel: delete the other columns, then convert \/ back to / with a search and replace.
  7. Remove the quotes around the URLs by replacing :” with nothing and then replacing ” with nothing.
  8. Assuming you have a Google user ID, create your free Google CSE at http://www.google.com/cse/manage/create
  9. Enable automatic adding of future links by registering for ReadItLater’s Digest service (currently $5 p.a.), which puts all your recent bookmarks on one page (for an example, see Drivelry’s digest at http://readitlaterlist.com/d/Drivelry), and set your Digest on ReadItLater to Public (if you don’t, the Google crawler can’t see it). Add the digest URL to your CSE using the Sites link in the Google CSE Control panel, and select “Dynamically extract links from this page and add them to my search engine” and “Include all pages this page links to”.
  10. Paste the generated HTML into a web page you control. That’s it! You can see an example of what a Google CSE looks like here.
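
If you would rather not work out the since value in step 2 by hand, a few lines of Python can do it. This is only a sketch: the endpoint and parameter names are copied from step 2, while the function names (to_unix, export_url) are made up for illustration and the credentials are the same placeholders used above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint as given in step 2 of this article.
BASE_URL = "https://readitlaterlist.com/v2/get"


def to_unix(dt):
    """Return whole seconds since 1970-01-01 00:00:00 UTC for a timezone-aware datetime."""
    return int(dt.timestamp())


def export_url(username, password, apikey, since=None):
    """Build the export URL from step 2; `since` is an optional timezone-aware datetime."""
    params = {"username": username, "password": password, "apikey": apikey}
    if since is not None:
        params["since"] = to_unix(since)
    return BASE_URL + "?" + urlencode(params)


last_export = datetime(2010, 11, 26, 18, 20, 59, tzinfo=timezone.utc)
print(to_unix(last_export))  # 1290795659, the example value from step 2
print(export_url("name", "123", "yourapikey", since=last_export))
```

Passing a timezone-aware datetime matters here, because a naive one would be interpreted in your computer's local time zone and could shift the result by a few hours.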

You can of course also export your personal bookmarks from your browser and add them to your CSE. You can make your CSE private or public as you wish.

Steps 1-7 are basically about getting your old RIL archive. If they look a bit daunting you can skip them and just add your Read It Later Digest page to the Google CSE; you will then have all your articles added from that point in time (just not your older RIL archive).
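
For anyone who does have a little scripting experience, the Word and Excel clean-up in steps 4-7 could probably be replaced by a short Python script. The sketch below assumes the export is JSON (the quoted values and the \/ escaped slashes suggest it is, but I have not checked every case), and rather than relying on particular field names it simply collects every string that looks like a web address. The function name extract_urls and the sample data are invented for illustration.

```python
import json


def extract_urls(export_text):
    """Collect every http(s) string found anywhere in a JSON export, in first-seen order."""
    found = []

    def walk(node):
        # Recurse through nested objects and arrays, keeping URL-like strings.
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str) and node.startswith(("http://", "https://")):
            found.append(node)

    # json.loads also turns the escaped "\/" back into "/".
    walk(json.loads(export_text))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(found))


# Made-up sample shaped like the export described above; real field names may differ.
sample = (
    '{"list":{"1":{"item_id":"1","url":"http:\\/\\/example.com\\/a"},'
    '"2":{"item_id":"2","url":"http:\\/\\/example.com\\/b"}}}'
)
for url in extract_urls(sample):
    print(url)
```

This prints one clean URL per line, which is the same result the search-and-replace steps are aiming for.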

Google’s limit for the number of individual URLs you can add to a free CSE is 5000. For what it’s worth, I add about 900 or so new URLs on ReadItLater per year (which sounds a lot in aggregate but less at 75 new URLs per month), most of which are interesting links from people I follow on Twitter.
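
As a rough back-of-envelope check on those numbers (a sketch only, which ignores whatever is already in your archive when you start):

```python
LIMIT = 5000     # free CSE URL limit quoted above
PER_YEAR = 900   # approximate new bookmarks per year quoted above

per_month = PER_YEAR / 12
years_to_fill = LIMIT / PER_YEAR

print(per_month)                # 75.0
print(round(years_to_fill, 1))  # 5.6
```

So at this rate an empty CSE would take between five and six years to reach the limit.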

For some other ideas and techniques see this article on curating web content.


Article posted by @Drivelry on November 27, 2010
