Fetch all tags for a specific Tumblr blog

January 7, 2015

Let’s start off with me admitting that I haven’t used Tumblr a lot, so I’m not overly familiar with the Tumblr API. Anyway, in a client project I was tasked with building a Tumblr theme. One of the requirements was to list all the tags used in the blog. In all of my infinite wisdom, I assumed that a way to list all the tags used is one thing a blog provider would surely provide.

As I looked through the custom theme documentation, I could find no reference to listing tags. A couple of Google searches confirmed that this is not available in the theme API, and that if you want to list all the tags you have to go through the API, load each and every post in the blog, fetch the tags and display them.

I came across a couple of tag-cloud scripts. Sadly, these scripts came as a package deal: you get the styling and markup that the provider chooses. I guess they’ve done it this way to provide an easy copy-and-paste solution for the average non-developer Joe. I pretty much came to the conclusion that it was easier to build one myself: a tag fetcher that doesn’t implement any design and only returns an array of tags.

After briefly skimming through the documentation of the Tumblr API, I decided to go with the JSONP request. It’s pretty straightforward, and these were the parameters that I needed:

http://[Tumblr username].tumblr.com/api/read/json?
    callback=[callback function name]
    &num=[Limit, max 50]
    &start=[Post offset]
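
As a rough sketch, the request URL for one batch could be assembled like this. The helper name `buildPageUrl`, its argument names and the callback name in the example are made up for illustration; they are not part of the script.

```javascript
// Hypothetical helper: builds the JSONP URL for one batch of posts.
function buildPageUrl(username, callbackName, start, num) {
    // The API returns at most 50 posts per request.
    var limit = Math.min(num || 50, 50);

    return 'http://' + encodeURIComponent(username) + '.tumblr.com/api/read/json' +
        '?callback=' + encodeURIComponent(callbackName) +
        '&num=' + limit +
        '&start=' + (start || 0);
}

// Example: the second batch of 50 posts for the blog "engineering".
var url = buildPageUrl('engineering', 'tumblrTagsCallback', 50, 50);
```

Capping `num` inside the helper keeps a caller from accidentally asking for more than the documented maximum.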

I started out building the script so that the only parameter the user has to pass in is the username. After the class is initialized, you can hook into the ‘ready’ event and then call the method ‘load()’. The script adds a semi-random global method as a callback for the JSONP script and loads the first batch of 50 posts in order to get the total number of posts. Once the number of pages can be calculated (totalPosts / postsPerPage, rounded up, = amountOfPages), it adds all the other pages.
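
The page calculation can be sketched as follows. This is only an illustration: the function name `remainingOffsets` is hypothetical, and the division is rounded up so that a final, partially filled page still gets requested.

```javascript
// Hypothetical helper: returns the `start` offset of every page after the first.
function remainingOffsets(totalPosts, postsPerPage) {
    // Round up so a final, partially filled page is not skipped.
    var pageCount = Math.ceil(totalPosts / postsPerPage);
    var offsets = [];

    // Page 0 has already been loaded in order to learn totalPosts.
    for (var page = 1; page < pageCount; page += 1) {
        offsets.push(page * postsPerPage);
    }

    return offsets;
}

// A blog with 120 posts needs two more requests, starting at posts 50 and 100.
var offsets = remainingOffsets(120, 50);
```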

Every page is concatenated into a single array. Once the length of this array equals the number of posts, all the script tags are removed and the tags are extracted from the post objects in the array. Seeing as I have to go through each of the posts anyway, I decided to make a method that checks whether a tag exists: if it doesn’t, it’s added to the tag list; if it does, I bump up a hits counter for that tag.

/**
* Adds a tag to the taglist (if tag already exists, add to hits counter)
*
* @param  {string} tag The tag
*
* @return void.
*/
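this.addTag = function(tag) {
    // Only count own keys, so a tag named "constructor" doesn't match an inherited property.
    if (Object.prototype.hasOwnProperty.call(this.tags, tag)) {
        this.tags[tag].hits += 1;
        return;
    }

    this.tags[tag] = {
        'tag': tag,
        'hits': 1
    };
};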
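
To show how the counter fills up across a whole list of posts, here is a minimal standalone sketch of the same idea. It assumes that each post object carries an optional `tags` array of strings; the function name `countTags` is hypothetical and is not the script's actual method.

```javascript
// Hypothetical standalone version of the counting step.
function countTags(posts) {
    // Object.create(null) avoids clashes with built-in keys such as "constructor".
    var tags = Object.create(null);

    posts.forEach(function (post) {
        // Assumption: posts without tags have no `tags` property at all.
        (post.tags || []).forEach(function (tag) {
            if (tags[tag]) {
                tags[tag].hits += 1;
                return;
            }

            tags[tag] = { tag: tag, hits: 1 };
        });
    });

    return tags;
}

var counted = countTags([
    { tags: ['Javascript', 'coding'] },
    { tags: ['coding'] },
    {}
]);
// counted.coding.hits is 2 and counted.Javascript.hits is 1
```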

Once the tags have been sorted, I run all the callbacks registered on the ‘ready’ hook, passing in one parameter: an array of tag objects. The array looks like this:

[
	{
		"tag": "Javascript",
		"hits": 3
	},
	{
		"tag": "php",
		"hits": 1
	},
	{
		"tag": "coding",
		"hits": 5
	},
	[...]
]
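
Going from a lookup object keyed by tag name to an array like the one above could look roughly like this. The helper name `tagsToArray` is hypothetical, and the sort order (most-used tags first, ties alphabetical) is an assumption made for the example, not necessarily the order the script uses.

```javascript
// Hypothetical helper: turns a tag lookup object into an array of tag objects.
function tagsToArray(tags) {
    return Object.keys(tags)
        .map(function (name) {
            return tags[name];
        })
        .sort(function (a, b) {
            // Assumption: most-used tags first, ties broken alphabetically.
            return b.hits - a.hits || a.tag.localeCompare(b.tag);
        });
}

var list = tagsToArray({
    php: { tag: 'php', hits: 1 },
    coding: { tag: 'coding', hits: 5 },
    Javascript: { tag: 'Javascript', hits: 3 }
});
// list[0].tag is "coding" and list[2].tag is "php"
```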

The biggest downside of using JSONP is that there will be a lot of requests, seeing as every batch of 50 posts is a single request and every request takes about a second. If I fetched the pages sequentially (i.e. fetching the next page only once the previous one has loaded completely), a blog with 1000 posts would need 20 requests and could potentially take 20 seconds to load, which is far too slow.

This is why I add all the pages to the document at once instead of waiting for each page to respond before fetching the next. The browser can make multiple connections at once, and seeing as I only care about the tags and not the order of the posts, it doesn’t matter that the batches arrive asynchronously (in other words, the posts in the internal post array are not sorted by date, but in whatever order they were loaded). As stated earlier, the first page (the first 50 posts) is loaded separately, because the first response tells us the total number of posts. Once we have that figure, we can load all the other pages at once.
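
The loading strategy can be sketched without any DOM code. In this sketch `requestPage(start, done)` is a hypothetical stand-in for adding a JSONP script tag, and the response is assumed to hold a `posts` array and a `posts-total` field; both names are assumptions for the example rather than something the script guarantees.

```javascript
// Hypothetical loader: `requestPage(start, done)` stands in for one JSONP request
// and must call `done(response)` when the batch has arrived.
function loadAllPosts(requestPage, postsPerPage, onComplete) {
    var posts = [];
    var total = 0;
    var finished = false;

    function collect(response) {
        // Batches are appended in whatever order they arrive.
        posts = posts.concat(response.posts);

        if (!finished && posts.length >= total) {
            finished = true;
            onComplete(posts);
        }
    }

    // The first request runs alone, because it tells us the total.
    requestPage(0, function (first) {
        total = parseInt(first['posts-total'], 10);
        collect(first);

        // Fire every remaining request right away instead of chaining them.
        for (var start = postsPerPage; start < total; start += postsPerPage) {
            requestPage(start, collect);
        }
    });
}
```

Because completion is detected by comparing the number of collected posts with the total, the callback fires no matter which batch happens to arrive last.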

How to use it

Fetching all the tags for a blog is pretty easy, and can be done in only a couple of lines of code.

// Create a TumblrTags object, set the username of the blog.
var tagFetcher = new window.TumblrTags(username);

// Hook in to the ready event.
// The anonymous function will be fired when all the tags are fetched.
tagFetcher.on(
	'ready',
	function(tags){
		// Do something with the tags
		console.log(tags);
	}
);

// Load posts
tagFetcher.load();

In order to keep the script as portable as possible, I had to pollute the window scope. It’s also worth mentioning that the script doesn’t have any third-party dependencies (i.e. no need for jQuery).

Here’s all the code on GitHub and here’s the demo. Test it with the Tumblr username “engineering”.
