Archive for the ‘Clicky’ Category
Count your chickens while they’re still hatching
About 3 months ago, Google announced that for all users logged in to their Google account, search terms would be hidden from the referrer string when they clicked through. They say this is for privacy reasons. That’s great, but as I’ve said a thousand times, search analytics is one of the best reasons to run analytics in the first place, so it sucks for site owners.
Anyways, the way our code worked, this resulted in a blank search string:

This has led to a lot of confusion and many people thinking Clicky is broken, because they weren’t aware of this change on Google’s end (understandable).
Google updated their Analytics product to show the search as (Not provided) if it was blank. We’ve decided to do the same thing, although we are labeling it as [unknown] instead:

The main difference though is that this will now show up in your main searches report too. Previously blank searches just weren’t logged in there. Now you will see an item for [unknown], and unfortunately it’s probably pretty high on the list. For our own stats on getclicky.com, it’s the #3 search term, representing about 1 out of every 6 searches.
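To make the labeling concrete, here is a minimal sketch of how a blank search could be mapped to [unknown]. It assumes the search term arrives in a `q` query parameter of the referrer; the parameter name, the function name, and the sample URLs are illustrative, not our actual code.

```python
from urllib.parse import urlparse, parse_qs

UNKNOWN_LABEL = "[unknown]"  # the label named in the post


def search_term_from_referrer(referrer: str) -> str:
    """Return the search term found in a referrer URL, or the placeholder if blank."""
    # keep_blank_values=True so that "q=" is seen as an empty term, not dropped.
    query = parse_qs(urlparse(referrer).query, keep_blank_values=True)
    # Assumption: the search term lives in a parameter called "q".
    term = query.get("q", [""])[0].strip()
    return term if term else UNKNOWN_LABEL


# Hypothetical referrers: one with the term stripped, one with it intact.
hidden = search_term_from_referrer("https://www.google.com/url?sa=t&q=&cd=3")
visible = search_term_from_referrer("http://www.google.com/search?q=web+analytics")
```

With a rule like this, every hidden search collapses into a single report row, which is why [unknown] climbs so high on the list.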
I wouldn’t be surprised if Google made this the default for all searches on Google sites within the next year or two. Hence I say to you, count your chickens etc, because that will be a horrible day for site owners the world round.
At the very least however, I am happy to know that Google is not providing a back door to their own analytics product that allows them to log the searches but not anyone else. That would just be evil, not to mention anti-competitive.
Load balancing the load balancers
Load balancing your load balancers sounds a bit ridiculous but yes, that’s what we have to do to scale.
For about three and a half years, we’ve had a single pair of load balancers. One active, one in hot standby mode to take over if the active one goes offline. It’s worked really well, but they were reaching capacity. I’ve pulled all sorts of tricks out of my hat to reduce excess load, but the time finally arrived where new hardware was the only option left.
So, we just added two more load balancers to the equation. But these new ones are twice as powerful as the old ones, and instead of active/passive pairing, they’re both active. This means we have a total of three active load balancers, and these are load balanced with DNS round robin, with DNS monitoring/failover on top of that to quickly and automatically remove one from the pool in case it has a problem. The old ones act as a front for our entire service with a bunch of virtual services running on them, and hence remain paired (keeping them in sync would otherwise be a nightmare). The new ones are just for additional tracking capacity and nothing else though, so this setup is great for us.
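As a rough illustration of round robin plus failover, the sketch below filters a pool through a health check and then rotates through whatever is left. The IP addresses are documentation-range placeholders and the health check is a stand-in; real DNS failover happens at the DNS provider, not in application code like this.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator, List


def active_pool(addresses: Iterable[str], is_healthy: Callable[[str], bool]) -> List[str]:
    """Keep only the addresses that currently pass the health check."""
    return [addr for addr in addresses if is_healthy(addr)]


def round_robin(pool: List[str]) -> Iterator[str]:
    """Hand out addresses from the pool in rotation, forever."""
    return cycle(pool)


# Three active load balancers (placeholder IPs), one of which is failing.
balancers = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
down = {"192.0.2.11"}

pool = active_pool(balancers, lambda addr: addr not in down)
answers = round_robin(pool)
first_four = [next(answers) for _ in range(4)]
```

Here the failing address never appears in the answers, and traffic alternates between the two healthy ones.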
Speaking of tracking capacity, this just quintupled it to ~25,000 pageviews/second. Just in time too, because last year, online activity on Black Friday and (cringe) Cyber Monday sent a ton of extra traffic our way and knocked us offline for a bit. This year, we’re ready, so bring it!
Something else we just changed today is that the static resources for our web site are no longer hosted on static.getclicky.com. A problem common with many trackers is that their domains are on blacklists of ad-blockers and other privacy software. These days, almost weekly we get an email from someone saying our site looks funny. This is because their browser, or their OS, or their ISP, or who knows, is blocking static.getclicky.com from loading anything, so the stylesheet doesn’t get loaded (nor any of the images, javascript, etc).
But we’ve had a secondary domain, staticstuff.net, that we’ve used as a generic CDN domain for our white label service for a while now. I had to change a few things around and get SSL running on it, but now that that’s done, all assets for our web site (except the tracking code itself) will load from this domain instead. It points to the same servers, but it should bypass 99% of these blacklists because they are almost always based on domain name.
First world problems
Google search rankings :D
Considering how much Google has been playing the privacy card recently, I’m surprised they do this, but it turns out the referrer string for Google searches typically includes a variable in the URL, cd, which signifies the approximate ranking of the link someone clicked on to get to your site for a search term. e.g. 1 would mean your page was the top result.
We’ve had requests to parse this data. As of about 20 minutes ago, we are now doing just that! (Pro+ account required).
Note that this is passive, in other words, we only have the data that Google gives us as people click through on search results. We’re not scraping SERPs, and we only have this data from Google because they’re the only ones who do this. (If any other major engines do it too, let us know and we’ll try to add it). If you want scraped results, you should use our SheerSEO integration.
The screenshots below show how we report this data (5 different places – Spy is my favorite). Keep in mind the data shown in these screens is from our own stats and it’s only been live for a short time, so there are only a few results logged so far. But I took a peek at some super high traffic sites to see what the reports looked like there, and it’s awesome.
Also, the ranking numbers you see for any given search are the average of all searches for that term. Not all visitors see the exact same search results! So if you had two people with a search and one saw it at position 4 and the other at position 5, the number reported would be 4.5.
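Here is a small sketch of the idea: pull the cd value out of each referrer and average it per search term. The referrer URLs and function names are made up for illustration, and the sketch assumes cd is a plain integer when present.

```python
from statistics import mean
from typing import Iterable, Optional
from urllib.parse import urlparse, parse_qs


def ranking_from_referrer(referrer: str) -> Optional[int]:
    """Return the `cd` parameter as an int, or None if it is missing or not numeric."""
    values = parse_qs(urlparse(referrer).query).get("cd", [])
    if values and values[0].isdigit():
        return int(values[0])
    return None


def average_ranking(referrers: Iterable[str]) -> Optional[float]:
    """Average the rankings of all referrers that actually carry one."""
    ranks = [r for r in map(ranking_from_referrer, referrers) if r is not None]
    return mean(ranks) if ranks else None


# Two hypothetical visitors who searched the same term but saw different positions.
clicks = [
    "http://www.google.com/url?sa=t&q=web+analytics&cd=4",
    "http://www.google.com/url?sa=t&q=web+analytics&cd=5",
]
reported = average_ranking(clicks)  # 4.5, matching the example above
```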
First, rankings integrated right into the main search report:

But of course there’s also a report dedicated to just rankings, and in this case, they are sorted from best to worst rank (menu included in the screen so you can see where to go to get this):

We integrated it into the keywords report too, so you can see your average ranking for any given single word. Note that the ranking numbers in this shot aren’t accurate, since we’re dividing the sum of all rankings for any word by the total number of searches for that word on that day, and since we only have ranking data for 20 minutes so far today, the divisor is out of proportion. Come tomorrow, it will be correct.
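To see why the first-day numbers are off, here is the same division with made-up figures: a word searched 100 times today, of which only 10 searches happened after ranking capture went live, each at position 4.

```python
def keyword_average(rank_sum: float, total_searches: int) -> float:
    """Average ranking as described above: sum of rankings over all searches that day."""
    return rank_sum / total_searches


# Hypothetical numbers, for illustration only.
rank_sum = 10 * 4          # only 10 searches carried ranking data
searches_all_day = 100     # but the divisor counts every search today

skewed = keyword_average(rank_sum, searches_all_day)  # 0.4, far too good
expected = keyword_average(rank_sum, 10)              # 4.0, once the counts line up
```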

We also added it to the dashboard search module:

And last but certainly not least, we put it in Spy, which is my absolute favorite. As live searches from Google stream in, we’ll show you the ranking right there (if it’s included in the referrer string). You might also notice we’re now showing the search in the same way we used to with the old version of spy, where it’s a separate string instead of just the full referrer string which is harder to read searches from:

I freaking love this and hope you do too!!
Here’s what’s been happening
It’s been a rough week. I wanted to explain what has been happening recently with our CDN, and talk about all of the problems we’ve had with CDNs in general. If you can stomach a novel, you’ll discover the good news that it’s been resolved to the point where we don’t foresee any further issues.
The quest
In June, we decided to move away from our home brew CDN and get a real one, because we were outgrowing it and it was becoming a real pain to manage amongst other things.
The main requirement was that we needed support for HTTPS with our own domain name (static.getclicky.com). There are surprisingly few CDNs out there that offer this service. Most of them only let you use a generic sub-domain of their CDN’s domain, such as omg-secure.somecdn.net. This is fine if the assets on the CDN are only for your web site, but that obviously is not the case with us.
Literally the only two we could find that offered this feature without signing over my soul and first born child were CloudFlare and MaxCDN, so we decided to test these out. We also wanted to try one of the enterprise level ones, just to see the difference in performance. For this we chose the 800lb gorilla that is Akamai.
MaxCDN offers HTTPS for $99 setup + $99/month, on top of the normal bandwidth costs. Very reasonable. The service was perfectly fine, but they only have locations in the US and Europe. This is definitely a majority of our market but we wanted Asia too. Well, they do offer Asia, but you have to upgrade to their enterprise service, NetDNA, for considerably more money. It was still less than what we were paying for our home brew CDN though, so I decided to try it.
This was one of the worst days I’ve ever had. I didn’t know when the transition was occurring, because I had to submit a ticket for it and then just wait. When they finished it, they let me know, but they messed up the configuration so the HTTPS didn’t work. (They forgot the chain file. If you know how certificates work, that’s kind of important). It was several hours before I realized this however, because DNS hadn’t propagated yet – I was still hitting their old servers for a while, which were still working fine. Once I realized there was a problem, the damage had already been done to anyone who was tracking a secure site. Not to mention it completely broke our web site for our Pro+ members, since they get HTTPS interface by default and none of the assets were loading for them. I immediately emailed them to get it fixed, meanwhile I pointed the domain back to our old CDN so HTTPS would work in the meantime. But they never actually got it fixed. I don’t know what the problem was, we had a lot of back and forth, but it was clear this was not going to work.
Next was Cloudflare. I’d met the founders at TechCrunch Disrupt the previous September, they’re great. Thing is, they’re not technically a CDN. You point your DNS to them, and then all of your site’s traffic passes through their network. They automatically cache all of your static resources on their servers, and then accelerate your HTML / dynamic content. Accelerating means requests to your server pass through their network directly to speed them up, but they don’t cache the actual HTML – it just gets to you faster.
All in all it’s a fantastic service, and I’d be all for it, but they didn’t (and still don’t) support wildcard DNS – which is another do-or-die feature for us because of our [white label analytics] service. But their rock star support guy, John, told me they could setup a special integration with us where we could just point a sub-domain to them to act as a traditional CDN. Well, it was worth trying because there weren’t any other options at this price level, especially since HTTPS only costs $1/month, and they have servers in Asia too. It seemed too good to be true really. How could they be doing this for such a great price and have such good support? I’m pretty sure John doesn’t sleep, no matter what time I email him I have a reply in minutes it seems.
Anyways, the service worked great. We had it live for a week or two. At some point there was a problem that caused us to move back to our home brew CDN, although I don’t recall what it was exactly. But overall I was happy and planned to test it again in the future, but I still had Akamai to test.
Akamai is what the big boys use. Facebook, etc. I knew it was good, but also expensive. However, I figured it was worth it if the service was as good as I expected it to be. They literally have thousands of data centers, including South America and Africa which very very few CDN’s have, and my speed tests on their edge servers were off the charts. Using just-ping.com, which tests response time from over 50 locations worldwide, I could barely find a single location that had higher than 10ms response time. Ridiculous to say the least.
They gave us a 90 day no commitment trial to test their service, which was appreciated. Their sales and engineer team were great. Very professional, timely, and helpful. But man did I hate their control panel. It was nothing short of the most confusing interface I have ever laid eyes on. I had no idea how to do anything, and I’m usually the guy who figures that kind of thing out.
They walked me through a basic setup, but then the next thing I didn’t like was discovered – any changes you want to make take 4 hours to deploy. What if you screw something up? That’s gonna be a nail biting 4 hour ball of stress waiting for it to get fixed.
I never actually got to really test their service because I was just too scared of screwing it up. A few weeks had passed and I had forgotten how to configure anything. My patience was wearing thin, as our custom CDN continued to deteriorate and I was dealing with other junk too. There’s always a thousand things going on around here.
John from Cloudflare continued to email me to ask how our testing was going with these other services. He was confident Cloudflare would meet our needs. I was pretty sure too, just hadn’t made up my mind yet. But I decided to go back to them because I didn’t have much other choice.
That was early August and, well, we’ve been with them ever since. No problems at all. Great service. Overall I have nothing but good things to say.
But then…
Well, it turns out there was a problem. About a month ago, our pull server (that they pull static files from) crashed, and at the same time our tracking code stopped being served.
How could this be? They should be caching everything from this server…?
I emailed them about it and they weren’t sure how the server crashing would affect cached files being served. But unless the cache expired at the exact same time as the crash, something was definitely up.
I did some digging and finally ended up watching the ifconfig output on the pull server, which shows bandwidth usage. We were pushing almost 3MB per second of data out of that thing. Hmm, that doesn’t seem right.
I moved the tracking code file as a quick test, and sure enough, suddenly Cloudflare wouldn’t serve it. Put it back, bam, it worked.
Clearly this file was not being cached. But why? Well, it wasn’t their fault. The problem was the rather strange URL for our tracking code. Instead of e.g. static.getclicky.com/track.js, the URL is just static.getclicky.com/js. This is one of those “Why the hell did I ever do that” type things, but it’s too late to change now with almost 400,000 sites already pointing to it.
I emailed them about this and only then discovered that they cache based on file extension, not mime type or cache headers, which we of course properly serve. I obviously wish I knew this beforehand, but wish in one hand shit in the other, see which one fills up first.
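A minimal sketch of what extension-based caching looks like, and why our URL falls through it. The extension list below is hypothetical; the post doesn't say which extensions Cloudflare actually caches.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list; the CDN's real list is not given here.
CACHED_EXTENSIONS = {".js", ".css", ".png", ".gif", ".jpg"}


def cached_by_extension(url: str) -> bool:
    """Decide cacheability from the URL path's file extension alone."""
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)
    return ext.lower() in CACHED_EXTENSIONS


normal_url = cached_by_extension("http://static.getclicky.com/track.js")  # True
legacy_url = cached_by_extension("http://static.getclicky.com/js")        # False
```

The path /js has no extension at all, so a rule like this never even looks at the mime type or cache headers.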
The same day I discovered this, the server crashed again. I got it fixed quickly, and as a precaution I set up another server and set up round robin DNS to serve both IPs so in case one crashed, there’d be a backup. However, there was no monitoring/failover on this config, but if DNS serves multiple IPs for a domain, theoretically the requester is supposed to fall back on the second one if the first one fails. I had never actually tested this scenario, but it was just an interim fingers-crossed fix until we got something better going.
And then the server crashed again… and I discovered this did not work as I hoped (surprise).
Ok, so we need failover on this. Well duh, I just hadn’t done it yet because I was still working out in my head the best way to accomplish it. Our DNS provider, Dyn, offers this feature, but what I hate about their implementation is the restrictions they place on the TTL (time to live), which is how long DNS will cache a query for. Obviously the TTL should be fairly short for maximum uptime, but the max they allow you to set with failover is 7.5 minutes. With our level of traffic, a TTL that short means far more DNS queries, which increases our bill by several thousand dollars a month, a bit steep for my liking. Not to mention the expensive monthly base fee just to have this feature enabled in the first place.
The plan
I came up with a plan though. I found another DNS provider, DNSMadeEasy.com, that offers monitoring/failover for very reasonable pricing and no restrictions on TTL. I specifically emailed them about this like 4 times to confirm it would work as I expected. However I can’t just transfer getclicky.com to be hosted there, because we’re in a contract with Dyn (sigh). So I was going to setup a different domain on their servers, and then using CNAME’s, point Cloudflare to pull files from that domain, instead of the sub-domain we were using for getclicky.com.
That was yesterday. Great!, I said to myself. I’ll set it up first thing tomorrow because it’s almost midnight!
And then this morning………. that’s right, the freaking server crashed again. My phone was on silent by accident and I slept in, so for almost 2 hours our tracking code was only being served for about 75% of requests (because DNS IP fallback does work some of the time, it seems). Hence, more problems this morning.
ARGH. I screamed at my computer and just about burned down my house I was so mad. I had come up with a plan that I knew would work and was going to implement it first thing the next day, but the server crashes in the meantime and here I am in bed, blissfully dreaming of puppies and unicorns, unaware of any problems because my STUPID PHONE IS ON SILENT. WHY. ME.
The fix
But the good news is, today, I got this all set up. Monitoring/failover is now live on our pull servers, and they are checked every 2 minutes – so if there is a problem with any of them, DNS will stop serving that IP within 2 minutes at the most. And the TTL is only 5 minutes, so the maximum amount of time there could potentially be a problem for any individual person is 7 minutes. And we added a third pull server, so at the most this would only affect 1/3 of requests, and even then, for a maximum of 7 minutes.
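The arithmetic behind those numbers, as a tiny sketch. It assumes the worst case is simply the check interval plus the TTL, and that DNS spreads requests evenly across the pull servers; the function names are made up for illustration.

```python
def worst_case_exposure_minutes(check_interval: float, ttl: float) -> float:
    """Longest a cached bad IP can linger: time to detect the failure plus time for caches to expire."""
    return check_interval + ttl


def worst_case_share(pull_servers: int) -> float:
    """Fraction of requests hitting one failed server, assuming an even spread."""
    return 1 / pull_servers


exposure = worst_case_exposure_minutes(check_interval=2, ttl=5)  # 7 minutes
share = worst_case_share(pull_servers=3)                          # about one third
```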
(Note: Above I was complaining about Dyn’s 7.5 minute max TTL, and here I am with a 5 minute one. Well, this one’s a bit different because only Cloudflare’s servers talk to it, so the total queries generated are quite small. The real issue is we’re also going to be doing this same thing in order to load balance the load balancers (really?), because we’re adding two more of them this week. Using failover on this is what would be really expensive, so we’re avoiding that by using another DNS provider for it, and we figure we might as well do all of that in one place. Load balancers are stable and reliable, so the TTL will be a bit higher – and even if not, dnsmadeeasy’s pricing is about 10x cheaper than Dyn, so it’s all good).
On top of all that, Cloudflare desperately wants to fix this caching problem on their end too. They are working on a solution that will allow us to rewrite URLs on their end so that their servers will see the tracking code file as something that ends with a .js file extension and hence cache it properly. Once that’s live, even if all 3 of our pull servers were offline, it should have zero impact because that stupid legacy URL file will actually be cached.
In conclusion
So that, my friends, is as short a summary as I can write about everything we’ve been through with CDNs.
No matter what, know that I value the quality of our service above anything else and will always do everything in my power to make sure it works flawlessly. This has been a horrible week, but as of now the CDN should be near flawless.
I don’t feel like we have earned your money this month (and to think, it’s only the 8th…). If anyone wants a refund, send us an email and we’ll happily refund you a full month of service.
Thanks for reading and (hopefully) understanding.
iframe tracking, copying dashboards, Google search encoding, etc
It’s new feature Tuesday!
Better iframe support
A common problem we have is that people can only install the tracking code inside an iframe, but they want to track the parent document, not the iframe. Now I know there are plenty of people who want to track the iframe specifically, but there are way more people in the other camp. So now, by default, our tracking code will detect if it’s in an iframe and use the parent document’s URL and title instead of the iframe’s. This is already what most other services do by default.
There is a way to override it though if you are actually wanting to track the iframe on purpose, via the new clicky_custom.iframe property.
Copying dashboards between sites
If you have a bunch of sites, you may have created the most awesome amazing customized dashboard ever. And then you have to recreate it for every site in your account. So fun!
Well, now when you go to your customize dashboard page, there will also be a list of all the dashboards you’ve created for your other sites. One click and bam, that dashboard is now copied into the new site. After it’s copied you can edit it if you want, or just leave as is.
Google search encoding
Some change Google made to their URL structure is resulting in double URL encoding, and you might be seeing searches+like+this instead of searches like this. It’s not just affecting us either, I checked my Google Analytics account (gasp!) and was seeing the same thing. As of today, we now just double URL decode all searches before storing them, so this problem is history. I imagine Google will fix it on their end eventually but patience is not one of my virtues.
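A minimal sketch of double decoding, using a made-up raw value. This is an illustration in Python, not the code we actually run.

```python
from urllib.parse import unquote_plus


def decode_search(raw: str) -> str:
    """URL-decode twice so a double-encoded search comes out as plain text."""
    return unquote_plus(unquote_plus(raw))


# "%2B" is an encoded "+", which is itself an encoded space.
once = unquote_plus("searches%2Blike%2Bthis")    # "searches+like+this"
twice = decode_search("searches%2Blike%2Bthis")  # "searches like this"
```

One trade-off worth knowing: a search that was only encoded once and contains a literal plus sign (say, an encoded "c++") would lose it on the second pass, so blanket double decoding is a pragmatic fix rather than a perfect one.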
Black nav bar
We know some of you don’t like this but we feel it’s important to have these links highly visible and easy to find. If you stick something in a footer, no one clicks the links because no one sees them. Two designs ago, when we actually had a footer, as soon as we moved a bunch of those links into the sidebar we added, the number of clicks each one was getting skyrocketed. I’m talking 10-20x as much activity. That’s a good thing.
But anyways, today, I reduced the padding so it takes up a bit less space, and also removed the position:fixed style rule so it’s not always on the screen, instead it’s only visible when the page is scrolled up all the way. I hope that appeases some of you to a small degree at the very least.
More ways to view hourly data… and more!
We just deployed a bunch of changes to hourly data, amongst other things:
- Goals and Revenue now support hourly data, so you can more easily see your best converting and most profitable times of day. However, we just started doing this today, so earlier dates will not have hourly data.
Anytime we add hourly support for something, it essentially requires 24x as much storage space, which is why we only do it for a few types of data (currently: visitors, actions, tweets, clicky.me short URLs, goals, and revenue). If you saw how big our databases were already, you’d cry and then realize why this is necessary.
- Hourly averages – there are three new options in the drop down menu for hourly graphs:
- Same day of week average – example, this Monday vs the average of the last 4 Mondays
- Weekday average – Today vs the average of all weekdays (Monday-Friday) from the last 4 weeks
- Weekend average – Same as Weekday average but for Saturday/Sunday only
These are all insanely useful, particularly the first one!
- You can set same day of week average as your default trend comparison in your dashboard preferences, in which case your hourly graphs will also default to displaying this mode. We initially coded in support for weekday/weekend stuff too, but they generated WAY too many queries; there were up to 20 extra pieces of data that needed to be pulled from the database for each item in any given report, and it couldn’t be optimized since there are holes in the date ranges.
- Daily graphs default to 28 days instead of 30 days, to more cleanly fit week boundaries. We think you will find this especially useful when comparing vs previous period. An example is shown on the right.
- Compare menu fixes/additions – When viewing daily graphs, the Compare… menu has been broken for a while now. Not sure when it happened but we finally got it fixed. We also added some more options to it that were much needed (revenue, goals, campaigns, pages, and tweets, to name a few).
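For the curious, here is a rough sketch of how a same day of week average could be computed from hourly data. The data layout (a dict of dates to hourly lists) and the function names are assumptions for illustration, not our actual schema or queries.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List


def previous_same_weekdays(day: date, weeks: int = 4) -> List[date]:
    """The same weekday from each of the previous `weeks` weeks."""
    return [day - timedelta(weeks=w) for w in range(1, weeks + 1)]


def same_weekday_hourly_average(
    day: date, hourly: Dict[date, List[float]], weeks: int = 4
) -> List[float]:
    """Average each hour across the earlier same-weekday dates that have data."""
    series = [hourly[d] for d in previous_same_weekdays(day, weeks) if d in hourly]
    # zip(*series) groups hour 0 with hour 0, hour 1 with hour 1, and so on.
    return [mean(hour_values) for hour_values in zip(*series)]


# Made-up data: two earlier Mondays with only two "hours" each, to keep it short.
today = date(2011, 10, 24)
history = {
    today - timedelta(weeks=1): [10, 20],
    today - timedelta(weeks=2): [20, 40],
}
baseline = same_weekday_hourly_average(today, history)
```

Because every date in the comparison is an exact multiple of 7 days back, 28-day windows line up with it cleanly.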
24-hour time formatting and smarter defaults for new sites
We have finally added a 24 hour time formatting option, so you’ll see e.g. 15:30 instead of 3:30pm. This change should affect everywhere you see time within a site’s reports, but if we missed something, let us know. You can change this setting in your site preferences.
The defaults when registering a new site also just got a lot smarter. Those of you with lots of sites probably get annoyed with how many preferences you have to change every time you register a new site – particularly if you are not on the west coast of the US. Now, any time you register a new site, we’ll grab the following preferences from the last site you registered and make them the default for the new one (which you can of course change if desired):
utm_custom: a new URL parameter to attach custom data to visitors
One feature that gets requested a lot is to be able to set a variable in the URL that could then be attached to the visitor as custom data. This would be particularly useful for things like email newsletters, so when someone clicks through, they can be identified automatically.
The variable name needed to be generic because of our white label program, and since we pictured this being used with campaign activity more than anything else, we decided to call the variable utm_custom (related to Google/Urchin’s utm_campaign etc variables).
You can see full documentation here. Because custom data requires a Pro or higher account (upgrade), this variable will also only be processed if you have a Pro or higher account.
utm_custom is an associative array, so you can set multiple key/value pairs on a single page. (It must be an array with at least one key/value pair, or it will be ignored). For example, if you sent a visitor to this page:
http://yoursite.com/landing/page?utm_custom[username]=Bob+Jones&utm_custom[email]=bob@jones.com&utm_campaign=Email+blast&utm_content=Oct+20+2011
You would see this in your visitor’s list:

And this when viewing visitor/session details:

We’ve had requests for this countless times over the years, so we know many of you will find it quite useful.
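If you want to check what a landing URL will produce before sending out a newsletter, here is a small sketch that extracts the utm_custom pairs. It assumes the parameters are separated by & as in a normal query string; the function name and regex are illustrative and not how our tracker is implemented.

```python
import re
from typing import Dict
from urllib.parse import urlparse, parse_qsl

# Matches names like "utm_custom[username]" and captures the key in brackets.
_CUSTOM_KEY = re.compile(r"^utm_custom\[([^\]]+)\]$")


def custom_data_from_url(url: str) -> Dict[str, str]:
    """Collect utm_custom[key]=value pairs from a URL's query string."""
    custom: Dict[str, str] = {}
    for name, value in parse_qsl(urlparse(url).query):
        match = _CUSTOM_KEY.match(name)
        if match:
            custom[match.group(1)] = value
    return custom


landing = (
    "http://yoursite.com/landing/page"
    "?utm_custom[username]=Bob+Jones&utm_custom[email]=bob@jones.com"
    "&utm_campaign=Email+blast&utm_content=Oct+20+2011"
)
visitor_data = custom_data_from_url(landing)
```

The campaign variables are left out of the result because they don't match the utm_custom[...] pattern.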
SHUT. DOWN. EVERYTHING.
- Sites like Clicky will soon be out of business
- Is it time to bid farewell to Clicky?
- With Google Analytics now offering real-time, are startups like Chartbeat and Clicky dead in the water?
- getclicky and chartbeat ought to run for the hills
- Google killing off Chartbeat
- Does anyone think the real-time Google Analytics will … destroy Chartbeat?
- Is this the beginning of the end of Woopra?
- Google attacking realtime analytics services like @chartbeat @woopra with a new free offer
Hmm… did anyone actually read the announcement that Google made today? This isn’t real time Google Analytics, this is a single report in GA that is real time. The rest of GA remains the same. This is more akin to Chartbeat, to be used as a real time complement to a standard analytics package, rather than a full standalone real time service like Clicky is. But I guarantee you Chartbeat will be just fine, as will everyone else. We’ve all had, and continue to have, plenty of advantages over GA other than real time data.
If anything, I’m glad Google has done this, as it will bring more awareness to the concept of real time web analytics in general. This will inevitably lead to more people searching about it, and we just so happen to have the #1 organic result for this search on both Google and Bing (and hence Yahoo). Everything is going to be just fine!