      Posts Tagged ‘Clicky’

      Nothing exciting for a while

      posted by clicky 5:33 PM
      Monday, April 5, 2010

      We’re working on massively improving our infrastructure for the next month or so, which we hope will greatly improve the speed and reliability of our service. During this time, there will likely be few, if any, new and exciting features.

      There are so many awesome ideas we have for Clicky but we’ve reached the point where our existing setup isn’t quite cutting the mustard anymore. Nothing is more important to us than the quality of our service, so we’re going to be focusing on that for a bit to ensure we can continue growing well into the future with as few problems as possible.

      We’ll be upgrading our tracking servers with a bunch more RAM and super fast hard drives, which will help to eliminate the lag that occurs sometimes during peak times (8am-2pm USA PST) when these servers are getting blasted with over 1000 hits per second. We’ll also be adding more redundancy to our main database and web servers, and splitting off Spy onto its own dedicated server to speed up the web servers even more. You wouldn’t believe how much load Spy adds to our entire system – if you knew how much, you would probably cry.

      We’ll also be doing some more work on our database servers, as I mentioned in our last post. I didn’t quite finish everything I wanted to last time I was in our data center, so some db servers may go offline here and there. The downtime should never be more than an hour or two, however. We always tweet live updates during server maintenance, so be sure to follow us on Twitter for up-to-the-minute updates.

      So that’s what we’ll be doing the next month or so. It’s a lot more work than it sounds like, but when all is said and done, I think everyone will be really happy.

      Problems this morning

      posted by clicky 5:32 PM
      Monday, April 5, 2010

      At approximately 4am PST, two separate database servers (db1 and db16) had RAID failures that caused file system corruption. They kept trying to process traffic, but Linux had switched part of the file system to read-only, so no traffic data was actually being written to the hard drives. This problem lasted from approximately 4am to 7am PST. Unfortunately, this traffic data is gone and unrecoverable.
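For readers curious what catching this condition might look like, here is a minimal sketch in Python. It is not how Clicky's monitoring works; it only assumes the standard Linux `/proc/mounts` layout, where the fourth field of each line holds the mount options. The function name `read_only_mounts` and the sample data are hypothetical.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose options include 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # Compare whole options so 'errors=remount-ro' is not mistaken for 'ro'.
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample data; on a real server you would read /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext3 rw,relatime,errors=remount-ro 0 0
/dev/sdb1 /data ext3 ro,relatime 0 0
"""

print(read_only_mounts(SAMPLE))  # prints ['/data']
```

On a live machine the same function could be fed `open("/proc/mounts").read()` and run from cron, alerting whenever a data volume shows up in the result.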

      We have alert systems set up so that when a significant event occurs, such as a server going offline or a RAID failure, we are alerted immediately. Unfortunately, the RAID notifications on a few servers were recently disabled while we were performing some maintenance, and wouldn’t you know it, db1 and db16 were among those servers. Because of this, we weren’t notified of the problem, and didn’t discover it until we woke up to a flood of emails in our inbox this morning.

      There were no problems on other servers that we could find, but if you have a site on a server other than db1 or db16 and it’s experiencing issues, please leave a comment here explaining what’s happening. Be sure to include the site ID.

      We apologize for this issue, which we take very seriously. The RAID notifications are all back online, and we will be sure to always re-enable them immediately after this kind of maintenance in the future. Leaving them disabled was just an honest mistake.
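One way to make "always re-enable after maintenance" hard to forget is to tie the pause to the maintenance itself. The sketch below is only an illustration in Python, not our actual tooling: `raid_alerts_paused` and the `enabled` set of server names are hypothetical stand-ins for whatever the real alerting system exposes.

```python
from contextlib import contextmanager


@contextmanager
def raid_alerts_paused(enabled, server):
    """Pause RAID alerts for one server, restoring them on exit.

    `enabled` is a hypothetical set of server names that currently
    have notifications switched on.
    """
    was_enabled = server in enabled
    enabled.discard(server)
    try:
        yield
    finally:
        # Runs even if the maintenance step raises an exception.
        if was_enabled:
            enabled.add(server)


enabled = {"db1", "db16"}
with raid_alerts_paused(enabled, "db1"):
    # Maintenance work would happen here; alerts for db1 are off.
    assert "db1" not in enabled
# Alerts for db1 are back on without anyone having to remember.
assert "db1" in enabled
```

Because the restore happens in `finally`, an aborted or failed maintenance run still leaves notifications switched back on.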

      One final note: these RAID failures occurred at the exact same time on two different servers. This happened once before as well, although it was three servers instead of two, and it didn’t cause any corruption last time. This seems like very strange behavior to us, and we’re not sure what could cause such a thing to happen to separate servers (that don’t talk to each other) at the exact same time. If any sysadmins out there have any ideas, please share.
