I elected to take the earned credit because I had a feeling the servers would just not be up to the load. As it turns out, I was right, as it took numerous tries just to log in, get to the page with the button and then many more tries to get the button to register my selection. But, as I do web system programming for a living and I’ve spent considerable time dealing with high traffic issues, I wanted to offer some observations that might be useful for next year.
First, while I had a hard time connecting, when a request did connect, the response from the server came back very quickly. This tells me that bandwidth to/from the server was probably not an issue. Most of the time, the error was a timeout of some sort. I saw both connection timeouts and what looked like server request overload issues (the funny little “something happened, here’s an error code” responses I’m sure many people saw).
Based on what I saw, my suspicion is that the problem this time was more related to connection and request resource starvation than to bandwidth or server CPU capacity. Both of these things are generally tunable parameters in most HTTP servers, but the default, out-of-the-box settings are usually very low. I’m not familiar with SparkFun’s software configuration so I can’t make specific recommendations but, typically, inbound connection requests are limited by the number of threads set aside to process them. There are similar issues with incoming connections that relate to the amount of RAM buffer space set aside to manage each connection.
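To make the threads-versus-buffers trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (RAM size, per-thread cost, buffer size, reserved memory) is a made-up assumption for illustration, not anything known about SparkFun’s setup, and the function name is hypothetical. It only shows how a fixed RAM budget caps the number of connections that can be serviced at once.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_concurrent_connections(ram_bytes, thread_stack_bytes,
                               conn_buffer_bytes, reserved_bytes=0):
    """Upper bound on simultaneous connections when each connection
    needs one worker thread plus its own buffer space.

    All inputs are assumptions supplied by the caller; this is a
    budgeting estimate, not a measurement.
    """
    usable = ram_bytes - reserved_bytes
    if usable <= 0:
        return 0
    per_connection = thread_stack_bytes + conn_buffer_bytes
    return usable // per_connection


# Hypothetical box: 4 GiB RAM, 1 GiB held back for the OS and the app.
# Assume 8 MiB budgeted per worker thread (an illustrative figure).
small_buffers = max_concurrent_connections(
    4 * GIB, 8 * MIB, 64 * 1024, reserved_bytes=1 * GIB)
large_buffers = max_concurrent_connections(
    4 * GIB, 8 * MIB, 1 * MIB, reserved_bytes=1 * GIB)

print("64 KiB buffers:", small_buffers, "connections")
print("1 MiB buffers: ", large_buffers, "connections")
```

With these assumed figures the ceiling lands in the low hundreds either way, which is the point: once that many requests are in flight, new connections wait or time out no matter how much bandwidth or CPU is free.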
Of course, I could be off base here and perhaps the web staff at SparkFun had already configured all these settings to the max possible values. But, in my experience, I’ve found that it can be fiendishly complex to get this right, as there are numerous interdependencies that have to be balanced out, such as deciding how to budget available RAM (for threads, or for connection buffering?). So, before any of you jump on the poor admins at SparkFun, just realize that what they’re trying to pull off with Free Day is not something I’d relish trying to do.
Wayne
wholder:
First, while I had a hard time connecting, when a request did connect, the response from the server came back very quickly. This tells me that bandwidth to/from the server was probably not an issue. Most of the time, the error was a timeout of some sort. I saw both connection timeouts and what looked like server request overload issues (the funny little “something happened, here’s an error code” responses I’m sure many people saw).
My experience was slightly different. I noticed that when it was sending, the data came very quickly. That wasn’t true of the entire connection, though: several times I saw a file come in spurts, or I would very quickly get the first 2kB of a page (roughly, through the tag) and then nothing. This tells me that the problem was generation of dynamic content. The static, cached stuff tended to come quickly, when you could connect, but the dynamic stuff timed out.
Also, there were times when it would appear that I was unable to connect, but in fact the entire HTML document had been sent, yet I was staring at a white screen. Per some braindead spec, there are places where a browser will halt rendering and wait for JS to run. In this case, even though it has the markup, the browser won’t render until it has an external JS file, which it can’t access.
As a professional web application developer I would have off-loaded the JavaScript and CSS files to a CDN. This way the SparkFun servers would not have to carry that load serving those files. A CDN is more efficient at distributing files like this.
Next year you might also want to consider putting the Free Day quiz portion in the cloud. This will allow you to use resources of large companies like Amazon (EC2) and Microsoft (Azure).
I can confidently say that none of the issues experienced during Free Day 2011 would have happened if the two methods described above were used.
Except that one of the points of “Free Day” was to test the SPARKFUN servers…not L3/Comcast or A.W.S. (EC2 for instance, which I see is alluded to earlier).
All of this is/was mentioned on the “Free Day” page. If only the bulk of the whiners (i.e. newbs, single-posters, etc.) would’ve read that far down the page, they’d have known that, but noooooo…
I’m not whining as I made out with a lot of free day cash.
I was merely pointing out that there are better ways to improve your infrastructure. I think this test showed that what they’ve done so far resulted in failure.
Honestly, if they want to make their software and website better when it is running normally, they should at least off-load the static content (JavaScript and CSS references) to a CDN. It will save them on compute time and bandwidth. Not only that, but their customers’ page load times will be faster. I see their images are hosted from a separate “static.sparkfun.com” server, but that is still at their location. If they choose a CDN, the files are distributed to multiple locations around the world, closer to their customers. If you don’t want the CDN, at least put the JS and CSS files on the “static.sparkfun.com” server too. Also, I’m not sure if you are using HAProxy, but you might want to consider it. Stack Overflow uses it (http://blog.stackoverflow.com/2010/01/s … iguration/).
I don’t expect anyone here to know what I’m talking about because most of the people here probably don’t do enterprise-level software development, but I thought I’d put in my two cents in case their programmers didn’t know about this.
I understand that a CDN or a cloud would provide better performance, but there’s something to be said for having control over the servers you use. Beyond that:
The cloud option: offloading the quiz to the cloud isn’t a test of how well your shop works. Even in the cloud, it’s only a test of how well your quiz works.
As for a CDN, most of the time they don’t have a problem hosting their own static files, so it may not be cost-effective to pay for a CDN.
I’m guessing the reason they don’t shove the JS and CSS on static.sparkfun.com is because they use HTTP GET strings to control versioning. There are a lot of different ways to do this, but most of them rule out static. (Personally, I’d use a reverse-proxy set-up that uses HTTP Location: to direct to a static file, or just embed the version in the path, and simplify the thing a great deal.)
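For what it’s worth, the “embed the version in the path” idea can be sketched in a few lines of Python. This is only one way to do it and says nothing about how SparkFun actually builds its URLs: the function name, the example path, and the choice of a short SHA-256 prefix as the version are all assumptions made for illustration.

```python
import hashlib
import posixpath


def versioned_path(path, content):
    """Return `path` with a short content hash inserted before the
    file extension, e.g. /js/site.js -> /js/site.<hash>.js.

    `content` is the file's bytes. Because the hash changes whenever
    the bytes change, the resulting file can be served as a plain
    static file and cached for a long time.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    directory, filename = posixpath.split(path)
    stem, ext = posixpath.splitext(filename)
    return posixpath.join(directory, f"{stem}.{digest}{ext}")


# Hypothetical asset: the path and contents are made up.
old = versioned_path("/js/site.js", b"alert('free day');")
new = versioned_path("/js/site.js", b"alert('free day 2');")
print(old)
print(new)
```

Since the version lives in the filename rather than in a GET string, nothing dynamic has to look at the request, so the files could sit on a static host or a CDN without special handling.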
And I’d be surprised if they’re not using HAProxy or something similar.
… Oh God. I’m so ashamed of myself. I’ve failed to become involved in the community here. But worse than that, instead of learning electronics, I’ve learned web-adminery. I feel like I should go build a robot.
… also, my personal site isn’t proof at all that I know what I’m doing as an admin. It’s in that “It sucks right now, but I was going to add X and do Y, but got busy” state that things tend to be in for way too long. That it’s hosted on a residential AT&T connection doesn’t help.