System Woes and Warts - A little background

Tom Allensworth


Over the last six months there have been a number of days that left our community members pulling their hair, gnashing their teeth, and calling us (and me specifically) all kinds of names because of system outages, time outs, ad nauseam. I will be the first to admit that we have been struggling with the system from time to time. But I am getting ahead of myself. Let me review a little history.

 

When we first set up AVSIM in our colocation service (known as a COLO service), the architecture consisted of a single MySQL database. It primarily served the file library and was installed on JUPITER, our library and email server. When we brought our new forum system online in 2006 (the first generation of the one you see today), we decided to host it on a second server, MARS. But because the forum needed a MySQL database to run on, and our previous forums were simply text-based systems, we decided to connect the forum to our existing MySQL database on JUPITER. That worked for a long time; until the last six months, to be exact. That is seven years of successful service from an architecture that we never anticipated would carry the load we see today.

 

Let's set that aside for a moment and address another issue. As I write this, over the last 24 hours, we have had over 670 spam attempts from one country alone, and not even one of the "biggies" in terms of spam sources. We have a number of functions in place that block spammers, including the service known as "Stop Forum Spam", or SFS for short. But that doesn't prevent attempts to register - in order to block a spammer, we need to at least get a bit of information (email, IP, etc.) before they are blocked. That takes server time to accomplish - server time that is taken away from you as a community member. Add up all the spam attempts over a 24-hour period and that is a tremendous amount of server time being used to protect you and this community.

 

Okay, so, one more... If you look at the bottom of the main forum index page, you will see a breakdown of members and "guests" online. "Guests" take at least three forms. The first is users who have not registered or have not logged in and are here viewing the forums and their content. The second is search engine traffic. Every time someone uses Bing, Google, or any of a couple of dozen other search engines to find something that can be found on AVSIM, the system registers the search effort as a "guest" visiting AVSIM, which indeed it is.

 

The third form is that of "spiders" or "bots" that sweep through our servers looking for data to put in search databases, or for other, not-so-positive purposes, like collecting email addresses and any personal information they can find. That process consumes huge amounts of server time and Apache connections - again stealing those from our legitimate members. There are hundreds of these bots, and many of them are not "friendly" like Google and Bing are. The fact of the matter is that we have processes in place to prevent the bad bots from dragging the system to its knees. However, as in warfare, there are bots that disguise themselves, use well-thought-out "spoofing" methods, and successfully avoid or circumvent our protective measures.
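
To make the three kinds of "guest" concrete, here is a minimal sketch of how a visit might be sorted by its User-Agent header. It is an illustration only, not the forum software's actual logic: the function name and the token lists are assumptions made up for this example. It also shows why spoofing works, since a bot that sends a browser-like User-Agent is counted as an ordinary guest.

```python
# Hypothetical token lists for illustration; a real deployment would maintain
# a much longer, regularly updated list.
KNOWN_CRAWLER_TOKENS = ("googlebot", "bingbot")
GENERIC_BOT_TOKENS = ("bot", "spider", "crawl")


def classify_visitor(user_agent: str, logged_in: bool) -> str:
    """Return a rough label for one visit based on its User-Agent string."""
    if logged_in:
        return "member"
    ua = user_agent.lower()
    # Well-known search engine crawlers announce themselves by name.
    if any(token in ua for token in KNOWN_CRAWLER_TOKENS):
        return "known crawler"
    # Other bots often, but not always, include a generic marker.
    if any(token in ua for token in GENERIC_BOT_TOKENS):
        return "unidentified bot"
    # Everything else looks like a person, including bots that spoof a browser.
    return "guest"
```

Because the User-Agent header is supplied by the client, a check like this can only catch bots that identify themselves honestly.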

 

Finally, there are features that we employ that do add significant load to our servers. Take TapaTalk as an example. It multiplies server load by 5 times! We have been working with TapaTalk on this, but if no solution is found, we might have to remove it entirely, and never look back again.

 

Now let's go back to the architecture... Because we rely upon one database shared between two servers, we have a bottleneck between the two. That bottleneck is the interface required to go from MARS to JUPITER in order to connect the forum with the MySQL database server. In 2011, 2012, and most of 2013, that was adequate. In fact, we rarely, if ever, suffered problems because of the single MySQL implementation. That has obviously changed. I guess you could say that we are suffering from our own success. We have outgrown our configuration and we need to change it to prevent further "throttling" of performance for our registered community members.

 

So, how are we going to fix this? We are working to bring on MySQL experts and Linux / Server / Apache gurus to focus on the issues and we are daily taking remedial steps (like blocking all but a handful of guests at any one time) to make the AVSIM experience as positive as it can be for you.

 

It is our hope that your patience will withstand the time outs and outages, and that the outcome will be more than sufficient to make all of these nightmares of the past.



16 Comments



Tom, good luck with all the upgrades and I hope you find a suitable load-balancing solution that will serve AVSim well.


Thumbs up.  A little explanation (yes, I was one of those tearing their--rapidly diminishing--hair) goes a long way towards making for calmed nerves and increased patience.  Hopefully, one day we will look back on this and laugh.


Glad I didn't add to the woe this time and bleat about the issues before understanding the problem.

Ickie over at SOH also is wailing about this - they also had a DDOS attack based in France.

"It is our hope that your patience will withstand the time outs and outages"

 

Some days* it's great, some days it's blooming awful... but it's always worthwhile, so I take what I'm given - with thanks.

 

Any improvements will be most welcome however.

 

* That can actually be hours, or minutes.


Just don't give the Apache and Linux gurus the root password!  ;)

 

That ain't going to happen. Been there, done that. Remember?


Sorry Tom, but one more question.  Will the RSS system get up and running anytime this year?  I really like that feature, especially when one can tease thru topics w/o needing to do a full login for prescreening. Ultimately might help w/ high demand on your servers?


I administer some very large sites with a huge amount of traffic. Have you considered implementing some clever caching techniques? If the site is based on a resource-intensive server framework such as Plesk or cPanel, then consider nginx instead. I have also implemented the Varnish cache, which serves cached content directly from memory and is refreshed every 5 minutes.

 

Cloudflare is another service that can have some dramatic results on speed and content delivery. If it is not already, the site should be using a content delivery network to deliver images, JavaScript and CSS from local servers based in the cloud.

 

Using some clever techniques we have managed to keep a social network which has some 100,000 unique visitors a day humming along at an impressive pace. If you need any help happy to assist :)
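
For anyone wondering what "cached content refreshed every 5 minutes" means in practice, here is a minimal sketch of the idea in Python. It is not Varnish and not anyone's production setup; the class name `TTLCache`, the `render` callback, and the injectable `clock` are assumptions made up for this example. The point is only that a page is rebuilt at most once per time-to-live window, and every other request is answered from memory.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep rendered pages in memory and rebuild them once they are too old."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # 300 seconds = the 5 minutes mentioned above
        self._clock = clock              # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, render: Callable[[], str]) -> str:
        """Return the cached page for key, calling render() only when needed."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]              # still fresh: no database work at all
        value = render()                 # expensive path: hits the backend
        self._store[key] = (now, value)
        return value
```

With a cache like this in front of a busy index page, the database sees one query burst per page every five minutes instead of one per visitor, at the cost of visitors sometimes seeing content that is a few minutes old.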


Yea, I've noticed the degradation as well but I appreciate all the hard work and explanations you put forth.

 

FYI - I was trying to download from the library today with no success after several attempts.  Could this be related in some way?

 

Good luck.


 

FYI - I was trying to download from the library today with no success after several attempts.  Could this be related in some way?

 

No, totally different servers. What was the response from the system?


Until I started running my own forum, I had no idea how much traffic comes from spambots and content scrapers; it accounts for about 70% of all traffic! And I now realise you can spend all day trying to combat it.

 

I've been forced to block large IP address ranges at the .htaccess level (mainly from Chinanet Fujian lately, it has to be said). Is that something that AVSIM can consider, or is already doing? (By the way, I know lots of .htaccess blocking creates a large server load, but I am sure there are alternative methods available since you have dedicated servers.)
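
As a rough illustration of what blocking by IP range involves, here is a small Python sketch of the matching step using the standard `ipaddress` module. The ranges shown are reserved documentation ranges used as placeholders, not real offenders, and the names `BLOCKED_RANGES` and `is_blocked` are made up for this example. A real site would do this check in the web server or firewall rather than in application code.

```python
import ipaddress

# Placeholder CIDR ranges (reserved for documentation), standing in for
# whatever ranges an admin has decided to block.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # Every range is checked in turn, so the cost grows with the list length.
    return any(addr in network for network in BLOCKED_RANGES)
```

The linear scan over the list hints at why a very long blocklist evaluated on every request adds load, and why doing the filtering earlier, before the request reaches the web server, can be cheaper.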


Layer 4 blocking makes more sense than htaccess, if you have access to it, which most larger sites should.  There's a lot of options here still.

 

-stefan


Whatever it is you have done Tom, for the past 4-5 days AVSIM has been working like any other website/forum. Well done.


Yes, but once we are in, it is good.  Thanks.

 

Ray

Not very good at all here and hasn't been for a very long time.  I have a very high bandwidth connection but I'm afraid this site is completely 'server bound' at this point and I hope it's restored sometime soonish.  I'd rather see less dedication to new features like blogs and advertising and get the blithering thing working quickly once again.  I click on a post and it's what 15-20 seconds before it loads?  Never used to be this way ;o(


Not very good at all here and hasn't been for a very long time.  I have a very high bandwidth connection but I'm afraid this site is completely 'server bound' at this point and I hope it's restored sometime soonish.  I'd rather see less dedication to new features like blogs and advertising and get the blithering thing working quickly once again.  I click on a post and it's what 15-20 seconds before it loads?  Never used to be this way ;o(

 

+1

 

I too think that until AVSIM is as close to 100% stable and fast as possible in its most basic form, other new features etc. could wait.

 

IMHO there have been far too many issues with AVSIM lately, either with response times or, like the last couple of days, with the site being down.

 

Hope you'll be able to find a permanent solution soon to get rid of these issues once and for all and for you guys looking into these issues to be able to focus on more fun stuff rather than performance and stability issues.
