|
WHSP-Mark M
|
 |
« on: July 01, 2010, 06:30:05 AM » |
|
Hello,
At 10:40PM MST last night our technicians noted that sites on our defender.whspn.net server stopped responding. Upon a reboot of the server our data center team noted the box was not coming up properly. It was determined that there had been a 'kernel panic' on the server which makes the server inoperable.
We are currently hard at work restoring our client data from our most recent backups (roughly 24 hours old) onto a new server in an effort to get the websites back online as quickly as possible. Further updates will follow this post when they are available.
We thank you for your patience and understanding during this difficult time.
The WebHSP Team.
|
elytradesign
|
 |
« Reply #1 on: July 01, 2010, 10:19:52 AM » |
|
Hi Mark, I'm sure you are all working full on to get the sites back up, but can we get a new time estimate for resolution? The voicemail says 6 hours, but there is no reference to when this message was first left - so relative to when? 7am, 8am, eastern, central, pacific? Many thanks - as always. Andrea
|
WHSP-Mark M
|
 |
« Reply #2 on: July 01, 2010, 10:30:08 AM » |
|
- Update -
Our new server has been racked, the RAID array has been configured, and we are ready to start the restoration, which will run as fast as the hard drives can write. However, we estimate this process will take another 7-8 hours due to the amount of data that has to be copied.
Our network team determined the kernel panic was brought on by not just one but two failed hard drives in our RAID array. These drives were brand new when they were placed in the server one month ago and to have them fail one month into production is unheard of and could not have been foreseen.
When we have further information we will provide another update.
Thank You
The WebHSP Team.
|
Tim
Newbie
Offline
Posts: 5
|
 |
« Reply #3 on: July 01, 2010, 01:47:09 PM » |
|
Update? Eta? I told my clients 6 hours at 7:30 this morning. Is this going to be resolved tonight?
|
WHSP-Mark M
|
 |
« Reply #4 on: July 01, 2010, 02:40:39 PM » |
|
Hello,
Firstly, thank you for your patience during this difficult time. We know it isn't easy being down.
We have been restoring the data to the new server that we set up to replace our recently corrupted defender server. However, during the restore process we noted that the program was erroring out and wasn't going to complete properly. To avoid lengthier downtime we have halted this restore process and are now proceeding with another way of getting the data to our new server.
While this new method pushes our restore time frame even further, it will ensure that all of our clients' data is safe and 100% complete.
Again we thank you for your patience on this matter.
Thank You The WebHSP Team
|
junaid@payandgo.biz
|
 |
« Reply #5 on: July 01, 2010, 02:57:35 PM » |
|
Hi
Please try to minimize the downtime. It has already been very long, and my 150 clients are very frustrated because the server was upgraded just a few weeks back and now this has happened.
|
greeny
|
 |
« Reply #6 on: July 01, 2010, 03:24:40 PM » |
|
Actually, infant mortality is a well-known and quite foreseeable phenomenon for hardware generally, and disks in particular. Nor is it surprising if disks from a common lot have correlated failures, or if large disk arrays in a RAID 5 configuration experience a recovery failure after a single disk failure, because of a single unrecoverable read error on one of the other disks.
Needless to say, webHSP clearly needs a better catastrophic failure recovery plan. It's surprising to me that you apparently don't have at least one spare racked and ready to begin restoration from backup, to recover a failed server. From the posting below, it looks as if it took more than 12 hours just to get to that point.
In any case, an updated ETA on the replacement server coming online would be useful.
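To put a rough number on the RAID 5 point, here is a minimal sketch. It assumes a drive spec of one unrecoverable read error per 10^14 bits read (a commonly quoted figure for consumer drives, not anything known about the defender hardware), a hypothetical array of four 1 TB disks, and independent errors, which correlated failures from a common lot would make optimistic. A rebuild after one failure has to read every bit of every surviving disk, so the chance of hitting at least one error is 1 - (1 - p)^n.

```python
import math


def rebuild_failure_probability(surviving_disks, disk_bytes, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error during a RAID 5 rebuild.

    surviving_disks, disk_bytes and ure_per_bit are illustrative assumptions.
    Every bit on every surviving disk must be read to rebuild the lost one.
    """
    bits_to_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))


# Hypothetical 4-disk array of 1 TB drives with one disk lost: 3 survivors.
probability = rebuild_failure_probability(surviving_disks=3, disk_bytes=10**12)
print(f"Chance the rebuild hits a read error: {probability:.1%}")
```

With these assumed numbers the result is roughly 0.21, i.e. about a one-in-five chance that the rebuild fails, and it climbs quickly with larger disks or wider arrays.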
|
Tim
Newbie
Offline
Posts: 5
|
 |
« Reply #7 on: July 01, 2010, 03:53:08 PM » |
|
Sites and, more importantly, email are down, and I still do not have an ETA to tell my clients. You guys have never let me down, but it is very concerning that I have no answers and it is starting to look like a 24 hour turnaround for a crashed server.
|
jann
Newbie
Offline
Posts: 12
|
 |
« Reply #8 on: July 01, 2010, 04:29:48 PM » |
|
This is what it will be like when our civilization ends. Considering all that is involved, I am amazed that this stuff works as well as it does. Also, a good way to get us to join the forum. Let's show tolerance, I'm sure the engineers are stressing enough.
|
Adel
|
 |
« Reply #9 on: July 01, 2010, 04:51:25 PM » |
|
Time check
|
Adel
|
 |
« Reply #10 on: July 01, 2010, 05:36:24 PM » |
|
How are we going with this? Any chance of a progress report and ETA please? Our clients are just starting their day and a little antsy about the server still being down this morning.
|
greeny
|
 |
« Reply #11 on: July 01, 2010, 07:25:53 PM » |
|
Status? ETA?
|
greeny
|
 |
« Reply #12 on: July 01, 2010, 07:31:29 PM » |
|
Had to post the prior message to figure out the time zone of the datestamps. So it appears to have been 5 hours since the last status update. And we're now at 20 hours of downtime without any ETA *ever* being posted here.
How can that be?? I mean, 20 hours of downtime for a server crash is an enormous issue in itself...but how can any serious service just leave everybody hanging with no info, for so much time??
|
greeny
|
 |
« Reply #13 on: July 01, 2010, 07:59:34 PM » |
|
And another thing, here we are more than 20 hours in, and every web page for every domain hosted on the server is still just timing out, when anybody tries to load it.
Why haven't you at least redirected any http request to any defender IP address, to a splash page advising that there is a server problem being actively worked?
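A placeholder like that does not need much. As a minimal sketch, assuming the defender IP addresses can be pointed at any box that is still up, a few lines of Python's standard `http.server` can answer every request with a 503 and a short notice. The function name, the message text, and the one-hour `Retry-After` value are illustrative assumptions, not anything webHSP runs.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative notice text; the wording is an assumption.
NOTICE = (
    b"<html><body><h1>Temporarily unavailable</h1>"
    b"<p>This server is being restored from backup. "
    b"Please check back later.</p></body></html>"
)


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every path with the same 503 maintenance notice."""

    def _respond(self, include_body):
        self.send_response(503)  # Service Unavailable
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(NOTICE)))
        self.send_header("Retry-After", "3600")  # hint: retry in an hour
        self.end_headers()
        if include_body:
            self.wfile.write(NOTICE)

    def do_GET(self):
        self._respond(include_body=True)

    def do_POST(self):
        self._respond(include_body=True)

    def do_HEAD(self):
        self._respond(include_body=False)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_maintenance_server(host, port):
    """Build (but do not start) a server that serves only the notice."""
    return HTTPServer((host, port), MaintenanceHandler)
```

Calling `make_maintenance_server("0.0.0.0", 80).serve_forever()` on the stand-in box would serve the notice until the real sites are back. A 503 rather than a 200 tells browsers and search engines the outage is temporary.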
|
john
|
 |
« Reply #14 on: July 01, 2010, 08:17:21 PM » |
|
Hello,
The restore is continuing. There is a lot of data to be restored and that is one reason it is taking so long.
As for an ETA, I think we are looking at 6 to 8 hours from now, although it may take longer or shorter than that.
We are working on this non-stop until restores are complete and all sites are working normally. We did have one false start noted in an earlier post from Mark and that set us back, but we are on track now to complete this as fast as possible.
As for the coincidence of two new drives failing on a new server: it is very disconcerting, and we are working with our hardware vendor to see if there are any issues we need to be aware of. We are taking that very seriously and will be pursuing it aggressively once we get this server back online.
~John Burns
|
Adel
|
 |
« Reply #15 on: July 01, 2010, 08:27:49 PM » |
|
Another six to eight hours?
I was just told two to three hours over the phone and have passed that on to my clients.
|
ecs
Newbie
Offline
Posts: 2
|
 |
« Reply #16 on: July 01, 2010, 08:35:47 PM » |
|
At least the billing server is still up and running!!! ...just got July's invoice... luckily I used an email address that is separate from my hosting, otherwise it would still be bouncing!
|
junaid@payandgo.biz
|
 |
« Reply #17 on: July 01, 2010, 08:38:42 PM » |
|
Hi
It has been more than 24 hours since the server went offline. The business day starts here in 90 minutes. What is the status? When will the server be back online? Please post updates; our clients will be annoyed if today goes the way yesterday did. How much time remains? Any ETA?
|
jdcarr
Newbie
Offline
Posts: 1
|
 |
« Reply #18 on: July 01, 2010, 09:02:08 PM » |
|
I've been with WebHSP since May of 2002, when you used to be Ventures Online. Shouldn't that earn me something, maybe just a simple email letting me know you were having a problem and were working on it? I didn't get that. The first I heard about this problem was from my users. It's been 24 hours (like dog years, that's a week in internet years) and the only communication I have received from you is A FUCKING INVOICE! Are you kidding me?!?!
Who is running this show?
|
junaid@payandgo.biz
|
 |
« Reply #19 on: July 01, 2010, 09:10:27 PM » |
|
Hi
I have been with WebHSP since 2003, when it was Ventures Online. Please post updates regarding the server.