R1Soft – CDP 2.0, CDP 3.0, cPanel Integration, Delays, and Poor Support (Updated)

Let me start by saying that R1Soft, when it works, is an excellent solution that has on a few occasions saved us from partial or complete data loss after an unexpected hardware failure or other data issue. Unfortunately, that is about the extent of the good I have to say about R1Soft.

My experience with R1Soft formally began on March 12th, 2009, when we first obtained our trial R1Soft license. I had been in contact with David Wells from R1Soft after we lost some MySQL databases to a mistake a technician made while performing maintenance on a server. While discussing the issue, David made it clear that had R1Soft been backing up the server, not only could we have restored those destroyed databases, but we would also have been protected against total data loss such as a catastrophic server failure.

We used the original trial license until March 30th, 2009, when we purchased the R1Soft Linux CDP starter pack for $500, which included 5 Linux CDP agents, 1 MySQL addon, and 1 Archiving addon. This was an excellent deal, and up to this point R1Soft had lived up to every promise and we were very happy with the software.

David at R1Soft mentioned that CDP 3.0 was coming out soon, if memory serves within the next quarter. Promises were made such as faster backups, faster restorations, more reliable operation, cPanel integration, a more streamlined interface, and a lot of other features that surely anybody else running R1Soft 2.0 would love to see. I’ll cut straight to the point: it was nearly 2 years before the 3.0 version of the R1Soft CDP was released in beta, and even then it didn’t include all of the features that were promised and was missing many of the key features of 2.0.

The R1Soft 3.0 beta was released as a “standard edition” which only allowed backing up to the same server, on either a secondary disk or network mounted storage. There was no centralized backup server and, when I looked into it, no bare metal restoration, both features that had been available in 2.0 for over 2 years. When the “enterprise edition” was finally released in beta, years after it was promised, it did not include bare metal restore or cPanel integration. You are reading this correctly: basic 2.0 features were totally absent from the 3.0 version of the R1Soft CDP, even though it was released nearly 2 years after it was originally promised.

With version 2.0 over the years, and with 3.0 in the short period we’ve been using it, we’ve always had strange issues that we’ve reported and that have been entirely ignored. We’ve had issues where 2.0 would cause kernel panics, would simply fail to back up, or would fail to restore. Support either stated they could not replicate these issues, claimed they don’t exist, or claimed they would be fixed in “the next release” or “sometime soon”, and they still happen. We’ve had tickets where we gave them full and complete access to a server to diagnose and reproduce the issue, view and download logs, or do anything else they needed, and gotten back canned responses telling us how to do things on WINDOWS R1Soft backup servers or agents when we run exclusively Linux and made that clear in the ticket.

One issue that we, as well as several providers I network with, have faced with CDP 2.0 occurs when an end user starts a restoration without selecting the “overwrite files” option: “overwrite failed” errors cause the agent to fail on the restoration. It wouldn’t be quite as bad if the agent just failed out and quit, but it actually fails “on” and consumes a full CPU core indefinitely until an administrator manually kills the restoration process. There are several valid reasons to do a restoration without overwriting, such as restoring deleted files inside a directory tree from a backup without overwriting the files that still exist, and this error makes that impossible. One provider I speak with opened a ticket about this on or around April 9th, 2009, and the issue has yet to be resolved.
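For anyone stuck babysitting this, the “pegged core until somebody kills it” pattern is at least easy to detect. Below is a minimal sketch of the check, not anything R1Soft provides: the function name, the 95% threshold, and the sample count are all my own assumptions, and the CPU samples would have to come from whatever monitoring you already run against the restore process.

```python
from typing import Sequence


def is_runaway(cpu_samples: Sequence[float],
               threshold: float = 95.0,
               min_consecutive: int = 5) -> bool:
    """Hypothetical helper: flag a process as runaway when its most recent
    `min_consecutive` CPU samples (percent of one core) are all at or
    above `threshold`."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(cpu_samples) < min_consecutive:
        # Not enough history yet to call it stuck.
        return False
    recent = cpu_samples[-min_consecutive:]
    return all(sample >= threshold for sample in recent)
```

Requiring several consecutive high samples, rather than a single spike, keeps a legitimately busy restoration from being flagged; what you do once it returns True (alert somebody, kill the process) is a separate decision.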

Right now R1Soft 3.0 is extremely unstable on OpenVZ kernels, and has been since it was released. I know of approximately 5 kernel issues that R1Soft 3.0 has on various CentOS, CloudLinux, and OpenVZ kernels that cause the backups to either simply fail or, worse, cause the entire server to hang or lock up. Not only are R1Soft agent licenses extremely expensive, but the agents will take your production servers OFFLINE, which is unacceptable. I’ve gotten word that these issues have been fixed but that the fixes will need to go through 10 business days of QA testing, meaning it will be at minimum 2 weeks before these serious kernel incompatibilities are resolved, assuming all of their testing goes well.

With what R1Soft charges for new licenses, what it charges for maintenance, and the number of licenses they’ve sold, I simply do not understand how they cannot have a solid development team that can resolve issues in a timely fashion while also building the new revisions of the software that have been promised. Why did it take R1Soft nearly 2 years to go from 2.0 to 3.0 when it was promised much sooner, and why is the software so ridiculously unreliable, buggy, and incomplete when it finally makes it to the market?

Update 02/17/2011

R1Soft 3.0 has some “kernel” issues that, under specific circumstances, occasionally cause a server to lock up entirely, forcing a reboot. Their development team has apparently just finished finding and fixing all of these issues; however, the next step is to push those updates through “Quality Assurance”, and only then will they be available to those using the software. While I am all for “QA”, I’m more for us not having to reboot servers twice a week because the backup process causes the server to hang.

Last night the R1Soft process killed one of our servers at around 5 AM EST, and unfortunately it happened during a period that turned out not to have staff coverage. The staff member who was supposed to be watching the process decided to go AWOL, which meant that quickly bringing the server back online fell onto my shoulders. Coincidentally, believing that we had staff coverage, I had set my phone on silent for the first time in over two years, and it just so happens that this was the night R1Soft caused the server to hang while a staff member was not working like they should have been and my phone was on silent. I’m not trying to make excuses: the staff member should have been doing their job, and that is definitely a failure on our part.

It is, however, a bad situation that should never have happened. If the R1Soft CDP Agent and Kernel Module worked like they’re supposed to, the server never would have crashed and gone offline. Yes, the staff member certainly should have been doing their job, which would have meant rebooting the server within minutes and avoiding extended downtime, but ultimately the failure is due to the R1Soft software. I hope for R1Soft’s sake that no competitor brings a quality product to market, although I dream about it every night.

It’s bad that a backup system that we rely on to protect our customers’ data also causes us to have to make sure we have somebody awake and watching in case the backup system takes one of our servers offline.  Hopefully R1Soft will have the fixes for these issues pushed out within the next two weeks although I won’t be holding my breath.
