Make Data Deduplication Part of Your Backup Strategy


Basant Rajan
Basant Rajan is the Chief Technology Officer, India, at Symantec Corporation, where he provides innovation leadership and strategic vision. As CTO, Basant’s role is to ensure Symantec's long-term success and customer loyalty by developing next-generation technologies, architecture, and standards, while providing operational direction to R&D and shared engineering services.



Coping with a sea of data

Enterprise backup policies haven’t evolved all that much in recent years. Backup data is still, for the most part, written to magnetic tape each night, duplicated, and then sent off-site to meet disaster recovery needs. Of course, disk already plays a role in enterprise data protection, either by providing a temporary staging area for data before it goes to tape or by supporting snapshot-based data protection, but most data still goes to tape.

As more companies move to 24x7 business cycles, and as the amount of data to protect grows rapidly, the notion of a long period of “downtime” for applications and corporate systems appears almost quaint.

Research has found that, among Fortune 1000 companies, average storage capacity grew from 362 terabytes in early 2005 to 1,013 terabytes in 2007 (a compound annual growth rate of 67 percent). New disk-based backup technologies can help companies address the backup challenges that come with rapid storage growth and increased recovery expectations, but companies should expect to use a combination of disk-based technologies rather than a single one.
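The growth figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name is illustrative, and the two-year span between the two measurements is an assumption inferred from the dates given.

```python
def compound_annual_growth_rate(start: float, end: float, years: float) -> float:
    """Return the compound annual growth rate as a fraction (0.25 means 25%)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    # CAGR = (end / start) ** (1 / years) - 1
    return (end / start) ** (1.0 / years) - 1.0


# Capacity figures in terabytes, taken from the text.
# Assumption: "early 2005 to 2007" is treated as a two-year span.
rate = compound_annual_growth_rate(362, 1013, 2)
print(f"Compound annual growth: {rate:.0%}")
```

With these inputs the result rounds to the 67 percent quoted above.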

Backup has emerged as the leading area of focused improvement for Fortune 1000 storage organizations in 2007. Front-end storage growth, regulatory compliance, and increasing data retention times have created a need for backup innovation to maintain the highest levels of data protection.

Enterprises assessing their data protection infrastructure would do well to begin by asking the following questions:
· Do you know how much money you spend to protect different types of data and whether or not your most important data has the highest level of protection?
· How quickly can you recover different applications as well as data in the event of human error, system failure, or disaster?
· Is data at your remote offices protected consistently?
· Do you frequently test data recovery as well as your state of disaster recovery readiness?
· Can you quickly retrieve data such as files or email from online and offline sources in audit situations or recovery emergencies?
· Can you effectively demonstrate and report on protected data across business units, locations and applications?

While few organizations can answer all of these questions affirmatively, too many negative or ambiguous responses present a strong argument for revising your data protection infrastructure and policies and considering data deduplication technology.


Flexibility from the Remote Office to the Data Center

Data deduplication technology is being deployed in both the remote office and the data center to help companies centralize their backup data, reduce storage costs, and improve both local and disaster recovery efficiency. Data deduplication eliminates redundant backup data at the block level across multiple backup sets and locations, making file names, attributes, and physical locations irrelevant. It can function at the start of the backup process, before data ever leaves the server, to reduce the network bandwidth required, or it can be deployed behind a backup application as a disk-based storage target. Both approaches can offer storage efficiencies of 10 to 50 times and even greater bandwidth savings.
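To make the block-level idea concrete, here is a minimal sketch in Python. It is not any vendor's implementation: the fixed 4 KB block size, the use of SHA-256 fingerprints, and the names `BlockStore` and `backup` are all assumptions chosen for illustration. Commercial products often use variable-size blocks and more elaborate indexes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class BlockStore:
    """Hypothetical store that keeps one copy of each unique block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> block contents

    def put(self, block: bytes) -> str:
        fingerprint = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a block consumes storage.
        self.blocks.setdefault(fingerprint, block)
        return fingerprint

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())


def backup(store: BlockStore, data: bytes) -> list[str]:
    """Split data into blocks and return the ordered list of fingerprints."""
    return [
        store.put(data[offset:offset + BLOCK_SIZE])
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


store = BlockStore()
report = b"A" * 8192 + b"B" * 4096   # three blocks, two of them identical
copy_of_report = bytes(report)       # same content under a different name

backup(store, report)
backup(store, copy_of_report)

logical_bytes = len(report) + len(copy_of_report)
print(f"logical: {logical_bytes} bytes, stored: {store.stored_bytes()} bytes")
```

Because blocks are identified only by their contents, the second file adds nothing to the store, regardless of its name or where it came from.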

Where and how to deploy data deduplication technology will depend on the user’s environment and recovery needs. With client-side (or front-end) deduplication, you place a client (a small piece of software) on a server in place of a traditional backup agent. This client eliminates duplicate backup data before sending it across the network. This bandwidth-efficient approach is ideal for systems in remote offices, virtual environments (e.g., VMware ESX servers), or smaller offices with limited bandwidth. Unlike a traditional backup approach, where backup streams flood the network and limit the scheduling of backup jobs, client-side deduplication enables you to run many jobs simultaneously because a given backup job needs anywhere from 10 to 500 times less bandwidth. Companies can leverage client-side deduplication for bandwidth-constrained backup systems, such as those found in remote offices and virtual environments, to increase the reliability of recovery, centralize backup data, and reduce storage costs.
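The bandwidth saving comes from the client asking which blocks the backup target already holds before sending anything. The sketch below illustrates that exchange under stated assumptions: `BackupServer`, `missing`, `upload`, and `client_backup` are hypothetical names, the block size is fixed at 4 KB, and the small cost of sending the fingerprints themselves is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class BackupServer:
    """Hypothetical central backup target that indexes blocks by fingerprint."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> set[str]:
        # Tell the client which fingerprints the server has never seen.
        return {fp for fp in fingerprints if fp not in self._blocks}

    def upload(self, blocks: dict[str, bytes]) -> None:
        self._blocks.update(blocks)


def client_backup(server: BackupServer, data: bytes) -> int:
    """Send only blocks the server lacks; return the payload bytes sent."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    fingerprints = [fingerprint(block) for block in blocks]
    needed = server.missing(fingerprints)
    payload = {fp: b for fp, b in zip(fingerprints, blocks) if fp in needed}
    server.upload(payload)
    return sum(len(block) for block in payload.values())


server = BackupServer()

# Ten distinct 4 KB blocks on the first night.
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
# The next night, only the last block has changed.
tuesday = monday[: 9 * BLOCK_SIZE] + bytes([200]) * BLOCK_SIZE

sent_monday = client_backup(server, monday)
sent_tuesday = client_backup(server, tuesday)
print(f"first backup sent {sent_monday} bytes, second sent {sent_tuesday} bytes")
```

In this toy case the second backup sends one block instead of ten. Real savings depend on how much of the data changes between backups.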

Storage-based deduplication is gaining popularity in the data center as a way to help manage storage growth, recover data from disk, and expand disk-based disaster recovery to lower tiers of data. Unlike client-side deduplication, storage-based deduplication normally does not require any changes to your backup software. It can be deployed as a hardware storage appliance or as a software solution that uses your own combination of servers and storage. Both approaches reduce the size of a backup after it has streamed across a network and through a backup server, which is acceptable because most customers have fewer concerns about backup bandwidth in the data center. The storage efficiencies are the same as with client-side deduplication and typically reduce the aggregate storage required for a backup (over a given retention time) by 10 to 20 times when compared to tape. For data with short retention periods (e.g., one to four weeks), deduplication can be especially appealing because it can improve the reliability of recovery and help companies control tape-related costs in the face of growing data volumes.

With deduplication, the traditional concept of a full backup, or a full copy, disappears. Instead, every new backup (or copy) relies on the previous copies, so that only the unique blocks are stored. Of course, a full backup can be recovered at any time, even if you haven’t written a typical full backup to disk or tape. Hardware redundancy (e.g., RAID 5) protects most systems, but customers need to remember to back up their “dedupe storage system”, and not just replicate a copy, to add an important layer of protection.
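One way to picture how a full backup can be recovered without ever being stored as a full copy is a per-backup manifest: an ordered list of block fingerprints that point into a shared block pool. The sketch below assumes that design; `DedupeStore`, its methods, and the 4 KB block size are illustrative names and values, not a description of a specific product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupeStore:
    """Hypothetical dedupe target: a shared block pool plus one manifest per backup."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}         # fingerprint -> block
        self.manifests: dict[str, list[str]] = {}  # backup name -> ordered fingerprints

    def backup(self, name: str, data: bytes) -> int:
        """Store a backup and return how many new bytes it added to the pool."""
        manifest = []
        new_bytes = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:
                self.blocks[fp] = block
                new_bytes += len(block)
            manifest.append(fp)
        self.manifests[name] = manifest
        return new_bytes

    def restore(self, name: str) -> bytes:
        """Rebuild the complete backup by following its manifest."""
        return b"".join(self.blocks[fp] for fp in self.manifests[name])


store = DedupeStore()
day1 = b"x" * 4096 + b"y" * 4096
day2 = day1 + b"z" * 100            # the next day adds a little new data

added_day1 = store.backup("day1", day1)
added_day2 = store.backup("day2", day2)

print(f"day1 added {added_day1} bytes, day2 added {added_day2} bytes")
print("day2 restores completely:", store.restore("day2") == day2)
```

The sketch also shows why the dedupe store itself needs protection: every manifest depends on the shared block pool, so a single damaged block affects every backup that references it.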


How Can You Get Started with Deduplication?

Should client-side deduplication or storage-based deduplication replace all your current backup methods and media? Most likely not, because most environments require different degrees of protection and not all types of data are good candidates for deduplication. In other words, recovery point objectives (RPOs) and recovery time objectives (RTOs) should be matched to data protection methods. More recovery points might be better served by snapshots or continuous data protection (CDP). Faster recovery might be better served by snapshot methods or a SAN-based backup to high-speed disk. Recovery requirements for data also often change as the data ages. For example, if a dedupe storage system cannot recover large amounts of data (or millions of small files) quickly, a user might instead choose to store backup data on high-speed disk for the first week and then move this data to a dedupe storage target for the remainder of the retention time. Finally, not all data types dedupe well, in particular the compressed file formats used for music, photos, medicine, or research. The underlying principle when considering dedupe technology is to evaluate the benefits and limitations of each approach alongside your RTOs and RPOs.
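The age-based example above (high-speed disk for the first week, then a dedupe target for the rest of the retention time) can be expressed as a simple placement rule. This is only a sketch of that one policy: the function name, the tier labels, and the 30-day retention used in the usage lines are assumptions, and a real policy would also weigh RTOs, RPOs, and data type.

```python
from datetime import date, timedelta

FAST_DISK_WINDOW = timedelta(days=7)  # "the first week" from the example above


def storage_tier(backup_date: date, today: date, retention: timedelta) -> str:
    """Return where a backup of the given age should live under this policy."""
    age = today - backup_date
    if age < timedelta(0):
        raise ValueError("backup_date is in the future")
    if age >= retention:
        return "expired"
    if age < FAST_DISK_WINDOW:
        return "high-speed-disk"   # recent backups: fastest recovery
    return "dedupe-storage"        # older backups: cheapest retention


today = date(2024, 1, 31)
retention = timedelta(days=30)     # assumed retention period
for days_old in (2, 10, 45):
    tier = storage_tier(today - timedelta(days=days_old), today, retention)
    print(f"{days_old:>2} days old -> {tier}")
```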

Customers should deploy data deduplication based on specific needs within their environment. Controlling backup storage costs is an obvious imperative for many companies, but eliminating distributed tape-based backups may also present a cost-saving opportunity. Here are a few points to remember as you consider where to use this technology.

· Assess where you can use this technology (e.g., client-side and/or storage-based) across all offices, data centers, and systems – including virtual environments.
· Evaluate how deduplication technology fits within your existing data protection environment (backup applications, storage, and servers).
· Examine the trade-offs between storage appliances and software-based solutions for dedupe storage.
· Determine whether you will need to export data out of your deduplication storage system to tape.
· Remember to factor in the protection of your dedupe storage system (from data corruption) as part of your overall architecture.


Conclusion

According to some experts, data deduplication can reduce total backup storage usage by factors of 10:1 or more (depending on the nature of the data) when compared to traditional backup methods to tape. The bandwidth reductions delivered by client-side data deduplication technology are even more significant. When deployed as part of the overall backup strategy and used alongside other backup methods and media, there is no doubt that deduplication can bring both operational and economic benefits to companies.
