- Overview of the State of Indiana Disaster Recovery Plan
- Disaster Recovery Definition
- Agency Responsibilities
- Service Availability
- Disaster Event Possibilities
- State's Disaster Recovery Network
- State DR FAQ
- Glossary of Terms
Disaster recovery is the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-induced disaster. The complexity of technology systems requires detailed planning and testing to ensure recovery capabilities in the case of a disaster. Disaster Recovery Planning (DRP) is a component of an agency’s Continuity Of Operation Plan (COOP), which is handled by the Indiana Department of Homeland Security (IDHS). Disaster Recovery Planning is managed by the Indiana Office of Technology (IOT), which addresses the recovery planning of servers and applications housed in the primary IOT Data Center.
There are many potential disruptive events, and the impact and probability of each must be assessed to give a sound basis for proceeding. If the assessment shows that a disruptive event does not warrant Disaster Recovery, the normal SLA process is invoked according to the nature of the failure (for example: network, hardware, application). To assist with this process, the following list of potential events has been produced (see Disaster Event Possibilities for Indiana Government).
The IOT DR Location
IOT has contracted with an educational institution to use its state-of-the-art primary data processing facility as our secondary data center and recovery site. The distance between IOT’s primary data center and the secondary data center ensures continued state operations during nearly all predictable disasters. The efficiencies of the secondary data center also contribute to IOT’s ability to offer various DR Service levels at favorable costs to agencies.
Robust, redundant Wide Area Network (WAN) connectivity between Indianapolis and Bloomington supports rapid recovery times for systems enrolled in the IOT DR Plan. Leveraging the state’s investment in networking infrastructure again drives down costs for state agencies. The state also has Local Area Network (LAN) connectivity in place at the DR location, and these costs are included in the DR Fee.
Consulting and coordination
IOT has dedicated resources to coordinate facility needs, network connectivity and technical details to ensure disaster recovery can occur within defined timeframes. Agencies enrolling their systems in IOT's Classification/Category/Designation/Service will work with IOT to ensure necessary components are in place, develop a recovery plan, complete DR Testing and document recovery procedures.
Critical recovery of Windows and UNIX systems (Classification/Category/Designation/Service)
Systems designated as Critical have a Recovery Time Objective (RTO, downtime) of less than 6 hours and a Recovery Point Objective (RPO, data loss) of 60 seconds to 5 minutes, or up to 23 hours if required. Restoring services within this timeframe requires agencies to purchase appropriate processing capacity and have IOT install it and keep it operationally ready. In addition, data must be replicated from the primary production environment to the disaster environment using SAN Replication technology.
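The Critical targets above can be checked against the results of a DR test. The following is a minimal sketch, not an IOT tool: the `DrTestResult` record, its field names and the `meets_critical_targets` function are hypothetical, and only the 6-hour RTO and the 23-hour upper RPO bound come from the text.

```python
from dataclasses import dataclass
from datetime import timedelta

# Targets taken from the text: RTO of less than 6 hours,
# RPO of up to 23 hours if required.
CRITICAL_RTO = timedelta(hours=6)
CRITICAL_RPO_MAX = timedelta(hours=23)


@dataclass
class DrTestResult:
    """Hypothetical record of one DR test (names are illustrative)."""
    system: str
    downtime: timedelta   # time from outage to restored service
    data_loss: timedelta  # age of the newest replicated data at failover


def meets_critical_targets(result, rpo_target=CRITICAL_RPO_MAX):
    """Return True if the test stayed within the Critical RTO and RPO.

    rpo_target lets an agency apply a tighter RPO it has agreed on,
    for example 5 minutes, instead of the 23-hour upper bound.
    """
    return result.downtime < CRITICAL_RTO and result.data_loss <= rpo_target


# Example: a 4-hour recovery with 3 minutes of data loss,
# checked against a 5-minute RPO.
example = DrTestResult("agency-app", timedelta(hours=4), timedelta(minutes=3))
within_targets = meets_critical_targets(example, rpo_target=timedelta(minutes=5))
```

Passing the RPO as a parameter reflects the range given in the text: the agency's agreed RPO may sit anywhere between the short replication interval and the 23-hour limit.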
Critical production systems running in a virtual environment have an option to replicate to Bloomington. This option significantly improves the Recovery Time Objective (RTO, downtime) of those critical application production systems.
IOT has devised a strategy to make DR testing part of any new system implementation or system upgrade plan, so that the recovery procedure can be documented before any new system goes live in production. This strategy applies only to systems designated as Critical in the IOT DR plan.
IOT offers Two Phase DR Testing options. IOT recommends that agencies perform yearly DR tests to keep the recovery documentation maintained by IOT up to date.
Necessary recovery of Windows and UNIX systems (Classification/Category/Designation/Service)
The Necessary DR classification has been phased out starting in FY16; only the existing servers currently covered by this plan are grandfathered. We expect these existing systems to be promoted before the end of FY16 either to the Critical classification in IOT's DR plan or to a new MHA service offering that IOT has been piloting for FY16. The reasons behind this decision are as follows:
- The technology that supports the Necessary classification has already reached its end of life.
- There are challenges with testing these systems for recoverability.
- The only reason the existing servers are grandfathered is that they run the Windows Server 2003 operating system.
IOT has installed a second mainframe computer in the Bloomington recovery facility, and all IOT mainframe systems should be recoverable within 6 hours. There is no separate DR charge for mainframe systems. DR costs are built into current mainframe rates. IOT completed DR testing on the mainframe and documented the recovery procedure.
File and Print recovery
File services provided by IOT include home and shared drives typically used to store Word and Excel documents. IOT has completed migration of all shared file servers to NAS (Network Attached Storage) technology, and they are replicated to Bloomington asynchronously. IOT File Services will be recovered within the Critical (6 hours) timeframe.
Print Services provided by IOT are now part of the pilot testing on the new MHA service offering.
The cost of these capabilities provided by IOT is built into SEAT costs with no additional DR fee applicable.
IOT Shared Citrix, Client VPN, and Site to Site VPN
For Citrix, only the agency systems that bought into Critical Classification/Category/Designation/Service have DR recovery plans for their published application in Bloomington. Agencies should work on their DR plan with IOT if they want their published application recovered in Bloomington during a DR event. Agencies also have to plan with IOT the number of Citrix accounts required to access their published application.
Client VPN is already DR-ready and is included in the agency's current charges for its current active users. Agencies should proactively plan with IOT if they would like to include this as a connectivity option in their COOP.
Site to Site VPN for vendor connectivity to agency systems, or vice versa, already has a DR presence. It is the agency's responsibility to plan its third-party vendor site-to-site connectivity to Bloomington if the agency bought into Critical Classification/Category/Designation/Service.
E-mail is supported on Critical Classification/Category/Designation/Service
The Disaster Recovery fee applies to every individual physical or virtual server that is dedicated to the agency (agency-procured server hardware or a virtual environment used only for the agency's application or system), supports a production application environment or system hosted in the IOT Data Center, and is designated in the IOT DR plan as Critical (6-hour Recovery Time Objective, downtime). This fee covers the costs incurred by IOT for the dedicated resource, the facilities charges and all network connectivity.
Critical systems might incur additional Server hosting charges, additional SAN storage charges for the data replication and additional Site-to-Site VPN charges for vendor connectivity for the duplicated systems in Bloomington.
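The fee rule above amounts to a conjunction of four conditions. A minimal sketch follows, assuming a hypothetical `Server` record; the field names and the `dr_fee_applies` function are illustrative and do not come from the IOT Service Catalog. No rates are computed, since the text does not state them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical inventory record (field names are illustrative)."""
    name: str
    dedicated_to_agency: bool        # agency-procured hardware or dedicated virtual environment
    production: bool                 # supports a production application or system
    hosted_in_iot_data_center: bool
    dr_designation: Optional[str]    # "Critical", or None if not in the DR plan


def dr_fee_applies(server):
    """Return True when all four conditions described in the text hold."""
    return (
        server.dedicated_to_agency
        and server.production
        and server.hosted_in_iot_data_center
        and server.dr_designation == "Critical"
    )


# Example inventory: only the first server satisfies every condition.
inventory = [
    Server("app-prod-01", True, True, True, "Critical"),
    Server("app-test-01", True, False, True, "Critical"),
    Server("shared-db-01", False, True, True, "Critical"),
    Server("app-prod-02", True, True, True, None),
]
fee_servers = [s.name for s in inventory if dr_fee_applies(s)]
```

Additional charges such as server hosting, replicated SAN storage and Site-to-Site VPN are separate from this check and would need to be looked up in the Services and Rates Table.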
IOT will periodically review the DR costing structure and may modify the costing scheme to distribute charges more accurately based on changing infrastructure and support costs (refer to the IOT Service Catalog and the Services and Rates Table for details).
- Agencies must categorize systems based on the impact a loss of system availability has on their business.
- Agency is responsible for initiating the DR Plan needs with IOT. This includes the following aspects as they pertain to its DR needs: Design, Planning, Implementation, Testing and Acceptance criteria.
- Agency is responsible for all DR Fee, Server Hosting, Replicated Storage, Site to Site VPN, and one-time hardware procurement charges that apply to the classification/category/designation/service it bought into for its dedicated system recovery in Bloomington.
- Agencies must also determine how frequently their systems need to be tested and plan and coordinate testing details with IOT.
- Agency is also responsible for communicating any significant upgrades to its system so that DR Testing is repeated and the DR documentation from the previous DR test is brought up to date.
- Agencies are responsible for executing their own COOP under the guidance of IDHS if their workplace is also affected by the disaster event, including determining where staff will be located and how they will access the systems that IOT restores in Bloomington.
- Agency must prioritize its applications in its COOP and work with IOT to establish a DR plan for recovery.
IOT is pleased to offer a competitive, cost-effective disaster recovery solution for all of state government, provided that the agencies have a DR plan with IOT.
- Electrical storms
- Freezing Conditions
- Contamination and Environmental Hazards
Organized and / or Deliberate Disruption
- Act of terrorism
- Act of sabotage
- Act of war
Loss of Utilities and Services
- Electrical power failure
- Loss of gas supply
- Loss of water supply
- Petroleum and oil shortage
- Communications services breakdown
- Loss of drainage / waste removal
Equipment or System Failure
- Internal power failure
- Air conditioning failure
- Production line failure
- Cooling plant failure
- Equipment failure (excluding IT hardware)
Serious Information Security Incidents
- Cyber crime
- Loss of records or data
- IT system failure
Other Emergency Situations
- Workplace violence
- Health and Safety Regulation
Does the state plan protect my agency from all disaster situations?
No. Though it does offer protection from the vast majority of scenarios, there are a limited number of disasters that could affect both the primary and secondary data centers. Most notable among these is an earthquake. Earthquakes are rare in Indiana and damaging ones even more so.
Disasters are not common in Indiana. Why should my agency participate?
Indiana is fortunate that it does not face some of the environmental threats other states do. However, agencies need look no further than recent damage to the Regions bank building in Indianapolis to understand that we are at risk. Similar damage to the state’s data center would have resulted in extended downtime. DR capabilities are now available and, given their affordable cost, should be carefully considered.
What if I don’t sign up for coverage?
Your system will be recovered on a best-effort basis, which is expected to take at least 45 days and most likely longer. Preparation and planning are the only way to successfully handle disaster scenarios. Facilities, infrastructure and testing must be in place to recover in a timely manner.
Can state agencies split production between the primary and secondary data centers to cut costs and increase protection?
IOT is piloting a new Multisite High Availability Service Offering that would provide split production between the primary and secondary data centers. This offering is still in its infancy, so agencies may open a HelpDesk ticket for the Disaster Recovery Queue to inquire about details and the expected timeline of its availability.
How does an agency contact IOT to work on a DR plan for its systems or to ask questions related to DR?
The agency should contact the HelpDesk to open a ticket and request that it be assigned to the Disaster Recovery Queue, stating its questions or its interest in an IOT DR plan.
Which IOT Shared Services, other than Exchange, File Services, Print Services, etc., have DR plans?
Shared FTP, Shared SharePoint, Shared Proxy, Shared SQL (not for all systems, so check with your SQL Team), Shared Oracle (not for all systems, so check with your Oracle Team), Shared SQL Reporting Services, and Shared Oracle Application Server.
Does IOT have a Disaster Declaration and Communication plan documented as part of the DR plan for the agencies?
Yes, refer to Disaster Declaration and Communication.
Glossary of Terms
IOT – Indiana Office of Technology
IDHS – Indiana Department of Homeland Security
IU – Indiana University
SLA – Service Level Agreement
DR – Disaster Recovery
DRP – Disaster Recovery Plan
MHA – Multisite High Availability
COOP – Continuity Of Operation Plan
MTPOD – Maximum Tolerable Period of Disruption
RTO – Recovery Time Objective (Downtime)
RPO – Recovery Point Objective (Data loss)
WAN – Wide Area Network
LAN – Local Area Network
SAN – Storage Area Network
NAS – Network Attached Storage
VPN – Virtual Private Network
FAQ – Frequently Asked Questions