- Overview of the State of Indiana Disaster Recovery Plan
- Disaster Recovery Definition
- Agency Responsibilities
- Service Availability
- Disaster Event Possibilities
- State's Disaster Recovery Network
- State DR FAQ
- Glossary of Terms
Disaster Recovery Definition
Disaster recovery is the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-induced disaster. Because technology systems are complex, detailed planning and testing are required to ensure that they can be recovered after a disaster. Disaster Recovery Planning (DRP) is a component of an agency’s Continuity Of Operation Plan (COOP); COOP planning is overseen by the Indiana Department of Homeland Security (IDHS). Disaster Recovery Planning is managed by the Indiana Office of Technology (IOT), which addresses the recovery of servers and applications housed in the primary IOT Data Center.
There are many potential disruptive events, and the impact and probability of each must be assessed to provide a sound basis for planning. If the assessment determines that a disruptive event does not require disaster recovery, the normal SLA process is invoked, depending on the nature of the failure (for example, network, hardware, or application). To assist with this assessment, a list of potential events has been compiled (see Disaster Event Possibilities for Indiana Government).
The IOT DR Location
IOT has contracted with an educational institution to use its state-of-the-art primary data processing facility as IOT's secondary data center and recovery site. The distance between IOT’s primary data center and the secondary data center ensures continued state operations during nearly all predictable disasters. Efficiencies in the secondary data center also contribute to IOT’s ability to offer various DR service levels at favorable costs to agencies.
Robust, redundant Wide Area Network (WAN) connectivity between Indianapolis and Bloomington allows short recovery times for systems enrolled in the IOT DR Plan. Leveraging the state’s existing investment in networking infrastructure further drives down costs for state agencies. The state also has Local Area Network (LAN) connectivity in place at the DR location, and these costs are included in the DR Fee.
IOT Core Infrastructure
IOT has identified the core infrastructure systems that support all agency systems on the IOT DR plan and has documented their RTO/RPO timelines in ISI Archer.
Consulting and coordination
IOT has dedicated resources to coordinate facility needs, network connectivity, redundant architecture for core infrastructure systems, and technical details to ensure that disaster recovery can occur within defined timeframes. Agencies enrolling their systems in an IOT Classification/Category/Designation/Service will work with IOT to ensure the necessary components are in place, develop a recovery plan, complete DR testing, and document recovery procedures.
Critical recovery for Windows/UNIX/AIX/LINUX (Classification/Category/Designation/Service)
Systems designated as Critical can choose between two options, depending on the business needs of the agency system owner:
- DR Premium: Recovery Time Objective (RTO, downtime) of 3 hours and Recovery Point Objective (RPO, data loss) of 0 seconds to 1 hour.
- DR Traditional: RTO of 6 hours and RPO of 60 seconds to 1 hour.
Restoring services within these timeframes requires agencies to purchase appropriate processing capacity and have IOT install it and keep it operationally ready. In addition, data must be replicated from the primary production environment to the disaster recovery environment using SAN replication technology.
Critical production systems running in a virtual environment have the option to replicate to Bloomington, which significantly shortens the Recovery Time Objective (RTO, downtime) of those production systems.
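As an illustration only, the two Critical options can be treated as a small lookup table against which a measured recovery (for example, from a DR test) is checked. This is a minimal sketch: the RTO and RPO values come from this plan, but the names `DrOption`, `OPTIONS` and `meets_objectives` are hypothetical, and it assumes that only the upper end of each RPO range (1 hour) is the pass/fail limit.

```python
from dataclasses import dataclass
from datetime import timedelta


# Hypothetical model of the Critical DR options; only the RTO/RPO values come from the plan.
@dataclass(frozen=True)
class DrOption:
    name: str
    rto: timedelta      # maximum acceptable downtime
    rpo_max: timedelta  # assumed pass/fail limit: upper end of the stated RPO range


OPTIONS = {
    "DR Premium": DrOption("DR Premium", rto=timedelta(hours=3), rpo_max=timedelta(hours=1)),
    "DR Traditional": DrOption("DR Traditional", rto=timedelta(hours=6), rpo_max=timedelta(hours=1)),
}


def meets_objectives(option_name: str, downtime: timedelta, data_loss: timedelta) -> bool:
    """Return True if a measured recovery stays within the option's RTO and RPO limits."""
    option = OPTIONS[option_name]
    return downtime <= option.rto and data_loss <= option.rpo_max
```

For example, a 4-hour recovery with 10 minutes of data loss would fail the DR Premium objectives but satisfy DR Traditional.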
IOT has devised a strategy to make DR testing part of every new system implementation or system upgrade plan, so that the recovery procedure can be documented before any new system goes live in production. This strategy applies only to systems designated as Critical in the IOT DR plan.
IOT offers DR testing options for both DR Premium and DR Traditional. IOT standards require all agencies to perform yearly DR tests to verify recoverability within the prescribed RTO and to keep recovery documentation current. IOT also offers a tabletop exercise as an alternative to DR testing; a tabletop exercise does not demonstrate recoverability within the prescribed RTO, but it helps keep the DR documentation current.
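The testing rules in this plan (a test before go-live, a yearly test, and a repeat test after significant upgrades) can be summarized as a simple scheduling check. This is a hypothetical sketch, not an IOT tool: the function `dr_test_due` is illustrative, and it approximates "yearly" as 365 days.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "yearly" is approximated as a fixed 365-day interval.
TEST_INTERVAL = timedelta(days=365)


def dr_test_due(last_test: Optional[date], today: date, upgraded_since_test: bool = False) -> bool:
    """Return True if a Critical system needs a DR test now (hypothetical helper)."""
    if last_test is None:
        return True  # never tested: a DR test is required before go-live
    if upgraded_since_test:
        return True  # significant upgrade: repeat the test to refresh DR documentation
    return today - last_test >= TEST_INTERVAL
```

Note that a tabletop exercise would not reset `last_test` under this model, since it does not demonstrate recoverability within the RTO.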
IOT has installed a second mainframe computer in the Bloomington recovery facility, and all IOT mainframe systems should be recoverable within 6 hours. There is no separate DR charge for mainframe systems; DR costs are built into current mainframe rates. IOT performs yearly DR tests on the mainframe and documents the recovery procedure.
File and Print recovery
File services provided by IOT include home and shared drives typically used to store Word and Excel documents. IOT File Services run on NAS (Network Attached Storage) technology and are replicated asynchronously to Bloomington. IOT File Services will be recovered within the Critical Recovery Time Objective of 6 hours and a Recovery Point Objective of 60 minutes.
Home directories are being migrated to OneDrive in the Microsoft cloud, which has a different SLA from the one offered on premises.
Print Services provided by IOT are covered under the Critical classification with DR Traditional RTO/RPO timeframes.
The cost of these capabilities is built into SEAT costs; no additional DR fee applies.
IOT Shared Citrix, Client VPN, and Site to Site VPN
For Citrix, only agency systems enrolled in the Critical Classification/Category/Designation/Service have DR recovery plans for their published applications in Bloomington. Agencies that want their published applications recovered in Bloomington during a DR event should develop a DR plan with IOT. Agencies must also plan with IOT for the number of Citrix accounts required to access their published applications.
Client VPN is already DR-ready, and its cost is included in agencies' current charges for active users. Agencies should plan proactively with IOT if they would like to include Client VPN as a connectivity option in their COOP plan.
Work is under way to give Site to Site VPN for vendor connectivity to agency systems a DR presence. IOT expects to establish these DR capabilities by the end of calendar year 2020.
As of March 2018, all user mailboxes have been migrated to Exchange Online, which is covered by the Azure Government Cloud Office 365 SLA terms negotiated with Microsoft. On-premises e-mail for shared mailboxes is supported under the Critical Classification/Category/Designation/Service.
The Disaster Recovery fee applies to every physical or virtual server dedicated to an agency (agency-procured server hardware, or a virtual environment used only for that agency's application or system) that supports a production application or system hosted in the IOT Data Center and designated on the IOT DR plan as Critical – DR Premium (3-hour Recovery Time Objective) or Critical – DR Traditional (6-hour Recovery Time Objective). The fee covers costs incurred by IOT for the dedicated resource providing project management support, facilities charges, and all network connectivity at the secondary data center.
Critical systems may incur additional server hosting charges and additional SAN storage charges for data replication for dedicated physical hardware or VMware servers hosted at the secondary data center. VMware hosting charges and additional resource charges apply to replicated VMware servers, depending on whether the DR Premium or DR Traditional option is chosen.
IOT will periodically review the DR cost structure and may modify it to allocate charges more accurately as infrastructure and support costs change.
Refer to the IOT Service Catalog and the Services and Rates Table for details.
Agency Responsibilities
- Agencies must categorize systems based on the impact that a loss of system availability would have on their business.
- Agencies are responsible for initiating their DR planning with IOT, covering design, planning, implementation, testing, and acceptance criteria for their DR needs.
- Complete the Archer profile in ISI for the critical system or application, including Business Continuity and Disaster Recovery requirements for Recovery Time Objective (RTO, downtime) and Recovery Point Objective (RPO, data loss).
- Submit a project request through the Project Success Center.
- Open a vFire ticket to the Disaster Recovery queue with an inquiry.
- Agencies are responsible for all DR fees and all server hosting, replicated storage, Site to Site VPN, and one-time hardware procurement charges applicable to the Classification/Category/Designation/Service they enroll in for recovery of their dedicated systems in Bloomington.
- Agencies must also determine how frequently their systems need to be tested, and plan and coordinate testing details with IOT.
- DR testing is required before the system officially goes live in production, and agencies should plan for a DR test every year thereafter.
- Agencies are also responsible for communicating any significant upgrades to their systems so that DR testing can be repeated and the DR documentation from the previous test updated.
- Agencies are responsible for executing their own COOP under the guidance of IDHS in case their workplace is also affected by the disaster event, including determining where staff will be located and how they will access the systems that IOT restores in Bloomington.
- Agencies must prioritize their applications in their COOP plan and work with IOT to establish a DR plan for their recovery.
- Collaborate with IOT Project Success Center on projects that have DR requirements.
- Collaborate with IOT Operational Teams to document the standards and guidelines supporting DR.
- Collaborate and partner with the agency to initiate the planning process, setting high-level expectations for costs, options, scope, and exceptions to be supported on DR.
- Produce a system architecture drawing showing intra-agency and inter-agency interfaces and IOT-supported infrastructure systems.
- Plan and execute DR Testing.
IOT is pleased to offer a competitive, cost-effective disaster recovery solution for all of state government, available to agencies that have a DR plan with IOT.
Disaster Event Possibilities for Indiana Government
- Electrical storms
- Freezing Conditions
- Contamination and Environmental Hazards
Organized and/or Deliberate Disruption
- Act of terrorism
- Act of sabotage
- Act of war
Loss of Utilities and Services
- Electrical power failure
- Loss of gas supply
- Loss of water supply
- Petroleum and oil shortage
- Communications services breakdown
- Loss of drainage / waste removal
Equipment or System Failure
- Internal power failure
- Air conditioning failure
- Production line failure
- Cooling plant failure
- Equipment failure (excluding IT hardware)
Serious Information Security Incidents
- Cyber crime
- Loss of records or data
- IT system failure
Other Emergency Situations
- Workplace violence
- Health and Safety Regulation
State DR FAQ
Does the state plan protect my agency from all disaster situations?
No. Although it protects against the vast majority of scenarios, a limited number of disasters could affect both the primary and secondary data centers, most notably an earthquake. Earthquakes are rare in Indiana, and damaging ones are rarer still.
Disasters are not common in Indiana. Why should my agency participate?
Indiana is fortunate that it does not face some of the environmental threats other states do. However, agencies need look no further than the recent damage to the Regions bank building in Indianapolis to understand that we are at risk. Similar damage to the state’s data center would have resulted in extended downtime. DR capabilities are now available at affordable cost and should be carefully considered.
What if I don’t sign up for coverage?
Your system will be recovered on a best-effort basis, which will take at least 45 days and most likely longer. Preparation and planning are the only way to handle disaster scenarios successfully. Facilities, infrastructure and testing must be in place to recover in a timely manner.
Can state agencies split production between the primary and secondary data centers to cut costs and increase protection?
IOT ran a four-year pilot of split production between the primary and secondary data centers and found that the geographic distance caused performance degradation for systems with heavy transaction demands. In response, IOT created a new offering under the DR Premium Classification/Category/Designation, which standardizes high availability operations within the primary data center and uses VMware replication to provide DR capabilities with improved RTO/RPO timeframes.
How does an agency contact IOT to work on a DR plan for its systems or to ask questions related to DR?
The agency contacts the HelpDesk to open a ticket and asks that it be assigned to the Disaster Recovery queue, stating its questions or its interest in an IOT DR plan.
Does IOT support Disaster Recovery for Cloud supported systems?
In alignment with "hybrid everything," IOT is currently identifying core dependencies such as internet gateways at all supported data centers, Site to Site VPN, and identity management for state and public users.
IOT strongly recommends that agencies follow the guidelines provided under Agency Responsibilities so that IOT and the agency can collaborate to establish the boundaries of support and recoverability expectations, since SLAs may vary with the contract negotiated with the cloud vendor.
Which IOT shared services, other than Exchange, File Services, and Print Services, have DR recovery plans?
Shared Proxy, Shared SQL (not for all systems; check with your SQL team), Shared Oracle (not for all systems; check with your Oracle team), Shared SQL Reporting Services, Shared RightFax, Shared Cisco Call Center Services, and Shared Oracle/OBIEE/Web Content Center Application Server.
Does IOT have a Disaster Declaration and Communication plan documented as part of the DR plan for the agencies?
Yes. Contact Enterprise Resiliency Services Director Emily Larimer for more information.
Glossary of Terms
IOT – Indiana Office of Technology
IDHS – Indiana Department of Homeland Security
IU – Indiana University
SLA – Service Level Agreement
DR – Disaster Recovery
DRP – Disaster Recovery Plan
MHA – Multisite High Availability
COOP – Continuity Of Operation Plan
MTPOD – Maximum Tolerable Period of Disruption
RTO – Recovery Time Objective (Downtime)
RPO – Recovery Point Objective (Data loss)
WAN – Wide Area Network
LAN – Local Area Network
SAN – Storage Area Network
NAS – Network Attached Storage
VPN – Virtual Private Network
FAQ – Frequently Asked Questions