SCADA Redundancy: How to Build a High-Availability Industrial Monitoring System
Your SCADA server just went down at 2 AM. The maintenance team is calling. The plant manager wants to know why nobody saw the temperature alarm on reactor vessel 3. The answer is simple: the monitoring system was offline, so the alarm never fired. The reactor tripped on its local safety controller, production stopped, and now you are explaining to management why the system they paid for did not do its job.
This scenario plays out in industrial plants more often than anyone wants to admit. SCADA systems are treated as infrastructure until they fail, at which point everyone realizes they are safety-critical. Redundancy is the engineering practice that prevents this failure mode. It is not glamorous. It is not optional. And it does not have to cost what AVEVA and Ignition charge for their high-availability modules.
This article covers the four redundancy patterns used in industrial monitoring, what each one costs, when each one is appropriate, and how to implement them without enterprise-level budgets.
Why Redundancy Matters in Industrial Monitoring
Industrial monitoring systems serve three functions that cannot tolerate downtime.
Safety. SCADA systems aggregate alarms from PLCs, RTUs, and sensors across a facility. When the SCADA server goes down, the local controllers still protect individual equipment. But the centralized alarm management, the escalation to operators, and the historical record of safety events all disappear. If an operator is relying on the SCADA dashboard to monitor a process that could produce a hazardous condition, a server failure is a safety gap.
Compliance. Environmental monitoring, emissions tracking, and regulatory reporting all depend on continuous data collection. In many jurisdictions, a gap in emissions data triggers a mandatory incident report. In pharmaceutical manufacturing under FDA 21 CFR Part 11, data continuity is not optional. It is a legal requirement. A SCADA outage that creates a data gap can result in regulatory action, fines, or product quarantine.
Production. Every minute of unplanned downtime has a cost. In continuous process industries like chemicals, refining, and food processing, a shutdown and restart cycle can take hours and cost tens of thousands of dollars in lost product, energy waste, and equipment stress. The SCADA system that tells operators when something is drifting out of spec before it triggers an automatic shutdown is the difference between a minor adjustment and a full production stop.
The Two Types of Downtime You Are Protecting Against
Not all downtime is equal. Understanding the difference determines how much you should spend on redundancy.
Planned downtime includes software updates, configuration changes, hardware upgrades, and scheduled maintenance. Planned downtime is announced in advance, coordinated with operations, and managed through work permits. Most plants schedule planned downtime windows during low-production periods. Redundancy for planned downtime means you can take one server offline, do the maintenance, and bring it back without interrupting monitoring. Warm standby handles this adequately.
Unplanned downtime includes hardware failures, power outages, network interruptions, software crashes, and the contractor-who-tripped-over-the-Ethernet-cable scenario. Unplanned downtime cannot be scheduled. It happens at the worst possible time. Redundancy for unplanned downtime requires automatic failover: the system detects the failure and switches to the backup before anyone notices. Hot standby or active-active is required here.
The question is not whether you need redundancy. The question is what kind of downtime you are protecting against and what recovery time your operations can tolerate.
The Four Redundancy Levels
There are four established patterns for SCADA redundancy, each with different recovery characteristics, costs, and complexity. Understanding these is the foundation of every high-availability design.
Cold Standby: The Insurance Policy
Cold standby means you have a backup server sitting turned off (or running but not connected to the production network). When the primary fails, someone manually starts the backup, connects it to the network, and points the clients at the new server.
How it works: The backup server has the same software installed and a recent copy of the configuration. Data is not synchronized in real time. When you need it, you power it on, apply any configuration changes made since the last backup, and bring it online.
Recovery time: One to four hours, depending on how quickly someone can physically access the server and how recent the configuration backup is.
Cost: The cheapest redundancy option. You buy a second server and keep it on a shelf. If the primary is a $500 industrial PC, the cold standby costs $500. No additional software licensing in most cases. No network infrastructure changes.
When it makes sense: Facilities where monitoring is important but not safety-critical. Small installations with one operator who can tolerate a few hours of blind operation. Budget-constrained projects where some redundancy is better than none.
When it does not make sense: Continuous process plants, facilities with regulatory data-collection requirements, or any installation where a multi-hour monitoring gap creates safety or financial risk.
Warm Standby: The Smart Middle Ground
Warm standby means the backup server is running and connected to the network, but it is not actively processing data. It periodically synchronizes configuration and historical data from the primary. When the primary fails, the warm standby takes over automatically or with minimal manual intervention.
How it works: The backup server runs the SCADA software in a passive mode. It pulls configuration updates and historical data at regular intervals, typically every few minutes. A heartbeat mechanism monitors the primary. If the primary stops responding, the backup promotes itself to active.
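The promotion logic described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `StandbyMonitor` class, the `max_missed` threshold of 3, and the sequence of heartbeat results are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Tracks heartbeat results and decides when the standby should promote."""

    max_missed: int = 3   # assumed threshold; tune per site
    missed: int = 0       # consecutive missed heartbeats so far
    active: bool = False  # False = passive standby, True = promoted

    def record(self, primary_alive: bool) -> bool:
        """Record one heartbeat result; return True once the standby is active."""
        if self.active:
            return True   # already promoted; failback is a separate decision
        if primary_alive:
            self.missed = 0   # any successful heartbeat resets the counter
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.active = True
        return self.active


monitor = StandbyMonitor(max_missed=3)
heartbeats = (True, False, False, True, False, False, False)
results = [monitor.record(alive) for alive in heartbeats]
print(results)
```

Requiring several consecutive misses before promoting is a deliberate choice: a single dropped heartbeat on a busy plant network should not trigger a failover.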
Recovery time: One to five minutes. The server is already running. The software is already loaded. The failover is mostly automatic.
Cost: Moderate. You need a second server running 24/7, which means additional power, cooling, and rack space. The software licensing may require a redundant server license, which some vendors charge extra for.
When it makes sense: Most industrial installations. This is the sweet spot for facilities that need continuous monitoring but cannot justify the cost and complexity of hot standby, such as a water treatment plant, a small manufacturing facility, or a building management system. The operators expect monitoring to be available at all times, and a five-minute gap during an automatic failover is acceptable.
When it does not make sense: Process environments where a five-second monitoring gap could result in a missed safety alarm. If you are monitoring a batch chemical reactor where excursions happen in seconds, warm standby is not fast enough.
Hot Standby: Near-Zero Downtime
Hot standby means the backup server is running, fully synchronized with the primary in near-real-time, and ready to take over almost instantly. The failover is automatic and typically completes within seconds. Operators may see a brief flicker on their dashboards but no data loss.
How it works: Both servers share a synchronized state. Every tag update, alarm event, and configuration change on the primary is replicated to the standby within milliseconds. A dedicated heartbeat channel monitors the primary. If it fails, the standby assumes the primary's IP address and resumes serving clients within seconds.
Recovery time: Seconds. In a well-tuned hot standby system, the failover is faster than the operator refresh interval on most dashboards. The operators may not even notice.
Cost: Significant. You need two identical servers, a dedicated synchronization network between them, and SCADA software that supports hot standby natively. AVEVA and Ignition both offer hot standby modules, but they are priced accordingly. The total cost is roughly 1.5x to 2x the cost of a single-server deployment.
When it makes sense: Critical infrastructure. Power generation. Large water and wastewater facilities. Chemical plants with continuous processes. Any facility where the cost of a monitoring gap exceeds the cost of the redundancy infrastructure.
When it does not make sense: Small installations where the redundancy infrastructure costs more than the entire monitoring project. A $50,000 hot standby deployment for a facility with 50 sensors and one operator is over-engineering.
Active-Active: No Single Point of Failure
Active-active means both servers are handling traffic simultaneously. There is no primary and backup. Both servers collect data, serve dashboards, and process alarms. If one fails, the other continues without interruption. There is no failover because both are always active.
How it works: A load balancer distributes client connections between both servers. Both servers connect to the field devices (or share communication gateways). Data is written to a shared database or replicated bidirectionally. When a server fails, the load balancer stops sending traffic to it, and the remaining server handles all clients.
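The routing behavior described above can be illustrated with a small Python sketch. It assumes a simple round-robin policy and hypothetical node and client names (`scada-a`, `hmi-1`, and so on); a real load balancer would also run its own health checks instead of receiving a `healthy` dictionary.

```python
from itertools import cycle


def route_clients(clients, servers, healthy):
    """Assign each client to the next healthy server in round-robin order."""
    available = [s for s in servers if healthy.get(s, False)]
    if not available:
        raise RuntimeError("no healthy SCADA server available")
    rotation = cycle(available)
    return {client: next(rotation) for client in clients}


servers = ["scada-a", "scada-b"]                 # hypothetical node names
clients = ["hmi-1", "hmi-2", "hmi-3", "hmi-4"]   # hypothetical operator stations

# Normal operation: clients are spread across both servers.
both_up = route_clients(clients, servers, {"scada-a": True, "scada-b": True})

# scada-a fails: every client lands on the surviving server.
one_down = route_clients(clients, servers, {"scada-a": False, "scada-b": True})

print(both_up)
print(one_down)
```

Note that the surviving server has to carry the full client load on its own, which is why each node in an active-active pair is usually sized for the whole system.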
Recovery time: Zero. There is no failover. The surviving server continues processing. Clients may need to reconnect, but modern web-based SCADA systems handle this transparently.
Cost: The most expensive option. Two full servers, a load balancer, shared storage or bidirectional replication, and the engineering effort to configure it all correctly. In enterprise SCADA deployments, active-active can double the project cost.
When it makes sense: Enterprise-grade installations where the SCADA system serves hundreds of users across multiple sites. Power grid control centers. Large refinery-wide monitoring. These are the installations where AVEVA System Platform and Ignition with full redundancy are the standard, and the six-figure price tag is justified by the scale.
When it does not make sense: Almost everywhere else. Active-active is the redundancy pattern for systems that cannot ever go down. Most industrial monitoring systems do not fall into this category, despite what vendors selling redundancy modules might suggest.
Redundancy Comparison: The Full Picture
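The four levels compare as follows:
- Cold standby: one to four hours of recovery time, manual failover, lowest cost.
- Warm standby: one to five minutes of recovery time, automatic or near-automatic failover, moderate cost.
- Hot standby: seconds of recovery time, automatic failover, roughly 1.5x to 2x the cost of a single-server deployment.
- Active-active: no failover gap because both servers are always active, highest cost.
Warm standby sits between cold standby and hot standby: minutes of recovery time, automatic failover, moderate cost. For most system integrators working on small to mid-size industrial projects, warm standby is the practical choice. It provides meaningful redundancy without the cost and complexity of hot standby or active-active.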
What You Actually Need to Replicate
Redundancy is not just about duplicating the SCADA server. A complete high-availability design addresses every component in the data path.
SCADA server. The primary application server running the monitoring software. This is what most people think of when they say "redundancy." Duplicate it with whichever standby pattern fits your requirements and budget.
Database. The historian that stores tag data, alarm events, and audit logs. If the database is on the same server as the SCADA application (common in small deployments), it gets replicated with the server. If it is a separate database server, it needs its own redundancy. PostgreSQL and TimescaleDB support streaming replication natively. SQLite, which many lightweight SCADA systems use, requires file-level replication.
Communication gateways. The gateways that translate between industrial protocols (Modbus, OPC-UA, MQTT) and the SCADA server. If the primary gateway fails, the SCADA server loses its connection to the field devices. Gateway redundancy means running two gateways on different network paths, with the SCADA server failing over between them. In multi-protocol plants with Siemens S7 and Allen-Bradley PLCs alongside Modbus devices, gateway redundancy is especially important since a single translation layer failure can blind the entire monitoring system.
Network paths. Two servers connected to the same switch are not redundant if the switch fails. True network redundancy means two independent paths from the SCADA servers to the field devices. Separate switches, separate cabling, ideally separate power feeds. In practice, many installations use a ring topology or dual-homed servers with connections to two different network segments.
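A rough way to see why every component in the data path matters is to multiply availabilities. The sketch below uses the standard series and parallel formulas and assumes failures are independent, which real plants only approximate. The 0.99 figures are purely illustrative, not measurements.

```python
from math import prod


def series(*availabilities):
    """Availability of a chain where every component must work."""
    return prod(availabilities)


def redundant_pair(availability):
    """Availability of two independent replicas where either one suffices."""
    return 1 - (1 - availability) ** 2


# Illustrative figures only; not taken from any real installation.
server, database, gateway, switch = 0.99, 0.99, 0.99, 0.99

single_everything = series(server, database, gateway, switch)
only_server_redundant = series(redundant_pair(server), database, gateway, switch)
all_redundant = series(
    *(redundant_pair(a) for a in (server, database, gateway, switch))
)

print(f"no redundancy:         {single_everything:.4f}")
print(f"server redundant only: {only_server_redundant:.4f}")
print(f"everything redundant:  {all_redundant:.4f}")
```

Under these assumptions, duplicating only the server barely moves the overall number, because the unduplicated database, gateway, and switch still dominate the chain.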
The Voltrus Approach: Redundancy Without Enterprise Complexity
Voltrus was designed as a single binary under 20 MB. That design decision was not arbitrary. A single binary with no external dependencies is trivially easy to replicate.
Here is what a Voltrus warm standby deployment looks like:
- Primary server: A Raspberry Pi 4 or an industrial PC running Voltrus, connected to Modbus TCP devices on the plant network.
- Backup server: A second Raspberry Pi or industrial PC on the same network, running Voltrus in standby mode.
- Configuration sync: Copy the Voltrus configuration file (a single YAML file) from primary to backup. This can be automated with rsync, scp, or any file synchronization tool.
- Data sync: Voltrus uses SQLite for its local database. Sync the database file periodically, or use Litestream for continuous replication to the backup.
- Failover: Use a simple heartbeat script. If the primary stops responding, the backup starts serving on the primary's IP address. This can be automated with keepalived or a custom systemd service.
Total cost for this setup: two Raspberry Pi 4s ($75 each), two SD cards, and about two hours of configuration time. No software licensing beyond the Voltrus license for each instance. No redundancy module to purchase. No vendor-specific failover protocol to learn.
The reason this works is that Voltrus has no moving parts. No JVM. No application server. No message broker. No container runtime. A single binary that starts in under one second and serves dashboards over HTTP. When the backup takes over, it starts serving immediately with the last synchronized data. The gap is measured in seconds of startup time, not minutes of application initialization.
For more critical installations where hot standby is required, the same pattern applies with faster synchronization intervals. Litestream can replicate SQLite changes in near-real-time to the backup server. A keepalived virtual IP ensures that failover happens at the network layer in under a second. The total cost is still two Raspberry Pis and some open-source tooling.
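For the periodic database sync, copying a live SQLite file with a plain file copy risks capturing it mid-write. One hedged alternative is to take a consistent snapshot with SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`, and then ship the snapshot with rsync or scp. The sketch below is not Voltrus tooling: the `snapshot_sqlite` helper, the file names, and the `tag_values` table are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def snapshot_sqlite(source: Path, destination: Path) -> None:
    """Copy a live SQLite database into a consistent snapshot file."""
    src = sqlite3.connect(source)
    dst = sqlite3.connect(destination)
    try:
        src.backup(dst)   # online backup; safe while the source is in use
    finally:
        dst.close()
        src.close()


# Demonstration against a throwaway database (hypothetical schema).
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live.db"
    copy = Path(tmp) / "snapshot.db"

    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE tag_values (tag TEXT, value REAL)")
    conn.execute("INSERT INTO tag_values VALUES ('reactor3.temp', 182.5)")
    conn.commit()

    snapshot_sqlite(live, copy)   # taken while the writer is still connected
    conn.close()

    check = sqlite3.connect(copy)
    rows = check.execute("SELECT tag, value FROM tag_values").fetchall()
    check.close()

print(rows)
```

A snapshot like this only captures the state at the moment it runs, so the sync interval sets the maximum data loss on failover. Continuous replication with Litestream narrows that window.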
Practical Pattern: Dual-Pi Redundancy Under $200 in Hardware
Here is a concrete deployment pattern that system integrators can replicate for small to mid-size monitoring projects.
Hardware:
- Primary: Raspberry Pi 4 (4 GB) in an industrial case with DIN rail mount
- Backup: Raspberry Pi 4 (4 GB) in the same configuration
- Both connected to the plant network via Ethernet
- Both powered from the same UPS as the network switch
Software:
- Both running Raspberry Pi OS Lite (64-bit)
- Both running Voltrus with the same configuration
- Primary: Voltrus running actively, collecting data from Modbus devices
- Backup: Voltrus running in standby, syncing data via Litestream
- keepalived managing a virtual IP that clients connect to
Failover behavior:
- keepalived monitors the primary via health checks every 2 seconds
- If the primary fails 3 consecutive checks, the backup takes over the virtual IP
- The backup's Voltrus instance starts actively polling Modbus devices
- Total failover time: approximately 6-8 seconds
- When the primary recovers, it can resume as primary or stay in standby (configurable)
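The failover time follows directly from the health-check settings. The arithmetic can be sketched as below; the `failover_time` helper is hypothetical, and the 0 to 2 second takeover allowance (moving the virtual IP and starting to poll) is an assumption chosen to match the 6-8 second estimate above.

```python
def failover_time(check_interval_s, failed_checks, takeover_s):
    """Seconds from primary failure until the backup is serving clients."""
    detection_s = check_interval_s * failed_checks   # time to declare failure
    return detection_s + takeover_s


# Settings from the pattern above: 2-second checks, 3 consecutive failures.
low = failover_time(2, 3, takeover_s=0)    # takeover assumed instantaneous
high = failover_time(2, 3, takeover_s=2)   # assumed 2 s to move the IP and poll

# A more aggressive, hypothetical tuning: 1-second checks, 2 failures.
aggressive = failover_time(1, 2, takeover_s=1)

print(f"default settings: {low}-{high} s, aggressive tuning: {aggressive} s")
```

Shorter intervals and fewer required failures cut detection time, but they also raise the odds of a false failover on a congested network.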
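Cost breakdown: approximately $1,178 in total. Two Voltrus Professional licenses at $499 each account for $998, and the hardware (two Raspberry Pi 4s at $75 each, plus SD cards and cases) makes up roughly the remaining $180.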
Compare this to an Ignition Edge redundant deployment: two Ignition Edge licenses at $1,850 each, two industrial PCs at $1,500 each, and the engineering time to configure Ignition's built-in redundancy module. Total: approximately $7,000. The Voltrus deployment achieves the same warm standby outcome at one-sixth the cost.
How to Choose the Right Redundancy Level
The decision is driven by one question: what is the cost of a monitoring gap?
If the cost of a monitoring gap is measured in inconvenience (a maintenance team that has to walk the plant floor instead of checking a screen), cold or warm standby is sufficient. The recovery time is acceptable because the consequences are limited.
If the cost of a monitoring gap is measured in regulatory exposure (missed emissions data, gaps in FDA-mandated records), warm standby is the minimum. Hot standby is preferable if the budget allows.
If the cost of a monitoring gap is measured in safety risk (a chemical process running without centralized monitoring, a power distribution system without real-time visibility), hot standby is required. Active-active is appropriate for the most critical installations.
Most system integrators working on small to mid-size projects should default to warm standby. It provides meaningful redundancy at moderate cost and complexity. The jump from warm to hot standby doubles the cost but only saves a few minutes of recovery time. Unless those minutes have a quantifiable safety or financial cost, warm standby is the pragmatic choice.
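The gap-cost rules in this section can be written down as a small lookup, which is handy when a facility falls into more than one category. This is only a sketch of the reasoning above: the `GapCost` enum and `minimum_level` function are hypothetical names, and the result is a minimum, not a full design recommendation.

```python
from enum import Enum


class GapCost(Enum):
    INCONVENIENCE = "inconvenience"
    REGULATORY = "regulatory"
    SAFETY = "safety"


# Ordered from least to most protective.
LEVELS = ["cold standby", "warm standby", "hot standby", "active-active"]

# Minimum level for each kind of gap cost, as stated in this section.
MINIMUM_LEVEL = {
    GapCost.INCONVENIENCE: "cold standby",
    GapCost.REGULATORY: "warm standby",
    GapCost.SAFETY: "hot standby",
}


def minimum_level(costs):
    """Return the strictest minimum level across a non-empty set of gap costs."""
    return max((MINIMUM_LEVEL[c] for c in costs), key=LEVELS.index)


print(minimum_level([GapCost.INCONVENIENCE]))
print(minimum_level([GapCost.INCONVENIENCE, GapCost.REGULATORY]))
print(minimum_level([GapCost.REGULATORY, GapCost.SAFETY]))
```

The function always returns the strictest requirement, so a plant with both regulatory and safety exposure is treated as a safety case.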
The Bottom Line
SCADA redundancy is not a luxury. It is an engineering requirement for any monitoring system that operators depend on. The four levels, cold standby, warm standby, hot standby, and active-active, offer a spectrum of recovery times and costs. The right choice depends on what a monitoring gap costs your facility.
For most small to mid-size industrial monitoring projects, warm standby is the practical choice. It provides automatic failover in minutes, at moderate cost, with manageable complexity. The Voltrus single-binary architecture makes warm standby deployment straightforward: two servers, synchronized configuration and data, and a heartbeat-based failover mechanism. No enterprise redundancy modules. No proprietary failover protocols. Just a clean, simple pattern that works at 2 AM when the primary server fails and the phone starts ringing.
The best redundancy design is the one you can afford, implement, and maintain. A dual-Pi warm standby deployment that costs $1,200 and actually gets installed is infinitely better than a $50,000 hot standby design that stays on the whiteboard because the budget never gets approved.
Build Redundant Monitoring Without Enterprise Pricing
Voltrus: a single-binary SCADA that runs on Raspberry Pi. Deploy two instances, sync the data, and get warm standby redundancy for under $1,200 total. No redundancy modules. No per-server licensing tricks.
Frequently Asked Questions
What is the best redundancy level for small to mid-size SCADA deployments?
Warm standby is the practical choice for most small to mid-size industrial monitoring projects. It provides automatic failover in one to five minutes, at moderate cost, with manageable complexity. A dual-Raspberry Pi warm standby deployment with Voltrus costs around $1,178 total for the complete redundant system, compared to approximately $7,000 for an equivalent Ignition Edge redundant setup.
What is the difference between hot standby and warm standby in SCADA?
Hot standby keeps the backup server fully synchronized with the primary in near-real-time, with automatic failover completing in seconds. Warm standby keeps the backup running and connected but only periodically synchronizes data, with failover taking one to five minutes. Hot standby costs roughly 1.5x to 2x a single-server deployment and requires a dedicated synchronization network. Warm standby is significantly cheaper and simpler while still providing meaningful redundancy.
Can you run redundant SCADA on Raspberry Pi devices?
Yes. A dual-Raspberry Pi 4 warm standby deployment is a proven, cost-effective approach for monitoring 20-150 sensors at a single site. Use two Pi 4s (4 GB model) with Voltrus in a primary/standby configuration, sync data via Litestream, and manage failover with keepalived. Total failover time is approximately 6-8 seconds. The entire redundant system costs under $1,200 including software licensing.
How much does SCADA redundancy cost?
Costs range from $500 for a cold standby server on a shelf to $50,000+ for an active-active enterprise deployment. A practical warm standby setup using Voltrus on two Raspberry Pi 4s costs approximately $1,178 total (hardware plus two Voltrus Professional licenses at $499 each). An equivalent Ignition Edge redundant deployment runs approximately $7,000. Enterprise hot standby from AVEVA or Ignition can cost $15,000-$50,000+.
What components need to be replicated for SCADA high availability?
A complete high-availability design must address every component in the data path: the SCADA server (application), the database (historian), communication gateways (protocol translators), and network paths (independent cables and switches). A redundant SCADA server with a single database, single network connection, and single power supply has not actually solved the availability problem. The chain is only as strong as its weakest link.