|
<< Click to Display Table of Contents >> Navigation: Installation & Administration > 2.5: Data Corruption |
The iCare AMS database is very stable and AMS itself includes several software layers to prevent data corruption. In more than 10 years of operation, none of our customer database was corrupted. Nonetheless, there are some precautions to be taken if you are doing the administration of your own AMS Server (non hosted solution):
1/ Make sure the computer on which AMS Server is installed is protected against power cuts (including UPS and redundant power supplies)
2/ Use RAID controllers with large cash memory, high quality (Server quality) hard drives and RAM,
3/ Ensure your server has sufficient memory to handle your client computer's sessions
4/ Use only approved Firebird versions
The following is a discussion regarding the main causes of data corruption and how to avoid it
Main causes of database corruption
Unfortunately no database can be considered as an absolute warranty that your data are secured and cannot be corrupted. In this chapter we will consider the principal causes that lead to database corruption, some methods of repairing databases and extracting information from them. We will also make recommendations and offer precautions that will minimize the probability of information loss.
First of all, if we speak about database repair we should perhaps first define "database corruption". A database is usually described as damaged if, when trying to extract or modify some information, errors appear and/or the information to be extracted turns out to be lost, incomplete or incorrect. There are cases when database corruption is hidden and can only be found by testing with special facilities. However there are also real database corruptions, when it is impossible to connect to the database, when adjusted programs send strange errors to the clients (without any data manipulation having occurred), or when it is impossible to restore the database from a backup copy.
Principal causes of database corruption are:
1.Abnormal termination of the server computer, especially an electrical power interruption. For the IT-industry it can be a real blow and that is why we hope there is no need to remind you once again about the necessity of having a source of uninterrupted power supply on your server.
2.Defects and faults on the server computer, especially the HDD (hard disk drive), disk controllers, the computer's main memory and the cache memory of Raid controllers.
3.An incorrect connection string to a multi-client database with one or more users (in versions prior to 6.x). When connecting via TCP/IP, the path to the database must be pointed to a server name: drive:/path/databasename/
Even when connecting to a database from the computer, on which the database is located and where the server is running, the same specification should be used, renaming the servername as localhost. It is not possible to use mapped drives in the connection specification. If you break one of these rules, the server thinks that it is working with different databases and database corruption is guaranteed.
1.File copy or other file access to the database when the server is running. The execution of the command shutdown, or disconnecting the users in the usual way is not a guarantee that the server is doing nothing with the database. If the sweep interval is not set to 0, garbage collection may be being executed. Generally the garbage collection is executed immediately after the last user disconnects from the database. Usually it takes several seconds, but if many DELETE or UPDATE operations were committed before it, the process may take longer.
2.Using Firebird version not approved for use with AMS. In some case these non-approved versions may work perfectly but some versions contains known errors. Use only approved versions.
3.Exceeding size restriction of a database file. At the time of writing this, for most existing UNIX platform servers the limit is 64 TB but it is recommended to keep it as small as possible. When the database size is approaching its limit (2 times the size of the freshly restored database), a database cleanup will be required.
4.Exhaustion of free disk space when working with the database.
Power supply failure
When shutting off the power on the server, all data processing activities are interrupted in the most unexpected and (according to Murphy's law) dangerous places. As a result the information in the database may be distorted or lost. The simplest case is when all uncommitted the data from a client’s applications are lost as a result of an emergency server shutdown. After a power-cut restart the server. This analyzes data, makes a note of incomplete transactions related to none of the clients, and cancels all modifications made within the bounds of these «dead» transactions. Actually such behavior is normal and assumed from the start by developers.
However power supply interruption is not always followed just by such insignificant losses. If the server was executing a database extension at the moment of power supply interruption, there is a large probability of orphan pages present in the [[Database structure
| database file]] (pages that are physically allocated and registered on the page inventory page (PIP), upon which it is however impossible to write data).
Only GFIX, the repair and modification tool, is able to combat orphan pages in the database file. Actually orphan pages lead to unnecessary use of disk space and, as such, are not the cause of data loss or corruption. Power loss leads to more serious damages. For example, after shutting off the power and restarting, a great amount of data, including committed data, may be lost (after adding or modification of which the command «commit transaction» was executed). This happens because confirmed data is not written immediately to the database file on disk. The file cache of the operating system (OS) is used for this purpose. The server process gives the data write command to the OS. Then the OS assures the server that all the data has been saved to disk although in reality the data is initially stored in the file cache. The OS doesn't hurry to save this data to disk, because it assumes that there is a lot of main memory left, and therefore delays the slow operation of writing to disk until the main memory is full. Please refer to the next subject - Forced Writes - cuts both ways - for further information.
Forced writes - cuts both ways
In order to influence this situation, tuning of the data write mode is provided in Firebird. This parameter is called FORCED WRITES(FW) and has 2 modes - ON (synchronous) and OFF (asynchronous). FW modes define how Firebird communicates with the disk. If FW is turned on, the setting of synchronous writes to disk is switched on, and confirmed data is written to disk immediately following the COMMIT command, the server waits for writing completion and only then continues processing. If FW is switched off InterBase® doesn't hurry to write data to disk after a transaction is committed, and delegates this task to a parallel thread, while the main thread continues data processing, not waiting until all writes are written to disk.
Synchronous writes mode is one of the most careful options and it minimizes any possible data loss. However it may cause some loss of performance. Asynchronous writes mode increases the probability of loss of a great quantity of data. In order to achieve maximum performance FW OFF mode is usually set. But as a result of power interruption a much higher quantity of data is lost using the asynchronous writes mode than when using the synchronous mode. When setting the write mode you should decide whether a few percentage points of performance are more significant than a few hours of work should power be interrupted unexpectedly.
Very often users are careless with Firebird. Small organizations save on any trifle, often on the computer server, where the DBMS server and different server programs (not only server) are installed and running as well. If they hang-up people don't think for long, and simply press RESET (it happens several times a day). Although Firebird is very stable with regard to such activities compared with other DBMS, and allows work with the database to start immediately after an emergency reboot, such a procedure is not recommended. The number of orphan pages increases and data lose connections among themselves as a result of faulty reboots. It may still function and continue for a long time, but sooner or later it will come to an end. When damaged pages appear among PIP or generator pages, or if the database header page is corrupted, the database may never open again and become a big chunk of separate data from which it is impossible to extract a single byte of useful information.
Corruption of the hard disk
Hard disk corruptions lead to the loss of important database system pages and/or the corruption of links among the remaining pages. Such corruptions are one of the most difficult cases, because they almost always require low-level interference to restore the database.