rmlocks, wsem, monlock - Removing Stale Locks

SYNOPSIS

rmlocks database
wsem database
monlock database

DESCRIPTION
Under certain circumstances a Texis server may die and leave locks on a database. How severe this is depends on exactly where the error occurred. In the worst case a lock is left on the database, and no other server will be able to access the database. In the best case no locks are left behind, but an operating system semaphore may still exist without having been destroyed. Any other server using the database will reuse this semaphore, although its continued existence can consume some system resources. If a process is having problems getting a lock, it will do as much cleanup as it can. This includes removing locks left behind by dead processes and trying to free operating system resources.
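Texis's internal lock format is not described here. Purely as an analogy for "cleaning up locks left around by dead processes", the following C sketch uses a hypothetical PID lock file: if the file already exists but the process recorded in it no longer exists, the lock is treated as stale, removed, and acquisition is retried once. The lock-file scheme and the names `acquire_pid_lock` and `release_pid_lock` are illustrative assumptions, not part of Texis.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical illustration only: a lock file containing the holder's PID.
 * This is NOT the Texis lock format. */

/* Returns 1 if a process with this PID exists, 0 otherwise. */
static int pid_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;               /* 0 and negatives would address process groups */
    if (kill(pid, 0) == 0)
        return 1;               /* exists and we may signal it */
    return errno == EPERM;      /* exists but belongs to another user */
}

/* Create the lock file exclusively. If it exists but its recorded holder is
 * dead, remove it as stale and retry once. Returns 0 on success, -1 otherwise.
 * Simplified: the check-then-unlink step is not safe against concurrent lockers. */
static int acquire_pid_lock(const char *path)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            char buf[32];
            int len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
            int ok = write(fd, buf, (size_t)len) == len;
            close(fd);
            if (!ok) {
                unlink(path);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The lock exists: read the PID of the process that holds it. */
        long holder = 0;
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%ld", &holder) != 1)
                holder = 0;     /* unreadable contents are treated as stale */
            fclose(f);
        }
        if (pid_alive((pid_t)holder))
            return -1;          /* genuinely held by a live process */
        unlink(path);           /* holder is gone: break the stale lock */
    }
    return -1;
}

static void release_pid_lock(const char *path)
{
    unlink(path);
}
```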

The most common cause of stray locks is a program that uses the low-level functions crashing while it has an open connection to the database. Both the Texis daemon and the tsql program make every attempt to free all locks when an unexpected error occurs. User programs can do the same by trapping all signals that would normally terminate the program and closing any open database connections before exiting. Trapping signals this way typically makes debugging more difficult, so on a development system, where such handlers may be left out, one must check for stray locks more often.
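A minimal C sketch of the signal-trapping approach. It assumes a hypothetical `close_database()` that stands in for whatever call your program uses to close its Texis connection; that name is not a real Texis API. The handler closes the connection, restores the default action, and re-raises the signal so the process still terminates (and still produces a core file where the default action does so). Strictly speaking, POSIX only guarantees async-signal-safe functions inside a handler, so treat this as a sketch rather than a drop-in implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection state; replace with your real database handle. */
static volatile sig_atomic_t db_open = 0;

/* Hypothetical stand-in for closing the Texis database connection. */
static void close_database(void)
{
    if (db_open) {
        db_open = 0;
        /* real code would close the connection here, releasing its locks */
    }
}

static void fatal_signal_handler(int sig)
{
    close_database();       /* release locks before dying */
    signal(sig, SIG_DFL);   /* restore the default action ... */
    raise(sig);             /* ... and re-deliver the signal once we return */
}

/* Install the handler for signals whose default action ends the program.
 * Returns 0 on success, -1 if any sigaction() call fails. */
static int install_cleanup_handlers(void)
{
    static const int sigs[] = { SIGHUP, SIGINT, SIGQUIT, SIGTERM,
                                SIGSEGV, SIGFPE, SIGILL, SIGABRT };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

SIGKILL cannot be trapped, so a process killed that way still cannot clean up after itself; that case is what rmlocks is for.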

When Texis accesses a database, it starts a process that monitors the semaphores and locks and removes any stale semaphores or locks it finds. This should normally make it unnecessary to run any of these programs manually.

After a system crash or Texis crash, the following steps should be performed on each database. If any Texis servers are still running on that database, allow them to complete. This helps ensure that the database is not corrupted, since those servers will continue using the locks. Any other processes accessing the database should be allowed to terminate or else be killed, so that the database can be restarted fresh.

The programs wsem and ltest can tell you how many open connections there are to the database. A larger number than expected suggests outstanding locks, in which case rmlocks should be run. Both take an argument specifying the database to act on. Wsem only reports information; it does not try to change the locks in any way. However, wsem does need to obtain a lock to read the lock information, so if wsem hangs, that suggests a stale lock. ltest can be used in combination with ps to determine whether the processes still exist.
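The output format of ltest is not described here, so this C sketch assumes you have already extracted the process IDs it reports. It shows the standard POSIX liveness check that answers the same question as `ps`: `kill(pid, 0)` sends no signal but reports whether the process exists. The helper names `process_exists` and `count_dead` are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process exists, 0 if it does not, -1 on an unexpected error. */
static int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;          /* 0 and negatives address process groups */
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;           /* exists, but owned by another user */
    if (errno == ESRCH)
        return 0;           /* no such process: anything it held is stale */
    return -1;
}

/* Print the state of each PID (e.g. PIDs taken from ltest output, which is
 * an assumption here) and return how many are gone. */
static int count_dead(const pid_t *pids, size_t n)
{
    int dead = 0;
    for (size_t i = 0; i < n; i++) {
        int r = process_exists(pids[i]);
        printf("pid %ld: %s\n", (long)pids[i],
               r == 1 ? "running" : r == 0 ? "gone" : "unknown");
        if (r == 0)
            dead++;
    }
    return dead;
}
```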

Rmlocks tries to break all locks on a database. If no servers are currently accessing the database, it will be returned to a pristine state. Rmlocks should be owned by root and have the set-user-ID (setuid) bit set with chmod u+s, so that it can remove any semaphores that are lying around. If servers are still trying to access the database, they should be allowed to finish before any new servers are started. The remaining servers should be able to negotiate a new semaphore to use to finish their operation. Once they have finished, the database can once again be used by any server.
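On a typical Unix system the setup described above is done as root with `chown root rmlocks` followed by `chmod u+s rmlocks`. The following C sketch checks whether a binary is actually installed that way using `stat()`: a regular file owned by UID 0 with the `S_ISUID` bit set. The function name is an assumption, and the path to rmlocks depends on your installation.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if path is a regular file owned by root with the setuid bit set,
 * 0 if it is not, and -1 if it cannot be stat()ed. */
static int is_setuid_root(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (S_ISREG(st.st_mode) && st.st_uid == 0 && (st.st_mode & S_ISUID)) ? 1 : 0;
}
```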

Monlock monitors the semaphore and makes sure that no process holds it for an excessive period of time, which could be caused either by a process being stuck or by the OS not cleaning up correctly after a process.
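The internal algorithm of monlock is not documented here. As a hedged illustration of the two failure modes just named, this C sketch models a hypothetical `struct sem_holder` recording the PID holding the semaphore and when it was acquired, and classifies the holder as dead (the OS did not clean up) or as holding it too long (possibly stuck). The structure, the time limit, and what a monitor does with the result are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical record of who holds the semaphore and since when. */
struct sem_holder {
    pid_t  pid;         /* holding process, or 0 if the semaphore is free */
    time_t acquired;    /* time the semaphore was acquired */
};

enum holder_state { HOLDER_NONE, HOLDER_OK, HOLDER_DEAD, HOLDER_TOO_LONG };

static enum holder_state check_holder(const struct sem_holder *h,
                                      time_t now, double max_hold_secs)
{
    if (h->pid <= 0)
        return HOLDER_NONE;
    /* Holder has exited but the semaphore was never released. */
    if (kill(h->pid, 0) != 0 && errno == ESRCH)
        return HOLDER_DEAD;
    /* Holder is alive but has kept the semaphore suspiciously long. */
    if (difftime(now, h->acquired) > max_hold_secs)
        return HOLDER_TOO_LONG;
    return HOLDER_OK;
}
```

A monitor built this way would call `check_holder()` periodically; whether it then releases the semaphore, logs a warning, or kills the holder is a policy decision outside this sketch.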

Very rarely, a server will get stuck in such a way that it cannot continue even after rmlocks has been run. In this case the server must be killed.
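A common way to kill a stuck process is to send SIGTERM first, which gives any cleanup handlers a chance to release locks, and fall back to SIGKILL, which cannot be caught, if the process does not exit within a grace period. The following C sketch assumes that approach; the function names and the grace period are illustrative. It reaps the target with `waitpid()` if it happens to be a child of the caller and otherwise polls with `kill(pid, 0)`. Because a process killed with SIGKILL runs no cleanup, it may itself leave stale locks behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 while pid is still running, 0 once it is gone (reaping it if it is our child). */
static int still_running(pid_t pid)
{
    pid_t r = waitpid(pid, NULL, WNOHANG);
    if (r == pid)
        return 0;                       /* our child, now reaped */
    if (r == 0)
        return 1;                       /* our child, still running */
    /* Not our child: ask the kernel whether the PID exists. */
    return kill(pid, 0) == 0 || errno == EPERM;
}

/* Poll every 100 ms for up to grace_secs; returns 1 if the process exited. */
static int wait_gone(pid_t pid, int grace_secs)
{
    struct timespec ts = { 0, 100L * 1000L * 1000L };
    for (int i = 0; i < grace_secs * 10; i++) {
        if (!still_running(pid))
            return 1;
        nanosleep(&ts, NULL);
    }
    return !still_running(pid);
}

/* SIGTERM first, then SIGKILL if needed.
 * Returns 0 once the process is gone, -1 if it could not be killed. */
static int terminate_process(pid_t pid, int grace_secs)
{
    if (pid <= 0)
        return -1;
    if (!still_running(pid))
        return 0;
    if (kill(pid, SIGTERM) == 0 && wait_gone(pid, grace_secs))
        return 0;
    if (kill(pid, SIGKILL) != 0 && errno != ESRCH)
        return -1;                      /* e.g. EPERM: not allowed to kill it */
    return wait_gone(pid, grace_secs) ? 0 : -1;
}
```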