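Hello.
We run CUBRID 8.2.2 in an HA configuration, and when connections from the web servers surged suddenly, the database stopped working normally.
The current setup consists of two broker servers plus one master and one slave server. If the broker server cannot be reached,
the web application is configured to connect to the standby broker server instead.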
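According to the SNMP graphs, inbound and outbound traffic on the broker server normally follow the same curve,
but during the surge inbound traffic increased while outbound traffic stayed at about its normal level.
Some connections spilled over to the standby broker server, which recorded both inbound and outbound traffic.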
On the master server, outbound traffic rose briefly at the moment of the surge and then dropped to less than half
its normal level, while inbound traffic increased about 4 to 5 times.
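On the slave server, outbound traffic also rose briefly, after which both inbound and outbound traffic dropped
and the server was not operating properly.
Connections to the slave server were not going through reliably at that time.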
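We suspected that the broker's MAX_NUM_APPL_SERVER value and the master/slave servers' max_connection value were too low, so we have now raised both to about three times their previous values, within what the systems can handle.
We could not pin down the exact cause and would appreciate your advice.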
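The sizing hypothesis above can be sanity-checked with simple arithmetic: each broker application server (CAS) may hold one connection to the database server, so the worst case is the sum of MAX_NUM_APPL_SERVER over every broker that can reach a given server. The sketch below is only an illustration under stated assumptions: the broker list, the CAS counts, the host name db-broker2, and the headroom value are all hypothetical, and it assumes every broker may end up connecting to the same server during a surge.

```python
def required_db_connections(brokers, headroom=10):
    """Worst-case number of clients one database server may have to accept.

    brokers:  list of (host, broker_name, max_num_appl_server) tuples
    headroom: extra slots kept free for admin tools and HA processes
              (an assumed figure, not a CUBRID default)
    """
    return sum(cas for _host, _name, cas in brokers) + headroom


def check(brokers, db_max_clients, headroom=10):
    """Return (fits, needed): whether the server limit covers the worst case."""
    needed = required_db_connections(brokers, headroom)
    return needed <= db_max_clients, needed


if __name__ == "__main__":
    # Hypothetical values; read the real ones from your broker and server configs.
    brokers = [
        ("db-broker1", "rw", 40),
        ("db-broker1", "ro", 40),
        ("db-broker2", "rw", 40),  # db-broker2 is a placeholder name for the standby
        ("db-broker2", "ro", 40),
    ]
    fits, needed = check(brokers, db_max_clients=50)
    print(f"needed={needed} fits={fits}")
```

If the check fails, either the server-side limit has to go up or the broker-side CAS counts have to come down; raising only the broker side makes refusals at the database more likely, not less.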
The logs from each server for that time window are as follows.
[broker]
(1) RW broker log
Time: 10/11/12 18:17:11.780 - ERROR *** ERROR CODE = -191, Tran = -1, EID = 1
Cannot connect to server "DATABASE" on "db-master".
Time: 10/11/12 18:17:11.781 - ERROR *** ERROR CODE = -677, Tran = -1, EID = 2
Failed to connect to database server, 'DATABASE', on the following host(s): db-slave
Time: 10/11/12 18:17:11.781 - ERROR *** ERROR CODE = -677, Tran = -1, EID = 3
Failed to connect to database server, 'DATABASE', on the following host(s): db-master:db-slave
(2) RO broker log
Time: 10/11/12 18:05:54.569 - ERROR *** ERROR CODE = -743, Tran = -1, EID = 1
Failed on handshake between client and server. (peer host db-master)
Time: 10/11/12 18:25:55.767 - ERROR *** ERROR CODE = -199, Tran = 104, EID = 2
Server no longer responding.... Operation now in progress
Time: 10/11/12 18:25:55.767 - ERROR *** ERROR CODE = -224, Tran = 104, EID = 3
A database has not been restarted.
[standby broker]
(1) RW broker log
- none
(2) RO broker log
Time: 02/08/12 11:50:30.319 - ERROR *** ERROR CODE = -669, Tran = -1, EID = 1
Server refused client connection : max clients, (50), exceeded.
Time: 02/08/12 11:50:30.321 - ERROR *** ERROR CODE = -669, Tran = -1, EID = 2
Server refused client connection : max clients, (50), exceeded.
Time: 02/08/12 11:50:30.321 - ERROR *** ERROR CODE = -677, Tran = -1, EID = 3
Failed to connect to database server, 'DATABASE', on the following host(s): db-master:db-slave
Time: 02/08/12 11:50:30.376 - ERROR *** ERROR CODE = -669, Tran = -1, EID = 4
Server refused client connection : max clients, (50), exceeded.
Time: 02/08/12 11:50:30.377 - ERROR *** ERROR CODE = -669, Tran = -1, EID = 5
Server refused client connection : max clients, (50), exceeded.
[MASTER DATABASE]
Time: 10/11/12 15:30:06.387 - ERROR *** ERROR CODE = -452, Tran = 5, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_2(1270), EID = 110
Invalid XASL tree node content.
Time: 10/11/12 15:40:31.184 - ERROR *** ERROR CODE = -72, Tran = 9, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_6(1274), EID = 111
Your transaction (index 9, CUBRIDUSER@db-broker1|1274) has been unilaterally aborted by the system.
Time: 10/11/12 15:40:32.187 - ERROR *** ERROR CODE = -72, Tran = 10, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_8(1278), EID = 112
Your transaction (index 10,CUBRIDUSER@db-broker1|1278) has been unilaterally aborted by the system.
Time: 10/11/12 15:40:33.190 - ERROR *** ERROR CODE = -72, Tran = 3, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_1(1269), EID = 113
Your transaction (index 3, CUBRIDUSER@db-broker1|1269) has been unilaterally aborted by the system.
- repeated -
Time: 10/11/12 18:00:21.155 - ERROR *** ERROR CODE = -72, Tran = 9, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_6(1274), EID = 137
Your transaction (index 9, CUBRIDUSER@db-broker1|1274) has been unilaterally aborted by the system.
Time: 10/11/12 18:00:22.158 - ERROR *** ERROR CODE = -72, Tran = 3, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_1(1269), EID = 138
Your transaction (index 3, CUBRIDUSER@db-broker1|1269) has been unilaterally aborted by the system.
Time: 10/11/12 18:05:21.246 - ERROR *** ERROR CODE = -72, Tran = 3, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_1(1269), EID = 139
Your transaction (index 3, CUBRIDUSER@db-broker1|1269) has been unilaterally aborted by the system.
Time: 10/11/12 18:12:36.236 - ERROR *** ERROR CODE = -72, Tran = 967, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_354(2506), EID = 140
Your transaction (index 967, CUBRIDUSER@db-broker1|2506) has been unilaterally aborted by the system.
Time: 10/11/12 18:12:37.241 - ERROR *** ERROR CODE = -72, Tran = 920, CLIENT = db-broker1:broker_DATABASE_rw_cub_cas_307(2203), EID = 141
Your transaction (index 920, CUBRIDUSER@db-broker1|2203) has been unilaterally aborted by the system.
Time: 10/11/12 18:17:10.046 - ERROR *** ERROR CODE = -836, Tran = 194, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1174(32701), EID = 142
LATCH ON PAGE(32766|0) TIMEDOUT
Time: 10/11/12 18:17:10.046 - ERROR *** ERROR CODE = -836, Tran = 194, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1174(32701), EID = 142
LATCH ON PAGE(32766|0) TIMEDOUT
Time: 10/11/12 18:17:10.046 - ERROR *** ERROR CODE = -72, Tran = 194, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1174(32701), EID = 143
Your transaction (index 194, CUBRIDUSER@db-broker1|32701) has been unilaterally aborted by the system.
Time: 10/11/12 18:17:10.046 - ERROR *** ERROR CODE = -836, Tran = 383, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1351(595), EID = 144
LATCH ON PAGE(32766|0) TIMEDOUT
Time: 10/11/12 18:17:10.046 - ERROR *** ERROR CODE = -72, Tran = 383, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1351(595), EID = 145
Your transaction (index 383, CUBRIDUSER@db-broker1|595) has been unilaterally aborted by the system.
Time: 10/11/12 18:17:10.047 - ERROR *** ERROR CODE = -836, Tran = 601, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_1527(943), EID = 146
LATCH ON PAGE(32766|0) TIMEDOUT
[SLAVE DATABASE]
Time: 10/11/12 16:39:16.788 - ERROR *** ERROR CODE = -452, Tran = 12, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_2(1330), EID = 3
Invalid XASL tree node content.
Time: 10/11/12 18:08:15.321 - ERROR *** ERROR CODE = -707, Tran = 525, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_523(31299), EID = 4
Unable to expand temporary volume "/home/cubrid/cubrid822/databases/DATABASE/DATABASE_t32766" with 50 pages
Time: 10/11/12 18:08:31.427 - ERROR *** ERROR CODE = -707, Tran = 810, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_808(31770), EID = 5
Unable to expand temporary volume "/home/cubrid/cubrid822/databases/DATABASE/DATABASE_t32765" with 50 pages
Time: 10/11/12 18:09:02.866 - ERROR *** ERROR CODE = -707, Tran = 433, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_431(31147), EID = 6
Unable to expand temporary volume "/home/cubrid/cubrid822/databases/DATABASE/DATABASE_t32764" with 50 pages
Time: 10/11/12 18:09:42.275 - ERROR *** ERROR CODE = -707, Tran = 492, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_490(31248), EID = 7
Unable to expand temporary volume "/home/cubrid/cubrid822/databases/DATABASE/DATABASE_t32763" with 50 pages
Time: 10/11/12 18:14:41.045 - ERROR *** ERROR CODE = -836, Tran = 127, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_125(30643), EID = 8
LATCH ON PAGE(32765|0) TIMEDOUT
Time: 10/11/12 18:14:41.045 - ERROR *** ERROR CODE = -72, Tran = 127, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_125(30643), EID = 9
Your transaction (index 127, CUBRIDUSER@db-broker1|30643) has been unilaterally aborted by the system.
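With this many entries it can help to tally the logs by error code before reading them line by line. The following is a minimal sketch that parses the header format visible in the excerpts above (Time, ERROR CODE, Tran, optional CLIENT, EID) and attaches the message line that follows each header. The function names are made up for illustration, and the regular expression is fitted only to the lines quoted here, so other CUBRID log variants may need adjustments.

```python
import re
from collections import Counter

# Matches header lines such as:
# Time: 10/11/12 18:17:11.780 - ERROR *** ERROR CODE = -191, Tran = -1, EID = 1
HEADER = re.compile(
    r"Time: (?P<ts>\S+ \S+) - ERROR \*\*\* ERROR CODE = (?P<code>-?\d+), "
    r"Tran = (?P<tran>-?\d+)(?:, CLIENT = (?P<client>[^,]+))?, EID = (?P<eid>\d+)"
)


def parse_errors(lines):
    """Yield one dict per error entry: header fields plus the first message line."""
    current = None
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            if current is not None:
                yield current
            current = {**match.groupdict(), "message": ""}
        elif current is not None and line and not current["message"]:
            current["message"] = line
    if current is not None:
        yield current


def count_by_code(lines):
    """Count entries per numeric error code."""
    return Counter(int(entry["code"]) for entry in parse_errors(lines))


# Three entries copied from the logs above.
SAMPLE = """\
Time: 02/08/12 11:50:30.319 - ERROR *** ERROR CODE = -669, Tran = -1, EID = 1
Server refused client connection : max clients, (50), exceeded.
Time: 02/08/12 11:50:30.321 - ERROR *** ERROR CODE = -677, Tran = -1, EID = 3
Failed to connect to database server, 'DATABASE', on the following host(s): db-master:db-slave
Time: 10/11/12 18:14:41.045 - ERROR *** ERROR CODE = -836, Tran = 127, CLIENT = db-broker1:broker_DATABASE_ro_cub_cas_125(30643), EID = 8
LATCH ON PAGE(32765|0) TIMEDOUT
""".splitlines()

if __name__ == "__main__":
    for code, count in count_by_code(SAMPLE).most_common():
        print(code, count)
```

Running it separately on each server's log file, and grouping by timestamp as well as by code, makes it easier to see which error appeared first on which machine.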
I'm not an expert, so I'll just write what I know for now, and hopefully someone more knowledgeable can add to it.
1)
You need to adjust the max_clients value in cubrid.conf; see http://www.cubrid.com/online_manual/841/pm/pm_db_classify_connect.htm for reference. Size it to account for the number of connections the brokers handle.
2)
Add a sufficiently large temp volume on both servers.
An example is given at http://www.cubrid.com/online_manual/841/admin/admin_db_addvol.htm.
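For point 2), the slave log reports the failed expansion in pages ("with 50 pages"), so it may be convenient to convert a target size into a page count before adding the volume. This is a rough sketch only: the 4096-byte page size is an assumption, not a value taken from this thread, so check the page size your database was actually created with. It also assumes the volume utility in your version takes its size as a number of pages.

```python
def pages_for_size(target_mb, page_size_bytes=4096):
    """Number of database pages needed to hold target_mb megabytes (rounded up).

    page_size_bytes defaults to 4096 purely as an assumption; pass the real
    page size of your database.
    """
    if target_mb <= 0 or page_size_bytes <= 0:
        raise ValueError("target_mb and page_size_bytes must be positive")
    target_bytes = target_mb * 1024 * 1024
    # Integer ceiling division avoids floating point rounding.
    return (target_bytes + page_size_bytes - 1) // page_size_bytes


if __name__ == "__main__":
    # Example: a 1 GB temp volume under two possible page sizes.
    for page_size in (4096, 16384):
        print(page_size, pages_for_size(1024, page_size))
```

The resulting page count is what you would supply when following the example on the manual page linked above.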