Oracle的高可用方案有很多,每种方案都有其特定,磁盘阵列+双击热备是比较经典的方案。但是主从热备可能更经济。一般主从热备都是通过复制主库的Redo日志,在从库上再执行一遍的原理。Oracle官方有Dataguard和GoldenGate 两种方案,今天我们着重研究一下Dataguard复制方案。

对Oracle高可用方案概念的理解参考这篇文章:

https://www.cnblogs.com/rusking/p/4736543.html

Oracle ADG

Dataguard的三种模式:

  1. 最大性能模式(Maximum Performance):就是保证主库,备库数据不保证。(默认这种模式)
  2. 最大可用模式(Maximum Availability):这里最大可用是指dataguard最大可用,不是指主库最大可用!
  3. 最大保护模式(Maximum Protection) :主库备库数据完全同步。
1
2
3
4
5
6
7
8
9
# 发现服务没有监听,可以用下面的方式试一试
# 参考:https://blog.csdn.net/liby_sunny/article/details/86544020
SQL>show parameter service_names
SQL>alter system register;

$>lsnrctl status

-- 查看数据库运行模式
select name, open_mode, database_role, protection_mode from v$database;

模式性能比较参考文章:

https://blog.csdn.net/Ruishine/article/details/120179212

常见初始化工作

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# 1. 删除历史归档日志 ++++++++++++++++++++++++++++
# 利用 rman 来处理
rman target sys/passwd@oracle_adg_ddd # 登录rman

list archivelog all; # 列出所有归档日志文件
list archivelog all completed before 'sysdate-1'; # 查询一天前日志

delete archivelog all completed before 'sysdate-1'; # 删除一天前日志
delete archivelog from time 'sysdate-7'; # 删除7天前到现在的所有日志
delete archivelog until time 'sysdate-1'; # 删除1天前的所有日志

crosscheck archivelog all; # 检查归档日志
delete expired archivelog all; # 删除所有失效的归档日志文件
# 如果日志爆了,用以上命令可能不起作用,可以使用下面的命令强行删除所有日志
delete noprompt force archivelog all;

# 查询redo_log文件状态
select * from v$log;
# 查询归档日志记录状态,会发现删除的归档日志deleted字段为YES
select * from v$archived_log;

# 如果不想在v$archived_log中看到过多的记录,可以在system模式下执行
# 谨慎使用,执行后 list archivelog all; 看到的是空。归档文件需要手工 rm 删除了。
SQL> execute sys.dbms_backup_restore.resetCfileSection(11); # 清理v$archived_log对应的记录
SQL> execute sys.dbms_backup_restore.resetCfileSection(28); # 清理v$rman_status对应的记录

Oracle ADG 主备配置

我们这里准备两台装好了Oracle 11gR2的服务器如下:

  • 172.17.121.4 fdoracleccc (主)
  • 172.17.121.5 fdoracleddd(备)
  1. 主备配置前准备工作
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
# 两台服务器都关闭相关安全设置,比如防火墙和SELinux

# 主备都要加上 vi /etc/hosts
172.17.121.4    fdoracleccc
172.17.121.5    fdoracleddd

# /opt/oracle/product/11.2.0.4/dbhome_1/network/admin/tnsnames.ora 都要加上
oracle_adg_ccc =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fdoracleccc)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dspdb)
    )
  )

oracle_adg_ddd =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = fdoracleddd)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dspdb)
    )
  )
  
# /opt/oracle/product/11.2.0.4/dbhome_1/network/admin/listener.ora
# 主备库中的服务器名不一样,比如fdoracleccc,fdoracleddd 
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = fdoracleccc)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC = (SID_NAME = dspdb)(ORACLE_HOME = /opt/oracle/product/11.2.0.4/dbhome_1))
  )
ADR_BASE_LISTENER = /opt/oracle

# add by cd.net on 20220601
# INBOUND_CONNECT_TIMEOUT_LISTENER = 180
DIAG_ADR_ENABLED_LISTENER = OFF

备库不需要实例,可以删除安装好的实例(也可以不删除,后面再处理)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# 停掉备库oracle
su - oracle
SQL> shutdown immediate;
lsnrctl stop
# 查看备库实例
find $ORACLE_BASE/ -name $ORACLE_SID
/opt/oracle/admin/dspdb
/opt/oracle/diag/rdbms/dspdb
/opt/oracle/diag/rdbms/dspdb/dspdb
/opt/oracle/cfgtoollogs/dbca/dspdb
# 删除实例
dbca -silent -deleteDatabase -sourcedb orcl
  1. 主库配置
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
-- 主库设置成归档模式
su - oracle
sqlplus / as sysdba
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;

-- 判断DG是否已安装
SQL> select name, log_mode from v$database;
-- 为true就是正常
SQL> select * from v$option where parameter = 'Oracle Data Guard';

-- 强制记录日志:
SQL> alter database force logging;
-- 检查状态(yes为强制):
SQL> select name,force_logging from v$database;
-- 查询redo log files文件大小(默认256M,3个)
SQL> select group#,bytes/1024/1024 as M from v$log;

-- 创建4个standby logfile:
alter database add standby logfile group 4 '/oradata/dspdb/standby04.log' size 256M;
alter database add standby logfile group 5 '/oradata/dspdb/standby05.log' size 256M;
alter database add standby logfile group 6 '/oradata/dspdb/standby06.log' size 256M;
alter database add standby logfile group 7 '/oradata/dspdb/standby07.log' size 256M;

-- 密码文件
-- 一般数据库默认就有密码文件,存放在$ORACLE_HOME/dbs/orapw+SID
-- 如果没有就手工创建:
orapwd file=$ORACLE_HOME/dbs/orapwdspdb password=xxx;

-- 密码文件远程传输到从库(保证主从密码文件一致)
-- 必须用oracle用户不能用root,不能用连接工具复制粘贴
scp orapwdspdb oracle@fdoracleddd:$ORACLE_HOME/dbs/orapwdspdb

-- 配置监听
-- tnsnames.ora 文件主备都一样
-- listener.ora 各自写自己的服务器名即可
-- 两台服务器都在oracle用户下,测试连通性,必须都通过才行
tnsping oracle_adg_ccc
tnsping oracle_adg_ddd

-- 数据库核心参数设置
-- 执行下面语句,找到文件$ORACLE_HOME/dbs/initdspdb.ora
SQL> create pfile from spfile;
-- 重点:
-- 修改这个文件的配置
-- vi initdspdb.ora
dspdb.__db_cache_size=45902462976
dspdb.__java_pool_size=402653184
dspdb.__large_pool_size=134217728
dspdb.__oracle_base='/opt/oracle'#ORACLE_BASE set from environment
dspdb.__pga_aggregate_target=16911433728
dspdb.__sga_target=50734301184
dspdb.__shared_io_pool_size=0
dspdb.__shared_pool_size=4026531840
dspdb.__streams_pool_size=0
*.audit_file_dest='/opt/oracle/admin/dspdb/adump'
*.audit_trail='db'
*.compatible='11.2.0.4.0'
*.control_files='/oradata/dspdb/control01.ctl','/oradata/dspdb/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='dspdb'
*.diagnostic_dest='/opt/oracle'
*.java_jit_enabled=FALSE
*.log_archive_format='%t_%s_%r.dbf'
*.open_cursors=30000
*.pga_aggregate_target=16869490688
*.processes=3000
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=3305
*.sga_target=50608472064
*.undo_tablespace='UNDOTBS1'

# add by chende 20220627
*.fal_client='oracle_adg_ccc'
*.fal_server='oracle_adg_ddd'
*.db_unique_name='oracle_adg_ccc'
*.log_archive_config='dg_config=(oracle_adg_ccc,oracle_adg_ddd)'
*.log_archive_dest_1='LOCATION=/oradata/dspdb_arch valid_for=(all_logfiles,all_roles) 
db_unique_name=oracle_adg_ccc'
*.log_archive_dest_2='SERVICE=oracle_adg_ddd lgwr async valid_for=(online_logfile,primary_role) 
db_unique_name=oracle_adg_ddd'
*.log_archive_dest_state_1=enable
*.log_archive_dest_state_2=enable
*.log_archive_max_processes=4
*.standby_file_management='auto'
*.db_file_name_convert='/oradata/dspdb','/oradata/dspdb'
*.log_file_name_convert='/oradata/dspdb','/oradata/dspdb'

-- 主库应用上面的配置
SQL> shutdown immediate;
SQL> create spfile from pfile;
  1. 从库配置
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
-- 执行下面语句,找到文件$ORACLE_HOME/dbs/initdspdb.ora
sqlplus / as sysdba
SQL> create pfile from spfile;
-- 从库的 pfile 配置
dspdb.__db_cache_size=45902462976
dspdb.__java_pool_size=402653184
dspdb.__large_pool_size=134217728
dspdb.__oracle_base='/opt/oracle'#ORACLE_BASE set from environment
dspdb.__pga_aggregate_target=16911433728
dspdb.__sga_target=50734301184
dspdb.__shared_io_pool_size=0
dspdb.__shared_pool_size=4026531840
dspdb.__streams_pool_size=0
*.audit_file_dest='/opt/oracle/admin/dspdb/adump'
*.audit_trail='db'
*.compatible='11.2.0.4.0'
*.control_files='/oradata/dspdb/control01.ctl','/oradata/dspdb/control02.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='dspdb'
*.diagnostic_dest='/opt/oracle'
*.java_jit_enabled=FALSE
*.log_archive_format='%t_%s_%r.dbf'
*.open_cursors=30000
*.pga_aggregate_target=16869490688
*.processes=3000
*.remote_login_passwordfile='EXCLUSIVE'
*.sessions=3305
*.sga_target=50608472064
*.undo_tablespace='UNDOTBS1'

# add by chende 20220627
*.fal_client='oracle_adg_ddd'
*.fal_server='oracle_adg_ccc'
*.db_unique_name='oracle_adg_ddd'
*.log_archive_config='dg_config=(oracle_adg_ccc,oracle_adg_ddd)'
*.log_archive_dest_1='LOCATION=/oradata/dspdb_arch valid_for=(all_logfiles,all_roles) 
db_unique_name=oracle_adg_ddd'
*.log_archive_dest_2='SERVICE=oracle_adg_ccc lgwr async valid_for=(online_logfile,primary_role) 
db_unique_name=oracle_adg_ccc'
*.log_archive_dest_state_1=enable
*.log_archive_dest_state_2=enable
*.log_archive_max_processes=4
*.standby_file_management='auto'
*.db_file_name_convert='/oradata/dspdb','/oradata/dspdb'
*.log_file_name_convert='/oradata/dspdb','/oradata/dspdb'

-- 从库应用上面的配置
SQL> shutdown immediate;
SQL> create spfile from pfile;
  1. DG配置
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# 重启备库
lsnrctl stop
lsnrctl start
sqlplus / as sysdba
SQL> shutdown immediate;
SQL> startup nomount;

# 重启主库
lsnrctl stop
lsnrctl start
sqlplus / as sysdba
SQL> shutdown immediate;
SQL> startup;

测试两台服务器是否能互相通信,前提是按照数据库时候密码是一样的。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# 如果密码文件不一样,在主库手动设置密码,并将密码文件传给从服务器
orapwd file=$ORACLE_HOME/dbs/orapwdspdb password=xxx;
# 注意一定要用oracle用户,注意文件所属和读写权限
scp -P 22 orapwdspdb oracle@$ORACLE_HOME/dbs/

# 分别在主从两台服务器执行连通性测试
[oracle@fdoracleccc]$ sqlplus /nolog
conn sys/xxx@oracle_adg_ddd as sysdba
conn sys/xxx@oracle_adg_ddd as sysdba
# 上面两个都通才行

利用RMAN在备库上恢复主库(这个只在备库操作即可)

1
2
3
4
5
6
rman target sys/passwd@oracle_adg_ccc auxiliary sys/passwd@oracle_adg_ddd
duplicate target database for standby from active database nofilenamecheck;

-- 最后看到成功的提示
...
Finished Duplicate Db at 27-JUN-22

重启备库

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- 重启备库,设置可读不可写
SQL> shutdown immediate;
SQL> startup nomount;
SQL> alter database mount standby database;
SQL> alter database open read only;
SQL> alter database recover managed standby database using current logfile disconnect from session;

-- 上面一句是启动日志传输,如果想关闭日志传输执行
SQL> recover managed standby database cancel;
-- 下面这条语句开通同步,但是不是实时的,只能等到日志组切换的时候才会同步
alter database recover managed standby database disconnect from session;

常见操作

主库测试写入数据

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
create tablespace CHENDE logging datafile '/oradata/dspdb/chende01.dbf' size 10m autoextend \
on next 10m maxsize unlimited;
create user sdx identified by "passwd" default tablespace chende temporary tablespace temp; 
grant all privileges to sdx;

CREATE TABLE "SDX"."TL50_USERS" (
  "ID" NUMBER NOT NULL,
  "USER_ID" VARCHAR2(11 BYTE) NOT NULL,
  "STUNAME" VARCHAR2(50 BYTE) NOT NULL,
  "SEX" CHAR(1 BYTE) NOT NULL,
  "AGE" NUMBER(2,0) NOT NULL
)

select * from sdx.TL50_USERS order by id;
insert into sdx.tl50_users VALUES(1, '20220222', 'ccx' 'f', 1);

-- 手动切换日志组
alter system switch logfile;
alter system archive log current;
show parameter convert;

查询备库的常见语句:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- 查询数据库基本信息
select switchover_status,open_mode,protection_mode,protection_level from v$database;
select dbid,name,db_unique_name,primary_db_unique_name,database_role,controlfile_type,open_mode,
log_mode,remote_archive,archivelog_compression from v$database;

-- 检查主备两边的序号
select max(sequence#) from v$log;

-- standby 的状态查询
select pid,process,status,thread#,sequence#,blocks from v$managed_standby;

-- redolog 和 standbylog 大小状态比较
select group#,sequence#,members,bytes/1024/1024 MB,archived,status from v$log;
select group#,sequence#,thread#,bytes/1024/1024 MB,archived,status,last_time from v$standby_log;

-- archived 归档历史文件查询
select SEQUENCE#,name,FIRST_TIME,NEXT_TIME,APPLIED from v$archived_log order by 1;

-- 主备数据库信息
select dest_name,database_mode,recovery_mode,protection_mode,destination,db_unique_name 
from v$archive_dest_status where dest_id in(1,2);
-- 查看备库是否有如下进程
ps aux|grep mrp

-- 如果有报错,查看alert日志和log.xml日志
-- 日志文件目录:/opt/oracle/diag/rdbms/oracle_adg_ddd/dspdb/alert

-- 历史错误信息等
select * from v$dataguard_status;

启停ADG

主备启动(先备再主)

1
2
3
4
5
6
7
8
-- 先启备库
startup nomount
alter database mount standby database;
alter database open read only;
alter database recover managed standby database using current logfile disconnect from session;
        
-- 再启主库
sql>startup

主备停止(先主后备)

1
2
3
4
5
6
-- 1. 先关主库数据库:
sql>shutdown immediate

-- 2. 再关备用库:
sql>alter database recover managed standby database cancel;
sql>shutdown immediate;

常见问题

问题1:ADG配置文件差了项目

ORACLE error from auxiliary database: ORA-19527: physical standby redo log must be renamed
ORA-00312: online log 1 thread 1: '/oradata/dspdb/redo01.log'
RMAN-05535: WARNING: All redo log files were not defined properly.

出现上面的错误,在于ADG配置文件没有设置转换的目录关系,即使两机配置一样,也最好加上。

*.db_file_name_convert='/oradata/dspdb','/oradata/dspdb'
*.log_file_name_convert='/oradata/dspdb','/oradata/dspdb'

问题2:Standby logfile 文件大小和 redo logfile 不一致

standby logfile 文件是用在备库作为主库 redo logfile 的同步文件预留磁盘空间的。其数量推荐比主库redo logfile数量大1个;同时如果standby初始大小和主库redo logfile的大小不一致时,会出现主库的变化无法在备库实时反应出来,需要在主库手动切换日志组才可能更新:

1
2
3
-- 手动切换日志组
alter system switch logfile;
alter system archive log current;

重做备库standby logfile 文件,参考文章如下:

https://blog.csdn.net/weixin_41561862/article/details/104186255

记录操作步骤:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- 1. 主库强制同步到备库(切换日志组)
alter system switch logfile;
alter system archive log current;

-- 2. 关闭备库自动更新功能
alter database recover managed standby database cancel;

-- 3. 删除旧standby logfile
-- 如果备库无法删除 standby logfile,就先关闭重启一次
shutdown immediate;
startup nomount;
alter database mount standby database;
alter database open read only;
-- 执行删除
alter database drop standby logfile group 4;
alter database drop standby logfile group 5;
alter database drop standby logfile group 6;
alter database drop standby logfile group 7;

-- 4. 添加合适大小的standby logfiles(确保和主库redo logfile一样大小)
alter database add standby logfile thread 1 group 4 '/oradata/dspdb/standby04.log' size 512M reuse;
alter database add standby logfile thread 1 group 5 '/oradata/dspdb/standby05.log' size 512M reuse;
alter database add standby logfile thread 1 group 6 '/oradata/dspdb/standby06.log' size 512M reuse;
alter database add standby logfile thread 1 group 7 '/oradata/dspdb/standby07.log' size 512M reuse;

-- 5. 备库开启自动同步
alter database recover managed standby database disconnect from session;

配置正确的主备,在查询备库状态时候,有下图中红框的状态才算正确,否则都是有问题的。

image-20220511164530687

问题3:No standby redo logfiles available for thread 1

如果standby_log的状态都是UNASSIGNED,应该是配置错误了

参考:

https://blog.csdn.net/weixin_41561862/article/details/104186255

问题4:主备库的归档日志文件定期删除

可以考虑在主备课中加一个定时任务,每天执行一次,删除中间的归档日志文件,防止磁盘挤爆。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# 比如加一个 deletearch.sh 的文件。然后加入crontab中,每天执行一次。

#!/bin/bash
source ~/.bash_profile
rman target / nocatalog <<!
run{
delete noprompt archivelog all completed before 'sysdate-1'; 
}
exit;
!

ADG主备切换

参考:https://www.cnblogs.com/nathon-wang/p/13391145.html

写在最后

搭建Oracle ADG可能会遇到各种不同问题,在网上一般都能找到答案,下面列举一些参考文章,写的都很好,供参考。

参考:

https://blog.csdn.net/rod0320/article/details/121631609

https://blog.csdn.net/weixin_43901067/article/details/105552585

https://blog.csdn.net/hailang08/article/details/122944859

备库状态查询:

https://www.cnblogs.com/xwg168/p/14452560.html

备库数据不完整,需要重新同步主库完整数据:

https://cdn.modb.pro/db/51676

(完)