Acme Sprockets


Oracle Grid Infrastructure 11.2.0.2 installer erroneously reported that the storage devices were not available on all nodes

During a recent install of Oracle Grid Infrastructure 11.2.0.2 on Linux x64, the installer found the LUNs as candidate disks, then erroneously reported that the storage devices were not available on all nodes while running the final checks before the actual installation/configuration. Here are the steps I followed to confirm that Oracle/ASMLIB could in fact access the devices from all nodes. This allowed me to (mostly) confidently ignore the installer "error" and proceed with the installation, which then succeeded.

The environment was four 8-core x86_64 (Xeon) systems running RHEL 5.5. The servers were interconnected with InfiniBand. Database storage was an HP EVA 4400 SAN with a Fibre Channel connection to each server. At the time of the install, I had presented two 500G RAID5 LUNs to the servers (with plans to present many more after I got the initial configuration working), and configured them under Linux multipath.

Specific software versions: RHEL 5.5 (2.6.18-194.el5) with oracleasm-2.6.18-194.el5-2.0.5-1.el5.x86_64.rpm, oracleasmlib-2.0.4-1.el5.x86_64.rpm, oracleasm-support-2.1.4-1.el5.x86_64.rpm, and Oracle 11.2.0.2.

My plan was to use Oracle's ASMLIB to manage/mark the disks on Linux and then ASM to manage all database storage.

After installing ASMLIB on all servers, I partitioned, then marked the two initial LUNs via ASMLIB as follows:

[root@linux-rac1 /]# /etc/init.d/oracleasm createdisk DISK001 /dev/mapper/36001438006489b680001700000840000p1
Marking disk "DISK001" as an ASM disk: [ OK ]
[root@linux-rac1 /]# /etc/init.d/oracleasm createdisk DISK002 /dev/mapper/36001438006489b680001700000930000p1
Marking disk "DISK002" as an ASM disk: [ OK ]

Note that the Oracle clusterware and Metalink docs specify that the drives should be partitioned, though I do have another system with similar hardware where I use the raw LUNs (not partitioned) very successfully.

Then I verified that the disks were visible on the other three nodes. For example:

[root@linux-rac2 /]# /etc/init.d/oracleasm listdisks
DISK001
DISK002
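
With more LUNs and more nodes, checking the listdisks output by eye gets tedious. The following is a minimal sketch of how that check could be scripted, not something I ran during this install. The function names missing_disks and check_nodes are hypothetical, the EXPECTED list must match the labels you created, and check_nodes assumes passwordless ssh as root to each node.

```shell
#!/bin/bash
# Sketch: report expected ASMLIB labels that a node does not list.
# EXPECTED holds the labels created with "oracleasm createdisk" (assumption:
# adjust this list for your own environment).
EXPECTED="DISK001 DISK002"

# Reads "oracleasm listdisks" output on stdin and prints every expected
# label that does not appear in it, one per line.
missing_disks() {
    local listed label
    listed=$(cat)
    for label in $EXPECTED; do
        # -x matches the whole line, -F treats the label as a literal string
        printf '%s\n' "$listed" | grep -qxF "$label" || echo "$label"
    done
}

# Hypothetical helper: runs listdisks on each node named on the command line.
# Assumes passwordless root ssh to every node.
check_nodes() {
    local node missing rc
    rc=0
    for node in "$@"; do
        missing=$(ssh -n "$node" /etc/init.d/oracleasm listdisks | missing_disks)
        if [ -n "$missing" ]; then
            echo "$node: missing" $missing
            rc=1
        else
            echo "$node: all expected disks visible"
        fi
    done
    return $rc
}
```

After sourcing the file, something like "check_nodes linux-rac1 linux-rac2 linux-rac3 linux-rac4" would print one line per node and return non-zero if any node is missing a label.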

When I ran the Oracle 11.2.0.2 Grid Infrastructure installer, it found the two LUNs as candidates (ORCL:DISK001, ORCL:DISK002). I selected them with external redundancy and proceeded. However, the automated final checks step later in the installer reported those same disks as unavailable. The error message included:

Device Checks for ASM - This is a pre-check to verify if the specified devices meet the requirements for configuration through the Oracle Universal Storage Manager Configuration Assistant.
Operation Failed on Nodes: [linux-rac4,  linux-rac3,  linux-rac2,  linux-rac1]  List of errors:
 - 
Could not get the type of storage  - Cause: Cause Of Problem Not Available  - Action: User Action Not Available
 - 
Could not get the type of storage  - Cause: Cause Of Problem Not Available  - Action: User Action Not Available

Verification result of failed node: linux-rac4

. . . 

Naturally I opened a level 2 SR with Oracle since my install was blocked on the "error". Unfortunately Oracle support was not of much help.

I confirmed that the "oracle" user did in fact have access to the two LUNs on all four RAC nodes. I did this by looking at /dev/oracleasm/disks, running the "status", "listdisks", and "querydisk" commands of /usr/sbin/oracleasm, and also doing a "dd" read from the devices as "oracle" on all four nodes.

For example (on all four nodes):

[root@linux-rac1 /]# id oracle
uid=501(oracle) gid=501(oracle) groups=501(oracle),1001(dba)
[root@linux-rac1 /]# ls -l /dev/oracleasm/disks/
total 0
brw-rw---- 1 oracle oracle 253, 5 May 5 10:39 DISK001
brw-rw---- 1 oracle oracle 253, 4 May 5 10:39 DISK002
[root@linux-rac1 /]# /usr/sbin/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
[root@linux-rac1 /]# /usr/sbin/oracleasm querydisk DISK001 DISK002
Disk "DISK001" is a valid ASM disk
Disk "DISK002" is a valid ASM disk

And (again, on all four RAC nodes) as user "oracle":

[oracle@linux-rac2 ~]$ dd if=/dev/oracleasm/disks/DISK001 of=/dev/null bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.168156 seconds, 62.4 MB/s
[oracle@linux-rac2 ~]$ dd if=/dev/oracleasm/disks/DISK002 of=/dev/null bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.145556 seconds, 72.0 MB/s
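
The same read test can be wrapped in a loop so it covers every device under /dev/oracleasm/disks, which matters more once many LUNs are presented. This is a rough sketch under my own assumptions rather than what I actually ran: read_check is a hypothetical function name, and the optional directory argument exists only so the function can be pointed somewhere other than the ASMLIB mount point.

```shell
#!/bin/bash
# Sketch: attempt a small read from every ASMLIB device and report the result.
# Intended to be run as the "oracle" user on each node.
# read_check is a hypothetical helper name; the directory defaults to the
# ASMLIB mount point used in this post.
read_check() {
    local dir dev rc
    dir="${1:-/dev/oracleasm/disks}"
    rc=0
    for dev in "$dir"/*; do
        # If the glob matched nothing, the directory holds no devices.
        [ -e "$dev" ] || { echo "no devices found in $dir"; return 1; }
        # Same dd read as above: up to 10 blocks of 1 MB, data discarded.
        if dd if="$dev" of=/dev/null bs=1024k count=10 2>/dev/null; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"
            rc=1
        fi
    done
    return $rc
}
```

Calling "read_check" with no arguments prints one OK or FAIL line per device and returns non-zero if any read fails, so it can be chained across nodes with ssh.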

Ultimately, since these manual checks succeeded on all four nodes, I decided to ignore the "error" and proceed with the installation. It completed successfully, and my subsequent RAC install and RAC database creation worked just fine.
