During a recent install of Oracle Grid Infrastructure 11.2.0.2 on Linux x64, the installer found the LUNs as candidate disks, but then, while running the final checks before the actual installation/configuration, erroneously reported that the storage devices were not available on all nodes. Here are the steps I followed to confirm that Oracle/ASMLIB could in fact access the devices from all nodes. This allowed me to (mostly) confidently ignore the installer "error" and proceed with the installation, which then succeeded.
The environment was four 8-core x86_64 (Xeon) systems running RHEL 5.5. The servers were interconnected with InfiniBand. Database storage was an HP EVA 4400 SAN with a Fibre Channel connection to each server. At the time of the install, I had presented two 500 GB RAID5 LUNs to the servers (with plans to present many more after I got the initial configuration working), and configured them under Linux multipath.
Specific software versions: RHEL 5.5 (2.6.18-194.el5) with oracleasm-2.6.18-194.el5-2.0.5-1.el5.x86_64.rpm, oracleasmlib-2.0.4-1.el5.x86_64.rpm, oracleasm-support-2.1.4-1.el5.x86_64.rpm, and Oracle 11.2.0.2.
My plan was to use Oracle's ASMLIB to manage/mark the disks on Linux and then ASM to manage all database storage.
After installing ASMLIB on all servers, I partitioned the two initial LUNs, then marked them via ASMLIB as follows:
[root@linux-rac1 /]# /etc/init.d/oracleasm createdisk DISK001 /dev/mapper/36001438006489b680001700000840000p1
Marking disk "DISK001" as an ASM disk: [ OK ]
[root@linux-rac1 /]# /etc/init.d/oracleasm createdisk DISK002 /dev/mapper/36001438006489b680001700000930000p1
Marking disk "DISK002" as an ASM disk: [ OK ]
Note that the Oracle clusterware and Metalink docs specify that the drives should be partitioned, though I do have another system with similar hardware where I use the raw LUNs (not partitioned) very successfully.
Then, I verified that they were visible on the other three nodes. For example:
[root@linux-rac2 /]# /etc/init.d/oracleasm listdisks
DISK001
DISK002
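Repeating that check by hand on every node gets tedious as more LUNs are added. Below is a minimal sketch of how it could be scripted. The function names missing_disks and check_nodes are hypothetical, and the script assumes root can ssh to each node without a password, which was not necessarily true of my setup. It only wraps the same "listdisks" command shown above and compares the result against the labels you expect.

```shell
#!/bin/sh
# Sketch only: compare "oracleasm listdisks" output on each node against
# the ASMLIB labels you expect. Assumes passwordless root ssh to the nodes.

# Print every label in $1 (space-separated) that does not appear in the
# listdisks output given as $2 (one label per line).
missing_disks() {
    local expected="$1" actual="$2" label
    for label in $expected; do
        printf '%s\n' "$actual" | grep -qx -- "$label" || printf '%s\n' "$label"
    done
}

# Usage: check_nodes "LABEL1 LABEL2 ..." node1 node2 ...
# Prints one status line per node and returns non-zero if any node failed.
check_nodes() {
    local expected="$1" node out missing rc=0
    shift
    for node in "$@"; do
        if ! out=$(ssh "root@$node" /etc/init.d/oracleasm listdisks); then
            echo "$node: listdisks failed"
            rc=1
            continue
        fi
        missing=$(missing_disks "$expected" "$out")
        if [ -n "$missing" ]; then
            # Unquoted on purpose: collapses the newline-separated list.
            echo "$node: missing" $missing
            rc=1
        else
            echo "$node: OK"
        fi
    done
    return $rc
}

# Example invocation, left commented out so the file can be sourced safely:
# check_nodes "DISK001 DISK002" linux-rac2 linux-rac3 linux-rac4
```

A node that prints "OK" here has passed only the label check; it says nothing about whether the "oracle" user can actually read the devices.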
When I ran the Oracle 11.2.0.2 Grid Infrastructure installer, it found the two LUNs as candidates (ORCL:DISK001, ORCL:DISK002). I selected them with external redundancy and proceeded. However, the automated final checks step later in the installer reported those same disks as unavailable. The error message included:
Device Checks for ASM - This is a pre-check to verify if the specified devices meet the requirements for configuration through the Oracle Universal Storage Manager Configuration Assistant.
Operation Failed on Nodes: [linux-rac4, linux-rac3, linux-rac2, linux-rac1]
List of errors:
- Could not get the type of storage - Cause: Cause Of Problem Not Available - Action: User Action Not Available
- Could not get the type of storage - Cause: Cause Of Problem Not Available - Action: User Action Not Available
Verification result of failed node: linux-rac4
. . .
Naturally I opened a level 2 SR with Oracle since my install was blocked on the "error". Unfortunately Oracle support was not of much help.
I confirmed that the "oracle" user did in fact have access to the two LUNs on all four RAC nodes. I did this by looking at /dev/oracleasm/disks, running the "status" and "listdisks" commands of /usr/sbin/oracleasm, and doing a "dd" read from the devices as "oracle" on all four nodes.
For example (on all four nodes):
[root@linux-rac1 /]# id oracle
uid=501(oracle) gid=501(oracle) groups=501(oracle),1001(dba)
[root@linux-rac1 /]# ls -l /dev/oracleasm/disks/
total 0
brw-rw---- 1 oracle oracle 253, 5 May 5 10:39 DISK001
brw-rw---- 1 oracle oracle 253, 4 May 5 10:39 DISK002
[root@linux-rac1 /]# /usr/sbin/oracleasm status
Checking if ASM is loaded: yes
Checking if /dev/oracleasm is mounted: yes
[root@linux-rac1 /]# /usr/sbin/oracleasm querydisk DISK001 DISK002
Disk "DISK001" is a valid ASM disk
Disk "DISK002" is a valid ASM disk
And (again, on all four RAC nodes) as user "oracle":
[oracle@linux-rac2 ~]$ dd if=/dev/oracleasm/disks/DISK001 of=/dev/null bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.168156 seconds, 62.4 MB/s
[oracle@linux-rac2 ~]$ dd if=/dev/oracleasm/disks/DISK002 of=/dev/null bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.145556 seconds, 72.0 MB/s
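The same read test can be wrapped in a small function so that every ASMLIB device is checked in one pass. This is only a sketch: read_check is a hypothetical name, and it does nothing beyond the 10 MB "dd" read shown above, reporting per device whether the read succeeded for whichever user runs it.

```shell
#!/bin/sh
# Sketch only: try to read the first 10 MB of each device given as an
# argument and report the result. Run it as the "oracle" user on each node.
read_check() {
    local dev rc=0
    for dev in "$@"; do
        # dd exits non-zero if the device cannot be opened or read.
        if dd if="$dev" of=/dev/null bs=1024k count=10 2>/dev/null; then
            echo "$dev: readable"
        else
            echo "$dev: READ FAILED"
            rc=1
        fi
    done
    return $rc
}

# Example invocation, left commented out so the file can be sourced safely:
# read_check /dev/oracleasm/disks/*
```

The function returns non-zero if any device fails, so it can be dropped into a larger pre-install check script.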
Ultimately, since these manual checks succeeded on all four nodes, I decided to ignore the "error" and proceed with the installation. It completed successfully, and my subsequent RAC install and RAC database creation worked just fine.