Error building OpenMPI with Intel compiler (2024)

Hello Varsha,

Thanks for the suggestion. Here's what I got.

On HPC Cluster, normal queue:

As per your suggestion, I ran the following through the normal queue:

I_MPI_DEBUG=30 FI_LOG_LEVEL=debug mpirun -n 2 -ppn 1 ./hello_mpi >out.txt 

The error channel output is as follows:

Loading compiler version 2021.1.1
Loading tbb version 2021.1.1
Loading debugger version 10.0.0
Loading compiler-rt version 2021.1.1
Loading dpl version 2021.1.1
Loading oclfpga version 2021.1.1
Loading init_opencl version 2021.1.1
Warning: Intel PAC device is not found.
Please install the Intel PAC card to execute your program on an FPGA device.
Warning: Intel PAC device is not found.
Please install the Intel PAC card to execute your program on an FPGA device.

Loading compiler/2021.1.1
Loading requirement: tbb/latest debugger/latest compiler-rt/latest dpl/latest
/opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/init_opencl /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/oclfpga
Loading mpi version 2021.1.1
Currently Loaded Modulefiles:
1) tbb/latest
2) debugger/latest
3) compiler-rt/latest
4) dpl/latest
5) /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/init_opencl
6) /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/oclfpga
7) compiler/2021.1.1

mpi/2021.1.1
libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:28642:core:mr:ofi_default_cache_size():69<info> default cache size=4223952597
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:14757:core:mr:ofi_default_cache_size():69<info> default cache size=4223952597

libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:28642:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:28642:core:mr:ofi_default_cache_size():69<info> default cache size=4223952597
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_CUDA not supported
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ROCR not supported
libfabric:14757:core:core:ofi_hmem_init():202<info> Hmem iface FI_HMEM_ZE not supported
libfabric:14757:core:mr:ofi_default_cache_size():69<info> default cache size=4223952597
libfabric:28642:verbs:fabric:verbs_devs_print():871<info> list of verbs devices found for FI_EP_MSG:
libfabric:28642:verbs:fabric:verbs_devs_print():875<info> #1 mlx5_0 - IPoIB addresses:
libfabric:28642:verbs:fabric:verbs_devs_print():885<info> 192.168.2.2
libfabric:28642:verbs:fabric:verbs_devs_print():885<info> fe80::63f:7203:ae:a922
libfabric:14757:verbs:fabric:verbs_devs_print():871<info> list of verbs devices found for FI_EP_MSG:
libfabric:14757:verbs:fabric:verbs_devs_print():875<info> #1 mlx5_0 - IPoIB addresses:
libfabric:14757:verbs:fabric:verbs_devs_print():885<info> 192.168.2.3
libfabric:14757:verbs:fabric:verbs_devs_print():885<info> fe80::63f:7203:ae:a93e
libfabric:28642:verbs:fabric:vrb_get_device_attrs():617<info> device mlx5_0: first found active port is 1
libfabric:14757:verbs:fabric:vrb_get_device_attrs():617<info> device mlx5_0: first found active port is 1
libfabric:28642:verbs:fabric:vrb_get_device_attrs():617<info> device mlx5_0: first found active port is 1
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: verbs (111.0)
libfabric:14757:verbs:fabric:vrb_get_device_attrs():617<info> device mlx5_0: first found active port is 1
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: verbs (111.0)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: tcp (111.0)
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: tcp (111.0)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: sockets (111.0)
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: sockets (111.0)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: shm (111.0)
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: shm (111.0)
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: ofi_rxm (111.0)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: ofi_rxm (111.0)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: mlx (1.4)
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: mlx (1.4)
libfabric:14757:core:core:ofi_register_provider():427<info> registering provider: ofi_hook_noop (111.0)
libfabric:14757:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0

libfabric:14757:mlx:core:mlx_getinfo():172<info> used inject size = 1024
libfabric:14757:mlx:core:mlx_getinfo():219<info> Loaded MLX version 1.10.0
libfabric:14757:mlx:core:mlx_getinfo():266<warn> MLX: spawn support 0
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:14757:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:14757:mlx:core:mlx_getinfo():172<info> used inject size = 1024
libfabric:14757:mlx:core:mlx_getinfo():219<info> Loaded MLX version 1.10.0
libfabric:14757:mlx:core:mlx_getinfo():266<warn> MLX: spawn support 0
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:14757:core:core:fi_getinfo_():1144<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:14757:mlx:core:mlx_fabric_open():172<info>
libfabric:14757:core:core:fi_fabric_():1397<info> Opened fabric: mlx
libfabric:14757:mlx:core:ofi_check_rx_attr():785<info> Tx only caps ignored in Rx caps
libfabric:14757:mlx:core:ofi_check_tx_attr():883<info> Rx only caps ignored in Tx caps
libfabric:28642:core:core:ofi_register_provider():427<info> registering provider: ofi_hook_noop (111.0)
libfabric:28642:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:28642:mlx:core:mlx_getinfo():172<info> used inject size = 1024
libfabric:28642:mlx:core:mlx_getinfo():219<info> Loaded MLX version 1.10.0
libfabric:28642:mlx:core:mlx_getinfo():266<warn> MLX: spawn support 0
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:28642:core:core:fi_getinfo_():1117<info> Found provider with the highest priority mlx, must_use_util_prov = 0
libfabric:28642:mlx:core:mlx_getinfo():172<info> used inject size = 1024

libfabric:28642:mlx:core:mlx_getinfo():219<info> Loaded MLX version 1.10.0
libfabric:28642:mlx:core:mlx_getinfo():266<warn> MLX: spawn support 0
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, verbs has been skipped. To use verbs, please, set FI_PROVIDER=verbs
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, tcp has been skipped. To use tcp, please, set FI_PROVIDER=tcp
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, sockets has been skipped. To use sockets, please, set FI_PROVIDER=sockets
libfabric:28642:core:core:fi_getinfo_():1144<info> Since mlx can be used, shm has been skipped. To use shm, please, set FI_PROVIDER=shm
libfabric:28642:mlx:core:mlx_fabric_open():172<info>
libfabric:28642:core:core:fi_fabric_():1397<info> Opened fabric: mlx
libfabric:28642:mlx:core:ofi_check_rx_attr():785<info> Tx only caps ignored in Rx caps
libfabric:28642:mlx:core:ofi_check_tx_attr():883<info> Rx only caps ignored in Tx caps
libfabric:14757:mlx:core:ofi_check_rx_attr():785<info> Tx only caps ignored in Rx caps
libfabric:14757:mlx:core:ofi_check_tx_attr():883<info> Rx only caps ignored in Tx caps
libfabric:28642:mlx:core:ofi_check_rx_attr():785<info> Tx only caps ignored in Rx caps
libfabric:28642:mlx:core:ofi_check_tx_attr():883<info> Rx only caps ignored in Tx caps
libfabric:14757:mlx:core:mlx_cm_getname_mlx_format():73<info> Loaded UCP address: [307]...
libfabric:28642:mlx:core:mlx_cm_getname_mlx_format():73<info> Loaded UCP address: [307]...
libfabric:28642:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=2) fi_addr=0x20eff80
libfabric:14757:mlx:core:mlx_av_insert():179<warn> Try to insert address #0, offset=0 (size=2) fi_addr=0x1b0cfa0
libfabric:14757:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:14757:mlx:core:mlx_av_insert():179<warn> Try to insert address #1, offset=1024 (size=2) fi_addr=0x1b0cfa0
libfabric:14757:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:28642:mlx:core:mlx_av_insert():189<warn> address inserted
libfabric:28642:mlx:core:mlx_av_insert():179<warn> Try to insert address #1, offset=1024 (size=2) fi_addr=0x20eff80
libfabric:28642:mlx:core:mlx_av_insert():189<warn> address inserted
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.

and it writes the following to the console, which I have redirected to a file:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.1 Build 20201112 (id: b9c9d2fc5)
[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): Size of shared memory segment (857 MB per rank) * (2 local ranks) = 1715 MB total
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): detected mlx provider, set device name to "mlx"
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: 1024
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.1.1/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 153103 hpcvisualization {0,1,2,3,4,5,6,7,8,9,20,21,22,23,24,25,26,27,28,29}
[0] MPI startup(): 1 153104 hpcvisualization {10,11,12,13,14,15,16,17,18,19,30,31,32,33,34,35,36,37,38,39}
[0] MPI startup(): I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.1.1
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=30
Hello World from rank: 1 of 2 total ranks
Hello World from rank: 0 of 2 total ranks

The following run with

mpirun -check_mpi -n 2 -ppn 1 ./hello_mpi

produces the following on the error channel:

Loading compiler version 2021.1.1
Loading tbb version 2021.1.1
Loading debugger version 10.0.0
Loading compiler-rt version 2021.1.1
Loading dpl version 2021.1.1
Loading oclfpga version 2021.1.1
Loading init_opencl version 2021.1.1
Warning: Intel PAC device is not found.
Please install the Intel PAC card to execute your program on an FPGA device.
Warning: Intel PAC device is not found.
Please install the Intel PAC card to execute your program on an FPGA device.

Loading compiler/2021.1.1
Loading requirement: tbb/latest debugger/latest compiler-rt/latest dpl/latest
/opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/init_opencl /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/oclfpga
Loading mpi version 2021.1.1
Currently Loaded Modulefiles:
1) tbb/latest
2) debugger/latest
3) compiler-rt/latest
4) dpl/latest
5) /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/init_opencl
6) /opt/intel/oneapi/compiler/2021.1.1/linux/lib/oclfpga/modulefiles/oclfpga
7) compiler/2021.1.1

mpi/2021.1.1
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.

[0] INFO: CHECK LOCAL:EXIT:SIGNAL ON
[0] INFO: CHECK LOCAL:EXIT:BEFORE_MPI_FINALIZE ON
[0] INFO: CHECK LOCAL:MPI:CALL_FAILED ON
[0] INFO: CHECK LOCAL:MEMORY:OVERLAP ON
[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_MODIFICATION ON
[0] INFO: CHECK LOCAL:MEMORY:INACCESSIBLE ON
[0] INFO: CHECK LOCAL:MEMORY:ILLEGAL_ACCESS OFF
[0] INFO: CHECK LOCAL:MEMORY:INITIALIZATION OFF
[0] INFO: CHECK LOCAL:REQUEST:ILLEGAL_CALL ON
[0] INFO: CHECK LOCAL:REQUEST:NOT_FREED ON
[0] INFO: CHECK LOCAL:REQUEST:PREMATURE_FREE ON
[0] INFO: CHECK LOCAL:DATATYPE:NOT_FREED ON
[0] INFO: CHECK LOCAL:BUFFER:INSUFFICIENT_BUFFER ON
[0] INFO: CHECK GLOBAL:DEADLOCK:HARD ON
[0] INFO: CHECK GLOBAL:DEADLOCK:POTENTIAL ON
[0] INFO: CHECK GLOBAL:DEADLOCK:NO_PROGRESS ON
[0] INFO: CHECK GLOBAL:MSG:DATATYPE:MISMATCH ON
[0] INFO: CHECK GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED ON
[0] INFO: CHECK GLOBAL:MSG:PENDING ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:DATATYPE:MISMATCH ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:DATA_TRANSMISSION_CORRUPTED ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:OPERATION_MISMATCH ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:SIZE_MISMATCH ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:REDUCTION_OPERATION_MISMATCH ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:ROOT_MISMATCH ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:INVALID_PARAMETER ON
[0] INFO: CHECK GLOBAL:COLLECTIVE:COMM_FREE_MISMATCH ON
[0] INFO: maximum number of errors before aborting: CHECK-MAX-ERRORS 1
[0] INFO: maximum number of reports before aborting: CHECK-MAX-REPORTS 0 (= unlimited)
[0] INFO: maximum number of times each error is reported: CHECK-SUPPRESSION-LIMIT 10
[0] INFO: timeout for deadlock detection: DEADLOCK-TIMEOUT 60s
[0] INFO: timeout for deadlock warning: DEADLOCK-WARNING 300s
[0] INFO: maximum number of reported pending messages: CHECK-MAX-PENDING 20

[0] INFO: Error checking completed without finding any problems.

And it produces the following relevant output.

Hello World from rank: 0 of 2 total ranks
Hello World from rank: 1 of 2 total ranks

I don't know what changed between the earlier runs and today's, but even a regular run with

mpirun -n 2 -ppn 1 ./hello_mpi > out.txt

also produces correct output.

Hello World from rank: 1 of 2 total ranks
Hello World from rank: 0 of 2 total ranks

but prints the following on the error channel. Let me know if this may cause an issue in some other run.

[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
[LOG_CAT_SBGP] libnuma.so: cannot open shared object file: No such file or directory
[LOG_CAT_SBGP] Failed to dlopen libnuma.so. Fallback to GROUP_BY_SOCKET manual.
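The [LOG_CAT_SBGP] lines appear to come from the HCOLL collectives layer trying to dlopen libnuma.so at runtime; when the library is absent it falls back to grouping ranks by socket, which should not affect correctness. A quick hedged check (a sketch; the package that provides libnuma.so varies by distro, with numactl-libs and libnuma1 being common names):

```shell
# Sketch: check whether the dynamic linker can see libnuma.so,
# the library the [LOG_CAT_SBGP] warning failed to dlopen.
check_libnuma() {
    if ldconfig -p 2>/dev/null | grep -q 'libnuma\.so'; then
        echo "libnuma: found"
    else
        echo "libnuma: missing (GROUP_BY_SOCKET fallback expected)"
    fi
}
check_libnuma
```

If it reports missing, installing the distribution's numactl runtime package should silence the warning.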

I will report results from "the workstation where the original problem exists" with these same steps later today.

Thanks and Regards,

Keyur


FAQs

How do you check whether Open MPI is installed or not?

The ompi_info(1) command can be used to check the status of your Open MPI installation (located in $prefix/bin/ompi_info). Running it with no arguments provides a summary of information about your Open MPI installation.
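A minimal shell check along these lines (a sketch, assuming only a POSIX shell) confirms whether ompi_info is reachable on the current PATH:

```shell
# Sketch: report whether Open MPI's ompi_info tool is on PATH, and where.
find_ompi_info() {
    if command -v ompi_info >/dev/null 2>&1; then
        echo "ompi_info: $(command -v ompi_info)"
    else
        echo "ompi_info: not found on PATH"
    fi
}
find_ompi_info
```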

Is Intel MPI based on MPICH?

Intel MPI is an implementation based on MPICH that is optimized for Intel processors and integrates with other Intel tools (e.g., compilers and performance tools such as VTune).

Where should OpenMPI be installed?

With the --prefix option given, Open MPI binaries are installed in /usr/local/bin and shared libraries in /usr/local/lib. If you want a different installation location, replace /usr/local with your desired directory.

What is the difference between OpenMP and OpenMPI?

Open MPI provides API calls such as MPI_Send and MPI_Recv to allow communication between computation nodes. Unlike OpenMP, each computational unit has its own memory and must send its results to a master, which collects and aggregates them into the final result.

Which is better, OpenMPI or MPICH?

If it is your desktop, either is fine. Open MPI comes out of the box on MacBooks, and MPICH seems to be more Linux/Valgrind friendly. It is between you and your toolchain. If it is a production cluster, you need to do more extensive benchmarking to make sure it is optimized for your network topology.

What is the difference between OpenMP and MPICH?

With MPI, each process has its own memory space and executes independently of the other processes; processes exchange data by passing messages to each other. With OpenMP, threads share the same resources and access shared memory, so there is no notion of message passing.

How to use Intel MPI?

To launch programs linked with the Intel MPI Library, use the mpiexec command:

$ mpiexec -n <# of processes> ./myprog

Use the -n option to set the number of processes. This is the only obligatory option for the mpiexec command.

How to find the OpenMPI directory?

Did you install with apt-get? If so, you should see the Open MPI files in /usr/include/openmpi/ and /usr/lib/openmpi/lib.

How do I check my MPI version?

You can view the currently installed version of Platform MPI 7.0 (formerly HP MPI) in several ways:
  1. % $MPI_ROOT/bin/mpirun -version
  2. (on HP-UX) % swlist -l product | grep "HP MPI"
  3. (on Linux) % rpm -qa | grep "hpmpi"
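For Open MPI and Intel MPI specifically, mpirun itself reports a version banner. A hedged, portable sketch (assuming only that mpirun, if installed, accepts --version or -V):

```shell
# Sketch: print the first line of mpirun's version banner, trying the
# Open MPI flag (--version) first and the Intel MPI flag (-V) second.
mpi_version() {
    if ! command -v mpirun >/dev/null 2>&1; then
        echo "no mpirun on PATH"
        return 0
    fi
    v=$(mpirun --version 2>/dev/null | head -n 1)
    [ -n "$v" ] || v=$(mpirun -V 2>&1 | head -n 1)
    echo "${v:-unknown mpirun version}"
}
mpi_version
```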

How to remove OpenMPI completely?

Additionally, if you installed Open MPI into a tree by itself, you can simply "rm -rf" the whole tree. You can always re-run "make install" to re-install it.

Where is MPI installed on Windows?

The default installation location is C:\Cygwin. Second, unzip the MPI files into your main Cygwin directory, also keeping the directory structure from the zip file.
