[00:45:22] *** Quits: guerby (~guerby@april/board/guerby) (Ping timeout: 256 seconds)
[02:12:50] *** Joins: guerby (~guerby@ip165.tetaneutral.net)
[02:12:50] *** Quits: guerby (~guerby@ip165.tetaneutral.net) (Changing host)
[02:12:50] *** Joins: guerby (~guerby@april/board/guerby)
[02:35:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[04:22:33] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-rapzmbkxpihrofww) (Ping timeout: 264 seconds)
[06:18:20] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-ksohefcqbvwdpchq)
[07:57:46] *** Joins: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net)
[10:27:29] *** Quits: guerby (~guerby@april/board/guerby) (Ping timeout: 256 seconds)
[10:29:02] *** Joins: guerby (~guerby@april/board/guerby)
[11:40:44] *** Joins: EdR_ (d8f01e19@gateway/web/freenode/ip.216.240.30.25)
[11:41:44] This may be a long shot, but I'm hoping somebody has seen this issue on FC25 and FC26. I'm trying to diagnose connection issues and am seeing this:
[11:41:48] [edwinr@ssan-rx2560-01:20]~/Download/librdmacm/examples(master)> sudo ./rping -s -C 10 -v rdma_create_event_channel: No such device
[11:42:03] [edwinr@ssan-rx2560-01:20]~/Download/librdmacm/examples(master)> sudo ./rping -s -C 10 -v
[11:42:21] rdma_create_event_channel: No such device
[11:42:50] I've started the rdma service and disabled the firewall and selinux
[11:42:53] does ibv_devinfo show any devices?
[11:43:30] ibstat does
[11:44:08] ifconfig shows the ports up and I can use ping across the 2 nodes
[11:44:56] you'll need ibv_devinfo to show your devices - this lists which RDMA devices are available for use from userspace
[11:45:43] you may want to check that you have all of the necessary kernel modules loaded - in spdk, doc/nvmf.md shows the list
[11:45:51] I ran modprobe mlx4_ib mlx4_core mlx4_en ib_core ib_umad ib_ucm ib_uverbs ib_cm rdma_cm rdma_ucm nvme_fabrics nvme_rdma
[11:48:47] well that looks like the full list
[11:51:17] ibv_devinfo is returning "No IB devices found" while ibstat shows the port with link type Ethernet
[11:53:16] I'd like to use IPoIB
[12:01:43] hmmm - you said you've seen this issue on FC25 and FC26 - do you see it working on other Linux distros or FC versions?
[12:07:38] This is one of 3 machines we're reconfiguring to be able to run the spdk autotest scripts. They had RHEL7.2 installed and were able to communicate - we used spdk tools like identify and perf to connect to an nvmf_tgt app remotely. We put FC25 on 2 of them, and FC26 on the third so that we align with the CI machines Intel has running.
[12:08:49] I'm thinking something changed with FC25+ that I'm missing.
[12:09:44] I switched the ports to IB mode and ibv_devinfo still says "No IB devices found"
[12:16:27] Yes, there is a Mellanox card installed: 81:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[12:18:42] do you have Mellanox OFED installed? wondering if this is an incompatibility between Fedora ibv packages and the Mellanox driver
[12:18:54] i'd say hold off until drv (daniel) chimes in
[12:24:16] I'd recommend using the ibverbs stack/drivers shipped by Fedora rather than installing the Mellanox OFED
[12:24:37] is that what we're doing in the test pool?
[12:24:41] yes
[12:24:48] cool
[12:25:21] and if you want to try to run exactly what's on our test pool, you can run autotest.sh (or one of the sub-scripts) - they will load all the required kernel drivers in the right order
[12:26:24] oh - that might be the problem
[12:26:35] ed - looks like you have the mlx drivers first in your modprobe line
[12:26:43] our script loads them last
[12:27:15] I'm not sure why it doesn't work if you load them in a different order, but we've definitely seen cases where it matters
[12:33:06] I checked on a RHEL7.4 machine, and ibv_devinfo returns "hca_id: mlx4_0 ..."
[12:36:56] could we run unit tests on the FreeBSD system?
[12:37:54] i'm writing unit tests for this new sock_group abstraction which would test our kqueue implementation on FreeBSD
[12:40:47] unit tests should be running on FreeBSD unless somebody turned it off
[12:41:22] looks off if I'm reading the log correctly
[12:41:39] you're right, not sure how that is happening
[12:41:52] SPDK_TEST_UNITTEST is not set in autorun-spdk.conf on that machine
[12:42:23] never mind, they are running - i just misread the log
[12:42:50] the part at the top of scripts/autotest_common.sh is a bit confusing - the "export" for each variable is after its value
[12:43:12] ah
[12:43:27] side effect of how 'set -x' works - it doesn't show the variable name in the assignment we're doing
[13:07:26] Eureka! The missing lib is libmlx4 :)
[13:07:43] excellent :)
[13:08:10] *** Parts: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net) ()
[13:10:09] On RHEL7, libmlx4 is part of libibverbs
[13:15:23] *** Joins: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net)
[13:57:17] *** Quits: ChanServ (ChanServ@services.) (*.net *.split)
[14:31:37] *** Joins: ChanServ (ChanServ@services.)
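For anyone retracing the RDMA troubleshooting above, here is a rough shell sketch of the two checks that came up: loading the kernel modules with the mlx drivers last, then confirming that ibv_devinfo lists a device. The module names come from the modprobe line quoted in the log. The function names, the DRY_RUN switch, and the exact order within each group are assumptions made for illustration; the actual order used by the SPDK test scripts is not shown here.

```shell
#!/bin/sh
# Sketch only. Assumption: generic IB/RDMA/NVMe modules first, mlx4 modules
# last, as described in the log ("our script loads them last").
GENERIC_MODULES="ib_core ib_umad ib_ucm ib_uverbs ib_cm rdma_cm rdma_ucm nvme_fabrics nvme_rdma"
MLX_MODULES="mlx4_core mlx4_ib mlx4_en"

# Print one module name per line, in the order they would be loaded.
rdma_module_order() {
    for m in $GENERIC_MODULES $MLX_MODULES; do
        echo "$m"
    done
}

# Load the modules one at a time. A plain "modprobe a b c" (without -a)
# treats b and c as parameters for module a, so each module gets its own call.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands.
load_rdma_modules() {
    for m in $(rdma_module_order); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "modprobe $m"
        else
            modprobe "$m" || return 1
        fi
    done
}

# Read ibv_devinfo output on stdin; succeed only if a device is listed.
has_rdma_devices() {
    grep -q 'hca_id:'
}
```

As root, something like `DRY_RUN=0 load_rdma_modules && ibv_devinfo | has_rdma_devices` would chain the two steps. If the modules load but the check still fails, the log suggests looking at the userspace side (libmlx4 in Ed's case) rather than the kernel modules.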
[14:31:38] *** wolfe.freenode.net sets mode: +o ChanServ
[14:56:52] *** ChanServ sets mode: +o peluse
[15:06:24] *** Joins: EdR__ (43ed56b6@gateway/web/freenode/ip.67.237.86.182)
[15:06:59] Are there known issues with RDMA loopback on RHEL7.4 ?
[15:19:12] EdR__: I think we had some NICs where RDMA loopback didn't work unless a cable was physically plugged in
[15:31:35] darsto: have you seen changpe1's reply on https://review.gerrithub.io/#/c/386546/ ? Is it something we can resolve after we merge this?
[15:43:16] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[15:44:45] jimharris: it looks like the part of autobuild.sh that tries to verify that Makefile dependencies work is causing a 20-second rebuild time - maybe we could choose something with less deps
[15:45:08] seems to be different across machines; some are much faster than that
[15:46:43] sounds good to me - will you or sethhowe look at it or would you like me to?
[15:46:54] I'm not currently working on it (just added some timing markers)
[15:47:04] so whoever wants to work on it is welcome
[15:47:19] worst-case example: https://ci.spdk.io/spdk/builds/review/8869e4eef4edac4902c6d3853bd016cafc343f29.1518129392/fedora-01/timing.svg
[15:47:34] it might actually be because we are rebuilding without scan-build, which causes a full rebuild
[15:47:58] (only fedora-01 is doing scan-build, so that's why it's much longer)
[15:56:41] I can take a deeper look into it.
[15:57:17] it might be as simple as adding $scanbuild in front of the $MAKE command in the dependency check section
[15:57:37] although that will re-run scan-build, which we don't really want either
[15:58:21] we could probably skip the dependency check if scan-build is enabled
[15:58:38] true. running scanbuild would be worse
[15:59:04] I think this will hit the 'make install' test as well - if we skip the dependency check, then the 'make install' step will cause a full rebuild instead
[15:59:14] so we should probably skip both of those if scan-build is enabled
[15:59:43] That sounds good to me.
[16:00:04] either way, it's not the end of the world, since it only adds up to ~30 seconds on the scan-build machine, so maybe we should just leave it alone
[16:00:33] or just don't run the dependency check on all systems?
[16:00:54] oh - i'm guessing the scan-build system is not the long pole in the tent currently
[16:01:15] It's not.
[16:02:47] the dependency check thing hasn't failed in a long time; we could just remove it
[16:10:04] *** Quits: EdR__ (43ed56b6@gateway/web/freenode/ip.67.237.86.182) (Ping timeout: 260 seconds)
[17:17:41] *** Joins: VKon (cf8c2b51@gateway/web/freenode/ip.207.140.43.81)
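One of the options floated above (skip both the dependency check and the 'make install' test when scan-build is enabled) could look roughly like the sketch below. It is not taken from autobuild.sh: the function name and the step labels are hypothetical, and the only thing borrowed from the discussion is the idea that a non-empty $scanbuild prefix means scan-build is in use.

```shell
#!/bin/sh
# Hypothetical helper, not the actual autobuild.sh logic.
# $1 is the scan-build command prefix; empty means scan-build is disabled.
# Prints the rebuild-based steps that should still run, one per line.
rebuild_steps_to_run() {
    if [ -n "$1" ]; then
        # With scan-build enabled, a plain rebuild would trigger a full
        # rebuild, so both rebuild-based steps are skipped.
        return 0
    fi
    echo "dependency_check"
    echo "make_install"
}
```

A caller could loop over `$(rebuild_steps_to_run "$scanbuild")` and run each named step. The discussion also considered simply removing the dependency check everywhere, which would make a helper like this unnecessary.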