[00:01:08] *** Quits: mszwed (~mszwed@134.134.139.78) (Ping timeout: 240 seconds)
[01:32:13] *** Joins: mszwed (~mszwed@134.134.139.76)
[01:35:24] *** Joins: mszwed_ (~mszwed@192.55.54.45)
[01:36:40] *** Quits: mszwed (~mszwed@134.134.139.76) (Ping timeout: 260 seconds)
[01:47:17] *** Quits: kjakimia (~kjakimia@192.55.54.44) (*.net *.split)
[02:31:17] *** Joins: kjakimia (~kjakimia@192.55.54.44)
[07:00:53] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-dowraasvwtuqhfdf)
[07:08:10] *** Quits: mszwed_ (~mszwed@192.55.54.45) (Ping timeout: 276 seconds)
[07:22:05] *** Joins: karan (402f1639@gateway/web/freenode/ip.64.47.22.57)
[07:26:02] *** Quits: karan (402f1639@gateway/web/freenode/ip.64.47.22.57) (Client Quit)
[07:26:46] *** Joins: karan (402f1639@gateway/web/freenode/ip.64.47.22.57)
[07:31:45] Hi everyone, I have a question about the iSCSI target in SPDK. It appears to me that an iSCSI target can only run on one core - that means all connections to that target are handled by one CPU core. This is a huge bottleneck. In my experiments on a 100Gbit network, I cannot exceed ~2.5GB/sec because of this limitation. To get around it I have to add more iSCSI targets. Now each of my LUNs is on its own iSCSI target. This is very inconvenient.
[07:32:25] I am referring to this function: spdk_iscsi_conn_get_migrate_event in file: lib/iscsi/conn.c
[07:33:04] A comment here reads: /** * There are other active connections for this target node. * Ignore the lcore specified by the allocator and use the * the target node's lcore to ensure this connection runs on * the same lcore as other connections for this target node. */
[07:34:48] *** Joins: mszwed (~mszwed@192.55.55.41)
[08:12:07] *** Joins: mszwed_ (~mszwed@134.134.139.75)
[08:12:07] *** Quits: mszwed (~mszwed@192.55.55.41) (Remote host closed the connection)
[08:37:22] another blob question: is there a maximum number of spdk_blobs that can all be open at the same time?
[09:02:31] *** Joins: pzedlews_ (~pzedlews@109241096052.gdansk.vectranet.pl)
[09:19:36] *** Quits: pzedlews_ (~pzedlews@109241096052.gdansk.vectranet.pl) (Remote host closed the connection)
[09:46:50] nKumar, so bwalker might be on later with a more definitive answer, but looking through the code I see nothing other than memory resources that would prevent you from opening as many as you were able to create. Are you getting an error of some kind on open?
[09:49:06] nKumar: peluse is correct - it's limited only by available memory
[09:49:22] * peluse high fives himself
[09:52:25] karan: all of the connections for one TargetNode must be processed on the same core, but you can have lots of TargetNodes inside of one iSCSI target
[09:52:38] each device is usually a separate TargetNode
[09:53:01] @peluse @bwalker, was just checking. Thanks for the insight!
[09:53:19] the reason they all have to be on the same core is that there is a ton of shared state between related connections in iSCSI
[09:53:25] so you either do it all on one core or you take locks
[09:53:47] and it's way faster to do it all on one core, especially given that iSCSI tends to have very few connections per TargetNode (unlike NVMe-oF)
[09:54:16] removing that shared state between related connections is one of the primary advantages of NVMe-oF's design
[09:54:22] relative to iSCSI
[10:01:12] @bwalker thanks for that.
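A rough sketch of the layout described above - one TargetNode per device, all inside a single iSCSI target application - using the legacy iscsi.conf-style configuration file. The section and key names below (TargetName, Mapping, LUN0, the Malloc bdev names, the portal/initiator group references) are illustrative and may differ between SPDK versions:

    [TargetNode1]
      TargetName disk1
      Mapping PortalGroup1 InitiatorGroup1
      AuthMethod Auto
      UseDigest Auto
      QueueDepth 64
      LUN0 Malloc0    # connections to this node all run on one core

    [TargetNode2]
      TargetName disk2
      Mapping PortalGroup1 InitiatorGroup1
      AuthMethod Auto
      UseDigest Auto
      QueueDepth 64
      LUN0 Malloc1    # a second node can be scheduled on a different core

Each TargetNode keeps its connections on a single core (for the shared-state reason given above), but separate TargetNodes can land on separate cores, so spreading LUNs across TargetNodes spreads the load without running multiple iSCSI target processes.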
[10:18:50] drv, thx for merging that UT patch! BTW whacky GH labeled all the others as merge conflicts even though there weren't any, and they were one long chain - maybe because that one was in the middle of the chain or something. Anyway, I rebased the 1st and got no conflict; when I pushed, it removed the conflict tag from all the others too, so at least it got that right :)
[10:19:42] peluse: hmm, looks like some of the "UNIT TEST" ones are back somehow
[10:21:39] oh geeze, this is never going to end is it? :)
[10:22:09] maybe they will all change back again once they go through CI, stranger things have happened
[10:23:02] if not, after they're all done, I'll run through them and fix the commit msgs for a 3rd time, ugh
[10:43:35] dang it, it looks like the old commit msgs are not going away after all. I've got an appt here soon but when I get back I'll clobber my local repo, as maybe that's where it's getting this history from. Either way, will get them all back up to snuff on the commit msgs before EOD..
[10:45:14] *** Joins: mszwed (~mszwed@134.134.139.75)
[10:45:14] *** Quits: mszwed_ (~mszwed@134.134.139.75) (Remote host closed the connection)
[11:03:02] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[11:18:12] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[11:18:25] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[11:54:40] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[11:54:49] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[12:07:19] *** Joins: ppelplin_ (~ppelplin@192.55.54.44)
[12:15:36] *** Quits: mszwed (~mszwed@134.134.139.75) (Ping timeout: 240 seconds)
[12:15:36] *** ppelplin_ is now known as mszwed
[12:58:48] Is there any way to attach a remote NVMf controller in a secondary process? I'm trying to run the nvme/identify example as a secondary process to query a remote NVMf device, and it is segfaulting trying to lock the controller just before calling the attach_cb
[13:00:38] that's a good question and I don't know off the top of my head
[13:00:51] there is all sorts of complexity there that I need to think about
[13:01:16] let me dig through the code
[13:02:10] Looking at the code it looks like it may be currently impossible, because the function which allocates the RDMA transport controller object uses calloc() instead of a DPDK/memzone allocation function like the PCIe transport does
[13:02:34] yeah - there are probably a bunch of things like that
[13:02:46] those are all easy fixes - the part that I'm concerned about is how memory registration would work
[13:02:52] with the RDMA NIC
[13:04:47] are you using the controller from both the primary and the secondary?
[13:04:52] or do you only want to use it from one or the other?
[13:05:03] I only need it in the secondary process
[13:05:10] ok, that may save us
[13:10:54] so your code first probes in the primary process
[13:10:59] but you say no to all of the devices there
[13:11:00] right?
[13:14:58] right now I'm saying yes because I thought I had to
[13:15:38] in the primary process call spdk_nvme_probe(NULL, NULL, probe_cb, attach_cb, remove_cb), and implement probe_cb to always return false
[13:15:48] then, the primary won't create any NVMe controllers
[13:16:02] but it will initialize the necessary driver structures in shared memory
[13:16:28] ok, I will give that a try, thanks
[13:16:29] then in the secondary, call spdk_nvme_probe again, providing the transport ID for the NVMf device
[13:17:05] that second probe should create the nvmf controller
[13:17:11] using calloc, but it's created in the correct process
[13:17:18] so I think it will get you past the issue
[13:17:35] no guarantees you don't hit more - you are the first I'm aware of trying to combine NVMf and multiprocess
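A minimal sketch of the two-probe workaround just described. It assumes the spdk_nvme_probe() callback signatures from spdk/nvme.h and the SPDK_NVMF_DISCOVERY_NQN constant from spdk/nvmf_spec.h; the RDMA/IPv4 transport, address, and service ID values are placeholders, and env/app initialization (including the shared memory ID) is omitted:

    #include "spdk/stdinc.h"
    #include "spdk/nvme.h"
    #include "spdk/nvmf_spec.h"   /* for SPDK_NVMF_DISCOVERY_NQN */

    /* Primary process: run the probe so the shared driver structures are set
     * up, but decline every controller so nothing is attached here. */
    static bool
    primary_probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                     struct spdk_nvme_ctrlr_opts *opts)
    {
            return false;
    }

    /* In the primary:
     *     spdk_nvme_probe(NULL, NULL, primary_probe_cb, NULL, NULL);
     */

    /* Secondary process: probe again with the NVMe-oF transport ID so the
     * controller object is allocated (via calloc) in this process. */
    static bool
    secondary_probe_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                       struct spdk_nvme_ctrlr_opts *opts)
    {
            return true;
    }

    static void
    secondary_attach_cb(void *cb_ctx, const struct spdk_nvme_transport_id *trid,
                        struct spdk_nvme_ctrlr *ctrlr,
                        const struct spdk_nvme_ctrlr_opts *opts)
    {
            printf("attached controller at %s\n", trid->traddr);
    }

    static int
    probe_remote_target(void)
    {
            struct spdk_nvme_transport_id trid;

            memset(&trid, 0, sizeof(trid));
            trid.trtype = SPDK_NVME_TRANSPORT_RDMA;
            trid.adrfam = SPDK_NVMF_ADRFAM_IPV4;
            snprintf(trid.traddr, sizeof(trid.traddr), "192.168.0.10"); /* placeholder */
            snprintf(trid.trsvcid, sizeof(trid.trsvcid), "4420");       /* placeholder */
            snprintf(trid.subnqn, sizeof(trid.subnqn), "%s", SPDK_NVMF_DISCOVERY_NQN);

            return spdk_nvme_probe(&trid, NULL, secondary_probe_cb,
                                   secondary_attach_cb, NULL);
    }

The key point is that probe_cb returns false in the primary (driver state gets initialized, no controller is created) and true in the secondary (the controller object is allocated in the process that will actually use it).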
[13:27:53] ok, I did what you suggested, and in the secondary process I give the discovery NQN and the probe_cb gets called with the NQN of the controller, but the attach_cb does not get called
[13:28:34] that seems to imply that SPDK is making the connection to the target but it isn't getting added to the attached device list
[13:28:35] the probe_cb in the secondary process returns true, right?
[13:28:49] the probe_cb in the primary must return false, but the probe_cb in the secondary must return true
[13:28:54] yes
[13:29:04] ok, let me look at what may be happening
[13:31:05] just to confirm - if you run our identify example connecting to the same target with the same discovery NQN, but as the primary
[13:31:07] it all works, right?
[13:33:27] and you don't see any error messages either, right?
[13:34:50] I see the problem actually
[13:34:58] let me see what I can do about it
[13:41:25] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[13:43:29] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[13:44:50] patrickmacarthur: do you need to make any SPDK calls from the primary? If you don't, I may have a quick hack to make this work
[13:48:04] Pausing the build pool for a few minutes to reconfigure some of the machines.
[14:00:36] SPDK calls, not at the moment
[14:02:39] ok, then try this
[14:02:49] in lib/env_dpdk/env.c
[14:02:57] change spdk_process_is_primary to always return true
[14:03:07] then delete the call to spdk_nvme_probe from your primary process
[14:03:42] if you only make SPDK calls from the secondary, I think that may fix it
[14:12:41] bwalker: that makes it work
[14:13:03] great
[14:13:11] I'll think about a real fix, but at least you are unblocked
[14:13:51] the problem was that we didn't account for the case where a device was initially discovered by the secondary process
[14:38:39] *** Quits: lhodev (~Adium@inet-hqmc07-o.oracle.com) (Quit: Leaving.)
[15:01:57] *** Quits: sethhowe (sethhowe@nat/intel/x-dsgadnhurjodckdv) (Remote host closed the connection)
[15:02:12] *** Joins: sethhowe (~sethhowe@192.55.55.41)
[15:07:19] *** Joins: lhodev (~Adium@inet-hqmc01-o.oracle.com)
[18:26:22] *** Quits: lhodev (~Adium@inet-hqmc01-o.oracle.com) (Quit: Leaving.)
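The quick hack suggested at 14:02 amounts to something like the following change in lib/env_dpdk/env.c. It is a temporary workaround for a setup where only the secondary process makes SPDK calls, not a real fix; the commented-out line shows what the function normally returns in the DPDK-backed env implementation:

    /* lib/env_dpdk/env.c -- temporary workaround: pretend this process is the
     * primary so the NVMe driver allocates its state locally instead of
     * looking it up in the real primary's shared memory. Only safe if no
     * other process makes SPDK calls. */
    bool
    spdk_process_is_primary(void)
    {
            return true;
            /* normally: return (rte_eal_process_type() == RTE_PROC_PRIMARY); */
    }

With this in place, the call to spdk_nvme_probe in the primary process is deleted entirely, as described above.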
[18:59:30] Hi, I just tried the latest SPDK code on GitHub. I started the SPDK example "nvme_manage" as the primary process (-i 0) and kept it running. Then I started the SPDK "nvmf_tgt" as the secondary process (-i 0) with one subsystem. Later I used the native nvme discover/connect commands, and they work there. I also used the SPDK clone of nvme; the connect command works. The SPDK "identify" tool failed due to the "calloc" issue mentioned earlier.
[18:59:31] For the primary process, it returns "true" for all the devices in probe_cb() and attaches them all. As Ben mentioned, so far the secondary process can not attach new process. Not sure whether this matches your usage of NVMe-oF and multiprocess.
[19:00:55] a typo there, the secondary process can not probe/attach a new controller
[20:22:32] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-dowraasvwtuqhfdf) (Quit: Connection closed for inactivity)
[23:54:54] *** Joins: mszwed_ (~mszwed@192.55.54.44)
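For reference, the multi-process launch described at 18:59 boils down to starting both binaries with the same shared memory group ID (-i). The binary paths and the config-file flag below are illustrative and depend on the SPDK tree and version being built:

    # primary: one of the NVMe examples, shared memory group 0
    ./examples/nvme/nvme_manage/nvme_manage -i 0

    # secondary: the NVMe-oF target with one subsystem configured, same -i value
    ./app/nvmf_tgt/nvmf_tgt -c nvmf.conf -i 0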