[00:43:40] *** Joins: tkulasek_ (~tkulasek@192.55.54.45)
[02:01:20] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 268 seconds)
[02:02:16] *** Joins: dlw (~Thunderbi@114.255.44.139)
[03:46:46] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 260 seconds)
[03:47:06] *** Joins: dlw (~Thunderbi@114.255.44.143)
[04:39:44] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
[04:46:33] Can someone help me understand how SPDK uses hugepages? When I set up the system with HUGEMEM=2048 (the default, 2GB), how many instances of app/vhost can I start before I run out of huge pages? Two, right? That's because vhost has 1GB as the default for the -s parameter.
[04:46:38] How does the size affect runtime performance?
[04:47:03] Or in other words, what determines how much memory is needed?
[04:47:14] Malloc BDevs come to mind. What else?
[04:51:35] When using vhost-user-scsi-pci in QEMU, the virtual machine must also use huge pages. It comes out of the same pool. Is it possible to run QEMU with 1GB and vhost with something a lot smaller?
[04:57:08] *** Joins: lyan (~lyan@2605:a000:160e:3c8:4a4d:7eff:fef2:eea3)
[05:02:51] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 240 seconds)
[05:20:55] Testing shows that "-s 64" leads to "RING: Cannot reserve memory"...
[05:21:50] 128MB: bdev.c: 630:spdk_bdev_initialize: *ERROR*: create rbuf small pool failed
[06:10:15] good morning! can I get CI (re-)started please for https://review.gerrithub.io/#/c/spdk/spdk/+/416879/
[08:03:17] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-cdqaqyuxqmctnjmp)
[08:19:49] pohly: SPDK doesn't need much memory. 1GB is usually much more than required
[08:20:50] generally you need some extra memory for each bdev or target you have
[08:20:50] darsto: it works for me with 256MB. 128MB was not enough. I haven't tried anything between those two values.
[08:20:50] and then there are mallocs ofc
[08:21:50] This is just for testing, so my Malloc Bdevs are as small as possible (1MB).
[08:21:50] DPDK 18.05 offers dynamic memory allocation, so once we integrate that into SPDK, you will be able to start vhost with -s 0
[08:22:01] and then it will allocate as many hugepages as are required
[08:22:21] Ah, good to know.
[08:22:51] the error you're getting refers to a fixed-size I/O pool in SPDK
[08:23:21] I think there was some work recently to make that I/O pool size configurable via RPC
[08:24:05] but I'm not familiar with it
[08:24:58] and the number of hugepages doesn't impact runtime performance in any noticeable way
[08:26:32] speaking of DPDK 18.05, drv: could you look into this patch? https://review.gerrithub.io/c/spdk/spdk/+/416992/1
[08:27:23] apparently I'm hitting the assert from spdk_nvme_ctrlr_cmd_get_log_page, it tries to allocate 8k on TP
[08:29:24] so far we hardcode the number of log pages retrieved (?) and it's always 8k, so we could just align the buffer to 8k and safely assume it's contiguous, right?
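(Editor's sketch of the hugepage sizing discussed above. SPDK's -s option, in essence, caps how much hugepage-backed memory the app reserves from the DPDK EAL at startup; this standalone DPDK-only example is an assumption-laden illustration, not SPDK's actual vhost code, and the program name and option values are made up.)

```c
#include <rte_eal.h>
#include <rte_errno.h>

/*
 * Minimal sketch: reserve a bounded amount of hugepage memory from the
 * DPDK EAL. "-m 256" asks for 256 MB; without such a cap, legacy-mode
 * DPDK reserves all free hugepages, which is why two vhost instances
 * with the default 1 GB each exhaust a 2 GB HUGEMEM pool.
 */
int main(void)
{
	char *eal_args[] = {
		"hugepage_demo",	/* argv[0]: program name */
		"-m", "256",		/* 256 MB of hugepage-backed memory */
		"--no-pci",		/* keep the demo self-contained */
	};
	int eal_argc = sizeof(eal_args) / sizeof(eal_args[0]);

	if (rte_eal_init(eal_argc, eal_args) < 0) {
		/* e.g. not enough free hugepages for the requested 256 MB */
		return -rte_errno;
	}

	rte_eal_cleanup();
	return 0;
}
```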
[08:29:28] 8k bytes, I mean
[08:38:31] darsto: most log pages are 4K, I think, but the Get Log Page command can retrieve much larger buffers than that (the size is represented as a 32-bit number of Dwords)
[08:39:13] but Get Log Page can use an arbitrary PRP list, so it should be okay to have discontiguous physical memory for that
[08:39:29] the PRP building code should already be stepping through and building PRP entries one 4K page at a time
[08:40:59] that should apply to anything using nvme_allocate_request_user_copy(), in fact - the only things that need to be physically contiguous are the PRP lists themselves (nvme_tracker) and the SQ/CQ entries
[08:44:01] so we allocate that spdk_dma_zmalloc() temporary buffer just because the original buffer may not be dma-able?
[08:45:31] right, the user_copy helper is only used for non-fast-path admin commands so that the user doesn't have to provide a dma-able buffer
[08:47:02] I don't think we can limit the user_copy buffer size to 4K, but I also don't think it should matter (assuming spdk_vtophys() returns the right physical address/IOVA on each 4K page)
[08:48:33] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-cdqaqyuxqmctnjmp) (Ping timeout: 268 seconds)
[08:48:33] then we have a lot of outdated documentation in there
[08:48:39] e.g. nvme_payload->contig_or_cb_arg
[08:49:38] yes, the original assumption was contig == physically contiguous, but now it should be OK to just require virtually contiguous as long as vtophys is doing the right thing
[08:50:03] nvme_pcie_prp_list_append() is where the magic should happen
[08:51:37] I see
[08:52:04] can I ask you to reword those variables/docs?
[08:53:35] yes, I'll take a look
[08:54:05] I think the Create I/O {SQ,CQ} case is already special cased and doesn't use nvme_payload anyway
[08:55:05] yes, looks like nvme_pcie_ctrlr_cmd_create_io_cq() fills out the PRP1 field directly
[09:00:41] darsto: posted https://review.gerrithub.io/#/c/spdk/spdk/+/417045 - let me know if you see any other misleading comments
[09:05:47] hey drv, any idea whom I can ask to poke CI for https://review.gerrithub.io/#/c/spdk/spdk/+/416878/ and https://review.gerrithub.io/#/c/spdk/spdk/+/416879/?
[09:06:11] philipp-sk: I approved them just a bit ago - they should be in the queue now
[09:06:13] you can see the queue at https://ci.spdk.io/spdk/status/
[09:06:54] is nvme_pcie_qpair_build_contig_request() being split as well?
[09:07:11] great, thanks for the link drv
[09:07:41] darsto: that calls the prp_list_append which does the splitting
[09:07:42] it steps one ctrlr->page_size at a time, which should normally be 4K
[09:08:17] yeah - so it doesn't really matter if the buffer is contiguous or not
[09:08:21] right
[09:09:27] we did originally use the assumption that the whole buffer was physically contiguous, but that requirement was removed some time ago
[09:10:42] now that I think about it, the SGL payload type probably needs equivalent splitting (each SGL element is currently assumed to be physically contiguous)
[09:10:43] but the "contig" path should be fine
[09:11:20] the doc string for spdk_nvme_req_next_sge_cb does say that the segment must be physically contiguous
[09:11:43] so if the user is following the rules, that should be OK, but it sounds like that might not be easily possible in DPDK 18.05+
[09:12:13] the NVMe I have doesn't support HW SGL
[09:12:28] yeah, the only Intel devices that support it that I know of are the DC D3x00 (not P3x00) series
[09:29:28] does bdev_nvme make any use of HW SGLs?
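(Editor's sketch of the page-by-page PRP walk described above. The real logic lives in nvme_pcie_prp_list_append() and the contig request builder; here the translation call is a hypothetical stub standing in for spdk_vtophys() so the snippet compiles on its own.)

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-in for spdk_vtophys(): it just echoes the virtual
 * address as the "physical" one so the sketch is self-contained. */
static uint64_t vtophys_stub(void *vaddr)
{
	return (uint64_t)(uintptr_t)vaddr;
}

/*
 * Step through a virtually contiguous payload one 4K page at a time and
 * translate each page separately, so the buffer never has to be
 * physically contiguous. Only a sketch, not SPDK code.
 */
static int build_prp_entries(void *payload, size_t len,
			     uint64_t *prp, size_t max_entries)
{
	uintptr_t vaddr = (uintptr_t)payload;
	uintptr_t end = vaddr + len;
	size_t n = 0;

	while (vaddr < end) {
		if (n == max_entries) {
			return -1;	/* would need a longer PRP list */
		}
		/* First entry may carry an offset; later ones are page-aligned. */
		prp[n++] = vtophys_stub((void *)vaddr);
		vaddr = (vaddr & ~((uintptr_t)PAGE_SIZE - 1)) + PAGE_SIZE;
	}
	return (int)n;
}

int main(void)
{
	static char buf[3 * PAGE_SIZE];
	uint64_t prp[8];
	int n = build_prp_entries(buf + 100, sizeof(buf) - 200, prp, 8);

	printf("built %d PRP entries for a 3-page buffer\n", n);
	return 0;
}
```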
[09:29:33] I don't see it
[09:30:58] bdev_nvme calls spdk_nvme_ns_cmd_readv/writev, which can translate into HW SGLs if the NVMe device supports it
[09:31:56] I see
[09:45:27] sethhowe: looks like fedora-06 is down for some reason
[09:47:00] drv thx, will look into it.
[09:49:29] drv: It's back up now. Dirty shared folder.
[09:52:12] thanks
[10:12:13] dpdk got a new webpage layout
[10:12:28] looks nice
[10:19:56] that does look way better
[11:18:55] I'm attempting, for the first time, to create an issue on GitHub. Seems rather straightforward except for how to add a label to it. Per https://help.github.com/articles/applying-labels-to-issues-and-pull-requests/ one is to select a checkbox next to the items to which you want to apply a label. However, my issue has no such checkbox (nor do I see checkboxes on any of the issues). What am I missing?
[11:19:22] only admins add labels
[11:19:44] you don't need to label it - does the documentation say you need to?
[11:20:51] Thanks @bwalker. Sure would've been nice if that was documented as such on GitHub's help page (eye-roll). I didn't read any explicit requirement to add a label. I was just trying to be helpful. No good deed goes unpunished ;-)
[11:25:16] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[11:26:17] drv: getting back to NVMe - do we want to split those SGEs that aren't physically contiguous?
[11:26:17] I think we should
[11:26:47] k, let me craft something
[11:26:52] that could be fairly complicated, but it is necessary for correctness if DPDK is going to provide non-physically-contiguous buffers from rte_malloc
[11:30:34] hmm SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
[11:31:31] yeah we just added support in the target
[11:31:50] but the host hasn't needed it previously
[11:56:31] *** Quits: tkulasek_ (~tkulasek@192.55.54.45) (Ping timeout: 268 seconds)
[12:12:54] I pushed some patches for pcie, but will hold off on rdma
[12:14:09] rdma doesn't allocate any trackers, and managing those seems complicated judging by the pcie code
[12:35:04] RDMA should not care about physically contiguous buffers, at least as far as I understand the ibverbs API
[12:37:57] yeah RDMA doesn't need them to be physically contiguous
[12:40:21] but we should add multiple SGL entry support to it
[12:41:52] well, it's the same scenario as with VFIO iovas
[12:42:37] we could be passing a buffer that was registered with multiple ibv_reg_mr calls
[12:43:23] can rte_malloc() end up returning pieces from multiple DPDK memory regions?
[12:43:53] that's what I've been experiencing
[12:43:53] the env_dpdk/memory.c code that calls the notifier that eventually calls ibv_reg_mr() coalesces virtually contiguous regions, so it should be OK as long as the DPDK allocations don't span regions
[12:43:53] hmm
[12:44:24] that would be a problem if it can happen, yes
[12:45:24] the NVMe tracker list was discontiguous in my case. I was getting syslog errors from the IOMMU about unprivileged DMA at addr 0
[12:45:54] and that tracker list comes from a single spdk_dma_zmalloc
[12:46:25] I did actually check, and the memory was not physically contiguous
[12:47:25] iova-contiguous, I mean
[12:50:27] hmm, the key thing is whether it was from a single memory_hotplug_cb, though
[12:50:57] I'm not sure if a single hotplug CB can have a non-iova-contiguous region?
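(Editor's sketch of the SGE splitting being discussed: walk one virtually contiguous element in 4K steps and start a new element whenever the translation is not contiguous with the previous one. The translation function is a made-up stand-in for something like spdk_vtophys(), and the forced 1 MB jump exists only to demonstrate a split; none of this is SPDK code.)

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE 4096u

struct sge { uint64_t addr; uint32_t len; };

/* Hypothetical translation: identity mapping with an artificial 1 MB jump
 * once vaddr crosses split_at, just to force a split in the demo. */
static uint64_t xlate(uintptr_t vaddr, uintptr_t split_at)
{
	uint64_t iova = (uint64_t)vaddr;

	if (vaddr >= split_at) {
		iova += 1024 * 1024;
	}
	return iova;
}

/* Split [vaddr, vaddr + len) into elements that are each IOVA-contiguous,
 * coalescing neighbouring 4K pages whenever their translations line up. */
static size_t split_sgl(uintptr_t vaddr, size_t len, uintptr_t split_at,
			struct sge *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		uint64_t iova = xlate(vaddr, split_at);
		/* bytes remaining in the current 4K page */
		uint32_t chunk = PAGE - (vaddr & (PAGE - 1));

		if (chunk > len) {
			chunk = (uint32_t)len;
		}
		if (n > 0 && out[n - 1].addr + out[n - 1].len == iova) {
			out[n - 1].len += chunk;	/* still contiguous: extend */
		} else {
			if (n == max) {
				return n;		/* out of SGE slots */
			}
			out[n].addr = iova;
			out[n].len = chunk;
			n++;
		}
		vaddr += chunk;
		len -= chunk;
	}
	return n;
}

int main(void)
{
	static char buf[4 * PAGE];
	struct sge sges[8];
	uintptr_t base = (uintptr_t)buf;
	/* Pretend the mapping changes halfway through the buffer. */
	size_t n = split_sgl(base, sizeof(buf), base + 2 * PAGE, sges, 8);

	printf("buffer split into %zu IOVA-contiguous elements\n", n);
	return 0;
}
```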
[13:00:36] let me look through the DPDK code
[13:25:26] an area from a single hotplug CB will always be iova-contiguous
[13:26:51] *** Quits: lyan (~lyan@2605:a000:160e:3c8:4a4d:7eff:fef2:eea3) (Ping timeout: 265 seconds)
[13:53:39] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[13:55:43] *** Joins: travis-ci (~travis-ci@ec2-54-205-136-216.compute-1.amazonaws.com)
[13:55:44] (spdk/master) lib: return error in case of start_fn is null in spdk_app_start() (Leonid Ravich)
[13:55:44] Diff URL: https://github.com/spdk/spdk/compare/322e50468cb7...2a92c30292bd
[13:55:44] *** Parts: travis-ci (~travis-ci@ec2-54-205-136-216.compute-1.amazonaws.com) ()
[13:58:01] darsto: so that means that DPDK will potentially combine (virtually-contiguous) regions from multiple memory hotplug CBs in a single rte_malloc() allocation?
[14:18:41] it seems so, but let me check that tomorrow
[14:19:11] I do not trust myself at this hour
[14:22:51] sure, no rush
[14:23:13] if that is the case, then there is definitely a problem with RDMA, since we will only ibv_reg_mr() contiguous regions within a single hotplug CB
[14:24:58] *** Quits: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) (Quit: Leaving.)
[14:25:57] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Quit: Page closed)
[15:09:04] *** Joins: travis-ci (~travis-ci@ec2-54-205-233-51.compute-1.amazonaws.com)
[15:09:04] (spdk/master) spdkcli: remove print_array when creating lvol store (Karol Latecki)
[15:09:04] Diff URL: https://github.com/spdk/spdk/compare/2a92c30292bd...fccd03b1f0d4
[15:09:04] *** Parts: travis-ci (~travis-ci@ec2-54-205-233-51.compute-1.amazonaws.com) ()
[16:02:30] *** Joins: travis-ci (~travis-ci@ec2-23-20-192-84.compute-1.amazonaws.com)
[16:02:30] (spdk/master) nvme/rdma: factor out Connect command (Daniel Verkamp)
[16:02:30] Diff URL: https://github.com/spdk/spdk/compare/fccd03b1f0d4...1d260441b437
[16:02:30] *** Parts: travis-ci (~travis-ci@ec2-23-20-192-84.compute-1.amazonaws.com) ()
[16:33:55] *** Joins: travis-ci (~travis-ci@ec2-54-82-23-81.compute-1.amazonaws.com)
[16:33:56] (spdk/master) env/dpdk: do not pass raw memzone flags param to DPDK (Dariusz Stojaczyk)
[16:33:57] Diff URL: https://github.com/spdk/spdk/compare/1d260441b437...70df6cb890d2
[16:33:57] *** Parts: travis-ci (~travis-ci@ec2-54-82-23-81.compute-1.amazonaws.com) ()
[17:35:54] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[17:43:17] *** Quits: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[17:44:22] Hi, a little related to Ben's patches: do you know why iov_base = (buf + 512) & ~511 is used in spdk_bdev_io_set_buf()? I think if 512 is used, buf will always be padded even if it is already 512-byte aligned. So (buf + 511) & ~511?
[17:45:48] This is not a critical question.
[17:45:48] hmm, that is probably a bug
[17:45:55] it should be 511 as you noted
[17:46:19] I suppose it does not hurt since there is an extra 512 bytes allocated for each buffer
[17:49:20] drv: Thank you, I got it. What should I do about that?
[17:49:31] Should it be fixed?
[17:50:26] I think we can keep it as 512 for now since that is consistent with the existing code, but it could be fixed in another patch
[17:50:34] if you could add a comment to the review, that would be good
[17:50:54] OK, I'll do that, thanks.
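(Editor's illustration of the alignment point above, using a hypothetical address value: (buf + 511) & ~511 rounds up only when needed, while (buf + 512) always pads, which is harmless here only because an extra 512 bytes are allocated per buffer.)

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uintptr_t already_aligned = 0x1000;	/* 0x1000 % 512 == 0 */

	uintptr_t pad_512 = (already_aligned + 512) & ~(uintptr_t)511;
	uintptr_t pad_511 = (already_aligned + 511) & ~(uintptr_t)511;

	printf("with +512: %#lx (pushed to the next 512-byte boundary)\n",
	       (unsigned long)pad_512);		/* prints 0x1200 */
	printf("with +511: %#lx (unchanged, already aligned)\n",
	       (unsigned long)pad_511);		/* prints 0x1000 */
	return 0;
}
```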
[17:51:02] and I will remind bwalker to take a look tomorrow
[18:38:49] *** Joins: dlw (~Thunderbi@114.255.44.143)
[19:31:05] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[20:05:49] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp140-04-65-92-69-234.dsl.bell.ca) (Ping timeout: 260 seconds)
[23:02:39] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
[23:28:22] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-jetyqhpwgyavabzb)