[00:43:40] *** Joins: tkulasek_ (~tkulasek@192.55.54.45)
[02:01:20] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 268 seconds)
[02:02:16] *** Joins: dlw (~Thunderbi@114.255.44.139)
[03:46:46] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 260 seconds)
[03:47:06] *** Joins: dlw (~Thunderbi@114.255.44.143)
[04:39:44] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
[04:46:33] Can someone help me understand how SPDK uses hugepages? When I set up the system with HUGEMEM=2048 (the default, 2GB), how many instances of app/vhost can I start before I run out of huge pages? Two, right? That's because vhost has 1GB as the default for the -s parameter.
[04:46:38] How does the size affect runtime performance?
[04:47:03] Or in other words, what determines how much memory is needed?
[04:47:14] Malloc BDevs come to mind. What else?
[04:51:35] When using vhost-user-scsi-pci in QEMU, the virtual machine must also use huge pages. It comes out of the same pool. Is it possible to run QEMU with 1GB and vhost with something a lot smaller?
[04:57:08] *** Joins: lyan (~lyan@2605:a000:160e:3c8:4a4d:7eff:fef2:eea3)
[05:02:51] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 240 seconds)
[05:20:55] Testing shows that "-s 64" leads to "RING: Cannot reserve memory"...
[05:21:50] 128MB: bdev.c: 630:spdk_bdev_initialize: *ERROR*: create rbuf small pool failed
[06:10:15] good morning! can I get CI (re-)started please for https://review.gerrithub.io/#/c/spdk/spdk/+/416879/
[08:03:17] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-cdqaqyuxqmctnjmp)
[08:19:49] pohly: SPDK doesn't need much memory. 1GB is usually much more than required
[08:20:50] generally you need some extra memory for each bdev or target you have
[08:20:50] darsto: it works for me with 256MB. 128MB was not enough. I haven't tried anything between those two values.
[08:20:50] and then there are mallocs ofc
[08:21:50] This is just for testing, so my Malloc Bdevs are as small as possible (1MB).
[08:21:50] DPDK 18.05 offers dynamic memory allocation, so once we integrate that into SPDK, you will be able to start vhost with -s 0
[08:22:01] and then it will allocate as many hugepages as are required
[08:22:21] Ah, good to know.
[08:22:51] the error you're getting refers to a fixed-size I/O pool in SPDK
[08:23:21] I think there was some work recently to make that I/O pool size configurable via RPC
[08:24:05] but I'm not familiar with it
[08:24:58] and the number of hugepages doesn't impact runtime performance in any noticeable way
[08:26:32] speaking of DPDK 18.05, drv: could you look into this patch? https://review.gerrithub.io/c/spdk/spdk/+/416992/1
[08:27:23] apparently I'm hitting the assert from spdk_nvme_ctrlr_cmd_get_log_page, it tries to allocate 8k on TP
[08:29:24] so far we hardcode the number of log pages retrieved (?) and it's always 8k, so we could just align the buffer to 8k and safely assume it's contiguous, right?
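(Editor's sketch of the hugepage sizing discussed above. SPDK's -s option, in essence, caps how much hugepage-backed memory the app reserves from the DPDK EAL at startup; this standalone DPDK-only example is an assumption-laden illustration, not SPDK's actual vhost code, and the program name and option values are made up.)

```c
#include <rte_eal.h>
#include <rte_errno.h>

/*
 * Minimal sketch: reserve a bounded amount of hugepage memory from the
 * DPDK EAL. "-m 256" asks for 256 MB; without such a cap, legacy-mode
 * DPDK reserves all free hugepages, which is why two vhost instances
 * with the default 1 GB each exhaust a 2 GB HUGEMEM pool.
 */
int main(void)
{
	char *eal_args[] = {
		"hugepage_demo",	/* argv[0]: program name */
		"-m", "256",		/* 256 MB of hugepage-backed memory */
		"--no-pci",		/* keep the demo self-contained */
	};
	int eal_argc = sizeof(eal_args) / sizeof(eal_args[0]);

	if (rte_eal_init(eal_argc, eal_args) < 0) {
		/* e.g. not enough free hugepages for the requested 256 MB */
		return -rte_errno;
	}

	rte_eal_cleanup();
	return 0;
}
```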
[08:29:28] 8k bytes, I mean
[08:38:31] darsto: most log pages are 4K, I think, but the Get Log Page command can retrieve much larger buffers than that (the size is represented as a 32-bit number of Dwords)
[08:39:13] but Get Log Page can use an arbitrary PRP list, so it should be okay to have discontiguous physical memory for that
[08:39:29] the PRP building code should already be stepping through and building PRP entries one 4K page at a time
[08:40:59] that should apply to anything using nvme_allocate_request_user_copy(), in fact - the only things that need to be physically contiguous are the PRP lists themselves (nvme_tracker) and the SQ/CQ entries
[08:44:01] so we allocate that spdk_dma_zmalloc() temporary buffer just because the original buffer may not be dma-able?
[08:45:31] right, the user_copy helper is only used for non-fast-path admin commands so that the user doesn't have to provide a dma-able buffer
[08:47:02] I don't think we can limit the user_copy buffer size to 4K, but I also don't think it should matter (assuming spdk_vtophys() returns the right physical address/IOVA on each 4K page)
[08:48:33] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-cdqaqyuxqmctnjmp) (Ping timeout: 268 seconds)
[08:48:33] then we have a lot of outdated documentation in there
[08:48:39] e.g. nvme_payload->contig_or_cb_arg
[08:49:38] yes, the original assumption was contig == physically contiguous, but now it should be OK to just require virtually contiguous as long as vtophys is doing the right thing
[08:50:03] nvme_pcie_prp_list_append() is where the magic should happen
[08:51:37] I see
[08:52:04] can I ask you to reword those variables/docs?
[08:53:35] yes, I'll take a look
[08:54:05] I think the Create I/O {SQ,CQ} case is already special cased and doesn't use nvme_payload anyway
[08:55:05] yes, looks like nvme_pcie_ctrlr_cmd_create_io_cq() fills out the PRP1 field directly
[09:00:41] darsto: posted https://review.gerrithub.io/#/c/spdk/spdk/+/417045 - let me know if you see any other misleading comments
[09:05:47] hey drv, any idea whom I can ask to poke CI for https://review.gerrithub.io/#/c/spdk/spdk/+/416878/ and https://review.gerrithub.io/#/c/spdk/spdk/+/416879/?
[09:06:11] philipp-sk: I approved them just a bit ago - they should be in the queue now
[09:06:13] you can see the queue at https://ci.spdk.io/spdk/status/
[09:06:54] is nvme_pcie_qpair_build_contig_request() being split as well?
[09:07:11] great, thanks for the link drv
[09:07:41] darsto: that calls the prp_list_append which does the splitting
[09:07:42] it steps one ctrlr->page_size at a time, which should normally be 4K
[09:08:17] yeah - so it doesn't really matter if the buffer is contiguous or not
[09:08:21] right
[09:09:27] we did originally use the assumption that the whole buffer was physically contiguous, but that requirement was removed some time ago
[09:10:42] now that I think about it, the SGL payload type probably needs equivalent splitting (each SGL element is currently assumed to be physically contiguous)
[09:10:43] but the "contig" path should be fine
[09:11:20] the doc string for spdk_nvme_req_next_sge_cb does say that the segment must be physically contiguous
[09:11:43] so if the user is following the rules, that should be OK, but it sounds like that might not be easily possible in DPDK 18.05+
[09:12:13] the NVMe I have doesn't support HW SGL
[09:12:28] yeah, the only Intel devices that support it that I know of are the DC D3x00 (not P3x00) series
[09:29:28] does bdev_nvme make any use of HW SGLs?
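(Editor's sketch of the page-by-page PRP walk described above. The real logic lives in nvme_pcie_prp_list_append() and the contig request builder; here the translation call is a hypothetical stub standing in for spdk_vtophys() so the snippet compiles on its own.)

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* Hypothetical stand-in for spdk_vtophys(): it just echoes the virtual
 * address as the "physical" one so the sketch is self-contained. */
static uint64_t vtophys_stub(void *vaddr)
{
	return (uint64_t)(uintptr_t)vaddr;
}

/*
 * Step through a virtually contiguous payload one 4K page at a time and
 * translate each page separately, so the buffer never has to be
 * physically contiguous. Only a sketch, not SPDK code.
 */
static int build_prp_entries(void *payload, size_t len,
			     uint64_t *prp, size_t max_entries)
{
	uintptr_t vaddr = (uintptr_t)payload;
	uintptr_t end = vaddr + len;
	size_t n = 0;

	while (vaddr < end) {
		if (n == max_entries) {
			return -1;	/* would need a longer PRP list */
		}
		/* First entry may carry an offset; later ones are page-aligned. */
		prp[n++] = vtophys_stub((void *)vaddr);
		vaddr = (vaddr & ~((uintptr_t)PAGE_SIZE - 1)) + PAGE_SIZE;
	}
	return (int)n;
}

int main(void)
{
	static char buf[3 * PAGE_SIZE];
	uint64_t prp[8];
	int n = build_prp_entries(buf + 100, sizeof(buf) - 200, prp, 8);

	printf("built %d PRP entries for a 3-page buffer\n", n);
	return 0;
}
```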
[09:29:33] I don't see it
[09:30:58] bdev_nvme calls spdk_nvme_ns_cmd_readv/writev, which can translate into HW SGLs if the NVMe device supports it
[09:31:56] I see
[09:45:27] sethhowe: looks like fedora-06 is down for some reason
[09:47:00] drv thx, will look into it.
[09:49:29] drv: It's back up now. Dirty shared folder.
[09:52:12] thanks
[10:12:13] dpdk got a new webpage layout
[10:12:28] looks nice
[10:19:56] that does look way better
[11:18:55] I'm attempting, for the first time, to create an issue on GitHub. Seems rather straightforward except for how to add a label to it. Per https://help.github.com/articles/applying-labels-to-issues-and-pull-requests/ one is to select a checkbox next to the items to which you want to apply a label. However, my issue has no such checkbox (nor do I see checkboxes on any of the issues). What am I missing?
[11:19:22] only admins add labels
[11:19:44] you don't need to label it - does the documentation say you need to?
[11:20:51] Thanks @bwalker. Sure would've been nice if that was documented as such on GitHub's help page (eye-roll). I didn't read any explicit requirement to add a label. I was just trying to be helpful. No good deed goes unpunished ;-)
[11:25:16] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[11:26:17] drv: getting back to NVMe - do we want to split those SGEs that aren't physically contiguous?
[11:26:17] I think we should
[11:26:47] k, let me craft something
[11:26:52] that could be fairly complicated, but it is necessary for correctness if DPDK is going to provide non-physically-contiguous buffers from rte_malloc
[11:30:34] hmm SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
[11:31:31] yeah we just added support in the target
[11:31:50] but the host hasn't needed it previously
[11:56:31] *** Quits: tkulasek_ (~tkulasek@192.55.54.45) (Ping timeout: 268 seconds)
[12:12:54] I pushed some patches for pcie, but will hold off on rdma
[12:14:09] rdma doesn't allocate any trackers, and managing those seems complicated judging by the pcie code
[12:35:04] RDMA should not care about physically contiguous buffers, at least as far as I understand the ibverbs API
[12:37:57] yeah RDMA doesn't need them to be physically contiguous
[12:40:21] but we should add multiple SGL entry support to it
[12:41:52] well, it's the same scenario as with VFIO iovas
[12:42:37] we could be passing a buffer that was registered with multiple ibv_reg_mr calls
[12:43:23] can rte_malloc() end up returning pieces from multiple DPDK memory regions?
[12:43:53] that's what I've been experiencing
[12:43:53] the env_dpdk/memory.c code that calls the notifier that eventually calls ibv_reg_mr() coalesces virtually contiguous regions, so it should be OK as long as the DPDK allocations don't span regions
[12:43:53] hmm
[12:44:24] that would be a problem if it can happen, yes
[12:45:24] the NVMe tracker list was discontiguous in my case. I was getting syslog errors from the IOMMU about unprivileged DMA at addr 0
[12:45:54] and that tracker list comes from a single spdk_dma_zmalloc
[12:46:25] I did actually check, and the memory was not physically contiguous
[12:47:25] iova-contiguous, I mean
[12:50:27] hmm, the key thing is whether it was from a single memory_hotplug_cb, though
[12:50:57] I'm not sure if a single hotplug CB can have a non-iova-contiguous region?
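(Editor's sketch of the SGE splitting being discussed: walk one virtually contiguous element in 4K steps and start a new element whenever the translation is not contiguous with the previous one. The translation function is a made-up stand-in for something like spdk_vtophys(), and the forced 1 MB jump exists only to demonstrate a split; none of this is SPDK code.)

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE 4096u

struct sge { uint64_t addr; uint32_t len; };

/* Hypothetical translation: identity mapping with an artificial 1 MB jump
 * once vaddr crosses split_at, just to force a split in the demo. */
static uint64_t xlate(uintptr_t vaddr, uintptr_t split_at)
{
	uint64_t iova = (uint64_t)vaddr;

	if (vaddr >= split_at) {
		iova += 1024 * 1024;
	}
	return iova;
}

/* Split [vaddr, vaddr + len) into elements that are each IOVA-contiguous,
 * coalescing neighbouring 4K pages whenever their translations line up. */
static size_t split_sgl(uintptr_t vaddr, size_t len, uintptr_t split_at,
			struct sge *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		uint64_t iova = xlate(vaddr, split_at);
		/* bytes remaining in the current 4K page */
		uint32_t chunk = PAGE - (vaddr & (PAGE - 1));

		if (chunk > len) {
			chunk = (uint32_t)len;
		}
		if (n > 0 && out[n - 1].addr + out[n - 1].len == iova) {
			out[n - 1].len += chunk;	/* still contiguous: extend */
		} else {
			if (n == max) {
				return n;		/* out of SGE slots */
			}
			out[n].addr = iova;
			out[n].len = chunk;
			n++;
		}
		vaddr += chunk;
		len -= chunk;
	}
	return n;
}

int main(void)
{
	static char buf[4 * PAGE];
	struct sge sges[8];
	uintptr_t base = (uintptr_t)buf;
	/* Pretend the mapping changes halfway through the buffer. */
	size_t n = split_sgl(base, sizeof(buf), base + 2 * PAGE, sges, 8);

	printf("buffer split into %zu IOVA-contiguous elements\n", n);
	return 0;
}
```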
[13:00:36] let me look through the DPDK code
[13:25:26] an area from a single hotplug CB will always be iova-contiguous
[13:26:51] *** Quits: lyan (~lyan@2605:a000:160e:3c8:4a4d:7eff:fef2:eea3) (Ping timeout: 265 seconds)
[13:53:39] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[13:55:43] *** Joins: travis-ci (~travis-ci@ec2-54-205-136-216.compute-1.amazonaws.com)
[13:55:44] (spdk/master) lib: return error in case of start_fn is null in spdk_app_start() (Leonid Ravich)
[13:55:44] Diff URL: https://github.com/spdk/spdk/compare/322e50468cb7...2a92c30292bd
[13:55:44] *** Parts: travis-ci (~travis-ci@ec2-54-205-136-216.compute-1.amazonaws.com) ()
[13:58:01] darsto: so that means that DPDK will potentially combine (virtually-contiguous) regions from multiple memory hotplug CBs in a single rte_malloc() allocation?
[14:18:41] it seems so, but let me check that tomorrow
[14:19:11] I do not trust myself at this hour
[14:22:51] sure, no rush
[14:23:13] if that is the case, then there is definitely a problem with RDMA, since we will only ibv_reg_mr() contiguous regions within a single hotplug CB
[14:24:58] *** Quits: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) (Quit: Leaving.)
[14:25:57] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Quit: Page closed)
[15:09:04] *** Joins: travis-ci (~travis-ci@ec2-54-205-233-51.compute-1.amazonaws.com)
[15:09:04] (spdk/master) spdkcli: remove print_array when creating lvol store (Karol Latecki)
[15:09:04] Diff URL: https://github.com/spdk/spdk/compare/2a92c30292bd...fccd03b1f0d4
[15:09:04] *** Parts: travis-ci (~travis-ci@ec2-54-205-233-51.compute-1.amazonaws.com) ()
[16:02:30] *** Joins: travis-ci (~travis-ci@ec2-23-20-192-84.compute-1.amazonaws.com)
[16:02:30] (spdk/master) nvme/rdma: factor out Connect command (Daniel Verkamp)
[16:02:30] Diff URL: https://github.com/spdk/spdk/compare/fccd03b1f0d4...1d260441b437
[16:02:30] *** Parts: travis-ci (~travis-ci@ec2-23-20-192-84.compute-1.amazonaws.com) ()
[16:33:55] *** Joins: travis-ci (~travis-ci@ec2-54-82-23-81.compute-1.amazonaws.com)
[16:33:56] (spdk/master) env/dpdk: do not pass raw memzone flags param to DPDK (Dariusz Stojaczyk)
[16:33:57] Diff URL: https://github.com/spdk/spdk/compare/1d260441b437...70df6cb890d2
[16:33:57] *** Parts: travis-ci (~travis-ci@ec2-54-82-23-81.compute-1.amazonaws.com) ()
[17:35:54] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[17:43:17] *** Quits: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[17:44:22] Hi, a little related to Ben's patches: do you know why iov_base = (buf + 512) & ~511 is used in spdk_bdev_io_set_buf()? I think if 512 is used, buf will always be padded even if it is already 512-byte aligned. So (buf + 511) & ~511?
[17:45:48] This is not a critical question.
[17:45:48] hmm, that is probably a bug
[17:45:55] it should be 511 as you noted
[17:46:19] I suppose it does not hurt since there is an extra 512 bytes allocated for each buffer
[17:49:20] drv: Thank you, I got it. What should I do about that?
[17:49:31] Should it be fixed?
[17:50:26] I think we can keep it as 512 for now since that is consistent with the existing code, but it could be fixed in another patch
[17:50:34] if you could add a comment to the review, that would be good
[17:50:54] OK, I'll do that, thanks.
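(Editor's illustration of the alignment point above, using a hypothetical address value: (buf + 511) & ~511 rounds up only when needed, while (buf + 512) always pads, which is harmless here only because an extra 512 bytes are allocated per buffer.)

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uintptr_t already_aligned = 0x1000;	/* 0x1000 % 512 == 0 */

	uintptr_t pad_512 = (already_aligned + 512) & ~(uintptr_t)511;
	uintptr_t pad_511 = (already_aligned + 511) & ~(uintptr_t)511;

	printf("with +512: %#lx (pushed to the next 512-byte boundary)\n",
	       (unsigned long)pad_512);		/* prints 0x1200 */
	printf("with +511: %#lx (unchanged, already aligned)\n",
	       (unsigned long)pad_511);		/* prints 0x1000 */
	return 0;
}
```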
[17:51:02] and I will remind bwalker to take a look tomorrow
[18:38:49] *** Joins: dlw (~Thunderbi@114.255.44.143)
[19:31:05] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[20:05:49] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp140-04-65-92-69-234.dsl.bell.ca) (Ping timeout: 260 seconds)
[23:02:39] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
[23:28:22] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-jetyqhpwgyavabzb)