[01:10:53] *** Joins: travis-ci (~travis-ci@ec2-54-225-48-150.compute-1.amazonaws.com) [01:10:54] (spdk/master) ioat: allow ring to be physically discontiguous (Daniel Verkamp) [01:10:54] Diff URL: https://github.com/spdk/spdk/compare/2470278c9249...104fa6e722e8 [01:10:54] *** Parts: travis-ci (~travis-ci@ec2-54-225-48-150.compute-1.amazonaws.com) () [03:03:40] *** Joins: dlw1 (~Thunderbi@114.255.44.143) [03:03:40] *** Quits: dlw (~Thunderbi@114.255.44.143) (Read error: Connection reset by peer) [03:03:40] *** dlw1 is now known as dlw [03:36:07] *** Joins: dlw1 (~Thunderbi@114.255.44.139) [03:37:53] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 248 seconds) [03:37:53] *** dlw1 is now known as dlw [04:51:34] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) [04:51:57] *** lyan is now known as Guest90874 [05:31:15] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 256 seconds) [06:14:58] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) [06:28:51] drv, more UT issues related to DPDK, ugh. Ping me when you're on line, have an idea I want to run by you [08:24:00] it's actually working pretty good, let me know if this seems reasonable. There's a DPDK header that is included that also has definitions in it used by the key function I'm testing so I can't mock them and they're written such there's no real way for me to pass in anything that makes them benign, so [08:24:45] I just put a hacked up copy of that header in my UT dir and include it in the UT file and I'm able to make those functions do what I need them to do... [08:48:06] peluse: I'm around now [08:48:17] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-fviiolifqochxwzk) (Ping timeout: 268 seconds) [08:49:01] cool, read ^ and let me know if you think that's OK [08:49:23] if it compiles like that, sounds reasonable to me [08:49:31] cool, just making sure :) [08:49:39] as long as you put a comment at the top noting why it's there [08:49:45] yup, its already in there [08:49:59] * included in the header file that can't be done with our mock library. [08:49:59] */ [08:50:05] "/* rte_crypto.h is our local copy of the DPDK header hacked to mock some functions [08:50:05] * included in the header file that can't be done with our mock library. [08:50:05] */" [08:50:16] cool [08:50:29] probably run it by jimharris/bwalker, but I'm on board [08:51:33] sounds good to me [08:51:43] gracias [08:51:59] *** Quits: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Quit: My MacBook has gone to sleep. ZZZzzz…) [09:28:05] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) [09:28:06] (spdk/master) ioat: fix typo on IOAT_DEFAULT_ORDER comment (Dariusz Stojaczyk) [09:28:06] Diff URL: https://github.com/spdk/spdk/compare/104fa6e722e8...f75cc493d992 [09:28:06] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) () [10:04:23] does everyone agree with changpe1? https://review.gerrithub.io/c/spdk/spdk/+/416993/2 [10:06:19] i split those patches so we can decide which way to go [10:16:25] I typically prefer one source of truth [10:16:39] so I like dynamically building them from their source data, as you do in the second patch [10:16:54] and if we agree on the second patch, then I think we can squash them [10:17:21] super nitpick: there's a tab in the middle of line 1013 on the first patch :) [10:18:25] I also don't love the memzone API we're exposing in env.h [10:18:36] it wasn't a big deal before because it wasn't used in any significant way [10:18:49] I know it just mirrors the DPDK API, but I think we can design something better [10:19:42] for instance, maybe a new flag to spdk_malloc is in order [10:19:46] SPDK_MALLOC_CONTIG [10:21:09] the spdk_memzone stuff also exposes a name by which the memzones can be looked up [10:21:18] however, only one spot in the code actually needs to do a lookup [10:21:28] and that's the discovery of the root memory location for the nvme driver [10:21:51] when running in secondary process mode [10:22:12] so maybe a more intuitive API is to add the SPDK_MALLOC_CONTIG flag to spdk_malloc (and make that case allocate via a DPDK memzone internally) [10:22:26] yeah, memzones serve two purposes - one to have a process shared memory, other to allocate contig memory [10:22:46] and replace the existing memzone API with spdk_malloc_named(...) to allocate shared memory with a given name [10:23:10] not that spdk_malloc_named is a good name for that function, but a separate a function with a more intuitive name than spdk_memzone is what I'm suggesting [10:24:02] the only actual use of the memzone API today is in the nvme driver [10:24:10] so this is the time when we can change this [10:24:16] before it gets propagated everywhere [10:27:26] i remember reading some mailing list discussion about passing the same spdk_malloc flags to spdk_free [10:27:49] someone was suggesting this. I don't remember the conclusion we came up with, though [10:29:37] I don't think they ended up needing it [10:29:50] but you're right - we'd potentially need it if we went that route [10:30:13] unless we can somehow look it up in DPDK - it probably has the right info [10:30:52] looking into this atm [10:32:40] there's rte_memzone_walk() [10:33:50] we *could* use it if we plan to use physically contiguous memory only for hw rings [10:36:24] ah, memzones are also reserved internally in DPDK [10:40:59] passing the flags to spdk_free seems like such a pain [10:47:32] yeah we've been doing everything we can to avoid it [10:47:39] I'm hoping we can come up with something clever here [10:48:49] would something simple/stupid like spdk_dma_contigmalloc/spdk_dma_contigfree work? [10:49:13] yes - but we've been moving away from separate spdk_**type**_alloc functions and to a unified spdk_malloc with flags [10:49:17] freebsd kernel does it this way (instead of a separate flag) [10:50:23] where did those patches go though? i know we had patches out recently (from zahra?) but we're not using flags in spdk_dma_malloc currently [10:50:49] we added spdk_malloc with flags already - and internally spdk_dma_malloc calls spdk_malloc [10:50:59] but we haven't changed all of the code everywhere to the new calls [10:51:25] the issue with a separate contig calls is that you also need to flag whether it is shared memory [10:51:31] so you have flags no matter what [10:51:52] ugh - yeah, i was only looking at spdk_dma_malloc [10:52:38] (one solution is to eliminate multi-process support in the nvme driver, which eliminates the use of shared memory) [10:53:24] but if we're going to continue to move ahead with shared memory, I think the API with the flags is the cleanest [10:57:31] is there a plan (or a need) to get rid of spdk_dma_malloc/free moving forward? [10:57:46] I think over time, we'd want to get rid of them [10:57:48] if not, then I don't see why spdk_dma_contigmalloc/contigfree wouldn't work [10:57:52] but they're in there still for backward compatibility [10:58:24] we were trying to keep the API as small and clear as possible, to make it easier to implement an env [11:09:38] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) [11:36:11] i think the SPDK_MALLOC_CONTIG flag is fine - it will be trivial in the env_dpdk implementation to key off that flag and use memzone [11:36:49] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Quit: nik_b) [11:37:05] yeah but in the free path you have to know if you should memzone_release the pointer or rte_free it [11:37:24] well no, because we'll explicitly call spdk_free(SPDK_MALLOC_CONTIG) [11:37:50] instead of spdk_dma_free [11:39:08] drv: looking at your patch to fix the size of those structures for address translation [11:39:17] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) [11:39:42] the MAP_1GB_IDX part is the one that concerns me - your patch looks right, but why didn't we see problems related to that before? [11:44:48] oh - i see, that extra space actually wasn't being wasted - it was actually needed because the MAP_1GB_IDX calculation was wrong [11:45:05] so each "1GB map" was actually handing the mapping for a full 2GB [12:00:10] *** Joins: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) [12:03:55] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) [12:03:56] (spdk/master) test/spdkcli: add missing python dependencies to pkgdep.sh (Pawel Niedzwiecki) [12:03:57] Diff URL: https://github.com/spdk/spdk/compare/f75cc493d992...95fe928067e3 [12:03:57] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) () [12:12:25] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) [12:12:27] (spdk/master) nvme: adjust physically contiguous memory comments (Daniel Verkamp) [12:12:27] Diff URL: https://github.com/spdk/spdk/compare/95fe928067e3...4af4e4f509c9 [12:12:27] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) () [12:13:45] *** Joins: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) [12:13:46] (spdk/master) test/json: Add pmem bdev test (Pawel Kaminski) [12:13:47] Diff URL: https://github.com/spdk/spdk/compare/4af4e4f509c9...4a9583f7fe8a [12:13:47] *** Parts: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) () [12:16:45] *** Joins: travis-ci (~travis-ci@ec2-54-91-67-9.compute-1.amazonaws.com) [12:16:47] (spdk/master) bdev: remove public usage of bdev_part_base.ref (Seth Howell) [12:16:47] Diff URL: https://github.com/spdk/spdk/compare/4a9583f7fe8a...426513872266 [12:16:47] *** Parts: travis-ci (~travis-ci@ec2-54-91-67-9.compute-1.amazonaws.com) () [12:41:18] jimharris: yup, the errors cancelled out, but once the array size is fixed, the map macro also does or else it overflows [12:42:56] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Quit: nik_b) [12:43:32] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) [12:48:07] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Ping timeout: 245 seconds) [12:50:10] what do folks think of forbidding use of CU_ASSERT_FATAL? [12:51:08] i just reviewed a patch that used CU_ASSERT_FATAL instead of SPDK_CU_ASSERT_FATAL, and added the workaround to get scan-build to shut up [12:52:14] if we find CU_ASSERT_FATAL used, make check_format.sh fail and tell them to use SPDK_CU_ASSERT_FATAL instead (and why) [12:52:53] yeah, that seems reasonable [12:53:07] can we just #define CU_ASSERT_FATAL to something else so it won't compile? [12:53:20] check_format.sh check is fine too, whatever's simpler [12:55:01] where would #define CU_ASSERT_FATAL to something else? if we do it in the test/spdk_cunit.h, then we have to know the unit test included spdk_cunit.h and not just CUnit/Basic.h [12:55:51] we still have at least a few unit test files that include CUnit/Basic.h directly [12:57:04] in spdk_cunit.h, the comment says "The abort() will never actually execute." ? [12:57:31] oh, I forgot SPDK_CU_ASSERT_FATAL() still uses CU_ASSERT_FATAL(), so that won't work [12:57:50] check_format.sh seems like the way to go, then [12:58:37] and now i'm looking at whether we even need this spdk_cunit_print_results anymore [12:59:25] I think only one test ever used it [13:00:48] *** Quits: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) (Quit: Leaving.) [13:28:55] *** Quits: sethhowe (sethhowe@nat/intel/x-owspgkuuxdayxcoj) (Remote host closed the connection) [13:32:29] *** Joins: sethhowe (~sethhowe@134.134.139.72) [13:48:32] bwalker: looking at your bdevperf zcopy patch and had a thought [13:49:58] i think we want the spdk_bdev_zcopy_start/end functions to work even if the bdev doesn't support zcopy - in those cases, bdev allocates the buffer internally and translates the zcopy_start/end calls to read/write equivalents [13:51:20] https://review.gerrithub.io/#/c/spdk/spdk/+/416465/ [13:52:40] nice - could you rebase the bdevperf patch on top of this, and then just use the zcopy_start/end in all cases? [13:53:21] sure [13:53:27] *** Joins: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) [13:53:28] (spdk/master) virtio: check F_CONFIG feature before sending GET/SET_CONFIG (Dariusz Stojaczyk) [13:53:28] Diff URL: https://github.com/spdk/spdk/compare/9aff546b7f9f...436c0c189bca [13:53:28] *** Parts: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) () [13:54:33] then you could use it for writes too in bdevperf [13:56:02] this is going to be so nice - we can get rid of all of the spdk_bdev_io_get_buf calls in all of the bdev modules [14:27:05] jimharris: if you have a moment, please take a look at my comment on https://review.gerrithub.io/#/c/spdk/spdk/+/417962/ [14:27:21] I want to make sure we're on the same page before Xiaodong goes off and reimplements everything [14:28:48] looking... [14:29:04] mainly the big code block with a proposed JSON-RPC parameter example [14:32:30] yeah - i like your idea and indicated such on the review just now [14:47:17] if I name the parameter to the function 'dirty' [14:47:32] what do I name the internal argument that is currently "populate_or_commit" [14:48:00] populate_or_clean? [14:48:13] sure [14:49:48] what about the word 'sync' [14:50:54] "dirty" is definitely a better name for the parameter in the API [14:51:04] not sure about sync, since that has connotations of flushing [14:52:12] playing devil's advocate...is we use 'sync', we could possibly use that for both start and end [14:52:15] instead of populate [14:52:23] s/is/if [14:54:45] then 'sync' basically indicates if the backing store should be synced with the io buffer [14:55:42] I like the consistency, but I'm not sure 'sync' is obvious enough on the start side [14:57:07] after our side discussion earlier, if we don't view malloc as a "real" zero copy implementation, then I'm more tempted to just go back to "commit" [14:57:54] I think if it doesn't provide block-level atomicity, it isn't a real block device [14:58:06] yeah, then I'm OK with commit [14:58:42] all drives do provide block-level atomicity regarding two simultaneous writes [14:58:54] not necessarily during power failure for HDDs [14:59:07] but I think that's the bare minimum requirement to be considered a legitimate block device [14:59:22] and the malloc bdev doesn't meet that [15:04:15] jimharris: do you suggest that vhost-blk targets should be removed automatically? [15:04:45] we could do that, but not while there are active connections to that target [15:05:25] so all the no_bdev_process_vq logic still needs to be there [15:07:51] i was just wondering if we can ensure no new connections happen if the bdev is hot removed [15:08:47] so maybe that patch i commented on is fine - i was just wondering if we have a way to avoid new connections at least [15:09:24] like not accept()'ing those? [15:10:15] ideally - but is there a way to tell the vhost library to not accept new connections but still keep the old ones? [15:11:01] returning -1 from the new_connection op [15:11:14] ah - perfect [15:12:04] what do you think? i'm just thinking that once a bdev is hot removed, we don't want new connections added to it, and i'd even say that once all existing connections are closed, we should delete the target completely [15:12:15] i.e. delete it automatically with no user intervention [15:15:39] hmmm [15:18:16] i see that vhost-blk targets with hotremoved bdevs aren't even exported with config dumps [15:18:38] okay - let's to that, then [15:24:11] definitely have a challenge ahead of us for zcopy if we want to support larger I/O sizes than the internal buffer pool size [15:24:17] need to do sgl's for the whole thing [15:24:35] I was thinking that maybe we could allocate an array of iov elements in the bdev_io [15:24:40] I'm not sure how much that would blow the size out [15:36:27] i think an iov array would be fine - it means we'll still have some limit, but at least it can be much bigger than 64KB [15:37:08] yeah it doesn't need to be unlimited [15:37:31] i'm not super concerned about making the bdev_io bigger since we're making all of these changes to remove memory allocations in other areas [15:37:39] (i.e. removing the nvme-of buffer pools) [15:41:02] current bdev_io size (including max context size for modules) is 440 bytes [15:41:36] and we allocate 64K of them by default == 27.5MB [15:42:38] I'm not concerned with total memory size as much as cache line usage [15:42:43] if we make an array of 32 iovs, that increases it to 936 bytes or 58.5MB [15:43:09] we may need to do a pass on spdk_bdev_io and re-pack/re-order the fields at some point [15:43:51] yeah - of course separating out the internal structure makes that a bit more difficult now [15:46:23] we have some holes in the new __bdev_io_internal_fields [15:47:02] sethhowe [15:47:27] sethhowe: pahole is your friend [15:47:56] we can easily get 8 bytes back [15:48:07] thx, will check it out. [15:49:03] note i've always had to build/install from source - the package (at least on ubuntu) doesn't work on my systems, not sure if it's related to spdk build options, or what [15:49:15] https://git.kernel.org/pub/scm/devel/pahole/pahole.git [15:55:42] on fedora you can just install the package [15:55:49] 'dwarves' is the name of it [15:55:58] I think seth knows how to use it already though [15:57:24] Yeah, I do. When I said check it out I was offering to look into packing the cache lines in that struct. [18:15:58] *** Joins: Jianfeng (2ffc0185@gateway/web/freenode/ip.47.252.1.133) [18:21:54] May be not so appropriate to ask here, but I only know this storage community :-) My question is about Linux Direct I/O. See lots of blogs say user should provide aligned buffer (mandatory requirement). I'm wondering the reason is that device DMA needs physical-contiguous buffer. Then could this requirement be just physical-contiguous instead of alignment? [18:22:58] Thanks in advance [18:29:36] *** Joins: dlw (~Thunderbi@114.255.44.143) [19:23:21] *** Quits: Guest90874 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving) [21:09:22] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 260 seconds) [22:14:32] *** Quits: Jianfeng (2ffc0185@gateway/web/freenode/ip.47.252.1.133) (Quit: Page closed) [23:03:52] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) [23:26:50] *** Parts: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) () [23:38:51] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-sftfderxohkyfljz)