[01:10:53] *** Joins: travis-ci (~travis-ci@ec2-54-225-48-150.compute-1.amazonaws.com)
[01:10:54] <travis-ci> (spdk/master) ioat: allow ring to be physically discontiguous (Daniel Verkamp)
[01:10:54] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/2470278c9249...104fa6e722e8
[01:10:54] *** Parts: travis-ci (~travis-ci@ec2-54-225-48-150.compute-1.amazonaws.com) ()
[03:03:40] *** Joins: dlw1 (~Thunderbi@114.255.44.143)
[03:03:40] *** Quits: dlw (~Thunderbi@114.255.44.143) (Read error: Connection reset by peer)
[03:03:40] *** dlw1 is now known as dlw
[03:36:07] *** Joins: dlw1 (~Thunderbi@114.255.44.139)
[03:37:53] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 248 seconds)
[03:37:53] *** dlw1 is now known as dlw
[04:51:34] *** Joins: lyan (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3)
[04:51:57] *** lyan is now known as Guest90874
[05:31:15] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 256 seconds)
[06:14:58] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[06:28:51] <peluse> drv, more UT issues related to DPDK, ugh. Ping me when you're on line, have an idea I want to run by you
[08:24:00] <peluse> it's actually working pretty good, let me know if this seems reasonable.  There's a DPDK header that is included that also has definitions in it used by the key function I'm testing so I can't mock them and they're written such there's no real way for me to pass in anything that makes them benign, so
[08:24:45] <peluse> I just put a hacked up copy of that header in my UT dir and include it in the UT file and I'm able to make those functions do what I need them to do...
[08:48:06] <drv> peluse: I'm around now
[08:48:17] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-fviiolifqochxwzk) (Ping timeout: 268 seconds)
[08:49:01] <peluse> cool, read ^ and let me know if you think that's OK
[08:49:23] <drv> if it compiles like that, sounds reasonable to me
[08:49:31] <peluse> cool, just making sure :)
[08:49:39] <drv> as long as you put a comment at the top noting why it's there
[08:49:45] <peluse> yup, its already in there
[08:49:59] <peluse>  * included in the header file that can't be done with our mock library.
[08:49:59] <peluse>  */
[08:50:05] <peluse> "/* rte_crypto.h is our local copy of the DPDK header hacked to mock some functions
[08:50:05] <peluse>  * included in the header file that can't be done with our mock library.
[08:50:05] <peluse>  */"
[08:50:16] <drv> cool
[08:50:29] <drv> probably run it by jimharris/bwalker, but I'm on board
[08:51:33] <jimharris> sounds good to me
[08:51:43] <peluse> gracias
[08:51:59] *** Quits: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Quit: My MacBook has gone to sleep. ZZZzzz…)
[09:28:05] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com)
[09:28:06] <travis-ci> (spdk/master) ioat: fix typo on IOAT_DEFAULT_ORDER comment (Dariusz Stojaczyk)
[09:28:06] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/104fa6e722e8...f75cc493d992
[09:28:06] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) ()
[10:04:23] <darsto> does everyone agree with changpe1? https://review.gerrithub.io/c/spdk/spdk/+/416993/2
[10:06:19] <darsto> i split those patches so we can decide which way to go
[10:16:25] <bwalker> I typically prefer one source of truth
[10:16:39] <bwalker> so I like dynamically building them from their source data, as you do in the second patch
[10:16:54] <bwalker> and if we agree on the second patch, then I think we can squash them
[10:17:21] <drv> super nitpick: there's a tab in the middle of line 1013 on the first patch :)
[10:18:25] <bwalker> I also don't love the memzone API we're exposing in env.h
[10:18:36] <bwalker> it wasn't a big deal before because it wasn't used in any significant way
[10:18:49] <bwalker> I know it just mirrors the DPDK API, but I think we can design something better
[10:19:42] <bwalker> for instance, maybe a new flag to spdk_malloc is in order
[10:19:46] <bwalker> SPDK_MALLOC_CONTIG
[10:21:09] <bwalker> the spdk_memzone stuff also exposes a name by which the memzones can be looked up
[10:21:18] <bwalker> however, only one spot in the code actually needs to do a lookup
[10:21:28] <bwalker> and that's the discovery of the root memory location for the nvme driver
[10:21:51] <bwalker> when running in secondary process mode
[10:22:12] <bwalker> so maybe a more intuitive API is to add the SPDK_MALLOC_CONTIG flag to spdk_malloc (and make that case allocate via a DPDK memzone internally)
[10:22:26] <darsto> yeah, memzones serve two purposes - one to have a process shared memory, other to allocate contig memory
[10:22:46] <bwalker> and replace the existing memzone API with spdk_malloc_named(...) to allocate shared memory with a given name
[10:23:10] <bwalker> not that spdk_malloc_named is a good name for that function, but a separate a function with a more intuitive name than spdk_memzone is what I'm suggesting
[10:24:02] <bwalker> the only actual use of the memzone API today is in the nvme driver
[10:24:10] <bwalker> so this is the time when we can change this
[10:24:16] <bwalker> before it gets propagated everywhere
[10:27:26] <darsto> i remember reading some mailing list discussion about passing the same spdk_malloc flags to spdk_free
[10:27:49] <darsto> someone was suggesting this. I don't remember the conclusion we came up with, though
[10:29:37] <bwalker> I don't think they ended up needing it
[10:29:50] <bwalker> but you're right - we'd potentially need it if we went that route
[10:30:13] <bwalker> unless we can somehow look it up in DPDK - it probably has the right info
[10:30:52] <darsto> looking into this atm
[10:32:40] <darsto> there's rte_memzone_walk()
[10:33:50] <darsto> we *could* use it if we plan to use physically contiguous memory only for hw rings
[10:36:24] <darsto> ah, memzones are also reserved internally in DPDK
[10:40:59] <jimharris> passing the flags to spdk_free seems like such a pain
[10:47:32] <bwalker> yeah we've been doing everything we can to avoid it
[10:47:39] <bwalker> I'm hoping we can come up with something clever here
[10:48:49] <jimharris> would something simple/stupid like spdk_dma_contigmalloc/spdk_dma_contigfree work?
[10:49:13] <bwalker> yes - but we've been moving away from separate spdk_**type**_alloc functions and to a unified spdk_malloc with flags
[10:49:17] <jimharris> freebsd kernel does it this way (instead of a separate flag)
[10:50:23] <jimharris> where did those patches go though?  i know we had patches out recently (from zahra?) but we're not using flags in spdk_dma_malloc currently
[10:50:49] <bwalker> we added spdk_malloc with flags already - and internally spdk_dma_malloc calls spdk_malloc
[10:50:59] <bwalker> but we haven't changed all of the code everywhere to the new calls
[10:51:25] <bwalker> the issue with a separate contig calls is that you also need to flag whether it is shared memory
[10:51:31] <bwalker> so you have flags no matter what
[10:51:52] <jimharris> ugh - yeah, i was only looking at spdk_dma_malloc
[10:52:38] <bwalker> (one solution is to eliminate multi-process support in the nvme driver, which eliminates the use of shared memory)
[10:53:24] <bwalker> but if we're going to continue to move ahead with shared memory, I think the API with the flags is the cleanest
[10:57:31] <jimharris> is there a plan (or a need) to get rid of spdk_dma_malloc/free moving forward?
[10:57:46] <bwalker> I think over time, we'd want to get rid of them
[10:57:48] <jimharris> if not, then I don't see why spdk_dma_contigmalloc/contigfree wouldn't work
[10:57:52] <bwalker> but they're in there still for backward compatibility
[10:58:24] <bwalker> we were trying to keep the API as small and clear as possible, to make it easier to implement an env
[11:09:38] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk)
[11:36:11] <jimharris> i think the SPDK_MALLOC_CONTIG flag is fine - it will be trivial in the env_dpdk implementation to key off that flag and use memzone
[11:36:49] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Quit: nik_b)
[11:37:05] <bwalker> yeah but in the free path you have to know if you should memzone_release the pointer or rte_free it
[11:37:24] <jimharris> well no, because we'll explicitly call spdk_free(SPDK_MALLOC_CONTIG)
[11:37:50] <jimharris> instead of spdk_dma_free
[11:39:08] <jimharris> drv: looking at your patch to fix the size of those structures for address translation
[11:39:17] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk)
[11:39:42] <jimharris> the MAP_1GB_IDX part is the one that concerns me - your patch looks right, but why didn't we see problems related to that before?
[11:44:48] <jimharris> oh - i see, that extra space actually wasn't being wasted - it was actually needed because the MAP_1GB_IDX calculation was wrong
[11:45:05] <jimharris> so each "1GB map" was actually handing the mapping for a full 2GB
[12:00:10] *** Joins: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net)
[12:03:55] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com)
[12:03:56] <travis-ci> (spdk/master) test/spdkcli: add missing python dependencies to pkgdep.sh (Pawel Niedzwiecki)
[12:03:57] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/f75cc493d992...95fe928067e3
[12:03:57] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) ()
[12:12:25] *** Joins: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com)
[12:12:27] <travis-ci> (spdk/master) nvme: adjust physically contiguous memory comments (Daniel Verkamp)
[12:12:27] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/95fe928067e3...4af4e4f509c9
[12:12:27] *** Parts: travis-ci (~travis-ci@ec2-54-87-121-123.compute-1.amazonaws.com) ()
[12:13:45] *** Joins: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com)
[12:13:46] <travis-ci> (spdk/master) test/json: Add pmem bdev test (Pawel Kaminski)
[12:13:47] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/4af4e4f509c9...4a9583f7fe8a
[12:13:47] *** Parts: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) ()
[12:16:45] *** Joins: travis-ci (~travis-ci@ec2-54-91-67-9.compute-1.amazonaws.com)
[12:16:47] <travis-ci> (spdk/master) bdev: remove public usage of bdev_part_base.ref (Seth Howell)
[12:16:47] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/4a9583f7fe8a...426513872266
[12:16:47] *** Parts: travis-ci (~travis-ci@ec2-54-91-67-9.compute-1.amazonaws.com) ()
[12:41:18] <drv> jimharris: yup, the errors cancelled out, but once the array size is fixed, the map macro also does or else it overflows
[12:42:56] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Quit: nik_b)
[12:43:32] *** Joins: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk)
[12:48:07] *** Quits: nik_b (~niklasbeg@93-97-199-132.zone5.bethere.co.uk) (Ping timeout: 245 seconds)
[12:50:10] <jimharris> what do folks think of forbidding use of CU_ASSERT_FATAL?
[12:51:08] <jimharris> i just reviewed a patch that used CU_ASSERT_FATAL instead of SPDK_CU_ASSERT_FATAL, and added the workaround to get scan-build to shut up
[12:52:14] <jimharris> if we find CU_ASSERT_FATAL used, make check_format.sh fail and tell them to use SPDK_CU_ASSERT_FATAL instead (and why)
[12:52:53] <drv> yeah, that seems reasonable
[12:53:07] <drv> can we just #define CU_ASSERT_FATAL to something else so it won't compile?
[12:53:20] <drv> check_format.sh check is fine too, whatever's simpler
[12:55:01] <jimharris> where would #define CU_ASSERT_FATAL to something else?  if we do it in the test/spdk_cunit.h, then we have to know the unit test included spdk_cunit.h and not just CUnit/Basic.h
[12:55:51] <jimharris> we still have at least a few unit test files that include CUnit/Basic.h directly
[12:57:04] <jimharris> in spdk_cunit.h, the comment says "The abort() will never actually execute." ?
[12:57:31] <drv> oh, I forgot SPDK_CU_ASSERT_FATAL() still uses CU_ASSERT_FATAL(), so that won't work
[12:57:50] <drv> check_format.sh seems like the way to go, then
[12:58:37] <jimharris> and now i'm looking at whether we even need this spdk_cunit_print_results anymore
[12:59:25] <drv> I think only one test ever used it
[13:00:48] *** Quits: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) (Quit: Leaving.)
[13:28:55] *** Quits: sethhowe (sethhowe@nat/intel/x-owspgkuuxdayxcoj) (Remote host closed the connection)
[13:32:29] *** Joins: sethhowe (~sethhowe@134.134.139.72)
[13:48:32] <jimharris> bwalker: looking at your bdevperf zcopy patch and had a thought
[13:49:58] <jimharris> i think we want the spdk_bdev_zcopy_start/end functions to work even if the bdev doesn't support zcopy - in those cases, bdev allocates the buffer internally and translates the zcopy_start/end calls to read/write equivalents
[13:51:20] <bwalker> https://review.gerrithub.io/#/c/spdk/spdk/+/416465/
[13:52:40] <jimharris> nice - could you rebase the bdevperf patch on top of this, and then just use the zcopy_start/end in all cases?
[13:53:21] <bwalker> sure
[13:53:27] *** Joins: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com)
[13:53:28] <travis-ci> (spdk/master) virtio: check F_CONFIG feature before sending GET/SET_CONFIG (Dariusz Stojaczyk)
[13:53:28] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/9aff546b7f9f...436c0c189bca
[13:53:28] *** Parts: travis-ci (~travis-ci@ec2-54-158-152-243.compute-1.amazonaws.com) ()
[13:54:33] <jimharris> then you could use it for writes too in bdevperf
[13:56:02] <jimharris> this is going to be so nice - we can get rid of all of the spdk_bdev_io_get_buf calls in all of the bdev modules
[14:27:05] <drv> jimharris: if you have a moment, please take a look at my comment on https://review.gerrithub.io/#/c/spdk/spdk/+/417962/
[14:27:21] <drv> I want to make sure we're on the same page before Xiaodong goes off and reimplements everything
[14:28:48] <jimharris> looking...
[14:29:04] <drv> mainly the big code block with a proposed JSON-RPC parameter example
[14:32:30] <jimharris> yeah - i like your idea and indicated such on the review just now
[14:47:17] <bwalker> if I name the parameter to the function 'dirty'
[14:47:32] <bwalker> what do I name the internal argument that is currently "populate_or_commit"
[14:48:00] <bwalker> populate_or_clean?
[14:48:13] <jimharris> sure
[14:49:48] <bwalker> what about the word 'sync'
[14:50:54] <drv> "dirty" is definitely a better name for the parameter in the API
[14:51:04] <drv> not sure about sync, since that has connotations of flushing
[14:52:12] <jimharris> playing devil's advocate...is we use 'sync', we could possibly use that for both start and end
[14:52:15] <jimharris> instead of populate
[14:52:23] <jimharris> s/is/if
[14:54:45] <jimharris> then 'sync' basically indicates if the backing store should be synced with the io buffer
[14:55:42] <bwalker> I like the consistency, but I'm not sure 'sync' is obvious enough on the start side
[14:57:07] <jimharris> after our side discussion earlier, if we don't view malloc as a "real" zero copy implementation, then I'm more tempted to just go back to "commit"
[14:57:54] <bwalker> I think if it doesn't provide block-level atomicity, it isn't a real block device
[14:58:06] <drv> yeah, then I'm OK with commit
[14:58:42] <bwalker> all drives do provide block-level atomicity regarding two simultaneous writes
[14:58:54] <bwalker> not necessarily during power failure for HDDs
[14:59:07] <bwalker> but I think that's the bare minimum requirement to be considered a legitimate block device
[14:59:22] <bwalker> and the malloc bdev doesn't meet that
[15:04:15] <darsto> jimharris: do you suggest that vhost-blk targets should be removed automatically?
[15:04:45] <darsto> we could do that, but not while there are active connections to that target
[15:05:25] <darsto> so all the no_bdev_process_vq logic still needs to be there
[15:07:51] <jimharris> i was just wondering if we can ensure no new connections happen if the bdev is hot removed
[15:08:47] <jimharris> so maybe that patch i commented on is fine - i was just wondering if we have a way to avoid new connections at least
[15:09:24] <darsto> like not accept()'ing those?
[15:10:15] <jimharris> ideally - but is there a way to tell the vhost library to not accept new connections but still keep the old ones?
[15:11:01] <darsto> returning -1 from the new_connection op
[15:11:14] <jimharris> ah - perfect
[15:12:04] <jimharris> what do you think?  i'm just thinking that once a bdev is hot removed, we don't want new connections added to it, and i'd even say that once all existing connections are closed, we should delete the target completely
[15:12:15] <jimharris> i.e. delete it automatically with no user intervention
[15:15:39] <darsto> hmmm
[15:18:16] <darsto> i see that vhost-blk targets with hotremoved bdevs aren't even exported with config dumps
[15:18:38] <darsto> okay - let's to that, then
[15:24:11] <bwalker> definitely have a challenge ahead of us for zcopy if we want to support larger I/O sizes than the internal buffer pool size
[15:24:17] <bwalker> need to do sgl's for the whole thing
[15:24:35] <bwalker> I was thinking that maybe we could allocate an array of iov elements in the bdev_io
[15:24:40] <bwalker> I'm not sure how much that would blow the size out
[15:36:27] <jimharris> i think an iov array would be fine - it means we'll still have some limit, but at least it can be much bigger than 64KB
[15:37:08] <bwalker> yeah it doesn't need to be unlimited
[15:37:31] <jimharris> i'm not super concerned about making the bdev_io bigger since we're making all of these changes to remove memory allocations in other areas
[15:37:39] <jimharris> (i.e. removing the nvme-of buffer pools)
[15:41:02] <jimharris> current bdev_io size (including max context size for modules) is 440 bytes
[15:41:36] <jimharris> and we allocate 64K of them by default == 27.5MB
[15:42:38] <bwalker> I'm not concerned with total memory size as much as cache line usage
[15:42:43] <jimharris> if we make an array of 32 iovs, that increases it to 936 bytes or 58.5MB
[15:43:09] <bwalker> we may need to do a pass on spdk_bdev_io and re-pack/re-order the fields at some point
[15:43:51] <jimharris> yeah - of course separating out the internal structure makes that a bit more difficult now
[15:46:23] <jimharris> we have some holes in the new __bdev_io_internal_fields
[15:47:02] <bwalker> sethhowe
[15:47:27] <jimharris> sethhowe: pahole is your friend
[15:47:56] <jimharris> we can easily get 8 bytes back
[15:48:07] <sethhowe> thx, will check it out.
[15:49:03] <jimharris> note i've always had to build/install from source - the package (at least on ubuntu) doesn't work on my systems, not sure if it's related to spdk build options, or what
[15:49:15] <jimharris> https://git.kernel.org/pub/scm/devel/pahole/pahole.git
[15:55:42] <bwalker> on fedora you can just install the package
[15:55:49] <bwalker> 'dwarves' is the name of it
[15:55:58] <bwalker> I think seth knows how to use it already though
[15:57:24] <sethhowe> Yeah, I do. When I said check it out I was offering to look into packing the cache lines in that struct.
[18:15:58] *** Joins: Jianfeng (2ffc0185@gateway/web/freenode/ip.47.252.1.133)
[18:21:54] <Jianfeng> May be not so appropriate to ask here, but I only know this storage community :-) My question is about Linux Direct I/O. See lots of blogs say user should provide aligned buffer (mandatory requirement). I'm wondering the reason is that device DMA needs physical-contiguous buffer. Then could this requirement be just physical-contiguous instead of alignment?
[18:22:58] <Jianfeng> Thanks in advance
[18:29:36] *** Joins: dlw (~Thunderbi@114.255.44.143)
[19:23:21] *** Quits: Guest90874 (~lyan@2605:a000:160e:2124:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[21:09:22] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 260 seconds)
[22:14:32] *** Quits: Jianfeng (2ffc0185@gateway/web/freenode/ip.47.252.1.133) (Quit: Page closed)
[23:03:52] *** Joins: pohly (~pohly@p5484976F.dip0.t-ipconnect.de)
[23:26:50] *** Parts: pohly (~pohly@p5484976F.dip0.t-ipconnect.de) ()
[23:38:51] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-sftfderxohkyfljz)