[00:17:31] *** guerby_ is now known as guerby
[00:17:34] *** Quits: guerby (~guerby@ip165.tetaneutral.net) (Changing host)
[00:17:34] *** Joins: guerby (~guerby@april/board/guerby)
[01:10:19] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[01:43:39] *** Quits: qwan_ (c066cc2d@gateway/web/freenode/ip.192.102.204.45) (Ping timeout: 260 seconds)
[04:20:49] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[04:32:30] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[06:07:32] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[06:07:52] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[07:05:54] *** Joins: tomzawadzki (~tomzawadz@134.134.139.76)
[07:23:30] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[07:27:12] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[07:28:45] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[07:29:34] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[07:43:01] <peluse> qwan_: I'm setting up a few new systems this week, are you following the directions in test/config/README.md? I'll let you know how it goes over here
[09:16:15] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[09:50:41] *** Quits: tomzawadzki (~tomzawadz@134.134.139.76) (Ping timeout: 248 seconds)
[11:19:21] <darsto> could someone retrigger test pool on https://review.gerrithub.io/c/393962/, please?
[11:55:08] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[12:32:27] *** Joins: starter (0cda5282@gateway/web/freenode/ip.12.218.82.130)
[12:34:00] <bwalker> darsto: done
[12:42:31] <starter> New to spdk and looking answer for - Can one do end-to-end IO (i.e. from NVMeoF initiator <-> NVMeoF target <-> NVMe drives) using only NVMeoF and NVMe spdk/dpdk stack that is provided in spdk/dpdk website?
[12:43:03] <bwalker> yes - SPDK includes an initiator and a target for NVMe-oF
[12:44:16] <starter> Thank you @bwalker. And will this require the dpdk driver from the NIC vendor?
[12:44:29] <bwalker> no
[12:44:37] <bwalker> NVMe-oF uses RDMA as its transport
[12:44:48] <bwalker> and we just use libibverbs to access that today
[12:45:00] <bwalker> i.e. it's making requests to the kernel's NIC driver
[12:45:37] <bwalker> someone could maybe do a userspace RDMA driver for one of these NICs, plus a userspace RDMA stack on top and then we could use that, but no one has ever done that
[12:45:55] <bwalker> and RDMA is designed to bypass the kernel mostly anyway, so it probably wouldn't help much
[12:48:29] <starter> That make sense. Any idea why some NIC vendors provide DPDK NIC drivers? Is it for to be used for something else (a use case which is not related to NVMeoF initiator <-> NVMeoF target)?
[12:49:10] <bwalker> there are lots of use cases for a userspace NIC driver, but none of those reasons are RDMA
[12:49:25] <bwalker> if you want to implement a high-performance switch in software, for instance
[12:49:30] <bwalker> and you're just processing IP headers
[12:49:42] <bwalker> or maybe you're implementing a TCP load balancer
[12:49:56] <bwalker> there are full TCP/IP stacks implemented on top of DPDK as well
[12:50:54] <starter> Aaah... Got it. Thanks.
[12:51:32] <bwalker> DPDK is primarily used to implement switches and routers
[12:51:46] <bwalker> virtual or physical appliances
[12:52:26] <bwalker> SPDK uses DPDK just for it's other utilities - as a framework for allocating pinned memory and managing PCI devices
[12:52:36] <bwalker> we don't even compile the NIC drivers themselves
[12:53:11] <bwalker> there is a whole bunch of stuff that all userspace drivers need to do and DPDK already had all of that implemented
[12:55:24] <starter> Thanks for details. the Can I use kernel based NVMeoF initiator with SPDK based NVMeoF+NVMe target? There will be performance impact, but want to know if they compatible?
[12:55:44] <bwalker> yes - it all interoperates
[12:55:49] <bwalker> you can use any combination of kernel + SPDK
[12:56:17] <starter> That is wonderful!
[13:00:14] <starter> With all this Spectre and Meltdown going on, will the Spectre and Meltdown fix impact the peformance of SPDK when there is interaction between SPDK stack in user space and the helper SPDK support module in kernel space that interact with PCI device/hardware?
[13:01:18] <bwalker> for NVMe-oF it depends on the particular NIC vendor's driver implementation
[13:01:41] <bwalker> in particular, the way they plug in to the generic libibverbs library (which is the equivalent of the socket API, but for RDMA)
[13:02:11] <bwalker> my understanding is that all of the performance sensitive operations in most vendor implementations aren't making syscalls in the I/O path
[13:02:17] <bwalker> so the answer should be no - it is not affected
[13:02:22] <bwalker> but someone would need to audit all of that
[13:03:14] <bwalker> the rest of SPDK of course does not make any syscalls in the I/O path
[13:03:37] <bwalker> so the rest of SPDK outside of the RDMA driver stack is definitely not impacted, for NVMe-oF
[13:04:29] <bwalker> there are some impacts for vhost and iscsi, in some configurations
[13:06:38] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[13:07:40] <starter> Got it and glad to know that there is no known impact to the SPDK io NVMeoF path.
[13:08:10] <starter> Appreciate all your help @bwalker. Thank you.
[13:08:13] <bwalker> np
[13:17:39] *** Quits: lhodev (~Adium@inet-hqmc07-o.oracle.com) (Quit: Leaving.)
[13:22:28] *** Joins: lhodev (~Adium@inet-hqmc05-o.oracle.com)
[13:27:11] *** Quits: lhodev (~Adium@inet-hqmc05-o.oracle.com) (Client Quit)
[13:27:27] *** Joins: lhodev (~Adium@inet-hqmc05-o.oracle.com)
[13:42:50] <jimharris> weird - the fedora-07 timing charts indicate the aggregate time is over 8 minutes
[13:51:05] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[13:55:44] *** Quits: vijay_ (89fe07a8@gateway/web/freenode/ip.137.254.7.168) (Ping timeout: 260 seconds)
[13:55:58] <jimharris> ah - the bdev loop in test/vhost/initiator/blockdev.sh was using the same enter/exit names for each of the bdevs which was throwing off the calculations
[13:56:01] *** Joins: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl)
[14:45:46] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[15:02:25] *** Quits: lhodev (~Adium@inet-hqmc05-o.oracle.com) (Remote host closed the connection)
[15:02:37] *** Joins: lhodev (~Adium@inet-hqmc05-o.oracle.com)
[15:04:39] *** Quits: lhodev (~Adium@inet-hqmc05-o.oracle.com) (Remote host closed the connection)
[15:05:15] *** Joins: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net)
[15:09:37] <lhodev> Apologize if this appears as a RESEND.   Lost my connection during my last post and a peek at the irclog online isn't displaying it:
[15:10:12] <lhodev> Hey guys:  I have a gerrit workflow question.   I submitted a patch to the SPDK which touched a number of files.  I understand that it requires a review process and so forth before/if it will be incorporated into master.   Let's say I want to do some new work which may or may not touch some of the same files, but it's altogether feature-wise a separate functional effort.   Is it suggested that I, say, (option-1) create a branch off fr
[15:10:12] <lhodev> om my most recently submitted work?   Logically, it would seem to me that's the way to continue working as opposed to, say, (option-2) starting with a branch off what's the latest/greatest upstream on master which would, of course, not contain any of the work I previously submitted.   Obviously, if the new work had a dependency on the previously submitted, then I'd need to do something like option-1.   But, if the new work is entirely o
[15:10:12] <lhodev> rthogonal to that previously just submitted, I could envision going the route of option-2.   Trying to understand the ramifications of the flows when multiple code submissions are in the review state before hitting master.
[15:16:12] <jimharris> hi lhodev
[15:16:19] <jimharris> nope - doesn't look like a resend
[15:16:20] <lhodev> Hi Jim
[15:18:34] <jimharris> your analysis is correct - if the new work is orthogonal, option #2 is the better route because then the two can be reviewed independently and the new work could be committed while the "first" piece of work is still getting worked on
[15:19:40] <jimharris> sometimes you could have two different patches that aren't really related but touch the same file (or maybe even the same part of that file) - in that case it just depends and either option would be ok
[15:20:53] <jimharris> if they are related, you definitely want to do option #1 - not only so that the later patches have the changes from the earlier patches, but also so it's clear when reviewing in gerrit that the patches are part of the same series
[15:21:46] <jimharris> you're probably already familiar, but the "Related Changes" section in the upper right of each review page shows where that patch is in relation to others (if any) in the patch series
[15:23:54] <lhodev> I may be more in scenario 2.   While the next work doesn't have a dependency on the previously submitted, it [the new work] is part of the "grand scheme" of replacing exit()'s and abort()'s, where possible, in lib code with returning failures instead.   I just didn't want to do ALL of those changes in one submitted commit, but instead piece-meal it so it's not so daunting reviewing all the various paths that are affected.
[15:24:14] <jimharris> thank you for breaking them up :)
[15:24:44] <jimharris> yeah - I agree, in that case even though the patches themselves don't touch the same code, they are part of a related series so I think having them in one series makes sense
[15:27:06] <lhodev> Thanks for your wisdom!  I'll proceed on that path, while hoping that given the # of files that are changing, the end result won't be a merge nightmare competing with other submitted code.
[16:35:36] <bwalker> so in vhost, the fd's that describe the memory are sent over the domain socket as ancillary data
[16:35:52] <bwalker> to receive them, we call recvmsg with a structure filled out that points to the ancillary data buffer
[16:36:05] <bwalker> if that data buffer isn't big enough, we get a flag MSG_CTRUNC to tell us so
[16:36:14] <bwalker> but I can't see a good way to handle that
[16:36:37] <bwalker> today everyone just assumes that 8 is the max that can appear
[16:37:01] <bwalker> but if the data gets truncated, there isn't any way in the protocol to say please send again
[16:37:21] <bwalker> and there isn't any message that "appends" the memory map - just one message that contains the whole memory map to replace the previous one
[16:38:33] <bwalker> this design just doesn't seem very good, but I don't currently see a way to fix it without modifying QEMU
[16:39:33] <jimharris> modifying QEMU seems drastic
[16:39:56] <bwalker> it's probably not a workable way forward at all
[16:40:20] <jimharris> so if vhost-user app sends 1024 regions (for example) - the vhost target sees 1024 for nregions and could read the vhost message with a dynamic buffer?
[16:40:44] <bwalker> but by the time I see 1024, I already had to call recvmsg
[16:40:52] <bwalker> which means I already had to know how big the ancillary data was
[16:41:04] <bwalker> because the fd's are associated with the message header
[16:41:16] <bwalker> if they were associated instead with the message payload, I could deduce their size by the payload size
[16:43:12] <jimharris> what about (just an idea) adding a vhost message that's somewhat specific to SPDK, where before the SET_MEMORY_TABLE message, we send a message saying how big the memory table will be?
[16:43:33] <jimharris> QEMU wouldn't need to send this - if the target doesn't get the message, it just uses the default (8)
[16:43:43] <bwalker> yeah - that seems feasible
[16:44:12] <jimharris> you could arguably probably get that into upstream DPDK even
[16:44:35] <jimharris> but there are other differences we have with upstream DPDK still that need to get resolved first
[16:44:40] <bwalker> I've got some other changes that make it so that the payload size is now dynamically allocated based on the message header instead of a #define too
[16:44:44] <bwalker> that could go upstream in DPDK
[16:45:06] <bwalker> because there are arrays of memory regions in the regular payloads too that used to be hardcoded to 8
[16:45:09] <bwalker> that I've solved
[16:45:29] <bwalker> it also allows for larger payloads to be placed on the heap
[16:46:15] <bwalker> while here I've also noticed that a ton of these struct definitions are not fixed-width
[16:46:26] <bwalker> i.e. they depend on what you are targeting with your compiler
[16:46:40] <bwalker> but since the two processes are communicating over a unix socket, they're not necessarily compiled in the same way
[16:46:49] <bwalker> the size of an int and an enum can change...
[16:46:57] <bwalker> padding can change, alignment can change
[16:47:02] <bwalker> it's a mess really
[16:47:14] *** Quits: starter (0cda5282@gateway/web/freenode/ip.12.218.82.130) (Ping timeout: 260 seconds)
[16:47:18] <bwalker> whoever wrote this hadn't implementing a binary wire protocol before clearly
[16:47:25] <bwalker> implemented*
[17:35:43] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[18:26:32] *** Joins: James (~James@106.11.34.8)
[18:26:56] *** James is now known as Guest97308
[18:33:16] *** Quits: Guest97308 (~James@106.11.34.8) (Remote host closed the connection)
[18:35:43] *** Quits: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net) (Quit: Leaving.)
[20:28:09] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.82)
[22:56:37] *** Joins: sbasierx__ (sbasierx@nat/intel/x-almietdawzfjwnvp)