[00:42:22] drv: thanks, I hadn't seen it earlier. Answering your question - yes, go ahead with merging.
[01:10:41] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-fyibjyzkyywudjxg)
[02:34:14] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[02:58:10] Are there any plans to merge SPDK's lib/virtio back into DPDK?
[02:58:31] I'm asking because DPDK currently doesn't have a reusable virtio library.
[02:58:48] It would be nice to have a single virtio library (and a single vhost library!)
[03:04:20] stefanha: none that I'm aware of
[03:04:58] darsto: How different are SPDK and DPDK? Does it make sense to merge this code down into DPDK?
[03:06:14] it is surely a good idea to merge it back into DPDK
[03:06:29] but SPDK lib/virtio would still need quite a lot of work before that happens
[03:07:33] SPDK lib/virtio was designed mostly for poll-mode
[03:08:29] i'm afraid we'll face some problems when trying to make it usable for an interrupt-based driver
[03:09:07] I got the impression that DPDK virtio is poll mode, although maybe it can switch to rx interrupts during idle periods (but then it switches back to polling).
[03:09:51] The missing pieces shouldn't be too great. I'm working on a feature in DPDK that will require a reusable librte_virtio,
[03:10:06] so my thought is to port across SPDK's virtio (and add missing bits as needed).
[03:59:24] *** Quits: VKon (cf8c2b51@gateway/web/freenode/ip.207.140.43.81) (Ping timeout: 260 seconds)
[06:32:43] gila: About sharing virtio and vhost in SPDK and DPDK, I recently sent a patch series to DPDK that adds another virtio device driver (not virtio-net).
[06:33:06] gila: And it also extends the librte_vhost library.
[06:33:20] gila: I'm aware that SPDK has its own virtio and vhost implementations and there is some duplication.
[06:33:35] For example, DPDK has its own vhost-user-scsi implementation.
[06:33:53] Okay, that is perhaps a little out of scope for what we did -- we wrote a virtio-scsi client that works with SPDK but without the reactor pieces in it.
[06:34:16] This allows client applications to talk to SPDK without using PMDs
[06:34:54] gila: I'm not sure I understand. Is there code somewhere I can look at?
[06:35:22] Yes -- https://github.com/openebs/vhost-user
[06:35:45] gila: I hope that long-term both DPDK and SPDK can share the same virtio, vhost-user master and vhost-user slave implementations.
[06:35:53] gila: Cool, thanks! Taking a look
[06:38:05] gila: Okay, I see. It's a vhost-user client library.
[06:38:12] I have no idea if that's what you are looking for, and also no idea if it's suitable for integration with {S,D}PDK either; it's one of the things we did in order to wrap our heads around it and determine how we can/should/want to use it
[06:38:19] stefanha yes.
[06:39:59] gila: Thanks for sharing. It's not the same thing I'm looking at right now, but a vhost-user client library is a useful thing to have for efficient I/O between applications.
[06:40:56] Yes, in particular for legacy apps (we hope), but as mentioned we are very much still trying to find the right model for our use cases
[06:43:26] gila: The hard thing about integrating it into existing applications is usually that the API isn't what legacy code expects (e.g. a block device node or POSIX file I/O).
[06:44:11] gila: There is no "block devices in userspace" in mainline Linux, as far as I know, and FUSE isn't as performant as using vhost-user directly.
[06:44:44] GlusterFS and other distributed file systems have similar challenges. It's hard for existing apps to integrate and still get native performance.
[06:44:58] (because they are not kernel file system drivers but userspace libraries)
[06:45:42] Yes, exactly.
[06:46:23] I've also recently found out about "buse" (block device in user space), but I think using that will incur lots of performance hits as well, as you go in and out of the kernel
[06:47:59] It might never get upstream, so it could become a headache to keep an out-of-tree kernel module going with future kernel versions.
[06:49:43] it's actually based on the NBD framework, which is already in the kernel iirc.
[06:51:15] gila: But does BUSE have kernel code? You can already implement an NBD server today and attach it as a block device. I guess BUSE adds additional kernel code to make it more efficient for local processes.
[06:52:43] it does not come with any kernel code, and perhaps it's just a regular NBD device that's just called that way =) dunno.
[06:54:06] NBD transfers data over sockets, so it's not that efficient (context switches + data copies).
[06:54:26] I think the vhost-user approach you're taking will outperform it by a lot.
[06:56:58] we get almost the same performance as when running fio directly with the SPDK nvme plugin -- if we sleep, using poll(), it's not that fast yet; still looking into that
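For context on the NBD exchange above: attaching a userspace NBD server as a local block device needs no out-of-tree kernel code today. A minimal sketch using qemu-nbd as the server (the image path and device node are placeholders; any NBD server implementation would do):

    # Load the NBD client module (creates /dev/nbd0, /dev/nbd1, ...).
    modprobe nbd

    # Serve a disk image over NBD and attach it as a local block device.
    # Every I/O on /dev/nbd0 now round-trips through the userspace server
    # over a socket -- the context switches and data copies noted above.
    qemu-nbd --format=qcow2 --connect=/dev/nbd0 /path/to/disk.qcow2

    # Detach when done.
    qemu-nbd --disconnect /dev/nbd0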
[08:11:42] stefanha: taking this one part at a time...
[08:11:47] ...first - rte_vhost
[08:13:06] generally a lot of work went into upstream DPDK for rte_vhost to support non-net protocols - one of the things we ran into with storage is that both the emulated BIOS and the operating system initialize the virtio device, whereas for network, the BIOS never initializes it
[08:14:05] we've made changes in the SPDK version to handle that - initially DPDK was reluctant to take these changes, since they aren't needed for networking - we need to push on this again
[08:14:15] second - virtio
[08:15:31] the DPDK rte_virtio PMD is very network-specific - we (darsto and I) had to do a lot of work to get it to work for storage
[08:16:16] but once darsto is finished with the virtio standalone library, i don't see any reason why DPDK couldn't build virtio-net on top of it
[08:31:43] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-fyibjyzkyywudjxg) (Ping timeout: 248 seconds)
[09:00:50] jimharris: I think that DPDK will need a generic virtio library fairly soon for the vhost-pci/virtio-vhost-user work that Wei Wang, Zhiyong Yang, and I have been doing in DPDK.
[09:01:28] jimharris: That's why I hope to share a virtio library with SPDK
[09:02:02] It would be a shame to redo in DPDK pretty much the same work that was already done in SPDK and end up with two virtio libraries.
[09:04:59] stefanha: agreed
[09:05:15] jimharris: The vhost-pci/virtio-vhost-user feature allows *PDK to run inside a VM (instead of as a vhost-user process on the host) to provide a vhost device backend to another VM.
[09:05:47] jimharris: I have tested that DPDK's vhost-scsi works with virtio-vhost-user. You can run a VM that emulates the vhost-user-scsi device for another VM.
[09:06:12] jimharris: The reason why you might want this is for deploying in a cloud or other environment where it's not feasible to run host userspace processes.
[09:06:28] (In a cloud everything is a VM and the provider will not allow you to run custom code on the host)
[09:06:56] jimharris: When this feature is merged into DPDK rte_vhost, it would be cool if SPDK could also take advantage of it.
[09:08:29] More info here: https://wiki.qemu.org/Features/VirtioVhostUser
[09:08:36] yes - this sounds like an excellent feature - I'm assuming there's QEMU work also involved to enable this?
[09:09:01] jimharris: Yes, there is a new virtio-vhost-user device for QEMU.
[09:09:25] Wei and Zhiyong have already been working on this for networking for a while; it was called vhost-pci.
[09:15:48] thanks stefanha - this helps with the immediacy of getting the SPDK and DPDK rte_vhost libraries synced up again
[09:37:24] jimharris: can you take a look at https://review.gerrithub.io/#/c/386546/ and the following patch? (vhost-blk updates to work with unmodified qemu)
[09:44:38] I fixed a doc issue on 386546
[09:44:48] otherwise looks good - will +2 once it runs through the test pool
[09:51:40] ok, thanks
[11:15:21] I've merged the vhost-blk update series to master
[13:24:26] jimharris: I made a couple of minor comments on the sock_group review: https://review.gerrithub.io/#/c/398969/
[13:28:57] thanks - I'm working on the comments for this header file
[13:29:17] updated the patch with your other comment, though
[13:32:20] cool
[14:01:16] Hey guys, Lance here. I have a question regarding dev/test activities with respect to GerritHub. Is there a *special* ref one can push to for no other purpose than to build and test withOUT triggering a review, in an iterative manner, such that the artifacts/results expire and get removed "quickly" (e.g. a day or two)? I understand if the backend just isn't equipped to handle the possibility of such a load, but I figured I'd inquire in case it's a possibility (and maybe even something others are actively using).
[14:02:38] lhodev: nothing like that currently, but reviews that are abandoned will get their builds cleaned up relatively quickly
[14:03:08] so you could just push something for review, then abandon it
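Concretely, the push-then-abandon flow is just the normal Gerrit review push; the sketch below assumes the GerritHub remote is named origin and the target branch is master, and abandoning is then done from the Gerrit web UI:

    # Push the current HEAD as a new Gerrit change, which triggers a CI run:
    git push origin HEAD:refs/for/master

    # ...wait for the build results, then abandon the change in the web UI
    # so its build artifacts get cleaned up relatively quickly.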
[14:03:37] can you elaborate on your use case?
[14:11:00] Do I understand correctly that in the course of a normal code-submission flow, the continuous-integration mechanism results in builds and tests across multiple servers, each of which invokes "configure" with different options and runs the respective tests? I was considering the scenario where, say, someone made a change to a common code path and tested it with an NVMe device, but did not have the infrastructure in place for NVMeoF. For greater confidence that nothing untoward happened in that type of configuration, the ability to have it exercised/tested via GerritHub would add value before the person initiated the formal review.
[14:17:47] drv jimharris: can we run some kind of autotest loop on TP?
[14:18:29] lhodev: currently, the test pool isn't really provisioned with enough capacity to serve as a testbed for individual contributors; it runs the reviews one at a time, at about 8 minutes per test
[14:19:31] we are working on making the test scripts easier to run on developer machines (a lot of things are hard-coded for our test environment currently, but that is being fixed)
[14:19:51] pwodkowx: do you mean running autotest on current master during idle times, or something like that?
[14:20:02] I am getting random failures
[14:20:22] drv: yes
[14:20:53] it does look like vhost tests are randomly failing after the qemu update...
[14:22:04] this one looks like it failed because fio detected data corruption: https://ci.spdk.io/spdk/builds/release/master/4256/fedora-07/build.log
[14:22:10] "verify: bad magic header 0, wanted acca at file VirtioScsi3t0 offset 87117824, length 4096"
[14:22:39] the following build failed in the same way
[14:23:09] *** Joins: VKon (cf8c2b51@gateway/web/freenode/ip.207.140.43.81)
[14:24:19] yeah, and https://review.gerrithub.io/#/c/399236/ also - only a commit message change to get it V+1
[14:24:33] and others also
[14:25:39] pwodkowx: I think it's an interesting idea, but I don't know how complex it would be to set up in our current system
[14:26:15] probably worth making a feature request with the team setting up Jenkins
[14:26:55] you mean Karol?
[14:27:30] yes
[14:27:59] which exact commit did you install on TP?
[14:31:31] Hi guys, I was running into an issue with spdk v17.03, where having a high number (~200) of 1GB huge pages resulted in "nvme.c: 241:nvme_driver_init: ***ERROR*** primary process failed to reserve memory". I updated to v18.01 and don't see the issue any longer. Wanted to check whether it's something you're aware of
[14:32:40] pwodkowx: the qemu version on the test machine is https://github.com/spdk/qemu/tree/spdk-2.12-pre
[14:35:02] VKon: did you also upgrade DPDK? I don't think we made any changes that would have affected this case in SPDK
[14:36:21] yes, upgraded the dpdk as well; thought it might be an issue there, but you guys are easier to get in touch with
[14:41:27] drv: hmmm... diff fb2516ef94b8399ddded4a41c4346cbc1d5d98fc vs 008a51bbb343972dd8cf09126da8c3b87f4e1c96 shows a zillion commits
[14:42:11] zillion = 146? :)
[14:43:03] there's a few that mention vhost in the commit message, but they don't look relevant (just test changes, as far as I can tell)
[14:43:14] is it only fedora-07 that's failing due to the qemu update?
[14:44:09] seems to be so far
[14:44:18] (3 cases that I see in the build history since the qemu update)
[14:44:41] it is during the virtio-inside-VM tests, I think
[14:44:50] (using our virtio initiator)
[14:45:14] which only run on fedora-07
[14:45:41] https://ci.spdk.io/spdk/builds/review/573549515a1a3a04e0dabbde244ec9857f2cf32c.1518209425/fedora-07/build.log
[14:46:08] this is the initiator failure
[14:46:16] yep, that looks like the same thing
[14:46:17] "verify: bad magic header 0, wanted acca at file VirtioScsi3t0 offset 54333440, length 4096"
[14:46:51] but why would a qemu update affect the initiator?
[14:47:33] we use many more devices in fedora-07
[14:47:49] is this possibly related to the mem barrier changes (that we haven't merged yet) from darsto_?
[14:48:07] I have a suspicion that increasing the vq count in the test scripts triggered this
[14:48:11] my first shot at this would be to specify serialize_overlap in fio
[14:48:56] it enforces I/O submission ordering
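For reference, serialize_overlap is an ordinary fio job option (see the fio documentation link quoted later in this log); it requires a reasonably recent fio, which turns out to matter below. A minimal sketch of a verify job using it, with a placeholder device name:

    # Serialize in-flight I/Os that overlap, so two queued writes to the
    # same LBA range cannot complete out of order and break verification.
    # (Requires a recent fio; older versions reject the option.)
    fio --name=verify-test --filename=/dev/sdX --rw=randwrite --bs=4k \
        --iodepth=16 --verify=md5 --serialize_overlap=1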
[14:53:27] anyway, it is friday night (CET time :P) - what do we do with this issue? do we revert the qemu update?
[14:54:14] does it fail frequently?
[14:55:39] *** Quits: EdR_ (d8f01e19@gateway/web/freenode/ip.216.240.30.25) (Ping timeout: 260 seconds)
[14:56:35] "frequently" == "run-many-times-and-we-can-generate-statistics"
[14:58:57] I don't see where the vq count would have changed in the test scripts
[15:01:52] https://review.gerrithub.io/#/c/395986/
[15:03:24] hmm, but that's been merged for a few days - did we see any issues before the qemu update?
[15:04:38] don't think so, but did we
[15:04:52] "run-many-times-and-we-can-generate-statistics"?
[15:06:40] we have qemu updated and merged https://review.gerrithub.io/#/c/399121/
[15:07:21] drv: fio is failing specifically on that emulated QEMU virtio-scsi disk
[15:08:13] my first guess, based on the error message, is that we are completing an I/O up the bdev stack before it's actually done somehow
[15:08:29] since the buffer contents are 0
[15:09:12] I'm not saying this commit is guilty, but we did not test those crucial changes intensively
[15:10:16] hm? but it's virtio-scsi that's failing
[15:10:32] yeah, the last change mentioned above is just for virtio-blk
[15:10:38] not related to SPDK vhost at all
[15:10:39] so I don't see how that could affect the virtio-scsi tests
[15:11:30] https://review.gerrithub.io/c/399261/
[15:11:56] we'll need that patch sooner or later, as the virtio-scsi standard allows I/O reordering
[15:13:39] we had an outdated fio version that didn't support this option yet, so there was no pressure to add this param
[15:13:56] what does this actually do? we probably need that for any bdev backend, not just virtio-scsi
[15:14:08] yup
[15:14:09] e.g. the NVMe spec allows I/O to be completed in any order as well
[15:14:28] fio by default expects I/O to finish in the same order as it was submitted
[15:14:38] seems like an fio bug if it doesn't handle that by default...
[15:14:46] :)
[15:14:53] (unless it's documented somewhere that fio plugins are supposed to do that as well)
[15:15:38] well, if this patch passes, I can re-run it a few times and see if we can reproduce the failure
[15:15:54] otherwise, I guess we should revert the qemu update series until someone has time to investigate
[15:16:40] seems like it's not happy
[15:16:46] Bad option
[15:16:46] fio: job global dropped
[15:17:44] Hello, I am getting this error when launching a VM that uses vhost
[15:17:57] qemu-system-x86_64: total memory for NUMA nodes (0x40000000) should equal RAM size (0x80000000)
[15:18:06] I think this has to do with the hugepage size
[15:18:41] I have 2048 hugepages of size 2048 kB
[15:18:48] jkkariu: can you pastebin your qemu launch command?
[15:18:53] it's probably a configuration issue
[15:19:11] qemu-system-x86_64 --enable-kvm -m 2048 \
[15:19:11]   -cpu host -smp 4 -nographic \
[15:19:11]   -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on -numa node,memdev=mem \
[15:19:11]   -drive file=/home/john/debian.qcow2,if=none,id=disk \
[15:19:11]   -device ide-hd,drive=disk,bootindex=0 \
[15:19:12]   -chardev socket,id=spdk_vhost_scsi0,path=/home/john/spdk/app/vhost/vhost.0 \
[15:19:14]   -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 \
[15:19:16]   -net user,hostfwd=tcp::10022-:22 -net nic \
[15:19:57] you need to specify either -m 1G, or -object memory-backend-file,size=2G
[15:20:01] these two have to match
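The two numbers in the error spell out the mismatch: 0x80000000 (2 GiB) is the -m 2048 RAM size, while the NUMA node is backed only by the 0x40000000 (1 GiB) memdev. A sketch of one corrected variant, shrinking -m to match the backend and keeping jkkariu's other options as-is:

    # -m and the memory-backend-file size must be equal; share=on stays,
    # since vhost-user needs guest memory the vhost target can map.
    qemu-system-x86_64 --enable-kvm -m 1G \
        -cpu host -smp 4 -nographic \
        -object memory-backend-file,id=mem,size=1G,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem \
        -drive file=/home/john/debian.qcow2,if=none,id=disk \
        -device ide-hd,drive=disk,bootindex=0 \
        -chardev socket,id=spdk_vhost_scsi0,path=/home/john/spdk/app/vhost/vhost.0 \
        -device vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost_scsi0,num_queues=4 \
        -net user,hostfwd=tcp::10022-:22 -net nic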
[15:21:32] darsto_: if what you are saying is true, fio would be useless. 'filename=' can be given a path to any device/file, etc., so I/O could finish in any order
[15:21:37] darsto_: do you know how /root/fio_src gets into the VM in this test? seems like it is not copied in the test script anywhere
[15:21:48] (so it's probably outdated)
[15:21:50] i'm trying to find out
[15:21:54] might be
[15:21:57] it's probably baked into the VM image
[15:22:11] should be copied in from the host /usr/src/fio, probably
[15:22:24] along with the spdk source
[15:23:08] we should really get around to scripting the creation of these test VMs from install ISOs or other clean sources so other people can reproduce them
[15:23:19] rather than some mystery-meat disk image
[15:24:41] pwodkowx: what can i say...
[15:24:44] http://fio.readthedocs.io/en/latest/fio_doc.html
[15:24:48] serialize_overlap
[15:25:34] hmm, this seems odd
[15:25:45] fio generates random I/O that can potentially overlap?
[15:26:14] I made the change and the VM launched
[15:26:16] thanks
[15:26:48] I have one more question: does the value need to be less than or equal to the hugepages used by vhost?
[15:28:20] jkkariu: it doesn't really have anything to do with vhost
[15:28:57] is the max the size of the hugepages on the system?
[15:29:00] you can use as many hugepages in the VM as you like
[15:31:18] you know it's Friday afternoon when there's talk of mystery meat on #spdk
[15:31:42] :)
[15:32:14] jimharris: +1 from you, but TP gave -1, so I can't make up my mind
[15:32:16] jimharris: any blinding insights or general opinions on what to do about this new qemu/virtio issue?
[15:34:33] we could revert the QEMU virtio-scsi-pci tests for now
[15:34:45] https://review.gerrithub.io/c/395986/
[15:37:26] I know this is a first shot, but before we blame someone (innocent or not), can you prepare the revert and run it several times?
[15:38:57] I ran out of beer. I will be back on monday, cheers!
[15:39:33] monday? good luck with that!
[15:45:26] drv: maybe updating the fio on the fedora-07 VM will fix these issues just by itself
[15:46:30] sounds like a good idea anyway
[15:46:52] adding a second scp in the script to copy the host's version of the fio src in there would be a good starting point, I think
[16:51:43] drv, OK I made all the channel changes we talked about today w/Jim and it compiled the first time, so I'm *really* worried now :)
[16:51:51] haha
[16:51:52] good luck
[16:52:08] if you apply enough (void*) casts, everything will compile the first time ;)
[16:52:15] yeah, time to open a Coors Light for debug... thanks for all the help, both of you guys
[16:52:27] LOL
[16:55:14] darsto_: I think your scp needs a trailing slash, like :/root/fio_src/
[16:55:25] otherwise it will try to copy into /root/fio_src/fio/...
[16:59:52] drv: you sure? why would it work for `/root/spdk` below?
[17:01:22] well, actually the trailing slash might need to be on the source path
[17:01:27] I always get confused by this
[17:01:34] I think it's the same behavior as rsync
[17:01:52] the spdk one probably works because the target doesn't exist yet
[17:04:31] drv, wow, it almost works right out of the chute! Doing IO, just hanging in my ch destroy cb for some reason. Good time to call it a day!
[17:04:40] nice
[17:05:21] darsto_: I tweaked the commit msg on your patch that disables the qemu virtio-scsi-pci tests - if it looks ok to you, I'm going to merge that until we figure out what is going on
[17:05:31] +1
[17:05:51] * drv wonders why darsto_ is up and working on SPDK at 1 AM anyway
[17:06:11] oh cmon
[17:07:15] :)
[17:17:54] I think what we need is "-r /usr/src/fio/ 127.0.0.1:/root/fio_src" (slash on source, not destination)
[17:18:08] or just avoid the problem entirely and copy it somewhere else, and fix up the rest of the script to match...
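For reference, the trailing-slash rule drv is reaching for is precisely defined in rsync (the chat is only guessing that scp -r behaves the same way); with the paths from the test script:

    # Trailing slash on the SOURCE: copy the directory's contents.
    rsync -r /usr/src/fio/ 127.0.0.1:/root/fio_src   # => /root/fio_src/<files>

    # No trailing slash: copy the directory itself into the target.
    rsync -r /usr/src/fio 127.0.0.1:/root/fio_src    # => /root/fio_src/fio/<files>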
[18:05:03] *** Quits: VKon (cf8c2b51@gateway/web/freenode/ip.207.140.43.81) (Quit: Page closed)