[01:26:54] *** Joins: pzedlews__ (~pzedlews@134.134.139.75)
[01:27:15] *** Quits: pzedlews__ (~pzedlews@134.134.139.75) (Client Quit)
[01:34:23] *** Joins: ziyeyang_ (~ziyeyang@192.55.54.38)
[02:08:05] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.38) (Ping timeout: 240 seconds)
[03:14:58] jimharris: yes, https://review.gerrithub.io/#/c/402531, I see rare migration failures because vm_shutdown reports failure. I want to see what is failing there.
[06:00:55] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-xxxmuhcppyadqqth)
[07:00:43] @jimharris - By checkout do you mean "test it", especially with vhost?:)
[07:17:51] hi param - you are right, check_format.sh does not autocorrect those types of errors, we should probably at least add some error messages there explaining that those errors need to be corrected by the user
[07:18:00] pwodkowx: ack - we'll check it in today
[07:18:18] klateck: yes - exactly
[07:28:25] pwodkowx: https://review.gerrithub.io/#/c/402883/ looks good but I posted a comment
[07:30:32] *** Joins: peluse_ (48d0c853@gateway/web/freenode/ip.72.208.200.83)
[07:31:06] jimharris: so I was able to tweak the PMD driver and make some changes to the vbdev driver and can now provide src & dst buffers for encryption...
[07:32:15] tweak the DPDK PMD?
[07:32:34] yeah
[07:32:38] cool
[07:32:50] might not be acceptable for upstream DPDK though because of how we use external buffers
[07:32:57] ugh
[07:33:10] if you have time sometime between now and 9ish I can publish a lync meeting and review (well not now, 15 min from now)
[07:33:22] but on the + side, small changes....
[07:33:45] let's try to do it on Monday
[07:34:17] i'm WFH today but am going to be interrupted a lot to finish up this remodeling job
[07:34:20] sounds good, that'll give me time to clean things up and do some more thinking about maybe another approach that might work for them.
[07:34:26] cool
[07:34:30] what are you remodeling?
[07:35:47] both guest bathrooms - was supposed to be done more than a week ago - my mom's flying in tonight so we're really finishing last minute :)
[07:36:44] nice, just in time!
[07:46:19] *** Quits: ed__ (d8f01e19@gateway/web/freenode/ip.216.240.30.25) (Ping timeout: 260 seconds)
[07:52:44] drv: please take a look at https://review.gerrithub.io/#/c/402883/
[08:22:12] peluse_: thinking about how your module can get information about whether encrypt/decrypt in place is OK
[08:23:01] for example, on writes, with iSCSI or NVMe-oF, encrypting in place is fine - but with vhost, it's not, otherwise you end up encrypting the memory in the guest
[08:24:01] for reads, if it's a zero-copy read - like in malloc, where we just return a pointer to the region in memory itself instead of allocating a separate buffer - you can't decrypt in place, otherwise you end up decrypting the ramdisk
[08:25:27] i think we want to avoid this being a per-IO flag type of thing, and instead make it a parameter of the bdev_open call
[08:25:46] which should cover the write path
[08:44:05] *** Quits: peluse_ (48d0c853@gateway/web/freenode/ip.72.208.200.83) (Ping timeout: 260 seconds)
[09:02:04] Am I correct in my understanding that there is not, at least directly, a companion API to spdk_event_allocate()? That is, there is no spdk_event_free()?
[09:06:53] @lhodev: Yes, first spdk_event is allocated, then it can be enqueued to the reactor by spdk_event_call(). It is freed by reactor automatically after the event is processed.
[09:08:46] Regarding the former query (re: absence of spdk_event_free()), can I get confirmation that I can move the call to spdk_trace_init() (done in spdk_app_start()) from its current position to BEFORE the call of spdk_app_setup_signal_handlers()? I think that would address the issue I'm attempting to resolve.
[09:13:42] On my remove-exit()'s-from-lib-code mission, I'm now working on lib/trace. spdk_trace_init() currently calls exit() on initialization failures and so I want to return a failure instead. The "trouble" is that spdk_trace_init() is invoked, currently, in spdk_app_start() following spdk_app_setup_signal_handlers() where, therein, an spdk_event_allocate() has occurred. I thusly was looking at how to "unwind" the allocations involved.
[09:13:42] If I can just move spdk_trace_init() ahead of spdk_app_setup_signal_handlers(), then I don't have to worry about free'ing up any resources (i.e. the event that was possibly allocated for g_shutdown_event).
[09:15:53] @lhodev: This patch here https://review.gerrithub.io/#/c/401243. I don't see a problem with moving trace init higher up, since it does depend on reactor framework.
[09:16:09] *does not
[09:17:12] tomzawadzki: Cool. Thx for the confirmation. I'll just do that then.
[09:22:57] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-xxxmuhcppyadqqth) (Ping timeout: 248 seconds)
[10:00:19] jimharris: I made some comments on https://review.gerrithub.io/#/c/402883/ - I'm OK with merging it as is and cleaning it up later, though
[10:04:40] drv: can you look at this one too? pwodkowx just wants this in to help debug some of the vhost intermittent failures
[10:04:41] https://review.gerrithub.io/#/c/402531/
[10:05:15] he has the subsystem and bdev changes based on this patch for some reason
[10:05:31] looks good to me
[10:05:32] :)
[10:28:50] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-qnrkfaqwavpapqdv)
[10:32:34] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-qnrkfaqwavpapqdv) (Client Quit)
[10:33:28] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-uuaindvhzvuhtaxv)
[10:36:55] drv : jim : the code can be reviewed.. https://review.gerrithub.io/#/c/403211/
[10:41:37] jimharris: build fix after the subsystem change (nvmf didn't get updated): https://review.gerrithub.io/#/c/403355/
[10:53:13] done - is test pool paused?
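A minimal sketch of the reordering discussed at [09:08:46] through [09:17:12], assuming a simplified model of the startup path: the fallible trace init runs first and returns an error instead of calling exit(), so nothing has been allocated yet when it fails. Every `demo_*` name and the `g_demo_shutdown_event` variable are hypothetical stand-ins, not SPDK's actual functions or code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an event object; not SPDK's struct spdk_event. */
struct demo_event {
	void (*fn)(void *arg);
	void *arg;
};

/* Stand-in for the shutdown event allocated while setting up signal handlers. */
static struct demo_event *g_demo_shutdown_event;

/* Fallible init that reports failure to the caller instead of calling exit(). */
static int demo_trace_init(bool simulate_failure)
{
	return simulate_failure ? -1 : 0;
}

/* Allocates the shutdown event, mirroring the allocation described in the chat. */
static int demo_setup_signal_handlers(void)
{
	assert(g_demo_shutdown_event == NULL); /* guard against double setup */
	g_demo_shutdown_event = calloc(1, sizeof(*g_demo_shutdown_event));
	return g_demo_shutdown_event != NULL ? 0 : -1;
}

/*
 * Trace init runs BEFORE any allocation, so its failure path has
 * nothing to unwind.
 */
static int demo_app_start(bool trace_fails)
{
	if (demo_trace_init(trace_fails) != 0) {
		return -1; /* nothing allocated yet */
	}
	if (demo_setup_signal_handlers() != 0) {
		return -1;
	}
	return 0;
}

/* Releases what demo_app_start() allocated on the success path. */
static void demo_app_stop(void)
{
	free(g_demo_shutdown_event);
	g_demo_shutdown_event = NULL;
}
```

With the order reversed, the failure branch of `demo_trace_init()` would need to release the already allocated event, which is the unwinding the reordering avoids.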
[10:54:09] sethhowe is working on it
[10:54:34] Wow, there sure are lots of jobs queued…
[10:55:08] drv : i see a message cannot merge in gerrit.. if i rebase will it need an approval again and can rebase be done from UI itself?
[10:55:35] hi param, in this case, I think Gerrit won't be able to resolve the merge via the UI, so you'll need to rebase and push for review again
[10:55:46] it should not require any intervention, though - we just renamed the file you modified, but git should be able to figure it out
[10:56:14] okay...sure..
[10:57:13] lhodev: yeah - this is why we are pretty relentless about trying to keep the overall test time around 7 to 8 minutes per patch
[10:57:44] we used to just have 2 or 3 test systems that ran every SPDK test - but that day has long passed
[10:58:12] I'm glad I arrived late to the party then ;-).
[10:58:56] I've merged the build fix patch
[11:00:26] drv: Cool, thx.
[11:03:05] Build Pool PSA: The build pool is currently running. The status page is just not updating. That will be fixed as soon as this current build finishes.
[11:12:44] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-uuaindvhzvuhtaxv) (Ping timeout: 255 seconds)
[11:12:57] drv : i have rebased the code and pushed it..
[11:21:11] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-vpzgvvkmbkakfhjd)
[11:31:38] *** Quits: ziyeyang_ (ziyeyang@nat/intel/x-vpzgvvkmbkakfhjd) (Quit: Leaving)
[13:05:03] *** Joins: leospdk (42718442@gateway/web/freenode/ip.66.113.132.66)
[13:05:29] hello
[13:05:45] hi leospdk
[13:05:53] hi Jim,
[13:08:44] I am playing a little bit with spdk and I am wondering which behavior I should expect from the SPDK API when there are hardware failures regarding the nvme disk (a bad block in the drive, for instance).
[13:10:36] if you are using the spdk nvme api, the nvme completion entry will be passed to the callback function that you passed when submitting the I/O command
[13:11:21] if your application is using the spdk nvme api directly, then you can decode the completion entry as needed
[13:11:54] if it is being used through one of the spdk applications, then these completions go to the spdk bdev nvme driver and this will get translated to error status back through the spdk bdev (block) layer
[13:25:17] I am using the API directly. I guess you mean the callback spdk_nvme_cmd_cb passed to spdk_nvme_ns_cmd_read/write. So I should look for decoding the spdk_nvme_cpl struct for errors?
[13:25:21] In the event of a disk failure, should I expect the completions to always be called for every read/write request sent? Or should I expect some to time out?
[13:26:36] *** Joins: johnmeneghini (~johnmeneg@216.240.30.5)
[13:27:16] *** Quits: johnmeneghini (~johnmeneg@216.240.30.5) (Client Quit)
[14:15:44] question #1: correct - you should decode spdk_nvme_cpl for errors
[14:16:09] question #2: you can register a timeout callback - see spdk_nvme_ctrlr_register_timeout_callback
[14:16:25] or implement timeout tracking yourself in your application
[14:16:51] *** Joins: peluse_ (48d0c853@gateway/web/freenode/ip.72.208.200.83)
[14:17:04] if you use the spdk timeout, you will still be responsible for polling each queue pair - but if you poll a queue pair and it is detected that one of the I/Os has timed out, it will invoke your callback
[14:17:25] *** ChanServ sets mode: +o peluse
[14:18:40] *** Joins: johnmeneghini (~johnmeneg@216.240.30.5)
[14:18:57] spdk will only give you the notification - it is the application's responsibility to either try to abort the command or issue a reset
[14:19:17] jimharris, drv I set up a 7am call for Mon (sorry, morning options are tight), let me know if that works and I'll publish the WebEx for anyone that wants to join - high level flow review of crypto vbdev and a chance to address some of the key questions that need answering before moving forward
[14:19:48] jimharris, I also saw a few things on IRC earlier you were saying but not all, I only had history that was on my browser momentarily so, sorry, could you repeat?
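A rough sketch for the [14:15:44] answer about decoding the completion for errors, assuming the status layout from the NVMe specification (upper 16 bits of completion dword 3: phase tag, status code, status code type, More, Do Not Retry). The `demo_*` names are hypothetical and the code deliberately does not include SPDK headers; in a real completion callback the same fields would be read from the spdk_nvme_cpl passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the status bits of an NVMe completion entry.
 * Real SPDK code would read these from struct spdk_nvme_cpl instead.
 */
struct demo_status {
	unsigned p;   /* phase tag */
	unsigned sc;  /* status code */
	unsigned sct; /* status code type */
	bool more;    /* more status information is available */
	bool dnr;     /* do not retry */
};

/* Unpack the upper 16 bits of completion dword 3. */
static struct demo_status demo_decode_status(uint16_t raw)
{
	struct demo_status s;

	s.p = raw & 0x1u;
	s.sc = (raw >> 1) & 0xFFu;
	s.sct = (raw >> 9) & 0x7u;
	s.more = ((raw >> 14) & 0x1u) != 0;
	s.dnr = ((raw >> 15) & 0x1u) != 0;
	return s;
}

/* A completion is successful only when both SCT and SC are zero. */
static bool demo_status_is_error(const struct demo_status *s)
{
	assert(s != NULL);
	return s->sct != 0 || s->sc != 0;
}

/* Human-readable name for the status code type. */
static const char *demo_sct_name(unsigned sct)
{
	switch (sct) {
	case 0x0: return "generic";
	case 0x1: return "command specific";
	case 0x2: return "media and data integrity error";
	case 0x7: return "vendor specific";
	default:  return "other/reserved";
	}
}
```

For the bad-block case raised in the chat, a read that hits unreadable media would typically come back with the media error status code type (0x2) and the Unrecovered Read Error status code (0x81); whether a given drive reports exactly that is device dependent.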
[14:20:04] I don't know if I will provide any coherent review at 7 AM :)
[14:20:04] *** Quits: peluse_ (48d0c853@gateway/web/freenode/ip.72.208.200.83) (Client Quit)
[14:20:44] LOL, yeah I can move to an afternoon if that's better, I just figured morning we'd get some folks from Euro maybe
[14:21:30] and it's more discussion than strict "code review" kinda thing but following along the IO path will be a heckuvalot easier than trying to explain it
[14:26:08] drv, 8am sound better?
[14:26:13] yeah
[14:26:24] cool, done deal
[14:34:52] re: time conversion, DST switchover is this Sunday, I believe, so MST will be on the same time as PDT on Monday
[14:36:04] peluse: basically how to encrypt/decrypt in place if it's ok and how you figure out in your driver if it's ok or if you need to allocate a separate buffer
[14:36:20] we can talk more about it on monday
[14:37:24] *** Quits: johnmeneghini (~johnmeneg@216.240.30.5) (Quit: Leaving.)
[14:38:57] sure, I think simpler is better though, I'd rather not support both ways if we don't have to. The cost of not doing it in place is basically the amount of memory needed for all outstanding writes at any given point in time, there's no extra copy or anything.
[14:39:25] but I can see either way working out just fine...
[14:40:17] i'm thinking there's less cache thrash if it's done in place
[14:40:48] meanwhile I'll see if I can get a discussion going on the DPDK side about the AES PMD limitation, as it's coded already to do src/dst in two buffers just with data in the mbuf, so I'm not sure why the docs call out in place only. Need to spend some more time looking at it though...
[14:40:55] ah, true
[14:40:56] we could always allocate buffers to start though
[14:41:15] you make a good point - we don't have to do the in place stuff now
[14:43:55] Hi Jim,
[14:44:27] thanks for the answers, I will try what you said.
[14:50:17] Just one more question. If I detect some malfunction, is there a way to set the hard drive state to failed? I would like to somehow use the hard drive LED indicator to show the user which of the drives in the server is bad.
[15:09:10] *** Joins: jkkariu (~jkkariu@134.134.139.82)
[15:45:42] jimharris, FYI I got a few QAT cards from tsg, started the process of getting them installed, configured, unbound, rebound, bla, bla, bla...
[16:36:18] Hi all, I just submitted a new blog post about test script hierarchy diagram for review. https://review.gerrithub.io/#/c/403373/ Please check it out and leave your comments. Thanks!
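A sketch of the in-place question from [08:22:12] through [08:25:46] and [14:36:04] through [14:41:15], assuming the decision is carried as an open-time option rather than a per-I/O flag, as suggested in the chat. The `demo_crypto_open_opts` structure, its field, and the `demo_*` functions are all hypothetical and are not part of SPDK's bdev API; the XOR transform is only a placeholder for a cipher and is not real encryption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-time options; not an SPDK structure. */
struct demo_crypto_open_opts {
	/* True when the caller's write buffer may be overwritten (e.g. not guest memory). */
	bool write_in_place_ok;
};

/* Toy stand-in for a cipher (XOR with a fixed byte). NOT real encryption. */
static void demo_transform(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		dst[i] = src[i] ^ 0x5Au;
	}
}

/*
 * "Encrypt" a write payload. Returns the buffer holding the result:
 * either src itself (in place) or a newly allocated buffer that the
 * caller must free(). Returns NULL if the allocation fails.
 */
static uint8_t *demo_encrypt_for_write(const struct demo_crypto_open_opts *opts,
				       uint8_t *src, size_t len)
{
	uint8_t *dst = src;

	assert(opts != NULL && src != NULL && len > 0);
	if (!opts->write_in_place_ok) {
		/* Separate buffer: costs memory per outstanding write, leaves src untouched. */
		dst = malloc(len);
		if (dst == NULL) {
			return NULL;
		}
	}
	demo_transform(dst, src, len);
	return dst;
}
```

A vhost-like opener would set `write_in_place_ok` to false so guest memory is never modified, while an iSCSI or NVMe-oF-like opener could set it to true. Always allocating, as proposed at [14:40:56], corresponds to treating the option as false everywhere.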