[00:42:03] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 252 seconds)
[00:46:12] *** Joins: tkulasek (~tkulasek@134.134.139.73)
[00:58:43] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-vycrwbmpoogxcaxx) (Remote host closed the connection)
[01:00:40] *** Joins: tzawadzki (~tomzawadz@192.55.54.42)
[01:06:05] *** Quits: tzawadzki (~tomzawadz@192.55.54.42) (Ping timeout: 240 seconds)
[01:06:39] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-mghkothqgiosbpsl)
[01:21:13] *** Quits: guerby (~guerby@april/board/guerby) (Read error: Connection reset by peer)
[01:21:25] *** Joins: guerby (~guerby@ip165.tetaneutral.net)
[01:21:25] *** Quits: guerby (~guerby@ip165.tetaneutral.net) (Changing host)
[01:21:25] *** Joins: guerby (~guerby@april/board/guerby)
[04:32:30] *** Joins: dlw1 (~Thunderbi@114.255.44.139)
[04:34:02] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 256 seconds)
[04:34:02] *** dlw1 is now known as dlw
[04:46:38] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 256 seconds)
[05:11:34] *** Joins: dlw (~Thunderbi@114.255.44.139)
[05:49:24] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-mghkothqgiosbpsl) (Ping timeout: 256 seconds)
[05:54:24] *** Joins: lyan (~lyan@2605:a000:160e:2dd:4a4d:7eff:fef2:eea3)
[05:54:47] *** lyan is now known as Guest27255
[06:09:58] *** Quits: dlw (~Thunderbi@114.255.44.139) (Ping timeout: 264 seconds)
[06:11:48] bwalker: spdk_tgt: blobstore.c:2737: _spdk_bs_load_used_clusters_cpl: Assertion `ctx->mask->length <= (ctx->super->used_cluster_mask_len * sizeof(struct spdk_blob_md_page) * 8)' failed.
[06:12:59] can those asserts be converted into failures or trigger some kind of recovery?
[06:26:18] *** Joins: tkulasek_ (~tkulasek@192.55.54.42)
[06:29:24] *** Quits: tkulasek (~tkulasek@134.134.139.73) (Ping timeout: 265 seconds)
[07:21:56] *** Joins: tomzawadzki (~tomzawadz@134.134.139.74)
[07:24:27] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[07:26:17] @bwalker: Can I ask for a review of the delete_lvol_bdev patch series? I would really appreciate feedback on those: https://review.gerrithub.io/#/c/spdk/spdk/+/407402/
[07:55:46] *** Quits: tomzawadzki (~tomzawadz@134.134.139.74) (Ping timeout: 256 seconds)
[08:05:01] pwodkowx: any idea what steps led up to that assert?
[08:05:21] i agree these shouldn't assert and should just fail in some way - but we really shouldn't be hitting that condition
[08:07:21] could you try running blob_cli -D on that bdev?
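[editor's note: the assert pwodkowx hit guards the used-cluster mask read during blobstore load. Below is a minimal sketch of the non-fatal handling bwalker is asking about, reusing the condition from the quoted message; bs_load_ctx_fail() is a hypothetical stand-in for blobstore's internal load-error path, which may be named differently in the actual source:]

    /* Fragment sketched against _spdk_bs_load_used_clusters_cpl(); instead of
     * assert(), report corrupt metadata as a load failure the caller can handle. */
    if (ctx->mask->length >
        ctx->super->used_cluster_mask_len * sizeof(struct spdk_blob_md_page) * 8) {
            SPDK_ERRLOG("used_cluster mask length %" PRIu64 " overruns the region "
                        "described by the superblock - metadata is corrupt\n",
                        (uint64_t)ctx->mask->length);
            /* hypothetical helper: complete the load sequence with an error */
            bs_load_ctx_fail(seq, ctx, -EILSEQ);
            return;
    }

Failing the load callback would surface corruption as an error instead of aborting the whole spdk_tgt process.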
[08:52:17] *** Joins: peter_turschm (~peter_tur@66.193.132.66)
[09:11:53] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Read error: Connection reset by peer)
[09:12:18] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[09:30:44] peluse: so the new webex meetings will use both a meeting number/access code, as well as this password?
[09:31:09] yes
[09:31:32] well, I haven't tried calling in, but that sure looks like how it will work. We'll find out at the next one :)
[09:36:12] peluse: could write the webex password into an image and save it
[09:36:15] then put it up on the website
[09:37:12] Yup, I can do that - good idea!
[09:37:47] still need to address a few of your crypto comments first, so maybe later this week :)
[09:37:50] thanks for the review BTW
[09:38:50] bwalker: hmm, blobstore still leaks io_channels in the case of a failed lvol examine
[09:39:26] at the time bdev_register() finishes, blobstore still has an io_channel held
[09:41:06] bwalker, BTW by "use one of our pools" where I'm currently allocating a buffer for writes, you do mean use the callback-driven spdk_bdev_io_get_buf(), right?
[09:42:02] i'm not sure if that's the bug causing the dpdk 18.05 patch to fail, but it must be fixed as well
[09:42:06] peluse: yep exactly
[09:42:33] gracias
[09:42:48] darsto: which io_channel is it? metadata channel?
[09:42:49] i think my one and only patch for blobstore was addressing that back in the day - https://review.gerrithub.io/c/spdk/spdk/+/388424
[09:43:59] if there is a fairly simple way to reproduce it, I can take a look
[09:44:03] and see what's going wrong
[10:21:06] root cause (i think) on the fedora-02 failure with dpdk 18.05 - the aio bdev module never unregisters its io_device
[10:21:59] this manifests itself by aio freeing the memory that was the io_device; bdev/part does a calloc, gets this same buffer, and tries to register it as an io_device
[10:22:24] something about fedora-02 + dpdk 18.05 allowed these steps to occur - testing a patch now to confirm
[10:25:50] nice
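[editor's note: the bug jimharris describes is an spdk_io_device_register() without a matching spdk_io_device_unregister(), so the freed pointer can be recycled by a later calloc() and re-registered. A minimal sketch of the missing teardown, assuming the 2018-era "spdk/io_channel.h" header; g_example_io_device and the function names are illustrative, not the actual aio module symbols:]

    #include <stdlib.h>
    #include "spdk/io_channel.h"

    /* Hypothetical context that was passed to spdk_io_device_register() at init. */
    static void *g_example_io_device;

    static void
    example_io_device_unregistered(void *io_device)
    {
            /* Only now - after all channels are released and unregistration has
             * completed - is it safe to free the memory backing the io_device. */
            free(io_device);
    }

    static void
    example_module_fini(void)
    {
            spdk_io_device_unregister(g_example_io_device,
                                      example_io_device_unregistered);
    }

The unregister callback fires only once every channel for the io_device is gone, which is what makes the free() safe against the reuse-and-reregister scenario above.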
[10:26:28] *** Quits: peter_turschm (~peter_tur@66.193.132.66) (Remote host closed the connection)
[10:40:14] *** Joins: peter_turschm (~peter_tur@66.193.132.66)
[11:04:22] *** Joins: travis-ci (~travis-ci@ec2-54-147-133-39.compute-1.amazonaws.com)
[11:04:23] (spdk/master) blobfs: add the check for buf allocation (Ziye Yang)
[11:04:23] Diff URL: https://github.com/spdk/spdk/compare/0de89b361735...e152aa8e5179
[11:04:23] *** Parts: travis-ci (~travis-ci@ec2-54-147-133-39.compute-1.amazonaws.com) ()
[11:30:13] peluse: i've cherry-picked your two crypto dpdk 18.02 patches to dpdk 18.05: https://review.gerrithub.io/#/c/spdk/dpdk/+/420448/
[11:30:44] i removed the common_base changes though - i couldn't see why they were needed
[12:44:39] *** Quits: tkulasek_ (~tkulasek@192.55.54.42) (Ping timeout: 256 seconds)
[13:21:21] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[13:59:00] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Ping timeout: 252 seconds)
[14:05:06] *** Joins: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[14:33:17] @jimharris fyi - replied to your comment @ https://review.gerrithub.io/#/c/spdk/spdk/+/419510/
[14:35:22] klateck: looking now...
[14:36:35] for configuring subsystems, what are you thinking that would look like?
[14:36:47] would the subsystems show up in the ls/tree output, with the parameters that can be changed?
[14:36:58] or just rpc names that you execute from the command line?
[14:48:48] *** Joins: philipp_sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[14:48:48] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Read error: Connection reset by peer)
[14:50:59] *** Quits: Guest27255 (~lyan@2605:a000:160e:2dd:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[15:07:59] Hello
[15:08:17] in the bdev module functions there is a callback: void (*init_complete)(void);
[15:08:31] in its description it is said:
[15:08:33] * Optional callback for modules that require notification of when * the bdev subsystem has completed initialization.
[15:08:47] but what does it mean that the bdev subsystem has completed initialization?
[15:09:35] I mean, will it be called after all the .module_init callbacks of all the modules are called?
[15:10:17] or is it about this particular module?
[15:10:18] not only after they're called, but after they've reported they're done initializing
[15:10:36] it's global - that notification is called on each module after all modules have completed their initialization
[15:10:45] when do the modules notify that they are done initializing?
[15:11:11] i mean, which function notifies?
[15:11:41] if a bdev module's module_init function returns 1 (greater than 0, really), that means it is asynchronously initializing
[15:11:48] I have noticed that none of the modules use this .init_complete callback
[15:11:56] oooh
[15:12:07] when a module is done asynchronously initializing, it must call spdk_bdev_module_init_done
[15:12:37] the init_complete thing will be called on each module only after all bdev modules have finished initializing (accounting for asynchronous initialization)
[15:13:15] do any of the existing modules perform initialization asynchronously?
[15:14:18] I feel like we added that because some did
[15:14:23] but I am still looking for one
[15:14:59] i grepped for .init_complete in lib/bdev. It seems none do
[15:15:25] nothing uses init_complete I don't think - that was added for someone that wanted that particular notification for their own bdev module
[15:15:34] but I thought some module initialized asynchronously
[15:15:44] however, this is older than the examine callback mechanism we have now
[15:15:58] so now I'm thinking we may have moved everything to that examine callback and the initializations are all synchronous
[15:16:11] Is it better to use the examine mechanism instead of init_complete?
[15:16:51] yeah - the init_complete should have been removed from the bdev_module
[15:17:39] there are flags in the bdev module that indicate whether module_init and module_fini can be asynchronous
[15:17:50] and I see two modules set async_init to true
[15:17:50] We have spdk_bdev_module_examine_done(); I think it could implement the logic of init_complete
[15:18:01] the iscsi initiator and the virtio initiator
[15:19:09] So you are going to leave init_complete as is, but you suggest I not use it - use examine instead?
[15:19:14] one sec
[15:19:50] I think there is a lot of duplicated functionality here - the iscsi initiator is reading the configuration file and doing asynchronous connects in .module_init
[15:20:38] well, I guess that's the only time that can run
[15:20:48] the init_complete was added so that modules could know when all initial bdevs had been identified - then modules like RAID could make decisions about partially discovered volumes
[15:22:05] see a4a497d5b and cbb8f4657
[15:22:26] you mean init_complete is called after all the examines are done?
[15:22:45] correct - that's the idea (although nothing is using it yet, including the newly added raid bdev module)
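[editor's note: a sketch of the module lifecycle being described, written against the 2018-era bdev_module.h API (the registration macro and field names may differ in other releases). As bwalker clarifies further down, the asynchronous behavior is keyed off the .async_init flag rather than the module_init return code, and that is what this sketch follows:]

    #include "spdk/bdev_module.h"

    static struct spdk_bdev_module example_if;

    /* Placeholder for real asynchronous work - the iscsi initiator, for
     * example, kicks off its connects from .module_init. */
    static void
    example_start_async_setup(void)
    {
    }

    static int
    example_module_init(void)
    {
            example_start_async_setup();
            /* 0 means "no synchronous error"; with .async_init set below, the
             * bdev layer now waits for spdk_bdev_module_init_done(). */
            return 0;
    }

    /* To be invoked from the async path once setup has really finished. */
    static void
    example_async_setup_done(void)
    {
            spdk_bdev_module_init_done(&example_if);
    }

    static void
    example_init_complete(void)
    {
            /* Runs only after every module has initialized and the initial
             * examine pass is over - the point where a RAID-like module
             * could decide about partially discovered volumes. */
    }

    static struct spdk_bdev_module example_if = {
            .name = "example",
            .module_init = example_module_init,
            .init_complete = example_init_complete,
            .async_init = true,
    };
    SPDK_BDEV_MODULE_REGISTER(&example_if)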
[15:23:31] I think everything is in order with the API now that I look it over (not sure the async_init flag is actually required, but that's minor)
[15:23:35] except, of course, the documentation
[15:24:03] we just created the bdev_module.h header not long ago to try to formalize the bdev module API
[15:24:22] now that the API is in a single header, we need to write a bunch of documentation on what everything means
[15:26:33] a lot of the movement on this front is fallout from the lab we did at the SPDK summit about how to write a bdev module
[15:26:56] we'll have a full programming guide eventually
[15:27:12] unfortunately I missed this summit and workshop
[15:27:26] waiting for it
[15:28:46] if you have any more questions just ask and I'll add them to my notes about what the guide needs to include
[15:32:24] This question came from the fact that I was trying to create my own bdev and test it with fio. If I register my bdev (it is similar to passthru) in examine it works well, but if I register it in init_complete I get a coredump from fio when it runs with numjobs > 1. So I am trying to understand the difference
[15:32:58] thanks for the explanations though
[15:33:03] if your bdev claims another bdev
[15:33:09] it is
[15:33:18] you should be registering it in the examine callback for the bdev you are claiming
[15:33:48] why can't I register in init_complete of my module?
[15:34:11] I mean, if init_complete is called after all the examines, then it sounds possible
[15:34:13] I'd have to look at the crash - honestly I think that would work too
[15:34:39] the examine mechanism is specifically for this case, but init_complete is called late enough in the process that everything should be ready
[15:34:46] *** Quits: philipp_sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 256 seconds)
[15:36:11] what does "late enough" mean? Is it called by a timer or something like that?
[15:36:28] I mean it is called after all the examines have run
[15:36:37] so I would think it is fine to claim a bdev there
[15:36:47] I'd have to see your specific crash - it could be a bug
[15:37:12] maybe it is a bug. I will try reading it more carefully
[15:37:42] thanks for all the advice. I think I'll just switch to using examine instead of init_complete like everyone else does
[15:42:43] In the code in spdk_bdev_modules_init there is the line rc = module->module_init(); And if, as you said, some module's init returns 1 (something about async init), then this rc will be returned to spdk_bdev_initialize()
[15:42:57] and the result will be SPDK_ERRLOG("bdev modules init failed\n");
[15:43:02] spdk_bdev_init_complete(-1);
[15:43:06] is that ok?
[15:44:51] it looks like I was wrong - we changed the behavior, I think
[15:45:02] it isn't based on the return code from module_init anymore
[15:45:16] probably changed it a year ago or something and I just don't remember
[15:45:33] there is another flag called "async_init" in the module structure
[15:45:43] if you set that to true, then module_init is treated as asynchronous
[15:46:06] so .module_init returns 0 on success, non-zero on failure
[15:46:14] and the asynchronous part is based on that other .async_init flag
[15:47:22] ok thanks
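[editor's note: the examine path recommended above, sketched in the style of the passthru module against the 2018-era API (a single .examine callback, wired into the module struct as .examine = example_examine); example_if is the module object from the previous sketch, and "Base0" is purely illustrative:]

    #include <string.h>
    #include "spdk/bdev.h"
    #include "spdk/bdev_module.h"
    #include "spdk/log.h"

    static struct spdk_bdev_module example_if;

    static void
    example_examine(struct spdk_bdev *bdev)
    {
            struct spdk_bdev_desc *desc;

            /* Only stack on the base bdev this module is configured for. */
            if (strcmp(spdk_bdev_get_name(bdev), "Base0") != 0) {
                    spdk_bdev_module_examine_done(&example_if);
                    return;
            }

            if (spdk_bdev_open(bdev, false, NULL, NULL, &desc) != 0) {
                    SPDK_ERRLOG("could not open %s\n", spdk_bdev_get_name(bdev));
                    spdk_bdev_module_examine_done(&example_if);
                    return;
            }

            if (spdk_bdev_module_claim_bdev(bdev, desc, &example_if) != 0) {
                    SPDK_ERRLOG("could not claim %s\n", spdk_bdev_get_name(bdev));
                    spdk_bdev_close(desc);
                    spdk_bdev_module_examine_done(&example_if);
                    return;
            }

            /* ... construct and register the new vbdev on top of 'bdev' here ... */

            spdk_bdev_module_examine_done(&example_if);
    }

Whichever branch is taken, spdk_bdev_module_examine_done() is reported exactly once per examined bdev - otherwise the bdev layer keeps waiting on the module.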
[15:52:52] *** Joins: tomzawadzki (~tomzawadz@134.134.139.73)
[15:54:40] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[15:54:56] been meaning to ask this for months. When running with gdb, an assert() that hits does a SIGABRT and I can't break at that point; however, if I temporarily hack in an if and raise(SIGINT) for the condition, I get a nice break in gdb and can step on through. Is there a better way to get this preferred behavior, or am I doing something wrong?
[16:04:18] Anyone done any tests with fio lately with numjobs > 4? I'm chasing down a report of a segfault (ioengine=spdk). I found that once I started bumping numjobs over 4, I started seeing segfaults. When I run with 8, I get the segfault every time.
[16:07:42] The I/Os appear to run ok, but as I watch the fio timer tick down to zero, it's at that point I consistently see the segfault.
[16:07:48] @lhodev i have similar issues, but they are probably due to bugs in my code. What bdevs do you test?
[16:08:21] alekseymmm: I'm not using any bdevs. Just fio straight to an nvme drive.
[16:09:36] How many cores on your system?
[16:10:48] alekseymmm: I've got 24 cores on this particular system, but the one on which it was reported had many more.
[16:11:21] Examining the core files with gdb, I consistently see the same backtrace. It's dying in spdk_fio_cleanup().
[16:11:59] The call to pthread_cancel() crashes.
[16:13:21] is there an errno that it prints out
[16:13:26] or that you can get to
[16:13:50] bwalker: no
[16:14:48] it actually crashes in pthread_cancel?
[16:14:59] is it trying to cancel itself or something?
[16:15:18] bwalker: yes, it dies in pthread_cancel(). I'm left to speculate that the passed-in g_ctrlr_thread_id is bad.
[16:15:49] sure - but the function is supposed to return ESRCH on a bad thread id
[16:16:04] so crashing is probably something much worse
[16:16:13] the only thing I can think of is cancelling self
[16:16:33] could add a check for whether g_ctrlr_thread_id is pthread_self right before the call
[16:16:35] *** Quits: tomzawadzki (~tomzawadz@134.134.139.73) (Ping timeout: 240 seconds)
[16:16:37] and see if it hits
[16:16:44] lhodev: this is the nvme fio_plugin? not the bdev fio_plugin?
[16:17:06] I've never written a custom fio engine, so I'm completely in the dark starting off with this.
[16:17:22] jimharris: it's the nvme fio_plugin -- not the bdev one.
[16:18:05] everything runs ok until the end of the test run?
[16:19:02] jimharris: yes, it appears to run fine up until the runtime expires, then it segfaults, but only (so far) if numjobs > 4.
[16:19:14] if numjobs is 6, then sometimes it completes fine, but sometimes not.
[16:19:24] if numjobs is 8, it segfaults consistently.
[16:26:39] bwalker: I tried your experiment testing pthread_self() against g_ctrlr_thread_id. In several instances, they do *not* match.
[16:32:14] *** Quits: alekseymmm (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Quit: Page closed)
[16:36:57] Given that spdk_fio_cleanup() apparently is invoked multiple times -- once per thread? -- I tried adding the modifier "__thread" to the declaration of g_ctrlr_thread_id, but that doesn't change the behavior.
[17:05:55] bwalker: I've yet to see an instance where a thread attempted to cancel itself.
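[editor's note: the check bwalker suggested, sketched as a guard that could sit directly in front of the pthread_cancel() call in spdk_fio_cleanup(); g_ctrlr_thread_id is the plugin global from the discussion, and the logging is illustrative:]

    #include <pthread.h>
    #include <stdio.h>

    extern pthread_t g_ctrlr_thread_id;  /* fio plugin global, per the discussion */

    static void
    cancel_ctrlr_thread(void)
    {
            /* pthread_t is opaque - compare with pthread_equal(), not ==. */
            if (pthread_equal(pthread_self(), g_ctrlr_thread_id)) {
                    fprintf(stderr, "cleanup called from the controller thread "
                            "itself; skipping pthread_cancel()\n");
                    return;
            }
            if (pthread_cancel(g_ctrlr_thread_id) != 0) {
                    fprintf(stderr, "pthread_cancel failed\n");
            }
    }

Since spdk_fio_cleanup() evidently runs once per fio job thread, a once-only guard around the cancel (e.g. pthread_once) would also rule out cancelling a thread that an earlier invocation already tore down.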
[17:28:58] *** Joins: lyan (~lyan@2605:a000:160e:2dd:4a4d:7eff:fef2:eea3)
[17:29:30] *** lyan is now known as Guest25520
[17:38:08] *** Quits: peter_turschm (~peter_tur@66.193.132.66) (Remote host closed the connection)
[18:41:08] *** Joins: dlw (~Thunderbi@114.255.44.143)
[18:52:42] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 252 seconds)
[19:05:28] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[20:01:29] *** Joins: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca)
[20:31:09] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 252 seconds)
[20:44:28] *** Quits: philipp-sk (~Philipp@ktnron0916w-lp130-02-76-66-162-159.dsl.bell.ca) (Ping timeout: 244 seconds)
[20:52:19] *** Quits: Guest25520 (~lyan@2605:a000:160e:2dd:4a4d:7eff:fef2:eea3) (Quit: Leaving)
[21:52:04] *** Joins: tomzawadzki (~tomzawadz@192.55.54.42)
[22:43:18] *** Quits: tomzawadzki (~tomzawadz@192.55.54.42) (Ping timeout: 256 seconds)
[23:54:26] *** Joins: tomzawadzki (~tomzawadz@192.55.54.44)