Monday, 2016-01-04

*** zengyingzhe has quit IRC00:48
*** gampel has quit IRC01:45
*** wangfeng_yellow has joined #openstack-smaug03:20
*** zengyingzhe has joined #openstack-smaug06:07
*** zhonghua-lee has quit IRC06:29
*** zhonghua-lee has joined #openstack-smaug06:29
*** gampel has joined #openstack-smaug07:22
*** gampel1 has joined #openstack-smaug08:20
yinwei@gampel online?08:21
yinweiwe're thinking about checkpoint lock mechanism08:23
*** gampel has quit IRC08:23
yinwei scenario like: an operation is under execution by the protection service, and a checkpoint is created. Here we need to lock the checkpoint until execution of the protection finishes.08:23
yinweiThe lock is to avoid deleting the checkpoint in parallel, and will be used when the service restarts and the checkpoint's workflow is rolled back: we need to check that no other service is working on this checkpoint in parallel. If we need this distributed lock, I'm wondering whether we should introduce a lock service plugin and its backend, say zookeeper, or, since the lock is not contended frequently and doesn't have a high performance requirement, we could implement08:25
yinweia distributed lock based on our bank itself, like S3 or swift. What do you think? The latter approach doesn't need to introduce another component.08:25
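(Editor's sketch of the "put and check" lock on the bank that yinwei proposes, assuming python-swiftclient and a Swift cluster that honours "If-None-Match: *" on object PUT; the container name and helpers are illustrative assumptions, not Smaug's actual code.)

    # Sketch only: advisory "put and check" lock stored in the bank (Swift).
    # Assumes python-swiftclient; "If-None-Match: *" makes the PUT fail with
    # 412 if the lock object already exists.  Names are illustrative.
    import uuid

    from swiftclient import client as swift
    from swiftclient.exceptions import ClientException

    LOCK_CONTAINER = 'smaug-locks'      # hypothetical container

    def try_lock(conn, checkpoint_id, owner_id):
        """Return True if we created the lock object, False if it is held."""
        try:
            conn.put_object(LOCK_CONTAINER, checkpoint_id, contents=owner_id,
                            headers={'If-None-Match': '*'})
            return True
        except ClientException as e:
            if e.http_status == 412:    # someone else holds the lock
                return False
            raise

    def unlock(conn, checkpoint_id, owner_id):
        # Advisory release: only delete the lock object if we still own it.
        _, holder = conn.get_object(LOCK_CONTAINER, checkpoint_id)
        if holder.decode() == owner_id:
            conn.delete_object(LOCK_CONTAINER, checkpoint_id)

    conn = swift.Connection(authurl='http://swift:8080/auth/v1.0',
                            user='test:tester', key='testing')
    owner = uuid.uuid4().hex
    if try_lock(conn, 'checkpoint-1234', owner):
        try:
            pass                        # work on the checkpoint
        finally:
            unlock(conn, 'checkpoint-1234', owner)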
*** gampel1 has quit IRC08:56
*** gampel has joined #openstack-smaug08:57
zengyingzheyinwei, if we lock the checkpoint the whole time the protection is in progress, how can we read the status of this checkpoint?09:47
zengyingzheI'm not sure whether S3 or swift is suitable for implementing a distributed lock, but I do know a DB can, because heat implements its lock mechanism with the DB.09:50
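(Editor's sketch of the DB-based locking zengyingzhe refers to: a generic insert-wins pattern in the spirit of Heat's stack lock. This is not Heat's actual schema; the table and column names are made up.)

    # Sketch only: a DB lock where the primary key arbitrates ownership,
    # roughly the pattern Heat uses for stack locks (not Heat's real schema).
    import sqlalchemy as sa
    from sqlalchemy.exc import IntegrityError

    metadata = sa.MetaData()
    checkpoint_lock = sa.Table(
        'checkpoint_lock', metadata,
        sa.Column('checkpoint_id', sa.String(36), primary_key=True),
        sa.Column('engine_id', sa.String(36), nullable=False))

    engine = sa.create_engine('sqlite://')
    metadata.create_all(engine)

    def try_lock(checkpoint_id, engine_id):
        """True if this engine inserted the lock row first."""
        try:
            with engine.begin() as conn:
                conn.execute(checkpoint_lock.insert().values(
                    checkpoint_id=checkpoint_id, engine_id=engine_id))
            return True
        except IntegrityError:
            return False

    def unlock(checkpoint_id, engine_id):
        with engine.begin() as conn:
            conn.execute(checkpoint_lock.delete().where(
                (checkpoint_lock.c.checkpoint_id == checkpoint_id) &
                (checkpoint_lock.c.engine_id == engine_id)))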
*** zengyingzhe has quit IRC09:55
*** zengyingzhe has joined #openstack-smaug09:56
*** zengyingzhe has quit IRC09:57
*** openstackgerrit has quit IRC10:02
*** openstackgerrit has joined #openstack-smaug10:02
yinweisame mechanism as a DB: put and check. But swift, as a distributed storage, should scale better than a DB, unless you're talking about a distributed db like cassandra.10:15
*** saggi has joined #openstack-smaug10:17
yinweipeople are also discussing using zk to synchronize in heat: http://blogs.rdoproject.org/7685/zookeeper-part-1-the-swiss-army-knife-of-the-distributed-system-engineer-2  10:17
saggiyinwei: I missed the beginning. What are we all talking about?10:18
yinweiif we don't mind introducing another component, zk would be a more reliable way, and the nova service group has already accepted zk as one of its lock backends.10:18
yinweihi, saggi10:19
saggihi :)10:19
yinweiwe're talking about checkpoint lock10:19
yinweiwhich seems to be a distributed lock10:19
yinweiscenario like: an operation is under execution by the protection service, and a checkpoint is created. Here we need to lock the checkpoint until execution of the protection finishes.10:20
yinweiThe lock is to avoid deleting the checkpoint in parallel, and will be used when the service restarts and the checkpoint's workflow is rolled back: we need to check that no other service is working on this checkpoint in parallel.10:20
yinweiI think I got the scenario from your bank.md :)10:21
saggiYes, how it's implemented depends on the bank.10:21
saggiSo object store implementations support mechanisms that enable this locking.10:22
yinweiso two options: shall we introduce another lock service plugin and its backend, and make zk its lock backend10:22
yinweisorry, what semantics do you mean the object store supports?10:23
yinweiAFAIK, object storage only ensures atomic put and last-one-wins10:23
saggiFirst of all, the lock is only really important for deletion. Since the checkpoint will be marked as "in progress" until the checkpoint is done. This means we know that we can't restore from that point if it's not in "done" state. The only problem is deletion.10:24
saggiWe want to know that: a) The protect operation crashed, so we can delete an "in progress" checkpoint. b) There are no restorations in progress, so we can delete a "done" checkpoint.10:24
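(Editor's note: saggi's two deletion rules written out as a tiny decision helper; the state names mirror the ones used in this discussion and everything else is hypothetical.)

    # Sketch only: when is it safe to delete a checkpoint, per the two rules
    # above.  State strings follow the discussion; nothing here is real API.
    def can_delete(state, protect_owner_alive, restores_in_progress):
        if state == 'in progress':
            return not protect_owner_alive      # (a) the protect op crashed
        if state == 'done':
            return not restores_in_progress     # (b) nobody is restoring
        return False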
yinweiIMHO, either we introduce a lock service like zk, or we implement it based on the bank in the bank plugin.10:25
saggiI don't think we can use ZK since it needs to work cross site.10:25
yinweishall we deploy swift across sites?10:26
saggiWhen restoring on site B site A needs to know it can't delete this checkpoint.10:26
saggiI'd assume you use the target's swift using its northbound API. When you back up for DR you need to save off-site. That target site will need to run the object store. If the object store is in the local site we will lose it in the disaster.10:28
yinweihow about this case: we deploy multiple protection services, and one service crashes while the checkpoint is 'in progress'. Later, shall we pick up this checkpoint and continue in another service instance when the crash is detected?10:29
saggiYou can't continue. You have to create a new checkpoint from scratch since the tree might have changed while you were down.10:30
yinweiAFAIK, swift has implemented geo replication10:30
yinweiwhy would one site failure lose the whole swift?10:31
saggiIf you don't have it replicated of course10:31
yinweihere I mean: why not use its geo replication feature, instead of building two swift sites and replicating data by ourselves?10:32
saggihow are collisions resolved?10:33
saggigampel suggested that for simplicity we might lock a bank to a single site. That would remove the need for cross-site locking. Each site could put its ID in the root of the bank. If you want write access you will need to use that site or steal access. We are vulnerable only during a forced transition, but we will warn the user when that happens.10:33
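(Editor's sketch of gampel's "one writer site per bank" idea: a marker object at the bank root names the owning site, and another site has to steal it explicitly. Assumes python-swiftclient; container and key names are illustrative.)

    # Sketch only: site ownership marker at the root of the bank, with an
    # explicit (and warned-about) "steal" path.  Names are illustrative.
    from swiftclient import client as swift
    from swiftclient.exceptions import ClientException

    BANK_CONTAINER = 'smaug-bank'       # hypothetical
    OWNER_KEY = 'owner_site'

    def current_owner(conn):
        try:
            _, body = conn.get_object(BANK_CONTAINER, OWNER_KEY)
            return body.decode()
        except ClientException as e:
            if e.http_status == 404:
                return None             # nobody has claimed the bank yet
            raise

    def claim_bank(conn, site_id, steal=False):
        owner = current_owner(conn)
        if owner not in (None, site_id):
            if not steal:
                raise RuntimeError('bank is owned by site %s' % owner)
            print('WARNING: forcibly taking bank ownership from %s' % owner)
        conn.put_object(BANK_CONTAINER, OWNER_KEY, contents=site_id)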
saggiyinwei, I have to go, will you be here in 45 minutes?10:34
yinweiI don't think so, maybe later10:34
saggiyinwei, you are asking good questions and I would like to continue10:34
yinwei45 minutes later is my dinner time10:34
yinweithanks, saggi10:34
saggiyinwei, ping me when you have time10:34
yinweisure10:34
yinweiI still think we need to check all zombie checkpoints (to clean up, yes, delete again), and we need to redo those operations10:38
yinweito check zombie checkpoints, a distributed lock would be a good primitive.10:38
yinwei@zengyingzhe, to answer the read-status question, we build an index for checkpoint status reading. BTW, the lock should allow repeated entrance by the same lock owner10:41
*** zengyingzhe has joined #openstack-smaug10:51
*** zengyingzhe_ has joined #openstack-smaug12:05
*** zengyingzhe has quit IRC12:08
*** yinweiphone has joined #openstack-smaug12:15
yinweiphonesaggi: hello12:16
saggiyinweiphone: Hello12:16
yinweiphonehappy12:16
yinweiphonehappy you're  here12:16
saggiyinweiphone: You are here in 3 different forms :)12:18
yinweiphonehmm, I'm trying iPhone app12:18
yinweiphoneseems hard to print12:18
saggiyinweiphone: wrt locking.  We would like to avoid distributed locking. What I thought was to use Swift's auto deletion feature to create lease objects. While they exist, the checkpoint is locked. They will autodelete if the host crashes and no longer extends their lifetime.12:22
*** yinweiphone has quit IRC12:23
saggiyinweiphone: It still doesn't solve the cross site use case as it will take time for georeplication to copy these objects, so we can't trust them across sites. To solve this I suggest marking a checkpoint as deleted first and only deleting it after enough time has passed that we are sure that all sites are up to date.12:24
*** wei__ has joined #openstack-smaug12:25
wei__wow, now i can login through mac, cool12:26
wei__much better to print12:27
wei__saggi, shall we continue?12:28
*** yinweiphone has joined #openstack-smaug12:28
*** yinweiphone has quit IRC12:28
saggiwei__: wrt locking.  We would like to avoid distributed locking. What I thought was to use Swift's auto deletion feature to create lease objects. While they exist, the checkpoint is locked. They will autodelete if the host crashes and no longer extends their lifetime.12:29
saggiwei__: It still doesn't solve the cross site use case as it will take time for georeplication to copy these objects, so we can't trust them across sites. To solve this I suggest marking a checkpoint as deleted first and only deleting it after enough time has passed that we are sure that all sites are up to date.12:29
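(Editor's sketch of the auto-deleting lease saggi describes, using Swift's object-expiration header X-Delete-After; the container name, TTL and helpers are illustrative assumptions, not Smaug code.)

    # Sketch only: a lease object the Swift object expirer removes on its
    # own if the holder stops refreshing it.  X-Delete-After is a standard
    # Swift header; everything else here is an assumption.
    from swiftclient import client as swift
    from swiftclient.exceptions import ClientException

    LEASE_CONTAINER = 'smaug-leases'    # hypothetical
    LEASE_TTL = 600                     # seconds; refresh well before expiry

    def acquire_or_refresh_lease(conn, owner_id):
        # Re-PUTting the same object resets its expiry, so one call both
        # creates the lease and extends it.
        conn.put_object(LEASE_CONTAINER, owner_id, contents=b'',
                        headers={'X-Delete-After': str(LEASE_TTL)})

    def lease_alive(conn, owner_id):
        try:
            conn.head_object(LEASE_CONTAINER, owner_id)
            return True
        except ClientException as e:
            if e.http_status == 404:    # expired or never created
                return False
            raise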
wei__auto deletion feature? sounds like an ephemeral node in zk12:31
saggiwei__: I don't think we should deploy ZK cross site. I don't think it's built for that. I'd rather have the option to have a restore fail because someone deleted it (because it's not that bad) than have to configure and maintain a cross site ZK configuration.12:33
wei__I need to check more about auto deletion. AFAIK in S3, auto deletion only means an object can be deleted some time later, as the user sets, like 3 days later.12:33
wei__I understand your point: the cost of introducing another service.12:34
wei__Just not sure whether swift's auto deletion could do it12:34
wei__as you said, it seems to maintain a heartbeat between client and cluster; once the client crashes, the ephemeral lock unlocks.12:36
saggiwei__: The only thing I'm really worried about is detecting abandoned (zombie) checkpoints.12:36
saggiDeleting while restoring isn't very important as the restore would fail, and it's the user's problem if they decide to delete the checkpoint.12:37
wei__oh oh oh12:37
wei__I got it12:37
wei__you mean the client will update the lifetime of the checkpoint key again and again12:38
wei__if the client crashes, the checkpoint is auto deleted12:38
wei__ok12:38
saggiwei__: Just the lease file. Since the checkpoint might be many objects, and updating them all is too much work.12:39
*** zengyingzhe_ has quit IRC12:39
wei__the lease file is kept per service instance?12:42
saggiPer checkpoint.12:42
wei__ok.12:42
wei__then what do you mean by an abandoned checkpoint?12:43
wei__checkpoints under execution while the service crashes?12:43
saggiThe server stopped the checkpoint process spontaneously, either because of a bug or a crash12:44
saggiSo the checkpoint is still "in progress" but it will never finish12:44
saggiwe need to clean it up12:44
saggiwei__: OK?12:47
wei__hmm, do you mean the lock client that updates the lease file will be a separate process from the protection service? Otherwise, I can't see why the lease would still be kept when the service crashes12:47
wei__hmm, and if there's a bug where the protection service lives but the lease still continues... if this is the case, it's really hard to detect the zombie, since even if you put the lease-updating logic into the task flow, you can't make sure the granularity is fine enough to track bugs.12:52
saggiThe lease will expire correctly. But another service wouldn't know which of the server leases are responsible for which checkpoints.12:52
wei__why?12:53
saggiHow would they know?12:53
wei__I thought the lease should be some key like /container/checkpoint/lease/client_id12:54
wei__client_id is a sha256 or sth. like that, generated once when the service initializes, based on a timestamp12:54
saggihmm12:55
saggiAnd then we only need to update one lease12:55
wei__swift get -path /container/checkpoint/lease could tell whether there's any lock12:55
saggiI would call it owner_service_id12:55
wei__yes, much better12:56
saggiand then we check that this owner is alive12:56
saggiIt's a good idea wei__12:56
wei__thanks, saggi12:56
wei__I like your auto deletion idea too12:56
saggiSo we have one per service.12:57
wei__actually, I was thinking of implementing sth. like this on the client side, but I forgot about such a magic usage12:57
wei__ok, let me see12:57
saggiAnd we make it long12:57
saggiMuch longer than the update time.12:57
saggiSo if the service fails to update for N minutes it abandons all checkpoints.12:58
saggiSince we can't attest their integrity12:58
wei__how could we tell which checkpoints belong to one service owner?12:58
saggiWe don't need that12:59
saggiwe just go over all the in progress checkpoints and check their owners12:59
saggicollect all the abandoned ones12:59
saggiand delete them12:59
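(Editor's sketch of the sweep saggi just described. The bank helpers and the lease_alive callable are hypothetical stand-ins for whatever the bank plugin ends up exposing.)

    # Sketch only: find abandoned ("zombie") checkpoints by checking whether
    # the owner of each in-progress checkpoint still holds a live lease.
    # bank.list_in_progress()/get_owner()/delete_checkpoint() and
    # lease_alive() are hypothetical helpers.
    def collect_abandoned(bank, lease_alive):
        abandoned = []
        for checkpoint_id in bank.list_in_progress():
            owner_id = bank.get_owner(checkpoint_id)
            if not lease_alive(owner_id):
                abandoned.append(checkpoint_id)
        return abandoned

    def sweep(bank, lease_alive):
        for checkpoint_id in collect_abandoned(bank, lease_alive):
            bank.delete_checkpoint(checkpoint_id)   # or mark for a GC pass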
wei__I mean how could we tell the owner of each checkpoint?12:59
saggiwei__: Could you comment on bank.md so I will remember to add it to the file.13:00
saggiyes13:00
saggithat is what you suggested13:00
saggiputting it in the checkpoint MD13:00
wei__metadata of checkpoint object?13:00
saggiin the bank13:00
wei__sure, i will comment it in bank.md13:01
saggi/checkpoints/<checkpoint id>/owner_id13:01
saggiand we will have /leases/clients/<client_id>13:01
saggior maybe /leases/owners/owner_id13:01
wei__you mean making another key under the checkpoint_id prefix13:01
saggiyes13:02
saggiWe create it when we create the checkpoint13:02
wei__ok13:02
saggiThat way we only maintain one lease13:02
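(Editor's sketch of the layout saggi and wei__ converge on: one lease per service, and an owner_id key stored with each checkpoint. The paths copy the ones named above; the bank get/put API is a hypothetical placeholder.)

    # Sketch only: checkpoint -> owner_id -> lease mapping in the bank.
    # Paths follow the discussion above; bank.put()/get() are hypothetical.
    CHECKPOINT_OWNER_KEY = '/checkpoints/%s/owner_id'
    OWNER_LEASE_KEY = '/leases/owners/%s'

    def create_checkpoint(bank, checkpoint_id, owner_id):
        # Written when the checkpoint is created, next to its "in progress"
        # state, so any service can later find out who owned it.
        bank.put(CHECKPOINT_OWNER_KEY % checkpoint_id, owner_id)

    def is_abandoned(bank, checkpoint_id):
        owner_id = bank.get(CHECKPOINT_OWNER_KEY % checkpoint_id)
        # The lease object expires by itself if the owner stops refreshing it.
        return bank.get(OWNER_LEASE_KEY % owner_id) is None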
wei__hmm, got the mapping: checkpoint->owner-id->lease13:03
saggiWe would make the leases long. Since I don't foresee a lot of contention13:03
wei__yes, same here13:03
saggiyes13:03
wei__another question: you said we actually delete the checkpoint only after a time long enough that all sites have been updated13:04
wei__do you mean the geo replication of swift is an eventual consistency model?13:05
saggiwei__: Just what I wanted to talk about now :)13:05
wei__nice13:05
wei__:)13:05
saggiYes, the problem is that we can't ensure consistency of the leases across sites. So leases don't work.13:06
wei__is there any way in swift to check whether consistency has been achieved?13:06
saggiBut this is only a problem for the delete while restore case13:06
wei__yes, it only happens when we delete from one site but read from another site13:07
saggiwei__: I would prefer if swift wouldn't do anything. Since we also need to synchronize with resources outside of swift.13:07
saggiThere is also the issue of double delete13:08
wei__hmm, what swift offers is almost what other object storages offer. They all align with S3.13:08
saggiSo what we suggest is that deleting will only change the state of the checkpoint but won't actually delete it. Then there will be a single process, which we need to make sure only runs in one place, that actually deletes the checkpoints.13:09
saggiLike a garbage collector.13:09
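(Editor's sketch of the two-phase delete saggi outlines: a delete request only flips the checkpoint state, and a single GC removes it for real after a grace period chosen to outlast geo-replication lag. The grace period and bank helpers are assumptions.)

    # Sketch only: "delete" marks the checkpoint; one GC process removes it
    # later, after geo-replication has had time to propagate the mark.
    import time

    REPLICATION_GRACE = 24 * 3600       # seconds; an assumption, tune per site

    def request_delete(bank, checkpoint_id):
        bank.put('/checkpoints/%s/state' % checkpoint_id, 'deleted')
        bank.put('/checkpoints/%s/deleted_at' % checkpoint_id, str(time.time()))

    def gc_pass(bank):
        for checkpoint_id in bank.list_checkpoints():
            if bank.get('/checkpoints/%s/state' % checkpoint_id) != 'deleted':
                continue
            deleted_at = float(bank.get('/checkpoints/%s/deleted_at' % checkpoint_id))
            if time.time() - deleted_at > REPLICATION_GRACE:
                # Also the point where resources outside the bank get cleaned up.
                bank.delete_checkpoint(checkpoint_id)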
wei__yes, could only be that13:10
wei__eventual consistency introduces dirty reads13:10
saggiBut how do we ensure that it only runs on one site?13:10
saggiThat is what we haven't solved yet13:10
wei__sorry, could you describe the problem in more detail?13:11
saggiLet's say someone at site A and site B decided to delete the same checkpoint. You will obviously encounter issues.13:13
saggiEven with the GC approach we need to somehow make sure only one site runs the GC13:13
saggiOr we get issues while cleaning up resources outside the bank13:13
saggiwei__: Do you understand the issue?13:14
wei__yeah13:14
wei__Or we get issues while cleaning up resources outside the bank: what issue? a delete error, 404 not found?13:15
wei__hmm, could it be an early exit because the key hasn't been replicated there yet?13:16
saggiLet's say we also have a volume backed up somewhere else. When we delete the checkpoint we also need to delete this volume from the other storage. If two processes try to do it at once, one of them will fail.13:17
wei__but we wait long enough before deleting, right? we assume the key has been replicated already, then we start the GC13:17
saggiwei__: But then two sites can start the GC at once.13:17
wei__why not delete the checkpoint key first? only the one that succeeds in deleting the checkpoint key will do the following steps to clean up the backup resources13:19
wei__swift will only allow one client to succeed in deleting the key, the others should get 404, shouldn't they?13:19
saggiwei__: But that doesn't happen immediately with geo replication13:20
wei__but we wait long enough before deleting, right? we assume the key has been replicated already, then we start the GC13:20
wei__that's the assumption, isn't it?13:20
* saggi is thinking13:21
wei__ok, you mean the delete is not immediate with geo replication13:21
wei__thinking13:22
saggiWhat I'm saying is that two servers can decide to act on the deletion at once13:22
saggiyes13:22
wei__saggi, the condition for the GC to delete the checkpoint is whether the lease of this checkpoint is still there13:25
saggiYes, if it's missing we can delete13:25
saggiSince we know it was abandoned13:26
saggigampel suggested having a root lease whose holder is the only one that can actually delete checkpoints. Everyone can mark deletions but only it can actually delete.13:27
wei__so which site should the root lease be located in?13:28
wei__we need to ensure this root lease won't fail in any site failure13:28
saggiIt's in the bank. If it expires someone else will become root.13:28
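(Editor's sketch of gampel's root-lease idea as relayed by saggi: any site can mark a checkpoint deleted, but only the current root-lease holder runs the GC, and the role moves if the lease expires. The ttl argument and bank helpers are hypothetical.)

    # Sketch only: a single expiring "root" lease elects the one site that
    # is allowed to actually delete.  bank.put(..., ttl=...) is hypothetical;
    # on Swift it could map to X-Delete-After as in the lease sketch above.
    ROOT_LEASE_KEY = '/leases/root'
    ROOT_TTL = 3600                     # seconds

    def run_gc_if_root(bank, my_site_id, gc_pass):
        owner = bank.get(ROOT_LEASE_KEY)
        if owner is None:
            bank.put(ROOT_LEASE_KEY, my_site_id, ttl=ROOT_TTL)   # claim root
            owner = my_site_id
        if owner != my_site_id:
            return                      # some other site is the GC right now
        bank.put(ROOT_LEASE_KEY, my_site_id, ttl=ROOT_TTL)       # refresh
        gc_pass(bank)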
wei__actually I was thinking we should have one GC per site to collect the checkpoint garbage of its owners, so when it fails, others need to take over the ownership13:30
wei__is there a need for some arbitration service to tell who is the root or who is the owner?13:31
wei__what if we just tolerate the delete failure?13:32
saggiwei__: We will need to write the plugins around that. Since we don't want to have leftover data outside the bank that is unreachable.13:32
wei__yes, we don't leave garbage13:33
wei__just let the loser of the GC competition tolerate the delete error; what are the cons here?13:33
wei__sorry, have to leave. shall we continue tomorrow?13:34
saggisure13:34
*** wei__ has quit IRC13:54
*** wei__ has joined #openstack-smaug14:27
*** wei__ has quit IRC14:30
*** chenying has quit IRC14:30
*** chenying has joined #openstack-smaug14:30
openstackgerritEran Gampel proposed openstack/smaug: First draft of the API documentation  https://review.openstack.org/255211  15:49
openstackgerritEran Gampel proposed openstack/smaug: Add Smaug spec directory  https://review.openstack.org/261913  16:00
*** smcginnis has joined #openstack-smaug16:12
*** gampel has quit IRC16:18
openstackgerritMerged openstack/smaug: First draft of the API documentation  https://review.openstack.org/255211  16:43
openstackgerritSaggi Mizrahi proposed openstack/smaug: Pluggable protection provider doc  https://review.openstack.org/262264  16:54
openstackgerritMerged openstack/smaug: Add Smaug spec directory  https://review.openstack.org/261913  16:58
*** openstackgerrit has quit IRC18:32
*** openstackgerrit has joined #openstack-smaug18:32
*** zhonghua-lee has quit IRC22:13
*** zhonghua-lee has joined #openstack-smaug22:14
*** saggi has quit IRC23:11
*** saggi has joined #openstack-smaug23:28

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!