opendevreview | Mridula Joshi proposed openstack/glance stable/2024.1: Fixed typo in discovery.py https://review.opendev.org/c/openstack/glance/+/928757 | 05:36 |
opendevreview | Merged openstack/glance master: Add metadef value for stateless firmware https://review.opendev.org/c/openstack/glance/+/926599 | 07:58 |
opendevreview | Merged openstack/glance master: api-ref: add 'compressed' container_format https://review.opendev.org/c/openstack/glance/+/928717 | 07:58 |
opendevreview | Merged openstack/glance stable/2024.1: Fixed typo in discovery.py https://review.opendev.org/c/openstack/glance/+/928757 | 11:34 |
opendevreview | Mridula Joshi proposed openstack/glance stable/2023.2: Fixed typo in discovery.py https://review.opendev.org/c/openstack/glance/+/928786 | 11:42 |
opendevreview | Brian Rosmaita proposed openstack/glance master: api-ref: Fix 'name' reference https://review.opendev.org/c/openstack/glance/+/857735 | 12:19 |
opendevreview | Balazs Gibizer proposed openstack/glance master: [doc]Update hw_vif_model as it supports igb now https://review.opendev.org/c/openstack/glance/+/928788 | 12:23 |
dansmith | abhishekk: we're failing to delete the image after some of the format testing: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_659/927291/5/check/tempest-integrated-storage-import/659b93e/testr_results.html | 14:02 |
dansmith | but I'm not sure why that would be... it's just a delete on the image after the import would have failed | 14:03 |
abhishekk | looking | 14:03 |
dansmith | the actual failure is this: https://zuul.opendev.org/t/openstack/build/659b93ec4243482b8e95b09e5c729db7/log/controller/logs/screen-g-api.txt#14053 | 14:04 |
dansmith | which seems DB related and thus I'm struggling to find the relation to what's going on | 14:04 |
dansmith | but it's always one of these failed-because-format tests | 14:04 |
dansmith | all stemming from image_repo.remove(), which is a different request from the failed task and everything | 14:08 |
dansmith | although looks maybe like the task is still running, so maybe there's a bit of a race in glance here between the delete and a running task? | 14:09 |
abhishekk | I can't recall that we delete the image if conversion fails, is it during cleanup from tempest? | 14:09 |
abhishekk | I can see a race condition message in the log while deleting property from image_properties table | 14:10 |
abhishekk | Sep 06 16:46:15.982625 np0038417198 devstack@g-api.service[87670]: DEBUG glance.db.sqlalchemy.api [None req-72866585-8bed-424b-a843-0b3f8fc97b49 tempest-ImagesFormatTest-1430253273 tempest-ImagesFormatTest-1430253273-project-member] StaleDataError while deleting property 'os_glance_import_task' from image 'ec12a1f2-46a5-4c0c-82a3-19a57cc895d5' likely means we raced during delete: UPDATE statement on table 'image_properties' expected to update 1 row(s); 0 were matched. {{(pid=87670) _image_property_delete /opt/stack/glance/glance/db/sqlalchemy/api.py:1328}} | 14:10 |
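For context, the StaleDataError in that pasted log line is SQLAlchemy's way of saying a flush-time UPDATE that was expected to touch exactly one row matched zero, because another transaction removed the row first. A minimal sketch of that pattern, assuming a soft-delete style property table; the model and helper names below are illustrative, not glance's actual code:

```python
# Minimal sketch (illustrative names, not glance's code) of the pattern the
# quoted log line describes: a soft delete whose flush-time UPDATE matches
# zero rows raises StaleDataError, which the caller logs as "we raced".
import logging

from sqlalchemy import Boolean, Column, Integer, String, Text
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm.exc import StaleDataError

LOG = logging.getLogger(__name__)
Base = declarative_base()


class ImageProperty(Base):
    __tablename__ = 'image_properties'
    id = Column(Integer, primary_key=True)
    image_id = Column(String(36))
    name = Column(String(255))
    value = Column(Text)
    deleted = Column(Boolean, default=False)


def property_delete(session, prop):
    try:
        prop.deleted = True   # soft delete
        session.flush()       # UPDATE ... expected to update exactly 1 row
    except StaleDataError:
        # A concurrent writer (e.g. the import task reverting the image)
        # removed the row first, so 0 rows were matched by the UPDATE.
        LOG.debug("StaleDataError while deleting property %r likely means "
                  "we raced during delete", prop.name)
```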
dansmith | yeah, in the tempest test I delete bad images right away, which is definitely no longer needed since I split them all into different tests I suppose, but it's still a legit cleanup activity that is failing | 14:12 |
abhishekk | this is the recent change, I think this is where it is racing? | 14:13 |
abhishekk | https://github.com/openstack/glance/blob/master/glance/async_/flows/plugins/image_conversion.py#L238 | 14:13 |
abhishekk | I added this to revert the image back to queued state | 14:13 |
dansmith | ...which we need also | 14:16 |
dansmith | but IIRC none of those changes get made until the action_wrapper context manager exits, right? | 14:16 |
* dansmith looks | 14:16 |
abhishekk | right | 14:18 |
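A hypothetical illustration of the point being made about the action_wrapper (this is not glance's actual implementation; the buffering dict, extra_properties and image_repo.save usage below are only assumptions used to sketch the idea): property changes made during the task revert are held until the context manager exits, so the exit-time save is the DB write that can land concurrently with the user's image delete.

```python
# Hypothetical sketch of the action_wrapper idea referenced above -- not
# glance's actual code. Property changes recorded during the task revert
# are only written to the database when the context manager exits.
import contextlib


@contextlib.contextmanager
def action_wrapper(image_repo, image):
    pending = {}   # property changes recorded by the revert code
    try:
        yield pending
    finally:
        for name, value in pending.items():
            if value is None:
                image.extra_properties.pop(name, None)   # delete property
            else:
                image.extra_properties[name] = value     # set property
        image_repo.save(image)   # the actual DB write happens here, at exit
```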
dansmith | so maybe we need a retry only on db error in the delete path? | 14:22 |
abhishekk | likely. Or should we check task progress before delete? | 14:24 |
dansmith | I think for delete we should always assume it can work, so we want to delete harder, not say "try delete later" | 14:28 |
dansmith | the error message is a little confusing though | 14:28 |
dansmith | says the transaction is closed | 14:29 |
dansmith | oh actually, | 14:29 |
dansmith | I wonder if this is *already* retrying, but the retrying is happening within the transaction | 14:29 |
abhishekk | but we don't have any retry call on delete | 14:30 |
dansmith | it's retrying on image_update | 14:31 |
dansmith | https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/api.py#L166 | 14:31 |
dansmith | and I bet it's re-using the same session from the context passed in or something | 14:31 |
abhishekk | _retry_on_deadlock | 14:31 |
abhishekk | remove is internally calling update | 14:32 |
dansmith | right | 14:32 |
abhishekk | I think image_property_delete, which is called from image_update, is creating a new session | 14:39 |
dansmith | also the deadlock retry logs a message when it fires, and that isn't in the log, so no retry I think | 14:40 |
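Roughly what such a deadlock-only retry hook looks like; this is a sketch, not a verbatim copy of the decorator linked above, and the retry parameters and function signature are illustrative. The point is that only DBDeadlock satisfies the predicate, which is why the absence of a deadlock-retry message in the log suggests no retry fired here.

```python
# Sketch of a deadlock-only retry hook like the decorator linked above
# (glance/db/sqlalchemy/api.py); parameters and names are illustrative.
import logging

import retrying
from oslo_db import exception as db_exception

LOG = logging.getLogger(__name__)


def _retry_on_deadlock(exc):
    """Return True only for DB deadlocks, so only those are retried."""
    if isinstance(exc, db_exception.DBDeadlock):
        LOG.warning("Deadlock detected. Retrying...")
        return True
    return False


@retrying.retry(retry_on_exception=_retry_on_deadlock,
                wait_fixed=500, stop_max_attempt_number=50)
def image_update(context, image_id, values, purge_props=False):
    """Placeholder for the decorated DB API call."""
    # A StaleDataError or InvalidRequestError raised deeper inside the same
    # transaction does not match the predicate, so it is not retried and no
    # deadlock-retry warning appears in the log.
```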
abhishekk | image_update --> set_image_properties --> image_property_delete | 14:40 |
dansmith | and is _that_ the transaction it's complaining about? | 14:41 |
abhishekk | let me confirm | 14:41 |
abhishekk | image_update --> set_image_properties --> _image_property_delete, we are passing same session throughout but _image_property_delete is the method where we are getting that error | 14:43 |
dansmith | we don't keep the same session in image_property_delete right? | 14:45 |
dansmith | oh, we call _image_property_delete I see | 14:45 |
abhishekk | hmm and that is the last method call from glance and then it raises InvalidRequestError | 14:47 |
abhishekk | and this debug log suggests we returned from that method: Sep 06 16:46:15.982625 np0038417198 devstack@g-api.service[87670]: DEBUG glance.db.sqlalchemy.api [None req-72866585-8bed-424b-a843-0b3f8fc97b49 tempest-ImagesFormatTest-1430253273 tempest-ImagesFormatTest-1430253273-project-member] StaleDataError while deleting property 'os_glance_import_task' from image 'ec12a1f2-46a5-4c0c-82a3-19a57cc895d5' likely means we raced during delete: UPDATE statement on table 'image_properties' expected to update 1 row(s); 0 were matched. {{(pid=87670) _image_property_delete /opt/stack/glance/glance/db/sqlalchemy/api.py:1328}} | 14:49 |
abhishekk | but then how come it raises the error from that same method again | 14:49 |
dansmith | hmm, are we going through and deleting each property individually? | 14:54 |
abhishekk | I think so, that is what we are doing | 14:55 |
dansmith | so we make a list of properties, then go through and delete each one, but if we race with the task revert, one or more of those is gone when we try to delete it and explode? | 14:57 |
abhishekk | likely, that is the only rational explanation I can see in this scenario | 14:57 |
dansmith | ugh | 14:59 |
abhishekk | and from remove we are passing purge_props=True | 14:59 |
dansmith | yeah | 14:59 |
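To make the suspected race concrete, here is a hypothetical sketch of the purge_props flow being described, reusing the ImageProperty/property_delete sketch from earlier; the names and structure are illustrative, not glance's actual _set_properties_for_image. The existing properties are snapshotted, then each one missing from the new set is deleted, and a row the reverting task already removed makes that per-property delete match zero rows.

```python
# Hypothetical sketch of the purge_props flow under discussion, reusing the
# ImageProperty / property_delete sketch above; not glance's actual
# _set_properties_for_image implementation.
def set_properties_for_image(session, image_id, new_properties,
                             purge_props=True):
    # Snapshot the properties that exist right now.
    original = {
        prop.name: prop
        for prop in session.query(ImageProperty).filter_by(
            image_id=image_id, deleted=False)
    }

    # Create or update everything in the new set.
    for name, value in new_properties.items():
        if name in original:
            original[name].value = value
        else:
            session.add(ImageProperty(image_id=image_id,
                                      name=name, value=value))

    # With purge_props=True (as image_repo.remove() implies), delete every
    # property not in the new set. If the reverting import task has already
    # removed one of these rows, the flush inside property_delete() matches
    # 0 rows and raises StaleDataError.
    if purge_props:
        for name, prop in original.items():
            if name not in new_properties:
                property_delete(session, prop)
```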
dansmith | so I wonder if we can just retry on InvalidRequestError as well | 15:00 |
dansmith | although I assume we need something more specific than just that exception | 15:01 |
abhishekk | I think for time being we can retry on that error | 15:02 |
abhishekk | But we need a refactor of the _set_properties_for_image method | 15:03 |
dansmith | I dunno, I assume we have to retry the whole delete right? because the transaction is closed so I assume we're toast | 15:03 |
dansmith | I found a thread about retrying InvalidRequestErrors being a common practice | 15:06 |
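The idea behind the WIP change below, sketched with the same caveats as the earlier snippets (this is not the actual patch, and the real change may scope the exception check more narrowly): widen the retry predicate so a closed-transaction InvalidRequestError also re-runs the whole decorated DB call on a fresh transaction, rather than only DBDeadlock.

```python
# Sketch of the idea behind the WIP change below -- not the actual patch.
# Widen the retry predicate so an InvalidRequestError (e.g. "this
# transaction is closed" following the StaleDataError race) also re-runs
# the whole decorated DB call on a fresh transaction.
import sqlalchemy.exc
from oslo_db import exception as db_exception


def _retry_on_deadlock_or_closed_txn(exc):
    retriable = (db_exception.DBDeadlock, sqlalchemy.exc.InvalidRequestError)
    return isinstance(exc, retriable)
```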
opendevreview | Dan Smith proposed openstack/glance master: WIP: Retry image update on closed transaction https://review.opendev.org/c/openstack/glance/+/928820 | 15:09 |
dansmith | maybe we recheck this ^ a few times and see if we get that log message but it succeeds? | 15:09 |
dansmith | and if so we can ask for confirmation from mike | 15:09 |
abhishekk | sounds good | 15:09 |
dansmith | I'm definitely puzzled about why this is related to the oslo conversion.. must not be or the oslo code is a little slower/faster and is opening up the race | 15:14 |
abhishekk | yeah, I missed the part that it was working earlier when format_inspector was in glance | 15:27 |
dansmith | it's not a 100% failure, so it's possible it was failing and just got rechecked into the tree, I suppose | 15:30 |
dansmith | i.e. perhaps it's failing in master | 15:30 |
dansmith | of course, no fail and no hits of that issue on the first run | 17:01 |
dansmith | will recheck when it's done | 17:01 |
abhishekk | Ack, let me know if a few checks need to be added; I can do it tomorrow my time | 17:26 |