Anatomy of ObjectContext, the Ceph in core representation of an object

An ObjectContext is created when a ReplicatedPG applies operations on an object.

read/write mutual exclusion

The C_OSD_OndiskWriteUnlock callback is registered to be called after a transaction (read in this case) completes. It will signal the writes and reads waiting if all writes are done.
Before adding an entry that will write an object to a transaction ( for instance when ReplicatedPG::mark_object_lost sets the object_info_t::lost data member to true ) the ObjectContext::ondisk_write_lock method is called and will Cond::Wait until all reads complete. The caller of mark_object_lost adds the ObjectContext to a list that is used to build the C_OSD_OndiskWriteUnlockList callback that will be called when the transaction completes and call ondisk_write_unlock on each object.

The logic is similar when reverting an object to a prior version, applying a replica operation, or RepliatedPG::handle_pull_response.

ondisk_read_lock is called by ReplicatedPG::do_op and unlocked after calling prepare_transaction. ReplicatedPG::recover_object_replicas and ReplicatedPG::push_backfill_object do the same.

ondisk_write_lock will wait until there are no more read operations waiting ( readers_waiting ) or read being processed ( readers ). ondisk_read_lock will wait for ongoing writes to finish ( unstable_writes ) but will take the lock even if writers are waiting ( writers_waiting ) therefore taking precedence over write.There can be any number of simultaneous write ( unstable_writes > 1 ) as long as there are no ongoing reads ( readers < 1 ). There can be any number of simultaneous readers ( readers > 1 ) as long as there are no ongoing writes ( unstable_writes < 1 ). ondisk_read_unlock will signal waiting writers if there is no more readers ( !readers ). ondisk_write_unlock will signal waiting readers if there is no more writers ( !unstable_writers ).

blocking and blocked_by

ObjectContext has a blocking and a blocked_by data members. When an operation on an object is made of multiple operations, all of them must be about an object by the same name but can be about different versions. If a variation of the object is degraded, it is blocked by the degraded object and the is added to the list blocked by the degraded object.
Before peering the ReplicatedPG::on_change is called and for each object in the waiting_for_degraded_object list it will loop over the objects it is blocking, remove it from the list and unblock it. The same happens whenever an object has been pushed.

This entry was posted in Code path, ceph. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>