diff --git a/adoc/chapters/architecture.adoc b/adoc/chapters/architecture.adoc index 193ed972..12b80e82 100644 --- a/adoc/chapters/architecture.adoc +++ b/adoc/chapters/architecture.adoc @@ -461,7 +461,7 @@ each other without side effects (i.e., given two requirements _r~j~_ and _r~k~_, _(r~j~, r~k~)_ {equiv} _(r~k~, r~j~)_). The intersection of two actions is not necessarily empty. *Actions* can include (but are not limited to): memory copy operations, -mapping operations, host side synchronization, or implementation-specific +memory mapping operations, coordination with the host, or implementation-specific behavior. Finally, _Performing an action_ ({SYCLperform}(_a~i~_)) executes the @@ -562,10 +562,10 @@ image::{images}/three-cg-three-queue.svg[align="center",opts="{imageopts}"] ==== Controlling execution order with events Submitting an action for execution returns an [code]#event# object. Programmers -may use these events to explicitly synchronize programs. Host code can wait for an -event to complete, which will block execution on the host until the action represented -by the event has completed. The [code]#event# class is described in greater detail -in <>. +may use these events to explicitly coordinate host and device execution. Host +code can wait for an event to complete, which will block execution on the host +until the action(s) represented by the event have completed. The [code]#event# +class is described in greater detail in <>. Events may also be used to explicitly order the execution of kernels. Host code may wait for the completion of specific event, which blocks execution on the host until @@ -627,7 +627,7 @@ Each work-item in the ND-range is identified by a value of type [code]#nd_item#. The type [code]#nd_item# encapsulates a global id, local id and work-group id, all of type [code]#id# (the iteration space offset also of type [code]#id#, but this is deprecated in SYCL 2020), as well as -global and local ranges and synchronization operations necessary to +global and local ranges and coordination mechanisms necessary to make work-groups useful. Work-groups are assigned ids using a similar approach to that used for work-item global ids. Work-items are assigned to a work-group and given a local id with components in the @@ -872,7 +872,7 @@ execute concurrently with the barrier. Finally, _CG~c~_ will be enqueued after _H(b1~RW~)_ is finished, but still has to wait for _CG~b~_ to conclude for all its requirements to be satisfied. -See <> for details on synchronization rules. +See <> for details on host-device coordination. // Formerly in host_acc.tex // Uses same definitions for cga, cgb, cgc, and hostA in code and picture, @@ -1011,11 +1011,11 @@ made consistent across the work-items in that set through the use of <> and atomic operations. Memory consistency between the host and SYCL device(s), or different SYCL -devices in the same context, can be guaranteed through synchronization in -the host application as defined in <>. On SYCL devices -supporting concurrent atomic accesses to USM allocations and acquire-release or sequentially -consistent memory orderings, cross-device memory consistency can be -enforced through the use of <> and atomic operations. +devices in the same context, can be guaranteed through library calls in the +host application, as defined in <>. 
On SYCL devices
+supporting concurrent atomic accesses to USM allocations and acquire-release or
+sequentially consistent memory orderings, cross-device memory consistency can
+be enforced through the use of <> and atomic operations.

[[sec:memory-ordering]]
==== Memory ordering
@@ -1207,10 +1207,10 @@ sections of this specification.

=== Basic data parallel kernels

Data-parallel <> that execute as
-multiple <> and where no local synchronization is required are enqueued
-with the [code]#sycl::parallel_for# function parameterized by a
-[code]#sycl::range# parameter. These kernels will execute the kernel
-function body once for each work-item in the specified <>.
+multiple <> and where no work-group-local coordination is
+required are enqueued with the [code]#sycl::parallel_for# function
+parameterized by a [code]#sycl::range# parameter. These kernels will execute
+the kernel function body once for each work-item in the specified <>.

Functionality tied to <> of work-items, including
<> and <>, must not be used
@@ -1228,10 +1228,10 @@ work-group size as parameters to the [code]#sycl::parallel_for# function
with a [code]#sycl::nd_range# parameter. In this mode of execution,
kernels execute over the <> in work-groups of the
specified size. It is possible to share data among work-items within the same
-work-group in <> or <> and to synchronize between
-work-items in the same work-group by calling the
-[code]#group_barrier# function. All work-groups in a given
-[code]#parallel_for# will be the same size, and the global size
+work-group in <> or <>, and the
+[code]#group_barrier# function can be used to block a work-item until all
+work-items in the same work-group arrive at the barrier. All work-groups in a
+given [code]#parallel_for# will be the same size, and the global size
defined in the nd-range must either be a multiple of the work-group size in
each dimension, or the global size must be zero. When the global size is
zero, the kernel function is not executed, the local size is ignored, and
@@ -1246,8 +1246,8 @@ work-items in any sub-group in a kernel is based on a combination of the kernel
and its dispatch dimensions. The size of any sub-group in the dispatch is
between 1 and this maximum sub-group size, and the size of an individual
sub-group is invariant for the duration of a kernel's execution. Similarly
-to work-groups, the work-items within the same sub-group can be synchronized
-by calling the [code]#group_barrier# function.
+to work-groups, the [code]#group_barrier# function can be used to block a
+work-item until all work-items in the same sub-group arrive at the barrier.

Portable device code must not assume that work-items within a sub-group execute
in any particular order, that work-groups are subdivided into sub-groups in a
@@ -1319,55 +1319,61 @@ These kernels are not programmable, hence they are not bound by the SYCL
implementation-defined.

-[[sec:synchronization]]
-=== Synchronization
+[[sec:coordination]]
+=== Coordination and synchronization

-Synchronization of processing elements executing inside a device is handled
-by the SYCL device kernel following the SYCL kernel execution model.
-The synchronization of the different SYCL device kernels executing with
-the host memory is handled by the SYCL application via the SYCL runtime.
+Coordination between the host and any devices can be expressed in the host SYCL
+application using calls into the SYCL runtime. Coordination between work-items
+executing inside device code can be expressed using group barriers.
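+
+As a non-normative illustration, both forms of coordination might be expressed
+as follows ([code]#myQueue#, the buffer and the kernel body are placeholders,
+not part of the interface being specified):
+
+[source,,linenums]
+----
+#include <sycl/sycl.hpp>
+
+int main() {
+  sycl::queue myQueue;
+  sycl::buffer<int, 1> buf{sycl::range<1>{128}};
+
+  myQueue.submit([&](sycl::handler& cgh) {
+    sycl::accessor acc{buf, cgh, sycl::read_write};
+    cgh.parallel_for(sycl::nd_range<1>{128, 32}, [=](sycl::nd_item<1> it) {
+      acc[it.get_global_id()] = static_cast<int>(it.get_local_id(0));
+      // Coordination inside device code: no work-item in the work-group
+      // continues past this point until all of them have arrived.
+      sycl::group_barrier(it.get_group());
+      // ... code that may read values written by other work-items in the
+      // same work-group ...
+    });
+  });
+
+  // Coordination with the host: block the calling thread until all command
+  // groups submitted to myQueue have completed.
+  myQueue.wait();
+}
+----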
-==== Synchronization in the SYCL application
+Some function calls synchronize with other function calls performed by another
+thread (potentially on another device). Other functions are defined in terms
+of the synchronization operations that they perform. Such functions can be
+used to ensure that the host and any devices do not access data concurrently,
+and/or to reason about the ordering of operations across the host and any
+devices.

-Synchronization points between host and device(s) are exposed through
-the following operations:
+==== Host-device coordination
+
+The following operations can be used to coordinate host and device(s):

 * _Buffer destruction_: The destructors for [code]#sycl::buffer#,
   [code]#sycl::unsampled_image# and
-  [code]#sycl::sampled_image# objects wait for all submitted work on
-  those objects to complete and to copy the data back to host memory
-  before returning. These destructors only wait if the object was
-  constructed with attached host memory and if data needs to be copied
-  back to the host.
+  [code]#sycl::sampled_image# objects block until all submitted work on
+  those objects completes, then copy the data back to host memory before
+  returning. These destructors only block if the object was constructed with
+  attached host memory and if data needs to be copied back to the host.
+
--
-More complex forms of synchronization on buffer destruction
-can be specified by the user by constructing buffers with other kinds of
-references to memory, such as [code]#shared_ptr# and [code]#unique_ptr#.
+More complex forms of buffer destruction can be specified by the user by
+constructing buffers with other kinds of references to memory, such as
+[code]#shared_ptr# and [code]#unique_ptr#.
--
- * _Host Accessors_: The constructor for a host accessor waits for all
-   kernels that modify the same buffer (or image) in any queues to complete
+ * _Host Accessors_: The constructor for a host accessor blocks until all
+   kernels that modify the same buffer (or image) in any queue complete
   and then copies data back to host memory before the constructor returns.
   Any command groups with requirements to the same memory object cannot
   execute until the host accessor is destroyed as shown on
   <>.
 * _Command group enqueue_: The <> internally ensures that any
   command groups added to queues have the correct event dependencies added to
   those queues to ensure correct operation. Adding
-  command groups to queues never blocks. Instead any required
-  synchronization is added to the queue and events of type
-  [code]#sycl::event# are returned by the queue's submit function
-  that contain event information related to the specific command group.
+  command groups to queues never blocks, and the [code]#sycl::event# returned
+  by the queue's submit function contains event information related to the
+  specific command group.
 * _Queue operations_: The user can manually use queue operations, such as
   [code]#sycl::queue::wait()# to block execution of the calling thread until
   all the command groups submitted to the queue have finished execution. Note
   that this will also affect the dependencies of those command groups in
   other queues.
 * _SYCL event objects_: SYCL provides [code]#sycl::event#
-   objects which can be used for synchronization. If synchronization is
-   required across SYCL contexts from different <>, then the
-   <> ensures that extra host-based synchronization is
-   added to enable the SYCL event objects to operate between contexts
-   correctly.
+   objects which can be used to track and specify dependencies. The
+   <> must ensure that these objects can be used to enforce
+   dependencies that span SYCL contexts from different <>.
+
+The specification for each of these blocking functions defines a set of
+operations that cause the function to unblock. These operations always happen
+before the blocking function returns (using the definition of "happens before"
+from the {cpp} specification).
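+
+As a non-normative sketch (the queue, the allocation and the kernels are
+illustrative, and a device supporting shared USM allocations is assumed),
+events might be used as follows:
+
+[source,,linenums]
+----
+#include <sycl/sycl.hpp>
+
+int main() {
+  sycl::queue q;
+  int* data = sycl::malloc_shared<int>(1024, q);
+
+  // The event returned by submit tracks the status of the command group.
+  sycl::event e1 = q.submit([&](sycl::handler& cgh) {
+    cgh.parallel_for(sycl::range<1>{1024},
+                     [=](sycl::id<1> i) { data[i] = static_cast<int>(i[0]); });
+  });
+
+  // Specify a dependency: the second command group may not begin executing
+  // until e1 has completed.
+  sycl::event e2 = q.submit([&](sycl::handler& cgh) {
+    cgh.depends_on(e1);
+    cgh.parallel_for(sycl::range<1>{1024},
+                     [=](sycl::id<1> i) { data[i] *= 2; });
+  });
+
+  // Block the calling thread until the work tracked by e2 has completed;
+  // that completion happens before wait() returns.
+  e2.wait();
+
+  sycl::free(data, q);
+}
+----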
Note that the destructors of other SYCL
objects ([code]#sycl::queue#,
[code]#sycl::context#,{ldots}) do
@@ -1380,22 +1386,18 @@ cases if this does not fit the goal.
See <> for more information
on object life time.

-==== Synchronization in SYCL kernels
-
-In SYCL, synchronization can be either global or local within a
-group of work-items. Synchronization between work-items in a single
-group is achieved using a <>.
+==== Work-item coordination

-All the work-items of a group must execute the barrier before any are
-allowed to continue execution beyond the barrier. Note that the
-group barrier must be encountered by all work-items of a
-group executing the kernel or by none at all. In SYCL,
-<> and <> functionality is
+A <> provides a mechanism to coordinate all work-items in the
+same group. All work-items in a group must execute the barrier before any are
+allowed to continue execution beyond the barrier. Note that the group barrier
+must be encountered by all work-items of a group executing the kernel or by
+none at all. <> and <> functionality is
exposed via the [code]#group_barrier# function.

-Synchronization between work-items in different work-groups via atomic
-operations is possible only on SYCL devices with certain capabilities,
-as described in <>.
+Coordination between work-items in different work-groups must take place via
+atomic operations, and is possible only on SYCL devices with certain
+capabilities, as described in <>.

=== Error handling

@@ -1501,7 +1503,7 @@ provided, which uses move semantics for initializing and using the
associated host memory. In this case, the behavior of the buffer in
relation to the user application will be non-blocking on destruction.

-As said in <>, the only blocking
+As said in <>, the only blocking
operations in SYCL (apart from explicit wait operations) are:

* host accessor constructor, which waits for any kernels enqueued before
@@ -1624,19 +1626,18 @@ the appropriate image format specified such as four 8-bit elements, for
example.
--

-Users may want fine-grained control of the synchronization, memory management
-and storage semantics of SYCL image or buffer objects. For example, a user may
-wish to specify the host memory for a memory object to use, but may not want the
-memory object to block on destruction.
+Users may want fine-grained control of the memory management and storage
+semantics of SYCL image or buffer objects. For example, a user may wish to
+specify the host memory for a memory object to use, but may not want the memory
+object to block on destruction.

Depending on the control and the use cases of the SYCL applications,
well established {cpp} classes and patterns can be used for reference
counting and sharing data between user applications and the
<>. For control over memory allocation on the host and
mapping between host and device memory, pre-defined or user-defined
{cpp} [code]#std::allocator# classes are
-used. For better control of synchronization between a SYCL and a non SYCL
-application that share data, [code]#std::shared_ptr# and
-[code]#std::mutex# classes are used.
+used.
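+
+As a non-normative sketch, a user-defined allocator might be written as
+follows (the name [code]#my_host_allocator# is a placeholder, and only the
+members required by the {cpp} Allocator requirements are shown):
+
+[source,,linenums]
+----
+#include <sycl/sycl.hpp>
+#include <cstddef>
+
+// A minimal C++ allocator; host allocations made on behalf of the buffer go
+// through allocate() and deallocate().
+template <typename T>
+struct my_host_allocator {
+  using value_type = T;
+  my_host_allocator() = default;
+  template <typename U>
+  my_host_allocator(const my_host_allocator<U>&) {}
+
+  T* allocate(std::size_t n) {
+    return static_cast<T*>(::operator new(n * sizeof(T)));
+  }
+  void deallocate(T* p, std::size_t) { ::operator delete(p); }
+};
+
+template <typename T, typename U>
+bool operator==(const my_host_allocator<T>&, const my_host_allocator<U>&) {
+  return true;
+}
+template <typename T, typename U>
+bool operator!=(const my_host_allocator<T>&, const my_host_allocator<U>&) {
+  return false;
+}
+
+int main() {
+  // Host allocations for this buffer are made through the user's allocator.
+  sycl::buffer<int, 1, my_host_allocator<int>> buf{sycl::range<1>{1024}};
+}
+----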
To avoid data races when sharing data between SYCL and non-SYCL +applications, [code]#std::shared_ptr# and [code]#std::mutex# classes are used. == Multi-dimensional objects and linearization diff --git a/adoc/chapters/glossary.adoc b/adoc/chapters/glossary.adoc index 658ce23b..a83050a9 100644 --- a/adoc/chapters/glossary.adoc +++ b/adoc/chapters/glossary.adoc @@ -49,9 +49,9 @@ <>. [[barrier]]barrier:: - A barrier is either a <>, or a kernel execution - <> depending on whether it is a synchronization point on - the command queue or on a group of work-items in a kernel execution. + A barrier may refer to either a <> used for host-device + coordination, or a <> used to coordinate work-items in + a kernel. [[blocking-accessor]]blocking accessor:: A blocking accessor is an <> which provides immediate access @@ -112,13 +112,9 @@ object. For the full description please refer to <>. details see <>. [[queue-barrier]]command queue barrier:: - The SYCL API provides two variants for functions that force - synchronization on a SYCL command queue. The - [code]#sycl::queue::wait()# and - [code]#sycl::queue::wait_and_throw()# functions force the SYCL - command queue to wait for the execution of the - <> before it is able to continue - executing. + The [code]#sycl::queue::wait()# and [code]#sycl::queue::wait_and_throw()# + functions block the calling thread until the execution of a + <> completes. [[constant-memory]]constant memory:: A region of memory that remains constant during the execution of @@ -212,14 +208,8 @@ object. For the full description please refer to <>. such as a <> or <>. [[group-barrier]]group barrier:: - A synchronization function within a group of <>. All the - <> of a group must execute the barrier construct before any - <> continues execution beyond the barrier. Additionally all work-items - in the group execute a release <> prior to synchronizing at the - barrier, all work-items in the group execute an acquire <> after - synchronizing at the barrier, and there is an implicit synchronization between - these acquire and release fences as if through an atomic operation on an - atomic object internal to the barrier implementation. + A coordination mechanism for all <> of a group. + See the definition of the [code]#group_barrier# function. [[h-item]]h-item:: A unique identifier representing a single <> within the @@ -368,13 +358,8 @@ object. For the full description please refer to <>. [[mem-fence]]mem-fence:: A memory fence provides control over re-ordering of memory load - and store operations when coupled with an atomic operation that - synchronizes two fences with each other (or when the fences are part of - a <> in which case there is implicit synchronization - as if an atomic operation has synchronized the fences). The - [code]#sycl::atomic_fence# function acts as a fence across all - work-items and devices specified by a [code]#memory_scope# - argument. + and store operations when coupled with an atomic operation. + See the definition of the [code]#sycl::atomic_fence# function. [[object]]object:: A state which a <> can be in, representing diff --git a/adoc/chapters/introduction.adoc b/adoc/chapters/introduction.adoc index 35a36d41..f75fb1b1 100644 --- a/adoc/chapters/introduction.adoc +++ b/adoc/chapters/introduction.adoc @@ -25,7 +25,7 @@ features: * execution of parallel kernels on a heterogeneous device is made simultaneously convenient and flexible. 
Common parallel patterns are prioritized with simple syntax, which through a series {cpp} types allow - the programmer to express additional requirements, such as synchronization, + the programmer to express additional requirements, such as dependencies, if needed; * when using buffers and accessors, data access in SYCL is separated from data storage. By relying on the {cpp}-style resource acquisition is @@ -43,9 +43,8 @@ features: * the hierarchical parallelism syntax offers a way of expressing data parallelism similar to the OpenCL device or OpenMP target device execution model in an easy-to-understand modern {cpp} form. It - more cleanly layers parallel loops and synchronization points to - avoid fragmentation of code and to more efficiently map to CPU-style - architectures. + more cleanly layers parallel loops to avoid fragmentation of code and to + more efficiently map to CPU-style architectures. SYCL retains the execution model, runtime feature set and device capabilities inspired by the OpenCL standard. This standard imposes diff --git a/adoc/chapters/programming_interface.adoc b/adoc/chapters/programming_interface.adoc index 5792aa17..e0e9c20c 100644 --- a/adoc/chapters/programming_interface.adoc +++ b/adoc/chapters/programming_interface.adoc @@ -3823,7 +3823,7 @@ of the execution, or want to retrieve properties of a command that is running. Note that, although an event represents the status of a particular operation, the dependencies of a certain event can be used to keep track of multiple steps -required to synchronize said operation. +required to block on the results of said operation. A SYCL event is returned by the submission of a <>. The dependencies of the event returned via the submission of the command group @@ -4199,9 +4199,9 @@ new memory positions every call. The default allocator for const buffers will remove the const-ness of the type (therefore, the default allocator for a buffer of type [code]#const int# will be an [code]#Allocator)#. -This implies that host <> will not synchronize with the pointer given -by the user in the buffer/image constructor, but will use the memory -returned by the [code]#Allocator# itself for that purpose. +This implies that host <> will not share memory with the +pointer given by the user in the buffer/image constructor, but will use the +memory returned by the [code]#Allocator# itself for that purpose. The user can implement an allocator that returns the same address as the one passed in the buffer constructor, but it is the responsibility of the user to handle the potential race conditions. @@ -4233,7 +4233,8 @@ image_allocator |==== -See <> for details on manual host-device synchronization. +See <> for details of using manual synchronization to avoid +data races between host and device. [[subsec:buffers]] @@ -4948,8 +4949,9 @@ property::buffer::use_mutex [code]#buffer# can be shared with the application via a [code]#std::mutex# provided to the property. The mutex [code]#m# is locked by the runtime whenever the data is in - use and unlocked otherwise. Data is synchronized with - [code]#hostData#, when the mutex is unlocked by the runtime. + use and unlocked otherwise. The contents of [code]#hostData# are + guaranteed to reflect the contents of the buffer when the + [code]#std::mutex# is unlocked by the runtime. a@ [source] @@ -5023,7 +5025,7 @@ context property::buffer::context_bound::get_context() const [[sec:buf-sync-rules]] -==== Buffer synchronization rules +==== Buffer destruction rules Buffers are reference-counted. 
When a buffer value is constructed from another buffer, the two values reference the same buffer and a @@ -5961,8 +5963,9 @@ property::image::use_mutex application via a [code]#std::mutex# provided to the property. The [code]#std::mutex# is locked by the runtime whenever the data is in use and unlocked - otherwise. Data is synchronized with [code]#hostData#, when - the [code]#std::mutex# is unlocked by the runtime. + otherwise. The contents of [code]#hostData# are + guaranteed to reflect the contents of the image when the + [code]#std::mutex# is unlocked by the runtime. a@ [source] @@ -6035,7 +6038,7 @@ context property::image::context_bound::get_context() const [[sec:image-sync-rules]] -==== Image synchronization rules +==== Image destruction rules The rules are similar to those described in <>. @@ -6054,7 +6057,7 @@ will be copied back, if necessary, to the associated host memory. Any errors occurring during destruction are reported to any associated context's asynchronous error handler. If an image object is constructed with a storage object, then the storage object defines what -synchronization or copying behavior occurs on image object destruction. +blocking or copying behavior occurs on image object destruction. [[sec:sharing-host-memory-with-dm]] @@ -6078,8 +6081,8 @@ of the class is, by default, passed to <>, and that pointer cannot on the host side until the buffer or image is destroyed. A SYCL application can access the contents of the memory managed by a SYCL buffer by using a [code]#host_accessor# as defined in <>. -However, there is no guarantee that the host accessor synchronizes with the -original host address used in its constructor. +However, there is no guarantee that the host accessor will copy data back to +the original host address used in its constructor. The pointer passed in is the one used to copy data back to the host, if needed, before buffer destruction. The memory pointed by <> @@ -6253,7 +6256,7 @@ There are several different {cpp} classes that implement accessors: Accessor objects must always be constructed in host code, either in <> or in <>. Whether the constructor -blocks waiting for data to synchronize depends on the type of accessor. Those +blocks until data is available depends on the type of accessor. Those accessors which provide access to data within a <> do not block. Instead, these accessors define a requirement which influences the scheduling of the <>. Those accessors which provide access to data from host @@ -13836,7 +13839,7 @@ include::{code_dir}/parallelfor.cpp[lines=4..-1] The last case of a [code]#parallel_for# invocation enables low-level functionality of work-items and work-groups. This becomes valuable when an execution -requires groups of work-items that communicate and synchronize. These are +requires groups of work-items to coordinate with one another. These are exposed in SYCL through [code]#+parallel_for (nd_range,...)+# and the [code]#nd_item# class. In this case, the developer needs to define the [code]#nd_range# that the kernel will execute on in order to have fine @@ -13852,8 +13855,7 @@ from SYCL [code]#nd_item#, representing the currently executing work-item within the range specified by the [code]#nd_range# parameter. The [code]#nd_item# parameter makes all information about the work-item and its position in the range available, and provides access to functions -enabling the use of a <> to synchronize between the -<> in the <>. +enabling the use of a <>. 
[NOTE] ==== @@ -13863,8 +13865,8 @@ Case 3) above includes user-defined types that can be constructed from The following example shows how sixty-four work-items may be launched in a three-dimensional grid with four in each dimension, and divided -into eight work-groups. Each group of work-items synchronizes with a -<>. +into eight work-groups. Each group of work-items uses a +<> for coordination. [source,,linenums] ---- @@ -13988,11 +13990,12 @@ understanding of which instances of a underlying variable. Also within the lambda body can be a sequence of calls to -[code]#parallel_for_work_item#. At the edges of these inner parallel -executions the work-group synchronizes. As a result the pair of -[code]#parallel_for_work_item# calls in the code below is equivalent to -the parallel execution with a <> in the earlier -example. +[code]#parallel_for_work_item#. No work-item can begin executing a +[code]#parallel_for_work_item# until all work-items in the group have +completed executing the previous [code]#parallel_for_work_item#. +As a result the pair of [code]#parallel_for_work_item# calls in the code below +is equivalent to the parallel execution with a <> in the +earlier example. [source,,linenums] ---- @@ -14080,8 +14083,8 @@ of 10 elements of [code]#int# type, the host pointer must at least have A special case is the [code]#update_host# member function. This member function only requires an accessor, and instructs the runtime to update the internal copy of the data in the host, if any. This is particularly -useful when users use manual synchronization with host pointers, e.g. -via mutex objects on the [code]#buffer# constructors. +useful when used in conjunction with the [code]#buffer# constructor overloads +which accept mutex objects. <> describes the interface for the explicit copy operations. @@ -18227,22 +18230,21 @@ stored in [code]#std::array#. The available features are: * Accessor classes: Accessor classes specify acquisition and release of - buffer and image data structures to provide points at which underlying - queue synchronization primitives must be generated. + buffer and image data structures to provide points at which a SYCL runtime + must guarantee memory consistency. * Atomic operations: SYCL devices support a restricted subset of {cpp} atomics and SYCL uses the library syntax from the next {cpp} specification to make this available. * Fences: Fence primitives are made available to order loads and stores. They are exposed through the [code]#atomic_fence# function. Fences can have acquire semantics, release semantics or both. - * Barriers: Barrier primitives are made available to synchronize sets of - work-items within individual <>. They are exposed through the - [code]#group_barrier# function. + * Barriers: Barrier primitives are made available as a coordination mechanism + for work-items within individual <>. They are exposed through + the [code]#group_barrier# function. * Hierarchical parallel dispatch: In the hierarchical parallelism model of - describing computations, synchronization within the work-group is made - explicit through multiple instances of the - [code]#parallel_for_work_item# function call, rather than through - the use of explicit <> operations. + describing computations, work-items within a work-group may coordinate + via multiple instances of the [code]#parallel_for_work_item# function call, + rather than through the use of explicit <> operations. 
* Device event: they are used inside SYCL kernel functions to wait for asynchronous operations within a SYCL kernel function to complete. @@ -18273,7 +18275,7 @@ the [code]#order# parameter: and release fence A <> acts as both an acquire fence and a release fence: all -work-items in the group execute a release fence prior to synchronizing at +work-items in the group execute a release fence prior to signaling arrival at the barrier, and all work-items in the group execute an acquire fence afterwards. A <> provides implicit atomic synchronization as if through an internal atomic object, such that the acquire and release fences @@ -19433,9 +19435,9 @@ a@ ---- flush ---- - a@ Triggers a flush operation, which synchronizes the work-item stream buffer - with the global stream buffer, and then empties the work-item stream - buffer. After a flush, the full [code]#workItemBufferSize# is + a@ Triggers a flush operation, which copies the contents of the work-item + stream buffer to the global stream buffer, and then empties the work-item + stream buffer. After a flush, the full [code]#workItemBufferSize# is available again for subsequent streaming within the work-item. a@ @@ -19614,15 +19616,15 @@ template const stream& operator<<(const stream& os, const T& rhs) -=== Synchronization +=== Output -An instance of the SYCL [code]#stream# class is required to synchronize with the host, and must output -everything that is streamed to it via the [code]#operator<<()# operator before a flush operation (that -doesn't exceed the [code]#workItemBufferSize# or [code]#totalBufferSize# limits) within a SYCL -kernel function by the time that the event associated with a command group submission enters the completed -state. The point at which this synchronization occurs and the member function by which this synchronization is -performed are implementation-defined. For example it is valid for an implementation to use -[code]#printf()#. +An instance of the SYCL [code]#stream# class is required to output everything +that is streamed to it via the [code]#operator<<()# operator before a flush +operation (that doesn't exceed the [code]#workItemBufferSize# or +[code]#totalBufferSize# limits) within a SYCL kernel function by the time that +the event associated with a command group submission enters the completed +state. The point at which the flush operation is performed is +implementation-defined. The SYCL [code]#stream# class is required to output the content of each stream, between flushes (up to [code]#workItemBufferSize)#, without mixing with content from the same stream in other work-items. @@ -19817,7 +19819,7 @@ T operator()(const T& x, const T& y) const SYCL provides a number of functions that expose functionality tied to groups of work-items (such as <> and collective operations). -These group functions act as synchronization points and must be encountered in +These group functions act as _synchronization points_ and must be encountered in converged <> by all work-items in the group. If one work-item in a group calls a group function, then all work-items in that group must call exactly the same function under the same set of conditions --- calling the same @@ -19910,8 +19912,8 @@ within group [code]#g#. ==== [code]#group_barrier# -The [code]#group_barrier# function synchronizes all work-items in a group, -using a <>. +The [code]#group_barrier# function is a coordination mechanism for all +work-items in a group. 
[source,,linenums] ---- @@ -19922,15 +19924,14 @@ include::{header_dir}/groups/barrier.h[lines=4..-1] [code]#sycl::is_group_v># is true. + -- -_Effects:_ Synchronizes all work-items in group [code]#g#. The current -work-item will wait at the barrier until all work-items in group [code]#g# have -reached the barrier. In addition, the barrier performs <> operations ensuring that -memory accesses issued before the barrier are not re-ordered with those issued -after the barrier: all work-items in group [code]#g# execute a release fence -prior to synchronizing at the barrier, all work-items in group [code]#g# -execute an acquire fence afterwards, and there is an implicit synchronization -of these fences as if provided by an explicit atomic operation on an atomic -object. +_Effects:_ The current work-item will wait at the barrier until all work-items +in group [code]#g# have reached the barrier. In addition, the barrier performs +<> operations ensuring that memory accesses issued before the +barrier are not re-ordered with those issued after the barrier: all work-items +in group [code]#g# execute a release fence prior to arriving at the +barrier, all work-items in group [code]#g# execute an acquire fence afterwards, +and there is an implicit synchronization of these fences as if provided by an +explicit atomic operation on an atomic object. By default, the scope of these fences is set to the narrowest scope including all work-items in group [code]#g# (as reported by diff --git a/adoc/chapters/what_changed.adoc b/adoc/chapters/what_changed.adoc index 090d23f9..a1954808 100644 --- a/adoc/chapters/what_changed.adoc +++ b/adoc/chapters/what_changed.adoc @@ -252,8 +252,9 @@ member functions were updated to encapsulate all functionality tied to The [code]#barrier# and [code]#mem_fence# member functions of the [code]#nd_item# class have been removed. The [code]#barrier# member function has been replaced by the [code]#group_barrier()# function, which -can be used to synchronize either <> or <>. The -[code]#mem_fence# member function has been replaced by the +can be used to block work-items in either <> or +<> until all work-items in the group arrive at the +barrier. The [code]#mem_fence# member function has been replaced by the [code]#atomic_fence# function, which is more closely aligned with [code]#std::atomic_thread_fence# and offers control over memory ordering and scope. diff --git a/adoc/code/parallelforbarrier.cpp b/adoc/code/parallelforbarrier.cpp index 67df9f0b..ce51a332 100644 --- a/adoc/code/parallelforbarrier.cpp +++ b/adoc/code/parallelforbarrier.cpp @@ -5,7 +5,6 @@ myQueue.submit([&](handler& cgh) { cgh.parallel_for(nd_range<3>(range<3>(4, 4, 4), range<3>(2, 2, 2)), [=](nd_item<3> item) { //[kernel code] - // Internal synchronization group_barrier(item.get_group()); //[kernel code] });