The programming interface
SMI exposes primitives for both point-to-point and collective communications. To use them, the user has to include the `smi.h` header in their own device programs.
Communication in SMI codes is based on transient channels: when established, a streaming interface is exposed at the specified port at either end, allowing data to be streamed across the network using FIFO semantics, with an optional finite amount of buffer space at each endpoint. A streaming message consists of one or more elements with a specified data type. The communication endpoints are uniquely identified by their `rank` and `port` parameters. Ranks uniquely identify FPGA devices, and ports distinguish distinct communication endpoints within a rank. Once established, channels exist in code in the form of channel descriptors that the user can employ for performing communications. A single rank exists per FPGA. The ranks involved in communication and the total number of ranks can then be dynamically altered without recompiling the program, by simply updating the routing configuration at each rank (see Development Workflow).
The user can declare a *send* or *receive* channel by specifying the number of elements to send, the data type of the elements, the source or destination rank, the port, and the communicator. Channels are implicitly closed when the specified number of elements have been sent or received.
```c
SMI_Channel SMI_Open_send_channel(int count, SMI_Datatype type, int destination, int port, SMI_Comm comm);
SMI_Channel SMI_Open_recv_channel(int count, SMI_Datatype type, int source, int port, SMI_Comm comm);
```
Analogously to MPI, communicators allow communication to be further organized into logical groups.
Please note: in the current implementation, only a global communicator is supported.
To send and receive data elements from within the pipelined HLS code, SMI provides the `SMI_Push` and `SMI_Pop` primitives:

```c
void SMI_Push(SMI_Channel* chan, void* data);
void SMI_Pop(SMI_Channel* chan, void* data);
```
Both functions operate on a channel descriptor of a previously opened channel,
and a pointer either to the data to be sent, or to the target at which to store
the data.
These primitives are blocking: `SMI_Push` does not return until the data element has been safely sent to the network and the sender is free to modify it, and `SMI_Pop` returns only once the output buffer contains the newly received data element.
Additionally, the type specified by the `SMI_Push`/`SMI_Pop` operations must match the one defined by the corresponding open-channel primitive. With these primitives, communication is programmed in the same way that data is normally streamed between intra-FPGA modules.
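Putting the pieces together, a point-to-point transfer might look like the following sketch. The kernel names, the ranks, the port number, and the message size `N` are illustrative assumptions; the SMI calls are those listed above:

```c
#include <smi.h>

#define N 1024

// Rank 0 streams N integers to rank 1 on port 0.
__kernel void sender(SMI_Comm comm)
{
    SMI_Channel chan = SMI_Open_send_channel(N, SMI_INT, 1, 0, comm);
    for (int i = 0; i < N; i++)
    {
        int data = i;            // element produced by the surrounding pipeline
        SMI_Push(&chan, &data);  // blocks until the element is safely on the network
    }
}

__kernel void receiver(SMI_Comm comm)
{
    SMI_Channel chan = SMI_Open_recv_channel(N, SMI_INT, 0, 0, comm);
    for (int i = 0; i < N; i++)
    {
        int data;
        SMI_Pop(&chan, &data);   // blocks until a new element has arrived
        // <consume data>
    }
}
```

Both channels are implicitly closed once all `N` elements have been pushed and popped.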
In SMI, communication channels are characterized by an asynchronicity degree `k > 0`, meaning that the sender can run ahead of the receiver by up to `k` data elements. If the sender tries to push the `(k+1)`-th element before an element is popped by the receiver, the sender will stall. Because of this asynchronicity, an SMI send is non-local: it can be started whether or not the receiver is ready to receive, but its completion may depend on the receiver, if the message size is larger than `k`.
The user can define the asynchronicity degree of a channel while opening it using the functions:
```c
SMI_Channel SMI_Open_send_channel_ad(int count, SMI_Datatype type, int destination, int port, SMI_Comm comm, int asynch_degree);
SMI_Channel SMI_Open_recv_channel_ad(int count, SMI_Datatype type, int source, int port, SMI_Comm comm, int asynch_degree);
```
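As a sketch, the asynchronicity degree of 8 and the other arguments below are illustrative:

```c
// The sender may run ahead of the receiver by up to 8 elements before stalling.
SMI_Channel chan = SMI_Open_send_channel_ad(N, SMI_INT, 1, 0, comm, 8);
```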
Collective communication is key to developing distributed applications that can scale to a large number of nodes. In collective operations, all ranks in a given communicator must be involved in communicating data. SMI defines the `Broadcast`, `Reduce`, `Scatter`, and `Gather` collective operation primitives, analogous to their MPI counterparts. Each collective operation defined by SMI implies a distinct channel type, open-channel operation, and communication primitive.
SMI allows multiple collective communications to execute in parallel, provided that they use separate ports.
Please note: to prevent the compiler from allocating the channel descriptor in BRAM rather than in logic (which would result in lower performance), the channel descriptor must be declared as register-resident data. For example: `SMI_ScatterChannel __attribute__((register)) chan = SMI_Open_scatter_channel(...)`
The communicator object is built on the host side of the application. On the device, the user can access its content by using the following functions:
```c
int SMI_Comm_size(SMI_Comm comm);
int SMI_Comm_rank(SMI_Comm comm);
```
The former can be used to obtain the size of the communicator. The latter returns the rank of the caller.
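For instance, a hypothetical ring pattern can derive its destination rank from the communicator (the port number and message size `N` are illustrative):

```c
const int my_rank   = SMI_Comm_rank(comm);
const int num_ranks = SMI_Comm_size(comm);
const int next      = (my_rank + 1) % num_ranks;  // neighbor in the ring
SMI_Channel chan = SMI_Open_send_channel(N, SMI_INT, next, 0, comm);
```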
For all the following collectives, the asynchronicity degree can be specified analogously to point-to-point communication.
To perform a `Broadcast`, each rank opens a broadcast-specific channel (`SMI_BChannel`), indicating the count and the data type of the message elements, the port, the rank of the root, and the communicator:

```c
SMI_BChannel SMI_Open_broadcast_channel(
    int count, SMI_Datatype type, int port, int root, SMI_Comm comm);
```
To participate in the broadcast operation, each rank uses the associated primitive:

```c
void SMI_Broadcast(SMI_BChannel* chan, void* data);
```
If the caller is the root, it will push the data towards the other ranks. Otherwise, the caller will pop data elements from the network.
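A minimal broadcast sketch might look as follows (the kernel name, port number, and message size `N` are illustrative):

```c
__kernel void broadcast_example(SMI_Comm comm, int root)
{
    SMI_BChannel __attribute__((register)) chan =
        SMI_Open_broadcast_channel(N, SMI_INT, 0, root, comm);
    for (int i = 0; i < N; i++)
    {
        int data = i;                 // only meaningful on the root
        SMI_Broadcast(&chan, &data);  // root pushes; non-roots pop into data
        // <all ranks can now use data>
    }
}
```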
The reduce channel is opened with the respective primitive:

```c
SMI_RChannel SMI_Open_reduce_channel(int count, SMI_Datatype data_type, SMI_Op op, int port, int root, SMI_Comm comm);
```

which specifies the length of the message (`count`), the data type, the reduction operation to apply, the port number, the root rank, and the communicator. Currently, SMI supports `SMI_Add`, `SMI_Max`, and `SMI_Min` as reduction operations.
Each rank then uses the associated communication primitive:

```c
void SMI_Reduce(SMI_RChannel *chan, volatile void* data_snd, volatile void* data_rcv);
```

where `data_snd` is the data to be reduced and `data_rcv` holds the result of the reduction (valid only on the root rank).
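A minimal reduction sketch (the port number, message size `N`, and the choice of `SMI_Add` are illustrative):

```c
SMI_RChannel __attribute__((register)) chan =
    SMI_Open_reduce_channel(N, SMI_INT, SMI_Add, 0, root, comm);
for (int i = 0; i < N; i++)
{
    int contrib = SMI_Comm_rank(comm);     // each rank's local contribution
    int result;
    SMI_Reduce(&chan, &contrib, &result);  // result is valid only on the root
}
```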
For opening a Scatter channel, the user can invoke:

```c
SMI_ScatterChannel SMI_Open_scatter_channel(int send_count, int recv_count, SMI_Datatype data_type, int port, int root, SMI_Comm comm);
```

in which `send_count` indicates the number of elements sent by the root to each rank, and `recv_count` represents the number of data elements received by each rank.
Communication is performed by means of the primitive:

```c
void SMI_Scatter(SMI_ScatterChannel *chan, void* data_snd, void* data_rcv);
```

where `data_snd` is the pointer to the data elements that must be sent (root side) and `data_rcv` is the pointer to the memory area in which received data elements are stored.
The user must take the asymmetric nature of scatter into account when writing their program (root and non-root ranks send/receive different numbers of data elements). For example:
```c
//...
SMI_ScatterChannel __attribute__((register)) chan =
    SMI_Open_scatter_channel(N, N, SMI_INT, 0, root, comm);
const int my_rank   = SMI_Comm_rank(comm);
const int num_ranks = SMI_Comm_size(comm);
// Consider different loop bounds for the root and non-root ranks.
const int loop_bound = (my_rank == root) ? N * num_ranks : N;
for (int i = 0; i < loop_bound; i++) // perform pipelined communication
{
    // <root prepares data to send>
    SMI_Scatter(&chan, &to_send, &to_rcv);
    //...
}
```
A Gather channel is opened by using:

```c
SMI_GatherChannel SMI_Open_gather_channel(int send_count, int recv_count, SMI_Datatype data_type, int port, int root, SMI_Comm comm);
```

while the communication is performed with the primitive:

```c
void SMI_Gather(SMI_GatherChannel *chan, void* send_data, void* rcv_data);
```
Similarly to `Scatter`, different loop bounds must be used in the root and non-root ranks when programming the communication.
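Mirroring the scatter example above, a gather sketch could look as follows (the port number and message size `N` are illustrative; here the root pops `N` elements from every rank):

```c
SMI_GatherChannel __attribute__((register)) chan =
    SMI_Open_gather_channel(N, N, SMI_INT, 0, root, comm);
const int my_rank   = SMI_Comm_rank(comm);
const int num_ranks = SMI_Comm_size(comm);
// The root receives N elements from every rank; non-roots only send N.
const int loop_bound = (my_rank == root) ? N * num_ranks : N;
for (int i = 0; i < loop_bound; i++)
{
    int to_send = i, to_rcv;
    SMI_Gather(&chan, &to_send, &to_rcv);
    // <root consumes to_rcv>
}
```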