The Hash Component Model

The Hash Component Model was proposed in the mid-2000s as an attempt to bring best practices of software engineering and software architecture to parallel programming. At that time, important research initiatives from the HPC and computational science communities tried to advance the state of the art of component models to meet the requirements of emerging HPC applications. This was a response to the increasing complexity and scale of computational science applications, enabled by internet infrastructure, and to the need for better technologies for interoperability among legacy codes developed by different research teams. These initiatives led to the development of the Common Component Architecture (CCA), Fractal, and the Grid Component Model (GCM), each with a series of compliant platforms and frameworks that have been validated in real-world applications. The proposal of the Hash Component Model was motivated by the view that these models had failed to reach a general notion of parallel component that could efficiently and simply accommodate the various distributed-memory parallel programming patterns.

The first design principle of the Hash Component Model is the orthogonality between processes and concerns. It states that the concerns that guide the decomposition of software in current software architecture practice are interspersed across the implementation of the processes of a parallel program. This characteristic has two consequences. First, it makes modern software decomposition techniques hard to apply as parallel programs grow from small-scale to large-scale software, where concern-oriented decomposition through some advanced notion of software component becomes essential. Second, designers of component platforms and frameworks that target HPC applications tend to decompose large-scale software with dominant parallelism requirements along the process dimension (instead of the concern dimension), incurring the well-known consequences of poor modularity in large-scale software.

In component models without a general notion of parallel component, such as CCA and Fractal, component-based parallelism is assumed to be achieved by a team (or cohort, in CCA jargon) of component instances, each one placed in a node of a cluster. These components may be distinct, as in MCMD (multiple components, multiple data) parallel programming patterns, so that a single concern crosscuts the implementation of a set of components. The communication among parallel component instances may be performed through either message passing or regular client/server component interfaces. In the first case, message passing may be viewed as a kind of backdoor communication between components, breaking their functional independence. In the second case, client/server interfaces are not adequate for the usual peer-to-peer relations between processes in a parallel program, which motivated, for example, the Fractal designers to propose collective interfaces based on client/server relations, later inherited by GCM.

In the Hash Component Model, a parallel program is decomposed into components along the dimension of concerns, following the usual practice of software architecture. Consequently, a component is defined by a set of slices of different processes, called units, which collaboratively implement the concern addressed by the component. In this way, communication operations between processes are encapsulated inside each parallel component, and communication between components is only possible through their regular interfaces, generally based on client/server relations.

Definition

A component-based parallel programming framework/platform complies with the Hash Component Model if it implements three abstractions: units, overlapping composition, and component kinds.

Units

A Hash component is formed by a set of units, each one placed in a processing node of a distributed-memory parallel computing platform (e.g., clusters and MPPs). Each unit corresponds to a slice of one of the processes of the parallel program. Thus, the units of a component are slices of different processes that together implement a parallel concern of the program. This parallel concern is the concern addressed by the component.
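To make the orthogonality between processes and concerns concrete, the following minimal Python sketch models a component as a set of units, each a slice of a distinct process, and then regroups the units by process. The classes `Unit` and `Component`, the function `processes_of`, and the example components are hypothetical illustrations, not part of any Hash framework API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One slice of one process of the parallel program (hypothetical model)."""
    component: str  # name of the component (concern) the unit belongs to
    process: int    # rank of the process this unit is a slice of


@dataclass
class Component:
    """A Hash component: a concern implemented collaboratively by its units."""
    name: str
    processes: list  # ranks of the processes that each hold one unit

    def __post_init__(self):
        # The units of a component are slices of *different* processes.
        if len(set(self.processes)) != len(self.processes):
            raise ValueError(f"{self.name}: at most one unit per process")

    def units(self):
        return [Unit(self.name, p) for p in self.processes]


def processes_of(components):
    """Regroup units by process: each process is the union of its slices."""
    slices = defaultdict(list)
    for component in components:
        for unit in component.units():
            slices[unit.process].append(unit.component)
    return dict(slices)


# A hypothetical 4-process program decomposed along concerns, not processes.
program = [
    Component("solver", [0, 1, 2, 3]),
    Component("halo_exchange", [0, 1, 2, 3]),
    Component("io", [0]),
]
print(processes_of(program))
# {0: ['solver', 'halo_exchange', 'io'], 1: ['solver', 'halo_exchange'],
#  2: ['solver', 'halo_exchange'], 3: ['solver', 'halo_exchange']}
```

The `program` list is the concern view (one entry per component); the printed dictionary is the process view (one entry per process, listing the components it has slices of).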

Overlapping Composition

By overlapping composition, a Hash component, called the host component, may be built from other Hash components, called inner components. Formally, overlapping composition may be interpreted as an algebra of Hash components. More simply, one may see it as a function that maps the units of the inner components to the units of the host component (host units), where units of the same inner component cannot be mapped to the same host unit. The units of the inner components that are mapped to a host unit are called the slices of that host unit.
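The informal reading of overlapping composition as a function can be checked mechanically. The sketch below is a hypothetical illustration, not code from HPE or HPC Shelf: it encodes each inner unit as an `(inner_component, unit_index)` pair, rejects mappings in which two units of the same inner component land on the same host unit, and returns the slices of each host unit. The function name `slices_of_host` is an assumption.

```python
def slices_of_host(mapping, host_units):
    """Return the slices of each host unit, validating the composition.

    mapping: dict from (inner_component, unit_index) to a host unit index.
    Being a dict, it assigns each inner unit exactly one host unit.
    """
    used = set()  # (inner_component, host_unit) pairs already taken
    slices = {h: [] for h in range(host_units)}
    for (inner, unit), host in mapping.items():
        if host not in slices:
            raise ValueError(f"{inner}[{unit}] mapped to unknown host unit {host}")
        if (inner, host) in used:
            raise ValueError(f"two units of {inner} mapped to host unit {host}")
        used.add((inner, host))
        slices[host].append((inner, unit))
    return slices


# Hypothetical host with 2 units, built from inner A (2 units) and inner B (1 unit).
ok = {("A", 0): 0, ("A", 1): 1, ("B", 0): 0}
print(slices_of_host(ok, 2))
# {0: [('A', 0), ('B', 0)], 1: [('A', 1)]}

bad = {("A", 0): 0, ("A", 1): 0}  # both units of A on host unit 0
try:
    slices_of_host(bad, 2)
except ValueError as err:
    print(err)  # two units of A mapped to host unit 0
```

Host unit 0 receives slices from both inner components, which is allowed; only units of the same inner component must be kept on distinct host units.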

Component Kinds

A component kind represents a set of components that share the same connection model, with respect to other components, and the same deployment model, with respect to the framework/platform. In other words, component kinds may be viewed as different kinds of building blocks for the parallel computing systems supported by the framework/platform. For example, a framework/platform may use component kinds to define a special-purpose component composition language for building its target systems. In HPC Shelf, component kinds target general-purpose multi-cluster, multi-cloud parallel computing systems.

Implementation

Haskell#

The Hash Component Model has its origins in Haskell#, in the early 2000s. Haskell# was a message-passing parallel extension of the Haskell functional language, in which the units of a component are functional processes that communicate through lazy streams. In fact, the Hash Component Model was the coordination medium that allowed functional processes written in pure Haskell to exchange messages through the lazy streams connecting them. At the coordination level, functional processes are connected through an architecture description language called Hash Configuration Language (HCL).

HPE (Hash Programming Environment)

HPE was the first reference implementation of the Hash Component Model. It was a general-purpose platform for component-oriented parallel programming targeting cluster computing environments. For that purpose, the component kinds of HPE are: applications, the final parallel programs; computations, implementing parallel algorithms; synchronizers, implementing communication/synchronization patterns; environments, encapsulating parallelism-enabling libraries; platforms, representing characteristics of the cluster platform; and qualifiers. HPC Shelf, in turn, has been implemented as an extension of HPE to meet multi-cluster and multi-cloud requirements.
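As an illustration only, the HPE component kinds listed above can be represented as a Python enumeration used to tag components. The enum name `HPEKind`, the component names, and the `by_kind` helper are hypothetical and do not reproduce HPE's actual implementation.

```python
from enum import Enum


class HPEKind(Enum):
    """HPE component kinds as listed in the text (descriptions paraphrased)."""
    APPLICATION = "final parallel program"
    COMPUTATION = "parallel algorithm"
    SYNCHRONIZER = "communication/synchronization pattern"
    ENVIRONMENT = "parallelism-enabling library"
    PLATFORM = "characteristics of the cluster platform"
    QUALIFIER = "qualifier (not detailed in this section)"


# Hypothetical components of a hypothetical HPE application, tagged by kind.
components = {
    "Main": HPEKind.APPLICATION,
    "MatrixMultiply": HPEKind.COMPUTATION,
    "AllReduce": HPEKind.SYNCHRONIZER,
    "MPI": HPEKind.ENVIRONMENT,
    "Cluster16Nodes": HPEKind.PLATFORM,
}


def by_kind(comps):
    """Group component names by kind; kinds with no components map to []."""
    grouped = {kind: [] for kind in HPEKind}
    for name, kind in comps.items():
        grouped[kind].append(name)
    return grouped


for kind, names in by_kind(components).items():
    print(f"{kind.name:<12} {names}")
```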

Hierarchical Parallelism (next step, soon)

The Hash Component Model was originally designed with cluster computing in mind, i.e., dealing with parallelism only at the distributed-memory level of a parallel computing system, among the processing nodes of a cluster (or MPP). Therefore, it is aware neither of multiprocessor and multicore parallelism within the processing nodes nor of the multi-cluster parallelism of HPC Shelf.

The exploitation of parallelism at multiple hierarchical levels is an important requirement in the design of modern parallel computing systems, helping to meet heterogeneous computing requirements. Today, hierarchical parallelism and heterogeneous computing are key technologies for meeting the challenge of building the so-called exascale systems of the near future. However, there are also more pragmatic reasons to increase the expressiveness of the Hash Component Model with hierarchical parallelism.

Because HPC Shelf is designed as an extension of HPE, in which Hash components reside within clusters, HPC Shelf suffers from modularization problems similar to those of the cluster-oriented component-based computational frameworks of the 2000s. To implement parallel computations involving multiple clusters, a parallel computing system of HPC Shelf employs a cohort of computation component instances connected through an appropriate connector that mediates the communication and synchronization among them. As a result, such a multi-cluster parallel computation is not encapsulated in a parallel component, contrary to a modularization principle of the Hash Component Model.

With the objective of encapsulating multi-cluster parallel computations inside parallel computing systems of HPC Shelf, the Hash Component Model is being extended to support hierarchical parallelism. Such an extension will make it possible to deal with parallelism at multiple levels, including multi-cluster, multiprocessing, multicore, and accelerator-based parallelism.

The Hash Component Model augmented with hierarchical parallelism will be presented in a forthcoming article, currently in preparation.
