DATAFLOW-BASED DISTRIBUTED COMPUTING SYSTEM
Информатика, кибернетика и программирование
The method is bsed on the following concepts: storge tsk nd rule. Storge stores nmed dt on which three opertions could be pplied crete write red nd delete. Every item in the storge is selfsufficient nd contins dt some metinformtion nd hs unique nme. The term tsk identifies the progrm which could red dt with specific nmes from the storge nd generte new dt items which will be written into the storge s result of tsk execution.
DATAFLOW-BASED DISTRIBUTED COMPUTING SYSTEM
Senior Software Engineer
Miami, FL, USA
Usage of parallel and distributed computing systems is accompanied with big expenditures related to programming for such systems. The problem is that modern popular parallel programming tools, like MPI and OpenMP, are quite complex to use software developers have to take care of distribution of computational tasks, synchronization, data exchange and so on.
Different methods have been used to simplify development and execution of parallel programs: generic tools for automatic program parallelization (for both shared memory and multicomputer systems); frameworks and libraries to solve particular classes of tasks (in general, it refers to applications with high level of data parallelism), and universal tools which are trying to simplify the technical side of the process of parallel and distributed systems programming.
Sometimes developers are using nonstandard calculus paradigms in their tools and libraries. One of such paradigms is Dataflow . Some variations of Dataflow are used in microprocessor architectures, supercomputers, computational threads organization in multithreaded environment, and inter-process communications on distributed systems.
In this paper, the authors describe methods and tools for programming in parallel and distributed environments relying on analysis of different Dataflow models including their own. The goal of proposed design is to simplify parallel software development without significant loss of algorithms execution efficiency.
The current method of computations arose from long theoretical research of distributed computing operational system described in . The method is based on the following concepts: storage, task and rule. Storage stores named data on which three operations could be applied create (write), read and delete. Every item in the storage is self-sufficient and contains data, some meta-information, and has a unique name. The term task identifies the program which could read data with specific names from the storage and generate new data items which will be written into the storage as a result of task execution. The term rule designates the construction which defines conditions and parameters of the program execution. Each rule contains:
1. A list of prerequisite data items required to execute tasks;
2. A list of correspondence between global data names (stored in the storage) and local names (which will be used inside the program);
3. A list of tasks (programs) which should be executed;
4. Actions to be performed in the case of successful execution of tasks defined in (3).
The rule is considered to be ready for execution when all data items with names from the list (1) are present in the storage. The rule is marked to be removed from the list of the rules to be executed, after all tasks defined in it successfully finished.
Software development process and carrying out calculations is unfolding in the following manner: The programmer develops quite simple and small programs (binaries or some kind of scripts) which will be used to represent different tasks. The developer can use any combinations of programming languages and target hardware, as some tasks could be executed on x86 architecture, some on RISC and even on GPU. The programmer also generates an initialization file in which initial rules of the system are described. The set of active rules could be increased later on during execution. Beside the set of initial rules, the initialization file contains initial data items which will be written into the storage.
Further on, the developer executes a run command. The computing environment finds rules ready to execute during the execution stage, and schedules the specified tasks to be executed on available resources. As a result, some rules complete, generate new data and release resources for other rules. The environment continues to search and execute rules until the rules set is exhausted, the job is being suspended from the outside process, or an error is exposed.
The Architecture of the system consists of two main parts: runtime and storage facilities. Runtime is an engine that allows the system to operate and storage is responsible for storing all data consumed and produced by tasks. Storage is also used to sustain some internal runtime data.
Runtime contains the following main elements:
The system execution model could be described as follows. Runner reads the initial rules. It also reads the list of addresses of runsides available for the current task. Runner detects rules which could be scheduled for execution, and detects runsides with enough available computing resources of appropriate type (with corresponding hardware etc) for this rule. If such runsides are found, the rule is being scheduled to one of the found runsides. If there were no runsides found, runner skips this rule and returns to it on the next cycle. When the rule is scheduled to some runside, that runside: a) synchronizes the calculation program (binary or script file) with the runner and downloads a new version if necessary; b) invokes runvisor and assigns the rule. Runvisor performs some local preparations like environment settings and rule initialization commands and then invokes programs specified in the rule. Runvisor evaluates rule finalization commands when all tasks are finished and if there were no errors detected. Rule initialization and finalization commands may contain data storage renaming and mapping operations, local assignments, and new rule creation actions. Finally, runvisor marks the rule as completed in the global space.
The second part of the system is the storage subsystem. It is obvious that such subsystem must satisfy to quite specific requirements. We investigated available storage projects, including distributed databases and file systems as well as distributed hash tables and came to a conclusion that at the moment there is no system exists which conforms to all of our requirements. So currently we are in the progress of developing distributed storage systems capable to:
We are confident that a storage facility with the described features will be efficient for the needs of described computing system. At the current moment a partial solution is already developed persistent single-node storage with the remote data access API.
The described method possesses a number of valuable capabilities, such as the ability to carry out computational experiments on hybrid architectures, ability to alter the amount of computing nodes during runtime, ability to support applications in the globally-distributed environment, ability to automatically create checkpoints and suspend and resume computation in the transparent to user manner, ability to use distributed data storages, and so on.
The authors are developing the prototype of the system based on the suggested method the RIDE project, available at www.ridehq.net. The early samples of programs show the realizability of the system and the elegance of the code for rules description. Authors believe that evolution of this project will achieve the main goal - make the process of distributed parallel programs development simple and more effective.