Научная статья

Информатика, кибернетика и программирование

The method is bsed on the following concepts: storge tsk nd rule. Storge stores nmed dt on which three opertions could be pplied – crete write red nd delete. Every item in the storge is selfsufficient nd contins dt some metinformtion nd hs unique nme. The term tsk identifies the progrm which could red dt with specific nmes from the storge nd generte new dt items which will be written into the storge s result of tsk execution.



39.5 KB

0 чел.


Mihail Bakhterev
System Software Dept., IMM UrB RAS

Yekaterinburg, Russia


Pavel Vasev
System Software Dept., IMM UrB RAS

Yekaterinburg, Russia


Alexey Kazantsev
System Software Dept., IMM UrB RAS

Yekaterinburg, Russia


Ilya Albrekht

Senior Software Engineer

Miami, FL, USA


  1.  Introduction

Usage of parallel and distributed computing systems is accompanied with big expenditures related to programming for such systems. The problem is that modern popular parallel programming tools, like MPI and OpenMP, are quite complex to use – software developers have to take care of distribution of computational tasks, synchronization, data exchange and so on.

Different methods have been used to simplify development and execution of parallel programs: generic tools for automatic program parallelization (for both shared memory and multicomputer systems); frameworks and libraries to solve particular classes of tasks (in general, it refers to applications with high level of data parallelism), and universal tools which are trying to simplify the technical side of the process of parallel and distributed systems programming.

Sometimes developers are using nonstandard calculus paradigms in their tools and libraries. One of such paradigms is Dataflow [1]. Some variations of Dataflow are used in microprocessor architectures, supercomputers, computational threads organization in multithreaded environment, and inter-process communications on distributed systems.

In this paper, the authors describe methods and tools for programming in parallel and distributed environments relying on analysis of different Dataflow models including their own. The goal of proposed design is to simplify parallel software development without significant loss of algorithms execution efficiency.

  1.  system concept

The current method of computations arose from long theoretical research of distributed computing operational system described in [2]. The method is based on the following concepts: storage, task and rule. Storage stores named data on which three operations could be applied – create (write), read and delete. Every item in the storage is self-sufficient and contains data, some meta-information, and has a unique name. The term task identifies the program which could read data with specific names from the storage and generate new data items which will be written into the storage as a result of task execution. The term rule designates the construction which defines conditions and parameters of the program execution. Each rule contains:

1.  A list of prerequisite data items required to execute tasks;

2.  A list of correspondence between global data names (stored in the storage) and local names (which will be used inside the program);

3. A list of tasks (programs) which should be executed;

4. Actions to be performed in the case of successful execution of tasks defined in (3).

 The rule is considered to be ready for execution when all data items with names from the list (1) are present in the storage.  The rule is marked to be removed from the list of the rules to be executed, after all tasks defined in it successfully finished.

Software development process and carrying out calculations is unfolding in the following manner:  The programmer develops quite simple and small programs (binaries or some kind of scripts) which will be used to represent different tasks.  The developer can use any combinations of programming languages and target hardware, as some tasks could be executed on x86 architecture, some on RISC and even on GPU.  The programmer also generates an initialization file in which initial rules of the system are described.  The set of active rules could be increased later on during execution. Beside the set of initial rules, the initialization file contains initial data items which will be written into the storage.

Further on, the developer executes a run command.  The computing environment finds rules ready to execute during the execution stage, and schedules the specified tasks to be executed on available resources. As a result, some rules complete, generate new data and release resources for other rules. The environment continues to search and execute rules until the rules set is exhausted, the job is being suspended from the outside process, or an error is exposed.

  1.  system structure

      The Architecture of the system consists of two main parts: runtime and storage facilities. Runtime is an engine that allows the system to operate and storage is responsible for storing all data consumed and produced by tasks. Storage is also used to sustain some internal runtime data.

     Runtime contains the following main elements:

  •  Runner – evaluates the initial rules file, detects rules ready for execution and schedules them on available runsides;
  •  Runside – is a representation of a computing node or a group of nodes. Runside maintains a list of available resources (processor cores, GPUs, etc), their types and capacities (architecture, OS, memory, etc);
  •  Runvisor – is a supervising module dedicated to evaluation of a single rule. Runvisors are started by runside when runner schedules a new rule to that runside.

The system execution model could be described as follows. Runner reads the initial rules. It also reads the list of addresses of runsides available for the current task. Runner detects rules which could be scheduled for execution, and detects runsides with enough available computing resources of appropriate type (with corresponding hardware etc) for this rule. If such runsides are found, the rule is being scheduled to one of the found runsides. If there were no runsides found, runner skips this rule and returns to it on the next cycle. When the rule is scheduled to some runside, that runside: a) synchronizes the calculation program (binary or script file) with the runner and downloads a new version if necessary; b) invokes runvisor and assigns the rule. Runvisor performs some local preparations like environment settings and rule initialization commands and then invokes programs specified in the rule. Runvisor evaluates rule finalization commands when all tasks are finished and if there were no errors detected. Rule initialization and finalization commands may contain data storage renaming and mapping operations, local assignments, and new rule creation actions. Finally, runvisor marks the rule as completed in the global space.

The second part of the system is the storage subsystem. It is obvious that such subsystem must satisfy to quite specific requirements. We investigated available storage projects, including distributed databases and file systems as well as distributed hash tables and came to a conclusion that at the moment there is no system exists which conforms to all of our requirements. So currently we are in the progress of developing distributed storage systems capable to:

  •  Process small (from 1 byte) and large data items (up to gigabytes);
  •  Process large amount of data items (billions);
  •  Work in distributed mode;
  •  Work in multi-user/namespace mode;
  •  Provide smart caching strategies;
  •  Provide efficient search functionality by name and by mask;
  •  Hot-swap storage nodes addition and removal;
  •  Ability to use minimal amount of disk operations for efficient shared memory computing.

We are confident that a storage facility with the described features will be efficient for the needs of described computing system. At the current moment a partial solution is already developed – persistent single-node storage with the remote data access API.


The described method possesses a number of valuable capabilities, such as the ability to carry out computational experiments on hybrid architectures, ability to alter the amount of computing nodes during runtime, ability to support applications in the globally-distributed environment, ability to automatically create checkpoints and suspend and resume computation in the transparent to user manner, ability to use distributed data storages, and so on.

     The authors are developing the prototype of the system based on the suggested method – the RIDE project, available at www.ridehq.net. The early samples of programs show the realizability of the system and the elegance of the code for rules description. Authors believe that evolution of this project will achieve the main goal - make the process of distributed parallel programs development simple and more effective.


  1.  1. Dennis J., Data Flow Supercomputers // Computer, Vol.13, No.11, pp.48-56, 1980
  2.  Bakhterev M.O. The description of parallel computations in the terms of closures // 10Th International Workshop "Supercomputing and Mathematical Simulations", RFNC-VNIIEF, Sarov, p. 31-32, 2008.


А также другие работы, которые могут Вас заинтересовать

34993. Препринимательство 30.5 KB
  Самостоятельность в организации производства дополняется коммёрческой свободой. Она предполагает поиск новых путей развития производства. Предприниматель второго типа для достижения намеченной цели имеет во внешней среде альтернативные возможности сопоставляет их с имеющимися ресурсами и выбирает новую комбинацию факторов производства. Предприниматель берет на себя инициативу соединения и координацию факторов производства труда земли и капитала.
34994. Основные понятия собственности. Формы деловых предприятий 30.5 KB
  Основные понятия собственности. В отношении собственности выделяют юридические правовые и экономические отношения собственности. Первые характеризуют отношения субъектов собственности собственников к объектам собственности имуществу. Экономические отношения собственности возникают только там где собственность служит источником дохода.
34996. Бухгалтерские издержки 30.5 KB
  Бухгалтерские издержки определяются фактическими затратами на заработную плату рабочих и служащих расходами на сырье и материалы величиной амортизационных отчислений по капитальному оборудованию и т. Бухгалтерские издержки лежат в основе определения бухгалтерской фактической прибыли предприятия которая определяется как общая выручка минус бухгалтерские издержки. Инструментами экономического анализа выступают так называемые экономические издержки. Экономические издержки фирмы это издержки рассчитанные с учетом альтернативного...
34998. Закон убывающей производительности (отдачи) 29 KB
  Дело в том что в краткосрочном периоде когда технологический процесс остается неизменным а величина хотя бы одного фактора является фиксированной неизменной неизбежно наступает такой момент когда каждая новая вовлекаемая в производство единица переменного фактора будет обеспечивать меньшее увеличение выпуска продукции чем предыдущая. Но по закону убывающей предельной производительности последовательное увеличение переменного ресурса при неизменности других ведет к убывающей отдаче данного фактора то есть к снижению предельного...
34999. Издержки производства в долгосрочном периоде. Эффекты мас 32 KB
  Хозяйственная практика показывает что на каждой ступени расширения производственных мощностей и увеличения объемов производства происходит постепенное снижение издержек производства на единицу продукции. Эта закономерность проявляющаяся слабее или сильнее практически в любом виде производства объясняется действием так называемых эффектов масштаба. Если долговременные средние издержки падают с ростом выпуска говорят что фирма имеет экономию обусловленную ростом масштабов производства.
35000. Принцип формирования и виды доходов населения 32 KB
  Признаются равно справедливыми и приемлемыми и высокие доходы тех кто преуспел в конкуренции и низкие доходы а то и отсутствие таковых тех кто потерпел неудачу. Доходы населения принято классифицировать в соответствии с разными признаками. доходы за вычетом налогов и взносов. Номинальные доходы сумма денег полученная человеком за определенный период времени.
35001. Проблема неравенства доходов. Кривая Лоренца 32 KB
  На потребительском рынке это неравенство возможностей проявляется в неравной платежеспособности покупателей в основе которой лежит неравенство доходов. Очевидно что при равном распределении доходов какими бы благими намерениями оно ни оправдывалось в обществе не будут производить предметы роскоши ибо их некому будет купить. И наоборот в обществе с неравным распределением доходов выпускаемая продукция и оказываемые услуги будут значительнее разнообразнее а структура потребления разных доходных групп будет существенно различаться.