
TAG CLOUD FOR THE INFORMATION DATA FILTRATION

Scientific article

Informatics, cybernetics and programming

The application of the theory of rough sets is considered to solve the problems of visualization and processing of data. The theory of rough sets can be considered to be one of the ways of developing Frege's idea of uncertainty. In this approach uncertainty is defined through the boundary of a set. If our knowledge is not enough for a strict definition of a set, then its boundary is not null; otherwise the set is standard.

English

2015-02-02


TAG CLOUD FOR THE INFORMATION DATA FILTRATION

D.V. Manakov, R.O. Sudarikov

IMM UrB RAS, UrFU, Ekaterinburg

Parallel data filtration, together with parallel rendering, is widely used to reduce the volume of visualized data.

Efficiency is the key question in parallel computing. The essential point is when to use parallel rendering and when to use data filtration to get the best results in visualizing large amounts of data. Rendering produces raster data, which limits the available methods of interaction with the visualized objects. Decreasing the amount of data helps to speed up interaction with the computational model.

Quick selection of the information of interest is the key feature of filtration. Different methods can be used to solve this problem, for example, data restructuring with k-trees and item processing in a data flow model. In a software implementation, hashes are an easy and natural choice.

This work proposes a context tag cloud, a metaphor of visualization and interaction, as a solution to that problem. The context tag cloud is designed for efficient interpretation of Internet search results. Interaction between selected and filtered data is implemented through hashes.

A correct solution to the efficiency problem can only be based on a formal model. “It is nearly impossible to show the fundamental laws or variational principles of an object in order to create its model. One of the most useful approaches for such an object is applying the analogs of studied objects.” [1]

Visualization as an object of study is poorly formalized, but one can set out a semi-model, or a basis for a theory of visualization, from which the required assessment of efficiency can be built. This can start with definitions and a search for analogs.

Optimal control can serve as one such analog. Filtration can then be defined as an interactive process whose aim is to show the maximum amount of data at minimal cost. In the general case, filtration is a solution to the problem of minimizing the cognitive distance, which measures the user's effort in translating the actions of data input and visualization into the operations and objects of the application area. The MapReduce algorithm is used for the search. If the type of mapping is chosen in the same way, the cognitive distance may decrease.
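As an illustration of the MapReduce pattern mentioned above, the following minimal single-process sketch counts word occurrences in a few text snippets. The method names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are assumptions made for this example; they are not taken from the described system.

```ruby
# Minimal single-process sketch of the MapReduce pattern (word count).
# All names and the sample data are hypothetical.

# Map: emit a [word, 1] pair for every word in a document.
def map_phase(text)
  text.downcase.scan(/[a-z]+/).map { |word| [word, 1] }
end

# Shuffle: group the emitted values by key.
def shuffle(pairs)
  pairs.group_by(&:first).transform_values { |group| group.map(&:last) }
end

# Reduce: sum the values collected for each key.
def reduce_phase(grouped)
  grouped.transform_values(&:sum)
end

def word_count(documents)
  reduce_phase(shuffle(documents.flat_map { |doc| map_phase(doc) }))
end

docs = ["rough sets and tag clouds", "tag clouds filter data"]
counts = word_count(docs)
puts counts["tag"]
```

In a distributed setting the map and reduce phases would run on separate workers; here they are ordinary method calls, which is enough to show the data flow.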

The application of topological analysis to visualization is demonstrated by Choudhury et al. [2], where a program trace is mapped to a cloud of points.

The theory of rough sets is considered here as a means of solving the problems of data visualization and processing. The theory of rough sets can be regarded as one way of developing Frege's idea of uncertainty. In this approach uncertainty is defined through the boundary of a set. If our knowledge is not sufficient for a strict definition of a set, then its boundary is non-empty; otherwise the set is crisp (an ordinary set). A boundary, as a topological concept, is the difference between the closure and the interior of a set. Granularity is also a key concept of this theory. A set is defined as an aggregation of elementary items; in visualization, for example, these are the graphic primitives.
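The boundary described above can be made concrete with a small sketch. It assumes that the available knowledge is given as a partition of the universe into granules, so that the interior corresponds to the lower approximation and the closure to the upper approximation. The method names and the sample sets are hypothetical.

```ruby
require "set"

# Sketch of rough-set approximations over a partition into granules.
# `granules` and `target` are hypothetical sample data.

# Lower approximation: union of granules entirely contained in the target.
def lower_approximation(granules, target)
  granules.select { |g| g.subset?(target) }.reduce(Set.new, :|)
end

# Upper approximation: union of granules that intersect the target.
def upper_approximation(granules, target)
  granules.select { |g| g.intersect?(target) }.reduce(Set.new, :|)
end

# Boundary: items that cannot be classified with the available knowledge.
def boundary(granules, target)
  upper_approximation(granules, target) - lower_approximation(granules, target)
end

granules = [Set[1, 2], Set[3, 4], Set[5, 6]]
target   = Set[1, 2, 3]

puts boundary(granules, target).to_a.inspect
```

With these sample sets the granule {3, 4} only partly overlaps the target, so it forms the boundary; a target made up of whole granules would have an empty boundary.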

Declaring a membership function for a set can be considered a method of ordering that set. The parallel with hashes seems obvious. Using the membership function of a rough set as a metric for assessing efficiency is of great interest. A rough set X can be considered to have a membership function $\mu_X(x) \in [0, 1]$ for each of its items. This function gives the probability of membership, unlike classical set theory, where an element either is or is not a member of a set. A tag cloud can be considered a set of words ordered by a membership function computed from word frequencies.
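The view of a tag cloud as a set of words ordered by a membership function can be sketched as follows. Normalizing each frequency by the highest frequency is only one simple way of obtaining values in [0, 1] and is an assumption of this example, as are the method names.

```ruby
# Sketch: a tag cloud as words ordered by a frequency-based membership value.
# Normalizing by the highest frequency is an assumption of this example.

def word_frequencies(text)
  text.downcase.scan(/[a-z]+/).each_with_object(Hash.new(0)) do |word, freq|
    freq[word] += 1
  end
end

# Membership value in [0, 1]: frequency relative to the most frequent word.
def membership(frequencies)
  max = frequencies.values.max.to_f
  frequencies.transform_values { |count| count / max }
end

# Tag cloud: [word, membership] pairs, highest membership first.
def tag_cloud(text)
  membership(word_frequencies(text)).sort_by { |word, mu| [-mu, word] }
end

cloud = tag_cloud("rough sets rough data rough sets")
cloud.each { |word, mu| puts format("%-6s %.2f", word, mu) }
```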

The problem of defining an algorithm is equivalent to the problem of defining a computable function. A theorem on the equivalence of computable functions and membership functions can be formulated.

The metaphor of the context tag cloud extends the standard tag cloud by defining several membership functions for the set of search results. The rough set X, which contains the search results, can be represented as an array of hashes whose central element is the search string. Two membership functions are declared: the width of the context, which gives the number of words displayed to the right and to the left of the search string, and the frequency of occurrence of a word, which depends on the document type (doc, pdf or html) and is shown with an RGB color gradation. The search results also include the title and hyperlink of the document, which can be brought up by interacting with a particular word (a key).
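The two membership functions can be illustrated with a short sketch. It builds the context as an array of hashes centered on the search string and maps a relative frequency to a color. The field names (`:word`, `:offset`) and the assignment of one RGB channel per document type are assumptions of this example; the source does not specify the actual color scheme.

```ruby
# Sketch of a context tag cloud entry. Field names and the color mapping
# (one RGB channel per document type) are hypothetical.

# Channel index per document type: doc -> red, pdf -> green, html -> blue.
CHANNEL = { "doc" => 0, "pdf" => 1, "html" => 2 }.freeze

# Array of hashes with up to `width` words on each side of the search string;
# the element with offset 0 is the search string itself.
def context(words, query, width)
  center = words.index(query)
  return [] if center.nil?

  first = [center - width, 0].max
  last  = [center + width, words.size - 1].min
  (first..last).map { |i| { word: words[i], offset: i - center } }
end

# Map a relative frequency in [0, 1] to an RGB triple for the document type.
def color(frequency, doc_type)
  rgb = [0, 0, 0]
  rgb[CHANNEL.fetch(doc_type)] = (frequency.clamp(0.0, 1.0) * 255).round
  rgb
end

words = %w[theory of rough sets in data visualization]
puts context(words, "sets", 2).inspect
puts color(1.0, "pdf").inspect
```

Widening the context only changes `width`; the words and their offsets are read from the same array, so no new search is needed.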

The program is implemented in Ruby, and its architecture follows the cloud computing model. An intermediary program (proxy) gets search results through the Google API and restructures the data into hashes. Interaction on the client is based on those hashes, so the amount of recomputation is reduced to a minimum. The next step of this work is to integrate the implemented system with the Hadoop distributed computing system.
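The restructuring step might look roughly like the sketch below, which turns a list of search results into a hash keyed by word so that selecting a word on the client is a single lookup. The result fields (`:title`, `:link`, `:snippet`), the method name `build_index` and the sample data are hypothetical, and no real search API is called.

```ruby
# Sketch of the restructuring step: search results become a hash keyed by
# word. Field names and sample data are hypothetical; no search API is used.

def build_index(results)
  index = Hash.new { |hash, word| hash[word] = [] }
  results.each do |result|
    # Each distinct word of the snippet points back to its document.
    result[:snippet].downcase.scan(/[a-z]+/).uniq.each do |word|
      index[word] << { title: result[:title], link: result[:link] }
    end
  end
  index
end

results = [
  { title: "Rough sets", link: "http://example.org/a", snippet: "Rough sets and data" },
  { title: "Tag clouds", link: "http://example.org/b", snippet: "Tag clouds filter data" }
]

index = build_index(results)
index.fetch("data", []).each { |doc| puts "#{doc[:title]}: #{doc[:link]}" }
```

Using `fetch` with a default for lookups avoids creating empty entries for words that never occurred in the results.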

References

1. Samarskiy A.A., Mihajlov A.P. Mathematical Modeling: Ideas. Methods. Examples. Moscow: PhysMathGiz, 1993 (in Russian).

2. Choudhury A.N.M.I., Wang B., Rosen P., Pascucci V. Topological analysis and visualization of cyclical behavior in memory reference traces // 2012 IEEE Pacific Visualization Symposium (PacificVis), pp. 9-16, Feb. 28 - March 2, 2012.


 
