TAG CLOUD FOR THE INFORMATION DATA FILTRATION

Scientific article

Informatics, cybernetics and programming

The application of the theory of rough sets is considered to solve the problems of visualization and processing of data. The theory of rough sets can be considered to be one of the ways of developing Frege's idea of uncertainty. In this approach uncertainty is defined through the boundary of a set. If our knowledge is not enough for a strict definition of a set, then its boundary is not null; otherwise the set is standard.

English

2015-02-02


D.V. Manakov, R.O. Sudarikov

IMM UrB RAS, UrFU, Ekaterinburg

Parallel data filtration, combined with parallel rendering, is widely used to reduce the volume of visualized data.

Efficiency is the key question in parallel computing. The essential issue is when to use parallel rendering and when to use data filtration to get the best results when visualizing large amounts of data. Rendering produces raster data, which reduces the available methods of interaction with the visualized objects. Decreasing the amount of data helps to speed up interaction with the computational model.

Quick selection of the information of interest is the key feature of filtration. Different methods can be used to solve this problem, for example, data restructuring with k-trees or item processing in a data flow model. In a software implementation, hashes are the obvious and easy choice.
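The role of hashes in quick selection can be illustrated with a minimal Ruby sketch. The records, the field names `:id` and `:tag`, and the helper `filter_by_tag` are hypothetical and serve only as an illustration; they are not taken from the implemented system.

```ruby
# Hypothetical records; the field names (:id, :tag) are illustrative assumptions.
records = [
  { id: 1, tag: "mesh" },
  { id: 2, tag: "field" },
  { id: 3, tag: "mesh" },
  { id: 4, tag: "trace" }
]

# Build the index once: tag => array of records carrying that tag.
index = records.group_by { |r| r[:tag] }

# Filtration is then a single hash lookup instead of a scan over all records.
# An unknown tag yields an empty selection.
def filter_by_tag(index, tag)
  index.fetch(tag, [])
end

selected = filter_by_tag(index, "mesh")
puts selected.map { |r| r[:id] }.inspect   # prints [1, 3]
```

The index is built in one pass over the data, after which each selection costs a hash lookup on average, independent of the number of records.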

This work proposes a context tag cloud, a metaphor of visualization and interaction, as a solution to that problem. The context tag cloud is designed for efficient interpretation of Internet search results. Interaction between selected and filtered data is implemented through hashes.

The correct solution of the efficiency problem can only be based on a formal model. “It is nearly impossible to show the fundamental laws or variational principles of an object in order to create its model. One of the most useful approaches for such an object is applying the analogs of studied objects.” [1]

Visualization as an object of study is poorly formalized, but one can declare a semi-model, or a basis for a visualization theory, in order to construct the required efficiency assessment. The declaration can start with definitions and a search for analogs.

Optimal control can serve as one such analog. Filtration can then be defined as an interactive process whose aim is to show the maximum amount of data at minimal cost. In the general case, filtration is a solution to the problem of minimizing the cognitive distance, which reflects the effort a user spends converting data-input actions and the visualization of the data into the operations and objects of the application area. The MapReduce algorithm is used for the search. If the type of mapping is chosen in the same way, the cognitive distance may decrease.
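The MapReduce scheme mentioned here can be sketched in plain Ruby as three phases running in a single process. The snippets and the function names `map_phase`, `shuffle` and `reduce_phase` are assumptions made for illustration; the sketch does not reproduce the search procedure of the implemented system or any Hadoop API.

```ruby
# Hypothetical search snippets standing in for real search results.
snippets = [
  "rough sets and tag cloud",
  "tag cloud for data filtration",
  "rough data"
]

# Map phase: each snippet emits [word, 1] pairs.
def map_phase(snippet)
  snippet.downcase.scan(/[a-z]+/).map { |word| [word, 1] }
end

# Shuffle phase: group the emitted pairs by key (the word).
def shuffle(pairs)
  pairs.group_by { |word, _count| word }
end

# Reduce phase: sum the counts inside each group.
def reduce_phase(grouped)
  grouped.transform_values { |pairs| pairs.sum { |_word, count| count } }
end

pairs  = snippets.flat_map { |s| map_phase(s) }
counts = reduce_phase(shuffle(pairs))
puts counts["tag"]   # prints 2
```

Because the map phase treats every snippet independently, it is the part that a distributed system could run in parallel; only the shuffle needs to see all pairs.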

The application of topological analysis to visualization is demonstrated in [2], where a program trace is mapped to a cloud of points.

The application of the theory of rough sets is considered to solve the problems of visualization and processing of data. The theory of rough sets can be considered one of the ways of developing Frege's idea of uncertainty. In this approach, uncertainty is defined through the boundary of a set. If our knowledge is not sufficient for a strict definition of a set, its boundary is non-empty; otherwise the set is standard (crisp). A boundary, as a topological concept, is the difference between the closure and the interior of a set. Granularity is also a key concept of this theory. A set is defined as an aggregation of elementary items; in visualization, for example, these are the graphic primitives.
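The boundary can be made concrete with a minimal Ruby sketch that follows the usual rough set construction: the lower approximation (the analog of the interior) is the union of granules contained in the target set, and the upper approximation (the analog of the closure) is the union of granules that intersect it. The granules and the target set below are hypothetical integers chosen for illustration.

```ruby
require "set"

# Hypothetical partition of the universe into granules (elementary items).
granules = [Set[1, 2], Set[3, 4], Set[5, 6]]

# Hypothetical target set that we try to describe with these granules.
target = Set[1, 2, 3]

# Lower approximation: union of granules fully contained in the target.
def lower_approximation(granules, target)
  granules.select { |g| g.subset?(target) }.reduce(Set.new, :|)
end

# Upper approximation: union of granules that share an item with the target.
def upper_approximation(granules, target)
  granules.select { |g| g.intersect?(target) }.reduce(Set.new, :|)
end

lower    = lower_approximation(granules, target)
upper    = upper_approximation(granules, target)
boundary = upper - lower

puts boundary.to_a.sort.inspect   # prints [3, 4]
```

Here the granule {3, 4} only partly belongs to the target, so the boundary is non-empty and the set is rough. For a target such as {1, 2, 3, 4}, which is an exact union of granules, the boundary would be empty.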

Declaring a membership function for a set can be considered a method of ordering that set. The parallel with hashes seems obvious. Discussing the rough set membership function as a metric for efficiency assessment is of great interest. A rough set X can be considered to have a membership function $\mu_X(x) \in [0, 1]$ for each of its items. This function defines a probability of membership, in contrast to classical set theory, where an element either is or is not a member of a set. A tag cloud can be considered a set of words, with computed frequency features, ordered by the membership function.
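A tag cloud ordered by a membership function might look like the following Ruby sketch. The word list is hypothetical, and the choice of membership value (word frequency divided by the highest frequency) is an assumption made here so that every value falls in [0, 1]; the text above does not fix a particular formula.

```ruby
# Hypothetical word list standing in for the text of search results.
words = %w[cloud tag cloud rough tag cloud set]

# Count occurrences of each word with a hash that defaults to 0.
counts = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }

# Assumed membership function: frequency divided by the highest frequency,
# so the most frequent word gets 1.0 and all values lie in [0, 1].
max_count  = counts.values.max.to_f
membership = counts.transform_values { |c| c / max_count }

# The tag cloud is the word set ordered by descending membership
# (ties are broken alphabetically to keep the order deterministic).
tag_cloud = membership.sort_by { |word, mu| [-mu, word] }

tag_cloud.each { |word, mu| puts format("%-6s %.2f", word, mu) }
```

The hash plays both roles discussed above: it stores the frequency feature of each word and, through the membership value, induces the ordering of the set.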

The problem of defining an algorithm is equivalent to the problem of defining a computable function. A theorem on the equivalence of computable functions and membership functions can be formulated.

The context tag cloud metaphor extends the standard tag cloud by defining several membership functions on the set of search results. The rough set X, which contains the search results, can be represented as an array of hashes whose central element is the search string. Two membership functions are declared: the width of the context, which is the number of words displayed to the right and to the left of the search string, and the frequency of occurrence of a word, which depends on the document type (doc, pdf or html) and is shown with an RGB color gradation. The search results also include the title and hyperlink of the document, which can be highlighted by interacting with a particular word (a key).
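One possible reading of this structure is sketched below in Ruby. The sample documents, the field names, the helpers `context_cloud`, `channel_for` and `rgb`, the restriction to a single-word search string, and the mapping of document types to color channels are all assumptions for illustration; the actual layout used by the implemented program is not described in this paper.

```ruby
# Hypothetical search results; all field names and values are illustrative.
results = [
  { title: "Doc A", url: "http://example.com/a", type: :pdf,
    text: "the rough set theory defines a boundary" },
  { title: "Doc B", url: "http://example.com/b", type: :html,
    text: "a rough set model of data" }
]

# Build 2 * width + 1 hashes, one per offset from the search string.
# The hash at index `width` (the central element) holds the query itself.
def context_cloud(results, query, width)
  cloud = Array.new(2 * width + 1) do
    Hash.new { |h, k| h[k] = { count: 0, docs: [] } }
  end
  results.each do |doc|
    words = doc[:text].downcase.split
    words.each_index do |i|
      next unless words[i] == query
      (-width..width).each do |offset|
        j = i + offset
        next if j < 0 || j >= words.size
        entry = cloud[offset + width][words[j]]
        entry[:count] += 1
        # Keep title, hyperlink and type so a click on the word can show them.
        entry[:docs] << { title: doc[:title], url: doc[:url], type: doc[:type] }
      end
    end
  end
  cloud
end

# Assumed mapping of document type to a color channel (R, G or B).
def channel_for(type)
  { doc: 0, pdf: 1, html: 2 }.fetch(type)
end

# Frequency shown as intensity of the channel of the first document's type
# (a simplification: words found in mixed document types are not blended).
def rgb(entry, max_count)
  color = [0, 0, 0]
  color[channel_for(entry[:docs].first[:type])] =
    (255.0 * entry[:count] / max_count).round
  color
end

cloud = context_cloud(results, "set", 1)
puts cloud[1].keys.inspect              # prints ["set"]
puts rgb(cloud[0]["rough"], 2).inspect  # prints [0, 255, 0]
```

With a context width of 1 the array has three hashes: the words to the left of the search string, the search string itself, and the words to the right. Each word is a key whose value carries its frequency and the documents it came from.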

The program is implemented in Ruby, and its architecture corresponds to cloud computing. An intermediary program (proxy) gets search results through the Google API and restructures the data into hashes. Interaction on the client is based on these hashes, so the amount of recomputation is reduced to a minimum. The next step of this work is to integrate the implemented system with the Hadoop distributed computing system.

Literature

1. Samarskiy A.A., Mihajlov A.P. Mathematical Modeling: Ideas. Methods. Examples. Moscow: PhysMathGiz, 1993 (in Russian).

2. Choudhury, A.N.M.I.; Bei Wang; Rosen, P.; Pascucci, V. Topological analysis and visualization of cyclical behavior in memory reference traces // 2012 IEEE Pacific Visualization Symposium (PacificVis), pp. 9-16, Feb. 28 - Mar. 2, 2012.