Technical feasibility

SECTION 1: Testing

 

1

Has the R&D result been tested?

YES

 

 

1a

In what mode has the result been tested?

                Prototype

                Pilot Application

                Alpha/BETA testing

 

The Web as a graph can be embedded in a low-dimensional space where its geometry can be visualized and studied in order to mine interesting patterns such as Web communities. Existing algorithms operate on small-to-medium-scale graphs; we therefore propose Mani-Web, a close-to-linear-time algorithm suitable for large-scale graphs. Its result is similar to that produced by the manifold-learning technique Laplacian Eigenmap, as tested on artificial manifolds and real Web graphs. Mani-Web can also be used as a general-purpose manifold-learning/dimensionality-reduction technique as long as the data can be represented as a graph. For experimentation purposes the research team constructed low-density random graphs with |V| ranging from 2^2 up to 2^22.
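For reference, the exact Laplacian Eigenmap that Mani-Web approximates can be sketched in a few lines. This is a minimal illustration, not Mani-Web itself: the dense eigendecomposition below is exactly the cubic-cost baseline the algorithm is designed to replace, and the ring graph and target dimension are illustrative choices.

```python
import numpy as np

def laplacian_eigenmap(adj, dim=2):
    """Embed a graph (dense adjacency matrix) into `dim` dimensions
    using the classical Laplacian Eigenmap construction."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj          # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue 0) and take
    # the next `dim` eigenvectors as coordinates.
    return vecs[:, 1:dim + 1]

# A 6-node ring graph: the embedding places the nodes on a circle.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

coords = laplacian_eigenmap(adj, dim=2)
print(coords.shape)  # (6, 2)
```

The dense `eigh` call costs O(|V|^3) time and O(|V|^2) memory, which is why the exact method stops scaling long before Web-sized graphs.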

 

1b

Please describe and discuss the testing results

For testing, the team crawled two Web sites, National Geographic with |V| = 58347 and Wikipedia on DVD with |V| = 19488, and ran Mani-Web with Tolerance = 0.01.

The results demonstrate the true strength of Mani-Web (Tolerance = 0.01): close-to-linear complexity even for very large graphs (4 million nodes). By contrast, the original Laplacian Eigenmap managed to run only for at most |V| = 32768, because of both the time and the space complexity of the Matlab eigs implementation.
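The scaling limit reported above can be illustrated by timing the exact eigendecomposition on growing low-density random graphs, similar in spirit to the team's synthetic benchmarks. This sketch does not model Mani-Web or its Tolerance parameter; sizes are kept tiny so it runs in moments, but the same measurement loop exposes the superlinear growth that caps the exact method.

```python
import time
import numpy as np

def random_graph_laplacian(n, avg_deg=4, seed=0):
    """Laplacian of a low-density random graph, ~avg_deg edges per node."""
    rng = np.random.default_rng(seed)
    adj = np.zeros((n, n))
    m = n * avg_deg // 2
    rows = rng.integers(0, n, size=m)
    cols = rng.integers(0, n, size=m)
    adj[rows, cols] = adj[cols, rows] = 1.0
    np.fill_diagonal(adj, 0.0)        # no self-loops
    return np.diag(adj.sum(axis=1)) - adj

timings = {}
for n in (64, 128, 256):
    lap = random_graph_laplacian(n)
    t0 = time.perf_counter()
    np.linalg.eigh(lap)               # the exact, cubic-cost step
    timings[n] = time.perf_counter() - t0

print(timings)  # wall-clock times typically grow far faster than n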

 

 

SECTION 2: Current Stage of Development

 

2a

To what extent does the development team have the technical resources to support the production of a new product? (Researchers, human resources, hardware, etc.)

 

The OSWINDS group has carried out work on Web data mining with emphasis on Web 2.0 social networks, such as social tagging systems, studying:
(i) the dynamics driving the generation and popularity of content in social networks;
(ii) the representation and storage of relationships between entities, such as users, resources, and metadata, with appropriate models (graphs, communities, etc.);
(iii) detection algorithms for uncovering implicit groups of entities (users, tags, or combinations of entities) with similar characteristics or behaviors (e.g. tags describing the same concepts, users sharing common interests);
(iv) the identification of user behavioral and emotional patterns emerging in a social networking application (in collaboration with colleagues from the Dept. of Psychology in Crete).
Recent efforts have focused on mining social applications, in particular the evolving information streams that emerge from microblogging activity.
Microblogging offers an obvious ground for expressing opinions and views. Twitter has become a hugely popular application, and its relationships (followers, referencing, etc.) embed human behavioral and emotional actions. The real-time, anytime-anywhere posting of information offers a wide range of opportunities for detecting trends, analyzing opinions and emotions, and ultimately capturing the so-called "wisdom of the crowds".
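The trend-detection opportunity described above can be sketched with a minimal frequency-jump detector: flag hashtags whose count in a recent window of posts spikes relative to an older baseline window. The posts, threshold, and tokenization are purely illustrative, not the group's actual method.

```python
from collections import Counter

def trending_terms(old_posts, new_posts, min_ratio=3.0, min_count=2):
    """Hashtags whose recent frequency jumps past the baseline window."""
    old = Counter(w for p in old_posts for w in p.lower().split() if w.startswith("#"))
    new = Counter(w for p in new_posts for w in p.lower().split() if w.startswith("#"))
    return sorted(
        tag for tag, c in new.items()
        if c >= min_count and c / (old.get(tag, 0) + 1) >= min_ratio
    )

baseline = ["enjoying the weather #sunny", "lunch break #food"]
recent = ["#election debate tonight", "watching the #election results",
          "who won the #election ?", "coffee time #food"]

print(trending_terms(baseline, recent))  # ['#election']
```

Real systems would add sliding windows over timestamps and smoothing, but the ratio-against-baseline idea is the core of simple burst detection.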

 

2b

What are the technical issues that need to be tackled for full deployment, if needed?

Many real-world domains can be represented as large node-link graphs: backbone Internet routers connect with 70,000 other hosts, mid-sized Web servers handle between 20,000 and 200,000 hyperlinked documents, and dictionaries contain millions of words defined in terms of each other. Computational manipulation of such large graphs is common, but previous tools for graph visualization have been limited to datasets of a few thousand nodes. Visual depictions of graphs and networks are external representations that exploit human visual processing to reduce the cognitive load of many tasks that require understanding of global or local structure. The two key advantages of computer-based information visualization over traditional paper-based visual exposition are interactivity and scalability, and designing visualization software around the characteristics of a target user's task domain leads to systems that are more effective and scale to larger datasets than previous work.

Mani-Web could be exploited by almost every topic relevant to dimensionality reduction, graph embedding, data mining, and visualization, since wherever Laplacian Eigenmap is suitable, Mani-Web can be applied too with appropriate tuning. For full deployment, a visualization tool should be constructed together with a tuning tool. Full deployment also involves system scalability: previous graph drawing systems fall far short of many large real-world datasets, and recent systems only begin to close this gap by aiming at datasets ranging from thousands to over one hundred thousand nodes. Very small graphs can be laid out and drawn by hand, but automatic layout and drawing by a computer program scale to much larger graphs and provide the possibility of fluid interaction with the resulting drawings.

The goal of these automatic graph layout systems is to help humans understand the graph structure, as opposed to some other context such as VLSI layout. Researchers have begun to codify aesthetic criteria of helpful drawings, such as minimizing edge crossings and emphasizing symmetry.
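The automatic layout idea can be made concrete with a minimal force-directed (spring) layout in the spirit of Fruchterman-Reingold: edges attract, all node pairs repel. This naive O(|V|^2) loop is only a teaching sketch with illustrative constants; production tools add spatial indexing (or embeddings like Mani-Web's) to reach the large datasets discussed above.

```python
import numpy as np

def spring_layout(edges, n, iters=200, k=0.5, step=0.05, seed=0):
    """Minimal force-directed layout: O(n^2) repulsion + edge attraction."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, size=(n, 2))
    for _ in range(iters):
        delta = pos[:, None, :] - pos[None, :, :]       # pairwise offsets
        dist = np.linalg.norm(delta, axis=-1) + 1e-9
        # Repulsion of magnitude k^2/d between every pair
        # (self-terms vanish because delta is the zero vector).
        disp = ((k * k / dist**2)[:, :, None] * delta).sum(axis=1)
        for a, b in edges:                               # attraction d^2/k
            d = pos[a] - pos[b]
            pull = (np.linalg.norm(d) / k) * d
            disp[a] -= pull
            disp[b] += pull
        pos += step * np.clip(disp, -1.0, 1.0)           # cap moves for stability
    return pos

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]                 # a 4-cycle
pos = spring_layout(edges, n=4)
print(pos.shape)  # (4, 2)
```

On the 4-cycle the nodes settle into a roughly square arrangement, which is the kind of symmetry-revealing drawing the aesthetic criteria above aim for.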

 

 

2c

What additional technical resources are needed for the production of this new product?

 

The technical resources required are mainly Java programmers to implement the visualization tool with the following options:

  • rendering at a guaranteed framerate regardless of graph size
  • coloring nodes and links with a fixed color, or by RGB values stored in attributes
  • labelling nodes
  • picking nodes to examine attribute values
  • displaying a subset of nodes or links based on a user-supplied boolean attribute
  • interactive pruning of the graph to temporarily reduce clutter and occlusion
  • zooming in and out
  • more options for coloring objects (such as with a perceptually uniform color scale)
  • filtering and other interactive processing
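Two of the listed options, attribute-stored RGB colors and subset display via a user-supplied boolean attribute, can be sketched independently of the eventual Java tool. The node names and attribute keys below are hypothetical placeholders, not part of Mani-Web.

```python
# Nodes carry per-node attributes, as the option list above requires:
# an RGB triple for coloring and a boolean flag for subset display.
nodes = {
    "home":    {"rgb": (255, 0, 0), "visible": True},
    "about":   {"rgb": (0, 255, 0), "visible": True},
    "archive": {"rgb": (0, 0, 255), "visible": False},
}

def visible_nodes(nodes, flag="visible"):
    """Display subset driven by a user-supplied boolean attribute."""
    return sorted(name for name, attrs in nodes.items() if attrs[flag])

def node_color(nodes, name, default=(128, 128, 128)):
    """Per-node color from an RGB attribute, with a fixed fallback color."""
    return nodes[name].get("rgb", default)

print(visible_nodes(nodes))       # ['about', 'home']
print(node_color(nodes, "home"))  # (255, 0, 0)
```

The same attribute-lookup pattern extends naturally to interactive pruning and picking: each interaction is just a predicate or lookup over the node attribute table.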

 

 

2d

Overall assessment of the current stage of technical development.

Many classes of image data span a low-dimensional nonlinear space embedded in the natural high-dimensional image space. We adopt and generalize a recently proposed dimensionality-reduction method for computing approximate regularized Laplacian eigenmaps on large data sets and examine for the first time its application in a variety of image analysis examples. These experiments demonstrate the potential of regularized Laplacian eigenmaps in developing new learning algorithms and improving the performance of existing systems. The suggested general-purpose manifold-learning technique, called Mani-Web (http://oswinds.csd.auth.gr/maniweb), is a linear-time approximation of the Laplacian Eigenmap. The team has developed this linear-time approximation, benchmarked the scalability and correctness of the algorithm experimentally on artificial manifolds and real Web graphs, and provided preliminary evidence that Mani-Web could be applied in the context of a Content Distribution Network to improve latency and content management.
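Mani-Web's internals are not specified in this report, so the following is emphatically NOT its algorithm; it only illustrates the general family of cheap eigenmap approximations: solve the eigenproblem exactly on a small connected "landmark" subgraph, then place the remaining nodes from their landmark neighbors' coordinates. The core/leaf graph is an illustrative construction.

```python
import numpy as np

def eigenmap(adj, dim=2):
    """Exact Laplacian Eigenmap on a small dense adjacency matrix."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1:dim + 1]                # skip the constant eigenvector

# Core: a 6-node cycle of landmarks. Periphery: one leaf per core node.
core_n = 6
core_adj = np.zeros((core_n, core_n))
for i in range(core_n):
    core_adj[i, (i + 1) % core_n] = core_adj[(i + 1) % core_n, i] = 1.0

core_coords = eigenmap(core_adj)             # exact solve, but on 6 nodes only
# Each leaf i hangs off core node i, so its single landmark neighbor
# determines its position (in general: average over landmark neighbors).
leaf_coords = core_coords.copy()
all_coords = np.vstack([core_coords, leaf_coords])

print(all_coords.shape)  # (12, 2): all nodes embedded, eigensolve on half
```

The expensive step stays fixed at the landmark size while the extension is one pass over the remaining edges, which is the structural reason such schemes can approach linear time.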

 

 

SECTION 3: Deployment

3a

Define the demands for large scale production in terms of

·       Materials

There are no plans for mass production, since Mani-Web is accessed online via the Web.

·       Technologies, tools, machinery

 

·       Staff effort

 

 

SECTION 4: Overall Assessment

1

What is your overall assessment of the technical feasibility of the research result?

 

The notion that many naturally high-dimensional data sets cluster around a low-dimensional curved subspace or manifold is now well established in the machine-learning community. Bringing together, in the context of the Web, manifold-learning theory, graph embedding, and data reduction, we proposed Mani-Web as an efficient algorithm for embedding large-scale graphs within a low-dimensional coordinate space. Mani-Web produces a map faithful to the Laplacian Eigenmap but with close-to-linear time complexity and acceptable error.

KEYWORDS QUANTITATIVE ASSESSMENT (0-5). Please put X as appropriate.

                                                                    1   2   3   4   5
Adequacy of testing activity undertaken so far                                      X
Adequacy and availability of technical resources of the team                        X
Current development stage                                                           X
Overall technical feasibility                                                       X

 

 




© 2009-2010 INTERVALUE Project