The Web is an enormous information domain whose physical structure is best described as a graph: distinct pages are the nodes and the hyper-links between them are the edges. This graph is readily available for study through Web link mining, whose outcomes are expected to improve the overall experience for users, save resources for Web site owners, and provide a better understanding of the Web. Given a large-scale graph, we should be able to mine interesting patterns, if they exist, and represent them in a way that allows them to be exploited. A (Web) community can be considered a coherent cluster of pages with significantly more hyper-links pointing among the pages of the community itself than to the rest of the graph. This structural attribute emerges from the fact that pages on a common subject often reference one another as related sources. A user who enters a Web community is “trapped” in the sense that the probability of visiting a page outside the community is lower, because fewer hyper-links lead out of it.
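To make the community definition concrete, the following minimal sketch counts the hyper-links that stay inside a candidate community and those that leave it. The adjacency-list representation, the function names, and the toy graph are illustrative assumptions; the fraction of leaving links is only a crude proxy for the probability that a user exits the community, not a definition taken from the text.

```python
def link_counts(graph, community):
    """Count hyper-links that stay inside `community` and those that leave it.

    `graph` maps each page to the list of pages it links to (assumed format).
    """
    internal = external = 0
    for page in community:
        for target in graph.get(page, ()):
            if target in community:
                internal += 1
            else:
                external += 1
    return internal, external


def leaving_fraction(graph, community):
    """Fraction of the community's out-links that point outside it.

    Used here as a rough stand-in for the chance of leaving the community
    when a link is followed at random (an illustrative assumption).
    """
    internal, external = link_counts(graph, community)
    total = internal + external
    return external / total if total else 0.0


# Hypothetical toy graph: pages a, b, c link densely to one another,
# and only one link (c -> x) leads out of the cluster.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "x"],
    "x": ["y"],
    "y": ["x"],
}
toy_community = {"a", "b", "c"}

counts = link_counts(toy_graph, toy_community)
fraction = leaving_fraction(toy_graph, toy_community)
```

A low leaving fraction is what the “trapped” intuition refers to: most links followed from inside the cluster lead back into it.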
A manifold is a topological space that exhibits flat, conventional geometry only locally. On a global scale, it demonstrates profound structural hyper-organization. In the space of a manifold, conventional Euclidean distances are less important because they do not capture the manifold’s geometry. To perform a walk on the manifold, a graph is formed that connects nearby points in the space. This graph encodes local relations, essentially building a graph of the data. The graph itself limits the ways in which data points can be reached from one another, since “communication” is possible only through the available edges.
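The idea of walking on a graph of nearby points can be sketched as follows. This is a minimal illustration assuming a symmetric k-nearest-neighbour rule for "nearby"; the text does not state which neighbourhood rule is used, and the function names and the sample arc are hypothetical.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Link every point to its k nearest neighbours (edges are undirected)."""
    adjacency = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def hop_distance(adjacency, source, target):
    """Breadth-first search: number of edges on a shortest walk, or None."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Ten points on three quarters of a unit circle, 30 degrees apart (toy data).
arc = [(math.cos(math.radians(30 * i)), math.sin(math.radians(30 * i)))
       for i in range(10)]
graph = knn_graph(arc, k=2)

straight_line = math.dist(arc[0], arc[9])   # Euclidean shortcut across the gap
along_graph = hop_distance(graph, 0, 9)     # walk restricted to graph edges
```

On this toy arc the two end points are about 1.41 apart in a straight line, yet a walk between them needs seven hops, because the graph offers no edge across the gap. This is the sense in which the graph, rather than the Euclidean distance, captures the geometry.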
Applications (existing / potential)
Mani-Web could be exploited in almost every topic relevant to dimensionality reduction, graph embedding, data mining, and visualization. This is true because whenever Laplacian Eigenmap is suitable, Mani-Web can be applied too, with appropriate tuning. On the Web scene, we identify two major target groups that could benefit from Mani-Web, a) the Web site owners and b) the users accessing the Web site, in the following areas:
• Web site administration. For instance, a Web site that contains thematic categories and embeds community organization could be more easily navigated by users, since once a user reaches a page of interest, relevant pages can be easily accessed (due to the community’s increased linkage density). In a Web site that lacks communities, the inferior user experience can be easily handled by the administrators, who could use the Mani-Web maps to tune usage mining.
• Caching and prefetching. Once a 2D display is produced, the administrator can select the revealed communities with a “lasso” tool, making them available as outsourcing units for the CDN. The selected communities are also suitable for caching and prefetching, since they can predict users’ navigation thanks to their dense linkage and the fact that they deal with coherent topics. The communities may therefore reduce latency significantly if they are placed in a CDN or in a traditional proxy server.
• Information retrieval. The reduced graph produced by Mani-Web can be exploited as an index structure over the original full data representation graph. In the context of search engines, an algorithm could use the Mani-Web reduced graph to speed up the initial filtering stage of a query; at the refinement stage, it would have to partially retrieve more nodes according to the distributed flow.
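As an illustration of the “lasso” selection mentioned under caching and prefetching, the sketch below picks the pages whose 2D map positions fall inside a polygon drawn by the administrator. The ray-casting test, the function names, and the toy coordinates are assumptions made for this example; the text does not describe how the tool is implemented.

```python
def inside_lasso(point, polygon):
    """Ray-casting point-in-polygon test.

    `polygon` is a list of (x, y) vertices in drawing order. A horizontal
    ray is cast to the right of `point`; an odd number of edge crossings
    means the point lies inside.
    """
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_pages(map_positions, polygon):
    """Return the pages whose map position lies inside the lasso polygon."""
    return {page for page, position in map_positions.items()
            if inside_lasso(position, polygon)}


# Hypothetical 2D map positions and a square lasso around one cluster.
toy_map = {
    "home": (0.0, 0.0),
    "news": (0.1, 0.0),
    "shop": (2.0, 2.0),
    "cart": (2.1, 2.0),
}
lasso = [(1.5, 1.5), (2.5, 1.5), (2.5, 2.5), (1.5, 2.5)]

prefetch_unit = select_pages(toy_map, lasso)
```

The selected set could then be handed to a CDN or proxy as a single unit to cache or prefetch.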
Another use of Mani-Web could be in interactive recommendation engines: given an initial user choice, the system can quickly examine the position of this choice in the Mani-Web map and then provide a set of relevant choices ranked by the flow.
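A recommendation lookup of this kind might be sketched as below. Since the flow-based ranking is not specified here, the example ranks candidates by plain Euclidean distance in the 2D map as a simple stand-in; that substitution, the function name, and the toy coordinates are all assumptions of this sketch.

```python
import math


def recommend(map_positions, chosen, top_n=3):
    """Return the `top_n` pages closest to `chosen` in the 2D map.

    Distance in the map is a placeholder for the flow-based ranking,
    which is not defined in this section.
    """
    origin = map_positions[chosen]
    candidates = [page for page in map_positions if page != chosen]
    candidates.sort(key=lambda page: math.dist(origin, map_positions[page]))
    return candidates[:top_n]


# Hypothetical map positions: two loose clusters of pages.
toy_map = {
    "home": (0.0, 0.0),
    "news": (0.1, 0.0),
    "sports": (0.0, 0.3),
    "shop": (2.0, 2.0),
    "cart": (2.1, 2.0),
}

suggestions = recommend(toy_map, "home", top_n=2)
```

Because the map is precomputed, each lookup only needs the stored coordinates, which is what makes the interactive use plausible.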