|1||Please provide a short description of the state of the art and/or current trends in the field. How does the result fit into it?|
|Multimedia content available over the Internet is growing at an extraordinary pace, and its total volume is expected to reach 998 exabytes by 2010. This growth is driven by the widespread availability of digital recording devices, improved modeling tools and advanced scanning mechanisms, as well as display and rendering devices. However, the growing popularity of media has not been matched by a similarly rapid development of media search technologies. The most popular media services on the Web are typically limited to textual search, and textual keywords are not always the most efficient way to search for a specific media item. Ideally, a search engine should allow the user to submit any type of media object as a query and retrieve similar media from across the Internet.
Multimodal search and retrieval has already been addressed by numerous commercial applications. Several proprietary (“closed”) industry solutions are available today, mainly for mobile devices such as Apple’s iPhone or smartphones based on Google’s Android operating system. Because the hardware or software keyboards of mobile devices are small, alternative input modes are desirable. On Android devices, a “search by voice recognition” function is available: the spoken query is transformed into a textual query. The same functionality is offered by several iPhone applications, e.g. Bing’s, Yahoo!’s and Google’s own search applications. Google Goggles goes one step further by accepting images and GPS location as search queries. This can be used, for example, to search for text contained in images, to recognize landmarks or attractions in images, or to search for products. The returned results are themselves multimodal (maps, images, videos and, of course, text). However, these tools address only a limited number of application areas, and in some cases their retrieval accuracy is questionable.
The proposed multimodal search engine is fully in line with current trends in multimedia retrieval. Compared with the commercial products mentioned above, it is expected to cover a wider variety of application areas and to achieve higher retrieval accuracy, thanks to the cutting-edge techniques for efficient multimedia similarity matching that it will employ.
|2||What is the problem/need/knowledge gap that the research result is responding to? How was it addressed before?|
|- The user’s need to combine different query types (e.g. text, images, 3D models, sounds) when searching for the desired content.
The vast majority of search engines, whether small or large-scale, rely on textual queries. This can be inefficient, since keywords cannot always capture what the user has in mind. Moreover, as the saying goes, “a picture is worth a thousand words”: a query can often be expressed better by an image than by a set of keywords.
Recent progress in content-based multimedia search has led to numerous tools that can search for images, videos, 3D objects and sounds (environmental sounds or music) without textual queries. In these tools, the query is a multimedia item of the same type as the items to be retrieved (e.g. a query image is used to retrieve similar images, a 3D object to retrieve similar 3D objects, and so on).
Going one step further, the proposed multimodal search engine retrieves relevant multimedia using different types of media as the query, either separately or simultaneously. For example, a user looking for a 3D model of a car can submit as the query a text description, a 3D model of a similar car, an image of a similar car, the sound of a car, or all of these at once. This gives users the freedom to choose the most convenient type of query and still retrieve relevant multimedia results.
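The form does not say how the engine combines several simultaneous queries. Purely as an illustration, the sketch below assumes one common approach, weighted late fusion: each modality in the query is compared with the descriptor of the same modality stored for every database item, and the per-modality similarities are averaged with weights. The `MediaItem` schema, the toy feature vectors, the choice of cosine similarity and the default weights are all hypothetical; a real system would extract modality-specific descriptors (for images, 3D shapes, sounds, text) with dedicated algorithms.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """Hypothetical database entry: one feature vector per available modality."""
    item_id: str
    features: dict[str, list[float]] = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multimodal_search(
    query: dict[str, list[float]],
    database: list[MediaItem],
    weights: dict[str, float] | None = None,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rank items by a weighted average of per-modality similarities (late fusion).

    `query` maps modality names (e.g. "image", "sound") to feature vectors;
    any subset of modalities may be given. Missing weights default to 1.0.
    """
    weights = weights or {}
    ranked: list[tuple[str, float]] = []
    for item in database:
        score_sum, weight_sum = 0.0, 0.0
        for modality, query_vec in query.items():
            if modality not in item.features:
                continue  # this item has no descriptor for the modality
            w = weights.get(modality, 1.0)
            score_sum += w * cosine_similarity(query_vec, item.features[modality])
            weight_sum += w
        if weight_sum > 0.0:
            ranked.append((item.item_id, score_sum / weight_sum))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Toy example (hypothetical vectors): a combined image + sound query for a car.
db = [
    MediaItem("car_model", {"image": [0.9, 0.1, 0.0], "sound": [0.8, 0.2]}),
    MediaItem("chair_model", {"image": [0.0, 0.2, 0.9], "sound": [0.1, 0.9]}),
]
ranking = multimodal_search({"image": [1.0, 0.0, 0.0], "sound": [1.0, 0.0]}, db)
print(ranking)  # "car_model" is ranked first
```

Because the score is averaged only over the modalities actually supplied, the same function handles a single-modality query (e.g. only an image) and a combined one. The weights are also where fusion can go wrong: giving a weak modality the same weight as a strong one can push relevant items down the ranking, which is one way merging modalities may reduce accuracy.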
|3||What is the potential for further research?|
|Since multimodal search has only recently been introduced, it is still an open research field. Further research will focus on the following:
a) Support for a larger number of different modalities.
b) Improvement of existing multimodal retrieval approaches to achieve higher retrieval accuracy. Even when content-based search algorithms achieve high accuracy on a single modality, merging multiple modalities may degrade performance.
c) Scalability: existing methods should be able to search and retrieve multimedia even from large-scale datasets.
|4||What is the proposed method of IPR-protection? (patent, license, trademark etc.)|
|The R&D result consists of:
- novel algorithms
- the proposed multimodal search process
- software that implements this functionality
Thus, the proposed methods of IPR-protection are:
- patents for the algorithms (to the extent an algorithm can be patented in Europe),
- a patent for the multimodal process, and
- a license for the software (the exact type of the license – e.g. Freeware, Limited License, Single License, Volume Purchase Agreement etc. – will be decided in cooperation with the interested parties).
|5||What are the steps that need to be taken in order to secure the IPR-protection? What is the cost of IPR-protection?|
|- Apply for 2-3 patents for the respective algorithms at a European level (estimated cost in the order of 5,000€ per patent, www.epo.org).
- Apply for a patent for the multimodal process at a European level (estimated cost in the order of 5,000€).
- Decide which type of software license (e.g. Freeware, Limited License, Single License, Volume Purchase Agreement etc.) will be used (no cost).
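Summing the estimates above gives a rough total for the planned filings. This only multiplies out the per-patent figure quoted in this answer, with $n_{\text{alg}}$ (a symbol introduced here) denoting the number of algorithm patents; it adds no cost items beyond those estimates.

$$
C_{\text{total}} \approx n_{\text{alg}} \times 5{,}000\,\text{€} + 5{,}000\,\text{€},
\qquad n_{\text{alg}} \in \{2, 3\}
\;\Rightarrow\; C_{\text{total}} \approx 15{,}000\text{–}20{,}000\,\text{€}
$$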
|6||What is your overall assessment of the scientific maturity of the research result?|
|Research is progressing well. The result is already quite mature, and further improvements to the algorithms are expected in the short term.|
KEYWORDS QUANTITATIVE ASSESSMENT (0-5).
|Please put X as appropriate.||1||2||3||4||5|