| Name | Description | Size | Format |
| --- | --- | --- | --- |
|  |  | 7.13 MB | Adobe PDF |
Authors
Abstract(s)
Despite the advances in the field of reverse image search, with increasingly robust and effective algorithms, there is still interest in refining search techniques so as to improve the user experience when searching for the images the user has in mind.

The main goal of this work was to develop an application for mobile devices (smartphones) that allows the user to find images through multimodal input. Thus, in addition to proposing searches through several modes (keywords, drawing, and images from the camera or already on the device), this dissertation proposes that the user be able to create an image from scratch by drawing, or edit/modify an existing image, receiving immediate feedback on each change/interaction. Throughout the search experience, the user can take the retrieved images they find relevant and progressively refine the search by editing them, converging on what they expect to find.

The implementation of this proposal was based on Google's Cloud Vision API, responsible for obtaining results from image input, the Google Custom Search API for retrieving images from text input, and the ATsketchkit framework, which enabled drawing, on Apple's iOS system.

Tests were carried out with a set of users with different levels of experience in image search and in drawing ability, making it possible to assess their preference among the different input methods, their satisfaction with the results obtained, and the usability of the prototype.
Despite the evolution in the field of reverse image search, with algorithms becoming more robust and effective, there is still interest in improving search techniques and the user experience when searching for the images the user has in mind. The main goal of this work was to develop an application for mobile devices (smartphones) that would allow the user to find images through multimodal inputs. Thus, in addition to proposing image search through different modes (keywords, drawing/sketching, and camera or device images), this dissertation proposes that users can create an image themselves by drawing, or edit/change an existing image, with feedback at the moment of each change/interaction. Throughout the search experience, users can take the retrieved images they find relevant and refine the search by editing them, converging on what they expect to find. The implementation of this proposal was based on the Google Cloud Vision API, responsible for obtaining the results, and the ATsketchkit framework, which allowed the creation of drawings, on Apple's iOS system. Tests were carried out with a set of users with different levels of experience in image search and different drawing abilities, allowing us to assess preference among the input methods, satisfaction with the images retrieved, and the usability of the prototype.
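For context, the image-input step described above can be reached through the Cloud Vision API's public REST endpoint with the WEB_DETECTION feature, which returns visually similar images for a query image. The sketch below, in Swift for iOS, is a minimal illustration under that assumption; the function name, API-key handling, and completion handling are hypothetical and not the dissertation's actual code.

```swift
import Foundation

// Minimal sketch: send an image to the Cloud Vision API annotate endpoint with
// the WEB_DETECTION feature, which returns (among other fields) visually
// similar images that a prototype could display as search results.
func searchSimilarImages(imageData: Data,
                         apiKey: String,
                         completion: @escaping (Result<Data, Error>) -> Void) {
    guard let url = URL(string: "https://vision.googleapis.com/v1/images:annotate?key=\(apiKey)") else {
        return
    }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    // Request body: the base64-encoded query image and the web-detection feature.
    let body: [String: Any] = [
        "requests": [[
            "image": ["content": imageData.base64EncodedString()],
            "features": [["type": "WEB_DETECTION", "maxResults": 10]]
        ]]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)

    URLSession.shared.dataTask(with: request) { data, _, error in
        if let error = error {
            completion(.failure(error))
        } else if let data = data {
            // Raw JSON response; the webDetection.visuallySimilarImages entries
            // are the candidates a client would render in the results view.
            completion(.success(data))
        }
    }.resume()
}
```

The same request pattern applies whether the query image comes from the camera, the device library, or a sketch rasterized from the drawing canvas.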
Description
Keywords
Multimodal search; Reverse image search; Computer vision; Content-based image retrieval; Informatics Engineering; Faculdade de Ciências Exatas e da Engenharia