Abstract: We have developed an application called Social Media Illustrator that allows for finding media items on multiple social networks, clustering them by visual similarity, ranking them by different criteria, and finally arranging them in media galleries that were shown in an evaluation to be perceived as aesthetically pleasing. In this paper, we focus on the ranking aspect and show how, for a given set of media items, the most adequate combination of ranking criteria can be found by interactively applying different criteria and seeing their effect on the fly. This leads us to an empirically optimized media item ranking formula that takes social network interactions into account. While the ranking formula is not universally applicable, it can serve as a good starting point for an individually adapted formula, all within the context of Social Media Illustrator. A demo of the application is publicly available online at http://social-media-illustrator.herokuapp.com/. [PDF]
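As a hedged illustration of the "ranking criterion combination" idea, the sketch below scores media items as a weighted sum over their interaction counts; the weights and field names are our own illustrative assumptions, not the paper's empirically optimized values:

```python
def rank_score(item, weights=None):
    """Combine social network interaction counts into a single score.

    `item` maps interaction types to counts; the default weights are
    illustrative assumptions, not the paper's empirical formula.
    """
    weights = weights or {"likes": 1.0, "shares": 2.0, "comments": 1.5, "views": 0.01}
    return sum(weights.get(kind, 0.0) * count for kind, count in item.items())

items = [
    {"likes": 10, "shares": 1, "comments": 3},
    {"likes": 2, "shares": 8, "comments": 0},
]
# Sort media items from highest to lowest combined score.
ranked = sorted(items, key=rank_score, reverse=True)
```

Interactively tweaking the `weights` dictionary and re-sorting is the spirit of the "apply criteria and see their effect on the fly" workflow described above.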
Abstract: Functionality makes APIs unique and therefore helps humans and machines decide what service they need. However, if two APIs offer similar functionality, quality attributes such as performance and ease-of-use might become a decisive factor. Several of these quality attributes are inherently subjective, and hence exist within a social context. These social parameters should be taken into account when creating personalized mashups and service compositions. The Web API description format RESTdesc already captures functionality in an elegant way, so in this paper we will demonstrate how it can be extended to include social parameters. We indicate the role these parameters can play in generating functional compositions that fulfill specified quality attributes. Finally, we show how descriptions can be personalized by exploring a user's social graph. This ultimately leads to a more focused, on-demand use of Web APIs, driven by functionality and social parameters. [PDF]
Abstract: Twitter Trends allows for a global or local view on "what's happening in my world right now" from the tweet producers' point of view. In this paper, we explore a way to complement Twitter Trends by taking a closer look at the other side: the tweet consumers' point of view. While Twitter Trends works by analyzing the frequency of terms and their velocity of appearance in tweets being written, our approach is based on the popularity of named entities extracted from tweets being read. [PDF]
Abstract: The social networking website Facebook offers its users a feature called "status updates" (or just "status"), which allows users to create microposts directed to all their contacts, or a subset thereof. Readers can respond to microposts, or additionally click a "Like" button to show their appreciation for a certain micropost. Adding semantic meaning in the sense of unambiguous intended ideas to such microposts can, for example, be achieved via Natural Language Processing (NLP). Therefore, we have implemented a RESTful mash-up NLP API, which is based on a combination of several third-party NLP APIs in order to retrieve more accurate results in the sense of emergence. Consequently, our API uses third-party APIs opaquely in the background in order to deliver its output. In this paper, we describe how one can keep track of provenance and credit back the contributions of each single API to the combined result of all APIs. In addition, we show how the existence of provenance metadata can help understand the way a combined result is formed, and optimize the result combination process. For this, we use the HTTP Vocabulary in RDF and the Provenance Vocabulary. The main contribution of our work is a description of how provenance metadata can be automatically added to the output of mash-up APIs like the one presented here. [PDF]
Abstract: Social networking sites such as Facebook or Twitter let their users create microposts directed to all, or a subset, of their contacts. Users can respond to microposts, or additionally click a Like or ReTweet button to show their appreciation for a certain micropost. Adding semantic meaning in the sense of unambiguous intended ideas to such microposts can, for example, be achieved via Natural Language Processing (NLP) and named entity disambiguation. Therefore, we have implemented a mash-up NLP API, which is based on a combination of several third-party NLP APIs in order to retrieve more accurate results in the sense of emergence. Consequently, our API uses third-party APIs opaquely in the background to deliver its output. In this paper, we describe how one can keep track of data provenance and credit back the contributions of each single API to the joint result of the combined mash-up API. For this, we use the HTTP Vocabulary in RDF and the Provenance Vocabulary. In addition, we show how provenance metadata can help understand the way a combined result is formed, and optimize the result formation process. [PDF]
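A minimal sketch of how contributions could be credited back to individual sub-APIs of such a mash-up; the API names and entities are hypothetical, and the real system records provenance with the HTTP Vocabulary in RDF and the Provenance Vocabulary rather than plain dictionaries:

```python
def merge_with_provenance(results):
    """Merge entity lists from several NLP APIs, remembering which
    API contributed each entity (all names are illustrative)."""
    merged = {}
    for api_name, entities in results.items():
        for entity in entities:
            merged.setdefault(entity, set()).add(api_name)
    return merged

provenance = merge_with_provenance({
    "api_a": ["Facebook", "Twitter"],
    "api_b": ["Twitter", "Chelyabinsk"],
})
# An entity returned by several APIs gains confidence — the "emergence"
# effect the abstract mentions.
confident = [entity for entity, sources in provenance.items() if len(sources) > 1]
```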
Abstract: In May 2012, the Web search engine Google introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. Entities covered by the Knowledge Graph include landmarks, celebrities, cities, sports teams, buildings, movies, celestial objects, works of art, and more. The graph enhances Google search in three main ways: by disambiguation of search queries, by search-log-based summarization of key facts, and by explorative search suggestions. With this paper, we suggest a fourth way of enhancing Web search: through the addition of realtime coverage of what people say about real-world entities on social networks. We report on a browser extension that seamlessly adds relevant microposts from the social networking sites Google+, Facebook, and Twitter in the form of a panel to Knowledge Graph entities. In true Linked Data fashion, we interlink detected concepts in microposts with Freebase entities, and evaluate our approach for both relevancy and usefulness. The extension is freely available; we invite the reader to reconstruct the examples of this paper to see how realtime opinions may have changed since the time of writing. [PDF]
Abstract: A scientific conference is a type of event where attendees generate tremendous activity on social media platforms. Participants tweet or post longer status messages, engage in discussions with comments, and share slides and other media captured during the conference. This information can be used to generate informative reports of what is happening, where (which specific room), when (which time slot), and who the active participants are. However, this information is locked in different data silos and platforms, forcing the user to monitor many different channels at the same time to fully benefit from the event. In this paper, we propose a framework named Confomaton that aggregates in real time the social media shared by conference attendees and aligns it with event descriptions. Developed with Semantic Web technologies, this framework enables users to relive past events and to follow live conferences. A demonstrator is available at http://eventmedia.eurecom.fr/confomaton. [PDF]
Abstract: Many have left their footprints on the field of semantic RESTful Web service description. Albeit some of the propositions are even W3C Recommendations, none of the proposed standards could gain significant adoption with Web service providers. Some approaches were supposedly too complex and verbose, others were considered not RESTful, and some failed to reach a significant majority of API providers for a combination of the reasons above. While we neither have the silver bullet for universal Web service description, with this paper, we want to suggest a lightweight approach called RESTdesc. It expresses the semantics of Web services by pre- and postconditions in simple N3 rules, and integrates existing standards and conventions such as Link headers, HTTP OPTIONS, and URI templates for discovery and interaction. This approach keeps the complexity to a minimum, yet still enables service descriptions with full semantic expressiveness. A sample implementation on the topic of multimedia Web services verifies the effectiveness of our approach. [PDF]
Abstract: We have created and evaluated an algorithm capable of deduplicating and clustering exact- and near-duplicate media items that get published and shared on multiple social networks in the context of events. This algorithm works in an entirely ad-hoc manner, without any pre-calculation. When people attend events, they increasingly share event-related media items publicly on social networks to let their social network contacts relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content in order to summarize events visually, for example, in the form of media galleries or slideshows. In this paper, first, we introduce social-network-specific reasons and challenges that cause near-duplicate media items. Second, we detail an algorithm for the task of deduplicating and clustering exact- and near-duplicate media items stemming from multiple social networks. Finally, we evaluate the algorithm's strengths and weaknesses and show ways to address the weaknesses efficiently. [PDF]
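To illustrate the idea of ad-hoc clustering of exact and near duplicates without pre-calculation, here is a hedged sketch that greedily groups items whose fingerprint bit strings lie within a small Hamming distance; the fingerprints, the greedy strategy, and the threshold are illustrative assumptions, not the paper's actual algorithm:

```python
def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def cluster_near_duplicates(fingerprints, threshold=2):
    """Greedy ad-hoc clustering: each media item joins the first cluster
    whose representative fingerprint is within `threshold` bits,
    otherwise it starts a new cluster. The bit strings stand in for
    real perceptual hashes of media items."""
    clusters = []
    for item_id, fp in fingerprints.items():
        for cluster in clusters:
            if hamming(fp, fingerprints[cluster[0]]) <= threshold:
                cluster.append(item_id)
                break
        else:
            clusters.append([item_id])
    return clusters

clusters = cluster_near_duplicates({
    "photo1": "10110010", "photo2": "10110011",  # near-duplicates (1 bit apart)
    "photo3": "01001100",                        # visually distinct
})
```

Because no global index is built, the sketch shares the "entirely ad-hoc, without any pre-calculation" property the abstract describes.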
Abstract: A scientific conference is a type of event for which the structured program is generally known in advance. The Semantic Web community has set up a so-called Semantic Web dog food server that exposes structured data about the detailed program of more and more conferences and their sub-events (e.g., sessions). Conferences are also events that trigger tremendous activity on social media. Participants tweet or post longer status messages, engage in discussions with comments, and share slides and other media captured during the conference. This information is spread over multiple platforms, forcing the user to monitor many different channels at the same time to fully benefit from the event. In this paper, we present Confomaton, a Semantic Web application that aggregates and reconciles information such as tweets, slides, photos, and videos shared on social media that could potentially be attached to a scientific conference. [PDF]
Abstract: Multimodal interaction provides the user with multiple modes of interacting with a system, such as gestures, speech, text, video, audio, etc. A multimodal system allows for several distinct means for input and output of data. In this paper, we present our work in the context of the I-SEARCH project, which aims at enabling context-aware querying of a multimodal search framework including real-world data such as user location or temperature. We introduce the concepts of MuSeBag for multimodal query interfaces, UIIFace for multimodal interaction handling, and CoFind for collaborative search as the core components behind the I-SEARCH multimodal user interface, which we evaluate via a user study. [PDF]
Abstract: Considerable efforts have been put into making video content on the Web more accessible, searchable, and navigable by research on both textual and visual analysis of the actual video content and the accompanying metadata. Nevertheless, most of the time, videos are opaque objects in websites. With Web browsers gaining more support for the HTML5 <video> element, videos are becoming first-class citizens on the Web. In this paper, we show how events can be detected on-the-fly through crowdsourcing (i) textual, (ii) visual, and (iii) behavioral analysis in YouTube videos, at scale. The main contribution of this paper is a generic crowdsourcing framework for automatic and scalable semantic annotations of HTML5 videos. Finally, we discuss our preliminary results using traditional server-based approaches to video event detection as a baseline. [PDF]
Abstract: Mobile devices like smartphones, together with social networks, enable people to generate, share, and consume enormous amounts of media content. Common search operations, for example searching for a music clip based on artist name and song title on video platforms such as YouTube, can be achieved based both on potentially shallow human-generated metadata and on more profound content analysis driven by Optical Character Recognition (OCR) or Automatic Speech Recognition (ASR). However, more advanced use cases, such as summaries or compilations of several pieces of media content covering a certain event, are hard, if not impossible, to fulfill at large scale. One example of such an event is a keynote speech held at a conference, where, given a stable network connection, media content is published on social networks while the event is still going on. In our thesis, we develop a framework for media content processing that leverages social networks, utilizing the Web of Data and fine-grained media content addressing schemes like Media Fragments URIs to provide a scalable and sophisticated solution to realize the above use cases: media content summaries and compilations. We evaluate our approach on the entity level against social media platform APIs in conjunction with Linked (Open) Data sources, comparing the current manual approaches against our semi-automated approach. Our proposed framework can be used as an extension for existing video platforms. [PDF]
Abstract: In this paper, we present and define aesthetic principles for the automatic generation of media galleries based on media items retrieved from social networks that—after a ranking and pruning step—can serve to authentically summarize events and their atmosphere from a visual and an audial standpoint. [PDF]
Abstract: Hypermedia links and controls drive the Web by transforming information into affordances through which users can choose actions. However, publishers of information cannot predict all actions their users might want to perform and therefore, hypermedia can only serve as the engine of application state to the extent the user's intentions align with those envisioned by the publisher. In this paper, we introduce distributed affordance, a concept and architecture that extends application state to the entire Web. It combines information inside the representation with knowledge of action providers to generate affordance from the user's perspective. Unlike similar approaches such as Web Intents, distributed affordance scales both in the number of actions and the number of action providers, because it is resource-oriented instead of action-oriented. A proof-of-concept shows that distributed affordance is a feasible strategy on today's Web. [PDF]
Abstract: Hyperlinks and forms let humans navigate with ease through websites they have never seen before. In contrast, automated agents can only perform preprogrammed actions on Web services, reducing their generality and restricting their usefulness to a specialized domain. Many of the employed services call themselves RESTful, although they neglect the hypermedia constraint as defined by Roy T. Fielding, stating that the application state should be driven by hypertext. This lack of link usage on the Web of services severely limits agents in what they can do, while connectedness forms a primary feature of the human Web. An urgent need for more intelligent agents becomes apparent, and in this paper, we demonstrate how the conjunction of functional service descriptions and hypermedia links leads to advanced, interactive agent behavior. We propose a new mode for our previously introduced semantic service description format RESTdesc, providing the mechanisms for agents to consume Web services based on links, similar to human browsing strategies. We illustrate the potential of these descriptions by a use case that shows the enhanced capabilities they offer to automated agents, and explain how this is vital for the future Web. [PDF]
Abstract: Video shot detection is the processor-intensive task of splitting a video into continuous shots, with hard or soft cuts as the boundaries. In this paper, we present a client-side, on-the-fly approach to this challenge based on modern HTML5-enabled Web APIs. We show how video shot detection can be seamlessly embedded into video platforms like YouTube using browser extensions. As a user starts watching a video, we detect its shots by visually analyzing its content; the whole process runs dynamically on the client side, using modern HTML5 JavaScript APIs of the <video> and <canvas> elements. Once a video has been split into shots, shot-based video navigation becomes possible and more fine-grained playing statistics can be created: the user can quickly jump into a specific shot by clicking on a representative still frame, integrated seamlessly into the YouTube website by the browser extension. The main contributions of this paper are the browser extension itself and the improved video navigability it enables through shot navigation. A screencast and demo of our approach are available. [PDF]
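As a hedged sketch of shot boundary detection (the actual extension analyzes <video> frames via <canvas> in JavaScript; the coarse-histogram method and the threshold below are illustrative assumptions, not the extension's algorithm):

```python
def histogram(frame, bins=4):
    """Coarse luminance histogram of a frame (a list of 0-255 pixel values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return counts

def detect_shot_boundaries(frames, threshold=4):
    """Mark a shot boundary wherever consecutive frame histograms differ
    strongly, mimicking per-frame visual analysis while the video plays."""
    boundaries = []
    for i in range(1, len(frames)):
        h1, h2 = histogram(frames[i - 1]), histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(h1, h2)) >= threshold:
            boundaries.append(i)
    return boundaries

# Three dark frames, then a hard cut to two bright frames.
frames = [[10, 20, 30]] * 3 + [[240, 250, 230]] * 2
boundaries = detect_shot_boundaries(frames)
```

In the browser-extension setting, the same difference test would run on pixel data read back from a `<canvas>` onto which each `<video>` frame is drawn.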
Abstract: (Only a figure caption is recoverable: Figure 1: xkcd #37, "I do this constantly". R. Munroe. Hyphen. http://xkcd.com/37/.) [PDF]
Abstract: One of the main REST design principles is the focus on media types as the core of contracts on the Web. However, the service designer is not always free to select the most appropriate media type for a task; sometimes a generic media type like application/rdf+xml (or, in the worst case, a binary format like image/png), with no defined or even possible hypermedia controls at all, has to be chosen. With this position paper, we present a way in which the hypermedia constraint of REST can still be fulfilled using a combination of Link headers, the OPTIONS method, and the HTTP Vocabulary in RDF. [PDF]
Abstract: Web APIs are becoming an increasingly popular alternative to the more heavy-weight Web services. Recently, they have also been used in the context of sensor networks. However, making different Web APIs (and thus sensors) cooperate often requires a significant amount of manual configuration. Ideally, we want Web APIs to behave like Linked Data, where data from different sources can be combined in a straightforward way. Therefore, in this paper, we show how Web APIs, semantically described by the light-weight format RESTdesc, can be composed automatically based on their functionality. Moreover, the composition process does not require specific tools, as compositions are created by generic Semantic Web reasoners as part of a proof. We then indicate how the composition in this proof can be executed. We describe our architecture and implementation, and validate that proof-based composition is a feasible strategy on a Web scale. Our measurements indicate that current reasoners can integrate compositions of more than 200 Web APIs in under one second. This makes proof-based composition a practical choice for today's Web APIs. [PDF]
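The proof-based composition idea can be illustrated with a much-simplified forward-chaining sketch; the API names and the plain precondition/postcondition sets below are hypothetical stand-ins for RESTdesc descriptions and a Semantic Web reasoner's proof:

```python
def compose(apis, have, goal):
    """Forward-chain over API descriptions of the form
    (preconditions -> postcondition) until `goal` is derivable,
    recording which APIs fired — a toy analogue of a reasoner
    producing a composition as part of a proof."""
    plan, facts = [], set(have)
    changed = True
    while goal not in facts and changed:
        changed = False
        for name, (pre, post) in apis.items():
            if pre <= facts and post not in facts:
                plan.append(name)
                facts.add(post)
                changed = True
    return plan if goal in facts else None

# Hypothetical sensor-related APIs described by what they need and produce.
apis = {
    "read_sensor": ({"sensor_id"}, "raw_value"),
    "calibrate": ({"raw_value"}, "temperature"),
    "to_fahrenheit": ({"temperature"}, "fahrenheit"),
}
plan = compose(apis, have={"sensor_id"}, goal="fahrenheit")
```

A real reasoner works on N3 rules rather than Python sets, but the chaining of postconditions into the next API's preconditions is the same core mechanism.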
Abstract: The early visions for the Semantic Web, from the famous 2001 Scientific American article by Berners-Lee et al., feature intelligent agents that can autonomously perform tasks like discovering information, scheduling events, finding execution plans for complex operations, and in general, use reasoning techniques to come up with sense-making and traceable decisions. While today—more than ten years later—the building blocks (1) resource-oriented REST infrastructure, (2) Web APIs, and (3) Linked Data are in place, the envisioned intelligent agents have not landed yet. In this paper, we explain why capturing functionality is the connection between those three building blocks, and introduce the functional API description format RESTdesc that creates this bridge between hypermedia APIs and the Semantic Web. Rather than adding yet another component to the Semantic Web stack, RESTdesc instead offers concise descriptions that reuse existing vocabularies to guide hypermedia-driven agents. Its versatile capabilities are illustrated by a real-life agent use case for Web browsers wherein we demonstrate that RESTdesc functional descriptions are capable of fulfilling the promise of autonomous agents on the Web. [PDF]
Abstract: The Future Internet (FI) is expected to be a communication and delivery ecosystem, which will interface, interconnect, integrate, and expand today's Internet, public and private intranets, and communication networks of any type and scale, in order to provide highly demanding services efficiently, transparently, and securely to humans and systems. This complex networking environment may be considered from various interrelated perspectives: the networks & infrastructure viewpoint, the services viewpoint, and the media & information viewpoint. This document has been produced by a discussion forum of experts in the area of media and networks named Future Media Internet - Think Tank (FMIA-TT). FMIA-TT aims to create a reference model of a "Future Media Internet Architecture", covering delivery, in-network adaptation/enrichment, and consumption of media over the Future Internet ecosystem. This white paper concludes the first phase of the working group by proposing an FMIA reference model. [PDF]
Abstract: In this position paper, we first discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. We present an example of the technology and we analyze its current uptake. We then sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents. We outline bottlenecks in the current Internet architecture that require fixing in order to enable our vision to work at Web scale. [PDF]
Abstract: In this position paper, we describe how analogue recording artifacts stemming from digitized VHS tapes, such as grainy noise, ghosting, or synchronization issues, can be identified at Web scale via crowdsourcing in order to identify adult content digitized by amateurs. [PDF]
Abstract: Without any exaggeration, the Linked Data movement has significantly changed the Semantic Web world. Meanwhile, intelligent services—the other pillar of the initial Semantic Web vision—have not undergone a similar revolution. Although several important steps were taken and significant milestones were reached, we are far from our envisioned destination. What makes the Web so difficult for machines? So far, we have only seen successful clients for specific purposes, mostly tailored to the API of a certain site or service. This contrasts with human behavior: we surf the Web for several different purposes on a variety of websites. The discrepancy originates in two related aspects: semantics and hyperlinks. The Resource Description Framework (RDF) and the Linked Data effort help to overcome the problem of data semantics by providing machine-interpretable data with linked concepts. On the other hand, services tend not to provide semantics or links, neither to internal pages nor to external sites. As a consequence, in order to consume a Web service, automated agents currently must have an implementation of the API and know how to construct URIs. In contrast, humans do not need a manual to browse a website, because they can interpret text and follow links. Summarizing, if we want automated agents to consume the Web, services should provide (1) functional semantics: what can an agent do with the service? and (2) meaningful hyperlinks: in what directions can an agent proceed? These two aspects form a necessary condition for integrating services into the semantic and interlinked world of Linked Data. As we will argue in Section 2, today's service descriptions are unable to fulfill this premise. Therefore, we discuss the lightweight semantic service description method RESTdesc [4] in Section 3, enabling a linked Web of Services through the linked Web of Data. We continue by indicating how RESTdesc could be an ideal counterpart for the integration of data and services. [PDF]
Abstract: In this paper, a novel framework for the description of rich media content is introduced. First, the concept of 'content objects' is defined: content objects are rich media presentations enclosing different types of media, along with real-world information and user-related information. These highly complex presentations require a suitable description scheme in order to be searched and retrieved by end users. Therefore, a novel rich unified content description is analysed, which provides a uniform descriptor for all types of content objects irrespective of the underlying media and accompanying information. [PDF]
Abstract: In this paper, we report on work around the I-SEARCH EU (FP7 ICT STREP) project whose objective is the development of a multimodal search engine. We present the project's objectives, and detail the achieved results, amongst which a Rich Unified Content Description format. [PDF]
Abstract: In this article, a unified framework for multimodal search and retrieval is introduced. The framework is an outcome of the research that took place within the I-SEARCH European Project. The proposed system covers all aspects of a search and retrieval process, namely low-level descriptor extraction, indexing, query formulation, retrieval and visualisation of the search results. All I-SEARCH components advance the state of the art in the corresponding scientific fields. The I-SEARCH multimodal search engine is dynamically adapted to end-user's devices, which can vary from a simple mobile phone to a high-performance PC. [PDF]
Abstract: In an often retweeted Twitter post, entrepreneur and software architect Inge Henriksen described the relation of Web 1.0 to Web 3.0 as: "Web 1.0 connected humans with machines. Web 2.0 connected humans with humans. Web 3.0 connects machines with machines." On the one hand, an incredible amount of valuable data is described by billions of triples, machine-accessible and interconnected thanks to the promises of Linked Data. On the other hand, REST is a scalable, resource-oriented architectural style that, like the Linked Data vision, recognizes the importance of links between resources. Hypermedia APIs are resources, too—albeit dynamic ones—and unfortunately, neither Linked Data principles, nor the REST-implied self-descriptiveness of hypermedia APIs sufficiently describe them to allow for long-envisioned realizations like automatic service discovery and composition. We argue that describing inter-resource links—similarly to what the Linked Data movement has done for data—is the key to machine-driven consumption of APIs. In this paper, we explain how the description format RESTdesc captures the functionality of APIs by explaining the effect of dynamic interactions, effectively complementing the Linked Data vision. [PDF]
Abstract: Social platforms constantly record streams of heterogeneous data about humans' activities, feelings, emotions, and conversations, opening a window to the world in real time. Trends can be computed, but making sense of them is an extremely challenging task due to the heterogeneity of the data and its dynamics, which often make trends short-lived phenomena. We have developed a framework that collects microposts shared on social platforms that contain media items as the result of a query, for example a trending event. It automatically creates different visual storyboards that reflect what users have shared about this particular event. More precisely, it leverages (i) visual features from media items for near-deduplication, and (ii) textual features from status updates to interpret, cluster, and visualize media items. A screencast showing an example of these functionalities is published at http://youtu.be/8iRiwz7cDYY, while the prototype is publicly available at http://mediafinder.eurecom.fr. [PDF]
Abstract: This document describes the Media Fragments 1.0 (basic) specification. It specifies the syntax for constructing media fragment URIs and explains how to handle them when used over the HTTP protocol. The syntax is based on the specification of particular name-value pairs that can be used in URI fragment and URI query requests to restrict a media resource to a certain fragment. The Media Fragment WG has no authority to update registries of all targeted media types. We recommend media type owners to harmonize their existing schemes with the ones proposed in this document and update or add the fragment semantics specification to their media type registration. [PDF]
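As a hedged illustration of the name-value pair syntax, here is a simplified parser for the fragment part of a media fragment URI; real Media Fragments URI 1.0 processing involves further rules (percent-decoding, time-unit prefixes such as `npt:`, and error handling) that this sketch omits:

```python
from urllib.parse import urlsplit

def parse_media_fragment(uri):
    """Extract the name-value pairs from a media fragment URI,
    e.g. '#t=10,20&xywh=160,120,320,240'. Simplified sketch only."""
    fragment = urlsplit(uri).fragment
    pairs = {}
    for part in fragment.split("&"):
        if "=" in part:
            name, value = part.split("=", 1)
            pairs[name] = value
    return pairs

pairs = parse_media_fragment(
    "http://example.org/video.webm#t=10,20&xywh=160,120,320,240"
)
```

Here `t` restricts the resource to a temporal range and `xywh` to a spatial region, the two dimensions most commonly addressed by such fragments.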
Abstract: We have developed an application called Wikipedia Live Monitor that monitors article edits on different language versions of Wikipedia—as they happen in realtime. Wikipedia articles in different languages are highly interlinked. For example, the English article "en:2013_Russian_meteor_event" on the topic of the February 15 meteoroid that exploded over the region of Chelyabinsk Oblast, Russia, is interlinked with "ru:Падение_метеорита_на_Урале_в_2013_году", the Russian article on the same topic. As we monitor multiple language versions of Wikipedia in parallel, we can exploit this fact to detect concurrent edit spikes of Wikipedia articles covering the same topics, both within a single language and across languages. We treat such concurrent edit spikes as signals for potential breaking news events, whose plausibility we then check with full-text cross-language searches on multiple social networks. Unlike the reverse approach of monitoring social networks first and potentially checking plausibility on Wikipedia second, the approach proposed in this paper has the advantage of being less prone to false-positive alerts while being equally sensitive to true-positive events, at only a fraction of the processing cost. A live demo of our application is available online at http://wikipedia-irc.herokuapp.com/, and the source code is available under the terms of the Apache 2.0 license at https://github.com/tomayac/wikipedia-irc. [PDF]
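A hedged sketch of the edit-spike detection idea: a sliding-window counter per article, where the window size and threshold are illustrative assumptions, not the application's actual parameters:

```python
from collections import deque

class SpikeMonitor:
    """Count recent edits per article in a sliding time window; a spike
    is flagged when the window holds at least `threshold` edits.
    Window size and threshold are illustrative values."""

    def __init__(self, window_seconds=60, threshold=3):
        self.window = window_seconds
        self.threshold = threshold
        self.edits = {}

    def record_edit(self, article, timestamp):
        q = self.edits.setdefault(article, deque())
        q.append(timestamp)
        # Drop edits that have fallen out of the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        # True signals a potential breaking news event for this article.
        return len(q) >= self.threshold

monitor = SpikeMonitor()
signals = [
    monitor.record_edit("en:2013_Russian_meteor_event", t)
    for t in (0, 10, 25, 500)
]
```

In the full system, a spike on one article would additionally be correlated with spikes on its interlanguage-linked counterparts before the cross-language social network plausibility check is triggered.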
Abstract: Unstructured metadata fields such as 'description' offer tremendous value for users to understand cultural heritage objects. However, this type of narrative information is of little direct use within a machine-readable context due to its unstructured nature. This paper explores the possibilities and limitations of Named-Entity Recognition (NER) to mine such unstructured metadata for meaningful concepts. These concepts can be used to leverage otherwise limited searching and browsing operations, but they can also play an important role in fostering Digital Humanities research. In order to catalyze experimentation with NER, the paper proposes an evaluation of the performance of three third-party NER APIs through a comprehensive case study, based on the descriptive fields of the Smithsonian Cooper-Hewitt National Design Museum in New York. A manual analysis is performed of the precision, recall, and F-score of the concepts identified by the third-party NER APIs. Based on the outcomes of the analysis, the conclusions present the added value of NER services, but also point out the dangers of uncritically using NER, and by extension Linked Data principles, within the Digital Humanities. All metadata and tools used within the paper are freely available, making it possible for researchers and practitioners to repeat the methodology. By doing so, the paper offers a significant contribution towards understanding the value of NER for the Digital Humanities. [PDF]
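The evaluation metrics can be stated precisely; below is a small self-contained sketch of set-based precision, recall, and F-score as used to compare concepts returned by an NER API against a manually curated gold standard (the entity sets are invented for illustration):

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F-score for NER output
    against a gold standard of expected concepts."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    predicted={"New York", "Smithsonian", "Paris"},
    gold={"New York", "Smithsonian", "Cooper-Hewitt", "Design Museum"},
)
```

Exact set matching is a simplification: real NER evaluation must also decide how to score partial or overlapping entity mentions.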
Abstract: In this paper, we report on the task of near-duplicate photo detection in the context of events that get shared on multiple social networks. When people attend events, they increasingly share event-related photos publicly on social networks to let their social network contacts relive and witness the attended events. In the past, we have worked on methods to accumulate such public user-generated multimedia content so that it can be used to summarize events visually in the form of media galleries or slideshows. To ensure the diversity of the generated media galleries or slideshows, methods for deduplicating near-duplicate photos of the same event are therefore required. First, we introduce the social-network-specific reasons and challenges that cause near-duplicate photos. Second, we introduce an algorithm for the task of deduplicating near-duplicate photos stemming from social networks. Finally, we evaluate the algorithm's results and shortcomings. [PDF]
Abstract: In this paper, we report on work around the I-SEARCH EU (FP7 ICT STREP) project, whose objective is the development of a multimodal search engine targeted at mobile and desktop devices. Each of these device classes has its own hardware capabilities and set of supported features. In order to provide a common multimodal search experience across device classes, one size does not fit all. We highlight ways to achieve the same functionality agnostic of the device class being used for the search, and present concrete use cases. [PDF]
Abstract: Many providers offer Web APIs that expose their services to an ever increasing number of mobile and desktop applications. However, all interactions have to be explicitly programmed by humans. Automated composition of those Web APIs could make it considerably easier to integrate different services from different providers. In this paper, we therefore present an automated Web API composition method, based on theorem-proving principles. The method works with existing Semantic Web reasoners at Web-scale performance. This makes proof-based composition a good choice for Web API integration. We envision this method for use in different fields, such as multimedia service and social service composition. [PDF]
Abstract: If we want automated agents to consume the Web, they need to understand what a certain service does and how it relates to other services and data. The shortcoming of existing service description paradigms is their focus on technical aspects instead of the functional aspect—what task does a service perform, and is this a match for my needs? This paper summarizes our recent work on RESTdesc, a semantic service description approach that centers on functionality. It has a solid foundation in logics, which enables advanced service matching and composition, while providing elegant and concise descriptions, responding to the demands of automated clients on the future Web of Agents. [PDF]
Abstract: In May 2012, the Web search engine Google introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. Soon after its announcement, people started to ask for a programmatic method to access the data in the Knowledge Graph; however, as of today, Google does not provide one. With SEKI@home, which stands for Search for Embedded Knowledge Items, we propose a browser extension-based approach to crowdsource the task of populating a data store to build an Open Knowledge Graph. As people with the extension installed search on Google.com, the extension sends extracted anonymous Knowledge Graph facts from Search Engine Results Pages (SERPs) to a centralized, publicly accessible triple store, and thus over time creates a SPARQL-queryable Open Knowledge Graph. We have implemented and made available a prototype browser extension tailored to the Google Knowledge Graph; note, however, that the concept of SEKI@home is generalizable to other knowledge bases. [PDF]
Abstract: With SEKI@home, which stands for Search for Embedded Knowledge Items, we propose a generic, browser extension-based approach for crowdsourcing the task of knowledge extraction from arbitrary Web pages. As people with the extension installed browse a targeted Web page, the extension sends extracted knowledge items, according to customizable extraction rules, to a centralized, optionally publicly accessible triple store. Thereby, simply by browsing the Web as usual, participants in the knowledge extraction task can help make previously locked-in knowledge openly accessible, e.g., via the standard SPARQL protocol. We have implemented and made available a prototype browser extension, which, after customization and adaptation, can serve as the basis for future knowledge extraction tasks. [PDF]
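The notion of customizable extraction rules can be sketched as follows: each rule pairs a pattern that captures one value from the page source with a predicate URI, and matches are emitted as N-Triples lines for the triple store. The rule patterns, predicate URIs, and HTML snippet below are all illustrative assumptions, not the extension's actual rule format.

```python
import re

# Hypothetical rules: (regex capturing one value, predicate URI).
EXTRACTION_RULES = [
    (r'<span class="fact-name">([^<]+)</span>', "http://schema.org/name"),
    (r'<span class="fact-birth">([^<]+)</span>', "http://schema.org/birthDate"),
]

def extract_triples(subject_uri, html):
    """Apply each rule to the page source and emit N-Triples lines."""
    triples = []
    for pattern, predicate in EXTRACTION_RULES:
        for value in re.findall(pattern, html):
            triples.append(f'<{subject_uri}> <{predicate}> "{value}" .')
    return triples

page = ('<span class="fact-name">Ada Lovelace</span>'
        '<span class="fact-birth">1815-12-10</span>')
for line in extract_triples("http://example.org/entity/ada", page):
    print(line)
```

A real extension would typically use DOM selectors rather than regular expressions over raw HTML; the regex form merely keeps the sketch self-contained.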
Abstract: The REST architectural style assumes that client and server form a contract with content negotiation, not only on the data format but implicitly also on the semantics of the communicated data, i.e., an agreement on how the data have to be interpreted [55]. In different application scenarios, such an agreement requires vendor-specific content types for the individual services to convey the meaning of the communicated data. The idea behind vendor-specific content types is that service providers can reuse content types and service consumers can make use of specific processors for the individual content types. In practice, however, we see that many RESTful APIs on the Web simply make use of standard non-specific content types, e.g., text/xml or application/json [33]. Since the agreement on the semantics is only implicit, programmers developing client applications have to manually gain a deep understanding of several APIs from multiple providers. Common Web APIs are typically either exclusively described textually, or, far less frequently and usually based on third-party contributions, a machine-readable WADL [18] API description exists. However, neither human-focused textual nor machine-focused WADL API descriptions carry any machine-processable semantics, i.e., they do not describe what a certain API does. Instead, they limit themselves to a description of machine-readable in- and output parameters in the case of WADL, or a non-machine-readable prose- and/or example-driven description of the API in the case of textual descriptions. While this may satisfy the requirements of developers in practice, the lack of semantic descriptions hinders many more advanced use cases such as API discovery or API composition.
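The contrast between vendor-specific and generic content types shows up directly in the Accept header a client sends: a client that understands a service's specific semantics can ask for the vendor media type and fall back to plain JSON otherwise. The URL and the vendor media type below are illustrative, not a real API.

```python
from urllib.request import Request

# Hypothetical client request: prefer the vendor-specific type (which fixes
# the semantics of the payload), accept generic JSON only as a fallback.
request = Request(
    "https://api.example.org/orders/42",
    headers={
        "Accept": "application/vnd.example.order+json, application/json;q=0.5"
    },
)
print(request.get_header("Accept"))
```

A server honoring this negotiation would answer with `Content-Type: application/vnd.example.order+json`, signalling that the agreed-upon interpretation applies; with only `application/json`, that agreement stays implicit, which is exactly the problem described above.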
Machine-interpretable descriptions can serve several purposes when developing client applications: one can generate textual documentation from the standardised machine-interpretable descriptions, which leads to a more coherent presentation of the APIs, similar to what JavaDoc has achieved in the Java world (see also Knuth's idea of literate programming [22]). [PDF]
Abstract: SemWebVid is an online Ajax application that allows for the automatic generation of Resource Description Framework (RDF) video descriptions. These descriptions are based on two pillars: first, on a combination of user-generated metadata such as title, summary, and tags; and second, on closed captions which can be user-generated, or auto-generated via speech recognition. The plaintext contents of both pillars are analyzed using multiple Natural Language Processing (NLP) Web services in parallel, whose results are then merged and, where possible, matched back to concepts in the sense of Linking Open Data (LOD). The final result is a deep-linkable RDF description of the video, and a "scroll-along" view of the video as an example of video visualization formats. [PDF]
Abstract: We have developed a tile-wise histogram-based media item deduplication algorithm with additional high-level semantic matching criteria that is tailored to photos and videos gathered from multiple social networks. In this paper, we investigate whether the Media Fragments URI addressing scheme, together with a natural language generation framework realized through a text-to-speech system, provides a feasible and practicable way to visually and audibly describe the differences between media items of type photo and/or video, so that human-friendly debugging of the deduplication algorithm is made possible. A short screencast illustrating the approach is available online at http://youtu.be/DWqwEnhqTSc. [PDF]
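The tile-wise histogram comparison at the heart of the deduplication algorithm can be sketched in simplified form: each image is split into tiles, a coarse gray-value histogram is computed per tile, and two items count as near-duplicates when enough corresponding tiles have overlapping histograms. This is a minimal sketch only; the actual algorithm adds the high-level semantic matching criteria mentioned above, and the tile grid size, bin count, and thresholds here are illustrative.

```python
def tile_histograms(image, tiles=2, bins=4):
    """Split a square grid of 0-255 gray values into tiles x tiles tiles
    and compute a coarse histogram per tile."""
    size = len(image)
    step = size // tiles
    histograms = []
    for ty in range(tiles):
        for tx in range(tiles):
            hist = [0] * bins
            for y in range(ty * step, (ty + 1) * step):
                for x in range(tx * step, (tx + 1) * step):
                    hist[image[y][x] * bins // 256] += 1
            histograms.append(hist)
    return histograms

def near_duplicates(a, b, tile_threshold=0.9, match_ratio=0.75):
    """Two images are near-duplicates if enough tiles have similar histograms."""
    matches = 0
    pairs = list(zip(tile_histograms(a), tile_histograms(b)))
    for ha, hb in pairs:
        overlap = sum(min(x, y) for x, y in zip(ha, hb)) / sum(ha)
        matches += overlap >= tile_threshold
    return matches / len(pairs) >= match_ratio

img = [[10, 10, 200, 200], [10, 10, 200, 200],
       [10, 10, 200, 200], [10, 10, 200, 200]]
print(near_duplicates(img, img))  # identical images always match
```

Working per tile rather than on one global histogram is what lets the debugging approach point at, and verbalize, *where* two media items differ.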
Abstract: "Web 1.0 connected humans with machines. Web 2.0 connected humans with humans. Web 3.0 connects machines with machines." On the one hand, an incredible amount of valuable data is described by billions of triples, machine-accessible and interconnected thanks to the promises of Linked Data. On the other hand, REST is a scalable, resource-oriented architectural style that, like the Linked Data vision, recognizes the importance of links between resources. Hypermedia APIs are resources, too—albeit dynamic ones—and unfortunately, neither Linked Data principles nor the REST-implied self-descriptiveness of hypermedia APIs sufficiently describe them to allow for long-envisioned realizations like automatic service discovery and composition. We argue that describing inter-resource links—similarly to what the Linked Data movement has done for data—is the key to machine-driven consumption of APIs. In this paper, we explain how the description format RESTdesc captures the functionality of APIs by explaining the effect of dynamic interactions, effectively complementing the Linked Data vision. [PDF]
Abstract: We have developed an application for the automatic generation of media galleries that visually and audibly summarize events based on media items like videos and photos from multiple social networks. Further, we have evaluated different media gallery styles with online surveys and examined their pros and cons. Besides the survey results, our contribution is also the application itself, where media galleries of different styles can be created on-the-fly. A demo is available at http://social-media-illustrator.herokuapp.com/. [PDF]
Abstract: Social networks play an increasingly important role for sharing media items related to daily life moments or for the live coverage of events. One of the problems is that media are spread over multiple social networks. In this paper, we propose a social-network-agnostic approach for collecting recent images and videos which can potentially be attached to an event. These media items can be used for the automatic generation of visual summaries in the form of media galleries. Our approach includes the alignment of the varying search result formats of different social networks, while putting media items in correspondence with the status updates and stories they are related to. More precisely, we leverage: (i) visual features from media items, (ii) textual features from status updates, and (iii) social features from social networks to interpret, deduplicate, cluster, and visualize media items. We address the technical details of media item extraction and media item processing, discuss criteria for media item filtering, and envision several visualization options for media presentation. Our evaluation is divided into two parts: first, we assess the performance of the image deduplication process, and then we propose a human evaluation of the summary creation compared with Teleportd and Twitter media galleries. A demo of our approach is publicly available at http://eventmedia.eurecom.fr/media-finder. [PDF]