According to the Apple App Store statistics, we sold our 1000th copy of RAEplus just yesterday. As you know, RAEplus is MADideas’ application to access the Normative Dictionary of the Spanish Language. After a somewhat slow start, sales have been growing steadily, allowing us to reach this milestone sooner than we expected.
Since its release, RAEplus has reached the top of the sales chart in the “Reference” section of the Spanish App Store, currently holding the 2nd position. It has also been among the 100 most popular paid apps (all categories) in the Spanish store for several weeks. It has collected a good number of positive reviews and ratings.
Thanks a lot to all happy RAEplus owners!!
If you still don’t have it, get your copy today!
My joint work with Stefan Siersdorfer, “How useful are your comments? – Analyzing and Predicting YouTube Comments and Comment Ratings”, has been accepted as a full paper in the next edition of the ACM Conference on the World Wide Web (WWW’10).
Find below the abstract of the submission:
An analysis of the social video sharing platform YouTube reveals a high amount of community feedback through comments for published videos as well as through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos for which we analyzed dependencies between comments, views, comment ratings and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments.
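To make the sentiment step a bit more concrete, here is a minimal sketch of how per-comment sentiment features could be computed from a word-level lexicon. It is only an illustration and not the feature extraction used in the paper: the lexicon `TOY_LEXICON`, its scores and the function `sentiment_features` are made up for this example, and a real setup would look the words up in SentiWordNet and feed the resulting features to a classifier.

```python
import re

# Hypothetical toy lexicon: word -> (positivity, negativity), each in [0, 1].
# These values are invented for illustration and are NOT taken from SentiWordNet.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "boring": (0.0, 0.5),
    "awful": (0.0, 0.875),
}


def sentiment_features(comment, lexicon=TOY_LEXICON):
    """Return (mean positivity, mean negativity) over the words of a comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return 0.0, 0.0
    # Words missing from the lexicon count as neutral.
    pos = sum(lexicon.get(w, (0.0, 0.0))[0] for w in words)
    neg = sum(lexicon.get(w, (0.0, 0.0))[1] for w in words)
    return pos / len(words), neg / len(words)


for text in ("Great video, love it", "Awful and boring"):
    print(text, "->", sentiment_features(text))
```

Features like these could then be compared against the ratings that rated comments received, which is the kind of dependency the abstract describes.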
It’s been a while (actually, quite a long time) since my partners at MADideas and I finished the development of RAEplus, our very first application for the iPhone and iPod Touch platforms.
RAEplus is an enhancement over previous attempts to provide an interface for accessing RAE (the normative dictionary of the Spanish language). The app encloses a comprehensive database of Spanish terms, and actively helps users locate the words they want to look up in the dictionary. A dynamic list of suggested terms is updated in real time to provide completions for the word the user is typing. This approach saves users many keystrokes, and therefore lets them look up words far more efficiently.
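For the curious, the suggestion list can be pictured with a few lines of Python. This is just a sketch of one common way to do prefix completion (binary search over a sorted word list) and not the actual code inside the app; the `suggest` function and the `TERMS` list are hypothetical, and plain string ordering is assumed, so accented characters would need extra care.

```python
from bisect import bisect_left


def suggest(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`."""
    # Binary search finds the first term that is >= prefix; all matches
    # form one contiguous run starting at that position.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches


# Hypothetical word list standing in for the app's term database.
TERMS = sorted(["casa", "casar", "casco", "caso", "castillo", "gato", "perro"])

# Simulate the list being refreshed as the user types.
for typed in ("c", "cas", "casc"):
    print(typed, "->", suggest(TERMS, typed))
```

Each extra letter narrows the run of matching terms, which is why the user can usually pick the word after typing only a few characters.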
The app also integrates a thesaurus of the Spanish language. The app synchronizes users’ search in both dictionaries, providing an intuitive and efficient way to change between these two modes.
Finally, the enclosed term database is used to select random words which are presented at the start page of the app. The user can look up any of these words or, alternatively, shake the device to get a new selection of terms. This is a great way to learn new terms and, ultimately, master the Spanish language.
The app is available from the iTunes Store at the following link.
Get it today and enjoy!
For the last couple of days I have been experiencing some quite frustrating behavior that I think is caused by a problem with Apple’s Mail client.
The symptoms are simple. All email sent from a dotMac account to a Gmail account is sent straight to the spam folder by Google’s filters. After some tests and some initial swear words aimed at Google’s filters, I’m starting to think that the problem might well be poor or defective mail software from Apple.
Facts:
1) Sending the email to several recipients seems to solve the problem (at least if one of them is CC’ed)
2) Sending the email from the MobileMe web interface also seems to work alright
3) Aliases show the exact same behavior
It seems there’s some buggy behavior somewhere in the client or in the mail delivery infrastructure which is causing perfectly valid messages to be considered spam. Maybe some faulty header field, perhaps a DNS problem with the SMTP server…
Meanwhile, remember to CC somebody (I normally CC myself) to get your messages properly received!!
Update: Things seem to be back to normal again. All my messages are being properly received. Still, I’m getting comments from people who are still having trouble. A trick that was working fine for me in the meantime was using the @me.com address to send the emails. Try this and comment back on the results!
After a quite intensive week, WWW has concluded and it seems a good time to look back and summarize the main activities and experiences. Though it was my intention to keep a daily log of the events, the lack of time was much more severe than I expected. So, let this late entry replace the originally planned ones.
The conference began with a series of tutorials and workshops on both Monday and Tuesday. My schedule for Tuesday included only the workshop on content analysis in web 2.0 sites. The morning session was quite interesting, and included a really inspirational talk by Hugo Zaragoza where he described his work on mining the information available in web 2.0 sites to produce semantically formatted web content. Due to work obligations I had to miss the afternoon session of the workshop. However, it was a productive afternoon where I was able to discuss potential future work with Stefan Siersdorfer in order to continue the fruitful collaboration that began during our postdoc in Sheffield.
Wednesday was quite a long day. The main conference started that day, and the organizers had arranged for the Prince of Spain to give a speech that very morning. The presence of media and the special security measures turned the conference venue into a not very scientific show. Fortunately, Tim Berners-Lee also gave a speech that morning, and that helped us forget about less interesting political aspects.
On that day I met lots of new fellow researchers. I had already met on previous days with colleagues from Telefonica I+D (Xavier Amatriain and Pere Obrador). I also met additional members of the research lab that Telefonica has in Barcelona, including Joachim Neumann, Karen Church and Josep M. Pujol. I had the chance to meet a former intern of the same lab, Meeyoung Cha, now working as a postdoc at the Max Planck Institute. She gave a superb presentation on that day and later joined us to experience Madrid’s night life. Some other colleagues from MPI also joined us: Fabian, Mauro and Dimitar.
Another rather interesting session that day was Recommender Systems, where I especially liked the “tagommenders” presentation by Shilad Sen, who would later chair the session where I presented my work.
On Thursday I arrived at the conference venue just in time for my presentation, which was scheduled 2nd in the first session of the morning, at 12pm. More about it in this entry. Finishing the presentation felt quite good, and I was able to finally relax after some days of stress.
The afternoon sessions were very interesting. Right after lunch I went to the Media Applications session, where Lyndon Kennedy presented a very original work on grouping together videos captured during the same event. I had been working on similar ideas but using video content analysis instead of audio, so I really enjoyed the presentation. For the final session of the day I chose Web Privacy, which turned out to be a very informative session about all the private data that can be inferred from our interaction with the internet. I especially enjoyed Elena Zheleva’s work on the exploitation of public profiles to know about private profiles.
After the sessions, we headed to the golf course nearby for the conference banquet. By that time of the day and the week I was already exhausted, and although dinner was good I couldn’t enjoy myself very much. As chance would have it, I sat at the same table as Marc Gauvin. He told me about his new project, a digital rights management system that sounded really interesting (and which I had the chance to see demoed the following day).
On Friday I arrived at the conference venue just in time to hear Ricardo Baeza’s keynote. It was a really interesting talk, where he reviewed some methods to exploit social data to enhance search. The most interesting aspect of the talk was the kind of social data they use. Instead of mining web 2.0 sites, they have enough information in their query logs to mine user community interactions. As for the sessions, I attended Web Monetization before lunch (not really my cup of tea), Query Categorization right after lunch (interesting talks about methods to better understand user queries) and finally went to the developers track to watch the ipos-ds presentation and demo (a good presentation of a really interesting service to allow management and licensing of content).
Overall, a top-quality conference that gave me the chance to meet many new researchers and learn about their work. Inspirational and educational at the same time. Next appointment: SIGIR!
Yesterday it was my turn to present our full paper accepted at this year’s edition of the ACM World Wide Web conference (WWW’09). Interest in the “Photos and Web 2.0” session was far beyond the calculations of the organizers, and the room allocated for the talks was simply too small for all the people that showed up. Many had to watch the first presentation of the session through the door while standing in the corridor. It was during the second presentation (mine) that they decided to remove the panels at the rear to merge with an empty room just behind. Though it was absolutely necessary, I simply do not understand why they decided to do it right in the middle of my talk. I lost my concentration completely and it was difficult to restart the talk.
Despite the difficulties, the presentation went alright and it led to a positive reaction from the audience, which asked a good number of interesting questions. The slides are available from the www2009 epapers website.
See you in Boston next July! SIGIR awaits!
WWW started today with the round of tutorials and workshops that commonly precede such large conferences. I met again with some of the people I’ve been running into in the last few conferences I’ve attended and used the chance to catch up with them. I also met with my former colleague Stefan Siersdorfer, co-author of the submission I got accepted into this conference.
In the morning session I attended the social computing tutorial conducted by Irwin King, which turned out to be rather interesting. This comprehensive review of the many different research areas around web communities had its highlight in the human computation section, when he spoke of the different Mechanical Turk-based applications for collecting difficult-to-obtain knowledge directly from humans. These applications are normally implemented as games and are massively used. There was also time for a good introduction to query suggestion techniques, as well as some remarks on privacy issues on the web. He finished the presentation with some thoughts about education and the web. Such an enormous source of information is transforming the ways students can learn about different subjects. Teachers should increasingly act as authorities who validate rigorous and reliable references to knowledge. Learning does not need to be a one-to-many activity anymore.
The afternoon was shorter as I had to work on my Thursday presentation. I partially attended the ‘learning to rank’ tutorial, which provided a quite interesting introduction to ranking mechanisms.
My submission to SIGIR 2009 has been accepted as a full paper. The article, entitled “Automatic Video Tagging using Content Redundancy”, proposes an interesting approach to the exploitation of redundant content in folksonomies. We consider the specific case of the leading video sharing website, YouTube. CBCR techniques are used to automatically detect duplication in the video collection, and several metadata propagation methods are proposed to spread community knowledge around the graph of resources.
The abstract of the paper follows below:
“The analysis of the leading social video sharing platform YouTube reveals a high amount of redundancy, in the form of videos with overlapping or duplicated content. In this paper, we show that this redundancy can provide useful information about connections between videos. We reveal these links using robust content-based video analysis techniques and exploit them for generating new tag assignments. To this end, we propose different tag propagation methods for automatically obtaining richer video annotations. Our techniques provide the user with additional information about videos, and lead to enhanced feature representations for applications such as automatic data organization and search. Experiments on video clustering and classification as well as a user evaluation demonstrate the viability of our approach.”
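As a rough illustration of what tag propagation over such links can look like, here is a small Python sketch. It is a simplified stand-in and not one of the propagation methods evaluated in the paper: the function `propagate_tags`, the `min_votes` threshold and the example videos are all hypothetical, and the links between overlapping videos are assumed to be given already (in the paper they come from content-based video analysis).

```python
from collections import Counter


def propagate_tags(tags, links, min_votes=1):
    """Suggest new tags for each video from the videos it is linked to.

    tags:  dict mapping video id -> set of existing tags
    links: iterable of (video_a, video_b) pairs flagged as overlapping
    """
    # Build an undirected neighbourhood map from the overlap links.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    suggestions = {}
    for video, own_tags in tags.items():
        # Each linked video "votes" once for every tag it carries.
        votes = Counter(
            tag
            for other in neighbours.get(video, ())
            for tag in tags.get(other, ())
        )
        suggestions[video] = {
            tag for tag, count in votes.items()
            if count >= min_votes and tag not in own_tags
        }
    return suggestions


# Made-up example data: v1 and v2 share content, v3 is unrelated.
example_tags = {
    "v1": {"goal", "football"},
    "v2": {"football", "messi"},
    "v3": {"cat"},
}
example_links = [("v1", "v2")]
print(propagate_tags(example_tags, example_links))
```

Raising `min_votes` makes the sketch more conservative, since a tag is then only suggested when several overlapping videos agree on it.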
My joint work with Stefan Siersdorfer, “Ranking and Classifying Attractiveness of Photos in Folksonomies”, has been accepted as a full paper in the next edition of the ACM Conference on the World Wide Web (WWW’09).
In this paper, we propose a novel methodology to derive a metric of photo attractiveness in a completely automatic manner, taking advantage of user-generated data in Flickr (namely metadata and user feedback statistics) as well as visual features of the images. The paper can be downloaded from the conference site using the following link. Slides for the presentation will soon be available from the same site.
Find below the abstract of the submission:
“Web 2.0 applications like Flickr, YouTube, or Del.icio.us are increasingly popular online communities for creating, editing and sharing content. The growing size of these folksonomies poses new challenges in terms of search and data mining. In this paper we introduce a novel methodology for automatically ranking and classifying photos according to their attractiveness for folksonomy members. To this end, we exploit image features known for having significant effects on the visual quality perceived by humans (e.g. sharpness and colorfulness) as well as textual meta data, in what is a multi-modal approach. Using feedback and annotations available in the Web 2.0 photo sharing system Flickr, we assign relevance values to the photos and train classification and regression models based on these relevance assignments. With the resulting machine learning models we categorize and rank photos according to their attractiveness. Applications include enhanced ranking functions for search and recommender methods for attractive content. Large scale experiments on a collection of Flickr photos demonstrate the viability of our approach.”