
Rest In Peace, Steve Jobs

This is one of those moments that, despite being expected for a long time, ends up really affecting you when it finally arrives. Steve Jobs passed away today. I am not the kind of person who likes to exalt individual characters. Behind the character there is always a group of people that helps realize their dreams. I am not even a long-term Mac advocate, nor a fanboy. I started using the Mac back in 2002, when OS X was mature enough and I was getting increasingly tired of the other OS alternatives. And I will of course change again when the alternatives catch up and surpass what Apple provides nowadays.

 

Despite it all, reading today about Steve Jobs’ death felt deeply painful. I guess you end up sympathizing with such charismatic personalities. I remember how, when I was younger, I would wait impatiently for his next keynote, download the video and watch it over and over. Yes, I must admit I was trapped in Jobs’ Reality Distortion Field for years.

 

Looking back, I think there are two main reasons why today’s news about Jobs is having such a significant impact on me. The first goes back to 2002, when I bought my first Mac, an iBook G3 800 MHz, and used OS X for the first time. That was when I discovered there were different and better ways to do things, specifically in the computing field. There is always room for improvement, for innovation. It was a completely new user experience, the mix of a Unix heart and, at last, a productive graphical interface environment. It also made me feel somewhat ‘unique’. I don’t have the numbers, but there were probably more Linux than Mac users at that time. Funny how Macs are now the most popular computers, at least in my area of scientific research.

 

The second reason is the well-known speech for Stanford graduates he gave some time after he was diagnosed with pancreatic cancer. It is a transcendent speech, about death, about goals, about dreams. The speech of somebody who has looked death in the eye. He was able to convey the insignificance of our lives, and the necessity to pursue our dreams and follow our instincts.

 

I guess that watching him pass away makes me realize how far I am from realizing mine…

 

Thanks Steve Jobs. Rest in peace.

Book Review: 21 Recipes for Mining Twitter by Matthew A. Russell

This book provides readers with a fairly comprehensive introduction to extracting and analyzing information from Twitter. While the reader is expected to be somewhat familiar with the different Twitter APIs, the author does a fantastic job of presenting strategies for crawling and mining data using Python and some additional, freely available third-party libraries.

 

The three main aspects that I loved about this little gem were:

 

  • The author does a great job of highlighting the main limitations of Twitter’s API (e.g. maximum number of requests for each API call) and its bugs (e.g. user ids being different in the ‘/search’ API). Solutions, in the form of functional code, are given. This information can save literally hours of debugging code or of waiting for Twitter to lift the restrictions it applies once you go beyond some of the system’s limits.

 

  • All the code, available for free from the author’s github.com account, is very well conceived, illustrative and most of the time can be used directly from the command line to perform simple tasks with Twitter’s data.

 

  • Lots of 3rd party libraries and tools (e.g. CouchDB, Redis, Protovis, etc.) are introduced to the reader, and used in appropriate contexts. That is, when they actually make the code easier to read, or simply more flexible in terms of scalability. I’ve learned quite a few tricks that are changing the way I work with data (and not just twitter data).

 

On the other hand, I really missed a short introduction to the main Twitter APIs. It’s confusing to read about “statuses” or “timelines” without a prior formal definition. It took me quite some time to distill the appropriate information from Twitter’s developer documentation.

 

A must read if you are planning to work with Twitter data.

Book Review: 25 Recipes for Getting Started with R by Paul Teetor

This book offers a fantastic first glance at the R language and its possibilities for statistical data analysis. The book is targeted at the new R programmer, and assumes just a basic knowledge of statistics. Easy to read, it provides readers with the basics to start working with this exciting language. Its main disadvantage: it is just too short and leaves the reader wanting more.

 

As a complete newbie to the R language, I have found the first recipes describing data input extremely helpful. These by themselves are worth the price of the book. R allows for quite a wide range of input formats as well as parsing options. The author gives illustrative examples for the functions and parameters used to import the most common file formats (e.g. csv). It seems obvious after reading these pages that finding this information in the R reference manual would have taken hours instead of minutes.

 

The second part of the book describes data representation within the R environment, namely vectors and data frames. This part is also very useful for the new R programmer: knowing the native R data types helps to understand how the statistical methods function.

 

The book ends with recipes describing basic statistical functions. While the first few examples are illustrative and helpful for just about every new R programmer, I find that the last examples are way too specific and not as helpful. This leaves the reader with mixed feelings about the book: for such a short book, every single line should be meaningful to every reader.

 

Buy the book if you are new to the language and want to start using it and getting results in a matter of minutes, literally. Don’t buy it if you know your way around the language. If you are not in a rush, I would recommend alternative and more comprehensive readings, such as “R in a Nutshell” by Joseph Adler.

RAEplus reaches 1000 apps sold!

According to the Apple App Store statistics, we sold our 1000th copy of RAEplus just yesterday. As you know, RAEplus is MadIdeas‘ application to access the Normative Dictionary of the Spanish Language. After a somewhat slow start, the sales trend has been growing steadily allowing us to reach this milestone sooner than we expected.

Since its release, RAEplus has reached the top of the sales chart in the “Reference” section of the Spanish App Store, currently holding the 2nd position. It has also been among the 100 most popular paid apps (all categories) in the Spanish store for several weeks. It has collected a good amount of positive reviews and ratings.

Thanks a lot to all happy RAEplus owners!!

If you still don’t have it, get your copy today!

Full paper accepted into WWW’2010

My joint work with Stefan Siersdorfer “How useful are your comments? – Analyzing and Predicting YouTube Comments and Comment Ratings” has been accepted as a full paper in the next edition of ACM Conference on the World Wide Web (WWW’10).

Find below the abstract of the submission:

An analysis of the social video sharing platform YouTube reveals a high amount of community feedback through comments for published videos as well as through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos for which we analyzed dependencies between comments, views, comment ratings and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments.

RAEplus 1.0 released

It’s been a while (actually, quite a long time) since my partners at MADideas and I finished the development of RAEplus, our very first application for the iPhone and iPod Touch platforms.

RAEplus is an enhancement over previous attempts to provide an interface for accessing RAE (the normative dictionary of the Spanish language). The app includes a comprehensive database of Spanish terms, and actively helps users locate the words they want to look up in the dictionary. A dynamic list of suggested terms is updated in real time to provide completions for the word the user is typing. This approach saves users many keystrokes, and therefore lets them look up words far more efficiently.
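For the curious, prefix completion of this kind can be sketched in a few lines of Python. This is only an illustration, not the code RAEplus actually runs: the function names `build_index` and `suggest`, the tiny word list, and the choice of a sorted list with binary search are all assumptions made for the example.

```python
from bisect import bisect_left


def build_index(words):
    """Return a sorted, de-duplicated, lowercased word list (hypothetical helper)."""
    return sorted({w.lower() for w in words})


def suggest(index, prefix, limit=5):
    """Return up to `limit` words from the sorted index that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search finds the first word that is >= prefix; all completions
    # sit contiguously from that position onwards in a sorted list.
    start = bisect_left(index, prefix)
    results = []
    for word in index[start:start + limit]:
        if not word.startswith(prefix):
            break
        results.append(word)
    return results


# Toy word list, purely for illustration.
index = build_index(["casa", "casar", "caso", "castillo", "gato", "perro"])
print(suggest(index, "cas"))
print(suggest(index, "cast"))
```

Calling `suggest` again after every keystroke gives the effect of a list that narrows as the user types. A real dictionary app would also have to deal with accents and ranking, which this sketch ignores.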

The app also integrates a thesaurus of the Spanish language. The app synchronizes users’ search in both dictionaries, providing an intuitive and efficient way to change between these two modes.

Finally, the enclosed term database is used to select random words which are presented at the start page of the app. The user can look up any of these words or, alternatively, shake the device to get a new selection of terms. This is a great way to learn new terms and, ultimately, master the Spanish language.

The app is available from the iTunes Store in the following link.

Get it today and enjoy!

MobileMe (dotMac) mail is sent to the Spam folder by Gmail

For the last couple of days I have been running into a quite frustrating problem, which I think lies in Apple’s Mail client.

The symptoms are simple. All email sent from a dotMac account to a Gmail account is sent straight to the spam folder by Google’s filters. After some tests and some initial swear words aimed at Google’s filters, I’m starting to think that the problem might well be poor or defective mail software from Apple.

Facts:
1) Sending the email to several recipients seems to solve the problem (at least if one of them is CC’ed)
2) Sending the email from the Mobile Me web interface also seems to work alright
3) Aliases feature the exact same behavior

It seems there’s some buggy behavior somewhere in the client or in the mail delivery infrastructure which is causing perfectly valid messages to be considered spam. Maybe some faulty header field, perhaps a DNS problem with the SMTP server…

Meanwhile, remember to CC somebody (I normally do myself) to get your messages properly received!!

Update: Things seem to be back to normal again. All my messages are being properly received. Still, I’m getting comments from people who are still having trouble. A trick that was working fine for me in the meantime was using the @me.com extension to send the emails. Try this and comment back on the results!

Summary of WWW’09 conference

After a quite intensive week, WWW has concluded and it seems a good time to look back and summarize the main activities and experiences. Though it was my intention to keep a daily log of the events, I had much less time than I expected. So, let this late entry replace the originally planned ones ;)

The conference began with a series of tutorials and workshops on both Monday and Tuesday. My schedule for Tuesday included only the workshop on content analysis in web 2.0 sites. The morning session was quite interesting, and included a really inspirational talk by Hugo Zaragoza where he described his work on mining the information available in web 2.0 sites to produce semantically formatted web content. Due to work obligations I had to miss the afternoon session of the workshop. However, it was a productive afternoon where I was able to discuss potential future work with Stefan Siersdorfer in order to continue the fruitful collaboration that began during our postdoc in Sheffield.

Wednesday was quite a long day. The main conference started that day, and the organizers had arranged for the prince of Spain to give a speech that very morning. The presence of media and the special security measures turned the conference venue into a not very scientific show. Fortunately, Tim Berners-Lee also gave a speech that morning, and that helped us forget about less interesting political aspects.

On that day I met lots of new fellow researchers. I had already met on previous days with colleagues from Telefonica I+D (Xavier Amatriain and Pere Obrador). I also met additional members of the research lab that Telefonica has in Barcelona, including Joachim Neumann, Karen Church and Josep M. Pujol. I had the chance to meet a former intern of the same lab, Meeyoung Cha, now working as a postdoc at the Max Planck Institute. She gave a superb presentation on that day and later joined us to experience Madrid’s night life. Some other colleagues from MPI also joined us: Fabian, Mauro and Dimitar.

Another rather interesting session that day was Recommender Systems, where I especially liked the “tagommenders” presentation by Shilad Sen, who would later chair the session where I presented my work.

On Thursday I arrived at the conference venue just in time for my presentation, which was scheduled 2nd in the first session of the morning, at 12pm. More about it in this entry. Finishing the presentation felt quite good, and I was able to finally relax after some days of stress.

The afternoon sessions were very interesting. Right after lunch I went to the Media Applications session, where Lyndon Kennedy presented a very original work on grouping together videos captured during the same event. I had been working on similar ideas but using video content analysis instead of audio, so I really enjoyed the presentation. For the final session of the day I chose Web Privacy, which turned out to be a very informative session about all the private data that can be inferred from our interaction with the internet. I especially enjoyed Elena Zheleva’s work on the exploitation of public profiles to know about private profiles.

After the sessions, we headed to the golf course nearby for the conference banquet. By that time of the day and the week I was already exhausted, and although dinner was good I couldn’t enjoy myself very much. As chance would have it, I sat at the same table as Marc Gauvin. He told me about his new project, a digital rights management system that sounded really interesting (and which I had the chance to see demoed the following day).

On Friday I arrived at the conference venue just in time to hear Ricardo Baeza’s keynote. It was a really interesting talk, where he reviewed some methods to exploit social data to enhance search. The most interesting aspect of the talk was the kind of social data they use. Instead of mining web 2.0 sites, they have enough information in their query logs to mine user community interactions. As for the sessions, I attended Web Monetization before lunch (not really my cup of tea), Query Categorization right after lunch (interesting talks about methods to better understand user queries) and finally went to the developers track to watch the ipos-ds presentation and demo (a good presentation of a really interesting service to allow management and licensing of content).

Overall, a top-quality conference that gave me the chance of meeting many new researchers and research works. Inspirational and educational at the same time. Next appointment: SIGIR!

Presentation at WWW’09

Yesterday it was my turn to present our full paper accepted into this year’s edition of the ACM World Wide Web conference (WWW’09). Interest in the “Photos and Web 2.0” session far exceeded the organizers’ estimates, and the room allocated for the talks was simply too small for all the people that showed up. Many had to watch the first presentation of the session through the door while standing in the corridor. It was during the second presentation (mine) that they decided to remove the panels at the rear to merge with an empty room just behind. Though it was absolutely necessary, I simply do not understand why they decided to do it right in the middle of my talk. I lost my concentration completely and it was difficult to pick the talk up again.
Despite the difficulties, the presentation went alright and it led to a positive reaction from the audience, which asked a good number of interesting questions. The slides are available from the www2009 epapers website.

See you in Boston next July! SIGIR awaits!

WWW 2009 – Day 1

WWW started today with the round of tutorials and workshops that commonly precede such large conferences. I met again with some of the people I’ve been running into in the last few conferences I’ve attended and used the chance to get an update from them. I also met with my former colleague Stefan Siersdorfer, co-author of the submission I got accepted into this conference.

In the morning session I attended the social computing tutorial conducted by Irwin King, which turned out to be rather interesting. This comprehensive review of the many different research areas around web communities had its highlight in the human computation section, when he spoke of the different Mechanical Turk-based applications for collecting difficult-to-obtain knowledge directly from humans. These applications are normally implemented as games and are massively used. There was also time for a good introduction to query suggestion techniques, as well as some remarks on privacy issues on the web. He finished the presentation with some thoughts about education and the web. Such an enormous source of information is transforming the ways students can learn about the different subjects. Teachers should increasingly act as authorities who validate rigorous and reliable references to knowledge. Learning does not need to be a one-to-many activity anymore.

The afternoon was shorter as I had to work on my Thursday presentation. I partially attended the ‘learning to rank’ tutorial, which provided a quite interesting introduction to ranking mechanisms.

Full paper accepted into SIGIR’09!

My submission to SIGIR 2009 has been accepted as a full paper. The article, entitled “Automatic Video Tagging using Content Redundancy”, proposes an interesting approach to the exploitation of redundant content in folksonomies. We consider the specific case of the leading video sharing website, YouTube. CBCR techniques are used to automatically detect duplication in the video collection, and several metadata propagation methods are proposed to spread community knowledge around the graph of resources.

The abstract of the paper follows below:

“The analysis of the leading social video sharing platform YouTube reveals a high amount of redundancy, in the form of videos with overlapping or duplicated content. In this paper, we show that this redundancy can provide useful information about connections between videos. We reveal these links using robust content-based video analysis techniques and exploit them for generating new tag assignments. To this end, we propose different tag propagation methods for automatically obtaining richer video annotations. Our techniques provide the user with additional information about videos, and lead to enhanced feature representations for applications such as automatic data organization and search. Experiments on video clustering and classification as well as a user evaluation demonstrate the viability of our approach.”
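
To give a rough feeling for what “tag propagation” over duplicate links means, here is a toy sketch in Python. It is not one of the methods from the paper: the function name `propagate_tags`, the simple voting rule with a `min_votes` threshold and the example data are all assumptions made for illustration. It also takes the duplicate pairs as given, whereas in the paper they come from content-based video analysis.

```python
from collections import Counter


def propagate_tags(tags, duplicates, min_votes=1):
    """Toy neighbour-vote propagation (an assumption, not the paper's method).

    tags: dict mapping video id -> set of tags
    duplicates: iterable of (video_a, video_b) pairs with overlapping content
    A video receives a tag when at least `min_votes` linked videos carry it.
    """
    # Build an undirected graph of overlapping videos.
    neighbours = {video: set() for video in tags}
    for a, b in duplicates:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    enriched = {}
    for video, linked in neighbours.items():
        # Count how many linked videos carry each tag.
        votes = Counter(tag for other in linked for tag in tags.get(other, set()))
        new_tags = {tag for tag, count in votes.items() if count >= min_votes}
        enriched[video] = set(tags.get(video, set())) | new_tags
    return enriched


# Made-up example: v3 has no tags but overlaps with v1 and v2.
tags = {"v1": {"goal", "football"}, "v2": {"football", "messi"}, "v3": set()}
duplicates = [("v1", "v3"), ("v2", "v3")]
print(propagate_tags(tags, duplicates, min_votes=2)["v3"])
```

With `min_votes=2` the untagged video only inherits the tag that both of its neighbours agree on. Lowering the threshold to 1 spreads every neighbouring tag, which gives richer but noisier annotations.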

Full paper accepted into WWW’09

My joint work with Stefan Siersdorfer “Ranking and Classifying Attractiveness of Photos in Folksonomies” has been accepted as a full paper in the next edition of ACM Conference on the World Wide Web (WWW’09).

In this paper, we propose a novel methodology to derive a metric of photo attractiveness in a completely automatic manner, taking advantage of user-generated data in Flickr (namely metadata and user feedback statistics) as well as image visual features. The paper can be downloaded from the conference site using the following link. Slides for the presentation will soon be available from the same site.

Find below the abstract of the submission:

“Web 2.0 applications like Flickr, YouTube, or Del.icio.us are increasingly popular online communities for creating, editing and sharing content. The growing size of these folksonomies poses new challenges in terms of search and data mining. In this paper we introduce a novel methodology for automatically ranking and classifying photos according to their attractiveness for folksonomy members. To this end, we exploit image features known for having significant effects on the visual quality perceived by humans (e.g. sharpness and colorfulness) as well as textual meta data, in what is a multi-modal approach. Using feedback and annotations available in the Web 2.0 photo sharing system Flickr, we assign relevance values to the photos and train classification and regression models based on these relevance assignments. With the resulting machine learning models we categorize and rank photos according to their attractiveness. Applications include enhanced ranking functions for search and recommender methods for attractive content. Large scale experiments on a collection of Flickr photos demonstrate the viability of our approach.”