This book provides readers with a fairly comprehensive introduction to extracting and analyzing information from Twitter. While the reader is expected to be somewhat familiar with the different Twitter APIs, the author does a fantastic job of presenting strategies for crawling and mining data using Python and a few freely available third-party libraries.
The three aspects I loved most about this little gem were:
- The author does a great job of highlighting Twitter's main API limitations (e.g. the maximum number of requests allowed for each API call) and bugs (e.g. user IDs differing in the ‘/search’ API), and provides solutions in the form of working code. This information can literally save hours of debugging, or of waiting for Twitter to lift the restrictions it imposes once you exceed its limits.
- All the code, available for free from the author’s GitHub account, is well conceived and illustrative, and most of it can be run directly from the command line to perform simple tasks with Twitter data.
- Many third-party libraries and tools (e.g. CouchDB, Redis, Protovis) are introduced and used in appropriate contexts, that is, where they actually make the code easier to read or more flexible in terms of scalability. I’ve learned quite a few tricks that are changing the way I work with data (and not just Twitter data).
On the other hand, I really missed a short introduction to the main Twitter APIs. It’s confusing to read about “statuses” or “timelines” without a prior formal definition, and it took me quite some time to distill the relevant information from Twitter’s developer documentation.
A must-read if you are planning to work with Twitter data.