Thursday, October 23, 2008

We Got Data

There are lots of ways to get data into and out of Twitter. So many, in fact, that it's become a bit confusing as to what a developer's options are. We'd like to clear that up.

REST API

If you want to interact with Twitter on behalf of an individual user or a small group of users, your best bet is our REST API. It's perfect for making updates, retrieving timelines of tweets, marking tweets as favorites, and so forth — most any feature you'll find on twitter.com has a corresponding method in the REST API. Desktop applications like Twitterrific and Twhirl use our REST API, as do plenty of web-based and mobile device applications, widgets, and scripts.
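
To make that concrete, here's a minimal sketch in Python of what a couple of REST API calls can look like. The endpoint paths, parameter names, and HTTP Basic authentication shown here are assumptions for illustration only; our API documentation is the authority on the real method names and parameters.

    # A minimal sketch of two common REST API interactions: reading a
    # user's timeline and posting an update. The endpoint paths, the
    # .json response format, and Basic authentication are illustrative
    # assumptions -- check the API docs for the authoritative details.
    import base64
    import json
    import urllib.parse
    import urllib.request

    BASE = "http://twitter.com"

    def user_timeline(screen_name, count=5):
        # Fetch a user's recent public tweets as parsed JSON.
        url = "%s/statuses/user_timeline/%s.json?count=%d" % (BASE, screen_name, count)
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def update_status(username, password, text):
        # Post a status update on behalf of an authenticated user.
        data = urllib.parse.urlencode({"status": text}).encode("utf-8")
        req = urllib.request.Request(BASE + "/statuses/update.json", data=data)
        token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # "twitterapi" is just a placeholder screen name.
    for tweet in user_timeline("twitterapi"):
        print(tweet["text"])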

Search API

If you want to programmatically retrieve tweets about a particular topic, check out our Search API. We provide a lot of flexibility to support a variety of queries. You can filter by criteria like location, hashtag, dates, language, and more. We're seeing more great applications powered by Twitter Search every day.
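
Here's a comparably small sketch of what a Search API query can look like in Python. The search.twitter.com/search.json endpoint and the q and lang parameter names are assumptions for illustration; the Search API documentation has the definitive list of supported criteria.

    # A small sketch of searching for tweets about a topic. The endpoint
    # URL, the q and lang parameters, and the shape of the JSON response
    # are illustrative assumptions -- see the Search API docs.
    import json
    import urllib.parse
    import urllib.request

    def search(query, lang="en"):
        params = urllib.parse.urlencode({"q": query, "lang": lang})
        url = "http://search.twitter.com/search.json?" + params
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    # "#election" is just a placeholder query.
    for result in search("#election").get("results", []):
        print(result["from_user"], ":", result["text"])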

Data Mining Feed

If you're interested in doing research on the Twitter community, we provide a data mining feed to meet that need. The feed contains 600 recent public tweets, cached for a minute at a time. Academics and tinkerers alike are making use of the data mining feed, and we've already seen interesting research papers and statistics gathered from it.

Pinging Service

If you need to know when a large number of Twitter users update, check out Gnip. They'll ping you via REST or XMPP whenever the users you're interested in tweet. This works great for social sites integrating Twitter. We're talking with Gnip about providing full data to applications that need to keep up with a large number of Twitter users. This solution isn't ready just yet, but we'll keep you updated.

The Proverbial "Firehose"

Finally, we understand that some applications need the entire stream of tweets from non-protected users to work best. We provided this stream on an experimental basis some months ago, but had to limit its distribution to just a few subscribers while we worked on technical hurdles. We've looked at third-party solutions for relaying the full stream, but we think we can provide the best developer experience ourselves.

I'm happy to say that we've staffed a project with three engineers (myself included) to find the best solution for distributing what we jokingly refer to as our "firehose" of tweets. We intend to have a solution as soon as possible, as we know that some of you have applications at the ready that depend on this stream. In addition to sorting out the technology to be reliable and scalable, we'll provide a clear license agreement and approval process. We know there's a lot of interest in the "firehose" and we're eager to provide the best solution possible.

Staying in the Loop

We hope that clears up some of the confusion around the data we make available and how to get it. I'll be blogging soon about where we're at with the next version of our REST and Search APIs. You can always get help and share your experience in the Twitter Development Talk group.

Friday, September 26, 2008

On Specialization

It's been a while since we last shared what's going on with our engineering efforts here at Twitter.  In the past few months since our acquisition of Summize, we've grown large enough to be able to specialize.  More than just an engineering team and an operations team, the technical part of Twitter now has teams for search, user experience, back-end services, and so forth.  Your humble author is heading up the API team.

This specialization is letting us do great things.  The UX team has launched a well-received redesign and has a number of performance and usability improvements in store for the site.  The search team has been fighting spam and working with operations to meet the growing demands on search.twitter.com.  Engineers working on back-end services are getting ready to test a major overhaul of how we store and deliver timelines of tweets.  On the API side, we've just completed a strict, robust test suite for the API that will allow us to make major changes and deliver our next big release with confidence.

When I started at Twitter in early 2007, everyone in the company had their hands in everything, as is typical of early-stage startups.  As exciting as this was, I don't think we could have delivered something like the just-launched election.twitter.com without our current degree of specialization.  For every part of the complex system behind Twitter, we now have multiple domain experts.  When taking on new projects, we can quickly find the expertise we need and execute.

It may sound like typical company stuff, but I've never seen us work together better as a technical organization than we do now.  We've got great things in store for our users and developers.

Wednesday, August 13, 2008

Status of the World - More than just human updates

First, let me introduce myself: I’m Abdur Chowdhury, one of the founders of Summize and now Chief Scientist at Twitter. In my quest to understand the Twitter data repository, I’ve had some interesting revelations, and I’ll post them here from time to time. My first revelation is that Twitter isn’t just for humans anymore. Twitter’s original question, "What are you doing?", now seems to be catching on with some machines. This is an important addition to the communication channel, as it allows people not only to know what their friends are doing but also to get the status of inanimate things in the world around us.

In this post, I’ll share some of my favorite examples of this trend, like the Chandra X-Ray Observatory, which posts its location as it circles the globe every 20 minutes or so.


While knowing where a satellite is may be interesting, some more immediately useful examples include Red Jet’s automatic posting of the arrivals and departures of their ferries. Sydney, Australia posts traffic issues for the entire city and further subdivides that status into sections of the city, like Sydney South West. Sydney is not alone; other cities from Bangkok to Orlando are starting to do the same.



Ever wondered what that song you just heard on the radio or at a bar was? Many bars and radio stations are now automatically posting the songs they are playing; The Internet Radio and the Belgian station Radioo are great examples of this trend. Want to know what games your friends are playing? Xbox can now post its status for others to track. One last note of interest: you may not want all of these sensors in your timeline, so try using the GET command on Twitter instead. It allows you to get the last status of an account, for example: "GET InternetRadio".
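
(If you'd rather do the same thing programmatically, the REST API works as well. The sketch below is illustrative only; it assumes a user_timeline method that accepts a count parameter, so check the API documentation for the real names.)

    # Illustrative only: fetching an account's most recent status through
    # the REST API rather than the "GET" text command. The endpoint path
    # and the count parameter are assumptions.
    import json
    import urllib.request

    url = "http://twitter.com/statuses/user_timeline/InternetRadio.json?count=1"
    with urllib.request.urlopen(url) as resp:
        latest = json.loads(resp.read().decode("utf-8"))[0]
    print(latest["text"])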


While getting the status of satellites, traffic, or radio stations is interesting, it’s nothing compared to the creativity of a group of students at Olin College, who hooked their laundry room up to Twitter to get the status of their washers and dryers.


All of these things are natural uses of Twitter’s platform and they illustrate its flexibility and potential. I’m very intrigued to see how people will continue to wire machines and humans into Twitter’s Global Conversation.

Tuesday, July 15, 2008

Summize and The Future of the Twitter API

Earlier today we announced that Twitter has acquired Summize, the leading Twitter search service. I'd like to explain how this change will impact the Twitter developer community.

Summize — now search.twitter.com — has its own API that a number of applications are already using. Over at the Twitter Development Talk group I've been encouraging developers with search and data-mining needs to investigate the Summize API. Developers can now trust that Summize will have the entire corpus of public Twitter updates at its disposal for search going forward.

For the time being, the Summize API will live at search.twitter.com as a separate entity from the main Twitter API. We'll try to get the two APIs as coordinated as possible in the short term, but merging Summize's architecture with ours will be a gradual process. That said, we have a clear opportunity for cleanly merging the two APIs.

The Twitter API has much room for improvement. As it's grown over the last year and a half, our API has become increasingly less RESTful. Resource names need to be synced with the nomenclature we use on the site—for example, the API refers to "friends," a concept we've long since left behind. Additionally, our API is due to support OAuth token-based authentication, and consistency between method responses is also an issue. We have lots of work to do.

Our plan is to extract the API from our existing application, and in the process to clean up confusing methods, fix inconsistencies, and add features. As part of that work we'll integrate the Search API. The integrated API will live at api.twitter.com. We'll provide a comfortable migration period. Additionally, we'll start versioning our API releases to make future transitions even easier.

As we get closer to making these changes, we'll open up a community feedback process. The Twitter API is only as useful as its developer community is creative, and we want to meet your needs. In the meantime, here are answers to some of the questions from developers about the acquisition that I received today:

What kinds of new methods will you be adding to the API?

Well, search-related methods, as you might guess. We're embracing the concepts of searching and filtering statuses, and we have a bunch of ideas about how that's going to fit into Twitter. Expect the ability to filter many API responses by provided search terms; for example, an API method that can answer the question, "What are the people I follow saying about 'iPhone 3G'?"

What about the different formats for API responses?

Right now, the Twitter API supports XML and JSON for all methods, and RSS and Atom for those methods that return lists of statuses. The Twitter Search API (formerly the Summize API) supports only Atom and JSON. There are also discrepancies between the set of attributes the two APIs return. We'll try to rectify this in the short term so libraries can be updated to take advantage of both APIs in the same way.
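
As a rough illustration of where things stand today, here's the same query fetched in the two formats the Search API currently speaks. The URL suffix convention below is an assumption for illustration; the documentation is authoritative.

    # Illustrative: one query requested as JSON and as Atom. The
    # search.<format> suffix convention is an assumption.
    import urllib.request

    for fmt in ("json", "atom"):
        url = "http://search.twitter.com/search.%s?q=twitter" % fmt
        with urllib.request.urlopen(url) as resp:
            # Print just the first 80 bytes of each response body.
            print(fmt, "->", resp.read()[:80])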

Will you maintain backwards compatibility? (That is to say, will old apps break?)

We'll absolutely try to maintain backwards compatibility for both APIs for a reasonable duration.

What about your "track" feature?

Some developers were using track and Jabber to replicate much of what the Twitter Search API can do. Until Jabber and track are back, we suggest you give the Search API a shot instead - heck, give it a shot even if your application doesn't need real-time tracking! As for the user-facing track feature, we're eager to bring it back better than ever, powered by Summize's excellent search technology.

Will Summize help the core team focus on getting IM/Jabber/Google Talk/AIM back up and running?

It's definitely going to be handy to have even more talented engineers on our team! The Summize folks won't be working on our IM infrastructure in the short term, but we'll have AIM and Jabber back up as soon as we can (yes, AIM is coming back!).

How will the transition from the old API to the new API be handled?

See above. In short, we're going to be overhauling the Twitter API and integrating the Summize API in the process. We won't shut any existing API methods down without plenty of advance notice.

Will the Search API do arbitrarily deep/old searches?

That's a goal, definitely, and we'll have to figure out what we can reasonably support. We recognize that while much of the value of Twitter is in recent updates, diving into the corpus of historical updates opens up a number of interesting applications.

Can the client provide a max_ret_count? Or make repeated calls with before_id param?

Check out the Twitter Search API documentation, specifically the rpp and since_id parameters. We'll try to get those parameters in sync with the nomenclature used in the main Twitter API.
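
To make that concrete, here's a rough sketch in Python of polling with those parameters. The endpoint URL and the max_id field in the response are assumptions for illustration; verify both against the Search API documentation.

    # A rough sketch of polling the Search API with the rpp and since_id
    # parameters mentioned above. The endpoint URL and the max_id response
    # field are assumptions -- check the Search API docs.
    import json
    import time
    import urllib.parse
    import urllib.request

    def poll(query, interval=60):
        since_id = 0
        while True:
            params = {"q": query, "rpp": 100}
            if since_id:
                # Only ask for tweets newer than the last batch we saw.
                params["since_id"] = since_id
            url = "http://search.twitter.com/search.json?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                data = json.loads(resp.read().decode("utf-8"))
            for result in data.get("results", []):
                print(result["text"])
            since_id = data.get("max_id", since_id)
            time.sleep(interval)

    poll("twitter api")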

The Summize/Twitter Search API doesn't currently have rate limits. Will it?

At some point, yes; you won't find a highly-used API anywhere that doesn't have rate limits. We're still sorting out the particulars, though, as we recognize that the Search API has different usage patterns than the main API.

Will status deletions be correctly deleted from Summize's index?

Yup, that's in the pipeline.

Thursday, May 29, 2008

You've Got Q's, We've Got A's

We had a lot of feedback on our architecture update. In the spirit of continued openness and transparency, I'd like to address a number of the questions that came up in the comments on that post.

Donnie asks if we're making a slow exodus from Ruby, and Scabr asks if our key problems were Ruby problems.  We've got a ton of code in Ruby, and we'll continue to develop in Ruby with Rails for our front-end work for some time.  There's plenty in our system that Ruby is a great fit for, and other places where different languages and technologies serve us better.  Our key problems have been primarily architectural, along with growing our infrastructure to keep up with our growth.  Working in Ruby has been, in our experience, a trade-off between developer speed/productivity and VM speed/instrumentation/visibility.

RBL asks how we're instrumenting our application. We've used DTrace on a couple of occasions, but for the most part we do a lot of print-style logging to a combined syslog. We also recently added Evan Weaver to our team, and he's been rapidly evolving his BleakHouse memory leak profiling tool for use in our stack.

charles asks if there's anything users can do to lighten our load. The events that hit our system the hardest are generally when "popular" users - that is, users with large numbers of followers and people they're following - perform a number of actions in rapid succession. This usually results in a number of big queries that pile up in our database(s). Not running scripts to follow thousands of users at a time would be a help, but that's behavior we have to limit on our side.

thisisgoingtobebig asks if Twitter's really been down. It has indeed, and you can see how we're doing on our Pingdom Uptime Report and the new Twitter Status site. We have a number of internal monitoring tools and graphs we watch closely as well. We're painfully aware of every minute that we're slow or unavailable.

william sharp asks why Twitter was written as a content management system and not a messaging system. I was speaking in broad strokes when I described our current system as such, but was alluding to the fact that Rails was originally extracted from Basecamp, a CMS. Rails excels at CRUD-style applications, and while you can wedge a messaging system or other types of applications into that model, it's a square peg in a round hole.

In the same vein, Nicholas asks what we thought we were building from day one if not a messaging system. I wasn't at Twitter from day one, but my understanding is that Twitter started as a one-day project to explore sharing status via SMS that rapidly took on a life of its own. That Twitter would eventually evolve into a messaging system in its own right wasn't conceptualized from the get-go.

Chris Kilmer and Tembrooke both ask if putting some limits on what users can do in our system would help, and they're both right. We have some limits, and we're adding more. Legitimate users should never notice them, but these new limits should help mitigate the worst case failures and attacks.

Michael is confused as to why we don't have "an army of geniuses working day and night". We'd love to, but it's easier said than done. We interview constantly and have a talented recruiter bringing us exceptional candidates daily. We're careful about who we hire, though, because we're trying to build a great team in a sustainable way. We're currently exploring supplementing our team with consultants, and we've accepted strategic help from outside organizations who actually do have armies of geniuses.  If you'd like to lend your genius to our particular problems, have a look at our jobs page.

Lastly, Daniel Wabyick asks if I, personally, would use Rails again. I strongly believe that the best tool for the job is the best tool for the job. Rails is the best web application framework around for rapid prototyping and, as aforementioned, building CRUD-style applications. I would choose Rails again for such a project. That said, I'm constantly exploring new technologies, and I've also enjoyed working with Merb and Google App Engine for small projects recently.

Thursday, May 22, 2008

Twittering About Architecture

Here at Twitter HQ, we're not blind to the flurry of discussion over the past weeks about our architecture. For many of our technically-minded users, Twitter downtime is an opportunity to muse about what the source of our problems might be, and to propose creative solutions. I sympathize, as I clearly find our problems interesting enough to work on them every day.

Part of the impetus for this public discussion stems from the sense that Twitter isn't addressing our architectural flaws. When users see downtime, slowness, and instability of the sort that we've exhibited this week, they assume that our engineering progress must be stagnant. With the Twitter team working on these issues on and off for over a year, surely downtime should be a thing of the past by now, right? Shouldn't we be able to just "throw more machines at it"?

To both rhetorical questions, the answer is "not quite yet". We've made progress, and we're more scalable than we were a year ago, but we're not yet reliably horizontally scalable. Why? Because there are significant portions of our system that need to be rewritten to meet that goal.

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency's sake, Twitter was built with technologies and practices that are more appropriate to a content management system. Over the last year and a half we've tried to make our system behave like a messaging system as much as possible, but that's introduced a great deal of complexity and unpredictability. When we're in crisis mode, adding more instrumentation to help us navigate the web of interdependencies in our current architecture is often our primary recourse. This is, clearly, not optimal.

Our direction going forward is to replace our existing system, component-by-component, with parts that are designed from the ground up to meet the requirements that have emerged as Twitter has grown. First and foremost amongst those requirements is stability. We're planning for a gradual transition; our existing system will be maintained while new parts are built, and old parts swapped out for new as they're completed. The alternative - scrapping everything for "the big rewrite" - is untenable, particularly given our small (but growing!) engineering and operations team.

We keep an eye on the public discussions about what our architecture should be. Our favorite post from the community is by someone who's actually tried to build a service similar to Twitter. Many of the best practices in scalability are inapplicable to the peculiar problem space of social messaging. Many off-the-shelf technologies that seem like intuitive fits do not, on closer inspection, meet our needs. We appreciate the creativity that the technical community has offered up in thinking about our issues, but our issues won't be resolved in an afternoon's blogging.

We'd like people to know that we're motivated by the community discussion around our architecture. We're immersed in ideas about improving our system, and we have a clear direction forward that takes into account many of the bright suggestions that have emerged from the community.

To those taking the time to blog about our architecture, I encourage you to check out our jobs page. If you want to make Twitter better, there's no more direct way than getting involved in our engineering efforts. We love kicking around ideas, but code speaks louder than words.

Saturday, April 12, 2008

140 Character Scripts!

A long-time pioneer in the OSS world, Nat Friedman, recently put out a call for people to Take the Tweetable Script Challenge.

That is, to post interesting scripts in 140 characters or less.

He documented some of the results here: Ten Tweetable Scripts.
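
Just for fun, here's the flavor of thing that fits in a tweet: a Python one-liner of about 115 characters that dumps the latest public tweets. It assumes a public_timeline method in our REST API, so treat it as illustrative.

    python3 -c "import urllib.request as u;print(u.urlopen('http://twitter.com/statuses/public_timeline.json').read())"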

