Engineering Blog
The Classifier

print("Hello, TellApart!")

Introducing Wade Chambers, Vice President of Engineering at TellApart

“I’m a great believer in luck, and I find the harder I work the more I have of it.”

― Thomas Jefferson

I feel lucky.

I’ve been lucky to have gone through three IPOs and a major acquisition. I’ve been lucky to have worked with some of Silicon Valley’s elite and to have helped build technologies that hundreds of millions of people have used (and hopefully enjoyed).  I’ve been lucky to hire, mentor, and coach some of the best minds in the valley and watch them grow into amazing Venture Capitalists, CEOs, Directors and Vice Presidents of Product Management, Marketing, and Engineering.  I have been lucky to be a part of teams that have accomplished absolutely amazing things against strong odds.  I am thankful to have been a part of all of those things and more.  But today I feel lucky and humbled by the opportunity in front of me.

I am extremely excited to join TellApart as the Vice President of Engineering and look forward to playing a part in the next wave of scale for the company.  TellApart is a company that is already killing it, committed to doing it the right way, and at the center of what is technically hard but relevant.  As an executive and technologist, it doesn’t get much better.

It shouldn't be surprising that TellApart is doing so well.  TellApart's business model is completely aligned with their — correction, our — customers (and our customers' customers).  We only make money if our customers make money using our products.  To do so, TellApart has built a team of 'A' players who are making the cloud, Big Data, and machine learning all work together in a highly scalable, extremely high-performance platform that is capable of both understanding the sophisticated merchandising models of the world's best brands and the wants/needs of a motivated buyer.

What did come as a surprise to me is TellApart’s strong commitment to do the things necessary to be successful in a scalable and repeatable way (including growing the next generation of leaders and entrepreneurs in the Valley).  This forward thinking personally excites me as I would like to see the next generation of leaders make new mistakes, not repeat old ones.  Over the last 20+ years I have built models, patterns, anti-patterns, and best practices and then (thankfully) reduced the number of “must haves” down to the few (80/20 rule) that generate the greatest impact.

Of those, here are a couple that I feel are connected but often overlooked:

  • Impact over Completion.  You have to ship to win.  As a result, getting good at closing down a project is a necessary and valuable skill.  The problem occurs when shipping takes precedence over creating the desired impact with your customer base.  Impact is measured externally by your customers and involves them judging whether what you shipped actually met what they wanted and/or needed.  Completion, on the other hand, is measured internally by the team and is reached when the project's ship criteria are simply met.  Getting the team to own the desired impact is much harder and requires different skills inside the team, but it produces far superior results.  The team must interact with the customer(s) to test assumptions, theories, and approaches, and pivot until they reach the desired impact.  In my experience, teams that continuously make the desired impact part of their completion criteria grow faster and have much longer relationships with their customers than those who only focus on shipping what they think the customer wants.
  • Measurement over Opinion.  I have often heard that a person’s opinion is only as good as their level of expertise on a subject matter.  It resonates with me.  But it also means that I can’t weigh every opinion in the same way, as they are highly variable and largely formulated from prior experience.  Unless prior experience perfectly maps to the problem under consideration, it is most likely flawed in some way.  Measurement helps expose the noise and focus on signal.  I am highly suspicious of opinions (especially my own), and I look for ways to measure facts and leverage them instead.  It may take several paired measurements to get to the needed facts, but at least then you have something from which you can create a shared baseline.  Good facts generally make for better decisions … egos and emotions seem to dissolve in the face of the right facts.  Please don’t get me wrong, I personally love opinions—and I have plenty of emotions too … I just value facts more.  A lot more.

If these principles are already at work in your environment, you’re doing significantly better than most companies in the Valley.  If not … we are hiring. Want to be lucky too?

wade@tellapart.com

In the Nix of Time


Back during the 2011 holiday season I was speaking with Josh, TellApart's CEO, about a piece of custom hardware I had just built for the office — an "Easy Button" that launches new clients. Josh is an avid follower of what's hot at the intersection of design and technology (see 3D-printed lapel pins), and since we were on the topic of hardware, our conversation led to the resurgence of Nixie tubes for decorative displays. He wondered if there was some TellApart-themed way of using them, and offered to commission any interesting idea I could come up with.


IN-18 Nixie Tube

I didn't give the proposal too much thought at the time. Nixie tubes are an old technology, requiring high voltages and driver chips that fell out of production decades ago. It was also the middle of the holiday season — by far the busiest time of year for TellApart — so my attention turned to other projects. But the idea was lodged in the back of my head, where it sat percolating.

Many months later, I came across the newly launched Electric Imp — a hardware, software, and web service combination that can trivially connect small, cheap devices to the Internet. Something in my head clicked, and I knew what I had to build. I brought a proposal to Josh, who promptly handed me the corporate credit card, and I was off to work.


Testing with the Electric Imp hardware.

I knew what I wanted to build: an Internet-connected counter with 10 IN-18 Nixie tubes. There was a catch though: Josh wanted to show off the finished product at the 2012 TellApart holiday party, which gave me just 6 weeks to pull it all together.

IN-18 tubes have been out of production for more than 20 years, and were only made in the former USSR, making them somewhat hard to find. Things like connectors and driver circuits had to be custom-made. To make things harder, all the modern reference designs are for 1 to 6 tubes — driving 10 tubes takes more current and more I/O pins than these designs allow for. I raced to cobble together a solution, which ultimately involved an importer in Los Angeles, parts from a hobbyist in the UK, a pair of 0.5 mm-pitch 100-pin I/O expanders that pushed the limits of my soldering skill, and some custom parts from TAP Plastics in Mountain View.

Once I had the parts, assembly was straightforward, albeit extremely tedious. I didn’t have time for the back-and-forth needed to make a custom printed circuit board, so I did all the wiring by hand. All told, there are over 500 solder points or connector crimps, which ate into pretty much every night and weekend from the moment the pieces arrived until the unveiling.


So many wires

Fortunately I finished just in time, and the result turned out better than I had anticipated:

[Photos of the finished Nixie tube counter]

That just left filling out the software and tying into the TellApart real-time reporting service. I set up a stream of Ad Impression and Click data pointed at the Electric Imp servers. From there the data is forwarded to the counter over its Wi-Fi connection. Finally, I made some (literally) last-minute tweaks to the I2C driver to get the animation just right. The result is a real-time counter of global TellApart Ad Impressions, rendered in glass and neon, right in our lobby:

It was a huge hit at the party; a few people even wanted to know how they could get their own. It was also the center of attention at our PyCon booth and drew quite a crowd. There are a few improvements I’d like to make: adding an OLED character display to tell you what it’s counting, adding new transition effects, and possibly making a proper printed circuit board. For now though, I’m happy to just watch the impressions count up.

Kevin Ballard is a Software Engineer at TellApart. Also, we’re hiring!

 

Presenting commandr: an easy to use and automatic CLI builder for Python

Scripts are a big part of any code base, and if you’re like us, you have lots of them. Also, if you’re like us, you’re not too fond of writing Python optparse specifications. That’s why we created commandr, a tool for automatically converting a function’s signature into a command line interface, using just a decorator. We’ve found commandr so useful, we decided to open-source it!

commandr has a number of useful features:

  • It infers and casts types automatically from default values.
  • Documentation is automatically generated from each function’s signature and docstring.
  • Boolean parameters are automatically converted to flags (e.g. --caps or --no_caps).
  • Commands can be grouped, making long lists of commands easy to read.

Here’s an example usage:
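A minimal sketch of the pattern, assuming commandr's command decorator and Run() entry point (the command name, function, and parameters below are illustrative, not the exact example from the original release):

from commandr import command, Run

@command('greet')
def greet(name, title='Mr.', times=1, caps=False):
    """Print a greeting.

    Arguments:
      name - Name of the person to greet.
      title - Title to address them by.
      times - Number of times to repeat the greeting.
      caps - If set, shout the greeting.
    """
    message = 'Hi %s %s!' % (title, name)
    if caps:
        message = message.upper()
    for _ in range(times):
        print(message)

if __name__ == '__main__':
    Run()

Because times defaults to an int and caps to a bool, commandr can infer the argument types and turn caps into a --caps/--no_caps flag without any explicit optparse specification.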

From the command line:
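Invocation would then look something like this (flag names follow from the function signature above; the exact output formatting is illustrative):

$ python greet.py greet --name=TellApart --title=Ms. --times=2 --caps
HI MS. TELLAPART!
HI MS. TELLAPART!
$ python greet.py greet --help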

Internally, we’ve combined commandr with fabric to allow functions to be invoked from our local command lines and executed on remote servers — just by adding a decorator. (Unfortunately, that part was too tightly integrated with the rest of the TellApart stack to release it here).

You can get commandr right from PyPI:
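For example:

pip install commandr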

 

Hopefully, you’ll find this tool as useful as we do!

Kevin Ballard is a Software Engineer at TellApart.

How Tyra helped us train our user models in minutes rather than days

All of TellApart’s products are powered by a machine-learning pipeline that calculates Customer Quality Scores (CQS). This is our secret sauce. Without CQS we don’t know the value of a user and don’t know how much to bid on ads for that user. It’s what makes TellApart’s engine churn. But recently, we noticed that with our growing volume of data we were going to need to add more horsepower…


Modeling pipeline to calculate CQS

CQS are calculated in real-time as users navigate the web and allow our algorithms to make decisions ranging from bidding on display advertisements to selecting what content we ultimately serve. The two main challenges in calculating these scores are:

  1. Processing huge amounts of offline historical data with which to train statistical models that predict future user behavior – these predictions are what make up the CQS.
  2. Making predictions fast enough to meet strict latency constraints in the real-time bidding (RTB) ad exchanges in which we participate.

In this post, we discuss recent work to address the first challenge. We've had a map-reduce-based modeling pipeline in place since the earliest days of TellApart, but our rapidly growing datasets proved it unscalable, and a rework was urgently needed.

The steps in our pipeline are (1) a feature extraction step to produce consolidated representations for each event and (2) a training and validation step to produce high-quality statistical models.

Our new system, called Tyra, is a re-design of step (1) above. Traditionally we call our modeling codebase “runway”, and our new system is named Tyra because Tyra trains the runway models! (ba-dum ch)

This blog post aims to share some of the lessons we learned while designing Tyra and to describe the ultimate system we produced. Our desiderata were:

  • Scalability: we are experiencing hockey-stick growth in data volume and want to continue training models with large feature sets without restricting the historical signals we can include.
  • Rapid feature exploration: we experiment with new features, and these evaluations should be fast and require minimal reprocessing of old data.
  • Speed: we are nimble and move quickly so we should be able to re-train models on the latest data weekly or even daily.

Background: Feature Extraction

Each data point used in model training corresponds to a recorded event for one of our users. For example, two types of events are ad events (e.g. the user viewed one of our ads) and product view events (e.g. the user viewed a pair of shoes on one of our clients' sites).

Our characterization of an event can be broken into two parts: features and a class label. Features are pieces of data about the event itself or about the user’s history, for example, time of day the event occurred or number of ads we have shown the user in the past 30 days. Class labels represent facets of customer quality, for example, propensity to click on ads (e.g. class label of click/non-click) or to purchase products (e.g. class label of purchase/non-purchase).
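Concretely, a single training example might look like the following sketch (the field names and values are illustrative, not our actual schema):

example = {
    'features': {
        'hour_of_day': 14,                     # immediate feature
        'publisher_site': 'example-news.com',  # immediate feature
        'ads_shown_last_30d': 7,               # historical feature
        'pages_viewed_last_30d': 42,           # historical feature
    },
    'label': 'click',  # class label, e.g. click vs. non-click
}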

We can visualize the pairing of features with a class label in the model-training pipeline as such:

[Figure: features paired with a class label in the model-training pipeline]

The events themselves are stored as flat files on Amazon S3, and are organized by event type and date. The feature extraction pipeline is a map-reduce job that processes these logs. At the end of the feature extraction step, we will have produced a list of features for each event.

Sound simple? To give a sense of the scale we are working at, one model we trained a few days ago used roughly 10 billion events!

IMMEDIATE AND HISTORICAL FEATURES

Many features can be extracted directly from the log entry itself, such as the time of day or the website on which we displayed an ad. We call these immediate features; they are straightforward to extract because they depend only on the event in question.

The more interesting features, however, are historical features that depend on user behavior over time. For example, one historical feature is: number of ads shown to this user in the past 30 days. This is much harder to track for millions of users across billions of events. The biggest challenge to scalability in the modeling pipeline is extracting the massive amount of these historical features.

EXTRACTING HISTORICAL FEATURES

There are two general approaches to extracting historical features from logs:

1. Retain the historical context within each event’s log entry:

[Figure: historical context retained within each event's log entry]

We save a snippet of that user’s history in log entries, so this approach requires no examination of historical data. The feature extraction pipeline simply sweeps over log entries and outputs the feature values. The disadvantage of this approach, however, is that it is limited to features that can be derived from the historical data we chose to put in the logs at the time that the event occurred. Adding a new historical feature requires augmenting the information recorded at each log event and waiting to collect training data.

2. The alternative approach is to sweep over past events in a user’s history to extract necessary historical information.

[Figure: sweeping over past events in a user's history to extract historical features]

The advantage of this approach is flexibility: any desired historical feature can be added at any time, and does not require foresight when logging event data. The disadvantage is that a huge amount of historical data must be processed.

Our old feature extraction pipeline used approach (2). It served us well until recently, but it has outlived its practicality when extracting features for two use cases:

  • Production models: extraction time scaled with the time window of the features. For example, extracting 30 days of training data with 30-day trailing-window historical features required processing 60 days of log entries.
  • Models exploring new features: because historical context was not cached in any way, adding a new feature required re-running the entire flow, which could take more than 12 hours.

Our new system, Tyra, is designed to retain the best parts of these two approaches.

Tyra Feature Extraction Pipeline

The central idea behind Tyra is to get the advantages of approach (1) above with cached historical context while retaining the flexibility of (2) by being able to easily add new features.

We accomplish this by computing and storing historical summaries. For every day, for every user who interacted on that day, and for every historical feature type, we store a summary containing all historical information necessary for extracting this feature on future days. For example, the feature number of pages viewed in the last month has an associated summary that counts total pages viewed by each user on each day.

This system has some nice properties:

  • Summaries take the place of re-processing past events and are updated on a daily basis.
  • We can retroactively add new historical feature types by adding new summary types to the data store.

Tyra consists of three steps:

  1. Summarize – each day, sweep over all events of that day to produce a summary object for each user and each feature type.
  2. Combine summaries – sweep over summaries from prior days and combine them to produce a user state for each feature type.
  3. Extract – begin with the user state for the previous day and output successive features for each event on this day (updating an intra-day state as we go).

[Figure: Tyra's summarize, combine, and extract steps]
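To make those three steps concrete, here is a toy Python sketch of the flow for one feature type, "pages viewed in the last N days" (the real pipeline is a Cascading/Hadoop job in Java; the data structures here are simplified stand-ins):

from collections import defaultdict

def summarize(day_events):
    """Step 1: fold one day of raw events into per-user daily summaries.
    For the page-view feature, the summary is simply a count per user."""
    summary = defaultdict(int)
    for event in day_events:
        if event['type'] == 'product_view':
            summary[event['user_id']] += 1
    return dict(summary)

def combine(daily_summaries):
    """Step 2: combine the trailing window of daily summaries into a
    per-user state (here, total page views over the window)."""
    state = defaultdict(int)
    for summary in daily_summaries:
        for user_id, count in summary.items():
            state[user_id] += count
    return state

def extract(state, day_events):
    """Step 3: walk today's events in order, emitting the feature value as
    of each event and updating the intra-day state as we go."""
    features = []
    for event in day_events:
        features.append((event['user_id'], state[event['user_id']]))
        if event['type'] == 'product_view':
            state[event['user_id']] += 1
    return features

# Example flow: yesterday's state plus today's events yields today's features.
# state = combine([summarize(events) for events in last_30_days_of_events])
# features = extract(state, todays_events)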

In Production

Tyra is implemented in Java, using Cascading as an abstraction layer over Hadoop, and runs in production on our AWS Elastic MapReduce cluster. It currently churns out roughly 100 million summaries and over 50 billion individual feature values to our data store every day. The features are accessible via Hive and via our custom Cascading-based tool that supports sub-sampling in order to facilitate data exploration.

As we grow as a company, we have an increasing demand to train new models and run experiments with new features. Our old system was holding us back and would soon have become untenable. This work has transitioned us from a monolithic feature extraction pipeline that churned out months of features in one shot, to an incremental and persistent data pipeline that avoids repeated computation. With Tyra we can now easily and quickly retrieve subsets of features, train models, and add new historical features. To give a sense of the savings, we can now train models in minutes as opposed to the 12 to 24 hours it previously required.

The only downside is that with our experiments running hundreds of times faster, we have less free time to take in the finer things in life, like watching re-runs of America's Next Top Model. On the other hand, with all the saved AWS fees maybe we can invite Tyra Banks to our next happy hour.

 


(Further Thoughts)

  • State vs summary: it would be possible to skip the summarization step and instead store and update user states directly. This would avoid the need to combine summaries over multiple days. However, storing summaries has the advantage of maintaining independence between days, while tracking state has the disadvantage of allowing data integrity issues to cascade forward in time. That being said, we may consider a pure user-state-based system in the future if we start incorporating longer-term historical features.
  • Right now, our summaries are stored in flat binary files with no indexing. While this is acceptable for model training, we are considering moving to HBase in the near future to make this user-centric data more pliable.
  • When implementing Tyra we found it best to use as few reduce steps as possible. This usually required adding dummy values to reduce keys and sorting so that some rows reach reducers before others. In Cascading, this comes down to using GroupBy for dissimilar pipes rather than CoGroup.
  • We found it extremely helpful to keep the number of files and distinct directories as low as possible — this decreases HDFS copy time significantly. One simple way to achieve this is to use a lower number of reducers for jobs writing persistent data.

Tyra was built by TellApart Engineers Jeshua Bratman and Nick Pisarro.

gevent at TellApart

At TellApart, we're big fans of gevent. It powers a number of our core components, from the TellApart Front End servers to our open-source Taba aggregation service. For those not familiar with it, gevent is a library for standard Python (CPython) based on greenlet and libevent, which enables high-concurrency workloads and co-operative scheduling.

Since we use it so extensively, we were recently invited to give a talk at Pinterest about gevent, its advantages, some pitfalls to avoid, and how we use it throughout the TellApart stack. We wanted to share the slides from that presentation for anyone who’s interested.
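As a small taste of what the slides cover, here is a minimal sketch of the basic gevent pattern: monkey-patch the standard library so blocking I/O yields cooperatively, then fan work out across cheap greenlets (the URL and counts are illustrative):

from gevent import monkey
monkey.patch_all()  # patch sockets, etc. before anything else imports them

import gevent
import urllib2  # now cooperative, thanks to the patched socket module

def fetch(url):
    return urllib2.urlopen(url).read()

urls = ['http://example.com/'] * 100
jobs = [gevent.spawn(fetch, url) for url in urls]
gevent.joinall(jobs, timeout=10)
print('%d of %d requests completed' % (
    sum(1 for job in jobs if job.successful()), len(jobs)))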

 

Early Results from Facebook Exchange Show 10-20x ROI for TellApart Clients

As one of the earliest companies on the Facebook Exchange (FBX), we’ve been serving ads through FBX for three months and have learned an immense amount about how to maximize performance from this new platform. We can say without reservation that it delivers strong ROI for our clients.

Facebook ads work. Hugely so, as every one of our clients on the exchange has seen a 10-20x return on investment. eBags, one of our earliest marketers to go live on FBX, sees an unwavering 15x return on ad spend. And when we say the Facebook Exchange works, understand that this is based on solid direct response data.

TellApart has the unique business model of only making money when our clients do. We get paid (a share of the sale) only when a shopper first clicks on one of our ads and then converts on our retail client’s site. There may be differences in opinion as to the worthiness of the click as the end-all metric, but there is no debating the direct response value of a shopper who clicks and purchases. Indeed, Google has built $40B of annual revenue around this core behavior.

What makes FBX a game changer for the display ads industry?

1) Huge, Consistent Audience & Environment
First, it's important that Facebook has a lot of users who come back often and spend a lot of time there. According to comScore, Facebook serves over 25% of all US display ad impressions, and we've seen that borne out in practice. But unlike other exchanges, FBX offers a stable audience and consistent context. Knowing that the ads we purchase are going to appear in a clean, well-lit environment with high visibility (i.e. rarely below the fold or out of view) is invaluable when doing the type of modeling we do to determine predicted click-through and conversion rates.

2) BYOD & Import Intent — Per User
With FBX, Facebook has enabled marketers to import their own customer & intent data, while still protecting user privacy. Facebook doesn't share any new data with us as a result of these ads, but we're able to remarket to people the products they've seen elsewhere. As with other ad exchanges, Facebook allows each marketer to "Bring Your Own Data." If a shopper has shown strong interest in a particular pair of Jimmy Choos on Nordstrom, we can show precisely that user exactly that pair of shoes. Both ad creative and bid pricing decisions are made at the level of the individual user, in real time.

3) Highly Engaged Users
Users shown ads on Facebook via FBX click through at similar — or better — rates than on other ad exchanges. TellApart FBX ads for OnlineShoes.com yield a consistent user click-through rate of over 15%. That 15 out of 100 users to whom we served ads on Facebook would choose to click through is an amazing fact. Our average user CTR across all customers is 6.6% on both the Facebook Exchange and the DoubleClick Ad Exchange.

4) High Return on Investment
As great as clicks are, TellApart's efforts are worthless until the shopper actually makes a purchase. And those from FBX do. The shoppers who come from Facebook Exchange clicks are high quality, with order values just as high as our merchants' average.

5) Fully Complementary
As a buyer across all of the major real-time biddable ad exchanges, we have seen that FBX increases even our unique reach by 30%. The significance of this number should not be underestimated. With FBX, Facebook has introduced a new form of advertising media that is complementary to both other Facebook programs and non-Facebook programs that retailers may be running. Because FBX uses marketers' audience data, not Facebook data & targeting, any overlap is likely to be small.

FBX is a new opportunity that all marketers should seriously consider. It's not accurately described by either 'earned media' or 'paid media,' so we've taken to calling it 'invested media.' You've worked hard to build your ecommerce site and brand. You've spent huge sums on keywords and queries. You've invested in creating the audience on your site who is interested in what you have to offer. Now you can take that data asset and leverage it into the world's newest, game-changing ad marketplace: the Facebook Exchange.

[Screenshot: Nordstrom ad served through the Facebook Exchange]

Keep on Pushing

One of the things that’s important to us at TellApart is engineering agility: we are constantly deploying new features, launching A/B experiments, and making system improvements. Yet, this goal is often at odds with maintaining a robust system with zero downtime, whilst handling tens of thousands of queries per second. To meet both of these needs, we’ve taken advantage of many of the Amazon Web Services (AWS) technologies to build a deployment process that takes about 10 minutes, rotates out a few hundred machines, and never drops a single request on the floor.

First, a little history. What were we doing before?

Our original deploy system was a rolling push in which a few instances at a time were deregistered from the Elastic Load Balancer (ELB), overlaid with new code, and then re-registered. Because our frontend fleet was hundreds of machines in size, this would take quite a while, frequently more than 2 hours.

As mentioned above, our new deployment process – with the help of the robust AWS services – cut deploy time by 93%. AND we did all this while maintaining overall system stability.

So, how do we do it?

At a high level, the code deployer script brings up a parallel cluster of machines registered behind the ELB. The old machines are then removed from the ELB, and terminated. The end result is the new cluster of machines handling the traffic in a seamless manner with no downtime.

More specifically, here are the steps we follow. These steps are performed by a simple script that uses Boto extensively.

  • Create a release bundle with an archiver (git/tar)
  • Canary using a test instance
  • Bring up a single frontend with the new bundle, registered with the ELB
  • Manually vet the deployment to ensure correctness — here we rely heavily on Taba statistics
  • Push the deployment archive to S3 and tag it as a production archive

Bring up new instances through the respective AutoScaling groups in each active region

  • Temporarily disable all scaling activities for the duration of the deployment (doc)
  • Create a list of old instances behind the AutoScaler running the old code (doc)
  • Bring up a parallel set of new instances: a system boot init script will pull the S3 code bundle, and start up the TellApart frontend
  • Rely on the ELB to send traffic to the new instances once they are set up and healthy (the ELB is configured with an application-level /ping path, which returns 'HTTP 200' when healthy)

Bring down the old instances in each AutoScaling group

  • Check for new instances behind the ELB to be in the ‘InService’ state.
  • Remove old instances from the AutoScaler (doc)
  • Reset the min and max size for the AutoScalers and resume AutoScaling
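In condensed form, the core of the rotation looks roughly like the following Boto sketch (the region, ELB and AutoScaling group names, and the polling/cleanup details are illustrative; the real script also handles canarying, multiple regions, and error cases):

import time

import boto.ec2
import boto.ec2.autoscale
import boto.ec2.elb

REGION = 'us-east-1'
ELB_NAME = 'tafe-frontend-elb'      # illustrative name
GROUP_NAME = 'tafe-frontend-group'  # illustrative name

elb_conn = boto.ec2.elb.connect_to_region(REGION)
as_conn = boto.ec2.autoscale.connect_to_region(REGION)
ec2_conn = boto.ec2.connect_to_region(REGION)

group = as_conn.get_all_groups(names=[GROUP_NAME])[0]
old_ids = [i.instance_id for i in group.instances]

# Pause scaling activities, then double capacity so a parallel fleet comes up.
# New instances pull the S3 code bundle and start the frontend from their
# init script.
group.suspend_processes()
group.desired_capacity = len(old_ids) * 2
group.max_size = max(group.max_size, group.desired_capacity)
group.update()

def new_in_service():
    """Count new (non-old) instances the ELB reports as InService."""
    states = elb_conn.describe_instance_health(ELB_NAME)
    return sum(1 for s in states
               if s.state == 'InService' and s.instance_id not in old_ids)

while new_in_service() < len(old_ids):
    time.sleep(15)

# The old instances are no longer needed: terminate them, restore the group's
# original size, and resume scaling.
ec2_conn.terminate_instances(instance_ids=old_ids)
group.desired_capacity = len(old_ids)
group.update()
group.resume_processes()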

This manner of deployment works well at TellApart given the other continuous build and testing systems we have in place. Unit tests and system-level tests are run using Hudson, and a few high-level system tests are also run continuously in parallel. TellApart also uses Taba extensively to capture, aggregate, and monitor statistics over the entire system to ensure that any possible regressions are rapidly caught and addressed.

Sanjay Jeyakumar is a Software Engineer at TellApart.

Taba: Low Latency Event Aggregation

Introduction

At TellApart we love metrics, and certainly have lots to measure. We track data for over 20,000 different event types across all parts of our stack (and that’s just the real-time data!). This information is used for all sorts of things, like monitoring, performance metrics, segmentation information, and feedback loops. In order to manage all this data, we created the Taba service (the name is derived from the Japanese for bundle or flux (束)).

In designing Taba, we had a number of goals:

  • Low latency: An event should be visible within seconds of occurring. This enables responsive monitoring, tight feedback loops, and other near-line applications.
  • Low impact: CPU and memory usage within the client applications should be minimized.
  • Durability: Events should be reasonably durable, so that a Taba “Tab” can be used as a basis of important services like monitoring.
  • Scalability: All components of the Taba service should be horizontally scalable to keep up with the applications it tracks.

Different Types

Each event type, or “Tab”, being tracked is identified by a Name, and a Type that maps to a Handler class. The core Taba service doesn’t implement data manipulation operations – all schemas and transformations are left to Handlers. By separating the schemas from the rest of the service, Tabs that behave differently can easily co-exist, and implementing new types of aggregation is simple.

Data Model

[Figure: Taba data model]

The fundamental element of data in the Taba service is the Event. Events are composed of three pieces of data: the Tab, the time, and a value. Events are gathered to a central server (more on this later), and combined into State objects which are persisted in a database. The motivation behind persisting States instead of the individual Events is two-fold: (1) storing all events would consume far too much space (timestamps alone would consume gigabytes per hour), and (2) pre-calculated State objects mean faster query response times.

When the Taba service is queried, States are converted into Projections and/or Aggregates. A Projection is a reduction of the State into a dictionary, and an Aggregate is a combination of Projections. Depending on the query, a Projection or an Aggregate will be returned, optionally rendered by the Handler into a human-readable format.
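As an illustration of that data model (this is not the actual Handler interface, just the shape of it), a simple counter-style Tab might fold Events into a State and produce Projections and Aggregates like this:

class CounterHandler(object):
    """Hypothetical handler for a counter-style Tab: tracks how many events
    occurred and the sum of their values."""

    def new_state(self):
        return {'count': 0, 'total': 0.0}

    def fold(self, state, event):
        # An Event is (tab_name, timestamp, value); only the value matters here.
        _, _, value = event
        state['count'] += 1
        state['total'] += value
        return state

    def project(self, state):
        # A Projection reduces a State to a plain dictionary.
        count = state['count']
        avg = state['total'] / count if count else 0.0
        return {'count': count, 'total': state['total'], 'average': avg}

    def aggregate(self, projections):
        # An Aggregate combines Projections from many Clients.
        count = sum(p['count'] for p in projections)
        total = sum(p['total'] for p in projections)
        return {'count': count, 'total': total,
                'average': total / count if count else 0.0}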

Architecture

[Figure: Taba architecture]

Applications generate Events by embedding a Taba Client. The Client briefly buffers Events and posts them to a Taba Agent over an HTTP-based protocol. An Agent receives Events from multiple Clients (usually on the same machine), briefly buffers them, and posts them to the Taba Server using the same protocol. The Client + Agent scheme allows for a very simple and resource-light Client, putting more complex buffering and durability functions in a separate process. The Agent also helps performance by batching requests to the Server.

The Server is where the real magic happens. Like TellApart’s TAFE server, it’s a distributed Python service sitting behind an Nginx reverse proxy, and uses gevent for simple and fast co-operative concurrency. The Server itself is stateless, meaning you can launch as many as needed; the Agents will distribute requests across Servers. The Server receives Events and invokes the appropriate Handler to fold them into States, or generate Projections and Aggregates. A State object is maintained for each Client and Tab combination, so that the status of any Client can be queried.

We use Redis for the database, but with our own transparent sharding layer on top to overcome single-process limitations. Sharding is based on virtual buckets, and supports a subset of the Redis data types as well as transactions/locking. Like the Server, as many Redis processes as needed can be run, and re-sharding without downtime is possible.

In Production

Currently, we have the Taba service deployed in each region as a cluster of 2 Server instances and 1 Database instance, each running 8 processes. Each cluster handles over 10,000,000 events per minute, from 300 individual Clients across 20,000 different Tabs, with an average latency of 30 seconds. The Taba architecture and data model have allowed us to use it for fine-grained monitoring, real-time feedback systems, and internal dashboards. We've only started to integrate its full functionality, and already it has provided a deeper insight into TellApart's system.

Kevin Ballard is a Software Engineer at TellApart.

Serving Up a Storm

When putting together a high-performance web serving framework, a number of different software and network components typically have to be considered. For starters, there are CDNs, load balancing, reverse proxies, the application web server, and backend components like the database. In this post, we’ll start by delving into our app server software – how TellApart handles a dynamic request with extremely low latency.

Most of our incoming traffic is made up of a large number of concurrent, short-lived requests, each of which must complete in tens of milliseconds. Our initial design was based on a well-worn server configuration that performed reasonably well right out of the box: the venerable Apache web server, with mod_wsgi for running Python application code.

When we first fired up the web server and unleashed some live traffic, we noticed something curious in the performance data. Here’s how this configuration fared at handling a representative I/O-bound request.

[Table: request latency percentiles for the Apache/mod_wsgi configuration]

Whoa! What’s up, 99th percentile?! In Apache/mod_wsgi, each request is handled in its own system thread. For instance, if three requests need to be handled concurrently, the OS is responsible for switching between them. So what’s the problem?

It turns out that in CPython, multithreading is subject to the limitations of the GIL. Request-handling threads occasionally enter "GIL battles" with other request-handling threads, burning CPU cycles and slowing everything down. David Beazley sets the record straight in a great series of presentations. Most Python web servers will have no trouble handling dozens of requests per second, but thread-based servers will strain when attempting to handle hundreds or thousands.

We evaluated several alternatives that did not depend on threads for concurrency and eventually chose to replace Apache/mod_wsgi with a custom web server built around the excellent Gevent coroutine networking library. We call this server TAFE (“taffy”), the TellApart front end. Gevent handles each request using a lightweight thread-like structure called a greenlet (essentially, a coroutine). Unlike threads, greenlets must cooperatively yield control flow over to other greenlets, which frees the system from erratic GIL issues at the cost of a bit of increased development complexity.
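TAFE itself is tied to the rest of our stack, but the skeleton of a gevent-based WSGI server looks roughly like this (the handler and port are placeholders):

from gevent import monkey
monkey.patch_all()  # make downstream socket calls cooperative

from gevent.pywsgi import WSGIServer

def application(environ, start_response):
    # Each request runs in its own greenlet; any blocking backend call made
    # through the patched socket module yields to other in-flight requests.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['hello\n']

WSGIServer(('0.0.0.0', 8080), application).serve_forever()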

Here’s how TAFE performed on the same workload:

[Table: request latency percentiles for TAFE on the same workload]

Much better. So, was it smooth sailing from here? Well, TAFE is definitely a big improvement, but giving up on system threads for concurrency means spending more time discovering code that doesn’t cooperatively yield. We wrote a Gevent Request Profiler to help us do just that. More on that next time.

Mark Ayzenshtat is TellApart’s co-founder and advisor.

Amazon Case Study Showcases TellApart Architecture

A big part of what sets TellApart apart is our technology. When we got started, we suspected that applying the right technical power tools to our clients' large datasets would make all the difference. But we also knew that developing a clever algorithm or building an elegant system wouldn't amount to much if we couldn't put them into production, quickly, and at large scale. To that end, we quickly became enthusiastic proponents of Amazon Web Services — and last December, we were honored to be the runner-up (2nd of 1,500 entrants) in the 2010 AWS Startup Challenge.

Today, Amazon has released a new case study showcasing some of the cool things we've built atop AWS. Like a real-time bidding system that serves tens of thousands of requests per second to hundreds of millions of end users. Or Tubes, our cross-region data replication system powered by SQS. These days, we use a veritable alphabet soup of AWS services and are always looking for new ways to integrate AWS into our stack.

The case study gives a good overview, but if you’re hungry for details, fret not — we’ll share much more about our architecture with readers of this blog in posts to follow. And if you like what you read and find these kinds of big data problems interesting…we’re hiring!