Hillary Clinton’s Emails

My colleague Eugene Kwak and I recently had some of our analysis featured in a CNBC article titled “What data reveals about Hillary Clinton’s emails”.
We went through the publicly available data set hosted on Kaggle and by the Wall Street Journal. It was a fun project that dealt with some of the complexities of text analysis.

Anyway, here are some graphics that didn’t make it to the article.

Countries Most Frequently Mentioned by Hillary Clinton

Source: US Dept of State


And of course, I had to create a Word Cloud from emails mentioning Libya.

Source: US Dept of State


And the most commonly occurring words in email subjects.

Source: US Dept of State
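
For anyone curious how a count like this can be put together, here is a minimal sketch. It is not the script behind the chart: the stop-word list and the sample subject lines are made up for illustration, and a real run would read the subject column from the published data set instead.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much fuller one.
STOP_WORDS = {"re", "fw", "fwd", "the", "a", "an", "of", "to", "and", "on", "for", "in"}


def top_subject_words(subjects, n=10):
    """Return the n most common non-stop-words across email subject lines."""
    counts = Counter()
    for subject in subjects:
        # Lowercase the subject and keep only alphabetic tokens.
        words = re.findall(r"[a-z']+", subject.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Made-up subject lines, for illustration only.
sample_subjects = ["Re: Libya update", "FW: Libya schedule", "Schedule for Tuesday"]
print(top_subject_words(sample_subjects, n=2))
```

The same counts can feed a bar chart or a word cloud; only the plotting step changes.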

A Few Insights from Crunchbase

As part of my Master’s degree, I have been spending a considerable amount of time on a project related to startups and founders. Here are a few quick snapshots, from the Crunchbase API.

The Top-20 Schools list does contain a few duplicates, such as Stanford vs Stanford GSB and Harvard vs HBS, but I’ve retained it that way. It is interesting to note, however, that institutions from the US dominate. And as with many things in the world of data, the tail is incredibly long. Glad to see NYU right up there too!


Top 20 Schools Attended by Founders

Source: Crunchbase


The average amount raised during the various rounds is also interesting. It doubles from Series A to B, and then the growth slows down slightly. The ‘Others’ category was a small set of funding rounds that weren’t tagged, so I put them in their own category.

Source: Crunchbase

And who are the people funding these firms? Here are the top-25 firms.

Source: Crunchbase

Mind you, this is from Crunchbase’s database, although I have no reason to doubt its veracity. Nevertheless, I find it incredible that entities such as 500 Startups and Y Combinator have chalked up more deals than long-standing VCs such as Accel, Sequoia and Andreessen Horowitz. The deal sizes at 500 Startups and Y Combinator will of course be smaller, but they have managed to make entrepreneurship a lot more accessible.

Notes from Certified ScrumMaster Class

I recently attended and completed the Certified ScrumMaster program from Scrum Alliance.

Having taken Project Management classes, read extensively about Lean, Kanban, TQM (and associated philosophies), and having had experience managing projects, most of the program wasn’t anything radically new to me.

The roots of all these approaches can be traced to post-war manufacturing methodologies, which have since been adapted for applications ranging from consulting to software. The Scrum methodology is geared towards development in an environment of constant change. The trainer for the course had worked on large enterprise software projects, so the course framed Scrum as a response to customers changing their requirements mid-project, especially when the gestation periods were really long.

I, however, saw this as having wide applicability in a startup environment – where incremental gains and frequent releases can tremendously reduce risk as well as time to market. In fact, Eric Ries’s book The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses covers this very topic in detail, and I highly recommend it to anyone interested in the Lean/Agile philosophies as applied to a startup environment.

I’ve attached below some notes I made during the two days.



Clustering Scotch Whisky: Grouping Distilleries by k-Means Clustering

This was inspired by a section in the fantastic book “Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking”, one of my assigned readings, by Prof Foster Provost from NYU and Tom Fawcett.

Amongst the many interesting examples and challenges given were those of predicting wine quality from weather data and clustering whisky on the basis of flavour profiles. I downloaded the data from the Department of Mathematics and Statistics, University of Strathclyde’s website.

k-means Clusters

I first iterated through the data to find the optimum k for k-means. There is quite a bit of discussion on the optimum number of clusters (k) in k-means clustering, and these two are excellent links – Finding the K in K-Means Clustering and Pham et al at Columbia.

I settled on 6 clusters, after iterating through 2 to 10 clusters. This wasn’t a hard scientific measure per se, as I feel I could as easily have selected 5.
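
To make the "iterate through k" step concrete, here is a small sketch in plain Python. It is not the code I ran on the whisky data: the toy points are made up, the seeding uses a deterministic farthest-first rule (my assumption, chosen so the run is repeatable), and the toy range is 1 to 5 instead of 2 to 10. The idea is simply to compute the within-cluster sum of squares (WSS) for each k and look for the point where adding clusters stops helping much.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then keep adding
    the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans_wss(points, k, max_iterations=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


# Made-up 2-D points forming three obvious groups (not the whisky data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
              (20, 0), (20, 1), (21, 0)]
wss = {k: kmeans_wss(toy_points, k) for k in range(1, 6)}
for k in sorted(wss):
    print(k, round(wss[k], 2))
```

On real data the drop in WSS rarely has a kink as clean as this toy example, which is why picking between 5 and 6 ends up being a judgment call.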

So what did these clusters reveal, and how accurate are they? Some interesting results for sure. Let us first get down to the data.

The dataset covers 86 distilleries, each rated on a scale of 0 to 4 for 12 flavour attributes: Body, Floral, Fruity, Honey, Malty, Medicinal, Nutty, Smoky, Spicy, Sweetness, Tobacco and Winey.

Here is how these flavours correlate to each other:

Whisky Flavours – Correlation Plot

As we can see, some flavour pairs do appear to be highly correlated – smoky-medicinal, smoky-tobacco – while others tend to have an inverse relationship – such as body-floral, body-medicinal, medicinal-floral.
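
For reference, each cell in a correlation plot like this is a Pearson correlation between two flavour columns. Below is a minimal sketch of that calculation; the scores are invented for five hypothetical distilleries and are not taken from the Strathclyde dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented 0-4 scores for five hypothetical distilleries.
smoky = [0, 1, 2, 3, 4]
medicinal = [0, 0, 1, 3, 4]
floral = [4, 3, 2, 1, 0]

print(round(pearson(smoky, medicinal), 2))  # strongly positive
print(round(pearson(smoky, floral), 2))     # strongly negative
```

Running this over every pair of the 12 flavour columns gives the full correlation matrix that the plot visualises.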

Cluster Dendrograms of Flavours

A Cluster Dendrogram of the flavours suggests that the strongest cluster amongst these could be the Smoky-Medicinal cluster, although I’m surprised Tobacco and Smoky don’t seem as close as one would imagine – both here in the Cluster Dendrogram, as well as in the Correlation Plot.

The cluster sizes of the distilleries vary from 6 to 23.

Whisky Cluster - Size of Each Cluster

As for the clusters, here they are:

Cluster 1: Ardmore, Bowmore, Bruichladdich, Craigallechie, GlenScotia, Highland Park, Isle of Jura, Old Fettercairn, Old Pulteney, Springbank, Tormore

Cluster 2: Ardbeg, Caol Ila, Clynelish, Lagavulin, Laphroig, Talisker

Cluster 3: AnCnoc, Auchentoshan, Aultmore, Benriach, Bladnoch, Bunnahabhain, Cardhu, Dalwhinnie, Deanston, Dufftown, GlenGrant, GlenMoray, Glenallachie, Glenfiddich, Glengoyne, Glenkinchie, Glenlossie, Loch Lomond, Mannochmore, Scapa, Speyside

Cluster 4: Aberfeldy, Aberlour, Auchroisk, Balmenach, Belvenie, BenNevis, Benrinnes, Benromach, BlairAthol, Dailuaine, Dalmore, Edradour, GlenOrd, Glendronach, Glendullan, Glenfarclas, Longmorn, Macallan, Mortlach, Royal Lochnagar, Strathisla

Cluster 5: Balblair, Craigganmore, GlenGarioch, GlenKeith, Glenmorangie, Oban, Royal Brackla, Strathmill, Tamnavulin, Teaninich, Tullibardine

Cluster 6: Isle Of Arran, GlenDeveron/Macduff, GlenElgin, Glen Spey, Glenlivet, Glenrothes, Glenturret, Inchgower, Knochando, Linkwood, Miltonduff, Speyburn, Tomatin, Tomintoul

Some interesting clusters there. Personally, I’m most interested in Cluster 2 – the smoky single malts section. And I’ve had a Single Malt from each one of these distilleries (bar the Clynelish), so I can attest that they are indeed very smoky and medicinal. What about Clynelish though? Master of Malt lists a 14-year old Clynelish as “smoky” and “coastal”, pretty much how I would describe a Laphroaig 18 year old or a Lagavulin 16 year old.

There are still quite a few distilleries that I am not familiar with, so I will just have to dig in further to see whether my clustering is reasonably accurate. If I find more time, I am going to build a whisky-recommendation system, which does seem like a fun project.

NYC Restaurant Inspections in Python and Matplotlib

The NYC Open Data site is a treasure trove for anyone looking for large data sets.

There are data sets on Education, Housing, Health, Safety and many other areas of functioning of the local government. One such interesting data set is the Restaurant Inspection Results. It’s an exhaustive list of restaurants in the New York region, along with the grades and comments about the kinds of violations, if any. And there are plenty of violations.

The database lists 490,055 observations resulting from the inspection of 21,090 restaurants (on the date it was last accessed) and not all of it makes for pretty reading.

First, a breakdown of the number of restaurants in each borough.

There are 22 rows with missing borough data, which have not been included in the chart.
Manhattan leads by sheer numbers here, and as we will see below, leads in the number of violations too.

So what are the most common health code violations? Here are the Top-10.

The Top-10 Violations by Restaurants in NYC

There are plenty of rat and other vermin related violations, as well as over 26k instances of faulty drainage and sewage disposal systems.

As for rats, here’s a breakdown by borough.

Rat and Mice Related Violations

That’s a lot of Pussy Bonpensieros in one town! I now pay extra attention to the rating displayed outside every restaurant.
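
The borough breakdown above boils down to a filter and a count. Here is a minimal sketch using only the standard library. The column names `BORO` and `VIOLATION DESCRIPTION` are my assumption about the export's headers, the sample rows and their wording are made up, and the keyword match is a crude stand-in for however the violations are actually coded.

```python
import csv
import io
from collections import Counter

# Made-up rows in the assumed layout of the inspection export.
SAMPLE_CSV = """BORO,VIOLATION DESCRIPTION
MANHATTAN,Evidence of rats or live rats present in facility
MANHATTAN,Evidence of mice or live mice present in facility
BROOKLYN,Evidence of mice or live mice present in facility
QUEENS,Plumbing not properly installed or maintained
Missing,Evidence of rats or live rats present in facility
"""


def rodent_violations_by_borough(csv_file):
    """Count rat- and mice-related violations per borough."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        borough = row["BORO"].strip()
        description = row["VIOLATION DESCRIPTION"].lower()
        # Skip rows without usable borough data, as in the charts above.
        if not borough or borough.lower() == "missing":
            continue
        if "rats" in description or "mice" in description:
            counts[borough] += 1
    return counts


print(rodent_violations_by_borough(io.StringIO(SAMPLE_CSV)).most_common())
```

Passing the resulting counts to a Matplotlib bar chart is then a one-liner; for the real file, swap the `io.StringIO` wrapper for an opened CSV file.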

What’s your beef with it?

Earlier in June this year, the New York Times carried an opinion piece by Sonia Faleiro titled ‘Saving the Cows, Starving the Children’.

As is typical for op-eds and opinions in the NYT or The Guardian about India or Indians, a distinctly left-of-centre voice was selected. Nothing fundamentally wrong with that, except that there appears to be no balance whatsoever, and Indian authors aspiring to a socialist utopia get the lion’s share of op-ed space in Western newspapers. Is someone who appreciates freer markets, economic and policy reforms, and/or less government in the lives of citizens anathema to these publications? I will not go further into the politics of it though – I will focus on the article itself and deconstruct some of its assertions to see if they hold any merit. Emphasis added is mine.

eggs — a superfood that is about 10 percent fat and extremely high in protein

What exactly is a superfood? As per the British Dietetic Association:

It is simply a marketing term that has become trendy over the last few years. Companies and marketing teams will often put whatever they can on a label to hook you into a purchase.

Many claims can give us false expectations of the benefits or they aren’t fully substantiated.

And in this instance, it is an egg which is a superfood. Evidently, a piece of hyperbole to sell her point of view further. More on superfoods here and here.

How do eggs compare to other foods? I scraped the USDA Food Database for their list of over 8000 foods.

The average protein content in a boiled egg is 12.58 g per 100 g. So what are some of the food items that have a greater protein content per 100 grams?

Dried egg white and its variants – which makes sense because that would be nearly pure albumin, with an exceedingly high bio-availability, but also very expensive. Also, soy protein isolate, pork skins, peanut flour, or beluga whale if you want to go whaling in Alaska. Quite a few of these aren’t really practical or available in India, especially for a program run by the government.

So what items could potentially be used in government food programs, either as is, or as raw materials for cooking? Some options.

Food Item                                                           Protein Content (g per 100 g)
Soybeans, mature seeds, dry roasted                                 39.58
Milk, dry, nonfat, regular, without added vitamin A and vitamin D   36.16
Peanut flour, low fat                                               33.8
Peas, green, split, mature seeds, raw                               23.82
Soybeans, mature cooked, boiled, without salt                       16.64
Cheese, feta                                                        14.21

Even Feta Cheese – which is extremely close to the Indian Paneer – scores higher than the egg.
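
The comparison itself is a simple filter against the egg's figure. As a rough sketch, using only the numbers quoted in this post (the dictionary keys, including the egg's label, are my own shorthand and not necessarily the database's exact item names):

```python
# Protein in grams per 100 g, copied from the figures quoted in this post.
BOILED_EGG_PROTEIN = 12.58
PROTEIN_PER_100G = {
    "Soybeans, mature seeds, dry roasted": 39.58,
    "Milk, dry, nonfat, regular": 36.16,
    "Peanut flour, low fat": 33.8,
    "Peas, green, split, mature seeds, raw": 23.82,
    "Soybeans, mature cooked, boiled, without salt": 16.64,
    "Cheese, feta": 14.21,
    "Egg, boiled": BOILED_EGG_PROTEIN,
}


def richer_than(foods, threshold):
    """Return (name, protein) pairs above the threshold, highest first."""
    hits = [(name, grams) for name, grams in foods.items() if grams > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


for name, grams in richer_than(PROTEIN_PER_100G, BOILED_EGG_PROTEIN):
    print(f"{grams:6.2f}  {name}")
```

Run against the full scraped list, the same filter produces the longer set of candidates, which then has to be pruned by hand for cost and local availability.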

I hope this partially answers Faleiro’s question as to what the alternatives to eggs are. I’m basing this on the assumption that government-run programs, while aimed at increasing the nutrition content in a child’s daily food intake, must fulfil the following criteria:

  1. They must be available/sourced locally (to cut down on transportation costs and wastage)
  2. Types of food distributed must be minimised, so as to reduce ordering and inventory costs. Hence no pork sausages or bacon rinds keeping in mind religious sensibilities of a minority. Thus, mostly vegetarian food that can be consumed by all.

As it is, eggs are going to be an increasingly expensive commodity, due to the rise in input costs. There has been a global increase in feed prices for poultry. Donohue and Cunningham write in the Journal of Applied Poultry Research about the rise of input costs for US poultry firms, and the FAO speaks about the global rise too, stating that “almost all developing countries are net importers of these ingredients; the poultry feed industries in Africa and Asia depend on imports, which are a drain on their foreign exchange reserves.”

Image Source: J Appl Poult Res (2009) 18 (2): 325-337. doi: 10.3382/japr.2008-00134


Next, Faleiro goes on to make another statement, with regards to the recent ban on the slaughter of cows.

Another staple food was taken from the plates of the poor…beef.

Beef a staple food in India? Statements like these, with no basis in fact, leave even a hardcore carnivore like me incredulous. Can we reasonably define ‘staple food’ though? We have the FAO to help us out on this once again.

A staple food is one that is eaten regularly and in such quantities as to constitute the dominant part of the diet and supply a major proportion of energy and nutrient needs.

The organisation states the staple foods in the Indian region to be banana, bean, chick-pea, citrus, cucumber, eggplant, mango, mustard, rice and sugar cane, although this isn’t an exhaustive list in my opinion. Nevertheless, the economics of meat production simply don’t allow meat to be a staple food in India, and especially not for the poorer sections of society.

Faleiro further asserts that:

Beef, unlike mutton and chicken, is cheap.

Let us inspect the evidence on that.

Source: FAO


This is a representation of the Food Price Index from FAO, versus Meat Prices. Meat Prices consist of different forms of beef, chicken, pork and turkey. This chart does make some sense though, given the recent burst in commodity prices for soy, corn etc.

Let’s dig deeper. First, some data from the USDA.

Beef vs Pork Prices Source: FAO

These are for prices in the US, where absolute as well as per capita consumption of beef is amongst the highest in the world. As you can see, except for a brief period in the 70s, beef has always been more expensive than pork, even in a country like the US where consumption and production of beef is so high. What about the commodity markets? We have data from the IMF on this.

Beef vs Lamb vs Pork: Commodity Prices Source: IMF

Cows (and bulls) were never quite on the meat market in India as it is, and most meat eaters ordering a steak at a restaurant would know that they were going to eat a buffalo steak, which incidentally is still available – both in restaurants and in the wholesale market. Alibaba has a repository of such suppliers. Unless, of course, Faleiro’s contention is that Indian commodity prices are an island, significantly cheaper than global prices, in which case she has uncovered a massive global arbitrage opportunity.

Next, we are informed that many Indians impose their food choices on children.

In India you are what you eat, and devotion to strict vegetarianism is a trait common to many upper-caste Hindus. Some wield their diet like a badge of their status. Others demand that people around them — like children and household staff members — eat as they do to maintain the purity of their kitchens. They will not visit restaurants that also serve nonvegetarian food for fear of being polluted.

What a travesty indeed! I have heard that some parents also force their children to not have too much candy, force them to drink milk and to eat broccoli.
Jokes apart, I don’t understand the point Ms Faleiro is trying to make. Are children supposed to be making independent food choices now, separate from the judgement and/or values of the parents? If so, why stop at just food choices? Why not let 5 year olds decide what school they want to go to, and how late to stay up at night, or when to start taking out the car?

By the same token, are all those vegan parents in sunny California committing some kind of crime by bringing up their kids in a vegan home? Are Jewish parents feeding their kids Kosher food culinary criminals? Are Muslim parents preparing Halal food at home some kind of insidious racists?

I am not one bit surprised by the quality of Sonia Faleiro’s article. Journalism in India isn’t a profession where ethics, quality or standards are expected or demanded – it’s a nepotistic, mutual backslapping club.

I do, however, continue to be surprised by the likes of NYT and the Guardian promoting authors/journalists with this bent. Should India be known to NYT readers only via the words of polemicists? To draw a parallel, it would be as if Americans were to be known to the rest of the world only via the words of Chomsky, Malcolm X and Michael Moore. They are important voices, but not the only voices.

For those interested, my R script and files can be accessed on GitHub here.

Note: A modified version of this post was reproduced on Indiafacts.co.in by their team.

How to Make Alcohol Infused Cigars

One of the things I love to do occasionally with my cigars is to infuse them with alcohol – Rum, Single Malt Whisky, Cognac etc. Mind you, this is different from having a flavoured cigar, which is most likely to be flavoured using artificial products. Au contraire, this is a process of letting the cigars do what they do naturally – absorb the quality of the environment they are stored in.

So how do you go about it?

Laphroaig Quarter Cask

I only use cheap cigars for these experiments. The usual suspects here include Cheap Bastard Coronas, Quorum Torpedos or Coronas, and the like.

First, you need some wonderful alcohol. I usually do this with peaty single malts such as Laphroaig, Talisker or Smokehead. My past experiments have also included Remy Martin (Cognac) or Ron Zacapa and Flor de Caña (Rum).

I wouldn’t want to do this with premium cigars, going by the belief that a truly premium cigar is a work of art, hand rolled by a master who knew what blend to use for the kind of flavour profile he or she was going for (Yes, he or she – you will be surprised at how many torcedoras work in the industry). Besides, if I ruin a cheap cigar, it doesn’t hurt that much.

Next, you need about 30 ml of the alcohol you are using – Laphroaig Quarter Cask in my case.

A shot of Laphroaig Quarter Cask

Don’t overdo the alcohol; 30 ml should be good enough for a batch of up to 10 cigars. Remember, this is infusion, not immersion.

Some Cheap Cigars – Cheap Bastard Corona and Camacho Corojo Machito for example

Now add 60 ml of water, place the shot glass and cigars into a Tupperware container, and seal it off. Handle this very carefully, lest you spill the alcohol and water solution onto the cigars, and keep the container away from sunlight or a source of heat. Ideally, keep them there for a good 4 to 6 months, although I have seen acceptable results even at around 2 months.

My objective from such infusions is to impart a certain flavour or character to an otherwise ordinary cigar, and in such cases, less is more. I don’t want to taste just peat or rum, I do want to taste the original blend too, no matter how ordinary.

By the way, do not ever try this in your humidor, unless you want your humidor to forever impart these flavours to the cigars you store in it. The cedar in the humidor will absorb a peaty single malt or cognac and retain it for an exceedingly long time. Plastic jars, on the other hand, are disposable. You can always repurpose them, or at worst, recycle them.

As with everything about cigars, you need to have patience here too. It will take its time, but the end result will be worth it.

As with alcohol infusion, you can also do this with coffee beans or vanilla pods, although that isn’t something I personally would endorse.


Leave your cigars in the tupperware jar with a Boveda Humidifier, some partially crushed coffee beans or vanilla pods, and keep it sealed for 4 to 6 months. And there you have it – home made, alcohol infused cigars!

July Update: I took out the cigars after about 4 months. The contents of the shot glass had almost completely evaporated, and the cigars seemed like they had just been dipped in a bucket of water.

The smell, however, was divine. Laphroaig, with its smoky smell, really adds to the tobacco. They aren’t ready to be smoked yet, so I will wait a couple of weeks more before I take one of them out and light it up. I have kept them in a separate container still, and I will make sure they are wrapped in cellophane before I put them back into my main humidor. Not because I think they will change the flavour characteristics of other cigars, but because I don’t want my humidor to retain the aroma of an infused cigar.

Repost from April 2, 2014

Montecristo No. 4 Petit Corona Review

The Montecristo No. 4 (on the right) next to a Vegafina Corona

The Montecristo No.4 is one of the more recognised, and widely available Cubans. After all, it is the world’s largest selling cigar according to some sources.

The wrapper is a beautiful, oily, maduro with a wonderful woody aroma. These being produced en masse, it is possible to find poorly constructed No. 4s often, so I usually inspect each one before purchasing it. This particular cigar though, was definitely a piece of art, very well constructed and definitely well transported and stored. A Cuban worthy of being called a Cuban!

The pre-light draw on this cigar was on the tighter side, which wasn’t surprising to me. I have often found the draw on Montecristos to be on the tighter side of the spectrum, and it’s just one of their characteristics that sets them apart.

Again, a cocoa and woody aroma dominated. After toasting it, the flavour profile remained the same although it had a bit of a kick in the finish. Lots of spice there, a very distinct peppery finish which lasted for a while on the palate.

Into the second third of the cigar, it felt more to the right of medium to full bodied, with a lot of thick smoke. The burn was razor sharp and the ash held on for a good inch and a half to two inches. There were definite cocoa and coffee notes in there now, and the peppery finish was a constant.

Razor sharp burn on the Montecristo No. 4

The final third of the cigar left me greedy for more. The peppery finish was much shorter now, although the coffee and wood notes were more prominent, and I was thinking to myself that this could be a great cigar to have along with a cup of coffee.

From my experience with the Montecristo No. 1, I knew that pacing oneself on this brings out the flavours significantly. So here, I puffed about once every minute and a half, and it lasted me about seventy-five minutes, which is great for a Petit Corona. As an added advantage, the cooler smoke also prevented any bitterness from developing.

Overall, this is a wonderful smoke and well worth the money.

The next time I have this, I will perhaps try and pair it with some Glenfiddich 12 Year old or perhaps a Jura 10. Those lighter malts could be perfect to pair with the Montecristo No. 4.

Cigar Rating: 91

Appearance and Construction: 13/15
Flavour: 23/25
Smoking Characteristics: 23/25
Overall Impression: 32/35

Repost from March 29, 2014

Romeo Y Julieta Romeo No.1 Tubo 5 1/2 X 40 Review

This is a cigar that had been lying in my humidor since February 2012.

I had bought 15 tubos, some as a gift to a friend, and some to go into my humidor for later use. This was the last of the remaining tubos, one I had almost forgotten about.

Romeo No.1 Tubo 5 1/2 X 40

I was hoping that two years in the humidor would have really mellowed this one down. From my memory of the previous tubos in the set that I had tasted, I could remember some very woody notes, but my palate has developed over the years and I am also a lot more experimental with my cigars.

This was a well constructed cigar, with no visible veins, and very smooth oily wrapper. All that time in the humidor must have brought out the shine! The pre-light flavour seemed to be very earthy, with hints of chocolate and espresso.

On lighting it, the draw was smooth with a decent amount of smoke. The finish was slightly peppery and on the shorter side. I did notice some uneven burn, but it corrected itself about an inch and a half into the cigar, with the ash being a very light grey colour.

Into the second third of the cigar, the woody notes became more prominent and remained so till the very end, with the overall flavour profile staying earthy – woody mostly, hints of coffee at times with a touch of pepper on the finish.

The Romeo No. 1 is a medium bodied, nice and easy smoke and isn’t going to distract you with its wealth and complexity of flavours, but I really don’t think it was meant to be. That being said, this is a good smoke, especially if you do leave it in a humidor for a while.

Cigar Rating: 82

Appearance and Construction: 12/15
Flavour: 20/25
Smoking Characteristics: 20/25
Overall Impression: 30/35

Repost from March 24, 2014

Armenteros Cigars: Corona 5 5/8 X 46 Review

I was recently at an event hosted at a cigar club, where the whole Armenteros range of cigars was presented. From the available range, I opted to smoke the Corona (5 5/8 X 46), and pocketed the Churchill for later.

Armenteros Corona 5 5/8 inches X 46

The first thing I noticed is how soft the cigar was. While this was not necessarily a deal breaker by any means, what struck me was that the overall construction did not seem like the “perfect option for the cigar connoisseur”, as claimed by the company. The seams were very visible, and the cigar itself was toothy with a few minor blemishes.

Given how soft the cigar was, I used my cigar punch rather than the cutter the lady from Armenteros had. The pre-light draw was very easy, almost as if sucking through an empty straw, and the flavour profile was very earthy, perhaps a hint of dark chocolate. There was no other flavour I could discern.

Once lit, the flavour profile didn’t change much, still the same earthy notes, with some dark chocolate and leather notes. It actually reminded me of the Cheap Bastard Corona that I had smoked about a month ago – the Armenteros Corona wasn’t a very pleasant smoke, at least not in the first third.

Barely 5 minutes into the cigar, the burn line started going awry and it stayed that way throughout. I also noticed that there was some tunnelling happening, with the area about an inch behind the burn line getting incredibly hot. I had to touch it up significantly around the 15 minute mark; this cigar was firmly heading into ‘dog’ territory.

The smoke was harsh, devoid of any flavour with inconsistent burn and horrible construction. By the second third of the cigar, the wrapper and binder were coming apart! I showed it to the lady from ITC (the company that owns the Armenteros Cigars brand), and she could just shrug.

I persevered with it, hoping against hope that it would improve in the last third of the cigar, but I never got that far. The cigar unravelled eventually with the wrapper and binder falling apart completely, and I just had to throw it away. There were quite a few first time cigar enthusiasts at the event, and if this is the quality of cigars that ITC is going to push into India under the guise of ‘premium cigars’, I doubt there will be many takers. Perhaps this is why cigars are best left to small family-run enterprises rather than Big Tobacco firms. I have the Armenteros Churchill 7X47 with me still, and I am not going to smoke another Armenteros anytime soon, so it will remain in my humidor for the time being.

Armenteros Corona Cigars 5 5/8 X 46: Uneven Burn

Cigar Rating: 30

Appearance and Construction: 5/15

Flavour: 10/25

Smoking Characteristics: 5/25

Overall Impression: 10/35



I took the Armenteros Churchill (7X47) out of my humidor and photographed it, and right away, I could see some construction issues consistent with the Corona. I was also surprised at how soft the cigar was. To me, this looks like inferior workmanship. And that is fine, so long as the prices reflect that reality.

Sadly, these cigars are positioned in India as premium cigars, and everything from the external packaging to prime retail placement and price points leads us in that direction. However, I wish the quality was up at least to the level of a Quorum or a Cheap Bastard. The Armenteros Cigars range is priced at roughly $15-22 per stick, and that to me is a blatant rip off.

Repost from 24 February 2014
