Why I Would Short Tesla

As a value investor, I am not a fan of Tesla. Here’s why.

Given that Tesla has no need for an internal combustion engine, and electric motors are its bread and butter, I would have expected Tesla to be taking the lead in battery research. That is evidently not the case. Roughly speaking, Tesla is in Google’s league when it comes to battery-related patents, and Google’s patents presumably have to do with its Android operating system and Chromebooks.

And they are nowhere to be seen even where it counts: electric vehicles.

Nor are they taking the lead in research on automobile chassis. Note that in the graph below, Google’s patents relate to computer chassis, not automobile chassis.


Based on this data, I would hesitate even to say that safety has taken a back seat; it would be an understatement to say that it is not their forte.





And with all the talk about driverless cars and semi-automated driving, one would assume that at least some research would have been conducted in that area.


So the question is – what exactly are you paying for? And are the premiums worth it? Why is Tesla being valued as a software company, and not as an automobile company, especially when they don’t demonstrate any leadership in software? WWBGD?*


* What Would Benjamin Graham Do?


Matplotlib and The Sopranos

So here’s how people died on The Sopranos. Unsurprisingly, gunshot wounds top the chart. Remove those, and what remains is a close contest between ‘Blunt Force Trauma’ and ‘Natural Causes’, at 5 and 6 respectively.

How People Died on The Sopranos

And here are the prolific killers.

Killers on The Sopranos

While Tony Soprano and Christopher Moltisanti are tied at 5 each for solo kills, Moltisanti takes the lead when kills shared with other characters are counted too.
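The difference between solo kills and kills shared with other characters comes down to how each death is credited. Here is a minimal sketch in Python, assuming each death is recorded with a list of every character credited with it; the records and names below are hypothetical placeholders, not data from the show.

```python
from collections import Counter

# Hypothetical records: each death lists every character credited with the kill.
# Placeholder names only; this is not data from the show.
deaths = [
    {"victim": "V1", "killers": ["Killer A"]},
    {"victim": "V2", "killers": ["Killer B"]},
    {"victim": "V3", "killers": ["Killer A", "Killer B"]},
    {"victim": "V4", "killers": ["Killer B", "Killer C"]},
]

def tally_kills(records):
    """Return (solo, total): solo counts single-killer deaths only,
    total credits every listed killer, joint kills included."""
    solo, total = Counter(), Counter()
    for record in records:
        killers = list(dict.fromkeys(record["killers"]))  # de-duplicate, keep order
        if len(killers) == 1:
            solo[killers[0]] += 1
        total.update(killers)
    return solo, total

solo, total = tally_kills(deaths)
print("Solo:", solo.most_common())
print("Including joint kills:", total.most_common())
```

In this toy data, Killer A and Killer B tie on solo kills, but Killer B pulls ahead once joint kills are credited, which is the same kind of split as the one between Tony and Christopher.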



Visualising Police Shootings

Police shootings have been in the news in the United States again, with two recent police shootings making headlines both nationally and internationally. Then came the tragic killing of police officers in Dallas, completely unrelated to those two events.

The Washington Post curates data on this topic and hosts it in its GitHub repository. This has become a very politically charged topic, so I offer these visualisations without comment. Make your own inferences.

First, a distribution of deaths from police shootings by race. In absolute numbers, ‘white’ victims dominate.

Distribution of Police Shooting Deaths, by Race

Source: Washington Post

Second, far more men than women are killed in police shootings.

Distribution of Police Shooting Deaths, by Gender

Source: Washington Post

Putting the two together, with green representing males and red representing females:

A few states contribute the most to these statistics: California, Texas and Florida are the top three.

Distribution of Police Shooting Deaths, by State

Source: Washington Post

“Beaten”, “Shot and Tasered” and “Shot” are the three recorded manners of death. Here is the distribution of the manner of death by race.

Distribution by Race: Manner of Death

Source: Washington Post
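A breakdown like this is a cross-tabulation of two columns. Here is a minimal standard-library sketch, assuming the Washington Post CSV has columns named `race` and `manner_of_death` (check the header of the file you actually download); the inline rows are made up purely to show the shape of the output.

```python
import csv
import io
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count (row value, column value) pairs, e.g. (race, manner of death).
    Blank fields are reported as 'Unknown' rather than silently dropped."""
    counts = Counter()
    for row in rows:
        r = row.get(row_key) or "Unknown"
        c = row.get(col_key) or "Unknown"
        counts[(r, c)] += 1
    return counts

# Made-up sample in the assumed column layout; swap in open("<your file>.csv").
sample = io.StringIO(
    "race,manner_of_death\n"
    "W,shot\n"
    "W,shot and Tasered\n"
    "B,shot\n"
    ",shot\n"
)
table = crosstab(csv.DictReader(sample), "race", "manner_of_death")
for (race, manner), n in sorted(table.items()):
    print(f"{race:8} {manner:18} {n}")
```

Keeping blanks as an explicit ‘Unknown’ bucket makes missing data visible in the chart instead of quietly shrinking the totals.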

Amongst white victims, a very high number of those showing signs of mental illness were Tasered and shot.


Signs of Mental Illness, By Race and Gender

Source: Washington Post

Distribution by Race and Gender: Signs of Mental Illness

Source: Washington Post

Victims span a wide range of ages, but I was surprised to see some below 10 and some over 75!

Distribution of Shooting Victims by Age and Race

Source: Washington Post

Distribution of Deaths by Gender and Race

Source: Washington Post

It seems that most shootings take place in the first half of the year, and then taper off.

Monthly Distribution of Deaths, by Race

Source: Washington Post
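The monthly view amounts to bucketing each record’s date by calendar month. A minimal sketch, assuming dates are stored as ISO strings such as `2016-01-15` (an assumption about the file’s format); the dates below are illustrative only, and all years are pooled together.

```python
import calendar
from collections import Counter
from datetime import date

def deaths_by_month(iso_dates):
    """Count records per calendar month (1-12), pooling all years together."""
    counts = Counter(date.fromisoformat(d).month for d in iso_dates)
    # Report every month, including empty ones, so a chart gets all twelve bars.
    return [(calendar.month_abbr[m], counts.get(m, 0)) for m in range(1, 13)]

# Illustrative dates only.
sample_dates = ["2015-01-02", "2015-01-20", "2016-03-05", "2015-11-30"]
for month, n in deaths_by_month(sample_dates):
    print(month, n)
```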

As mentioned before, CA, TX and FL are the top three states for police shooting deaths.

States with Highest Police Shooting Deaths

Source: Washington Post

Most victims appeared to be armed when shot. At the same time, ‘unarmed’ is the third most common category.

Armed Status When Killed: All Victims

Source: Washington Post

Most Black victims had a gun or a knife in their possession when they were killed.

Armed Status When Killed: Black Victims

Source: Washington Post

The story is similar for the other groups.

Armed Status When Killed: White Victims

Source: Washington Post

Armed Status When Killed: Asian Victims

Source: Washington Post


Armed Status When Killed: Hispanic Victims

Source: Washington Post

And finally, some data in tabular form. These numbers might not be entirely accurate, since they are based on the 2010 census, but they might give the reader some insight into the overall trends. The top 10 states for deaths by police shootings were different for Black, White, Asian and Hispanic people. The top 15 states covered both Black and White victims; however, in those same states, the numbers of Asian and Hispanic deaths were quite low or non-existent.
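If census figures are used to put states on an equal footing, the arithmetic is a rate: deaths divided by population, scaled to a convenient base such as per million residents. A minimal sketch of that calculation; the two states and their figures are placeholders, not real census or Washington Post numbers.

```python
def rate_per_million(deaths, population):
    """Deaths per million residents; population would come from census figures."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 1_000_000 / population

# Placeholder inputs: (deaths, population) for two hypothetical states.
states = {"State X": (50, 10_000_000), "State Y": (10, 1_000_000)}
ranked = sorted(states, key=lambda s: rate_per_million(*states[s]), reverse=True)
for name in ranked:
    print(name, round(rate_per_million(*states[name]), 2))
```

Here the state with fewer deaths ranks first once population is taken into account, which is why raw counts and rates can produce different top-10 lists.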


Brexit: Text Mining and Sentiment Analysis

The Brexit vote, the referendum on the United Kingdom’s withdrawal from the European Union, took place today. I mined Twitter to see what themes were emerging from the chatter, using the ‘Sentiment’ package in R.
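I won’t claim to reproduce how the R ‘Sentiment’ package scores text. As a rough illustration of the general idea only, here is a simpler lexicon-counting approach in Python: count positive and negative words and classify by the balance. The tiny word lists are a hypothetical assumption; real analyses use curated lexicons or trained classifiers.

```python
import re

# Hypothetical mini-lexicon for illustration; not from any published word list.
POSITIVE = {"hope", "great", "win", "good", "strong"}
NEGATIVE = {"fear", "risk", "crisis", "bad", "weak"}

def polarity(text):
    """Classify text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = [
    "Great day, strong turnout, real hope!",
    "Markets fear a crisis after the vote",
    "Polls close at 10pm",
]
print([polarity(t) for t in tweets])
```

Word counting ignores negation and sarcasm (“not good” scores as positive), which is one reason packages built on trained classifiers exist.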

And here is a word cloud, to see the themes emerging across 15,000 tweets.

Brexit Wordcloud
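A word cloud is essentially a picture of word frequencies. The counting step behind one can be sketched as follows; the stop-word list is a small hypothetical subset, and stripping URLs and hashtag symbols is my assumption about reasonable tweet cleaning, not a description of what was done for the cloud above.

```python
import re
from collections import Counter

# Small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "to", "of", "and", "in", "is", "on", "for", "rt"}

def word_frequencies(texts, top_n=10):
    """Return the top_n most frequent words after light tweet cleaning."""
    counts = Counter()
    for text in texts:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop URLs
        words = re.findall(r"[a-z][a-z']*", text)  # '#' and '@' are simply not matched
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

tweets = [
    "RT Vote leave! #Brexit https://t.co/x",
    "Vote remain, vote for the EU",
    "#Brexit day is here",
]
print(word_frequencies(tweets, top_n=3))
```

The same counts, computed separately per group of outlets, are what make the comparisons between left-leaning, right-leaning and international coverage possible.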


I also did a quick run on the news outlets. How do the themes emerging from known left-leaning, centre-right and right-leaning (British) news outlets differ? And how do they all differ from (or resemble) international reporting on the topic?

Wordcloud: Right Leaning News Organisations

Right-leaning publications include The Daily Mail and The Sun.

Wordcloud: Left Leaning News Organisations


Left-leaning publications include The Guardian and The Independent.

Wordcloud: Centre-Right Leaning News Organisations

Centre-right publications include The Telegraph and The Times.

And finally, international publications.

Wordcloud: International News Organisations

These include the New York Times, the Washington Post and Der Spiegel. I suspect a lot of this would have changed since the assassination of Jo Cox, which might even have slowed or stopped the Brexit momentum in favour of the ‘Remain’ camp. I’d still put the Brexit camp marginally ahead, though.


Hillary Clinton’s Emails

My colleague Eugene Kwak and I recently had some of our analysis featured on CNBC, in an article titled “What data reveals about Hillary Clinton’s emails”.
We worked through the publicly available data sets hosted by Kaggle and the Wall Street Journal, and it was a fun project that involved some of the complexities of text analysis.

Anyway, here are some graphics that didn’t make it to the article.

Countries Most Frequently Mentioned by Hillary Clinton

Source: US Dept of State
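Counting country mentions amounts to matching each email against a list of country names. A minimal sketch, assuming a short hand-written country list and word-boundary matching; a real run would need the full list plus aliases (e.g. ‘UK’ for ‘United Kingdom’), which this deliberately leaves out. The email bodies are invented.

```python
import re
from collections import Counter

# Hypothetical short list; a real analysis would use a complete country list.
COUNTRIES = ["Libya", "Israel", "China", "United Kingdom"]
PATTERNS = {c: re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE) for c in COUNTRIES}

def country_mentions(emails):
    """Count how many emails mention each country (at most once per email)."""
    counts = Counter()
    for body in emails:
        for country, pattern in PATTERNS.items():
            if pattern.search(body):
                counts[country] += 1
    return counts

emails = [
    "Update on Libya and the talks with China.",
    "Libya briefing attached. Libya remains fluid.",
    "Call with the United Kingdom delegation.",
]
print(country_mentions(emails).most_common())
```

Counting each email at most once keeps a single long message from dominating; counting every occurrence is the other reasonable choice.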


And of course, I had to create a word cloud from the emails mentioning Libya.

Source: US Dept of State


And the most commonly occurring words in email subjects.

Source: US Dept of State


A Few Insights from Crunchbase

As part of my Master’s degree, I have been spending a considerable amount of time on a project related to startups and founders. Here are a few quick snapshots from the Crunchbase API.

The Top-20 Schools list does contain a few near-duplicates, such as Stanford vs Stanford GSB and Harvard vs HBS, but I’ve retained them as they are. It is interesting to note, however, that US institutions dominate. And as with many things in the world of data, the tail is incredibly long. Glad to see NYU right up there too!


Top 20 Schools Attended by Founders

Source: Crunchbase


The average amount raised in the various rounds is also interesting. It doubles from Series A to Series B, and then the growth slows slightly. The ‘Others’ category is a small set of funding rounds that weren’t tagged, so I put them in their own category.

Source: Crunchbase
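The averages above come from grouping funding rounds by type and averaging within each group, with untagged rounds pooled into ‘Others’. A minimal sketch, assuming each round is available as a (round type, amount) pair; this layout is my assumption, not Crunchbase’s API schema, and the amounts are invented.

```python
from collections import defaultdict

def average_by_round(rounds):
    """Average amount raised per round type; untagged rounds go to 'Others'."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for round_type, amount in rounds:
        key = round_type or "Others"  # None or "" means the round was not tagged
        totals[key] += amount
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Invented amounts in USD millions, only to show the calculation.
rounds = [("Series A", 5.0), ("Series A", 7.0), ("Series B", 12.0),
          ("Series B", 14.0), (None, 1.0), ("", 3.0)]
averages = average_by_round(rounds)
print(averages)
print(averages["Series B"] / averages["Series A"])  # growth from one round to the next
```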

And who is funding these startups? Here are the top 25 firms.

Source: Crunchbase

Mind you, this is from Crunchbase’s database, although I have no reason to doubt its veracity. Nevertheless, I find it incredible that entities such as 500 Startups and Y Combinator have chalked up higher numbers than long-standing VCs such as Accel, Sequoia and Andreessen Horowitz. The deal sizes at 500 Startups and Y Combinator will of course be smaller, but they have made entrepreneurship a lot more accessible.


Notes from Certified ScrumMaster Class

I recently attended and completed the Certified ScrumMaster program from the Scrum Alliance.

Having taken project management classes, read extensively about Lean, Kanban and TQM (and associated philosophies), and managed projects myself, I found that most of the program wasn’t radically new to me.

The roots of all these approaches can be traced to post-war manufacturing methodologies, which have since been adapted for applications ranging from consulting to software. The Scrum methodology is geared towards development in an environment of constant change. The trainer for the course had worked on large enterprise software projects, and so presented the idea as arising from settings where customers changed their requirements mid-project, especially when gestation periods were very long.

I, however, saw this as having wide applicability in a startup environment, where incremental gains and frequent releases can tremendously reduce risk as well as time to market. In fact, Eric Ries’s book The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses covers this very topic in great detail, and I highly recommend it to anyone interested in the Lean/Agile philosophies as applied to a startup environment.

I’ve attached below some notes I made during the two days.




Clustering Scotch Whisky: Grouping Distilleries by k-Means Clustering

This was inspired by a section in the fantastic book “Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking” by Prof Foster Provost of NYU and Tom Fawcett, one of my assigned readings.

Amongst the many interesting examples and challenges given were predicting wine quality from weather data and clustering whisky on the basis of flavour profiles. I downloaded the data from the website of the Department of Mathematics and Statistics at the University of Strathclyde.

k-means Clusters

I first iterated through the data to find the optimum k for k-means. There is quite a bit of discussion on choosing the optimum number of clusters (k) in k-means clustering, and these two are excellent links: Finding the K in K-Means Clustering and Pham et al at Columbia.

I settled on 6 clusters after iterating through 2 to 10 clusters. This wasn’t a hard scientific measure per se; I could just as easily have selected 5.
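Iterating over k usually means running k-means once per k and tracking the within-cluster sum of squares (inertia), then looking for an ‘elbow’ where adding clusters stops paying off. Below is a minimal pure-Python sketch of Lloyd’s algorithm. The deterministic farthest-point initialisation and the toy 2-D points are my assumptions for a reproducible example; in practice a library implementation such as scikit-learn’s `KMeans`, which tries several initialisations, would be the usual choice.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point initialisation.
    Returns (centroids, labels, inertia)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centroids[j])
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Toy 2-D data with three obvious groups (the real input is 86 rows x 12 scores).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
for k in range(1, 7):
    print(k, round(kmeans(data, k)[2], 2))
```

On data this clean, inertia drops sharply up to k = 3 and barely moves afterwards. Real flavour profiles rarely give such a crisp elbow, which is consistent with 5 and 6 both looking defensible.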

So what did these clusters reveal, and how accurate are they? Some interesting results for sure. Let us first get down to the data.

The dataset covers 86 distilleries, each rated on a scale of 0 to 4 for 12 flavour characteristics: Body, Floral, Fruity, Honey, Malty, Medicinal, Nutty, Smoky, Spicy, Sweetness, Tobacco and Winey.

Here is how these flavours correlate to each other:

Whisky Flavours – Correlation Plot

As we can see, some flavour pairs do appear to be highly correlated – smoky-medicinal, smoky-tobacco – while others tend to have an inverse relationship – such as body-floral, body-medicinal, medicinal-floral.
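A correlation plot like this is built from pairwise Pearson coefficients between flavour columns. A minimal pure-Python sketch; the five rows of 0 to 4 scores are made up just to exercise the function (Python 3.10+ also ships `statistics.correlation`).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of name -> list of scores."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up 0-4 scores for five distilleries, just to exercise the function.
scores = {
    "Smoky":     [4, 3, 0, 1, 2],
    "Medicinal": [4, 2, 0, 0, 1],
    "Floral":    [0, 1, 3, 4, 2],
}
m = correlation_matrix(scores)
print(round(m[("Smoky", "Medicinal")], 2), round(m[("Smoky", "Floral")], 2))
```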

Cluster Dendrograms of Flavours

A cluster dendrogram of the flavours suggests that the strongest cluster amongst these could be the Smoky-Medicinal cluster, although I’m surprised that Tobacco and Smoky don’t seem as close as one would imagine, both here in the dendrogram and in the correlation plot.

The clusters range in size from 6 to 23 distilleries.

Whisky Cluster - Size of Each Cluster

As for the clusters, here they are:

Cluster 1: Ardmore, Bowmore, Bruichladdich, Craigallechie, GlenScotia, Highland Park, Isle of Jura, Old Fettercairn, Old Pulteney, Springbank, Tormore
Cluster 2: Ardbeg, Caol Ila, Clynelish, Lagavulin, Laphroig, Talisker
Cluster 3: AnCnoc, Auchentoshan, Aultmore, Benriach, Bladnoch, Bunnahabhain, Cardhu, Dalwhinnie, Deanston, Dufftown, GlenGrant, GlenMoray, Glenallachie, Glenfiddich, Glengoyne, Glenkinchie, Glenlossie, Loch Lomond, Mannochmore, Scapa, Speyside
Cluster 4: Aberfeldy, Aberlour, Auchroisk, Balmenach, Belvenie, BenNevis, Benrinnes, Benromach, BlairAthol, Dailuaine, Dalmore, Edradour, GlenOrd, Glendronach, Glendullan, Glenfarclas, Longmorn, Macallan, Mortlach, Royal Lochnagar, Strathisla
Cluster 5: Balblair, Craigganmore, GlenGarioch, GlenKeith, Glenmorangie, Oban, Royal Brackla, Strathmill, Tamnavulin, Teaninich, Tullibardine
Cluster 6: Isle Of Arran, GlenDeveron/Macduff, GlenElgin, Glen Spey, Glenlivet, Glenrothes, Glenturret, Inchgower, Knochando, Linkwood, Miltonduff, Speyburn, Tomatin, Tomintoul

Some interesting clusters there. Personally, I’m most interested in Cluster 2, the smoky single malts. I’ve had a single malt from each of these distilleries (bar Clynelish), so I can attest that they are indeed very smoky and medicinal. What about Clynelish, though? Master of Malt lists a 14-year-old Clynelish as “smoky” and “coastal”, which is pretty much how I would describe a Laphroaig 18-year-old or a Lagavulin 16-year-old.

There are still quite a few distilleries that I am not familiar with, so I will have to dig further to check whether my clustering is reasonably accurate. If I find more time, I am going to build a whisky recommendation system, which seems like a fun project.


NYC Restaurant Inspections in Python and Matplotlib

The NYC Open Data site is a treasure trove for anyone looking for large data sets.

There are data sets on education, housing, health, safety and many other areas of local government. One such interesting data set is the Restaurant Inspection Results: an exhaustive list of restaurants in the New York region, along with their grades and comments on the kinds of violations, if any. And there are plenty of violations.

The database lists 490,055 observations resulting from the inspection of 21,090 restaurants (as of the date I last accessed it), and not all of it makes pretty reading.

First, a breakdown of the number of restaurants in each borough.

There are 22 rows with missing borough data, which have not been included in the chart.
Manhattan leads in sheer numbers here and, as we will see below, in the number of violations too.
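Because the data set has one row per inspection observation rather than per restaurant, counting restaurants per borough means de-duplicating on a restaurant identifier first. This sketch assumes columns named `CAMIS` (restaurant ID) and `BORO` (borough), as I recall them from the NYC Open Data export, and treats an empty value or the literal `Missing` as an absent borough; verify both against the file you download. The rows are invented.

```python
from collections import Counter

MISSING = {"", "Missing"}  # assumed placeholders for an absent borough

def restaurants_per_borough(rows):
    """Count distinct restaurants (by CAMIS id) per borough, skipping missing boroughs."""
    seen = {}
    for row in rows:
        boro = row["BORO"].strip()
        if boro not in MISSING:
            seen.setdefault(row["CAMIS"], boro)  # first borough seen for each restaurant
    return Counter(seen.values())

# Invented rows; with a real file, pass csv.DictReader(open("<file>.csv")).
rows = [
    {"CAMIS": "1", "BORO": "MANHATTAN"},
    {"CAMIS": "1", "BORO": "MANHATTAN"},  # second inspection of the same restaurant
    {"CAMIS": "2", "BORO": "BROOKLYN"},
    {"CAMIS": "3", "BORO": "Missing"},
]
print(restaurants_per_borough(rows))
```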

So what are the most common health code violations? Here are the Top-10.

The Top-10 Violations by Restaurants in NYC

There are plenty of rat and other vermin-related violations, as well as over 26,000 instances of faulty drainage and sewage disposal systems.

As for rats, here’s a breakdown by borough.

Rat and Mice Related Violations

That’s a lot of Pussy Bonpensieros in one town! I now pay extra attention to the rating displayed outside every restaurant.
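Both charts reduce to counting violation descriptions, once overall and once filtered to rodents. A minimal sketch, assuming rows are available as (borough, violation description) pairs; the keyword pattern is my assumption, and the word boundaries matter, since a bare substring test for "rat" would also match "temperature". The rows are invented.

```python
import re
from collections import Counter

# Assumed keywords; \b word boundaries stop "rat" from matching "temperature".
VERMIN = re.compile(r"\b(rats?|mice|mouse)\b", re.IGNORECASE)

def top_violations(descriptions, n=10):
    """Most frequently cited violation descriptions, ignoring blanks."""
    return Counter(d for d in descriptions if d).most_common(n)

def vermin_by_borough(rows):
    """Count rodent-related violation rows per borough."""
    return Counter(boro for boro, desc in rows if VERMIN.search(desc or ""))

rows = [  # (borough, violation description); invented examples
    ("MANHATTAN", "Live mice observed"),
    ("MANHATTAN", "Food held above safe temperature"),
    ("BROOKLYN", "Rats observed"),
    ("QUEENS", ""),
]
print(top_violations(desc for _, desc in rows))
print(vermin_by_borough(rows))
```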


What’s your beef with it?

Earlier this year, in June, the New York Times carried an opinion piece by Sonia Faleiro titled ‘Saving the Cows, Starving the Children’.

As is typical of op-eds in the NYT or The Guardian about India or Indians, a distinctly left-of-centre voice was selected. There is nothing fundamentally wrong with that, except that there appears to be no balance whatsoever, and Indian authors aspiring to a socialist utopia get the lion’s share of op-ed space in Western newspapers. Is someone who appreciates freer markets, economic and policy reforms, and/or less government in the lives of citizens anathema to these publications? I will not go further into the politics of it, though; I will focus on the article itself and deconstruct some of its assertions to see if they hold any merit. Any emphasis is mine.

eggs — a superfood that is about 10 percent fat and extremely high in protein

What exactly is a superfood? As per the British Dietetic Association:

It is simply a marketing term that has become trendy over the last few years. Companies and marketing teams will often put whatever they can on a label to hook you into a purchase.

Many claims can give us false expectations of the benefits or they aren’t fully substantiated.

In this instance, it is the egg that is the superfood: evidently a piece of hyperbole to sell her point of view. More on superfoods here and here.

How do eggs compare to other foods? I scraped the USDA food database for its list of over 8,000 foods.

The average protein content of a boiled egg is 12.58 g per 100 g. So which food items have a greater protein content per 100 g?

The list includes dried egg white and its variants, which makes sense because that would be nearly pure albumin, with exceedingly high bioavailability, but also very expensive. It also includes soy protein isolate, pork skins, peanut flour, and beluga whale if you want to go whaling in Alaska. Quite a few of these aren’t really practical or available in India, especially for a program run by the government.

So what items could potentially be used in government food programs, either as is or as raw materials for cooking? Here are some options.

Food item | Protein content (g per 100 g)
Soybeans, mature seeds, dry roasted | 39.58
Milk, dry, nonfat, regular, without added vitamin A and vitamin D | 36.16
Peanut flour, low fat | 33.8
Peas, green, split, mature seeds, raw | 23.82
Soybeans, mature cooked, boiled, without salt | 16.64
Cheese, feta | 14.21

Even feta cheese, which is extremely close to Indian paneer, scores higher than the egg.
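The comparison boils down to keeping the food items whose protein content per 100 g exceeds the egg’s 12.58 g and sorting them. A minimal sketch using only the six values from the table above; the USDA scrape itself isn’t reproduced here, and the list-of-tuples layout is my assumption.

```python
EGG_PROTEIN = 12.58  # g per 100 g, boiled egg (figure quoted above)

foods = [
    ("Soybeans, mature seeds, dry roasted", 39.58),
    ("Milk, dry, nonfat, regular, without added vitamin A and vitamin D", 36.16),
    ("Peanut flour, low fat", 33.8),
    ("Peas, green, split, mature seeds, raw", 23.82),
    ("Soybeans, mature cooked, boiled, without salt", 16.64),
    ("Cheese, feta", 14.21),
]

def richer_than(items, threshold):
    """Items with more protein per 100 g than threshold, highest first."""
    return sorted((f for f in items if f[1] > threshold), key=lambda f: f[1], reverse=True)

for name, protein in richer_than(foods, EGG_PROTEIN):
    print(f"{protein:6.2f}  {name}  ({protein / EGG_PROTEIN:.1f}x egg)")
```

Run against the full food list, the same filter yields the longer list discussed above, including the less practical items.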

I hope this partially answers Faleiro’s question as to what the alternatives to eggs are. I’m basing this on the assumption that government-run programs, while aimed at increasing the nutritional content of a child’s daily food intake, must fulfil the following criteria:

  1. They must be available/sourced locally (to cut down on transportation costs and wastage)
  2. The number of food types distributed must be minimised, so as to reduce ordering and inventory costs. Hence no pork sausages or bacon rinds, keeping in mind the religious sensibilities of a minority; in other words, mostly vegetarian food that can be consumed by all.

As it is, eggs are going to be an increasingly expensive commodity due to rising input costs, since there has been a global increase in poultry feed prices. Donohue and Cunningham write in the Journal of Applied Poultry Research about the rise in input costs for US poultry firms, and the FAO speaks about the global rise too, stating that “almost all developing countries are net importers of these ingredients; the poultry feed industries in Africa and Asia depend on imports, which are a drain on their foreign exchange reserves.”

Image Source: J Appl Poult Res (2009) 18 (2): 325-337. doi: 10.3382/japr.2008-00134


Next, Faleiro makes another statement, with regard to the recent ban on the slaughter of cows.

Another staple food was taken from the plates of the poor…beef.

Beef a staple food in India? Statements like these, with no basis in fact, leave even a hardcore carnivore like me incredulous. Can we reasonably define ‘staple food’ though? We have the FAO to help us out on this once again.

A staple food is one that is eaten regularly and in such quantities as to constitute the dominant part of the diet and supply a major proportion of energy and nutrient needs.

The organisation lists the staple foods of the Indian region as banana, bean, chick-pea, citrus, cucumber, eggplant, mango, mustard, rice and sugar cane, although this isn’t an exhaustive list in my opinion. Nevertheless, the economics of meat production simply don’t allow meat to be a staple food in India, especially not for the poorer sections of society.

Faleiro further asserts that:

Beef, unlike mutton and chicken, is cheap.

Let us inspect the evidence on that.

Source: FAO


This chart compares the FAO Food Price Index with meat prices, which cover different forms of beef, chicken, pork and turkey. The chart does make some sense, given the recent surge in commodity prices for soy, corn, etc.

Let’s dig deeper. First, some data from the USDA.

Beef vs Pork Prices Source: FAO

These are prices in the US, where both absolute and per capita consumption of beef are amongst the highest in the world. As you can see, except for a brief period in the 70s, beef has always been more expensive than pork, even in a country like the US, where the consumption and production of beef are so high. What about the commodity markets? We have data from the IMF on this.

Beef vs Lamb vs Pork: Commodity Prices Source: IMF

Cows (and bulls) were never really on the meat market in India to begin with, and most meat eaters ordering a steak at a restaurant knew that they would be eating a buffalo steak, which, incidentally, is still available both in restaurants and in the wholesale market. Alibaba has a repository of such suppliers. Unless, of course, Faleiro’s contention is that Indian commodity prices are an island, significantly cheaper than global prices, in which case she has uncovered a massive global arbitrage opportunity.

Next, we are informed that many Indians impose their food choices on children.

In India you are what you eat, and devotion to strict vegetarianism is a trait common to many upper-caste Hindus. Some wield their diet like a badge of their status. Others demand that people around them — like children and household staff members — eat as they do to maintain the purity of their kitchens. They will not visit restaurants that also serve nonvegetarian food for fear of being polluted.

What a travesty indeed! I have heard that some parents also force their children not to eat too much candy, to drink milk and to eat broccoli.
Jokes aside, I don’t understand the point Ms Faleiro is trying to make. Are children supposed to be making independent food choices now, separate from the judgement and/or values of their parents? If so, why stop at food choices? Why not let 5-year-olds decide which school they want to go to, how late to stay up at night, or when to start taking out the car?

By the same token, are all those vegan parents in sunny California committing some kind of crime by bringing up their kids in a vegan home? Are Jewish parents feeding their kids Kosher food culinary criminals? Are Muslim parents preparing Halal food at home some kind of insidious racists?

I am not one bit surprised by the quality of Sonia Faleiro’s article. Journalism in India isn’t a profession where ethics, quality or standards are expected or demanded – it’s a nepotistic, mutual backslapping club.

I do, however, continue to be surprised by the likes of the NYT and the Guardian promoting authors/journalists with this bent. Should India be known to NYT readers only through the words of polemicists? To draw a parallel, it would be as if Americans were known to the rest of the world only through the words of Chomsky, Malcolm X and Michael Moore. They are important voices, but not the only voices.

To those interested, my R script and files can be accessed on GitHub here.

Note: A modified version of this post was reproduced on Indiafacts.co.in by their team.
