
From Start-up to Unicorn: Analysing the Journey

Josh Jameson
9 min read · Aug 15, 2023


For this SQL project, I chose to analyse a dataset showing details of all companies which have achieved “unicorn” status — a $1 billion valuation. This turned out to be a very interesting dataset which definitely provides a lot of opportunity for further investigation.

As well as using some new SQL functions, this was my first time using Microsoft’s SQL Server Management Studio (SSMS), which allowed me to query the dataset on my own computer rather than in an online query editor. This came with a few curveballs, as the syntax used in SSMS is in some cases quite different from what I’d learned before. Still, this provided some great learning opportunities, which is the whole point of completing these projects after all!

The Data:

This dataset is up to date as of March 2022 and includes 1,074 companies. Columns of interest include:

  • Company — Company name.
  • Valuation — Company valuation in billions of dollars.
  • Date joined — The date on which the company reached a $1 billion valuation.
  • Industry — The industry the company operates in.
  • City, Country, Continent — The city, country, and continent where the company was founded.
  • Year Founded.
  • Funding — Total amount raised across all funding rounds in billions or millions of dollars.

The original dataset can be downloaded from the Maven Analytics Data Playground. Before importing the data into SSMS I carried out some preliminary data cleaning in Excel. I made the following changes, amongst others:

  • The data came with “B” or “M” included in the valuation and funding amounts, which meant they were treated as text values rather than numbers. I used Find & Replace to remove the letters and then converted all values to billions rather than millions.
  • The “Date Joined” data was in DD/MM/YYYY format. I separated this into a month and year column so that I could interact with the dates easily, without having to unnecessarily add complex date & time functions.
  • The “Select Investors” values came as one text string separated by commas. I split these out into 4 separate columns using a TEXTSPLIT function with “,” as the delimiter. I also used a TRIM function to get rid of the white space before/after the values after splitting. I didn’t end up using these values in my analysis but the data was a lot easier to use in this format.
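The suffix clean-up in the first bullet was done with Find & Replace in Excel. As an alternative illustration only, here is a minimal Python sketch of the same conversion. The function name `to_billions` and the sample strings are hypothetical, and the sketch assumes each raw amount ends in "B" or "M" with an optional leading "$".

```python
def to_billions(raw: str) -> float:
    """Convert a string such as '$2.5B' or '$500M' to a number in billions."""
    text = raw.strip().lstrip("$")      # assumption: an optional leading "$"
    suffix = text[-1].upper()           # the trailing "B" or "M"
    number = float(text[:-1])
    if suffix == "B":
        return number
    if suffix == "M":
        return number / 1000            # millions expressed as billions
    raise ValueError(f"Unrecognised amount: {raw!r}")


# Made-up sample values, not taken from the dataset
samples = ["$1B", "$500M", "$2.5B"]
converted = [to_billions(s) for s in samples]
print(converted)
```

Raising an error on an unexpected suffix makes any row that does not fit the assumed format visible instead of silently passing through.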

After this was all complete, I was ready to import the data into SSMS. This proved to be a lot more challenging than I realized it would be, but after a few (…a lot of) YouTube tutorials I managed to get the data into SSMS.

Once I did, I noticed there were 18 companies in the dataset where the continent was NULL, so I removed these from the dataset in SSMS using a DELETE statement.
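A minimal sketch of this kind of clean-up, run against an in-memory SQLite database from Python so that it is self-contained. The table name `unicorns` and the three sample rows are made up for illustration, and the sketch assumes the missing continents are true NULLs. If they had been imported as the text 'NULL', the filter would need to be `Continent = 'NULL'` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, Continent TEXT)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?)",
    [("Company A", "Europe"), ("Company B", None), ("Company C", "Asia")],
)

# IS NULL is required here: a comparison such as "= NULL" never matches a row
deleted = conn.execute(
    "DELETE FROM unicorns WHERE Continent IS NULL"
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM unicorns").fetchone()[0]
print(deleted, remaining)
```

The same `DELETE ... WHERE ... IS NULL` form is standard SQL and should carry over to SQL Server with only the table name changed.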

I started with the goal of answering the following questions:

  1. On which continent have the most unicorns been founded? Which continent has the highest-valued unicorns on average?
  2. Are companies in any specific industry likely to attract higher levels of funding? Are there any industries that consistently attract higher funding amounts?
  3. Which city/cities have the highest amount of funding available? Which countries have the most unicorns? Are there any cities that appear to be industry hubs?
  4. How long does it usually take for a company to become a unicorn? Do companies that have reached a $1 billion valuation faster tend to have higher valuations overall?

Key Insights

After carrying out my analysis I came to these key insights:

  • Over half (54%) of all unicorns have been founded in North America.
  • Although the Auto & Transportation industry attracts the most funding on average, the Fintech industry does so on a more consistent basis.
  • San Francisco lived up to its reputation as an incubator for high-valued start-ups, being the city where the most unicorn funding money has been raised. San Fran’s $81.6B raised almost equals the second and third-placed Beijing ($50.1B) and New York ($38.1B) combined. This also holds true when considering the number of unicorns founded in each of these cities.
  • The majority of unicorns take between 5 and 9 years to reach their $1B valuation. The time taken to reach this valuation seems to have no significant effect on valuation.

Continent Analysis

To see how many unicorns were founded on each continent, I used a COUNT() function combined with a GROUP BY clause. I put these in descending order by the number of companies using an ORDER BY … DESC clause.

I used a SUM aggregate function with an OVER() clause to add a column showing each continent’s share of the total number of unicorns in the dataset, alongside the actual number of unicorns.
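The count-plus-percentage pattern described above can be sketched as follows. This is an illustrative reconstruction rather than the original query: the table name `unicorns`, the alias names, and the four sample rows are assumptions, and the SQL runs on SQLite (3.25 or later for window functions) through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, Continent TEXT)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?)",
    [
        ("Company A", "North America"),  # made-up rows for illustration
        ("Company B", "North America"),
        ("Company C", "Asia"),
        ("Company D", "Europe"),
    ],
)

# SUM(COUNT(*)) OVER () adds up the per-group counts across all groups,
# giving the grand total to divide each continent's count by.
query = """
SELECT Continent,
       COUNT(*) AS Unicorns,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS Pct_Of_Total
FROM unicorns
GROUP BY Continent
ORDER BY Unicorns DESC, Continent
"""
continent_counts = conn.execute(query).fetchall()
for row in continent_counts:
    print(row)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point, which avoids integer division truncating every percentage.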

From this, it’s clear that just over half of the unicorns were founded in North America. North America and Asia together make up over four-fifths of the dataset, with 82% of unicorns being founded in these continents.

Taking a look at the average unicorn valuation in each continent, Oceania is significantly above the rest of the world. However, upon closer inspection, we can see this is due to one large outlier with a $40 billion valuation in a small dataset of just 8 unicorns.
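One way to spot this kind of skew is to return the group size and the maximum next to the average. The sketch below uses an assumed table name `unicorns` and invented valuations, and runs on SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, Continent TEXT, Valuation REAL)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?)",
    [
        ("Company A", "Oceania", 40.0),  # invented outlier
        ("Company B", "Oceania", 1.0),
        ("Company C", "Oceania", 1.0),
        ("Company D", "Europe", 2.0),
        ("Company E", "Europe", 3.0),
        ("Company F", "Europe", 4.0),
        ("Company G", "Europe", 3.0),
    ],
)

# A high average with a small count and a much larger maximum
# points to a single company pulling the mean upwards.
query = """
SELECT Continent,
       COUNT(*)       AS Unicorns,
       AVG(Valuation) AS Avg_Valuation,
       MAX(Valuation) AS Max_Valuation
FROM unicorns
GROUP BY Continent
ORDER BY Avg_Valuation DESC
"""
valuation_by_continent = conn.execute(query).fetchall()
for row in valuation_by_continent:
    print(row)
```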

*Edit as of 09/04/2024 — “Average Valuation ($B)” in the second query and right hand table should be “Valuation ($B)”.

Industry Analysis

The next interesting angle for analysis of this data set is which industry the unicorns are in and whether or not there are any observable trends in this. In which industry do companies attract the highest amount of funding? Which industry produces the most unicorns?

To get an overview of the industries I used a simple GROUP BY clause, which showed that there are 15 different industries in this dataset.

Here, I wanted to determine whether or not unicorns in certain industries are likely to attract higher levels of funding. To do this I used a query that grouped the companies into their industries and returned the number of companies, average funding, and total funding for each industry. This yielded some interesting results, outlined below.

I ordered the data by the average funding which unicorns in that industry attract, as I wanted to see in which industries companies attract the most funding on average. Only two industries average over $1 billion of funding, namely Auto & Transportation and Consumer & Retail. This suggests that on average a new unicorn in these industries will attract the most funding.
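A grouping query of the kind described here might look like the sketch below. The table name `unicorns`, the aliases, and the funding figures are invented for illustration, and the SQL runs on SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, Industry TEXT, Funding REAL)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?)",
    [
        ("Company A", "Fintech", 0.5),  # funding in $B, made-up values
        ("Company B", "Fintech", 0.25),
        ("Company C", "Fintech", 0.75),
        ("Company D", "Auto & Transportation", 1.5),
        ("Company E", "Auto & Transportation", 0.5),
    ],
)

# One row per industry: how many companies, plus average and total funding
query = """
SELECT Industry,
       COUNT(*)     AS Companies,
       AVG(Funding) AS Avg_Funding,
       SUM(Funding) AS Total_Funding
FROM unicorns
GROUP BY Industry
ORDER BY Avg_Funding DESC
"""
industry_funding = conn.execute(query).fetchall()
for row in industry_funding:
    print(row)
```

Changing the last line to `ORDER BY Companies DESC, Avg_Funding DESC` ranks the industries by size first, which is a different question from which industry has the highest average.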

Looking closer at the top 10 companies in the Consumer & Retail industry we can see that the average funding here is heavily skewed by the inclusion of JUUL Labs. With so few unicorns in the industry (25) this outlier has a significant effect on the average.

Excluding JUUL Labs from the Consumer & Retail industry, the average funding falls to $0.5 billion.
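To compare an average with and without a single outlier in one pass, a CASE expression inside AVG() is one option. This is a sketch under assumptions: the table name `unicorns`, the company name 'Outlier Co', and all funding values are hypothetical and are not the real figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, Industry TEXT, Funding REAL)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?)",
    [
        ("Outlier Co", "Consumer & Retail", 15.0),  # hypothetical outlier
        ("Company A", "Consumer & Retail", 0.5),
        ("Company B", "Consumer & Retail", 0.25),
        ("Company C", "Consumer & Retail", 0.75),
        ("Company D", "Fintech", 2.0),              # filtered out by WHERE
    ],
)

# The CASE has no ELSE branch, so the outlier's row becomes NULL,
# and AVG() ignores NULLs when computing the second column.
query = """
SELECT AVG(Funding) AS Avg_All,
       AVG(CASE WHEN Company <> 'Outlier Co' THEN Funding END) AS Avg_Without_Outlier
FROM unicorns
WHERE Industry = 'Consumer & Retail'
"""
avg_all, avg_without_outlier = conn.execute(query).fetchone()
print(avg_all, avg_without_outlier)
```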

Considering the above, I took another look at this question from a different perspective, ordering the results first by the number of companies in the industry and then by the average funding. This indicated that the Fintech industry attracts the highest funding on a more consistent basis. Auto & Transportation falls to 12th on this list, with Consumer & Retail falling to 14th.

City Analysis

Looking now at the different cities in which these companies were founded, I wanted to see if there were any specific cities with a lot of funding up for grabs.

There were no prizes for guessing which city attracted the most funding. San Francisco is a well-known incubator for high-valued start-ups, so it is no surprise that it is the city where the most unicorn funding money has been raised. San Fran ($81.6B) has seen almost as much money raised as second-placed Beijing ($50.1B) and third-placed New York ($38.1B) combined. Similarly, the number of unicorns founded in San Fran (152) almost matches that of Beijing (63) and New York (103) combined.
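A city-level ranking along these lines can be sketched as below. The table name `unicorns`, the placeholder city names, and the funding values are invented. SQLite uses `LIMIT` to cap the number of rows, whereas SQL Server would use `SELECT TOP (3)` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unicorns (Company TEXT, City TEXT, Funding REAL)")
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?)",
    [
        ("Company A", "City A", 2.0),  # funding in $B, made-up values
        ("Company B", "City A", 1.0),
        ("Company C", "City A", 1.5),
        ("Company D", "City B", 3.0),
        ("Company E", "City C", 0.5),
        ("Company F", "City C", 0.5),
        ("Company G", "City D", 0.25),
    ],
)

# Total funding ranks the cities; the count and average show whether
# a total comes from many companies or from a few large raises.
query = """
SELECT City,
       COUNT(*)     AS Unicorns,
       SUM(Funding) AS Total_Funding,
       AVG(Funding) AS Avg_Funding
FROM unicorns
GROUP BY City
ORDER BY Total_Funding DESC
LIMIT 3
"""
top_cities = conn.execute(query).fetchall()
for row in top_cities:
    print(row)
```

In this invented sample, City B has the highest average funding but only one company, which is the same pattern worth checking for in any city with very few unicorns.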

Again, looking at the number of unicorns founded in each city and the average funding would be interesting for further analysis, with Stockholm having by far the highest average funding but only 6 unicorns founded there.

Age Analysis

Here I wanted to see how long it usually takes companies to reach unicorn status and if this time has any effect on their valuations.

To avoid overcomplicating my statements with lengthy date and time functions, I split the “Date Joined” data into year and month columns when cleaning the data in Excel before getting started. This paid off here, enabling me to get a quick overview of how long it took for companies to become unicorns after being founded.

I added the ‘Year_Founded’ and ‘Join_Year’ columns just to double check my process had worked correctly, which it had.
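With the join year already in its own column, the time to unicorn status is a plain subtraction. The sketch below assumes a table named `unicorns` and the alias `Years_To_Unicorn`, and uses made-up years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unicorns (Company TEXT, Year_Founded INTEGER, Join_Year INTEGER)"
)
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?)",
    [
        ("Company A", 2015, 2021),  # made-up years for illustration
        ("Company B", 2019, 2021),
        ("Company C", 2008, 2020),
    ],
)

# Both source columns are returned next to the result as a sanity check
query = """
SELECT Company,
       Year_Founded,
       Join_Year,
       Join_Year - Year_Founded AS Years_To_Unicorn
FROM unicorns
ORDER BY Years_To_Unicorn
"""
years_to_unicorn = conn.execute(query).fetchall()
for row in years_to_unicorn:
    print(row)
```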

To identify any observable trend between how long it took a company to reach unicorn status and its valuation, I used a CASE expression to divide the companies up into:

  • Long: 10 or more years to reach unicorn status.
  • Medium: 5 to 9 years to reach unicorn status.
  • Short: Less than 5 years to reach unicorn status.

I structured this as a subquery and an outer query because SQL Server does not let a GROUP BY clause reference a column alias, such as the new “Type” column, that is defined in the same SELECT.
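The subquery-plus-outer-query structure can be sketched as follows. This is an illustrative reconstruction, not the original statement: the table name `unicorns`, the subquery alias `typed`, and the sample rows are assumptions, and the SQL runs on SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unicorns "
    "(Company TEXT, Year_Founded INTEGER, Join_Year INTEGER, Valuation REAL)"
)
conn.executemany(
    "INSERT INTO unicorns VALUES (?, ?, ?, ?)",
    [
        ("Company A", 2010, 2021, 2.0),  # 11 years -> Long
        ("Company B", 2015, 2021, 3.0),  # 6 years  -> Medium
        ("Company C", 2014, 2020, 1.0),  # 6 years  -> Medium
        ("Company D", 2019, 2021, 4.0),  # 2 years  -> Short
    ],
)

# The inner query labels each company; the outer query can then group
# by that label because it is an ordinary column of the derived table.
query = """
SELECT Type,
       COUNT(*)       AS Companies,
       AVG(Valuation) AS Avg_Valuation
FROM (
    SELECT Valuation,
           CASE
               WHEN Join_Year - Year_Founded >= 10 THEN 'Long'
               WHEN Join_Year - Year_Founded >= 5  THEN 'Medium'
               ELSE 'Short'
           END AS Type
    FROM unicorns
) AS typed
GROUP BY Type
ORDER BY Companies DESC, Type
"""
by_type = conn.execute(query).fetchall()
for row in by_type:
    print(row)
```

CASE branches are evaluated in order, so the `>= 5` test only sees companies that did not already match `>= 10`. SQLite is more lenient than SQL Server about grouping by an alias, but the derived-table form works in both.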

This shows that the majority of unicorns take between 5 and 9 years to reach their $1 billion valuation. Interestingly, it is more common for a company to reach a billion-dollar valuation in less than 5 years than in 10 or more, which suggests that more unicorns reach this milestone as exciting, innovative ideas than as established, blue-chip companies.

This output also shows that the time taken to become a unicorn has no significant effect on the valuation which the company will achieve.

Further Investigation

As mentioned at the outset, there are definite opportunities for further investigation into this dataset. There were some questions I thought of but didn’t analyse here, purely to keep this project reasonably short. These include:

  • Is there a connection between the total funding raised and the number of years it took for a company to reach a $1 billion valuation?
  • Are there any specific investors or investment firms that are consistently associated with unicorns with higher valuations?
  • Is there a relationship between the founding year of a company and the number of select investors it attracted? Are there any observable trends here?
  • Which unicorn companies have had the biggest return on investment?
  • Which investors have funded the most unicorns?

Final Word

Thank you so much for making it this far and reading my entire Start-up to Unicorn project! I hope you enjoyed reading it as much as I enjoyed putting it together. Whether or not you enjoyed it, I’d love to hear your feedback so please feel free to get in touch — you can reach me on LinkedIn here or by email at jamesonjoshua1@gmail.com.
