Written by Jimmy Chan, Co-Founder and CEO of pes tracks, a platform that automates manual data work and converts CSV, Excel and online data into analytics-ready databases.
Open source software startups that build data products are having a moment. In just the past few months, we've seen startups like Airbyte, ClickHouse, PostHog, and RudderStack rapidly grow strong communities and collectively raise more than $100 million. These startups are open source alternatives to closed source products like Fivetran, Snowflake, Amplitude, and Segment, which are collectively valued at more than $100 billion at the time of writing.
This rise is not surprising given that the volume of data being produced and consumed is growing exponentially. In this article, I will discuss the reasons behind this trend and how entrepreneurs can find opportunities to build successful data startups.
What are data startups and open source software?
A data startup develops software and tools that help individuals and organizations make their data more useful. This includes helping them collect, clean, centralize, store, transform, and analyze data.
Open Source Software (OSS) refers to software that people can view, use, download, study, modify, distribute, and share for any purpose. The specific permissions depend on its license, for example MIT, GPL 3.0, or Apache 2.0.
What explains the sharp rise in open source companies building data products?
Basically, a shift in market needs. Increasingly, companies want better control over their data, the ability to adapt products to custom use cases, and a choice of tools that can be combined. This is amplified by the growing amount of data companies produce and their desire to extract more insights from it to grow the business.
People who work with data are familiar with the following pattern. They want to capture marketing data to understand customer trends, so they sign up for a commercial product from a vendor they trust. Then their data grows, their workflows become more complex, and they need to customize the product for specific use cases. To do this, they need access to the underlying data, which they don't have unless they pay the vendor a recurring five-figure surcharge. Most companies go through this pattern. While many companies' requirements will never go beyond the features closed source vendors offer, the most successful ones will outgrow them, and that creates great opportunities for startups.
Why do organizations turn to open source data tools?
Because open source tools give them advantages that closed source products can't:
• Access: You get full access to your underlying data.
• Control: You decide what to do with your data, how it is stored or backed up, and who else has access to it.
• Choice: You can use your data with any vendor that needs access to it, and you can choose the tools you want to combine it with.
• Customization: You can modify and customize the data product to suit the needs of your teams or business, allowing for maximum flexibility.
• Portability: You can move your data elsewhere without extensive, time-consuming effort.
• Compliance and regulation: You are in control of all your data and do not need to set up data processor relationships with third parties.
• Innovation speed: Innovation comes from a distributed community focused on improving the product, rather than from a single team or company.
What matters when creating an open source data startup?
Community, flexibility, speed of development, and customization are the most important factors in the success of an OSS data startup. Surprisingly, product-market fit in the traditional sense is not the biggest challenge for OSS data startups. This is because open source data products are often alternatives to successful commercial products that have already found product-market fit. Open source companies can learn from how closed source incumbents operate, from their pricing, go-to-market strategy, strengths, and weaknesses to their product documentation, which open source alternatives often model on the existing products.
That said, OSS data startups face other difficult challenges. First, you can't simply pick any closed source product, build an open source alternative to it, and expect it to take off. Not all products can or will succeed as open source. Second, you still need to choose a go-to-market strategy, business model, and pricing that work well for your target market. And third, building community and trust is challenging with open source data products. Customers may avoid a project that lacks sufficient community support or isn't growing quickly. Community is probably the hardest thing to build when starting an open source data company, but it's also the most important.
Products that have worked, or would work, well as open source
Although there is no exact formula, here are some kinds of products that have worked, or are likely to work, well as OSS. These are products that:
• Query your data warehouse (business intelligence tools, data indexing, data monitoring)
• Require a very high level of engineering customization
• Are essential building blocks for other products (authentication and security)
• Extract insights from your data (machine learning, product analytics)
• Help you build custom applications on top of your company data (internal tool builders and data automation tools)
• Capture or stream data logs (customer data platforms, event streams)
• Enhance team collaboration (Figma, Jira, Google Sheets)
There is a great opportunity to study closed source products that work well and are hugely successful, and to build open source alternatives to them. We are likely to see more companies commercializing open source data products, driven by growing market demand and the intrinsic benefits of open source.
Over the next few years, we'll likely see the emergence of a modern standard "data stack", a set of core tools for data-related work in which every component is based on open source software. The future of OSS-based data products is incredibly bright, full of opportunities for entrepreneurs, and just around the corner.