Pardon the extremely click-baity blog title. There's a good reason for that, I promise. In the world of data science, MVP stands for Minimum Viable Product. It's a first draft of a data science project in which we've built just enough of our workflow to read in some data, get it into a format our tools can work with, train a basic model, and calculate some preliminary results.

The results can be absolute garbage, and that's okay.

An MVP is an engineering effort, meant to provide us with a pipeline to quickly develop new iterations of our work, and to produce a baseline model that we can use to benchmark our more serious findings.
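To make the idea concrete, here is a minimal sketch of what such a pipeline might look like in Python, using only the standard library. The toy dataset, the column names (`sqft`, `price`), and every helper function are hypothetical stand-ins. The "model" simply predicts the mean of the training targets, the kind of deliberately naive baseline that later, more serious models should beat.

```python
import csv
import io
from statistics import mean

# Hypothetical toy dataset; a real project would read from a file or database.
RAW_CSV = """sqft,price
800,150000
1200,210000
1500,260000
2000,330000
"""


def load_rows(text):
    """Step 1: read in some data (CSV text -> list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_xy(rows):
    """Step 2: convert string fields into numeric features and targets."""
    xs = [float(row["sqft"]) for row in rows]
    ys = [float(row["price"]) for row in rows]
    return xs, ys


def fit_mean_baseline(train_y):
    """Step 3: a deliberately naive model that always predicts the training mean."""
    avg = mean(train_y)
    return lambda _x: avg


def mean_absolute_error(model, xs, ys):
    """Step 4: a preliminary result, the average absolute prediction error."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def run_mvp(text, n_train=3):
    """Wire the steps together; the first n_train rows train, the rest test."""
    xs, ys = to_xy(load_rows(text))
    model = fit_mean_baseline(ys[:n_train])
    return mean_absolute_error(model, xs[n_train:], ys[n_train:])


if __name__ == "__main__":
    print(f"Baseline MAE: {run_mvp(RAW_CSV):.2f}")
```

Each step is a placeholder you can swap out on its own (a real data loader, real feature engineering, a real model), while the error from this baseline gives you a number to benchmark against as you iterate.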

There are lots of tools in this trade

Data science is a discipline that requires a very broad skill set. A data scientist has to know how to write code; understand math, statistics, machine learning, and research methods; and grasp the context and domain knowledge of whatever problem they're trying to solve.

Most data science training emphasizes the math, stats, machine learning, and research skills. Coding fundamentals are frequently taught in data science courses, but most engineering skills are learned by doing. Domain knowledge is left as an exercise to the reader.

Jumpstart your engineering so you can think more about the science

The goal of Data Science MVP is to collect useful tools for data science workflows and to provide introductions that are just detailed enough to get new data scientists up and running. I'll also share some tips and personal philosophy on software engineering best practices. I believe the best way to learn data science is to get to work cleaning data, training and testing models, and interpreting results. There's a steep learning curve to doing that, especially while you're still learning the tools of the trade. My hope is that by making the tools easier to understand, you'll be able to spend more brain cycles learning how to think like a data scientist. You'll get used to the tools along the way.
