Resources

Data Science MVP: The Template

The Data Science MVP project template is designed to improve data scientists' workflows, which is also the main objective of this blog.

Here is my blog post explaining how to use the template.

Here is a link to the template on GitHub.

Other Goodies

One of my favorite things about teaching at Metis is that every quarter, as our students work on their research projects, they post links to useful resources they find online. Many of these are things I had never heard of and wouldn't have found without researching a specific question. I'll share them here and use this page as a to-do list for future topics to write about.

Tools

neuron: an interactive programming experience for data scientists, available as an extension for VS Code

regexr: an online tool to learn, build, and test regular expressions

nbdime: diffing and merging of Jupyter Notebooks

SoloLearn: learn Python, SQL, JavaScript, HTML, and more on the web or on your phone
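As a rough illustration of the kind of pattern you might prototype in a tool like regexr and then carry into Python, here is a minimal sketch using the standard-library re module. The date pattern, the find_dates helper, and the sample text are all made up for this example.

```python
import re

# Hypothetical pattern: match ISO-style dates such as 2019-03-15.
# Each parenthesized group captures one part of the date.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(text):
    """Return a list of (year, month, day) string tuples found in text."""
    return DATE_PATTERN.findall(text)


sample = "Project kickoff was 2019-01-07; presentations are due 2019-03-15."
print(find_dates(sample))
```

Note that the pattern only checks the shape of the string (four digits, two digits, two digits), so it would also accept an impossible date like 2019-13-45. Validating the values would need an extra step, for example parsing with datetime.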

Workflows

Cookiecutter Data Science: a logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Equinor Data Science Template: A starter template for data science projects

fastec2: Python package for managing EC2 instances on AWS

Datasets

Google Dataset Search: a dataset search utility from Google, still in beta.

Awesome Public Datasets: a topic-centric list of high-quality public data sources.

Visualization

PyViz: a coordinated effort to make data visualization in Python easier to use, easier to learn, and more powerful.

mplcursors: Interactive data selection cursors for Matplotlib.

Mapbox: a location data platform for mobile and web applications.

scikit-plot: Python package for ML-specific plots (metrics, estimators, etc.)

Understanding Statistical Power and Significance Testing: an interactive visualization and power calculator.

Application Frameworks

Flask: a web application framework written in Python.

Kivy: an open-source Python library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps.

Heroku: a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.

AWS Elastic Beanstalk: an easy-to-use service for deploying and scaling web applications and services.

Google App Engine: a fully managed serverless platform for building highly scalable applications.