Data science is a set of multidisciplinary skills that have become essential in many areas of research and industry. The skills required draw fundamentally from three components: computer science, statistics and a specific scientific domain or discipline.
The goal of data science is to use insights extracted from large data sets to generate new knowledge. The process starts with a motivating question that can be explored with data.
- Knowledge in one or more disciplines is needed both to formulate the question or problem that needs to be solved, and to interpret the data analysis. This component is the domain knowledge, which could draw on classical knowledge (physics, biology, medicine, etc.) or transdisciplinary knowledge (cognitive science, earth system science, ecological economics, sustainability science, etc.).
- Knowledge of statistics is fundamental for designing the data collection, analyzing and modeling the data, and presenting the results clearly.
- Knowledge in computer science is needed to write the code and algorithms necessary to access, store, process and visualize large and complex data collections.
The data science process
From problem formulation to communication of results, data science follows a series of steps that are common to most areas of scientific inquiry, with the addition of extensive computer coding throughout.
Essential skills for data science
Database querying, machine learning and AI, web scraping, and data visualization are some of the tools for applying data science to your disciplinary expertise.
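As a rough illustration of how two of these tools fit together, the sketch below runs a database query and then fits a simple statistical model to the result. It is only a minimal example under stated assumptions: the table name `measurements`, its columns `temperature` and `growth`, the helper functions, and the three data rows are all hypothetical and invented for the demonstration. It uses only the Python standard library.

```python
import sqlite3
from statistics import mean


def build_demo_db():
    """Create an in-memory database with a hypothetical 'measurements' table.

    The three rows are synthetic values chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (temperature REAL, growth REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [(10.0, 1.0), (20.0, 3.0), (30.0, 5.0)],
    )
    conn.commit()
    return conn


def load_measurements(conn):
    """Database query step: fetch (temperature, growth) pairs."""
    query = "SELECT temperature, growth FROM measurements ORDER BY temperature"
    return conn.execute(query).fetchall()


def fit_line(points):
    """Modeling step: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    conn = build_demo_db()
    slope, intercept = fit_line(load_measurements(conn))
    print(f"growth = {slope:.2f} * temperature + {intercept:.2f}")
```

In a real project the query would target an existing database, and the fitted relationship would still need to be interpreted with domain knowledge before drawing any conclusion.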
The growth in the use of data science is driven by advances in technology. The increased capacity and abundance of sensors, lab instrumentation and remote sensing, together with higher resolution in numerical simulations, have made vast amounts of data readily available. The advent of the internet, the Internet of Things, and our digital lifestyles have further expanded the amount of data and, most importantly, the variety of data generated. Increased computing power and a massive reduction in the cost of computer memory have facilitated the gathering and storage of data. Finally, the development of more powerful methods for data analysis and modeling has made it possible to process these data further.
The term data science emerged in the 1990s, but the areas it draws from have a much longer history. To trace the evolution of the term, we must look at how data generation (amount and variety), data gathering and sharing, and data analysis (statistical methods and computer science developments) have each evolved.