Essential skills for data science

Some skills, such as knowledge of a programming language, are needed throughout the data science process. The most popular languages are Python and R. In some disciplines you need both, because certain libraries or packages built for a specific task are available in only one of them.

DATA SCIENCE PROCESS: SKILLS / KNOWLEDGE

1. Framing the problem

Domain knowledge

2. Data collection

Database management (MySQL, PostgreSQL, MongoDB)

Distributed processing (Apache Hadoop, Spark, Flink)

Web scraping and using APIs

3. Data cleaning

pandas for Python; R

4. Exploratory analysis

Statistics

Data visualization (libraries in Python: NumPy, Matplotlib, pandas, SciPy; packages in R: ggplot2, dplyr)

5. Modeling and analysis

Statistical inference

Machine learning (scikit-learn for Python)

6. Interpretation and communication of results

Domain knowledge

Data visualization (Matplotlib, ggplot2, seaborn, Tableau, D3.js)

Dashboards (Shiny for R, Dash for Python)

Sharing and documenting code (Jupyter notebooks, R Markdown, creating R or Python packages)
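
To give a feel for what steps 3 and 4 can look like in practice, here is a minimal sketch in Python using pandas. The data, the column names (city, temp_c) and the helper function clean are all invented for illustration; a real project would load its data from a file, database or API instead.

```python
import pandas as pd

# Hypothetical raw records, invented for this illustration only.
raw = pd.DataFrame(
    {
        "city": [" Oslo", "oslo", "Bergen", "Bergen ", None],
        "temp_c": ["4.5", "5.0", "n/a", "7.5", "6.0"],
    }
)


def clean(df):
    """Step 3 (data cleaning): normalize text, convert types, drop incomplete rows."""
    out = df.copy()
    # Remove stray whitespace and make capitalization consistent.
    out["city"] = out["city"].str.strip().str.title()
    # Convert text to numbers; values that cannot be parsed become NaN.
    out["temp_c"] = pd.to_numeric(out["temp_c"], errors="coerce")
    # Drop rows that are missing a city or a temperature.
    return out.dropna().reset_index(drop=True)


tidy = clean(raw)

# Step 4 (exploratory analysis): simple descriptive statistics per city.
summary = tidy.groupby("city")["temp_c"].agg(["count", "mean"])
print(summary)
```

Dropping incomplete rows is only one possible choice; depending on the problem, filling in missing values or flagging them for review may be more appropriate.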

 
