How to (Actually) Get Started with Data Science Using Python
Disclaimer: This is intended only as a plain-English guide to getting started quickly with Python on Windows. If you wish to learn more about the features of the tools installed below, please visit their official websites.
So you’ve completed some online training on a website like DataCamp, and you’re ready to start working on your own projects to build up a portfolio. But how exactly do you set up your work environment? Follow along with the next four steps, and you will be analyzing your first data set in no time!
Step 1: Mmmmm….Chocolate(y)
The first step is to download a program called Chocolatey. Chocolatey is a package manager for Windows, and it is what we will use to install the Python language on our computer. Installing it is as easy as copying and pasting a single line of code into the Windows command prompt (literally). To install, go to the official Chocolatey installation page, copy the installation command, and paste it into either Command Prompt or PowerShell.
Either will do. It is just important to make sure you are running Command Prompt or PowerShell as an administrator.
Step 2: Teaching our computer a new language
Now that Chocolatey is installed, we can install the Python language on our computer. To do so, all we have to do is type the following into Command Prompt or PowerShell:
choco install python
From here, Chocolatey takes over and installs the most recent version of Python. At this point we could begin coding in Python using its bare-bones interactive console, but that install lacks many of the common data science packages, such as NumPy and pandas. That leads us to the next step: installing a data science distribution.
Bonus: Want to use R instead of Python? No problem: replace ‘python’ with ‘R’ in the command above.
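A quick way to confirm that Step 2 worked is to run a short script with the Python that Chocolatey just installed. The sketch below uses only the standard library, so it runs even on the bare-bones install. The file name `check_python.py` is just an example: save the script under that name and run `python check_python.py` from a new command prompt window.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Summarize the Python interpreter that is running this script."""
    return {
        "version": platform.python_version(),  # version string, e.g. "3.12.1"
        "executable": sys.executable,  # full path to the python.exe being used
        "is_python3": sys.version_info.major == 3,
    }


if __name__ == "__main__":
    # Print one "key: value" line per item so the output is easy to read.
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the printed path points somewhere unexpected, another Python installation is probably earlier on your PATH.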
Step 3: Anaconda? Python? What is with all these snakes?
Anaconda3 is an open-source distribution of Python and R that includes its own package manager (conda) and provides access to over 1,500 packages, making it the premier data science distribution[i]. In plain English, Anaconda is your go-to tool for all things data science. Installing Anaconda3 lets you create Jupyter Notebooks like this one and takes care of installing common data science packages such as NumPy and pandas, among other features.
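Once Anaconda3 is installed, you can check that the packages mentioned above are actually available. The sketch below is one way to do that, assuming Python 3.8 or later (where `importlib.metadata` is part of the standard library). Note that the package name for the classic Jupyter Notebook is `notebook`. Run with Anaconda’s Python, it should print a version for each package; run with the bare Chocolatey install, it will most likely report them as missing.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages this guide mentions; add any others you rely on.
# "notebook" is the package name of the classic Jupyter Notebook.
DATA_SCIENCE_PACKAGES = ("numpy", "pandas", "notebook")


def installed_versions(packages=DATA_SCIENCE_PACKAGES) -> dict:
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:10} {ver or 'NOT INSTALLED'}")
```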
Step 4: An ID-What?
The final step in getting set up to analyze data with Python is to download the IDE (Integrated Development Environment) of your choice. This is where you will write, edit, and run your code. Personally, I use PyCharm by JetBrains. I have experience using their other IDE, IntelliJ IDEA, for Java programming, and found it an easy transition. PyCharm has a fairly intuitive layout and includes features like code autocompletion, built-in Git integration, and the ability to access SQL databases straight from the IDE, among many others. However, this really is a matter of personal preference, and you should do your research to see which IDE is right for you.
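Whichever IDE you pick, it needs to run your code with Anaconda’s Python for those packages to be importable. As a rough check, you could run the sketch below from inside the IDE. It prints the interpreter path and applies a heuristic (an assumption, not an official API): conda environments keep a `conda-meta` folder inside their installation prefix.

```python
import os
import sys


def is_conda_environment(prefix: str = sys.prefix) -> bool:
    """Heuristic (assumption): conda environments contain a conda-meta folder in their prefix."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Looks like an Anaconda/conda environment:", is_conda_environment())
```

If the second line prints `False`, the IDE is most likely using a different Python than the one Anaconda installed.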
And that is it: you are all set up to start coding and analyzing data with Python!