Let's open Anaconda Prompt (or Terminal if you're a Mac user).
The first step is to choose the environment name, which usually matches the project name. Then create the environment with a specific Python version:
conda create --name redcar python=3.7.4
After you have created a new virtual environment, you need to activate it to use it. To do so, use the following command:
conda activate redcar
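If you want to double-check which environment is active, conda exports its name in the `CONDA_DEFAULT_ENV` variable. The sketch below assumes a bash-like shell (Terminal on Mac or Linux, or Git Bash on Windows); `is_env_active` is a hypothetical helper name, not part of conda.

```shell
# Hypothetical helper: succeeds only if the given conda environment is active.
# conda sets CONDA_DEFAULT_ENV to the name of the currently active environment.
is_env_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if is_env_active redcar; then
    echo "redcar is active"
else
    echo "redcar is not active; run: conda activate redcar"
fi
```

In Anaconda Prompt on Windows, `echo %CONDA_DEFAULT_ENV%` prints the same variable.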
Now you can install all the packages that you will use in this project. Let's try to install pandas:
conda install pandas
You can always install a specific version of a package (e.g., when a project group member isn't using the latest one):
conda install pandas==0.25
You can also install a set of packages at once. Just type them separated by spaces:
conda install matplotlib seaborn numpy
Let's take a look at which packages we have:
conda list
To dump this list of packages into a single file use:
conda env export > environment.yml
🎉 We created a file, conventionally called environment.yml, that contains all the packages that have been installed! Now we can pass this file to another person, who will be able to recreate the same setup. The file is written to the folder the prompt was in when you ran the command, usually C:/Users/<your_user_name>/ (or your home folder on Mac).
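One detail worth knowing before you share the file: `conda env export` typically ends it with a `prefix:` line that records where the environment lives on your machine, which is of no use to a colleague. A minimal sketch for removing it, assuming a bash-like shell; `strip_prefix` and `environment.shared.yml` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: print an exported environment file without its
# machine-specific "prefix:" line.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

# Write a shareable copy next to the original, if the export exists.
if [ -f environment.yml ]; then
    strip_prefix environment.yml > environment.shared.yml
fi
```

The original environment.yml is left untouched, so you can keep using it locally.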
Alternatively, if you want to build an identical environment, you can use the following command:
conda list --explicit > spec-file.txt
Note that this specification file only lets you recreate the environment on the same operating system it was created on (e.g., Windows).
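The platform is recorded in the spec file itself: files written by `conda list --explicit` normally carry a `# platform: ...` comment near the top. A small sketch, assuming a bash-like shell, that reads it back; `spec_platform` is a hypothetical helper name.

```shell
# Hypothetical helper: print the platform recorded in an explicit spec file,
# e.g. "win-64" or "osx-64". Prints nothing if the comment is missing.
spec_platform() {
    sed -n 's/^# platform: *//p' "$1"
}

if [ -f spec-file.txt ]; then
    echo "spec-file.txt was created on: $(spec_platform spec-file.txt)"
fi
```

Checking this value before sending the file to someone on a different system can save a failed install.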
Now let's get back to our base environment. For this simply type:
conda deactivate
Let's finalize our practice by deleting this virtual environment 😱 :
conda env remove --name redcar
Don't worry! We're doing this only to recreate it from the previously exported environment.yml file 🦉. If you somehow forgot to export it, take one here and put it in C:/Users/<your_user_name>/ (or your home folder on Mac). The command for creating an environment from the file is as follows:
# Default option
conda env create --name redcar --file environment.yml
# Or, if you used the second option
conda create --name redcar --file spec-file.txt
Alright! We're back on track. We have a new virtual environment with a base set of packages to continue our work!
However, to make this environment work inside JupyterLab (or Jupyter Notebook), we need to tell JupyterLab that the environment exists. We can do this with the following command:
python -m ipykernel install --user --name=redcar
Poof! You're done! Now you should see an extra icon called redcar to the right of the default Python 3 icon.
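The registration command relies on the `ipykernel` package being installed in the active environment, and an environment that only contains the packages installed so far may not have it. A hedged sketch, assuming a bash-like shell and that redcar is already active. The display name `Python (redcar)` is an illustrative choice passed through ipykernel's `--display-name` option, not something the steps above require.

```shell
env_name="redcar"
display_name="Python ($env_name)"

# Only touch the environment if conda is available and redcar is active.
if command -v conda >/dev/null 2>&1 && [ "${CONDA_DEFAULT_ENV:-}" = "$env_name" ]; then
    # Make sure ipykernel is present, then register the kernel for this user.
    conda install --yes ipykernel
    python -m ipykernel install --user --name "$env_name" --display-name "$display_name"
else
    echo "Activate $env_name first: conda activate $env_name"
fi
```

With a display name set, the launcher labels the icon "Python (redcar)" rather than plain "redcar"; the kernel name used by `jupyter kernelspec` stays redcar.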
After you've finished a project, you may want to delete the associated virtual environment and remove its kernel from JupyterLab (in other words, get rid of the redcar icon).
To do so, let's open Anaconda Prompt or Command Prompt and type:
jupyter kernelspec list
The output of this command is a list of all available kernels. For example, if you have only two kernels installed, redcar and the default Python 3, the output should look like this:
Available kernels:
redcar /home/user/.local/share/jupyter/kernels/redcar
python3 /usr/local/share/jupyter/kernels/python3
Alright, let's delete redcar:
jupyter kernelspec uninstall redcar
Alright! We've just rolled back to the original state of your JupyterLab.
{% hint style="warning" %} 🧠 Note: There are other ways to manage virtual environments, such as pipenv or venv. We cannot say which one is better; as usual, there are pros and cons. Our advice is: whenever you feel uncomfortable with the tool that you're using, dive in to find a better option. {% endhint %}
Cookiecutter is a tool that helps to create project templates for Python packages, Java and Android applications, etc. Generating a project from a template with a couple of commands saves you from manual work (and that's the end goal, right 🐌?).
The project template that we will use was designed by DrivenData and is called Cookiecutter Data Science. The project website says:
"Cookiecutter Data Science is a logical, reasonably standardized, but flexible project structure for doing and sharing data science work."
After testing it in numerous battles ⚔, we concluded that it's pretty handy. So let's continue by installing Cookiecutter and Cookiecutter Data Science.
The first step is to open Anaconda Prompt (or Command Prompt) and activate the virtual environment.
conda activate redcar
Now let's install Cookiecutter with pip (a conda-forge package exists too, but pip is what we'll use here 🤷):
pip install cookiecutter
Nice! The next step is to point Cookiecutter to a specific project template:
cookiecutter https://github.com/drivendata/cookiecutter-data-science
You'll get a set of questions such as:
- project name and repo name (usually they're the same),
- author name (surname and company if any),
- short description of your project (a couple of lines for README.md),
- license (read more about licenses here),
- S3 bucket and AWS profile (for establishing a pipeline with Makefile).
Let's fill them in!
{% hint style="success" %} Alright! Great success! Now it's time to continue this work with Git and GitHub. {% endhint %}