This document is written for new members. Every new member must complete every assignment here. The entire training is expected to take no more than 20 hours.
Please read the Welcome Document and understand the differences between "student thinking" and "researcher thinking".
Please read Making of CAM2 and understand the purpose of the project.
You should have a Linux computer for this project. If your computer does not run Linux and has at least 8 GB of memory, you may use VirtualBox or VMware to create a virtual Linux computer (see Setting up Your VM). Even if you have a Linux computer, you should consider running a virtual Linux installation. This lets you change settings and install new software packages without affecting your native Linux system.
IMPORTANT: The following setting cannot be easily changed later:
While creating the virtual machine, assign at least 30-40 GB of hard disk space to the virtual Linux installation when prompted (or more, depending on your machine's configuration). If you are on the database team, consult your team lead or Ryan, as you will usually need more storage.
Your contributions to the project will be recorded mostly on GitHub. Please watch the training video, and read through the wiki page for more information.
Contributing to the CAM2 group relies heavily on Python. There are many resources available for learning Python, from the Python Wiki to the Google Developer Guide. You can learn the syntax by signing up for Codecademy. The best way to learn, however, is simply to start coding: follow the tutorials in this training repository and you will get a feel for it in no time.
Please clone System Core and study how its pieces fit together. This repository is a greatly simplified version of the real system; in particular, it does not include any of the cloud computing components.
Django and Heroku are both vital tools for the Camera Database team and the CAM2 UI team. Everyone who wants to contribute to the CAM2 project should know how to develop with these tools. To get started, complete the assignments for them. Joseph has also created an introductory video on Heroku, which you can watch here. The UI team currently uses Travis CI (Continuous Integration) to deploy the website. A video demo of this is also available here.
To get a basic understanding of what the CAM2 Image Team does, please answer and submit the following questions:
- What are training and testing datasets?
- What is meant by model variance and model bias? Which one would we prefer and why?
- What is overfitting and how does it relate to model complexity?
- What is cross-validation?
- List 3 methods that performed well on ImageNet last year. Don't say "Ensemble A"; give the actual deep learning model used. If you are unsure, search online and you should find a PDF of the paper describing the model.
The CAM2 Database contains thousands of cameras. They were not added manually; instead, web parsing tools such as BeautifulSoup4 and Selenium were used to gather the data from camera websites around the world. This paper explains the process in more detail, including how metadata about the cameras is collected. The final step in completing the introductory training for the CAM2 Purdue Team is to complete the two web parsing assignments below.
- Complete and submit Parsing Tutorial 1 (JSON and BeautifulSoup4)
- Complete and submit Parsing Tutorial 2 (Selenium)
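Both kinds of parsing follow the same pattern: take a JSON response or an HTML page from a camera website and pull structured camera records out of it. The following is a minimal, offline sketch of that pattern on hard-coded sample data. It is not code from the CAM2 tutorials or database. The JSON layout, the `<div class="camera">` markup, and the field names (`id`, `city`, `lat`, `lng`, `image_url`) are all hypothetical. It also uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup4; with BeautifulSoup, something like `soup.select("div.camera img")` would do the job of `CameraImageParser`. Selenium, which drives a real browser, is not shown.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data; real camera sites and the tutorial inputs will differ.
SAMPLE_JSON = """
{"cameras": [
  {"id": "cam-001", "city": "West Lafayette", "lat": 40.42, "lng": -86.91,
   "image_url": "http://example.com/cam1.jpg"},
  {"id": "cam-002", "city": "Indianapolis", "lat": 39.77, "lng": -86.16,
   "image_url": "http://example.com/cam2.jpg"}
]}
"""

SAMPLE_HTML = """
<html><body>
  <div class="camera"><img src="http://example.com/cam3.jpg" alt="Main St"></div>
  <div class="camera"><img src="http://example.com/cam4.jpg" alt="Oak Ave"></div>
  <img src="/logo.png" alt="site logo">
</body></html>
"""


def parse_json_cameras(text):
    """Turn a (hypothetical) JSON camera listing into a list of metadata dicts."""
    data = json.loads(text)
    cameras = []
    for cam in data["cameras"]:
        # Keep only the fields we care about; .get() tolerates missing keys.
        cameras.append({
            "id": cam.get("id"),
            "city": cam.get("city"),
            "lat": cam.get("lat"),
            "lng": cam.get("lng"),
            "image_url": cam.get("image_url"),
        })
    return cameras


class CameraImageParser(HTMLParser):
    """Collect <img> tags that appear inside <div class="camera"> blocks."""

    def __init__(self):
        super().__init__()
        self._camera_depth = 0  # > 0 while inside a camera <div>
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Count divs nested inside a camera div so its closing </div> is matched.
            if self._camera_depth or "camera" in classes:
                self._camera_depth += 1
        elif tag == "img" and self._camera_depth:
            self.images.append({"src": attrs.get("src"), "alt": attrs.get("alt")})

    def handle_endtag(self, tag):
        if tag == "div" and self._camera_depth:
            self._camera_depth -= 1


def parse_html_cameras(text):
    """Return the camera images found in an HTML page."""
    parser = CameraImageParser()
    parser.feed(text)
    parser.close()
    return parser.images


if __name__ == "__main__":
    for cam in parse_json_cameras(SAMPLE_JSON):
        print(cam["id"], cam["city"], cam["image_url"])
    for img in parse_html_cameras(SAMPLE_HTML):
        print(img["alt"], img["src"])
```

To try this against a live page, you would first download it (for example with `urllib.request.urlopen`) and pass the decoded text to the same functions. Pages that build their camera list with JavaScript will not contain the cameras in the downloaded HTML, which is where a browser-driving tool such as Selenium is needed.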