I've created an automation project that focuses on six platforms.
- Youtube
- Gmail
- Google Drive API
I got the idea when I wanted to back up my Instagram account, which meant downloading all the images and videos from my personal account.
After searching the Internet for a while, I found that the Instagram API can crawl data from any user (public users and your friends), except private users.
Each section is described below.
Table of contents
- Youtube
- Gmail
- Coursera
- Google Drive Download API
[Updated] After backing up all the data in my Instagram account, I no longer use Instagram, so the Instagram part may not reflect the latest version.
I highly recommend using a virtual environment with Python 3.6.
I used Instaloader to download images and videos from Instagram.
Instaloader's benefits:
- downloads public and private profiles, hashtags, user stories, feeds and saved media,
- downloads comments, geotags and captions of each post,
- automatically detects profile name changes and renames the target directory accordingly,
- allows fine-grained customization of filters and where to store downloaded media,
- downloads many profiles at the same time.
$ pip3 install instaloader
To download all pictures and videos of a profile, as well as the profile picture, do
instaloader profile [profile ...]
where profile is the name of a profile you want to download. Instead of only one profile, you may also specify a list of profiles.
Instaloader can also be used to download private profiles. To do so, invoke it with
instaloader --login=your_username profile [profile ...]
When logging in, Instaloader stores the session cookies in a file in your temporary directory, which is reused the next time --login is given. So you can download private profiles non-interactively when you already have a valid session cookie file.
instaloader [--comments] [--geotags] [--stories] [--highlights] [--tagged]
[--login YOUR-USERNAME] [--fast-update]
profile | "#hashtag" | :stories | :feed | :saved
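The synopsis above can also be driven from a script. The following is a minimal sketch, assuming the `instaloader` command is on your PATH; the helper names `build_instaloader_args` and `run_instaloader` are hypothetical and not part of Instaloader itself. It only assembles the flags listed in the synopsis and hands them to the command-line tool.

```python
import subprocess


def build_instaloader_args(targets, login=None, comments=False, geotags=False,
                           stories=False, highlights=False, tagged=False,
                           fast_update=False):
    """Build the argument list for the instaloader CLI (hypothetical helper)."""
    args = ["instaloader"]
    if login:
        args += ["--login", login]
    # Only the flags shown in the synopsis above are supported here.
    optional_flags = [
        (comments, "--comments"),
        (geotags, "--geotags"),
        (stories, "--stories"),
        (highlights, "--highlights"),
        (tagged, "--tagged"),
        (fast_update, "--fast-update"),
    ]
    args += [flag for enabled, flag in optional_flags if enabled]
    # Targets are profile names, "#hashtag", ":stories", ":feed" or ":saved".
    args += list(targets)
    return args


def run_instaloader(targets, **options):
    """Run instaloader and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_instaloader_args(targets, **options), check=True)
```

For example, `run_instaloader(["profile"], login="your_username", fast_update=True)` would be equivalent to typing `instaloader --login your_username --fast-update profile`. Passing the arguments as a list avoids shell quoting problems with targets such as `#hashtag`.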
To install it right away for all UNIX users (Linux, macOS, etc.), type:
$ sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
$ sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
$ sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
$ sudo chmod a+rx /usr/local/bin/youtube-dl
Install the UniDecode package
$ pip install unidecode
python youtube/downloads_YTvideos.py --folder-dst ./
arguments:
-- dst_path: Where to save videos
-- url : Download link
- Decode the Unicode: convert Tiếng Việt to Tieng Viet
- Rename the filenames
filter_videos.py
- works with file names as Unicode strings
- renames the old file names to the new ones
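The decode-and-rename step described above could look roughly like the sketch below. It is not the contents of filter_videos.py; the function names `ascii_name` and `rename_to_ascii` are hypothetical. It uses the `unidecode` package installed earlier and, as an assumption for environments where that package is missing, falls back to stripping accents with the standard library.

```python
import os
import unicodedata

try:
    from unidecode import unidecode  # the package installed with pip above
except ImportError:
    def unidecode(text):
        # Fallback (assumption): strip combining marks after NFKD decomposition.
        # "đ"/"Đ" do not decompose, so they are mapped by hand.
        text = text.replace("đ", "d").replace("Đ", "D")
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_name(filename):
    """Return the file name with its stem transliterated to ASCII."""
    stem, ext = os.path.splitext(filename)
    return unidecode(stem) + ext


def rename_to_ascii(folder):
    """Rename every file in `folder` whose name changes; return (old, new) pairs."""
    renamed = []
    for name in os.listdir(folder):
        new_name = ascii_name(name)
        target = os.path.join(folder, new_name)
        # Skip names that are already ASCII and never overwrite an existing file.
        if new_name != name and not os.path.exists(target):
            os.rename(os.path.join(folder, name), target)
            renamed.append((name, new_name))
    return renamed
```

Calling `rename_to_ascii("./")` after a download would process the same folder that was passed to the download script.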
$ python facebook/get_FB_imgs.py
The main file is quickstart.py. It filters emails by keyword.
Pass the keyword to the delete_messages function.
For example, to delete emails that contain Google Calendar:
delete_messages('Google calendar')
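For readers curious how such a keyword filter can be built on the Gmail API, here is a minimal sketch. It is not the actual `delete_messages` from quickstart.py: the function name `trash_matching_messages` is hypothetical, it takes the authorized `service` object (as built in Google's Python quickstart) as an explicit argument, and it moves messages to the trash rather than deleting them permanently.

```python
def trash_matching_messages(service, keyword, user_id="me"):
    """Move every message matching `keyword` to the trash; return their IDs.

    `service` is assumed to be an authorized Gmail API client, e.g. the
    object returned by build("gmail", "v1", credentials=creds).
    """
    trashed = []
    page_token = None
    while True:
        # `q` accepts the same search syntax as the Gmail search box.
        response = service.users().messages().list(
            userId=user_id, q=keyword, pageToken=page_token
        ).execute()
        for message in response.get("messages", []):
            service.users().messages().trash(
                userId=user_id, id=message["id"]
            ).execute()
            trashed.append(message["id"])
        # Results are paginated; stop when there is no further page.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return trashed
```

Trashing is chosen here because it can be undone from the Gmail interface, which is safer while experimenting with search keywords.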
To download Coursera courses, you need an account and must be enrolled in the courses you want to download.
If you don't already have one, create a Coursera account and enroll in a class. See https://www.coursera.org/courses for the list of classes.
First, clone the repository from the command line:
$ git clone https://github.com/coursera-dl/coursera-dl
$ cd coursera-dl
Running the script:
Run the script to download the materials by providing your Coursera account and password as well as the class names. You can specify some additional parameters:
$ python coursera-dl -u <email> <course-name> --subtitle-language en --path <download-folder> --download-delay 0
If you don't want to type your password on the command line as plain text, you can run the script without the -p option. In that case, you will be prompted for the password once the script runs.
Parameters
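--subtitle-language en: language of the subtitles to download
--download-delay 0: number of seconds to wait before downloading the next course
--path: folder where the downloads are saved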
If you encounter the problem:
HTTPError: 400 Client Error: Bad Request for url: https://api.coursera.org/api/login/v3
You need to install the coursera-dl Chrome browser extension:
https://github.com/e-learning-archive/browser-extension/ to get the Coursera cookies from your web browser. Copy the cookie from the extension and add this argument to the command above:
--cauth <your_cookie>
The full command would be
$ python coursera-dl -u <email> <course-name> --subtitle-language en --path <download-folder> --download-delay 0 --cauth <your_cookie>
For advanced users, there are other interesting arguments to explore. Please take a look at the original repo in the reference.
Download public files with the Google Drive download API.
You can download large files from Google Drive via the command line or Python.
Installation
pip install gdown
From Command Line
gdown --id 0B_NiLAzvehC9R2stRmQyM3ZiVjQ
From Python
import gdown
url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)
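The command-line example takes a bare file ID while the Python example takes a `uc?id=` URL, and links copied from the browser often look different again. The sketch below normalizes these forms into the URL shape used above. The helper name `drive_download_url` is hypothetical, and the `/file/d/<id>/view` and `open?id=<id>` link shapes are assumptions about what Google Drive share links commonly look like.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_download_url(link_or_id):
    """Turn a Drive share link or a bare file ID into a uc?id= download URL."""
    # Share links of the form .../file/d/<id>/view carry the ID in the path.
    match = re.search(r"/file/d/([\w-]+)", link_or_id)
    if match:
        file_id = match.group(1)
    else:
        # Links such as ...?id=<id> carry it in the query string;
        # anything else is treated as a bare file ID.
        query = parse_qs(urlparse(link_or_id).query)
        file_id = query["id"][0] if "id" in query else link_or_id
    return "https://drive.google.com/uc?id=" + file_id
```

The result can be passed straight to `gdown.download(url, output, quiet=False)` as in the example above.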
https://developers.google.com/drive/api/v3/quickstart/python