Selenium│How to Log In and Scrape Automatically

August 19, 2019 / December 5, 2022

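I finally got my hands on Selenium, a browser automation tool I had wanted to try for a long time.

Because a script can drive complex browser operations, it should greatly widen the range of things I can automate.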

Setting up the Selenium environment

Installing Python

Download the installer and install Python 3.7.4 (Windows x86-64 web-based installer) on Windows.

https://www.python.org/downloads/windows/

If you tick "Add Python 3.7 to PATH" during installation, the following paths are added to the PATH environment variable.

C:\Users\<username>\AppData\Local\Programs\Python\Python37\
C:\Users\<username>\AppData\Local\Programs\Python\Python37\Scripts\
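
To confirm that the installer really added both directories, a short check like the one below can help. It is only a minimal sketch: `find_python_dirs` is a helper name made up for illustration, and it simply filters the PATH entries that mention Python37.

```python
import os


def find_python_dirs(path_value, sep=";"):
    """Return PATH entries that point into a Python37 install directory.

    find_python_dirs is a hypothetical helper; sep defaults to the
    Windows PATH separator.
    """
    entries = [entry.strip() for entry in path_value.split(sep)]
    return [entry for entry in entries if "python37" in entry.lower()]


if __name__ == "__main__":
    # On a correctly configured machine this should list two directories:
    # ...\Python37\ and ...\Python37\Scripts\
    for entry in find_python_dirs(os.environ.get("PATH", "")):
        print(entry)
```

If nothing is printed, the checkbox was probably left unticked and the two paths need to be added by hand.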

Installing selenium

Open the Command Prompt and enter the following command:

pip install selenium
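
To check that the install worked, something like the following can be run with the same Python interpreter. It is a small sketch; `is_installed` is a hypothetical helper built on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("selenium"):
        import selenium
        print("selenium", selenium.__version__)
    else:
        print("selenium is not installed; run: pip install selenium")
```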

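Downloading ChromeDriver

First, open Chrome and check its version from the three-dot menu > Help > About Google Chrome.

Next, download the driver that matches your Chrome version from the link below (ChromeDriver 76.0.3809.68 in my case).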
http://chromedriver.chromium.org/downloads
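
The version comparison can be sketched in a few lines. This assumes that the first three components (major, minor, build) have to agree while the last one may differ, which is my reading of the ChromeDriver version guidance rather than something stated in this post. `versions_match` and the Chrome version strings are illustrative values only.

```python
def versions_match(chrome_version, driver_version, parts=3):
    """Compare the leading components of two dotted version strings."""
    chrome = chrome_version.strip().split(".")[:parts]
    driver = driver_version.strip().split(".")[:parts]
    return chrome == driver


# Example values: the driver version is the one used in this post,
# the Chrome versions are made up for the comparison.
print(versions_match("76.0.3809.100", "76.0.3809.68"))  # True
print(versions_match("77.0.3865.40", "76.0.3809.68"))   # False
```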


Running Selenium

Launching Chrome and opening Google

Save the code below as test.py and run it from the Command Prompt.

python C:\Users\<username>\Desktop\test.py

Note: Double-clicking the .py file works too, of course.

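from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: the ChromeDriver path is passed through a Service object
driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get("https://google.co.jp")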

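Note: On Windows, separate the path with forward slashes (/) rather than backslashes (\), because a backslash starts an escape sequence in an ordinary Python string!
Note: Save the file with UTF-8 encoding.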
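
The path note can be illustrated with the standard library. The sketch below shows two options that avoid the escape problem: a raw string, and a conversion to forward slashes with `pathlib`. `to_forward_slashes` is a hypothetical helper name, and `C:\tools` is just an example path.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Convert a Windows path to the forward-slash form used in test.py."""
    return PureWindowsPath(windows_path).as_posix()


# A raw string (r"...") keeps every backslash literal.
raw = r"C:\chromedriver_win32\chromedriver.exe"
print(to_forward_slashes(raw))  # C:/chromedriver_win32/chromedriver.exe

# Without the r prefix, "\t" is read as a tab character.
print("\t" in "C:\tools")  # True
```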

Run the test; if Chrome launches, everything is working.
Note: Chrome will not launch unless the browser and driver versions match.
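
Every test run starts a new Chrome window and a ChromeDriver process. As an optional refinement, a small context manager can make sure `quit()` is always called, even when the script fails halfway. This is a sketch under my own assumptions: `browser_session` is a hypothetical helper, and it accepts any callable that returns a driver.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(create_driver):
    """Yield a driver and always call quit() on it afterwards.

    create_driver is any zero-argument callable that returns a WebDriver.
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        driver.quit()  # closes the browser and ends the driver session


# Usage idea (make_chrome would be your own function that builds the driver):
# with browser_session(make_chrome) as driver:
#     driver.get("https://google.co.jp")
```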

Searching Google for "CGメソッド"

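from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get("https://www.google.co.jp/")

# The Google search box is the input named "q"
search = driver.find_element(By.NAME, "q")
search.send_keys("CGメソッド")
search.send_keys(Keys.RETURN)  # press Enter to run the search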

Searching this blog for "恋声"

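from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
driver.get("https://cg-method.com")

# Several elements match this XPath, so take the second one (index 1)
xpath = '//*[@id="s"]'
search = driver.find_elements(By.XPATH, xpath)[1]
search.send_keys("恋声")
search.send_keys(Keys.RETURN)

Note: When the page has more than one search box, fetch all matches and pick one by index, as in driver.find_elements(By.XPATH, xpath)[1].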
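
Picking an element by a fixed index such as [1] depends on the page layout. One possible alternative, assuming the extra search boxes are simply hidden (I have not verified that this is the case on this blog), is to take the first element that is actually displayed. `first_visible` is a hypothetical helper; `is_displayed()` is the standard WebElement method.

```python
def first_visible(elements):
    """Return the first element whose is_displayed() is true, else None."""
    for element in elements:
        if element.is_displayed():
            return element
    return None


# Usage idea: pass in the list returned by the find_elements call in the
# script above, then call send_keys() on the result if it is not None.
```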

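Logging in to Twitter Analytics and getting this month's tweet count

I picked Twitter for this test because it is the service closest to hand, but scraping it is actually prohibited, so I do not recommend using this in practice!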

import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

account = "account"    # placeholder: your Twitter account name
password = "password"  # placeholder: your Twitter password


def twitter():
    driver = webdriver.Chrome(service=Service("C:/chromedriver_win32/chromedriver.exe"))
    driver.get("https://analytics.twitter.com/user/cg_method/home")
    time.sleep(3)  # wait for the login page to load

    # Enter the account name
    element_account = driver.find_element(By.CLASS_NAME, "js-username-field")
    element_account.send_keys(account)
    time.sleep(3)

    # Enter the password
    element_pass = driver.find_element(By.CLASS_NAME, "js-password-field")
    element_pass.send_keys(password)
    time.sleep(3)

    # Scroll to the bottom of the page, then click the login button
    element_login = driver.find_element(By.XPATH, '//*[@id="page-container"]/div/div[1]/form/div[2]/button')
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    element_login.click()
    time.sleep(3)

    # Read the tweet count from the analytics home page
    selector = "body > div.container > div > div.home-content > div > div.home-columns > div.home-column-secondary > div:nth-child(2) > div > div > div:nth-child(1) > div > div"
    tweets_num = driver.find_element(By.CSS_SELECTOR, selector)
    print("Tweets this month:", tweets_num.text)


# Call the function only after it and the credentials have been defined
twitter()
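
The fixed three-second sleeps in the script above are simple, but they either waste time or turn out too short on a slow connection. Selenium ships `WebDriverWait` for this purpose; the helper below is a library-independent sketch of the same polling idea, not what the script above uses. `wait_until` and its default values are my own assumptions.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Usage idea with the login form (needs: from selenium.webdriver.common.by import By):
# fields = wait_until(lambda: driver.find_elements(By.CLASS_NAME, "js-username-field"))
# fields[0].send_keys(account)
```

Because `find_elements` returns an empty list while nothing matches, the lambda stays falsy until the field appears, and the script moves on as soon as it does.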