[python] WEB CRAWLING


SangRok Jung 2022. 9. 8. 00:00

WEB CRAWLING

▶ How web crawling works
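
In broad terms, a crawler requests a page, receives HTML, parses it, and pulls out the parts it needs. The following is only a minimal sketch of the parse-and-extract step, using the standard library and a made-up HTML snippet. The class name `LinkCollector` and the sample markup are assumptions for illustration and are not taken from any real site.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text pieces seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
```

In the rest of this post the HTML comes from a real browser driven by selenium rather than from a hard-coded string.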

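▶ Cautions for web crawling

Using data crawled from a live site for business purposes without permission can cause legal problems. In that case all legal responsibility lies with the user, so knowing how to collect data never means the collected data may be used carelessly.
Excessive crawling that causes problems or losses for the target site can also lead to legal liability for obstruction of business. Take great care on this point as well.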
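
One way to reduce the risk of overloading a site is to check its robots.txt rules and pause between requests. The sketch below uses the standard library `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the helper names `build_policy`, `polite_delay`, and `crawl_politely` are all hypothetical, and following robots.txt does not by itself settle the legal questions above.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_delay(parser, user_agent="*", default=1.0):
    """Seconds to wait between requests: Crawl-delay if given, else default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl_politely(urls, parser, fetch, user_agent="*", pause=time.sleep):
    """Fetch only the allowed URLs, pausing after each request."""
    results = {}
    delay = polite_delay(parser, user_agent)
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        results[url] = fetch(url)
        pause(delay)
    return results


policy = build_policy(SAMPLE_ROBOTS)
```

Here `fetch` is whatever function actually downloads a page, and `pause` is injectable so the delay can be replaced in a dry run.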

▷ Install bs4.

!pip install bs4

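▷ Install selenium.

Releases after 4.2.0 changed the API substantially (selenium 4.3.0 removed the find_element_by_* methods used below), so this post pins version 4.2.0.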

!pip install selenium==4.2.0
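
To confirm which API style an environment supports, the installed version can be compared against the cut-off. This is a rough sketch that assumes 4.3.0 is the first release without the find_element_by_* helpers. The function names `parse_version` and `has_legacy_locators` are made up for illustration.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string such as '4.2.0' into a tuple such as (4, 2, 0)."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_legacy_locators(version_text):
    """True if this version is older than the assumed 4.3.0 cut-off."""
    return parse_version(version_text) < (4, 3, 0)


try:
    installed = version("selenium")
except PackageNotFoundError:
    installed = None  # selenium is not installed in this environment

if installed is None:
    print("selenium is not installed")
else:
    print(installed, "legacy locators:", has_legacy_locators(installed))
```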

▷ Load the required modules and libraries, then read the search keyword from the user.

from bs4 import BeautifulSoup
from selenium import webdriver
import time

query_txt = input("Keyword to crawl: ")

▷ Launch a web browser using ChromeDriver.

- ChromeDriver download link

https://sites.google.com/chromium.org/driver/


path = "../Desktop/chromedriver"  # path to the chromedriver executable
driver = webdriver.Chrome(path)

driver.get("https://korean.visitkorea.or.kr/main/main.do#home")
time.sleep(1)  # wait 1 second for the page to load
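
A fixed one-second sleep can be too short on a slow connection and wasteful on a fast one. A common alternative is to poll until a condition holds. Selenium ships `WebDriverWait` for this purpose. The helper below is only a plain-Python sketch of the same idea, and the name `wait_until` and its default timings are assumptions.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Possible use with the driver from this post (selenium 4.2.0 API):
# wait_until(lambda: driver.find_elements_by_id("inp_search"))
```

`find_elements_by_id` returns an empty list while the element is missing, so the lambda stays falsy until the search box appears.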

▷ Locate the search box and type in the keyword.

element = driver.find_element_by_id("inp_search")  # look up the search box once
element.click()  # focus the search box
element.send_keys(query_txt)  # type the keyword

▷ Click the search button to run the search.

driver.find_element_by_link_text("검색").click()
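
On newer selenium releases, where the find_element_by_* helpers are no longer available, the same two steps can be written with `find_element` and `By` locators. The following is a sketch rather than the approach used in this post. The function name `run_search` and its default arguments are assumptions, and the fallback `By` class only mirrors selenium's locator strings so the sketch can be dry-run without selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Fallback for reading or dry-running the sketch without selenium.
    class By:
        ID = "id"
        LINK_TEXT = "link text"


def run_search(driver, query, box_id="inp_search", button_text="검색"):
    """Type `query` into the search box and click the search link."""
    box = driver.find_element(By.ID, box_id)
    box.click()
    box.send_keys(query)
    driver.find_element(By.LINK_TEXT, button_text).click()


# Possible use with the driver and keyword from this post:
# run_search(driver, query_txt)
```

`find_element(By.ID, ...)` also works on the pinned 4.2.0 release, so this form does not depend on which version is installed.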
