Published: 12/2016 | Language: English
Web scraping (also called screen scraping, web data extraction, web harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or a database.
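A minimal sketch of the "extract, then save to a local file" idea, assuming the data sits in a simple HTML table. The inline HTML, the column names, and the file name `languages.csv` are hypothetical; in a real scraper the HTML would come from a downloaded page. It uses BeautifulSoup with Python's built-in `html.parser` and the standard `csv` module.

```python
import csv

from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for a downloaded web page.
HTML = """<table>
<tr><td>Python</td><td>1991</td></tr>
<tr><td>Java</td><td>1995</td></tr>
</table>"""


def extract_rows(html: str) -> list[list[str]]:
    """Return the text of every <td> cell, grouped by table row."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")
    ]


def save_csv(rows: list[list[str]], path: str) -> None:
    """Write a header line plus the extracted rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["language", "year"])  # hypothetical column names
        writer.writerows(rows)


rows = extract_rows(HTML)
save_csv(rows, "languages.csv")
```

Swapping `save_csv` for an insert into a database table would cover the other storage option mentioned above.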
In this course, you will learn how to perform web scraping using Python 3 and BeautifulSoup, a free, open-source Python library for parsing HTML.
We will use lxml, a comprehensive library that parses XML and HTML documents very quickly and can even handle malformed tags. We will also use the Requests module instead of the standard library's urllib module (urllib2 in Python 2), because Requests is easier to use and makes the code more readable.
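As a rough sketch of how these pieces fit together: Requests downloads the page and BeautifulSoup builds a parse tree using lxml as its parser. The helper name `fetch_soup` is hypothetical, and the example assumes `requests`, `beautifulsoup4`, and `lxml` are installed. The second part shows lxml coping with unclosed `<li>` tags, without any network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str) -> BeautifulSoup:
    """Hypothetical helper: download a page with Requests and parse it with lxml."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return BeautifulSoup(response.text, "lxml")


# Malformed markup: neither <li> is closed.
broken_html = "<ul><li>One<li>Two</ul>"
soup = BeautifulSoup(broken_html, "lxml")

# lxml closes the first <li> when the second one opens, giving two sibling items.
items = [li.get_text() for li in soup.find_all("li")]
print(items)
```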
Finally, we will use Selenium to crawl AJAX- and JavaScript-driven pages.
The course covers the following topics: accessing web pages programmatically; scraping web pages and extracting the required data, using BeautifulSoup to parse them; interacting with web pages programmatically; and using Selenium for web scraping, including when it is needed.
By the end of this course, you will understand how websites and servers work, know a range of data extraction techniques, and be able to handle and organize the data you collect.
This Web Scraping course covers the following topics:
- Basic review of data structures (Lists, Dictionaries, Tuples, File Handling)
- List and dictionary comprehensions, building tuples from generator expressions, inline if-else expressions
- Introduction to how websites are hosted on servers
- Basic requests to the server (GET and POST methods)
- Basic review of HTML and CSS
- Introduction to the Requests and BeautifulSoup modules
- Learning how to parse HTML using BeautifulSoup
- Assignment and assignment solution
- Filtering elements with BeautifulSoup and navigating the parse tree
- Project
- Project Discussion
- Introduction to JavaScript and AJAX
- Introduction to Selenium and why it is needed
- Selecting elements using Selenium
- CSS selectors
- XPath selectors
- Navigating pages using Selenium
- Project
- Project Discussion
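To illustrate the difference between CSS and XPath selectors, here is a small sketch on a hypothetical product listing. It uses BeautifulSoup's `select()` for the CSS selector and `lxml.html` for the XPath expression, so it runs without a browser. In Selenium, the same strings could be passed to `driver.find_elements(By.CSS_SELECTOR, ...)` or `driver.find_elements(By.XPATH, ...)`.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# Hypothetical page fragment used only for illustration.
PAGE = """
<div id="products">
  <div class="product"><h2>Lamp</h2><span class="price">19.99</span></div>
  <div class="product"><h2>Desk</h2><span class="price">149.00</span></div>
</div>
"""

# CSS selector: <h2> elements that are direct children of div.product.
soup = BeautifulSoup(PAGE, "html.parser")
css_names = [h2.get_text() for h2 in soup.select("div.product > h2")]

# XPath: the text of <span class="price"> inside each product <div>.
tree = lxml_html.fromstring(PAGE)
xpath_prices = tree.xpath('//div[@class="product"]/span[@class="price"]/text()')

print(css_names)
print(xpath_prices)
```

CSS selectors tend to be shorter for class and ID lookups, while XPath can also match on text content or walk up to parent elements.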
What will I get from this course?
- Python: review of data structures, conditionals, file handling
- How websites are hosted on servers; basic requests to the server (GET and POST methods)
- Web scraping with Python, Beautiful Soup, and Requests
- Using Selenium to handle JavaScript and AJAX
- Various web scraping exercises
- Source code for all exercises is available for download
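The difference between GET and POST can be seen without sending anything over the network by preparing the requests with the Requests library. The URLs under `example.com` and the form fields are hypothetical. GET puts the parameters into the URL's query string, while POST sends form fields in the request body.

```python
import requests

# Hypothetical endpoints, used only to show how requests are built.
SEARCH_URL = "https://example.com/search"
LOGIN_URL = "https://example.com/login"

# GET: parameters are encoded into the query string of the URL.
get_request = requests.Request("GET", SEARCH_URL, params={"q": "python", "page": 2}).prepare()

# POST: form fields are form-encoded into the request body.
post_request = requests.Request(
    "POST", LOGIN_URL, data={"user": "alice", "password": "secret"}
).prepare()

print(get_request.url)
print(post_request.headers["Content-Type"])
print(post_request.body)

# To actually send them, one would call, for example:
#   requests.get(SEARCH_URL, params={"q": "python"}, timeout=10)
#   requests.post(LOGIN_URL, data={"user": "alice"}, timeout=10)
```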
Who is the target audience?
- Anyone who wants to learn how to use Python for web scraping and data extraction.