# How to Use Python to Scrape Data From Websites & Save It to Excel (2025 Guide)
This guide is for mid-to-large companies that regularly need to collect data from websites. It walks through how to scrape data with Python and save it to an Excel file, and it is written to be easy to follow even if you have no coding experience.

## What is Web Scraping?

Web scraping is automated data extraction: a program pulls information from websites and saves it in a structured format. Think of it as copying and pasting done by a computer, only much faster and more reliable.

## Why Use Python for Web Scraping?

Python is a popular programming language and a natural fit for web scraping: the syntax is easy to read, and mature libraries handle everything from downloading pages to parsing HTML and writing Excel files.

## The Tools You'll Need (Python Libraries)

We'll use these key Python libraries:

- **requests**: downloads web pages
- **BeautifulSoup** (`beautifulsoup4`): parses HTML and extracts data
- **openpyxl**: writes data to Excel files
- **Selenium**: automates a real browser for JavaScript-heavy pages
- **Pyppeteer**: an alternative browser-automation library that controls Chromium

**Installation:** Open your command prompt or terminal and type:

```bash
pip install requests beautifulsoup4 openpyxl selenium pyppeteer
```

You'll also need a web driver for Selenium (recent Selenium versions can download a matching driver automatically); Pyppeteer downloads its own Chromium build the first time it runs.

## Method 1: Scraping Static Websites (using requests and BeautifulSoup)

Static websites deliver their content directly in the HTML response; nothing important is loaded later by JavaScript, so every visitor sees the same markup.

### Step 1: Get the Web Page Content

```python
from bs4 import BeautifulSoup
import requests
from openpyxl import Workbook

url = "https://www.example.com"  # Replace with the URL you want to scrape
headers = {"User-Agent": "Mozilla/5.0"}  # Mimic a browser

response = requests.get(url, headers=headers)
response.raise_for_status()  # Raise an error if the request failed
html_content = response.text
```

### Step 2: Parse the HTML with BeautifulSoup

```python
soup = BeautifulSoup(html_content, "html.parser")
```

### Step 3: Find and Extract the Data

This is where you use BeautifulSoup's methods to locate the specific data you need. Some examples:

```python
# Find the first paragraph (<p> tag) and get its text:
paragraph_text = soup.find("p").text

# Find all links (<a> tags) and get their URLs:
links = soup.find_all("a")
for link in links:
    href = link.get("href")
    print(href)

# Find an element with a specific class:
element = soup.find("div", class_="my-class")

# Find an element with a specific ID:
element = soup.find(id="my-id")

# Find all images and get their source URLs:
images = soup.find_all("img")
for image in images:
    src = image.get("src")
    print(src)

# Navigate to sibling tags:
next_sibling = soup.find("h2").find_next_sibling()
previous_sibling = soup.find("h2").find_previous_sibling()

# Inspect an element's attributes as a dictionary:
attributes = soup.find("a").attrs
```

### Step 4: Store the Data in Excel (using openpyxl)

```python
wb = Workbook()             # Create a new Excel workbook
ws = wb.active              # Get the active worksheet
ws.title = "Scraped Data"   # Set the sheet title

# Add headers (column names)
ws.append(["Product Name", "Price", "Description"])

# Example data (replace with your actual scraped data)
products = [
    {"name": "Product 1", "price": "$10", "description": "This is product 1."},
    {"name": "Product 2", "price": "$20", "description": "This is product 2."},
]

for product in products:
    ws.append([product["name"], product["price"], product["description"]])

wb.save("scraped_data.xlsx")  # Save the Excel file
```
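To show how the four steps fit together, here is a minimal end-to-end sketch. The URL and the CSS classes (`product`, `product-name`, `price`) are hypothetical placeholders; inspect the page you actually want to scrape and adjust the selectors to match its structure.

```python
# Minimal end-to-end sketch of Method 1: fetch, parse, extract, save to Excel.
# NOTE: the URL and the "product" / "product-name" / "price" classes are assumed
# placeholders. Change them to match the real page you are scraping.
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

url = "https://www.example.com/products"  # Placeholder product-listing URL
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

wb = Workbook()
ws = wb.active
ws.title = "Scraped Data"
ws.append(["Product Name", "Price"])  # Header row

# Assumed structure: each product is a <div class="product"> containing
# <span class="product-name"> and <span class="price"> elements.
for item in soup.find_all("div", class_="product"):
    name = item.find("span", class_="product-name")
    price = item.find("span", class_="price")
    ws.append([
        name.get_text(strip=True) if name else "",
        price.get_text(strip=True) if price else "",
    ])

wb.save("scraped_data.xlsx")
```

Running the script produces `scraped_data.xlsx` with one row per product; if a selector does not match anything, the cell is simply left blank instead of crashing the script.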
## Method 2: Scraping Dynamic Websites (using Selenium)

Dynamic websites load some or all of their content with JavaScript after the initial page load. `requests` only sees the initial HTML, so it can't capture that content. Selenium can, because it controls a real web browser.

### Step 1: Set Up Selenium

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# Headless mode (optional): run Chrome without a visible window
options = Options()
options.add_argument("--headless")

# service = Service("/path/to/chromedriver")  # Only needed if you manage chromedriver manually
driver = webdriver.Chrome(options=options)
```

### Step 2: Navigate to the Page

```python
url = "https://www.example.com/dynamic-page"  # Replace with your target URL
driver.get(url)
```

### Step 3: Interact with the Page (if needed)

Selenium lets you click buttons, fill in forms, and scroll.

```python
# Example: find an element by its ID and click it
button = driver.find_element(By.ID, "my-button")
button.click()

# Example: find an input field by its name and type text into it
input_field = driver.find_element(By.NAME, "my-input")
input_field.send_keys("Hello, world!")

# Example: wait for an element to appear (important for dynamic content!)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "dynamic-element"))
)
```

### Step 4: Get the Page Source (after JavaScript has loaded)

```python
html_content = driver.page_source
```

### Step 5: Parse with BeautifulSoup (same as Method 1)

Now you have the fully rendered HTML. Use BeautifulSoup to extract the data, just like in Method 1.

```python
soup = BeautifulSoup(html_content, "html.parser")
# ... use find(), find_all(), etc. to extract data ...
```

### Step 6: Take a Screenshot (optional)

```python
driver.save_screenshot("screenshot.png")
```

### Step 7: Close the Browser

```python
driver.quit()  # Close the browser and free up resources
```

## Method 3: Scraping with Pyppeteer (Alternative to Selenium)

Pyppeteer is another browser automation library. It controls Chromium/Chrome through an asynchronous (asyncio-based) API.

### Step 1: Set Up Pyppeteer

```python
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=True)  # headless=False shows the browser window
    page = await browser.newPage()
    await page.goto("https://www.example.com")  # Replace with your target URL

    # ... interact with the page and extract data here ...
    html_content = await page.content()  # Get the rendered page HTML

    await browser.close()

asyncio.run(main())
```

### Step 2: Interact with the Page (Examples)

These snippets go inside the `main()` coroutine above, since they all need `await`:

```python
# Find an element by CSS selector and click it:
button = await page.querySelector("#my-button")
await button.click()

# Type text into an input field:
await page.type("#my-input", "Hello, world!")

# Wait for an element to appear:
await page.waitForSelector("#dynamic-element")

# Take a screenshot:
await page.screenshot({"path": "screenshot.png"})
```

### Step 3: Parse with BeautifulSoup (same as before)

```python
soup = BeautifulSoup(html_content, "html.parser")
# ... extract data using BeautifulSoup ...
```

### Step 4: Close the Browser

```python
await browser.close()
```

## Important Considerations

Before you scrape at scale, check the site's terms of service and robots.txt, keep your request rate low, handle errors gracefully, and avoid collecting personal data.
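As a concrete starting point, the sketch below checks a site's robots.txt with Python's built-in `urllib.robotparser` and pauses between requests. The base URL, the page list, and the one-second delay are illustrative assumptions to adapt to your own project.

```python
# A sketch of "polite" scraping: respect robots.txt and pause between requests.
# The BASE_URL, the page list, and the 1-second delay are placeholder assumptions.
import time
import urllib.robotparser

import requests

BASE_URL = "https://www.example.com"

robots = urllib.robotparser.RobotFileParser()
robots.set_url(BASE_URL + "/robots.txt")
robots.read()  # Download and parse the site's robots.txt

pages = [BASE_URL + "/page1", BASE_URL + "/page2"]  # Placeholder URLs

for url in pages:
    if not robots.can_fetch("*", url):
        print(f"Skipping {url}: disallowed by robots.txt")
        continue
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    response.raise_for_status()
    # ... parse and store the data as shown in the methods above ...
    time.sleep(1)  # Pause so you don't overload the server
```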
## Conclusion

Python is a powerful tool for web scraping. With libraries like requests, BeautifulSoup, Selenium, and Pyppeteer, you can extract data from almost any website. Remember to scrape responsibly and ethically.

Need help with web scraping or data extraction projects? Contact Hir Infotech for expert data solutions. We can handle the technical complexities, so you can focus on using your data.

#WebScraping #Python #DataExtraction #BeautifulSoup #Selenium #Pyppeteer #Excel #DataScience #DataMining #Automation #WebAutomation #2025