Exploring Web Browsing and Web Scraping with Lynx

How the Lynx command-line browser can be used for web browsing and web scraping.

February 24, 2025, by Hamed Mohammadi

Lynx is one of the oldest web browsers still in active use—a lightweight, text-based browser that runs entirely in the terminal. While its text-only interface may seem limiting at first glance, Lynx has a host of advantages for both browsing and web scraping. In this post, we’ll dive into how you can use Lynx for day-to-day web browsing, and then explore its powerful capabilities for extracting data from web pages.

Table of Contents

  1. Introduction
  2. Installing and Configuring Lynx
  3. Basic Web Browsing with Lynx
  4. Web Scraping with Lynx
  5. Scripting and Automation Examples
  6. Advanced Tips and Customizations
  7. Conclusion

Introduction

Lynx is a command-line web browser that renders web pages in plain text. Because it does not process JavaScript or multimedia content, it displays only the server-rendered HTML. This behavior is particularly useful for web scrapers that need a simplified view of a webpage without the extra clutter of images, scripts, and style sheets.

Whether you’re troubleshooting a site’s text output for SEO purposes, or you need to quickly harvest links and data from a website using a shell script, Lynx offers an efficient and reliable solution.

Installing and Configuring Lynx

Installation

Lynx is available on many platforms:

  • Linux (Ubuntu/Debian):
    sudo apt-get update
    sudo apt-get install lynx
    
  • macOS (using Homebrew):
    brew install lynx
    
  • Windows: Install Lynx through Windows Subsystem for Linux (WSL) or via a package manager like Chocolatey.

Basic Configuration

After installation, you can check the version by running:

lynx --version

Lynx also supports a configuration file (lynx.cfg) where you can customize settings such as:

  • Keybindings (e.g., enabling vi keys for navigation)
  • Link numbering and text formatting options
  • Cookie handling for smoother browsing

A properly tuned configuration file can enhance both your browsing experience and scraping accuracy.
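
For example, a minimal set of commonly tuned directives might look like this (a sketch; the directive names are taken from the heavily commented default lynx.cfg, so confirm them against your installed copy):

# Load with: lynx --cfg=~/.lynx.cfg
CHARACTER_SET:utf-8
ACCEPT_ALL_COOKIES:TRUE
VI_KEYS_ALWAYS_ON:TRUE
DEFAULT_KEYPAD_MODE:LINKS_ARE_NUMBERED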

Basic Web Browsing with Lynx

Using Lynx for browsing is straightforward. Launch it by specifying a URL:

lynx https://example.com/

Navigation in Lynx is done entirely via the keyboard. With link numbering enabled (via the --number_links flag or the keypad-mode setting in lynx.cfg), you can jump to a specific link by typing its number. Here are some common commands:

  • Up/Down arrows (or j and k with vi keys enabled): Move to the previous or next link
  • Space / b: Page down and up
  • Enter or Right arrow: Follow the highlighted link
  • Left arrow: Go back to the previous page
  • g: Go to a new URL
  • q: Quit Lynx
  • p: Print or save the current page
The text-only view provided by Lynx is great for quickly scanning content without distractions, especially when you need a clear view of the server-rendered HTML.
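
If you prefer per-session control over those settings, the same behavior can be switched on from the command line (both flags are listed in lynx --help):

lynx --vikeys --number_links https://example.com/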

Web Scraping with Lynx

Because Lynx strips away JavaScript and multimedia, it produces a clean dump of a web page’s content. This makes it ideal for scraping data such as:

  • Lists of links
  • Plain text content
  • Metadata visible in the HTML source

Dumping a Webpage

To output the rendered text of a page, use the --dump option:

lynx --dump https://example.com/ > output.txt

This command prints the page’s content to standard output, which you can redirect into a file. The dumped content often includes a list of numbered links at the bottom of the file.
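
If you want only the readable text without the trailing link list, the --nolist option suppresses it, and --width adjusts the default 80-column wrapping:

lynx --dump --nolist --width=120 https://example.com/ > text_only.txt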

Extracting Links

If your goal is to extract just the links from a page, combine a few options:

  • --listonly: Outputs only the list of links
  • --nonumbers: Removes the link numbers from the output
  • --display_charset=utf-8: Ensures proper character encoding

Example:

lynx --listonly --nonumbers --display_charset=utf-8 --dump https://www.nytimes.com/ | grep "^http" | sort | uniq > links.txt

This command extracts, sorts, and removes duplicate URLs from the New York Times homepage.

Scripting and Automation Examples

By integrating Lynx with other shell utilities, you can create powerful web scraping scripts. Here are two practical examples.

Example 1: Extracting Links with a Shell Function

Add this function to your shell configuration file (e.g., ~/.bashrc or ~/.zshrc):

# extract_links URL — print a sorted, de-duplicated list of the links on a page
extract_links() {
    lynx --listonly --nonumbers --display_charset=utf-8 --dump "$1" | grep "^http" | sort | uniq
}

Usage:

extract_links https://www.example.com/ > example_links.txt
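
Because the function emits one URL per line, it composes cleanly with further filters. For example, to keep only links that stay on the same site (the domain here is just a placeholder):

extract_links https://www.example.com/ | grep "^https://www.example.com" > internal_links.txt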

Example 2: Scraping Specific Data

Suppose you want to extract weather data from a site that publishes it as plain text. You can pipe Lynx's dump through grep (adding awk or sed if the matched line needs further trimming) to pull out just what you need. The URL and the "Temperature:" label below are placeholders; adjust them to match the page you are scraping:

#!/bin/bash
# weather_scrape.sh: Extract weather info

URL="https://weather.example.com/today"
# Dump the page content and filter for temperature information
weather_info=$(lynx --dump "$URL" | grep "Temperature:")

echo "Today's weather: $weather_info"

Make the script executable:

chmod +x weather_scrape.sh

And run it:

./weather_scrape.sh

Advanced Tips and Customizations

Using Regular Expressions

For more refined data extraction, integrate regular expressions with grep or sed. For instance, to extract IP addresses from a webpage:

lynx --dump https://example.com/ | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' > ips.txt

Combining Lynx with Other Tools

  • curl: For pages that need custom headers or a specific user agent, fetch the HTML with curl and pipe it into Lynx for rendering via the --stdin option (see the sketch after this list).
  • Mailcap configuration: Lynx is mailcap-aware, allowing you to specify external programs to handle specific MIME types. This is useful if you need to process or view certain types of data with specialized tools.
  • Scripting in Bash: Automate regular scrapes by scheduling your scripts with cron or integrating them into larger data pipelines.
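
Two short sketches of these ideas (the user-agent string, paths, and schedule are arbitrary examples):

# Fetch with a custom user agent via curl, then render the HTML with Lynx
curl -s -A "Mozilla/5.0" https://example.com/ | lynx --stdin --dump > page.txt

# crontab entry: run the earlier weather script every day at 7:00
0 7 * * * /path/to/weather_scrape.sh >> "$HOME/weather.log" 2>&1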

Handling Dynamic Content

Remember that Lynx does not execute JavaScript. For sites that heavily rely on client-side scripting, Lynx may not capture all dynamic content. However, many sites still serve a basic HTML version that is perfect for text scraping and SEO analysis.

Conclusion

Lynx may have started its life in the early 1990s, but its simplicity and efficiency still make it a valuable tool for both web browsing and web scraping. Its ability to render pages in pure text provides an uncluttered view of server-rendered content—ideal for debugging, SEO analysis, and quick data extraction.

By mastering Lynx’s command-line options and combining it with powerful shell utilities like grep, awk, and sed, you can create flexible scripts that handle everything from link extraction to comprehensive web scraping tasks.
