
PHP Simple HTML DOM Parser library

By MisterFoxOnline

Jun 19, 2024 #cURL, #snippet

I cannot remember why I was fiddling with cURL in the first place, but along the way I tried scraping URLs with the PHP Simple HTML DOM Parser library.

In this tutorial:

Enable cURL
Demo
PHP code

See also:

Scrape a URL’s Content Using PHP cURL

1. Enable cURL

Open php.ini
Search for and uncomment: extension=curl (remove the leading semicolon)
Restart Apache
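
To confirm the change took effect after the restart, a quick check along these lines can help. This is only a minimal sketch: the helper name curl_is_enabled() is made up for illustration, and it relies on nothing beyond PHP's built-in extension_loaded() and function_exists().

```php
<?php
// Hypothetical helper: reports whether the cURL extension is available
// to the PHP runtime that executes this script.
function curl_is_enabled(): bool
{
	return extension_loaded('curl') && function_exists('curl_init');
}

echo curl_is_enabled()
	? "cURL is enabled\n"
	: "cURL is NOT enabled; check php.ini and restart Apache\n";
```

Run it through the web server rather than the command line if you can, because the CLI and Apache may load different php.ini files.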

2. Demo

View a demo of the following code: Simple HTML DOM Parser demo

3. PHP code

<!DOCTYPE html>
<html lang="en">

<head>
	<title>stylus :: Simple HTML DOM Parser demo</title>
	<style>
		* {
			font-family: Arial;
		}
	</style>
</head>

<body>
	<h1>Simple HTML DOM Parser demo</h1>
	<p><a href="https://sourceforge.net/projects/simplehtmldom/" target="_blank">simplehtmldom</a></p>
	<p>Simple implementation of the Simple HTML DOM Parser.</p>
	<p>Provide a fully qualified URL and the scraper will return a page displaying just the images from that URL.</p>
	<form action="" method="POST">
		<label for="scrapeurl">URL: </label><input type="text" id="scrapeurl" name="scrapeurl" />
		<input type="submit" name="scraper" />
	</form>

	<?php
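	if ($_SERVER['REQUEST_METHOD'] === 'POST') {
		// Treat a missing field as empty and ignore surrounding whitespace
		$url = trim($_POST['scrapeurl'] ?? '');
		// Escape once so user input is never echoed into the page as raw HTML
		$safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
		if ($url === '') {
			echo '<p>You posted nada...</p>';
		} elseif (!filter_var($url, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $url)) {
			// Only accept fully qualified http(s) URLs, not local paths or other schemes
			echo '<p>There is a problem with the URL: ' . $safeUrl . '</p>';
		} else {
			/* https://sourceforge.net/projects/simplehtmldom/ */
			require 'simplehtmldom/simple_html_dom.php';
			echo '<p>URL scraped: <a href="' . $safeUrl . '" target="_blank">' . $safeUrl . '</a></p>';
			if ($html = file_get_html($url)) {
				foreach ($html->find('img') as $element) {
					// Scraped attribute values are untrusted too, so escape them as well
					$src = htmlspecialchars((string) $element->src, ENT_QUOTES, 'UTF-8');
					echo '<p>' . $src . '</p>';
					echo '<img src="' . $src . '" alt="" />';
				}
			} else {
				echo '<p>There is a problem with the URL: ' . $safeUrl . '</p>';
			}
		}
	}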
	?>
</body>
</html>
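
If you want to try the image-listing step without downloading the library or touching the network, the same idea can be sketched with PHP's bundled DOMDocument class. Note that this swaps DOMDocument in for Simple HTML DOM Parser; it is not how the demo above works. The function name extract_image_sources() and the sample markup are invented for this example, and it assumes the DOM extension is enabled (it usually is by default).

```php
<?php
// Hypothetical helper: returns the src value of every <img> in an HTML string.
function extract_image_sources(string $html): array
{
	// Nothing to parse; also avoids handing an empty string to loadHTML()
	if (trim($html) === '') {
		return [];
	}

	$doc = new DOMDocument();
	// Real-world HTML is rarely valid, so collect parser warnings instead of printing them
	$previous = libxml_use_internal_errors(true);
	$doc->loadHTML($html);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$sources = [];
	foreach ($doc->getElementsByTagName('img') as $img) {
		$src = trim($img->getAttribute('src'));
		// Skip <img> tags that have no usable src attribute
		if ($src !== '') {
			$sources[] = $src;
		}
	}
	return $sources;
}

// Made-up sample markup: two images with a src and one without
$sample = '<html><body><img src="a.png"><p>text</p>'
	. '<img src="https://example.com/b.jpg"><img></body></html>';

foreach (extract_image_sources($sample) as $src) {
	echo htmlspecialchars($src, ENT_QUOTES, 'UTF-8'), "\n";
}
```

A relative value such as a.png is returned exactly as written in the page, so it would still need to be resolved against the scraped page's URL before the image could be displayed from another site.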

PHP Manual. (no date) Client URL Library. Available at: https://www.php.net/manual/en/book.curl (Accessed: 19 June 2024).
SourceForge. (2021) PHP Simple HTML DOM Parser. Available at: https://sourceforge.net/projects/simplehtmldom/ (Accessed: 19 June 2024).

