The original post: /r/php by /u/stonedoubt on 2024-12-15 05:35:33.
Here's the main class implementation:
<?php

namespace App\Services\Screenshot;

use HeadlessChromium\Browser;
use HeadlessChromium\BrowserFactory;
use HeadlessChromium\Page;
use EchoLabs\Prism\Prism;
use EchoLabs\Prism\Enums\Provider;
use EchoLabs\Prism\ValueObjects\Messages\UserMessage;
use Imagick;

class CleanScrollingScreenshot
{
    private Browser $browser;
    private ?Page $page = null;
    private array $hiddenElements = [];

    public function __construct(
        private string $chromePath = 'chrome',
        private array $browserOptions = ['headless' => true]
    ) {
        $factory = new BrowserFactory($chromePath);
        $this->browser = $factory->createBrowser($browserOptions);
    }

    public function capture(string $url): string
    {
        $initialScreenshot = null;
        try {
            $this->page = $this->browser->createPage();
            $this->page->navigate($url)->waitForNavigation();

            $html = $this->getPageHtml();
            $initialScreenshot = $this->takeInitialScreenshot();
            $distractingElements = $this->analyzePageContent($html, $initialScreenshot);
            $this->hideElements($distractingElements);

            $cleanScreenshot = $this->takeFullPageScreenshot();
            return $this->createScrollingGif($cleanScreenshot);
        } finally {
            // Remove the pre-cleanup screenshot and release the page, even on failure.
            if ($initialScreenshot !== null && is_file($initialScreenshot)) {
                unlink($initialScreenshot);
            }
            if ($this->page) {
                $this->page->close();
                $this->page = null;
            }
        }
    }

    private function getPageHtml(): string
    {
        return $this->page->evaluate('document.documentElement.outerHTML')->getReturnValue();
    }

    private function takeInitialScreenshot(): string
    {
        // chrome-php has no 'fullPage' option (that is Puppeteer's name);
        // a full-page capture needs an explicit clip.
        $screenshot = $this->page->screenshot([
            'format' => 'png',
            'captureBeyondViewport' => true,
            'clip' => $this->page->getFullPageClip(),
        ]);

        $tempFile = tempnam(sys_get_temp_dir(), 'screenshot');
        $screenshot->saveToFile($tempFile);
        return $tempFile;
    }
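
    // Note: only the HTML is sent to the model. $screenshotPath is accepted but
    // currently unused, even though the prompt mentions a screenshot.
    private function analyzePageContent(string $html, string $screenshotPath): array
    {
        $prompt = <<<EOT
        Analyze this webpage HTML and screenshot to identify distracting elements that should be hidden for a clean scrolling screenshot.
        Focus on:
        1. Advertisements
        2. Fixed navigation bars
        3. Popup modals
        4. Cookie notifications
        5. Newsletter signup forms
        6. Social media widgets
        7. Chat widgets
        For each element, provide:
        1. A CSS selector to target it
        2. Why it's considered distracting
        3. Confidence level (0-100)
        Return the analysis in JSON format like:
        {
          "elements": [
            {
              "selector": "string",
              "reason": "string",
              "confidence": number
            }
          ]
        }
        EOT;

        try {
            $response = Prism::text()
                ->using(Provider::Anthropic, 'claude-3-sonnet')
                ->withMessages([
                    new UserMessage("HTML:\n{$html}\n\nAnalyze according to instructions:\n{$prompt}"),
                ])
                ->generate();

            // json_decode() returns null on invalid JSON, so validate before filtering.
            $analysis = json_decode($response->text, true);
            if (!is_array($analysis) || !is_array($analysis['elements'] ?? null)) {
                return [];
            }

            // Keep only well-formed entries rated above 75 on the 0-100 scale.
            return array_values(array_filter(
                $analysis['elements'],
                fn ($element) => is_array($element)
                    && is_string($element['selector'] ?? null)
                    && ($element['confidence'] ?? 0) > 75
            ));
        } catch (\Throwable $e) {
            // A failed analysis means "hide nothing"; log it so the failure is visible.
            error_log('Page analysis failed: ' . $e->getMessage());
            return [];
        }
    }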
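
    private function hideElements(array $elements): void
    {
        foreach ($elements as $element) {
            // json_encode() yields a safely quoted JS string literal (addslashes() does not),
            // and querySelectorAll() copes with zero or several matches.
            $this->page->evaluate(sprintf(
                'document.querySelectorAll(%s).forEach(el => { el.style.display = "none"; });',
                json_encode($element['selector'])
            ));
            $this->hiddenElements[] = $element['selector'];
        }

        $this->page->evaluate('window.scrollTo(0, 0)');
        usleep(500000); // give the layout 0.5 s to settle
    }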

    private function takeFullPageScreenshot(): string
    {
        // 'fullPage' is not a chrome-php option; the clip is what makes this a full-page capture.
        $screenshot = $this->page->screenshot([
            'format' => 'png',
            'captureBeyondViewport' => true,
            'clip' => $this->page->getFullPageClip(),
        ]);

        $tempFile = tempnam(sys_get_temp_dir(), 'clean_screenshot');
        $screenshot->saveToFile($tempFile);
        return $tempFile;
    }

    private function createScrollingGif(string $screenshotPath): string
    {
        $source = new Imagick($screenshotPath);
        $width = $source->getImageWidth();
        $height = $source->getImageHeight();

        $viewportHeight = 1080;
        $fps = 30;
        $duration = 10; // seconds
        $totalFrames = $fps * $duration;

        $animation = new Imagick();
        $animation->setFormat('gif');

        for ($frame = 0; $frame < $totalFrames; $frame++) {
            // Divide by ($totalFrames - 1) so the last frame reaches the bottom of the page.
            $progress = $this->easeInOutQuad($frame / ($totalFrames - 1));
            $scrollY = (int) ($progress * max(0, $height - $viewportHeight));

            $frameImage = clone $source;
            $frameImage->cropImage($width, $viewportHeight, 0, $scrollY);
            // Reset the virtual canvas so the GIF frame does not keep the crop offset.
            $frameImage->setImagePage(0, 0, 0, 0);
            // The output path has no extension, so set the frame format explicitly.
            $frameImage->setImageFormat('gif');
            // GIF delays are whole hundredths of a second: 30 fps rounds to 3 ticks.
            $frameImage->setImageDelay((int) round(100 / $fps));
            $animation->addImage($frameImage);
            $frameImage->destroy();
        }

        $animation->setImageIterations(0); // loop forever
        $animation->optimizeImageLayers();

        $outputPath = tempnam(sys_get_temp_dir(), 'scrolling_screenshot');
        $animation->writeImages($outputPath, true);

        $source->destroy();
        $animation->destroy();
        unlink($screenshotPath);
        return $outputPath;
    }

    private function easeInOutQuad(float $t): float
    {
        return $t < 0.5 ? 2 * $t * $t : 1 - pow(-2 * $t + 2, 2) / 2;
    }

    public function __destruct()
    {
        $this->browser->close();
    }
}
Tutorial: Creating Clean Scrolling Screenshots with Chrome PHP and AI
Prerequisites
Before starting, install these packages:
# Chrome PHP installation
composer require chrome-php/chrome
# Prism for AI analysis
composer require echolabs/prism
# ImageMagick (Ubuntu/Debian)
sudo apt-get install imagemagick php-imagick
# ImageMagick (macOS)
brew install imagemagick
pecl install imagick
Basic Usage
Here's how to use the utility:
use App\Services\Screenshot\CleanScrollingScreenshot;
// Create instance
$screenshot = new CleanScrollingScreenshot();
// Capture and save
$outputPath = $screenshot->capture('https://example.com/');
rename($outputPath, 'final/screenshot.gif');
Advanced Configuration
Custom Chrome Path:
$screenshot = new CleanScrollingScreenshot(
    chromePath: '/usr/local/bin/chrome'
);
Browser Options:
$screenshot = new CleanScrollingScreenshot(
    browserOptions: [
        'headless' => false, // For debugging
        'windowSize' => [1920, 1080],
        'debugLogger' => 'php://stdout'
    ]
);
How It Works
- Page Loading: Loads URL in headless Chrome
- Analysis: Uses Prism/Claude to analyze page content
- Cleanup: Hides distracting elements
- Capture: Takes full-page screenshot
- Animation: Creates smooth scrolling GIF
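The animation step boils down to picking one vertical crop offset per frame. The sketch below mirrors the easing formula from the class, but `scrollOffsets()` is a hypothetical helper written for illustration, and the page height, viewport height, and frame count in the usage line are made-up values.

```php
<?php

/** Quadratic ease-in-out, the same formula as easeInOutQuad() in the class. */
function easeInOutQuad(float $t): float
{
    return $t < 0.5 ? 2 * $t * $t : 1 - pow(-2 * $t + 2, 2) / 2;
}

/**
 * Hypothetical helper: the vertical crop offset (in pixels) for every frame.
 *
 * @return int[]
 */
function scrollOffsets(int $pageHeight, int $viewportHeight, int $totalFrames): array
{
    // Nothing to scroll if the page fits inside the viewport.
    $maxScroll = max(0, $pageHeight - $viewportHeight);

    $offsets = [];
    for ($frame = 0; $frame < $totalFrames; $frame++) {
        // Dividing by (totalFrames - 1) lets the last frame reach progress 1.0.
        $t = $totalFrames > 1 ? $frame / ($totalFrames - 1) : 0.0;
        $offsets[] = (int) round(easeInOutQuad($t) * $maxScroll);
    }

    return $offsets;
}

// Example values only: a 3000px page, a 1080px viewport, 5 frames.
echo implode(', ', scrollOffsets(3000, 1080, 5)), "\n"; // 0, 240, 960, 1680, 1920
```

The offsets start and end slowly and move fastest in the middle, which is what makes the scroll look smooth rather than linear.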
AI Analysis
The AI looks for:
* Advertisements
* Fixed navigation bars
* Popup modals
* Cookie notices
* Newsletter forms
* Social widgets
* Chat widgets
Only elements the model rates above 75 on the 0-100 confidence scale are hidden.
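Model replies are not guaranteed to be clean JSON, so it can help to parse them defensively before applying the threshold. This is a minimal sketch, not part of the class: `selectorsAboveThreshold()` is a hypothetical helper, and the assumption that the reply may wrap the JSON in extra prose is mine.

```php
<?php

/**
 * Hypothetical helper: pull CSS selectors out of the model's reply,
 * keeping only entries rated above the confidence threshold.
 *
 * @return string[]
 */
function selectorsAboveThreshold(string $responseText, int $threshold = 75): array
{
    // Assumption: the reply may carry prose around the JSON, so take the outermost braces.
    $start = strpos($responseText, '{');
    $end = strrpos($responseText, '}');
    if ($start === false || $end === false || $end < $start) {
        return [];
    }

    $data = json_decode(substr($responseText, $start, $end - $start + 1), true);
    if (!is_array($data) || !isset($data['elements']) || !is_array($data['elements'])) {
        return [];
    }

    $selectors = [];
    foreach ($data['elements'] as $element) {
        if (!is_array($element)) {
            continue;
        }
        $selector = $element['selector'] ?? null;
        $confidence = $element['confidence'] ?? null;
        // Skip malformed entries instead of failing the whole capture.
        if (is_string($selector) && $selector !== '' && is_numeric($confidence) && $confidence > $threshold) {
            $selectors[] = $selector;
        }
    }

    return $selectors;
}

$reply = 'Here you go: {"elements":[{"selector":"#ad","reason":"ad","confidence":90},'
    . '{"selector":".nav","reason":"sticky nav","confidence":60}]}';
print_r(selectorsAboveThreshold($reply));
```

Returning an empty array on any malformed reply matches the class's behavior of hiding nothing when the analysis fails.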
Error Handling
try {
    $screenshot = new CleanScrollingScreenshot();
    $outputPath = $screenshot->capture('https://example.com/');
    rename($outputPath, 'final/screenshot.gif');
} catch (\Exception $e) {
    error_log("Screenshot failed: " . $e->getMessage());
}
Customization
Animation Settings:
private function createScrollingGif(string $screenshotPath): string
{
    // Customize these
    $fps = 30;              // Frames per second
    $duration = 10;         // Total duration
    $viewportHeight = 1080; // Viewport height
}
Custom Element Detection:
$prompt = <<<EOT
Analyze this webpage HTML and screenshot to identify distracting elements that should be hidden for a clean scrolling screenshot.
Focus on:
// Add your categories
8. Auto-playing videos
9. Floating buttons
10. Interstitial banners
EOT;
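If you add categories often, one option is to generate the numbered "Focus on:" list from an array instead of editing the heredoc by hand. This is only a sketch: `buildFocusList()` is a hypothetical helper and is not part of the class above.

```php
<?php

/**
 * Hypothetical helper: render the "Focus on:" section of the prompt
 * as a numbered list built from category names.
 */
function buildFocusList(array $categories): string
{
    $lines = ['Focus on:'];
    foreach (array_values($categories) as $index => $category) {
        $lines[] = sprintf('%d. %s', $index + 1, $category);
    }

    return implode("\n", $lines);
}

// The seven categories from the original prompt plus the three extras.
$defaults = [
    'Advertisements',
    'Fixed navigation bars',
    'Popup modals',
    'Cookie notifications',
    'Newsletter signup forms',
    'Social media widgets',
    'Chat widgets',
];
$extras = ['Auto-playing videos', 'Floating buttons', 'Interstitial banners'];

echo buildFocusList(array_merge($defaults, $extras)), "\n";
```

The numbering stays consistent however many categories you append, so the extras land at 8, 9, and 10 as in the snippet above.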
Performance Tips
- Memory Usage:
  - Lower the frame rate for big pages
  - Reduce the animation duration
  - Adjust the viewport height
- Speed Optimization:
  - Use a powerful machine for Chrome
  - Adjust page load timeouts
  - Consider parallel processing
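To see why frame rate and duration matter for memory, here is a rough back-of-the-envelope estimate. It assumes 4 bytes per pixel (8-bit RGBA) and a 1920px-wide capture; both are my assumptions, and ImageMagick's real footprint depends on how it was built, so treat the result as an order of magnitude only. `estimateFrameBytes()` is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper: approximate size of all uncompressed frames,
 * assuming 4 bytes per pixel (8-bit RGBA).
 */
function estimateFrameBytes(int $width, int $viewportHeight, int $fps, int $duration): int
{
    $frames = $fps * $duration;

    return $width * $viewportHeight * 4 * $frames;
}

// Class defaults (30 fps, 10 s, 1080px viewport) against a leaner 10 fps, 5 s setup.
$full = estimateFrameBytes(1920, 1080, 30, 10);
$lean = estimateFrameBytes(1920, 1080, 10, 5);

printf("default: %.0f MiB, lean: %.0f MiB\n", $full / 1048576, $lean / 1048576);
```

Frame count scales with `fps * duration`, so cutting either one shrinks the working set proportionally.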
Lighter Headless Chrome
Personally, for scrapers and similar jobs, I prefer ‘chrome-headless-shell’, a standalone and fairly portable headless Chrome build that uses the old headless implementation.
npx @puppeteer/browsers install chrome-headless-shell@stable
If you didn’t know, you can also generate PDFs and even record video through the screencast API with a headless Chrome instance.
License
MIT License - feel free to modify and use as needed.