this post was submitted on 15 Dec 2024
The original post: /r/php by /u/stonedoubt on 2024-12-15 05:35:33.

Here's the main class implementation:

<?php

namespace App\Services\Screenshot;

use HeadlessChromium\Browser;
use HeadlessChromium\BrowserFactory;
use HeadlessChromium\Page;
use EchoLabs\Prism\Prism;
use EchoLabs\Prism\Enums\Provider;
use EchoLabs\Prism\ValueObjects\Messages\UserMessage;
use Imagick;

class CleanScrollingScreenshot
{
    private Browser $browser;
    private ?Page $page = null;
    private array $hiddenElements = [];

    public function __construct(
        private string $chromePath = 'chrome',
        private array $browserOptions = ['headless' => true]
    ) {
        $factory = new BrowserFactory($chromePath);
        $this->browser = $factory->createBrowser($browserOptions);
    }

    public function capture(string $url): string
    {
        try {
            $this->page = $this->browser->createPage();
            $this->page->navigate($url)->waitForNavigation();

            $html = $this->getPageHtml();
            $screenshot = $this->takeInitialScreenshot();

            $distractingElements = $this->analyzePageContent($html, $screenshot);
            // The initial screenshot is only needed for the analysis step.
            unlink($screenshot);
            $this->hideElements($distractingElements);

            $cleanScreenshot = $this->takeFullPageScreenshot();
            return $this->createScrollingGif($cleanScreenshot);
        } finally {
            if ($this->page) {
                $this->page->close();
                $this->page = null;
            }
        }
    }

    private function getPageHtml(): string
    {
        return $this->page->evaluate('document.documentElement.outerHTML')->getReturnValue();
    }

    private function takeInitialScreenshot(): string
    {
        // chrome-php captures the whole page through a clip, not a 'fullPage' flag.
        $screenshot = $this->page->screenshot([
            'format' => 'png',
            'captureBeyondViewport' => true,
            'clip' => $this->page->getFullPageClip(),
        ]);

        $tempFile = tempnam(sys_get_temp_dir(), 'screenshot');
        $screenshot->saveToFile($tempFile);
        return $tempFile;
    }

    private function analyzePageContent(string $html, string $screenshotPath): array
    {
        $prompt = <<<EOT
        Analyze this webpage HTML and screenshot to identify distracting elements that should be hidden for a clean scrolling screenshot.
        Focus on:
        1. Advertisements
        2. Fixed navigation bars
        3. Popup modals
        4. Cookie notifications
        5. Newsletter signup forms
        6. Social media widgets
        7. Chat widgets

        For each element, provide:
        1. A CSS selector to target it
        2. Why it's considered distracting
        3. Confidence level (0-100)

        Return the analysis in JSON format like:
        {
          "elements": [
            {
              "selector": "string",
              "reason": "string",
              "confidence": number
            }
          ]
        }
        EOT;

        try {
            // Note: only the HTML is sent to the model; $screenshotPath is currently unused.
            $response = Prism::text()
                ->using(Provider::Anthropic, 'claude-3-sonnet')
                ->withMessages([
                    new UserMessage("HTML:\n{$html}\n\nAnalyze according to instructions:\n{$prompt}")
                ])
                ->generate();

            // json_decode() returns null when the reply is not valid JSON.
            $analysis = json_decode($response->text, true);
            if (!is_array($analysis) || !is_array($analysis['elements'] ?? null)) {
                return [];
            }

            return array_filter(
                $analysis['elements'],
                fn($element) => is_array($element)
                    && isset($element['selector'], $element['confidence'])
                    && $element['confidence'] > 75
            );
        } catch (\Throwable $e) {
            // Fall back to hiding nothing if the request or the parsing fails.
            return [];
        }
    }

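    private function hideElements(array $elements): void
    {
        foreach ($elements as $element) {
            // json_encode() yields a safely quoted JavaScript string literal, and
            // querySelectorAll() avoids a TypeError when nothing matches.
            $this->page->evaluate(sprintf(
                'document.querySelectorAll(%s).forEach(el => { el.style.display = "none"; });',
                json_encode($element['selector'])
            ));
            $this->hiddenElements[] = $element['selector'];
        }

        $this->page->evaluate('window.scrollTo(0, 0)');
        usleep(500000); // give the layout 0.5 s to settle
    }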

    private function takeFullPageScreenshot(): string
    {
        // As above: the full-page clip is what makes chrome-php capture beyond the viewport.
        $screenshot = $this->page->screenshot([
            'format' => 'png',
            'captureBeyondViewport' => true,
            'clip' => $this->page->getFullPageClip(),
        ]);

        $tempFile = tempnam(sys_get_temp_dir(), 'clean_screenshot');
        $screenshot->saveToFile($tempFile);
        return $tempFile;
    }

    private function createScrollingGif(string $screenshotPath): string
    {
        $source = new Imagick($screenshotPath);
        $width = $source->getImageWidth();
        $height = $source->getImageHeight();
        $viewportHeight = 1080;

        $fps = 30;
        $duration = 10;
        $totalFrames = $fps * $duration;
        // GIF frame delays are whole hundredths of a second.
        $delay = max(1, (int) round(100 / $fps));

        $animation = new Imagick();
        $animation->setFormat('gif');

        for ($frame = 0; $frame < $totalFrames; $frame++) {
            // Divide by ($totalFrames - 1) so the last frame reaches the bottom.
            $progress = $this->easeInOutQuad($frame / max(1, $totalFrames - 1));
            $scrollY = (int) ($progress * max(0, $height - $viewportHeight));

            $frameImage = clone $source;
            $frameImage->cropImage($width, $viewportHeight, 0, $scrollY);
            // Reset the canvas so the cropped frame carries no leftover page offset.
            $frameImage->setImagePage(0, 0, 0, 0);
            // The source is a PNG; mark each frame as GIF before it is written.
            $frameImage->setImageFormat('gif');
            $frameImage->setImageDelay($delay);

            $animation->addImage($frameImage);
            $frameImage->destroy();
        }

        $animation->setImageIterations(0);
        $animation->optimizeImageLayers();

        $outputPath = tempnam(sys_get_temp_dir(), 'scrolling_screenshot');
        $animation->writeImages($outputPath, true);

        $source->destroy();
        $animation->destroy();
        unlink($screenshotPath);

        return $outputPath;
    }

    private function easeInOutQuad(float $t): float
    {
        return $t < 0.5
            ? 2 * $t * $t
            : 1 - pow(-2 * $t + 2, 2) / 2;
    }

    public function __destruct()
    {
        $this->browser->close();
    }
}

Tutorial: Creating Clean Scrolling Screenshots with Chrome PHP and AI

Prerequisites

Before starting, install these packages:


# Chrome PHP installation
composer require chrome-php/chrome

# Prism for AI analysis
composer require echolabs/prism

# ImageMagick (Ubuntu/Debian)
sudo apt-get install imagemagick php-imagick

# ImageMagick (macOS)
brew install imagemagick
pecl install imagick

Basic Usage

Here's how to use the utility:

use App\Services\Screenshot\CleanScrollingScreenshot;

// Create instance
$screenshot = new CleanScrollingScreenshot();

// Capture and save
$outputPath = $screenshot->capture('https://example.com/');
rename($outputPath, 'final/screenshot.gif');

Advanced Configuration

Custom Chrome Path:

$screenshot = new CleanScrollingScreenshot(
    chromePath: '/usr/local/bin/chrome'
);

Browser Options:

$screenshot = new CleanScrollingScreenshot(
    browserOptions: [
        'headless' => false, // For debugging
        'windowSize' => [1920, 1080],
        'debugLogger' => 'php://stdout'
    ]
);

How It Works

  1. Page Loading: Loads URL in headless Chrome
  2. Analysis: Uses Prism/Claude to analyze page content
  3. Cleanup: Hides distracting elements
  4. Capture: Takes full-page screenshot
  5. Animation: Creates smooth scrolling GIF
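
For the cleanup step, here is a minimal sketch of how the hiding script could be built in one go. `buildHideScript()` is a hypothetical helper, not part of the class above. It assumes the selectors arrive as plain strings and leans on `json_encode()` to quote them for JavaScript.

```php
<?php

declare(strict_types=1);

/**
 * Build a JavaScript snippet that hides every element matching the selectors.
 * Hypothetical helper for illustration; the name is not from the original class.
 *
 * @param list<string> $selectors CSS selectors, e.g. from the AI analysis
 */
function buildHideScript(array $selectors): string
{
    // json_encode() emits a valid JavaScript array literal, so quotes and
    // backslashes inside a selector cannot break out of the string.
    $json = json_encode(array_values($selectors), JSON_THROW_ON_ERROR);

    return <<<JS
    (function (selectors) {
        let hidden = 0;
        for (const selector of selectors) {
            try {
                for (const el of document.querySelectorAll(selector)) {
                    el.style.setProperty('display', 'none', 'important');
                    hidden++;
                }
            } catch (e) {
                // Invalid selector from the model: skip it.
            }
        }
        return hidden;
    })($json);
    JS;
}
```

The resulting string could be passed to the same `$this->page->evaluate()` call the class already uses, which turns one round trip per selector into a single one.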

AI Analysis

The AI looks for:

  • Advertisements
  • Fixed navigation bars
  • Popup modals
  • Cookie notices
  • Newsletter forms
  • Social widgets
  • Chat widgets

Only elements the model scores above 75 on the 0-100 confidence scale are hidden.
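
Model replies are not always bare JSON, so the filtering step benefits from some defensive parsing. The sketch below shows one way to do it. `selectorsFromReply()` is a hypothetical helper name, and the only assumption about the reply is that it roughly follows the JSON shape from the prompt.

```php
<?php

declare(strict_types=1);

/**
 * Pull the selectors above a confidence threshold out of a model reply.
 * Hypothetical helper for illustration; not part of the original class.
 *
 * @return list<string>
 */
function selectorsFromReply(string $reply, int $minConfidence = 75): array
{
    // Replies are sometimes wrapped in prose, so keep only the outermost
    // {...} span before decoding.
    $start = strpos($reply, '{');
    $end = strrpos($reply, '}');
    if ($start === false || $end === false || $end < $start) {
        return [];
    }

    $data = json_decode(substr($reply, $start, $end - $start + 1), true);
    if (!is_array($data) || !is_array($data['elements'] ?? null)) {
        return [];
    }

    $selectors = [];
    foreach ($data['elements'] as $element) {
        if (!is_array($element)) {
            continue;
        }
        $selector = $element['selector'] ?? null;
        $confidence = $element['confidence'] ?? null;
        if (is_string($selector) && $selector !== ''
            && is_numeric($confidence) && $confidence > $minConfidence) {
            $selectors[] = $selector;
        }
    }

    // Drop duplicates so each selector is hidden only once.
    return array_values(array_unique($selectors));
}
```

Anything that fails to parse yields an empty list, which matches the class's behavior of hiding nothing when the analysis fails.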

Error Handling

try {
    $screenshot = new CleanScrollingScreenshot();
    $outputPath = $screenshot->capture('https://example.com/');
    rename($outputPath, 'final/screenshot.gif');
} catch (\Exception $e) {
    error_log("Screenshot failed: " . $e->getMessage());
}

Customization

Animation Settings:

private function createScrollingGif(string $screenshotPath): string
{
    // Customize these
    $fps = 30;              // Frames per second
    $duration = 10;         // Total duration in seconds
    $viewportHeight = 1080; // Viewport height in pixels
    // ... rest of the method unchanged
}
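
To see how these three settings interact without running Imagick, here is a small standalone sketch of the frame planning. `planScroll()` is a hypothetical name, and the easing function is copied from the class. One thing it makes visible: GIF delays are whole hundredths of a second, so 30 fps rounds to a delay of 3, which plays slightly faster than requested.

```php
<?php

declare(strict_types=1);

function easeInOutQuad(float $t): float
{
    return $t < 0.5 ? 2 * $t * $t : 1 - ((-2 * $t + 2) ** 2) / 2;
}

/**
 * Compute the per-frame scroll offsets and the GIF frame delay.
 * Hypothetical helper for illustration; not part of the original class.
 *
 * @return array{delay: int, offsets: list<int>}
 */
function planScroll(int $pageHeight, int $viewportHeight, int $fps, int $duration): array
{
    if ($fps < 1 || $duration < 1) {
        throw new InvalidArgumentException('fps and duration must be at least 1');
    }

    $frames = $fps * $duration;
    $maxScroll = max(0, $pageHeight - $viewportHeight);

    $offsets = [];
    for ($i = 0; $i < $frames; $i++) {
        // Divide by ($frames - 1) so the last frame reaches the page bottom.
        $t = $frames > 1 ? $i / ($frames - 1) : 0.0;
        $offsets[] = (int) round(easeInOutQuad($t) * $maxScroll);
    }

    // GIF delays are stored in hundredths of a second.
    $delay = max(1, (int) round(100 / $fps));

    return ['delay' => $delay, 'offsets' => $offsets];
}
```

For a 3000 px page and a 1080 px viewport, `planScroll(3000, 1080, 30, 10)` gives 300 offsets running from 0 to 1920. A page shorter than the viewport gets all-zero offsets, so the GIF stays still.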

Custom Element Detection:

// Add your categories after the seven built-in ones
$prompt = <<<EOT
Analyze this webpage HTML and screenshot to identify distracting elements that should be hidden for a clean scrolling screenshot.
Focus on:
8. Auto-playing videos
9. Floating buttons
10. Interstitial banners
EOT;
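
If you would rather not edit the heredoc by hand, the category list can be generated. This is only a sketch: `buildAnalysisPrompt()` is a hypothetical helper, and it covers just the opening of the prompt. The per-element instructions and the JSON format from the class would be appended unchanged.

```php
<?php

declare(strict_types=1);

/**
 * Build the opening of the analysis prompt with a numbered category list.
 * Hypothetical helper for illustration; not part of the original class.
 *
 * @param list<string> $extraCategories categories appended after the defaults
 */
function buildAnalysisPrompt(array $extraCategories = []): string
{
    // The seven defaults mirror the list in the class's prompt.
    $categories = array_merge([
        'Advertisements',
        'Fixed navigation bars',
        'Popup modals',
        'Cookie notifications',
        'Newsletter signup forms',
        'Social media widgets',
        'Chat widgets',
    ], array_values($extraCategories));

    $lines = [];
    foreach ($categories as $index => $category) {
        $lines[] = sprintf('%d. %s', $index + 1, $category);
    }
    $list = implode("\n", $lines);

    return <<<EOT
    Analyze this webpage HTML and screenshot to identify distracting elements that should be hidden for a clean scrolling screenshot.
    Focus on:
    $list
    EOT;
}
```

Calling `buildAnalysisPrompt(['Auto-playing videos', 'Floating buttons'])` numbers the extras 8 and 9 automatically.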

Performance Tips

  1. Memory Usage:

    • Lower the frame rate for long pages
    • Reduce the animation duration
    • Reduce the viewport height
  2. Speed Optimization:

    • Run Chrome on a machine with enough CPU and memory
    • Adjust page load timeouts
    • Consider processing several URLs in parallel
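
To get a feel for why the memory tips matter, here is a back-of-the-envelope estimate. It is a rough sketch only. `estimateAnimationMemory()` is a hypothetical helper, and the default of 8 bytes per pixel (four 16-bit channels) is an assumption about the ImageMagick build, so real usage may differ.

```php
<?php

declare(strict_types=1);

/**
 * Roughly estimate the uncompressed size of all frames held in memory.
 * Hypothetical helper; $bytesPerPixel = 8 is an assumed default, not a
 * measured value.
 *
 * @return array{frames: int, megabytes: int}
 */
function estimateAnimationMemory(
    int $width,
    int $viewportHeight,
    int $fps,
    int $duration,
    int $bytesPerPixel = 8
): array {
    $frames = $fps * $duration;
    $bytes = $frames * $width * $viewportHeight * $bytesPerPixel;

    return [
        'frames' => $frames,
        'megabytes' => (int) ceil($bytes / (1024 * 1024)),
    ];
}
```

Under that assumption, the defaults (1920x1080 at 30 fps for 10 seconds) come out at several gigabytes, and halving the frame rate halves the estimate.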

Lighter Headless Chrome

Personally, for scrapers and similar jobs, I prefer 'chrome-headless-shell', a standalone and fairly portable headless Chrome build that uses the old headless implementation.

npx @puppeteer/browsers install chrome-headless-shell@stable

In case you didn't know, a headless Chrome instance can also generate PDFs and even record video through the screencast API.


License

MIT License - feel free to modify and use as needed.
