Not specific to Yii, but Goutte crawls websites and extracts data from the responses (requires PHP 5.3).
From the website,
"Goutte is a screen scraping and web crawling library for PHP.
Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses."
Example of sending a request:
require_once '/path/to/goutte.phar';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'http://www.example.org/');
Some examples of extracting data with its CSS selector method, ‘filter()’:
$nodes = $crawler->filter('.error_list');
// get document title
$title = $crawler->filter('title')->text();
// get form element
$form = $crawler->filter('input[type=submit]')->form();
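If you want to try the same kinds of selections offline, or Goutte isn't available, here is a minimal sketch using PHP's bundled DOM extension instead of Goutte. The HTML snippet, the variable names, and the hand-written XPath expressions are my own assumptions that roughly approximate the CSS selectors above; this is not Goutte's API.

```php
<?php
// Made-up HTML standing in for a fetched response body (assumption).
$html = <<<HTML
<html>
  <head><title>Sign in</title></head>
  <body>
    <ul class="error_list"><li>Password is required</li></ul>
    <form action="/login" method="post">
      <input type="text" name="user">
      <input type="submit" value="Log in">
    </form>
  </body>
</html>
HTML;

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely strictly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Rough equivalent of filter('title')->text()
$title = $xpath->query('//title')->item(0)->textContent;

// Rough equivalent of filter('.error_list'), collecting each list item
$classTest = "contains(concat(' ', normalize-space(@class), ' '), ' error_list ')";
$errors = array();
foreach ($xpath->query('//*[' . $classTest . ']//li') as $li) {
    $errors[] = trim($li->textContent);
}

// Rough equivalent of filter('input[type=submit]'), then its enclosing form
$form = $xpath->query("//input[@type='submit']/ancestor::form")->item(0);
$action = $form->getAttribute('action');

echo $title, "\n";
echo implode(', ', $errors), "\n";
echo $action, "\n";
```

With Goutte you get the CSS selectors and the HTTP request handling for free, so I'd only reach for this when pulling in the library isn't practical.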
I’ve recently used SimpleHTMLDom for this purpose. It’s fairly easy to integrate as an external library. In my case, I contacted the website owner first to make sure scraping wasn’t an issue. I’m actually leaning towards a client-side solution for version 2.
My needs were fairly simple, so your mileage may vary.