Search Engine Scraper source code

Project offered by compunect [scraping@compunect.com]. Last successful test run: 28 Jan 2016

This advanced PHP source code was developed to power scraping-based projects. Although the code can already be used from the console (or a browser), it is mainly intended as a base for customization. You can either customize this project yourself or hire us to do what we do best. compunect is an IT services and development company, founded in Germany and now based in the Czech Republic, focused on professional customers.

This free Search Engine Scraper already includes:


Scraping search engines has become a serious business in recent years, and it remains a very challenging task. We know how difficult it can be to find an experienced developer in this area, and detailed information is almost impossible to find online. Providing this source code for free was quite a step for us, as it contains rare knowledge and nothing comparable is available. You may use this source code in your commercial project without paying us a cent. However, if you require customization or additional features, we offer such services; after all, who could do it better? If you need a professionally managed Linux server to run your projects on, we can help you accomplish this at a fair rate. You will definitely need high-quality, dedicated IP addresses to power your project; we offer these services as well and would be glad to find a solution for you. If you are interested in scraping projects, check out the Google Suggest Scraping Spider as well. The Suggest Scraper can generate thousands of organic, search-relevant terms to be scraped.

More about scraping

It took us months of testing and development to get accurate results from Google when using automated scripts, and this source code already includes most of that work. We even included the ability to gather local search results, so you can scrape results for any country without using IP addresses from that country. However, to receive correct results you will also need exceptionally good IP addresses; we can provide these if you struggle to obtain them on your own. Extending the source to work with Bing, Yahoo or another search engine should not be a big leap, as many of the core functions will stay similar.
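The project's own PHP code for local results is not reproduced here. As a rough illustration only, the Python sketch below shows one common way to request results as seen from another country: passing a country code in Google's `gl` query parameter and an interface language in `hl`, with `start` as the result offset. The helper name `build_local_search_url` is hypothetical, and whether compunect's scraper uses these exact parameters is an assumption.

```python
from urllib.parse import urlencode

# Assumption: Google's public search endpoint and its q/gl/hl/start
# parameters; this is not taken from the compunect source.
GOOGLE_SEARCH_BASE = "https://www.google.com/search"


def build_local_search_url(keyword: str, country: str, language: str = "en",
                           page: int = 0, per_page: int = 10) -> str:
    """Hypothetical helper: build a URL asking for results as seen from `country`.

    country  -- two-letter country code, sent as `gl`
    language -- interface language, sent as `hl`
    page     -- zero-based result page, converted to the `start` offset
    """
    if page < 0:
        raise ValueError("page must be >= 0")
    params = {
        "q": keyword,
        "gl": country.lower(),
        "hl": language.lower(),
        "start": page * per_page,
    }
    # urlencode() quotes values, turning spaces into "+"
    return f"{GOOGLE_SEARCH_BASE}?{urlencode(params)}"
```

For example, `build_local_search_url("Scraping PHP", "DE", "de", page=1)` yields `https://www.google.com/search?q=Scraping+PHP&gl=de&hl=de&start=10`. Supporting another engine would mostly mean swapping the base URL and parameter names.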

What to do with this tool?

There are countless interesting activities where this scraper comes in handy. Do you invest in Google AdWords to have your websites ranked for competitive search terms? Then you likely struggle with the thousands of keywords Google wants you to invest money in: which ones to choose, and which ones are a waste of money? Imagine being able to check your website's rank for thousands of keywords and key phrases and only paying for those where your website does not rank well enough. You can even automate the whole process with the AdWords API, paying according to your organic rank per keyword and updating this monthly. And beyond Google's own suggestions, there may be hundreds of organically relevant key phrases you do not even know about. Use the Google Suggest Scraping Spider to find what people are really looking for, then use this Google Search Scraper to find out whether you already rank for them.

Are you optimizing your own websites for Google, or are you in the SEO business optimizing for your customers? Track thousands of websites and keywords to see where you need to invest work. That way you can also track how effective your various methods for improving rank are. Or go one step further and offer your customers a graph for all their websites and keywords that shows how your work has influenced their ranks. Or go further still and analyze the ranks of hundreds of thousands of companies worldwide; you can use our Google Finance Scraping Spider to get all the companies listed in Google Finance. You may also make the whole project interactive, letting users get ranks or charts for their own keywords and websites.

Of course, this project can also be used simply to gather massive amounts of URLs and titles for a set of keywords by brute force. By doing regular scrape runs and storing the results in a database with a timestamp, you can unleash the real power of this project. If you need help developing such extensions, I am available for hire.
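A minimal sketch of storing results "in a database with timestamp", written in Python with the standard `sqlite3` module rather than the project's PHP. The table layout (`rank_history`) and the helper names are hypothetical. Each scrape run is stored under one timestamp, so a site's best rank for a keyword can later be charted over time.

```python
import sqlite3
import time
from typing import List, Optional, Tuple

# Hypothetical schema: one row per result of one scrape run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rank_history (
    keyword    TEXT    NOT NULL,
    url        TEXT    NOT NULL,
    position   INTEGER NOT NULL,
    scraped_at INTEGER NOT NULL  -- Unix timestamp shared by one run
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_run(conn: sqlite3.Connection, keyword: str,
              results: List[Tuple[int, str]],
              scraped_at: Optional[int] = None) -> None:
    """Store one scrape run; `results` is a list of (position, url) pairs."""
    ts = int(time.time()) if scraped_at is None else scraped_at
    conn.executemany(
        "INSERT INTO rank_history (keyword, url, position, scraped_at) "
        "VALUES (?, ?, ?, ?)",
        [(keyword, url, position, ts) for position, url in results],
    )
    conn.commit()


def rank_history(conn: sqlite3.Connection, keyword: str,
                 domain: str) -> List[Tuple[int, int]]:
    """Return (scraped_at, best_position) pairs, oldest first, for results
    whose URL contains `domain` (plain substring match). Runs in which the
    domain did not appear at all are absent from the list."""
    rows = conn.execute(
        "SELECT scraped_at, MIN(position) FROM rank_history "
        "WHERE keyword = ? AND instr(url, ?) > 0 "
        "GROUP BY scraped_at ORDER BY scraped_at",
        (keyword, domain),
    )
    return rows.fetchall()
```

After several runs, `rank_history(conn, "Scraping PHP", "example.com")` gives the data points for a rank chart. Keywords for which the site is missing or ranks poorly are the candidates for the paid placement described above.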

IP/Proxy management

When scraping, avoiding detection is essential. Google will ban any user who tries to scrape its search results automatically. In the worst case, it can issue a ban that permanently blocks tens of thousands of IP addresses. This is usually all that happens: it threatens the project but not the legal entity behind it. However, there is also a legal threat. If you have not accepted the search engine's Terms of Service (TOS), passively scraping it should not expose you to legal threats; to be sure, consult a local lawyer. In any case, detection can be avoided: the free Search Engine Scraper on this website can be used long-term without being detected, because:

a) It sends requests to Google at a rate of 10 requests per hour per IP address.
b) It calculates a proper delay between requests.
c) It does not accept any tracking offered by Google.
d) It rotates the IP address at the right moments.
e) It keeps a local data cache and IP history.
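Points a) and b) can be sketched as a per-IP sliding-window rate limiter. This is a Python illustration under stated assumptions, not the project's PHP logic. The class name `PerIpRateLimiter` is hypothetical, and spacing requests evenly (3600 / 10 = 360 seconds apart) is only one possible reading of a "proper delay". The class computes waiting times; it does not send any requests.

```python
from collections import deque
from typing import Deque, Dict


class PerIpRateLimiter:
    """Allow at most `max_requests` per `window` seconds for each IP and keep
    at least `window / max_requests` seconds between two requests on one IP.
    The 10-per-hour default mirrors point a); the even spacing is an
    assumption about what a "proper delay" means."""

    def __init__(self, max_requests: int = 10, window: float = 3600.0):
        self.max_requests = max_requests
        self.window = window
        self.min_gap = window / max_requests
        self.history: Dict[str, Deque[float]] = {}  # per-IP send times

    def delay_for(self, ip: str, now: float) -> float:
        """Seconds `ip` must wait before its next request (0.0 = send now)."""
        sent = self.history.get(ip)
        if sent is None:
            return 0.0
        # Forget requests that have dropped out of the sliding window.
        while sent and sent[0] <= now - self.window:
            sent.popleft()
        if not sent:
            return 0.0
        wait_gap = sent[-1] + self.min_gap - now
        wait_window = 0.0
        if len(sent) >= self.max_requests:
            wait_window = sent[0] + self.window - now
        return max(0.0, wait_gap, wait_window)

    def record(self, ip: str, now: float) -> None:
        """Remember that a request was sent from `ip` at time `now`."""
        self.history.setdefault(ip, deque()).append(now)
```

In a loop, a caller would sleep for `delay_for(ip, now)` seconds, send the request, and then call `record(ip, now)`. Reading the time from `time.monotonic()` avoids surprises from wall-clock changes.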

Google captcha blocks automated access

If you follow these guidelines, a captcha block caused by your own actions is very unlikely. When using a different IP/proxy service, a block most likely stems from shared IP usage or previous abuse of those addresses. The Google Search Scraper offered here already contains code to detect such a block and abort in that case. Google issues several typical error messages when it decides to block or slow down activity. Here are two examples:

We're sorry... ... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now. We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software. We apologize for the inconvenience, and hope we'll see you again on Google.

We're sorry... ... but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now. We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software. If you're continually receiving this error, you may be able to resolve the problem by deleting your Google cookie and revisiting Google. For browser-specific instructions, please consult your browser's online support center. If your entire network is affected, more information is available in the Google Web Search Help Center. We apologize for the inconvenience, and hope we'll see you again on Google. To continue searching, please type the characters you see below:

Often a captcha is offered to continue searching; in the worst case, Google completely blocks all access to one or all services for one or more IPs. This is a worst-case scenario: if you stick to the peak rates and use IPs from us-proxies.com, you are unlikely to run into this problem.
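The detect-and-abort step can be sketched by looking for phrases taken from the two messages quoted above. This Python sketch is an assumption, not the project's PHP code. `is_block_page`, `check_response` and `SearchBlockedError` are hypothetical names. Google can reword these pages at any time, so the marker list needs ongoing maintenance.

```python
import html
import re


class SearchBlockedError(RuntimeError):
    """Raised when a response looks like a block or captcha page."""


# Lower-case phrases from the two messages quoted above (an assumption:
# the wording may change, and other block pages may look different).
BLOCK_MARKERS = (
    "looks similar to automated requests",
    "type the characters you see below",
)


def is_block_page(body: str) -> bool:
    """True if the HTML `body` contains one of the known block phrases."""
    text = re.sub(r"<[^>]+>", " ", body)       # drop tags
    text = html.unescape(text)                 # decode entities like &#39;
    text = re.sub(r"\s+", " ", text).lower()   # collapse line breaks/spaces
    return any(marker in text for marker in BLOCK_MARKERS)


def check_response(body: str) -> str:
    """Return `body` unchanged, or abort the run if it is a block page."""
    if is_block_page(body):
        raise SearchBlockedError("block or captcha page detected; aborting")
    return body
```

Raising an exception rather than retrying matches the abort behavior described above; a caller can catch `SearchBlockedError` to stop the run and log which IP was affected.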

US-Proxy support

This project runs through a US proxy service; with the supplied API, it is possible to scrape millions of results without getting blocked. The benefit of using us-proxies.com is an easily extendable IP service, aimed at professionals, that provides the best IP quality in the industry at a fair price. However, the code is not limited to this particular service, and you are free to adapt the source to suit your needs.

Google Search Scraper PHP code

The source code is written in PHP and is ready to use immediately. You can either make an agreement with us-proxies for IP addresses or replace the relevant parts with your own IP solution. Before using the source code, please read the license agreement.

Example output

Here is an example result set from a test run:

Keyword: Scraping PHP

!Ranking information for keyword "Scraping PHP" !
!Rank [Type] - Website - Title!
1 [organic] - http://stackoverflow.com/questions/34120/html-scraping-in-php - HTML Scraping in Php - Stack Overflow
2 [organic] - http://www.oooff.com/php-scripts/basic-php-scrape-tutorial/basic-php-scraping.php - Basic PHP Web Scraping Script Tutorial - Oooff.com
3 [organic] - http://anchetawern.github.io/blog/2013/08/07/getting-started-with-web-scraping-in-php - Getting Started with Web Scraping in PHP - Wern Ancheta
4 [organic] - http://simplehtmldom.sourceforge.net/ - PHP Simple HTML DOM Parser
5 [organic] - http://www.jacobward.co.uk/web-scraping-with-php-curl-part-1/ - Web Scraping With PHP & CURL [Part 1] | Jacob WardJacob Ward
6 [organic] - https://github.com/fabpot/goutte - fabpot/Goutte · GitHub
7 [organic] - http://www.instructables.com/id/Beginning-web-page-scraping-with-php/ - Beginning web page scraping with php. - Instructables
8 [organic] - https://www.phparch.com/books/phparchitects-guide-to-web-scraping-with-php/ - php|architect's Guide to Web Scraping with PHP « php[architect ...
9 [organic] - http://scraping.pro/scraping-in-php-with-curl/ - Scraping in PHP with cURL - Web Scraping
10 [organic] - http://www.ymc.ch/en/webscraping-in-php-with-guzzle-http-and-symfony-domcrawler - Webscraping in PHP with Guzzle HTTP and Symfony DomCrawler ...
11 [organic] - https://code.tutsplus.com/tutorials/html-parsing-and-screen-scraping-with-the-simple-html-dom-library--net-11856 - HTML Parsing and Screen Scraping with the Simple HTML DOM ...
12 [organic] - http://www.eppie.net/simple-php-scraper-class/ - Simple PHP Scraper Class | - Eppie.net
13 [organic] - http://jacerdass.wordpress.com/2013/07/17/web-scrapping-done-right-using-php/ - Web scraping done right using PHP | Jacer Omri's Blog
14 [organic] - http://www.youtube.com/watch?v=632ql93H90g - Scraping Websites with PHP using DOMDocument and DOMXpath ...
15 [organic] - http://www.youtube.com/watch?v=Uv4eASStpas - PHP web scraping tutorial 1 : Automated Registration Form - YouTube
16 [organic] - http://www.devhour.net/filling-out-forms-with-php-and-curl/ - Scraping data with PHP and cURL Devhour
17 [organic] - http://www.thefutureoftheweb.com/blog/web-scrape-with-php-tutorial - Easy web scraping with PHP - The Future of the Web » Articles »
18 [organic] - http://www.devblog.co/php-web-page-scraping-tutorial/ - PHP Web Page Scraping Tutorial | DevBlog.co
19 [organic] - http://www.notprovided.eu/six-tools-web-scraping-use-data-journalism-creating-insightful-content/ - 6 tools for scraping - Use for datajournalism & insightful content
20 [organic] - http://www.packtpub.com/web-scraping-with-php/book - Instant PHP Web Scraping [Instant] | Packt Publishing
21 [organic] - http://showmethecode.es/php/php-goutte-una-libreria-para-hacer-web-scraping/ - PHP: Goutte una librería para hacer web scraping - Show me the code
22 [organic] - http://www.sitepoint.com/image-scraping-symfonys-domcrawler/ - Image Scraping with Symfony's DomCrawler - SitePoint
23 [organic] - http://www.amazon.com/Instant-PHP-Scraping-Jacob-Ward-ebook/dp/B00E7NC9CS - Amazon.com: Instant PHP Web Scraping eBook: Jacob Ward: Kindle ...
24 [organic] - http://code.google.com/p/universal-web-scraper/ - Universal Web Scraper - Google Code
25 [organic] - http://hackaday.com/2012/12/10/web-scraping-tutorial/ - Web scraping tutorial - Hack a Day
26 [organic] - http://www.tdbowman.com/?p=426 - Web Scraping Using PHP and jQuery | Managing My Impression
27 [organic] - https://barebonescms.com/documentation/ultimate_web_scraper_toolkit/ - Ultimate Web Scraper Toolkit Documentation - Barebones CMS
28 [organic] - http://www.phpclasses.org/package/1754-PHP-Extract-structured-data-from-remote-HTML-pages.html - PHP Scraper: Extract structured data from remote HTML pages ...
29 [organic] - http://imbuzu.wordpress.com/2013/06/26/web-scraping-with-php/ - Web Scraping with PHP | Buzu's Oficial Blog
30 [organic] - http://developer.yahoo.com/yql/guide/yql-code-examples.html - YQL Code Examples - YDN
31 [organic] - http://blog.wlindley.com/2013/07/easy-screen-scraping-in-php/ - Easy screen scraping in PHP | A journal of my take on this wacky world
32 [organic] - http://www.mozenda.com/php-screen-scrape - PHP Screen Scrape Software Program Tool set - Mozenda
33 [organic] - http://acrl.ala.org/techconnect/?p=3850 - Web Scraping: Creating APIs Where There Were None ACRL ...
34 [organic] - http://www.matthewwatts.net/tutorials/php-tutorial-2-advanced-data-scraping-using-curl-and-xpath/ - PHP Tutorial 2: Advanced Data Scraping Using cURL And XPATH ...
35 [organic] - http://www.martinhurford.com/screen-scraping-with-php-querypath.html - Screen Scraping with PHP and QueryPath - Martin Hurford
36 [organic] - http://www.webmasterworld.com/php/4652704.htm - Web scraping PHP Server Side Scripting forum at WebmasterWorld
37 [organic] - https://leanpub.com/web-scraping - Web Scraping for PHP… by sameer borate [Leanpub PDF/iPad/Kindle]
38 [organic] - http://www.akshitsethi.me/parsing-web-pages-in-php/ - Parsing web pages in PHP | Akshit Sethi
39 [organic] - http://google-scraper.squabbel.com/ - Scraping Google for Fun and Profit
40 [organic] - http://blog.cnizz.com/2012/10/12/scrape-faster-with-php-domdocument-and-safely-with-tor/ - Scrape Faster with PHP DomDocument and Safely with Tor | Chris ...
41 [organic] - https://classic.scraperwiki.com/docs/php/php_intro_tutorial/ - Documentation / First scraper tutorial | ScraperWiki
42 [organic] - http://snipplr.com/view/22188/ - Easy scraping and HTML parsing with PHP5 and XPath - PHP ...
43 [organic] - http://lab.abhinayrathore.com/imdb/ - Free PHP ASP.net C# VB.net IMDb Scraper API and Web Service ...
44 [organic] - http://www.lie-nielsen.com/scraping-planes/large-scraping-plane/ - Large Scraping Plane - Lie-Nielsen Toolworks
45 [organic] - http://saturnboy.com/2010/03/scraping-google-groups/ - Scraping Google Groups « Saturnboy
46 [organic] - http://www.amazon.co.uk/Instant-PHP-Scraping-Jacob-Ward/dp/1782164766 - Instant PHP Web Scraping: Amazon.co.uk: Jacob Ward: Books
47 [organic] - http://sledgedev.com/build-a-scraper-with-php/ - Sledge Dev – Build a scraper with php
48 [organic] - http://www.maxprog.com/forum/viewtopic.php?f=11 - Maxprog Forum • View topic - scraping php sites?
49 [organic] - http://books.google.com/books?id=Q-cEMrCWckkC - Instant PHP Web Scraping - Google Books Result
50 [organic] - https://www.odesk.com/o/profiles/users/_~01d067ffb7cb06ee0e/ - Sandip Debnath - Proxy&Login-Bots/Scraping/Php/Regex/Ai/Ajax ...
51 [organic] - http://saf33r.com/web-scraping-101-with-php-and-goutte - Web Scraping 101 with PHP and Goutte | Safeer
52 [organic] - http://www.redscraper.com/blog/basic-of-web-scraping-using-php/ - Basic of Web Scraping Using PHP | Redscraper Blog
53 [organic] - http://skybluesofa.com/blog/how-use-phps-domdocument-scrape-web-page/ - How to Use PHP's DOMDocument to Scrape a Web Page - Sky Blue ...
54 [organic] - http://webdata-scraping.com/data-scraping-pdf-files-using-php/ - How to do data scraping from PDF files using PHP? | WebData ...
55 [organic] - http://www.russellbeattie.com/blog/using-php-to-scrape-web-sites-as-feeds - Using PHP to scrape web sites as feeds - Russell Beattie
56 [organic] - http://www.slideshare.net/tobias382/web-scraping-with-php-presentation - Web Scraping with PHP - SlideShare
57 [organic] - http://www.hochmanconsultants.com/articles/stop-email-spam.shtml - Code to Prevent Email Address Scraping and Form Spam via PHP ...
58 [organic] - http://www.developertutorials.com/tutorials/php/easy-screen-scraping-in-php-simple-html-dom-library-simplehtmldom-398/ - Easy Screen Scraping in PHP with the Simple HTML DOM Library
59 [organic] - http://thinkdiff.net/php/php-for-web-scraping-and-bot-development/ - PHP for Web scraping and bot development | Thinkdiff.net
60 [organic] - http://rojan.com.np/scraping-nodejs-vs-php/ - Rojan's blog | Scraping – Nodejs Vs Php
61 [organic] - http://www.barattalo.it/2013/12/08/php-jquery-dom-navigating-scrape-spider/ - Scraping content with PHP as if was jQuery, PHP jQuery like methods
62 [organic] - http://www.webdeveloper.com/forum/showthread.php?230985-Blocking-php-curl-from-scraping-website-content - Blocking php curl from scraping website content - WebDeveloper.com
63 [organic] - http://www.reddit.com/r/PHP/comments/1xiygj/what_is_the_best_php_library_for_scraping/ - What is the best php library for scraping websites, and filling out ...
64 [organic] - http://neerajpro.wordpress.com/2013/09/16/web-scraping-and-bot-development-using-php/ - web scraping and bot development using PHP | OPEN LEARNING
65 [organic] - http://wiki.vuze.com/w/Scrape - Scrape - VuzeWiki
66 [organic] - http://papermashup.com/use-jquery-and-php-to-scrape-page-content/ - Use jQuery and PHP to scrape page content | Papermashup.com
67 [organic] - http://ctrlq.org/code/19064-web-scraping-amazon - Web Scraping Amazon with PHP | The Programmer's Library
68 [organic] - https://www.facebook.com/apps/site_scraping_tos_terms.php - Automated Data Collection Terms - Facebook
69 [organic] - http://tyler.io/2008/05/scraping-imdb-with-php/ - Scraping IMDB With PHP | tyler.io
70 [organic] - http://www.coderanch.com/t/549196/PHP/Solved-Regular-Expressions-Scraping - [Solved] Help With Regular Expressions/Scraping (PHP forum at ...
71 [organic] - http://web3o.blogspot.com/2010/10/php-imdb-scraper-for-new-imdb-template.html - FREE! PHP IMDb Scraper/API for new IMDb Template
72 [organic] - http://superuser.my/web-scraping-ganon-php/ - Web Scraping Using Ganon PHP Library | superuser.my
73 [organic] - http://www.hmp.is.it/scraping-a-site-with-php/ - Simple way of scraping a website using PHP - hmp.is.it
74 [organic] - http://www.scriptrr.com/ - Website Scraper | Forum Crawler | Screen Scrapping | Data Mining ...
75 [organic] - http://scraperblog.blogspot.com/2013/07/php-scrape-website-with-rotating-proxies.html - ScraperBlog: Php - scrape website with rotating proxies
76 [organic] - http://wiki.xbmc.org/index.php?title=Naming_video_files/TV_shows - Naming video files/TV shows - XBMC
77 [organic] - https://packagist.org/search/?tags=scraper - Scraper - Packagist
78 [organic] - http://codedit.com/php/web-scraping-with-php-curl - Codedit.com | Web Scraping with PHP & CURL
79 [organic] - http://www.screen-scraper.com/products/all.php - Web scraping software | screen-scraper.com
80 [organic] - http://www.warriorforum.com/programming-talk/530802-scraping-websites-use-php-regexp-something-else.html - Scraping websites - use PHP and Regexp or something else ...
81 [organic] - http://codeatomic.com/services/web-scraping/ - Code Atomic Web scraping php (web harvesting or web data ...
82 [organic] - http://jon.netdork.net/2011/02/21/nagios-web-scraping-and-php-as-an-agent - Nagios, web scraping, and PHP as an agent - TheGeekery
83 [organic] - http://www.scrapegoat.com/faqs.php - FAQs Page - Data Mining and Screen Scraping from ScrapeGoat.com
84 [organic] - http://www.indeed.com/q-PHP-Scraping-jobs.html - PHP Scraping Jobs, Employment | Indeed.com
85 [organic] - https://forums.digitalpoint.com/threads/php-screen-scraping-specific-data.2680501/ - PHP screen scraping specific data - Digital Point Forums
86 [organic] - http://codereview.stackexchange.com/questions/40538/why-is-my-web-scraping-script-so-slow - php - Why is my web scraping script so slow? - Code Review Stack ...
87 [organic] - http://www.freelancer.com/jobs/Web-Scraping/ - Web Scraping Jobs and Contests | Freelancer.com
88 [organic] - http://www.fiverr.com/systemexpert/code-a-php-scraper-that-will-scrape-5-items-from-a-website-of-your-choice - code a php scraper that will scrape 5 items from a website of your choi
89 [organic] - http://forums.macrumors.com/showthread.php?t=1689584 - Setting up a web scraping system - MacRumors Forums
90 [organic] - http://www.4shared.com/office/CC-9NLJn/php_architects_guide_to_web_sc.html - php architect's guide to web scraping with php - Download - 4shared
91 [organic] - http://raphaelstolt.blogspot.com/2008/10/scraping-websites-with-zenddomquery.html - : Scraping websites with Zend_Dom_Query
92 [organic] - http://www.codefire.org/blogs/item/data-scraping-using-curl-in-php.html - Data scraping using cURL in PHP - CodeFire
93 [organic] - http://matthewturland.com/2010/04/20/web-scraping-with-php-now-available/ - Matthew Turland » Blog Archive » “Web Scraping with PHP” Now ...
94 [organic] - http://www.xmarks.com/site/www.bradino.com/php/screen-scraping/ - PHP Screen Scraping Tutorial - Xmarks
95 [organic] - http://www.dmxzone.com/go/4402/page-scraping/ - Page Scraping - Articles - DMXzone.COM
96 [organic] - http://blog.makewebsmart.com/scraping-library-for-codeigniter-framework/136 - Scraping library for CodeIgniter Framework | MakeWebSmart
97 [organic] - http://www.phpninja.info/blog/2013/08/crawling-scraping-app-store-andor-android-market/ - Crawling and Scraping App Store and/or Android Market - Php Ninja
98 [organic] - http://www.phpdeveloper.org/tag/scraping - scraping - PHPDeveloper: PHP News, Views and Community
99 [organic] - http://www.phpbuilder.com/columns/marc_plotz011410.php3 - PHPBuilder - Build a PHP Link Scraper with cURL
100 [organic] - http://www.archiveteam.org/index.php?title=URLTeam - URLTeam - Archiveteam
101 [organic] - http://devzone.zend.com/1087/php-abstract-episode-22-screen-scraping/ - PHP Abstract Episode 22: Screen Scraping | Zend Developer Zone
102 [organic] - http://www.ngo-hung.com/blog/2012/11/03/list-of-open-source-screen-scraping-tools - List of open source screen scraping tools - Ngo The Hung's blog
103 [organic] - http://entropytc.com/screen-scraping-with-php/ - Screen scraping with PHP - Entropy Technical Consulting
104 [organic] - http://www.fromzerotoseo.com/scraping-websites-php-curl-proxy/ - Scraping websites with PHP cURL under proxy | From Zero To SEO
105 [organic] - http://www.yiiframework.com/extension/yiiscrapermodule/ - yiiscrapermodule | Extension | Yii PHP Framework
106 [organic] - https://docs.google.com/document/d/18Q2THQvYCG2_n6nKVsZRHlaPG9iJ9NvLezOOQbEuAJs/edit?hl=en - Tipsheet: Web Scraping for Non-Programmers - Google Drive
107 [organic] - http://www.digeratimarketing.co.uk/2008/12/16/curl-page-scraping-script/ - CURL Page Scraping Script - Digerati Marketing
108 [organic] - http://www.shekhargovindarajan.com/scripts/web-scraping-with-firefox-and-php-using-xpath/ - Web Scraping with Firefox and PHP, using XPath | Shekhar ...
109 [organic] - http://www.quickscrape.com/ - QuickScrape | Quick php html scraper and crawler for scraping and ...
110 [organic] - http://www.linkedin.com/groups/Php-Web-Html-Content-Scraping-4818098 - Php Web Html Content Scraping Help | LinkedIn
111 [organic] - http://forum.codecall.net/topic/77005-scraping-charts-from-this-website/ - Scraping charts from this website? - PHP - Codecall
112 [organic] - https://www.elance.com/r/contractors/q-PHP%20cURL%20Data%20Scraping - Find PHP cURL Data Scraping Freelancers & Contractors
113 [organic] - http://php.dzone.com/news/gotcha-scraping-net - Gotcha on Scraping .NET Applications with PHP and cURL | PHP ...
114 [organic] - https://itunes.apple.com/us/book/instant-php-web-scraping/id680880119?mt=11 - iTunes - Books - Instant PHP Web Scraping by Jacob Ward
115 [organic] - http://www.zacharydavidbiles.com/2012/05/scraping-pinterest-with-php/ - Scraping Pinterest with PHP | Zach Biles – Cartersville, GA Web ...
116 [organic] - http://www.ebook3000.com/php-architect-s-Guide-to-Web-Scraping-with-PHP_113893.html - php|architect's Guide to Web Scraping with PHP - Free eBooks ...
117 [organic] - http://www.weblee.co.uk/2009/06/18/simple-dom-helper-for-codeigniter/ - Simple Dom Helper codeigniter | Screen Scraping | PHP ... - Web Lee
118 [organic] - http://www.nicolasmarin.com/web-scraper-con-php/ - Web scraper con PHP | Nicolás Marín
119 [organic] - http://www.quora.com/Web-Scraping/How-do-you-scrape-asp-or-php-pages - Web Scraping: How do you scrape .asp or .php pages? - Quora
120 [organic] - http://www.urbandictionary.com/define.php?term=scraper - Urban Dictionary: scraper
121 [organic] - http://forums.phpfreaks.com/topic/276972-scraping-the-data-from-website/ - scraping the data from website - PHP Coding Help - PHP Freaks
122 [organic] - http://www.h-net.org/reviews/showrev.php?id=37101 - H-Net Reviews
123 [organic] - http://www.connotate.com/technology/product - Automated Web Data Collection | Intelligent Web Scraping | Hosted ...
124 [organic] - http://phptrends.com/dig_in/scraping - scraping - PHP Trends, libraries and frameworks
125 [organic] - http://www.tonido.com/blog/index.php/2013/12/28/web-scraping-and-legal-issues/ - Web Scraping and Legal Issues - Tonido
126 [organic] - http://elanmarikit.me/2011/03/scraping-aspnet-page-in-php-curl.html - Scraping ASP.NET page in PHP Curl | PHP/Web Development
127 [organic] - http://www.r-bloggers.com/scraping-table-from-any-web-page-with-r-or-cloudstat/ - Scraping table from any web page with R or CloudStat | (R news ...
128 [organic] - http://www.peopleperhour.com/freelance/web+scraping+php+curl - Web scraping php curl - PeoplePerHour.com
129 [organic] - http://dayat.net/introduction-to-scraping-techniques/ - Introduction To Scraping Techniques | Dayat Technologies
130 [organic] - http://robertbasic.com/blog/book-review-guide-to-web-scraping-with-php - Book review - Guide to Web Scraping with PHP ~ Robert Basic ~ the ...
131 [organic] - http://forums.whirlpool.net.au/archive/1983474 - Running a PHP scraping script - Programming - Whirlpool Forums
132 [organic] - http://www.adminspoint.com/programming/296-easy-screen-scraping-php-server-side-scripting-language-simple-html-dom-library.html - Easy Screen Scraping in PHP with the Simple HTML DOM Library
133 [organic] - http://www.hotscripts.com/forums/php/114448-data-scraping-question.html - Data Scraping Question - Hot Scripts Forums
134 [organic] - http://www.pearltrees.com/mic100/php-scraping/id4775553 - Php scraping | Pearltrees
135 [organic] - http://hublog.hubmed.org/archives/001558.html - HubLog: Scraping web pages with PHP 5
136 [organic] - http://blog.hartleybrody.com/web-scraping/ - I Don't Need No Stinking API: Web Scraping For Fun and Profit
137 [organic] - http://www.blackhatworld.com/blackhat-seo/black-hat-seo/565471-dev-php-crawler-scraping-video-sites.html - [DEV] PHP crawler for scraping video sites - Black Hat World
138 [organic] - http://deepinthecode.com/2014/02/28/scraping-div-element-web-page-php/ - Scraping a DIV Element from a Web Page with PHP – Deep in the ...
139 [organic] - http://ao2.it/en/blog/2013/07/07/tweeper-twitter-rss-web-scraper - Tweeper: a Twitter to RSS web scraper | en hacking | ao2.it
140 [organic] - http://bz9.com/index.php/youtube-scraper/ - YouTuber :: YouTube Scraper - BZ9.com
141 [organic] - https://phpacademy.org/topics/html-web-scraping-with-php/33032 - HTML Web Scraping with PHP | phpacademy
142 [organic] - http://blogoscoped.com/archive/2004_06_23_index.html - Screen-scraping With PHP5 | Googlebot Alert | Gmail Hype Ending ...
143 [organic] - http://superuser.com/questions/179253/how-legal-is-site-scraping-using-curl - php - How "legal" is site-scraping using cURL? - Super User
144 [organic] - http://osdir.com/ml/org.user-groups.php.uphpu/2008-09/msg00075.html - org.user-groups.php.uphpu - Web site scraping - msg#00075 ...
145 [organic] - http://my.safaribooksonline.com/book/programming/php/9781782164760/1dot-instant-php-web-scraping/ch01s09_html - Instant PHP Web Scraping > 1. Instant PHP Web Scraping ...
146 [organic] - https://discussion.dreamhost.com/thread-125593.html - php curl screen scraping program needs an if fork - DreamHost Forum
147 [organic] - http://www.daniweb.com/web-development/php/threads/289020/blocking-php-curl-from-scraping-website-content - Blocking php curl from scraping website content | DaniWeb
148 [organic] - http://leandroarts.com/how-to-scrape-google-search-results-for-query-popularity-with-php/ - How to scrape Google search results for query popularity with PHP ...
149 [organic] - http://jimblackler.net/blog/?p=13 - Jim Blackler · Scraping text from Wikipedia using PHP
150 [organic] - http://www.mishainthecloud.com/2009/12/screen-scraping-aspnet-application-in.html - Misha in the Cloud: Screen-scraping an ASP.NET application in PHP
151 [organic] - http://ehelion.net/projects/htmlscrape/scrape.html - Collecting data using HTML scraping - ehelion.com
152 [organic] - http://www.wellho.net/resources/ex.php4?item=h307/scraper.php - Scraping a remote URL content - PHP example
153 [organic] - http://horusss2.wordpress.com/2009/12/05/use-php-dom-parser-for-more-robust-screen-scraping/ - Use PHP DOM Parser for more robust screen scraping | THIS BLOG ...
154 [organic] - http://www.amitsamtani.com/2010/03/30/web-scraping-using-php-and-xpath/ - Web Scraping using PHP and XPath - amitsamtani.com
155 [organic] - http://99webtools.com/extract-website-data.php - Extract website data using php - Web tools
156 [organic] - http://www.iwebscraping.com/Web_Scraping_Service.php - Web Scraping Service | Web Data Scraping | Website Scraping
157 [organic] - http://www.windbusinessfactor.it/storage/video/1309/-php-architects--guide-to-web-scraping-with-php.pdf - php|architect's Guide to Web Scraping with PHP - Wind Business ...
158 [organic] - http://www.computerhope.com/forum/index.php?topic=129466.0 - PHP cURL (Scraping a website) - Computer Hope
159 [organic] - http://scrapedefender.com/education/web-scraping-job-listings/ - Data and Web Scraping Job Listings | Scrape Defender
160 [organic] - http://wordpress.org/plugins/wp-web-scrapper/other_notes/ - WordPress › WP Web Scraper « WordPress Plugins
161 [organic] - http://phpcircle.net/content/website-scraping-advantages-php - Website Scraping Advantages With PHP !! | PHPCircle
162 [organic] - http://devtrench.com/posts/screen-scrape-with-php-curl - Screen Scraping: How to Screen Scrape a Website with PHP and ...
163 [organic] - http://forums.devshed.com/php-development-5/scraping-aspx-site-php-799426.html - Scraping an aspx site with php - Dev Shed Forums
164 [organic] - http://www.internetnews.com/ec-news/article.php/3334651 - Google Moves to Block RSS Scraping - InternetNews.
165 [organic] - http://softadvice.informer.com/Php_Email_Scraper.html - Php Email Scraper - free download suggestions - Software Advice
166 [organic] - http://sourabhjainblog.wordpress.com/2013/11/13/scraping-websites-with-php-curl-under-proxy/ - Scraping websites with PHP cURL under proxy | Sourabh Jain - php ...
167 [organic] - http://nbviewer.ipython.org/url/www.unc.edu/~ncaren/Lax-1.ipynb.json - Web scraping in Python - IPython Notebook Viewer
168 [organic] - http://scrollingtext.org/using-curl-and-user-agent-string-web-scraping-pt-2-now-php - Using curl and a user agent string for web scraping pt 2; Now with PHP
169 [organic] - http://blog.amhill.net/2010/09/17/scraping-twitpics-with-php-coding/ - Scraping Twitpics with PHP [Coding] | Blog.amhill
170 [organic] - http://corgitoergosum.net/2011/01/17/replicating-flipboard-part-i-site-scraping/ - Replicating Flipboard Part I – Site Scraping | Cogito Ergo Sum
171 [organic] - http://www.earthinfo.org/xpaths-with-php-by-example/ - XPaths with PHP by example « Earth Info
172 [organic] - https://trac.transmissionbt.com/ticket/4158 - (scraping trackers of form "announce.php?key ... - Transmission
173 [organic] - http://harmssite.com/2012/01/scraping-a-page-with-php - Scraping a page with php - HarmsSite
174 [organic] - http://bytes.com/topic/php/answers/889713-blocking-php-curl-scraping-website-content - Blocking php curl from scraping website content - PHP - Bytes
175 [organic] - http://blog.digitalmethods.net/2010/asimpletwitterscraper/ - A simple Twitter scraper - Digital Methods Initiative
176 [organic] - http://www.satya-weblog.com/2010/11/play-with-yql-html-scraping-using-yql-and-php.html - Play with YQL: HTML Scraping using YQL and PHP - Satya's Weblog
177 [organic] - https://www.e-education.psu.edu/geog863/l6_p6.html - Web Scraping | GEOG 863: Mashups - e-Education Institute
178 [organic] - http://php.find-info.ru/php/010/phphks-CHP-5-SECT-12.html - PHP: Hack 44. Scrape Web Pages for Data
179 [organic] - https://support.startpage.com/index.php?/Knowledgebase/Article/View/188/23/how-does-startpage-prevent-scraping-and-abuse-without-recording-ip-addresses - How does StartPage prevent scraping and abuse without recording ...
180 [organic] - http://www.seerinteractive.com/blog/scraping-for-dummies-with-outwit-a-marketers-best-friend - Scraping for Dummies with Outwit (a Marketer's Best Friend) | SEER ...
181 [organic] - http://health.mo.gov/lab/scabies.php - Skin Scraping Exam | State Public Health Laboratory | Health ...
182 [organic] - http://junseewebdesigner.wordpress.com/2013/08/05/php-scrape-a-wordpress-feed/ - PHP Scrape a WordPress Feed | Junsee
183 [organic] - http://blog.matthewdfuller.com/2012/07/defeating-x-frame-options-with-scraping.html - Matthew D Fuller - Blog: Defeating X-Frame-Options with Scraping
184 [organic] - http://www.garysieling.com/blog/scraping-google-maps-search-results-with-javascript-and-php - Scraping Google Maps Search Results With Javascript And PHP ...
185 [organic] - http://tellini.info/2011/05/scraping-mac-app-store-reviews/ - Scraping Mac App Store reviews | Simone Tellini
186 [organic] - http://forums.thedailywtf.com/forums/p/8578/162940.aspx - Lame PHP Screen Scraping - TDWTF Forums
187 [organic] - http://www.tutorialized.com/tutorial/Wikipedia-Content-Scraper-in-PHP/81662 - PHP Web Fetching Wikipedia Content Scraper in PHP Tutorial
188 [organic] - http://www.dreamincode.net/forums/topic/9687-programatically-logging-in-and-page-scraping/ - Programatically Logging In And Page Scraping - PHP | Dream.In.Code
189 [organic] - http://alexdglover.com/web-scraping-php-and-wheel-of-fortune/ - Alex D Glover Web Scraping, PHP, and Wheel of Fortune - Fun ...
190 [organic] - http://itsrj.com/2010/12/24/scraping-sites-using-curl-xpath/ - Scraping Sites Using cURL & XPath | it's rj
191 [organic] - http://scraperlab.com/ - ScraperLab | Web Scrapers Generator
192 [organic] - http://www.gamegecko.com/game/204/scrape - Scrape - GameGecko.com
193 [organic] - http://ask.metafilter.com/98518/Web-scraping-for-dummies - Web scraping for dummies - php mysql programming | Ask MetaFilter
194 [organic] - http://kbeezie.com/scraping-google-results/ - Scraping Google Front Page Results » KBeezie
195 [organic] - http://forums.thetvdb.com/viewtopic.php?f=4 - TheTVDB.com • View topic - 503 errors using the API / Errors ...
196 [organic] - http://forums.devnetwork.net/viewtopic.php?f=1 - screen scraping a site which uses AJAX • PHP Developers Network
197 [organic] - http://technoloid.blogspot.com/2012/03/screen-scraping.html - Screen Scraping Tumblr Using Curl | Technoloid
198 [organic] - http://themanwhosoldtheweb.com/craigslist-email-scraper.php?tol - Craigslist Email Scraper - TheManWhoSoldtheWeb.com
199 [organic] - http://forums.digitizedesign.com/topic/1604-beginner-scraping-script-with-php-and-curl/ - Beginner scraping script with PHP and cURL - PHP - Digitize Design
200 [organic] - http://readwrite.com/2012/02/24/data-scraping-comes-of-age-wit - Data Scraping Comes of Age With ScraperWiki.com – ReadWrite
201 [organic] - http://www.binaryspark.com/classes/Art-of-the-scrape.pdf - Art of the scrape!!!!
- BinarySpark.com 202 [organic] - http://rhodesmill.org/brandon/chapters/screen-scraping/ - Chapter 10: Screen Scraping by Brandon Rhodes - Rhodes Mill 203 [organic] - http://php.bigresource.com/Scraping-a-Secure-Site-3QvPycau.html - PHP :: Scraping A Secure Site 204 [organic] - http://www.nickycakes.com/scraping-websites-for-fun-and-profit-part-2/ - Scraping Websites for Fun and Profit Part 2 | NickyCakes.com 205 [organic] - http://books.google.com/books/about/PHP_Architect_s_Guide_to_Web_Scraping.html?id=H6O9cQAACAAJ - PHP-Architect's Guide to Web Scraping - Matthew Turland - Google ... 206 [organic] - https://community.x10hosting.com/threads/php-xpath-scraping-data-from-a-page.101059/ - PHP - XPATH - Scraping Data From A Page | x10Hosting Community 207 [organic] - http://nicklewis.org/node/962 - Stupid Simple Web Scraping with SimpleXML | Nick Lewis: The Blog 208 [organic] - http://www.newthinktank.com/2010/11/python-2-7-tutorial-pt-13-website-scraping/ - Python 2.7 Tutorial Pt 13 Website Scraping - New Think Tank 209 [organic] - http://byronwhitlock.com/FastCrawl/ - Whitlock Web Development - Fast Crawl PHP Web crawl framework 210 [organic] - http://gablaxian.com/2013/06/18/scraping-twitter-feeds-with-nodejs.html - Scraping Twitter Feeds with NodeJS | gablaxian.com 211 [organic] - http://programming.textures-tones.com/2012/01/30/basic-screen-scraping-part-1-basic-xml-parsing/ - Basic Screen Scraping – Part 1, Basic XML Parsing | programming ... 212 [organic] - http://www.nmdnet.org/2011/09/01/best-web-host-for-web-scraping-application/ - Best Web host for Web scraping application? » UMaine NMDNet 213 [organic] - http://thewebscraping.com/web-scraper-open-source-3/ - Web scraper open source | The Web Scraping 214 [organic] - http://skookum.com/blog/scraping-poorly-formatted-data-with-curl-and-phpquery/ - Scraping Poorly Formatted Data with cURL and phpQuery ... 
215 [organic] - http://www.customwebscraping.com/php-web-scraping - PHP Web Scraping | Andrade Global 216 [organic] - http://www.lightspeedretail.com/blog/ - Retail Industry Blog – LightSpeed Retail POS 217 [organic] - https://www.distilled.net/blog/seo/building-your-own-scraper-for-link-analysis/ - Building Your Own Scraper for Link Analysis | Distilled 218 [organic] - http://datajournalismhandbook.org/1.0/en/getting_data_3.html - Getting Data from the Web - The Data Journalism Handbook 219 [organic] - http://jafty.com/blog/scraping-with-curl-using-cookies/ - Scraping with Curl using Cookies | Jafty Interactive Web Development 220 [organic] - http://zrashwani.com/simple-web-spider-php-goutte/ - Simple web spider with PHP Goutte | Z.Rashwani Blog 221 [organic] - http://blog.redbranch.net/2011/10/28/php-web-scraping-for-munin/ - PHP Web Scraping for Munin » Red Branch 222 [organic] - http://answers.google.com/answers/threadview/id/785059.html - Google Answers: Webscraping and WebMacros software 223 [organic] - http://www.topprojectshub.com/ - Outsourcing Data Entry, Data Scraping, Document Scanning, PHP ... 224 [organic] - http://opensourcebridge.org/sessions/97 - Web Scraping with PHP / Open Source Bridge: The conference for ... 
225 [organic] - http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/6183/pdf/imm6183.pdf - Algorithms for Web Scraping 226 [organic] - http://www.armstrong-chemtec.com/rm/index.php?option=com_content - Scraped Surface Crystallizers 227 [organic] - https://joind.in/435 - Talk: Web Scraping with PHP - Joind.in 228 [organic] - https://thomashunter.name/blog/open-sourcing-my-php-web-scraper/ - Open Sourcing my PHP Web Scraper - Thomas Hunter II 229 [organic] - http://ubuntuforums.org/showthread.php?t=1259548 - [other] PHP scraping Help - Ubuntu Forums 230 [organic] - http://www.sunilb.com/php/writing-website-scrapers-in-php - Writing Website Scrapers in PHP | Geek Files 231 [organic] - http://blog.ericlamb.net/2009/01/a-journey-into-php-cli-and-scraping/ - A journey into php-cli and scraping | Made of Everything You're Not 232 [organic] - http://www.troywolf.com/articles/php/class_http/ - PHP class_http from Troy Wolf 233 [organic] - http://tutorialzine.com/2013/02/24-cool-php-libraries-you-should-know-about/ - 24 Cool PHP Libraries You Should Know About | Tutorialzine 234 [organic] - http://www.logaholic.de/2009/06/01/elegant-oop-html-scraping-with-domdocument/ - Elegant OOP HTML scraping with DOMDocument - Logaholic.de 235 [organic] - http://capelinks.net/about/internet/spamdexing/ - Spamdexing: Scrape-O-Rama ~ CapeLinks Internet Services 236 [organic] - http://www.proscraper.com/ - Professional Scraper - Website Scraping, Crawling, Data Mining ... 
237 [organic] - http://www.mrwebmaster.it/php/web-scraping-php_7568.html - Il Web Scraping in PHP | PHP | Mr.Webmaster 238 [organic] - http://adamyoung.net/Quickstart-to-PHP-Screen-Scraping - Quickstart to PHP Screen Scraping | Adam Young 239 [organic] - http://blog.svnlabs.com/craigslist-scraper-tool/ - Craigslist Scraper Tool | S V N Labs Softwares 240 [organic] - http://www.codediesel.com/php/web-scraping-in-php-tutorial/ - Web scraping tutorial - CodeDiesel 241 [organic] - http://curl.phptrack.com/forum/viewtopic.php?f=1 - CURL PHP Examples • View topic - Problem scraping url - PHP CURL ... 242 [organic] - http://pp19dd.com/2009/11/php-algorithm-for-scraping-and-converting-a-twitter-list-into-rss-format-with-super-fancy-xpath-queries-in-six-awesomely-easy-steps/ - PHP algorithm for scraping and converting a twitter list into RSS ... 243 [organic] - http://forum.phux.org/viewtopic.php?f=12 - phux Development • View topic - Data Scraping - MetaCritic.com 244 [organic] - http://www.net-security.org/malware_news.php?id=1641 - Malware-driven pervasive memory scraping - Help Net Security 245 [organic] - http://www.php-forum.com/phpforum/viewtopic.php?f=2 - www.php-forum.com • View topic - Site Scraping with PHP, HTML ... 246 [organic] - http://lamp-dev.com/php-website-scraping-using-chrome-web-driver/635 - PHP Website Scraping using Chrome Web Driver | LAMPDev ... 247 [organic] - http://www.ebookgoogle.com/633701-phparchitects-guide-web-scraping-php-repost - php|architect's Guide to Web Scraping with PHP (Repost) - Study ... 
248 [organic] - http://www.techsupportforum.com/forums/f49/php-screen-scraping-596493.html - PHP screen scraping - Tech Support Forum 249 [organic] - http://board.issociate.de/thread/495564/Static-andor-Dynamic-site-scraping-using-PHP.html - Static and/or Dynamic site scraping using PHP 250 [organic] - http://www.simplyhired.com/k-scraping-php-jobs.html - Scraping Php Jobs | Job Search with Simply Hired 251 [organic] - http://www.script-home.com/php-multithreaded-scraping-of-the-page-implementation-code.html - PHP multithreaded scraping of the page implementation code ... 252 [organic] - http://rottentomatoesdatascraping.blogspot.com/2013/05/managing-online-data-by-php-web-scraping.html - Managing Online Data by PHP Web Scraping - Rottentomatoes.com ... 253 [organic] - http://www.freelancer.co.uk/projects/PHP-Software-Architecture/web-scraping-php-script.html - web scraping php script | PHP | Software Architecture 254 [organic] - http://www.b.shuttle.de/hayek/Hayek/Jochen/wp/blog-en/2011/11/17/book-guide-to-web-scraping-with-php/ - book: Guide to Web Scraping with PHP | Jochen Hayek's Blog in ... 
255 [organic] - http://www.solveerrors.com/forums/scraping-an-aspx-site-with-php-33513.asp - Scraping an aspx site with php - SolveErrors.com 256 [organic] - http://umuwa.com/php-web-scraping-script-download - php web scraping script download - at Umuwa 257 [organic] - http://avaxsearch.com/?q=Web%20Scraping%20PHP - Web Scraping PHP - Data on AvaxHome 258 [organic] - http://www.filestube.to/p2/php+architect+s+guide+to+web+scraping+with+php - Php architect s guide to web scraping with php download - FilesTube 259 [organic] - http://efreedom.net/Question/1-34120/HTML-Scraping-Php - HTML Scraping in Php - efreedom 260 [organic] - http://efreedom.net/Question/1-1332590/HTML-Comment-Scraping-PHP - HTML comment scraping in PHP - efreedom 261 [organic] - http://www.getacoder.com/projects/view.php?id=144412 - Scraping PHP To Mysql Database (MySQL, PHP, PHP/IIS/MS SQL) 262 [organic] - http://www.donanza.com/jobs/p3057980-php_scraping_php_mysql_scraping - Php Scraping - Php Mysql Scraping for Max. $500 - DoNanza 263 [organic] - http://www.freelancer.is/projects/PHP-MySQL/Scraping-PHP-cURL-REGEX-Experts.html - Scraping, PHP, cURL, REGEX Experts | Data Mining ... - Freelancer.is 264 [organic] - http://www.freelancer.in/job-search/web-scraping-php-simplexml-script/ - web scraping php simplexml script Freelancers and Jobs ... 265 [organic] - http://www.freelancer.com.au/projects/PHP-Software-Architecture/Scraping-site-asp-php.html - Scraping site asp - php | PHP | Software Architecture 266 [organic] - http://www.freelancer.co.za/projects/Perl/Scraping-site-asp-php-repost.html - Scraping site asp - php - repost | Perl - Freelancer.co.za 267 [organic] - http://www.freelancer.com.bd/projects/PHP-Website-Design/PHP-script-for-data-scraping.html - PHP script for data scraping - Freelancer.com.bd 268 [organic] - http://www.freelancer.ph/projects/PHP-MySQL/Web-Scraping-PHP-Preferred.html - Web Scraping (PHP Preferred) | Anything Goes | MySQL | PHP ... 
269 [organic] - http://www.freelancer.pk/projects/PHP-Web-Scraping/web-scraping-bot-submit-form.html - web scraping and bot to submit form iMacros or PHP | Data Mining ... 270 [organic] - http://www.freelancer.com.jm/projects/PHP-Software-Architecture/Webpage-scraping-php-mysql-script.html - Webpage scraping php+mysql script - Freelancer.com.jm 271 [organic] - http://coding.derkeiler.com/Archive/PHP/php.general/2005-11/msg00154.html - Re: Web Screen Scraping PHP Help 272 [organic] - http://www.workingbase.com/project/PHP-login-to-a-website-programatically.2785673.html - PHP login to a website programatically (Javascript, PHP, Web ... 273 [organic] - http://www.filestube.com/p/php+architect+s+guide+to+web+scraping - Php architect s guide to web scraping download - FilesTube 274 [organic] - http://savedhistory.org/k/web-scraping-ebook-php - Web Scraping Ebook Php - savedwebhistory.org 275 [organic] - http://hostcabi.net/websites/web-scraping-php - Web Scraping Php Websites - HostCabi.net 276 [organic] - http://books.google.com/books?id=dqI-AQAAMAAJ - The Iron Age - Google Books Result 277 [organic] - http://books.google.com/books?id=64I4AQAAMAAJ - The Literary Digest - Google Books Result 278 [organic] - http://books.google.com/books?id=P54zAQAAMAAJ - Annual Report of the Pennsylvania Agricultural Experiment Station - Google Books Result 279 [organic] - http://alaskagulfcoastexpeditions.com/tf/index.php?hl=lint+traps+for+dryers - Lint traps for dryers - Alaska Gulf Coast Expeditions 280 [organic] - http://books.google.com/books?id=7W0-AQAAMAAJ - Harper's New Monthly Magazine - Google Books Result 281 [organic] - http://www.trapperman.com/forum/ubbthreads.php/topics/4403841/all/First_Time_Fleshing_Beaver - First Time Fleshing Beaver | Trapper Talk | Trapperman.com Forums 282 [organic] - http://books.google.com/books?id=nTYxAQAAMAAJ - Engineering - Google Books Result 283 [organic] - http://books.google.com/books?id=pl8vAAAAYAAJ - The country - Google Books Result 284 
[organic] - http://en.wikipedia.org/wiki/Scrap - Scrap - Wikipedia, the free encyclopedia 285 [organic] - http://forum.gamesports.net/dota/showthread.php?84583-Add-metadata-to-website - Add metadata to website 286 [organic] - http://forum.the-west.net/showthread.php?p=716823 - The Tiran Wars: Liberty, at all Costs - Page 83 - Forum The West 287 [organic] - http://www.horseandhound.co.uk/forums/showthread.php?659234-Following-on-from-the-weaving-thread - Following on from the weaving thread - Horse and Hound 288 [organic] - http://forums.digitalspy.co.uk/showthread.php?p=71966376 - Why do people still buy watches? - Page 15 - General Discussion ... 289 [organic] - http://washingtondc.craigslist.org/doc/cps/4391028736.html - Database and application development asp.net php - Craigslist 290 [organic] - http://forum.bodybuilding.com/index.php - Bodybuilding.com Forums - Bodybuilding And Fitness Board 291 [organic] - http://www.disboards.com/showthread.php?p=51078875 - David's DVC rental and MDE?? - The DIS Discussion Forums ... 292 [organic] - http://worldoftanks.mmmos.com/?page=view - Side scraping, a good example - World of Tanks - MMMOs 293 [organic] - http://www.redpowermagazine.com/forums/index.php?showtopic=85925 - Finally made something out of myself. - Page 2 - Coffee Shop - Red ... 294 [organic] - http://www.redpowermagazine.com/forums/index.php?showtopic=85956 - mudslide in Washington state - Page 2 - Coffee Shop - Red Power ... 295 [organic] - http://www.dice.com/job/result/10531322/517235?src=19 - PHP Developer - Aqua Systems Inc - Roslyn, NY | dice.com - 3-28 ... 296 [organic] - http://forums.winamp.com/showthread.php?p=2988914 - Are skins lost? - Winamp Forums 297 [organic] - http://kumb.com/forum/viewtopic.php?f=2 - Knees Up Mother Brown - West Ham United FC Online: Forum • View ... 
298 [organic] - http://www.wbaunofficial.org.uk/forum/showthread.php?tid=24834 - Fulham and Cardiff gone for me 299 [organic] - http://abierta.cl/index.php/abierta-act/areas/itemlist/user/706-joomlayldo - joomlayldo - Comunidad Abierta Arte, Ciencia y Tecnología 300 [organic] - http://forums.probetalk.com/showthread.php?s=5365733991fb268c77b6d46da2f40edb - Detailing KLG4. How deep to I go? - ProbeTalk.com Forums
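Each entry in the listing above uses the format the script prints when `$show_all_ranks` is 1: position, result type in brackets, URL and page title. If you want to post-process such output, a minimal sketch follows. `parse_rank_line()` and `find_site_positions()` are hypothetical helpers, not part of the scraper, and the parser assumes URLs never contain spaces, so the first ` - ` after the URL starts the title (titles may contain ` - ` themselves).

```php
<?php
// Hypothetical helpers (not part of the scraper) for reading its console output.
// Assumed line format, as printed by the ranking loop:
//   "<pos> [<type>] - <url> - <title>"
// In console mode, lines of the tracked website are wrapped in "!" markers:
//   "!<pos> [<type>] - <url> - <title> !"

function parse_rank_line($line)
{
    $line = trim($line);
    $highlighted = false;
    if (substr($line, 0, 1) === '!') {
        $highlighted = true;
        $line = substr($line, 1);
        // Drop the closing " !" marker that follows the title.
        if (substr($line, -2) === ' !') {
            $line = rtrim(substr($line, 0, -2));
        }
    }
    // Assumption: the URL contains no spaces; the title is everything after it.
    if (!preg_match('/^(\d+) \[([^\]]+)\] - (\S+) - (.*)$/', $line, $m)) {
        return null; // not a ranking line
    }
    return array(
        'pos'         => (int) $m[1],
        'type'        => $m[2],
        'url'         => $m[3],
        'title'       => $m[4],
        'highlighted' => $highlighted,
    );
}

// Same matching rule as the scraper's strstr() test: any URL that contains
// $site as a substring counts as a hit.
function find_site_positions(array $lines, $site)
{
    $positions = array();
    foreach ($lines as $line) {
        $row = parse_rank_line($line);
        if ($row !== null && strpos($row['url'], $site) !== false) {
            $positions[] = $row['pos'];
        }
    }
    return $positions;
}
```

Because `find_site_positions()` copies the scraper's substring test, searching for `website.com` would also match `mywebsite.com`; comparing `parse_url($row['url'], PHP_URL_HOST)` against the expected host would be a stricter alternative.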
Requirements:
* PHP 5.2 or higher with the libcurl and DOM extensions
* permission for the script to write to its local working directory (used for caching)
* us-proxies API access (professional IP provider)
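Before the first run it can save time to confirm these requirements programmatically. The sketch below is a hypothetical helper, not part of the downloadable files; it only checks the PHP version, whether the `curl` and `dom` extensions are loaded, and whether the cache directory can be created and written. It does not verify us-proxies access.

```php
<?php
// Hypothetical pre-flight check (not part of the original scraper) for the
// requirements listed above. Pass the same directory as $working_dir.

function check_requirements($working_dir)
{
    $problems = array();

    if (version_compare(PHP_VERSION, '5.2.0', '<')) {
        $problems[] = 'PHP 5.2 or higher is required, found ' . PHP_VERSION;
    }
    // Extensions named in the requirements list.
    foreach (array('curl', 'dom') as $extension) {
        if (!extension_loaded($extension)) {
            $problems[] = "PHP extension '$extension' is not loaded";
        }
    }
    // The cache directory must exist (or be creatable) and be writable.
    if (!is_dir($working_dir) && !@mkdir($working_dir, 0755, true)) {
        $problems[] = "cannot create directory $working_dir";
    } elseif (!is_writable($working_dir)) {
        $problems[] = "directory $working_dir is not writable";
    }
    return $problems;
}
```

Call it with the same value as `$working_dir` (by default `./local_cache`) and stop if the returned array is non-empty.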

Download the source code here: search-engine-scraper.php, functions-ses.php, simple_html_dom.php
search-engine-scraper.php
#!/usr/bin/php
<?php
    
/* License: 
       Open source for private and commercial use but this comment needs to stay untouched on top.
       URL of original source code: http://scraping.compunect.com
       Author of original source code: http://www.compunect.com
       IP rotation API code from here: http://www.us-proxies.com/automate
       Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to anyone for any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Original Work including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to the extent applicable law prohibits such limitation.
       Usage exceptions:
       Public redistributing modifications of this source code project is not allowed without written agreement.
       Using this work for private and commercial projects is allowed, redistributing it is not allowed without our written agreement.
     */

    
ini_set("memory_limit","64M"); // Scraping 100-result pages is expected to need about 32MB, the default 10-result pages about 4MB; 64MB is set just in case.

ini_set("xdebug.max_nesting_level","2000"); // Precaution, might not be required: the parser needs a deep nesting level, but how deep a 100-result page actually goes has not been checked.

error_reporting(E_ALL & ~E_NOTICE);
    
// ************************* Configuration variables *************************
// Your API credentials; you need a plan at us-proxies.com.
// The proxy is optional: you can remove the proxy-related parts and use the script as a single-IP tool. In that case, add a delay of around 3-5 minutes between requests.

$pwd = "your-key"; // placeholder for your us-proxies API key

$uid = "your-account-id"; // placeholder for your us-proxies account ID

    
// General configuration

$test_website_url = "website.com"; // The URL of the indexed website, or a substring of it.

$test_keywords = "keyword,another keyword,more keywords"; // comma-separated keywords to check the rank for

$test_max_pages = 3; // number of result pages to check per keyword before giving up

$test_100_resultpage = 0; // 1 requests 100 results per page instead of 10. Warning: Google rankings may become inaccurate

/* Local result configuration. Enter "help" to receive a list of possible choices. Use "global" and "en" for the default worldwide results in English.
 * You need to define a country as well as a language. Visit the Google domain of the specific country to see the available languages.
 * Only a correct combination of country and language will return the correct search engine result pages. */

$test_country = "global"; // Country code. "global" is the default. Use "help" to receive a list of available codes. [com,us,uk,fr,de,...]

$test_language = "en"; // Language code. "en" is the default. Use "help" to receive a list. Visit the local Google domain to find the available languages of that domain. [en,fr,de,...]

$filter = 1; // 0 for no filter (recommended for maximizing content), 1 for normal filter (recommended for accuracy)

$force_cache = 0; // Set to 1 to force loading cache files, even if they are older than 24 hours. Set to -1 to force a new scrape.

$load_all_ranks = 1; /* Set to 0 to stop scraping once $test_website_url has been found in the search engine results.
                      * If set to 1, all $test_max_pages pages will be downloaded. This can be useful for a more detailed ranking analysis. */

$show_html = 0; // 1: output formatted with HTML tags. 0: output for the console (recommended script usage)

$show_all_ranks = 1; // 1 displays the complete list of ranks per keyword, 0 only displays the ranks for the specified website
// ***************************************************************************

$working_dir = "./local_cache"; // local directory; this script needs permission to write into it


    
require "functions-ses.php";


$page = 0;
$PROXY = array(); // after the rotate API call this variable contains these elements: [address](proxy host),[port](proxy port),[external_ip](the external IP),[ready](0/1)
$PLAN = array();
$results = array();


if ($show_html) $NL = "<br>\n"; else $NL = "\n";
if ($show_html) $HR = "<hr>\n"; else $HR = "_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_\n";
if ($show_html) $B = "<b>"; else $B = "!";
if ($show_html) $B_ = "</b>"; else $B_ = "!";


/*
 * Start of main()
 */

if ($show_html)
{
    echo "<html><body>";
}

$keywords = explode(",", $test_keywords);
if (!count($keywords)) die ("Error: no keywords defined.$NL");
if (!rmkdir($working_dir)) die("Failed to create/open $working_dir$NL");

$country_data = get_google_cc($test_country, $test_language);
if (!$country_data) die("Invalid country/language code specified.$NL");


$ready = get_license();
if (!$ready) die("The specified API key account for user $uid is not active or invalid. $NL");
if ($PLAN['protocol'] != "http") die("Wrong proxy protocol configured, switch to HTTP. $NL");

echo "$NL$B Search Engine Scraper for $test_website_url initiated $B_ $NL$NL";

/*
 * This loop iterates through all keywords
 */
$ch = NULL;
$rotate_ip = 0; // variable that triggers an IP rotation (normally only when the keyword changes)
$max_errors_total = 3; // number of keywords that may be skipped after errors before the script aborts (something is going wrong and needs to be checked)

$rank_data = array();
$siterank_data = array();

$break = 0; // variable used to cancel the loop without losing ranking data
foreach ($keywords as $keyword)
{
    $rank = 0;
    $break = 0; // reset for every keyword, otherwise a keyword that ended early would stop all following keywords after their first page
    $max_errors_page = 5; // retries allowed per keyword before it is skipped; this should not happen

    if ($test_max_pages <= 0) break;
    $search_string = urlencode($keyword);
    $rotate_ip = 1; // IP rotation for each new keyword

    /*
     * This loop iterates through all result pages for the given keyword
     */
    for ($page = 0; $page < $test_max_pages; $page++)
    {
        $serp_data = load_cache($search_string, $page, $country_data, $force_cache); // load results from the local cache if available for today
        $maxpages = 0;

        if (!$serp_data)
        {
            $ip_ready = check_ip_usage(); // test whether the IP has been used within the critical time
            while (!$ip_ready || $rotate_ip)
            {
                $ok = rotate_proxy(); // start/rotate to the IP that has not been used for the longest time; also tests whether the proxy connection works
                if ($ok != 1)
                {
                    die ("Fatal error: proxy rotation failed:$NL $ok$NL");
                }
                $ip_ready = check_ip_usage(); // test whether the IP has been used within the critical time
                if (!$ip_ready)
                {
                    die("ERROR: No fresh IPs left, try again later. $NL");
                } else
                {
                    $rotate_ip = 0; // IP rotated
                    break; // continue with the scrape
                }
            }

            
            delay_time(); // pause based on the license size to spread scrapes as evenly as possible and avoid detection
            global $scrape_result; // contains meta information from the scrape_google() function
            $raw_data = scrape_google($search_string, $page, $country_data); // scrape HTML from the search engine
            if ($scrape_result != "SCRAPE_SUCCESS")
            {
                if ($max_errors_page--)
                {
                    echo "There was an error scraping (Code: $scrape_result), trying again .. $NL";
                    $page--; // retry the same page
                    continue;
                } else
                {
                    if ($max_errors_total--)
                    {
                        echo "Too many errors scraping keyword $search_string (at page $page). Skipping remaining pages of keyword $search_string .. $NL";
                        break;
                    } else
                    {
                        die ("ERROR: Max keyword errors reached, something is going wrong. $NL");
                    }
                }
            }
            mark_ip_usage(); // store IP usage; this is very important to avoid detection and gray/blacklisting
            global $process_result; // contains meta information from the process_raw_v2() function
            $serp_data = process_raw_v2($raw_data, $page); // process the HTML and put the results into $serp_data

            
            if (($process_result == "PROCESS_SUCCESS_MORE") || ($process_result == "PROCESS_SUCCESS_LAST"))
            {
                $result_count = count($serp_data);
                $serp_data['page'] = $page;
                $serp_data['lastpage'] = ($process_result == "PROCESS_SUCCESS_LAST") ? 1 : 0;
                $serp_data['keyword'] = $keyword;
                $serp_data['cc'] = $country_data['cc'];
                $serp_data['lc'] = $country_data['lc'];
                $serp_data['result_count'] = $result_count;
                store_cache($serp_data, $search_string, $page, $country_data); // store results in the local cache
            }

            if ($process_result != "PROCESS_SUCCESS_MORE")
            {
                $break = 1; // last page
            }
            if (!$load_all_ranks)
            {
                for ($n = 0; $n < $result_count; $n++)
                {
                    if (strstr($serp_data[$n]['url'], $test_website_url))
                    {
                        verbose("Located $test_website_url within search results.$NL");
                        $break = 1;
                    }
                }
            }

        } // scrape clause

        
        $result_count = $serp_data['result_count'];

        for ($ref = 0; $ref < $result_count; $ref++)
        {
            $rank++;
            $rank_data[$keyword][$rank]['title'] = $serp_data[$ref]['title'];
            $rank_data[$keyword][$rank]['url']   = $serp_data[$ref]['url'];
            $rank_data[$keyword][$rank]['host']  = $serp_data[$ref]['host'];
            $rank_data[$keyword][$rank]['desc']  = $serp_data[$ref]['desc'];
            $rank_data[$keyword][$rank]['type']  = $serp_data[$ref]['type'];
            if (strstr($rank_data[$keyword][$rank]['url'], $test_website_url))
            {
                $info = array();
                $info['rank'] = $rank;
                $info['url'] = $rank_data[$keyword][$rank]['url'];
                $siterank_data[$keyword][] = $info;
            }
        }
        if ($break == 1) break;

    } // page loop
} // keyword loop

if ($show_all_ranks)
{
    foreach ($rank_data as $keyword => $ranks)
    {
        echo "$NL$NL$B" . "Ranking information for keyword \"$keyword\" $B_$NL";
        echo "$B" . "Rank [Type] - Website - Title$B_$NL";
        $pos = 0;
        foreach ($ranks as $rank)
        {
            $pos++;
            if (strstr($rank['url'], $test_website_url))
            {
                echo "$B$pos [$rank[type]] - $rank[url] - $rank[title] $B_$NL";
//              echo $rank['desc']."\n";
            } else
            {
                echo "$pos [$rank[type]] - $rank[url] - $rank[title] $NL";
//              echo $rank['desc']."\n";
            }
        }
    }
}


foreach ($keywords as $keyword)
{
    if (!isset($siterank_data[$keyword]))
    {
        echo "$NL$B" . "The specified site was not found in the search results for keyword \"$keyword\". $B_$NL";
    } else
    {
        $siteranks = $siterank_data[$keyword];
        echo "$NL$NL$B" . "Ranking information for keyword \"$keyword\" and website \"$test_website_url\" [$test_country / $test_language] $B_$NL";
        foreach ($siteranks as $siterank)
            echo "Rank $siterank[rank] for URL $siterank[url]$NL";
    }
}
//var_dump($siterank_data);


if ($show_html)
{
    echo "</body></html>";
}



?>
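The page loop above reads and writes results through `load_cache()` and `store_cache()` from functions-ses.php. Both derive the cache file name the same way: an MD5 hash of the URL-encoded keyword, the language code, the country code and the page number, plus a `.100p` suffix when `$test_100_resultpage` is set; a file is reused while it is younger than 24 hours unless `$force_cache` says otherwise. Because that logic is duplicated in both functions, a shared helper could look like the sketch below. `cache_file_path()`, `cache_is_fresh()` and `SES_CACHE_MAX_AGE` are hypothetical names, and the sketch leaves out the file locking the original comments recommend for multithreaded use.

```php
<?php
// Hypothetical helpers (not part of the original code) that reproduce the
// cache naming and expiry rules of load_cache()/store_cache().

define('SES_CACHE_MAX_AGE', 60 * 60 * 24); // 24 hours, as in the original

// "<dir>/<md5 of keyword_lc_cc.page[.100p]>.cache"
function cache_file_path($working_dir, $search_string, $lc, $cc, $page, $hundred_results)
{
    $key = $search_string . "_" . $lc . "_" . $cc . "." . $page;
    if ($hundred_results) {
        $key .= ".100p"; // 100-result pages get their own cache entries
    }
    return $working_dir . "/" . md5($key) . ".cache";
}

// $force_cache follows the configuration comment: -1 never uses the cache,
// 1 accepts any existing file, 0 accepts files younger than 24 hours.
function cache_is_fresh($file, $force_cache, $now)
{
    if ($force_cache < 0 || !file_exists($file)) {
        return false;
    }
    if ($force_cache) {
        return true;
    }
    return ($now - filemtime($file)) < SES_CACHE_MAX_AGE;
}
```

Passing `$now` in explicitly, rather than calling `time()` inside, keeps the freshness rule easy to test.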
functions-ses.php
<?PHP
    
/* License: 
       Open source for private and commercial use but this comment needs to stay untouched on top.
       URL of original source code: http://scraping.compunect.com
       Author of original source code: http://www.compunect.com
       IP rotation API code from here: http://www.us-proxies.com/automate
       Under no circumstances and under no legal theory, whether in tort (including negligence), contract, or otherwise, shall the Licensor be liable to anyone for any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or the use of the Original Work including, without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses. This limitation of liability shall not apply to the extent applicable law prohibits such limitation.
       Usage exceptions:
       Public redistributing modifications of this source code project is not allowed without written agreement.
       Using this work for private and commercial projects is allowed, redistributing it is not allowed without our written agreement.
     */

    
    function verbose($text)
    {
        echo $text;
    }

    
    /*
     * By default (no force) the function loads cached data that is less than 24 hours old and rejects older cache files.
     * Google does not change its ranking very frequently, which is why 24 hours has been chosen.
     *
     * Multithreading: when multithreading, you need to implement a proper locking mechanism.
     */
    function load_cache($search_string, $page, $country_data, $force_cache)
    {
        global $working_dir;
        global $NL;
        global $test_100_resultpage;

        if ($force_cache < 0) return NULL; // -1 forces a new scrape
        $lc = $country_data['lc'];
        $cc = $country_data['cc'];
        if ($test_100_resultpage)
        {
            $hash = md5($search_string . "_" . $lc . "_" . $cc . "." . $page . ".100p");
        } else
        {
            $hash = md5($search_string . "_" . $lc . "_" . $cc . "." . $page);
        }
        $file = "$working_dir/$hash.cache";
        $now = time();
        if (file_exists($file))
        {
            $ut = filemtime($file);
            $dif = $now - $ut;
            $hour = (int)($dif / (60 * 60));
            if ($force_cache || ($dif < (60 * 60 * 24)))
            {
                $serdata = file_get_contents($file);
                $serp_data = unserialize($serdata);
                verbose("Cache: loaded file $file for $search_string and page $page. File age: $hour hours$NL");

                return $serp_data;
            }

            return NULL;
        } else
        {
            return NULL;
        }

    }

    
    /*
     * Multithreading: when multithreading, you need to implement a proper locking mechanism.
     */
    function store_cache($serp_data, $search_string, $page, $country_data)
    {
        global $working_dir;
        global $NL;
        global $test_100_resultpage;

        $lc = $country_data['lc'];
        $cc = $country_data['cc'];
        if ($test_100_resultpage)
        {
            $hash = md5($search_string . "_" . $lc . "_" . $cc . "." . $page . ".100p");
        } else
        {
            $hash = md5($search_string . "_" . $lc . "_" . $cc . "." . $page);
        }
        $file = "$working_dir/$hash.cache";
        $now = time();
        if (file_exists($file))
        {
            $ut = filemtime($file);
            $dif = $now - $ut;
            if ($dif < (60 * 60 * 24)) echo "Warning: cache storage initiated for $search_string page $page which was already cached within the past 24 hours!$NL";
        }
        $serdata = serialize($serp_data);
        file_put_contents($file, $serdata, LOCK_EX);
        verbose("Cache: stored file $file for $search_string and page $page.$NL");
    }

    
    // check_ip_usage() must be called before the first use of mark_ip_usage()
    function check_ip_usage()
    {
        global $PROXY;
        global $working_dir;
        global $NL;
        global $ip_usage_data; // usage data object as array

        if (!isset($PROXY['ready'])) return 0; // proxy not ready/started
        if (!$PROXY['ready']) return 0; // proxy not ready/started

        if (!isset($ip_usage_data))
        {
            if (!file_exists($working_dir . "/ipdata.obj")) // usage data object as file
            {
                echo "Warning!$NL" . "The ipdata.obj file was not found, if this is the first usage of the rank checker everything is alright.$NL" . "Otherwise removal or failure to access the ip usage data will lead to damage of the IP quality.$NL$NL";
                sleep(5);
                $ip_usage_data = array();
            } else
            {
                $ser_data = file_get_contents($working_dir . "/ipdata.obj");
                $ip_usage_data = unserialize($ser_data);
            }
        }

        if (!isset($ip_usage_data[$PROXY['external_ip']]))
        {
            verbose("IP $PROXY[external_ip] is ready for use $NL");

            return 1; // the IP was not used yet
        }
        if (!isset($ip_usage_data[$PROXY['external_ip']]['requests'][20]['ut_google']))
        {
            verbose("IP $PROXY[external_ip] is ready for use $NL");

            return 1; // the IP has not been used 20+ times yet, return true
        }
        $ut_last = (int)$ip_usage_data[$PROXY['external_ip']]['ut_last-usage']; // last time this IP was used
        $req_total = (int)$ip_usage_data[$PROXY['external_ip']]['request-total']; // total number of requests made by this IP
        $req_20 = (int)$ip_usage_data[$PROXY['external_ip']]['requests'][10]['ut_google']; // the 20th request (if IP was used 20+ times) unixtime stamp [changed to 10 due to Google issues]

        $now = time();
        if (($now - $req_20) > (60 * 60))
        {
            verbose("IP $PROXY[external_ip] is ready for use $NL");

            return 1; // more than an hour passed since 20th usage of this IP [changed to 10]
        } else
        {
            $cd_sec = (60 * 60) - ($now - $req_20);
            verbose("IP $PROXY[external_ip] needs $cd_sec seconds cooldown, not ready for use yet $NL");

            return 0; // the IP is overused, it can not be used for scraping without being detected by the search engine yet
        }

    }

    // return 1 if license is ready, otherwise 0
    function get_license()
    {
        global $uid;
        global $pwd;
        global $PLAN;
        global $NL;

        $res = ip_service("plan");
        if ($res <= 0)
        {
            verbose("API error: Proxy API connection failed (Error $res). trying again later..$NL$NL");

            return 0;
        } else
        {
            $ready = ($PLAN['active'] == 1) ? "active" : "not active";
            verbose("API success: Account is $ready.$NL");
            if ($PLAN['active'] == 1) return 1;

            return 0;
        }
    }

    /* Delay (sleep) based on the license size to allow optimal scraping.
     *
     * Warning!
     * Do NOT make the delay shorter than the computed value.
     * When scraping Google you should never make more than 20 requests per hour per IP address.
     * The recommended value is 10; if you must go higher you can go up to 20, but I'd stay lower.
     * This function computes the delay from the total number of IP addresses in your plan.
     *
     * Together with the IP management functions this keeps your IPs healthy (no wrong rankings) and undetected (no virus warnings, blacklists, captchas).
     *
     * Multithreading:
     * When multithreading you need to multiply the delay time ($d) by the number of threads.
     *
     * As Google keeps getting stricter, you might even have to lower the rate.
     */
    function delay_time()
    {
        global $NL;
        global $PLAN;

        // one hour in microseconds, spread over 10 requests per IP per hour
        $d = (3600 * 1000000 / (((float)$PLAN['total_ips']) * 10));
        verbose("Delay based on plan size.. $NL");
        usleep((int)$d); // usleep() expects an integer number of microseconds
    }

    /*
     * Updates and stores the IP usage data object.
     * Marks the current IP as used and shifts its request FIFO queue (slot 1 = most recent request, up to slot 20).
     */
    function mark_ip_usage()
    {
        global $PROXY;
        global $working_dir;
        global $NL;
        global $ip_usage_data; // usage data object as array

        if (!isset($ip_usage_data)) die("ERROR: Incorrect usage. check_ip_usage() needs to be called once before mark_ip_usage()!$NL");
        $now = time();

        $ip_usage_data[$PROXY['external_ip']]['ut_last-usage'] = $now; // last time this IP was used
        if (!isset($ip_usage_data[$PROXY['external_ip']]['request-total'])) $ip_usage_data[$PROXY['external_ip']]['request-total'] = 0;
        $ip_usage_data[$PROXY['external_ip']]['request-total']++; // total number of requests made by this IP
        // shift fifo queue: slot 19 moves to 20, ..., slot 1 moves to 2
        for ($req = 19; $req >= 1; $req--)
        {
            if (isset($ip_usage_data[$PROXY['external_ip']]['requests'][$req]['ut_google']))
            {
                $ip_usage_data[$PROXY['external_ip']]['requests'][$req + 1]['ut_google'] = $ip_usage_data[$PROXY['external_ip']]['requests'][$req]['ut_google'];
            }
        }
        $ip_usage_data[$PROXY['external_ip']]['requests'][1]['ut_google'] = $now;

        $serdata = serialize($ip_usage_data);
        file_put_contents($working_dir . "/ipdata.obj", $serdata, LOCK_EX);

    }

    // Access Google based on the parameters and return the raw HTML, or an empty string on error (the status is stored in $scrape_result)
    function scrape_google($search_string, $page, $local_data)
    {
        global $ch;
        global $NL;
        global $PROXY;
        global $PLAN;
        global $scrape_result;
        global $test_100_resultpage;
        global $filter;
        $scrape_result = "";

        $google_ip = $local_data['domain'];
        $hl = $local_data['lc'];

        if ($page == 0)
        {
            if ($test_100_resultpage)
            {
                $url = "http://$google_ip/search?q=$search_string&hl=$hl&ie=utf-8&as_qdr=all&aq=t&rls=org:mozilla:us:official&client=firefox&num=100&filter=$filter";
            } else
            {
                $url = "http://$google_ip/search?q=$search_string&hl=$hl&ie=utf-8&as_qdr=all&aq=t&rls=org:mozilla:us:official&client=firefox&num=10&filter=$filter";
            }
        } else
        {
            if ($test_100_resultpage)
            {
                $num = $page * 100;
                $url = "http://$google_ip/search?q=$search_string&hl=$hl&ie=utf-8&as_qdr=all&aq=t&rls=org:mozilla:us:official&client=firefox&start=$num&num=100&filter=$filter";
            } else
            {
                $num = $page * 10;
                $url = "http://$google_ip/search?q=$search_string&hl=$hl&ie=utf-8&as_qdr=all&aq=t&rls=org:mozilla:us:official&client=firefox&start=$num&num=10&filter=$filter";
            }
        }
        //verbose("Debug, Search URL: $url$NL");

        curl_setopt($ch, CURLOPT_URL, $url);
        $htmdata = curl_exec($ch);
        if (!$htmdata)
        {
            $error = curl_error($ch);
            $info = curl_getinfo($ch);
            echo "\tError scraping: $error [ " . curl_errno($ch) . " ]$NL";
            $scrape_result = "SCRAPE_ERROR";
            sleep(3);

            return "";
        } else
        {
            if (strlen($htmdata) < 20)
            {
                $scrape_result = "SCRAPE_EMPTY_SERP";
                sleep(3);

                return "";
            }
        }

        // Text fragments that appear on Google's block / captcha pages
        $block_markers = array(
            "computer virus or spyware application",
            "entire network is affected",
            "http://www.download.com/Antivirus",
            "/images/yellow_warning.gif",
            "This page appears when Google automatically detects requests coming from your computer network"
        );
        foreach ($block_markers as $marker)
        {
            if (strstr($htmdata, $marker))
            {
                echo("Google blocked us, we need more proxies! Make sure you did not damage the IP management functions. Consider changing keywords and lowering request rates. $NL");
                $scrape_result = "SCRAPE_DETECTED";
                die();
            }
        }
        $scrape_result = "SCRAPE_SUCCESS";

        return $htmdata;
    }

    require_once "simple_html_dom.php";
    function process_raw_v2($data, $page)
    {
        global $process_result; // contains metainformation from the process_raw() function
        global $test_100_resultpage;
        global $NL;
        global $B;
        global $B_;
        $results = array();

        $html = new simple_html_dom();
        $html->load($data);
        /** @var simple_html_dom_node[] $interest */
        $interest = $html->find('div#ires ol div.g');
        echo "found interesting elements: " . count($interest) . "\n";
        $interest_num = 0;
        foreach ($interest as $li)
        {
            $result = array('title' => 'undefined', 'host' => 'undefined', 'url' => 'undefined', 'desc' => 'undefined', 'type' => 'organic');
            $interest_num++;
            $h3 = $li->find('h3.r', 0);
            if (!$h3)
            {
                continue;
            }
            $a = $h3->find('a', 0);
            if (!$a) continue;
            $result['title'] = html_entity_decode($a->plaintext);
            $lnk = urldecode($a->href);
            if ($lnk)
            {
                // take everything from the first "ht" up to the next "&" as the target URL
                preg_match('/(ht[^&]*)/', $lnk, $m);
                if ($m && $m[1])
                {
                    $result['url'] = $m[1];
                    $tmp = parse_url($m[1]);
                    $result['host'] = isset($tmp['host']) ? $tmp['host'] : 'undefined';
                } else
                {
                    if (strstr($result['title'], 'News')) $result['type'] = 'news';
                    if (strstr($result['title'], 'Images')) $result['type'] = 'images';
                }
            }
            if ($result['type'] == 'organic')
            {
                $sp = $li->find('span.st', 0);
                if ($sp)
                {
                    $result['desc'] = html_entity_decode($sp->plaintext);
                    $sp->clear();
                }
            }
            $h3->clear();
            $a->clear();
            $li->clear();
            $results[] = $result;
        }
        $html->clear(); // free the DOM (method call; a bare "->clear" would only read a property)

        // Analyze if more results are available (next page)
        $next = 0;
        if (strstr($data, "Next</a>"))
        {
            $next = 1;
        } else
        {
            if ($test_100_resultpage)
            {
                $needstart = ($page + 1) * 100;
            } else
            {
                $needstart = ($page + 1) * 10;
            }
            $findstr = "start=$needstart";
            if (strstr($data, $findstr)) $next = 1;
        }
        $page++;
        if ($next)
        {
            $process_result = "PROCESS_SUCCESS_MORE"; // more data available
        } else
        {
            $process_result = "PROCESS_SUCCESS_LAST"; // last page reached
        }

        return $results;
    }

    function rotate_proxy()
    {
        global $PROXY;
        global $ch;
        global $NL;
        $max_errors = 3;
        $success = 0;
        while ($max_errors--)
        {
            $res = ip_service("rotate"); // will fill $PROXY
            $ip = "";
            if ($res <= 0)
            {
                verbose("API error: Proxy API connection failed (Error $res). trying again soon..$NL$NL");
                sleep(21); // retry after a while
            } else
            {
                verbose("API success: Received proxy IP $PROXY[external_ip] on port $PROXY[port]$NL");
                $success = 1;
                break;
            }
        }
        if ($success)
        {
            $ch = new_curl_session($ch);

            return 1;
        } else
        {
            return "API rotation failed. Check license, firewall and API credentials.$NL";
        }
    }

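    // Return everything after the first empty line (the end of the HTTP headers), or '' if there is none
    function extractBody($response_str)
    {
        $parts = preg_split('|(?:\r?\n){2}|m', $response_str, 2);
        if (isset($parts[1])) return $parts[1];

        return '';
    }
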
    /*
     * This is the API function to retrieve US IP addresses.
     * $cmd = "rotate": on success the global $PROXY variable receives the elements ready, address, port and external_ip and the function returns 1;
     *   on failure the return value is 0 or negative and $PROXY['ready'] is 0.
     * $cmd = "plan": fills the global $PLAN variable (max_ips, total_ips, protocol, processes, active); returns 1 on success, 0 or negative on failure.
     * To obtain a plan, check out us-proxies.com; this can often be handled within a day.
     */
    function ip_service($cmd, $x = "")
    {
        global $pwd;
        global $uid;
        global $PROXY;
        global $PLAN;
        global $NL;

        $fp = fsockopen("us-proxies.com", 80);
        if (!$fp)
        {
            echo "Unable to connect to API $NL";

            return -1; // connection not possible
        } else
        {
            if ($cmd == "plan")
            {
                fwrite($fp, "GET /api.php?api=1&uid=$uid&pwd=$pwd&cmd=plan&extended=1 HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
                stream_set_timeout($fp, 8);
                $res = "";
                $n = 0;
                while (!feof($fp))
                {
                    if ($n++ > 4) break;
                    $res .= fread($fp, 8192);
                }
                $info = stream_get_meta_data($fp);
                fclose($fp);

                if ($info['timed_out'])
                {
                    echo "API: Connection timed out! $NL"; // double quotes so that $NL is expanded
                    $PLAN['active'] = 0;

                    return -2; // api timeout
                } else
                {
                    if (strlen($res) > 1000) return -3; // invalid api response (check the API website for possible problems)
                    $data = extractBody($res);
                    $ar = explode(":", $data);
                    if (count($ar) < 4) return -100; // invalid api response
                    switch ($ar[0])
                    {
                        case "ERROR":
                            echo "API Error: $res $NL";
                            $PLAN['active'] = 0;

                            return 0; // Error received
                        case "PLAN":
                            $PLAN['max_ips'] = $ar[1]; // number of IPs licensed
                            $PLAN['total_ips'] = $ar[2]; // number of IPs assigned
                            $PLAN['protocol'] = $ar[3]; // current proxy protocol (http, socks, ..)
                            $PLAN['processes'] = $ar[4]; // number of available proxy processes
                            if ($PLAN['total_ips'] > 0) $PLAN['active'] = 1; else $PLAN['active'] = 0;

                            return 1;
                        default:
                            echo "API Error: Received answer $ar[0], expected \"PLAN\"$NL";
                            $PLAN['active'] = 0;

                            return -101; // unknown API response
                    }
                }

            } // cmd==plan


            if ($cmd == "rotate")
            {
                $PROXY['ready'] = 0;
                fwrite($fp, "GET /api.php?api=1&uid=$uid&pwd=$pwd&cmd=rotate&randomness=0&offset=0 HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
                stream_set_timeout($fp, 8);
                $res = "";
                $n = 0;
                while (!feof($fp))
                {
                    if ($n++ > 4) break;
                    $res .= fread($fp, 8192);
                }
                $info = stream_get_meta_data($fp);
                fclose($fp);

                if ($info['timed_out'])
                {
                    echo "API: Connection timed out! $NL"; // double quotes so that $NL is expanded

                    return -2; // api timeout
                } else
                {
                    if (strlen($res) > 1000) return -3; // invalid api response (check the API website for possible problems)
                    $data = extractBody($res);
                    $ar = explode(":", $data);
                    if (count($ar) < 4) return -100; // invalid api response
                    switch ($ar[0])
                    {
                        case "ERROR":
                            echo "API Error: $res $NL";

                            return 0; // Error received
                        case "ROTATE":
                            $PROXY['address'] = $ar[1];
                            $PROXY['port'] = $ar[2];
                            $PROXY['external_ip'] = $ar[3];
                            $PROXY['ready'] = 1;
                            usleep(230000); // additional time to avoid connecting during proxy bootup phase, removing this can cause random connection failures but will increase overall performance for large IP licenses

                            return 1;
                        default:
                            echo "API Error: Received answer $ar[0], expected \"ROTATE\"$NL";

                            return -101; // unknown API response
                    }
                }
            } // cmd==rotate
        }
    }

    function getip()
    {
        global $PROXY;
        if (!$PROXY['ready']) return -1; // proxy not ready

        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, 'http://ipcheck.ipnetic.com/remote_ip.php'); // returns the real IP
        curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($curl_handle, CURLOPT_TIMEOUT, 10);
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
        $curl_proxy = "$PROXY[address]:$PROXY[port]";
        curl_setopt($curl_handle, CURLOPT_PROXY, $curl_proxy);
        $tested_ip = trim((string)curl_exec($curl_handle));

        // The trimmed response must consist of exactly one dotted IPv4 address
        if (preg_match('/^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$/', $tested_ip))
        {
            curl_close($curl_handle);

            return $tested_ip;
        } else
        {
            $info = curl_getinfo($curl_handle);
            curl_close($curl_handle);

            return 0; // possible error would be a wrong authentication IP or a firewall
        }
    }

    function new_curl_session($ch = NULL)
    {
        global $PROXY;
        if ((!isset($PROXY['ready'])) || (!$PROXY['ready'])) return $ch; // proxy not ready

        if (isset($ch) && ($ch != NULL))
        {
            curl_close($ch); // close the previous session before opening a new one
        }
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $curl_proxy = "$PROXY[address]:$PROXY[port]";
        curl_setopt($ch, CURLOPT_PROXY, $curl_proxy);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
        curl_setopt($ch, CURLOPT_TIMEOUT, 20);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en; rv:1.9.0.4) Gecko/2009011913 Firefox/3.0.6");

        return $ch;
    }

    // Create $path unless it already exists; returns true/1 on success
    function rmkdir($path, $mode = 0755)
    {
        if (file_exists($path)) return 1;

        return @mkdir($path, $mode);
    }

    /*
     * For country- and language-specific searches.
     * The identifier codes require an active plan at us-proxies.com.
     * If you plan to omit the IP service, replace this part as well or do not use language specifications at all.
     */
    function get_google_cc($cc, $lc)
    {
        global $pwd;
        global $uid;
        global $PROXY;
        global $PLAN;
        global $NL;
        $fp = fsockopen("us-proxies.com", 80);
        if (!$fp)
        {
            echo "Unable to connect to google_cc API of us-proxies.com $NL";

            return NULL; // connection not possible
        } else
        {
//            echo("GET /g_api.php?api=1&uid=$uid&pwd=$pwd&cmd=google_cc&cc=$cc&lc=$lc HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
            fwrite($fp, "GET /g_api.php?api=1&uid=$uid&pwd=$pwd&cmd=google_cc&cc=$cc&lc=$lc HTTP/1.0\r\nHost: us-proxies.com\r\nAccept: text/html, text/plain, text/*, */*;q=0.01\r\nAccept-Encoding: plain\r\nAccept-Language: en\r\n\r\n");
            stream_set_timeout($fp, 8);
            $res = "";
            $n = 0;
            while (!feof($fp))
            {
                if ($n++ > 4) break;
                $res .= fread($fp, 8192);
            }
            $info = stream_get_meta_data($fp);
            fclose($fp);

            if ($info['timed_out'])
            {
                echo "API: Connection timed out! $NL"; // double quotes so that $NL is expanded

                return NULL; // api timeout
            } else
            {
                $data = extractBody($res);
                if (strlen($data) < 4) return NULL; // invalid api response
                $obj = unserialize($data);
                if (!is_array($obj)) return NULL; // response could not be unserialized
                if (isset($obj['error'])) echo $obj['error'] . "$NL";
                if (isset($obj['info'])) echo $obj['info'] . "$NL";

                return isset($obj['data']) ? $obj['data'] : NULL;
            }
        }
    }

?>
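The cache file naming and the 24-hour freshness rule used by load_cache() and store_cache() can be isolated into two small functions. This is a minimal sketch, not part of the project: cache_file_name() and cache_is_fresh() are hypothetical helpers, and the $force_cache convention (negative disables the cache, 0 applies the 24-hour window, positive accepts any age) is an assumption read from how load_cache() treats that argument.

```php
<?php
// Hypothetical helpers (not part of the project) mirroring load_cache()/store_cache().

const CACHE_TTL_SECONDS = 86400; // 24 hours, the window used by load_cache()

// Build the cache path the same way as the scraper: md5 of "<query>_<lc>_<cc>.<page>",
// with ".100p" appended when 100 results per page are requested.
function cache_file_name(string $working_dir, string $search_string, string $lc, string $cc, int $page, bool $hundred_per_page): string
{
    $key = $search_string . "_" . $lc . "_" . $cc . "." . $page;
    if ($hundred_per_page) {
        $key .= ".100p";
    }
    return "$working_dir/" . md5($key) . ".cache";
}

// Decide whether a cache file with modification time $mtime may be used at time $now.
function cache_is_fresh(int $mtime, int $now, int $force_cache): bool
{
    if ($force_cache < 0) {
        return false; // assumed: negative means "always scrape again"
    }
    if ($force_cache > 0) {
        return true;  // positive: accept the cache regardless of age
    }
    return ($now - $mtime) < CACHE_TTL_SECONDS; // default: younger than 24 hours
}

$now = time();
echo cache_file_name("/tmp/serp", "php scraper", "en", "us", 0, false), "\n";
var_dump(cache_is_fresh($now - 3600, $now, 0));      // bool(true): one hour old
var_dump(cache_is_fresh($now - 2 * 86400, $now, 0)); // bool(false): two days old
```

Because the key includes both the page number and the ".100p" suffix, the 10-result and 100-result modes never read each other's cache files.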
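mark_ip_usage() keeps, per IP, the timestamps of the most recent 20 requests (slot 1 is the newest), and check_ip_usage() refuses an IP while its 10th most recent request is less than an hour old. The bookkeeping can be modelled as below. This is a sketch under the assumption that a plain array is an acceptable stand-in for the serialized ipdata.obj structure: it uses a 0-based list with the newest timestamp first instead of the original 1-based slots, and record_request() and cooldown_seconds() are hypothetical names. Like the original, an IP with fewer than 20 logged requests counts as ready.

```php
<?php
// Hypothetical model of the per-IP request log kept by mark_ip_usage() and read by check_ip_usage().
// $log holds request timestamps, newest first, 0-based (the original uses 1-based slots 1..20).

const LOG_SIZE = 20;         // mark_ip_usage() shifts slots 1..19 into 2..20
const CHECK_SLOT = 10;       // check_ip_usage() inspects the 10th most recent request
const WINDOW_SECONDS = 3600; // one hour

// Record a new request: prepend its timestamp and keep at most LOG_SIZE entries.
function record_request(array $log, int $now): array
{
    array_unshift($log, $now);
    return array_slice($log, 0, LOG_SIZE);
}

// Seconds until the IP may be used again; 0 means it is ready.
function cooldown_seconds(array $log, int $now): int
{
    if (count($log) < LOG_SIZE) {
        return 0; // like the original: fewer than 20 logged requests means "ready"
    }
    $elapsed = $now - $log[CHECK_SLOT - 1]; // age of the 10th most recent request
    return $elapsed > WINDOW_SECONDS ? 0 : WINDOW_SECONDS - $elapsed;
}

// 20 requests, one per minute, starting at t = 1000
$log = [];
for ($i = 0; $i < 20; $i++) {
    $log = record_request($log, 1000 + $i * 60);
}
echo cooldown_seconds($log, 1000 + 19 * 60), "\n"; // 3060: the 10th newest request is 540 s old
```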
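delay_time() sleeps 3600 * 1000000 / (total_ips * 10) microseconds, which spreads one hour over the 10 requests each IP may make: 10 IPs give a 36-second gap between requests, 100 IPs a 3.6-second gap. The sketch below reproduces that arithmetic in a hypothetical helper, delay_microseconds(), and also applies the "multiply by the number of threads" rule from the delay_time() comment. It assumes 64-bit PHP, since 3600 * 1000000 does not fit in a 32-bit integer.

```php
<?php
// Hypothetical helper reproducing the delay_time() formula; assumes 64-bit PHP.

function delay_microseconds(int $total_ips, int $threads = 1, int $requests_per_ip_hour = 10): int
{
    if ($total_ips < 1 || $threads < 1 || $requests_per_ip_hour < 1) {
        throw new InvalidArgumentException("all arguments must be at least 1");
    }
    // One hour in microseconds divided by the number of requests the whole pool may make per hour
    $per_request = intdiv(3600 * 1000000, $total_ips * $requests_per_ip_hour);
    // Each thread sleeps independently, so the delay is multiplied by the thread count
    return $per_request * $threads;
}

foreach ([1, 10, 100] as $ips) {
    printf("%3d IPs: %.1f s between requests\n", $ips, delay_microseconds($ips) / 1000000);
}
```

usleep() expects an integer number of microseconds, which is why the helper uses intdiv() rather than floating-point division.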
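scrape_google() interpolates $search_string into the URL unchanged, so any URL encoding has to happen before the call, and it pages with start = page * results per page. A sketch that builds the same kind of URL with PHP's http_build_query(), which encodes every value, is shown here. It keeps only a subset of the original parameters (q, hl, ie, num, filter, start), and build_serp_url() is a hypothetical helper, not part of the project.

```php
<?php
// Hypothetical helper: compose a results-page URL like scrape_google() does, but with encoded values.

function build_serp_url(string $domain, string $query, string $hl, int $page, int $per_page, int $filter): string
{
    $params = [
        'q' => $query,      // http_build_query() URL-encodes every value
        'hl' => $hl,
        'ie' => 'utf-8',
        'num' => $per_page, // 10 normally, 100 in the original's 100-results mode
        'filter' => $filter,
    ];
    if ($page > 0) {
        $params['start'] = $page * $per_page; // offset of the first result on this page
    }
    return "http://$domain/search?" . http_build_query($params, '', '&');
}

echo build_serp_url("www.google.com", "php scraper", "en", 1, 10, 0), "\n";
// http://www.google.com/search?q=php+scraper&hl=en&ie=utf-8&num=10&filter=0&start=10
```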
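process_raw_v2() URL-decodes each result link and then takes everything from the first "ht" up to the next "&" as the target URL. Assuming result links use a redirect form such as /url?q=<target>&sa=..., a variant that reads the q parameter with parse_url() and parse_str() is sketched below. This swaps in a different technique from the original regex: because the query string is decoded only once, an encoded "&" (%26) inside the target URL survives. extract_result_url() is a hypothetical helper.

```php
<?php
// Hypothetical helper: recover the target URL and host from a result link.
// Reads the "q" parameter with parse_str() instead of urldecode() + '/(ht[^&]*)/'.

function extract_result_url(string $href): ?array
{
    $target = null;
    $query = parse_url($href, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params); // decodes each parameter exactly once
        if (isset($params['q']) && is_string($params['q']) && preg_match('#^https?://#i', $params['q'])) {
            $target = $params['q'];
        }
    }
    if ($target === null && preg_match('#^https?://#i', $href)) {
        $target = $href; // already a direct link
    }
    if ($target === null) {
        return null; // e.g. internal News/Images links
    }
    $host = parse_url($target, PHP_URL_HOST);
    return ['url' => $target, 'host' => is_string($host) ? $host : 'undefined'];
}

print_r(extract_result_url("/url?q=https://example.com/a%3Fx%3D1%26y%3D2&sa=U&ved=0"));
```

For that sample link the original regex would stop at the decoded "&" and return https://example.com/a?x=1, while the parse_str() variant keeps https://example.com/a?x=1&y=2.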
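ip_service() sends a raw HTTP/1.0 request, strips the headers with extractBody() and splits the body on ":"; the field order (PLAN:max_ips:total_ips:protocol:processes and ROTATE:address:port:external_ip) comes from that function. The parsing step can be tested offline as sketched below. extract_body() is a copy of extractBody()'s split under a hypothetical name, and parse_api_answer() is likewise hypothetical; unlike the original it checks the field count per answer type and returns null for "ERROR" or unknown answers.

```php
<?php
// Hypothetical offline sketch of ip_service()'s response handling.

// Headers end at the first empty line (CRLF CRLF or LF LF), as in extractBody()
function extract_body(string $response): string
{
    $parts = preg_split('|(?:\r?\n){2}|', $response, 2);
    return $parts[1] ?? '';
}

// Decode a colon-separated API answer into named fields, or null if it is not usable
function parse_api_answer(string $body): ?array
{
    $fields = explode(":", trim($body));
    switch ($fields[0]) {
        case "PLAN":
            if (count($fields) < 5) return null;
            return ['max_ips' => (int)$fields[1], 'total_ips' => (int)$fields[2],
                    'protocol' => $fields[3], 'processes' => (int)$fields[4]];
        case "ROTATE":
            if (count($fields) < 4) return null;
            return ['address' => $fields[1], 'port' => (int)$fields[2], 'external_ip' => $fields[3]];
        default:
            return null; // "ERROR" or an unexpected answer
    }
}

$raw = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nROTATE:10.0.0.5:3128:203.0.113.7";
print_r(parse_api_answer(extract_body($raw)));
```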
simple_html_dom.php
<?php
/**
 * Website: http://sourceforge.net/projects/simplehtmldom/
 * Acknowledge: Jose Solorzano (https://sourceforge.net/projects/php-html/)
 * Contributions by:
 *     Yousuke Kumakura (Attribute filters)
 *     Vadim Voituk (Negative indexes supports of "find" method)
 *     Antcs (Constructor with automatically load contents either text or file/url)
 *
 * all affected sections have comments starting with "PaperG"
 *
 * Paperg - Added case insensitive testing of the value of the selector.
 * Paperg - Added tag_start for the starting index of tags - NOTE: This works but not accurately.
 *  This tag_start gets counted AFTER \r\n have been crushed out, and after the remove_noise calls so it will not reflect the REAL position of the tag in the source,
 *  it will almost always be smaller by some amount.
 *  We use this to determine how far into the file the tag in question is.  This "percentage" will never be accurate as the $dom->size is the "real" number of bytes the dom was created from,
 *  but for most purposes, it's a really good estimation.
 * Paperg - Added the forceTagsClosed to the dom constructor.  Forcing tags closed is great for malformed html, but it CAN lead to parsing errors.
 * Allow the user to tell us how much they trust the html.
 * Paperg add the text and plaintext to the selectors for the find syntax.  plaintext implies text in the innertext of a node.  text implies that the tag is a text node.
 * This allows for us to find tags based on the text they contain.
 * Create find_ancestor_tag to see if a tag is - at any level - inside of another specific tag.
 * Paperg: added parse_charset so that we know about the character set of the source document.
 *  NOTE:  If the user's system has a routine called get_last_retrieve_url_contents_content_type available, we will assume it's returning the content-type header from the
 *  last transfer or curl_exec, and we will parse that and use it in preference to any other method of charset detection.
 *
 * Found infinite loop in the case of broken html in restore_noise.  Rewrote to protect from that.
 * PaperG (John Schlick) Added get_display_size for "IMG" tags.
 *
 * Licensed under The MIT License
 * Redistributions of files must retain the above copyright notice.
 *
 * @author S.C. Chen <me578022@gmail.com>
 * @author John Schlick
 * @author Rus Carroll
 * @version 1.5 ($Rev: 196 $)
 * @package PlaceLocalInclude
 * @subpackage simple_html_dom
 */

/**
 * All of the Defines for the classes below.
 * @author S.C. Chen <me578022@gmail.com>
 */
define('HDOM_TYPE_ELEMENT', 1);
define('HDOM_TYPE_COMMENT', 2);
define('HDOM_TYPE_TEXT',    3);
define('HDOM_TYPE_ENDTAG',  4);
define('HDOM_TYPE_ROOT',    5);
define('HDOM_TYPE_UNKNOWN', 6);
define('HDOM_QUOTE_DOUBLE', 0);
define('HDOM_QUOTE_SINGLE', 1);
define('HDOM_QUOTE_NO',     3);
define('HDOM_INFO_BEGIN',   0);
define('HDOM_INFO_END',     1);
define('HDOM_INFO_QUOTE',   2);
define('HDOM_INFO_SPACE',   3);
define('HDOM_INFO_TEXT',    4);
define('HDOM_INFO_INNER',   5);
define('HDOM_INFO_OUTER',   6);
define('HDOM_INFO_ENDSPACE',7);
define('DEFAULT_TARGET_CHARSET', 'UTF-8');
define('DEFAULT_BR_TEXT', "\r\n");
define('DEFAULT_SPAN_TEXT', " ");
define('MAX_FILE_SIZE', 600000);
// helper functions
// -----------------------------------------------------------------------------
// get html dom from file
// $maxlen is defined in the code as PHP_STREAM_COPY_ALL which is defined as -1.
function file_get_html($url, $use_include_path = false, $context=null, $offset = 0, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    // We DO force the tags to be terminated.
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    // For sourceforge users: uncomment the next line and comment the retrieve_url_contents line 2 lines down if it is not already done.
    // $offset defaults to 0: since PHP 7.1 a negative offset makes file_get_contents() read from the end of the stream.
    $contents = file_get_contents($url, $use_include_path, $context, $offset);
    // Paperg - use our own mechanism for getting the contents as we want to control the timeout.
    //$contents = retrieve_url_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
    {
        return false;
    }
    // The second parameter can force the selectors to all be lowercase.
    $dom->load($contents, $lowercase, $stripRN);
    return $dom;
}

// get html dom from string
function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
{
    $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
    if (empty($str) || strlen($str) > MAX_FILE_SIZE)
    {
        $dom->clear();
        return false;
    }
    $dom->load($str, $lowercase, $stripRN);
    return $dom;
}

// dump html dom tree
function dump_html_tree($node, $show_attr=true, $deep=0)
{
    // pass the display options through to the node's own dump()
    $node->dump($show_attr, $deep);
}

/**
 * simple html dom node
 * PaperG - added ability for "find" routine to lowercase the value of the selector.
 * PaperG - added $tag_start to track the start position of the tag in the total byte index
 *
 * @package PlaceLocalInclude
 */
class simple_html_dom_node
{
    public $nodetype = HDOM_TYPE_TEXT;
    public $tag = 'text';
    public $attr = array();
    public $children = array();
    public $nodes = array();
    public $parent = null;
    // The "info" array - see HDOM_INFO_... for what each element contains.
    public $_ = array();
    public $tag_start = 0;
    private $dom = null;

    function __construct($dom)
    {
        $this->dom = $dom;
        $dom->nodes[] = $this;
    }

    function __destruct()
    {
        $this->clear();
    }

    function __toString()
    {
        return $this->outertext();
    }

    // clean up memory due to php5 circular references memory leak...
    function clear()
    {
        $this->dom = null;
        $this->nodes = null;
        $this->parent = null;
        $this->children = null;
    }

    // dump node's tree
    function dump($show_attr=true, $deep=0)
    {
        $lead = str_repeat('    ', $deep);

        echo $lead.$this->tag;
        if ($show_attr && count($this->attr)>0)
        {
            echo '(';
            foreach ($this->attr as $k=>$v)
                echo "[$k]=>\"".$this->$k.'", ';
            echo ')';
        }
        echo "\n";

        if ($this->nodes)
        {
            foreach ($this->nodes as $c)
            {
                $c->dump($show_attr, $deep+1);
            }
        }
    }

    // Debugging function to dump a single dom node with a bunch of information about it.
    function dump_node($echo=true)
    {

        $string = $this->tag;
        if (count($this->attr)>0)
        {
            $string .= '(';
            foreach ($this->attr as $k=>$v)
            {
                $string .= "[$k]=>\"".$this->$k.'", ';
            }
            $string .= ')';
        }
        if (count($this->_)>0)
        {
            $string .= ' $_ (';
            foreach ($this->_ as $k=>$v)
            {
                if (is_array($v))
                {
                    $string .= "[$k]=>(";
                    foreach ($v as $k2=>$v2)
                    {
                        $string .= "[$k2]=>\"".$v2.'", ';
                    }
                    $string .= ")";
                } else {
                    $string .= "[$k]=>\"".$v.'", ';
                }
            }
            $string .= ")";
        }

        if (isset($this->text))
        {
            $string .= " text: (" . $this->text . ")";
        }

        $string .= " HDOM_INNER_INFO: '";
        // inner-text info of this node, if present
        if (isset($this->_[HDOM_INFO_INNER]))
        {
            $string .= $this->_[HDOM_INFO_INNER] . "'";
        }
        else
        {
            $string .= ' NULL ';
        }

        $string .= " children: " . count($this->children);
        $string .= " nodes: " . count($this->nodes);
        $string .= " tag_start: " . $this->tag_start;
        $string .= "\n";

        if ($echo)
        {
            echo $string;
            return;
        }
        else
        {
            return $string;
        }
    }

    // returns the parent of node
    // If a node is passed in, it will reset the parent of the current node to that one.
    function parent($parent=null)
    {
        // I am SURE that this doesn't work properly.
        // It fails to unset the current node from its current parent's nodes or children list first.
        if ($parent !== null)
        {
            $this->parent = $parent;
            $this->parent->nodes[] = $this;
            $this->parent->children[] = $this;
        }

        return $this->parent;
    }

    // verify that node has children
    function has_child()
    {
        return !empty($this->children);
    }

    
    // returns children of node
    function children($idx=-1)
    {
        if ($idx===-1)
        {
            return $this->children;
        }
        if (isset($this->children[$idx])) return $this->children[$idx];
        return null;
    }

    
    // returns the first child of node
    function first_child()
    {
        if (count($this->children)>0)
        {
            return $this->children[0];
        }
        return null;
    }

    
    // returns the last child of node
    function last_child()
    {
        if (($count=count($this->children))>0)
        {
            return $this->children[$count-1];
        }
        return null;
    }

    
    // returns the next sibling of node
    function next_sibling()
    {
        if ($this->parent===null)
        {
            return null;
        }

        $idx = 0;
        $count = count($this->parent->children);
        while ($idx<$count && $this!==$this->parent->children[$idx])
        {
            ++$idx;
        }
        if (++$idx>=$count)
        {
            return null;
        }
        return $this->parent->children[$idx];
    }

    
    // returns the previous sibling of node
    function prev_sibling()
    {
        if ($this->parent===null) return null;
        $idx = 0;
        $count = count($this->parent->children);
        while ($idx<$count && $this!==$this->parent->children[$idx])
            ++$idx;
        if (--$idx<0) return null;
        return $this->parent->children[$idx];
    }

    
    // function to locate a specific ancestor tag in the path to the root.
    function find_ancestor_tag($tag)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        // Start by including ourselves in the comparison.
        $returnDom = $this;

        while (!is_null($returnDom))
        {
            if (is_object($debugObject)) { $debugObject->debugLog(2, "Current tag is: " . $returnDom->tag); }

            if ($returnDom->tag == $tag)
            {
                break;
            }
            $returnDom = $returnDom->parent;
        }
        return $returnDom;
    }

    
    // get dom node's inner html
    function innertext()
    {
        if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER];
        if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);

        $ret = '';
        foreach ($this->nodes as $n)
            $ret .= $n->outertext();
        return $ret;
    }

    
    // get dom node's outer text (with tag)
    function outertext()
    {
        global $debugObject;
        if (is_object($debugObject))
        {
            $text = '';
            if ($this->tag == 'text')
            {
                if (!empty($this->text))
                {
                    $text = " with text: " . $this->text;
                }
            }
            $debugObject->debugLog(1, 'Innertext of tag: ' . $this->tag . $text);
        }

        if ($this->tag==='root') return $this->innertext();

        // trigger callback
        if ($this->dom && $this->dom->callback!==null)
        {
            call_user_func_array($this->dom->callback, array($this));
        }

        if (isset($this->_[HDOM_INFO_OUTER])) return $this->_[HDOM_INFO_OUTER];
        if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);

        // render begin tag
        if ($this->dom && $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]])
        {
            $ret = $this->dom->nodes[$this->_[HDOM_INFO_BEGIN]]->makeup();
        } else {
            $ret = "";
        }

        // render inner text
        if (isset($this->_[HDOM_INFO_INNER]))
        {
            // If it's a br tag...  don't return the HDOM_INNER_INFO that we may or may not have added.
            if ($this->tag != "br")
            {
                $ret .= $this->_[HDOM_INFO_INNER];
            }
        } else {
            if ($this->nodes)
            {
                foreach ($this->nodes as $n)
                {
                    $ret .= $this->convert_text($n->outertext());
                }
            }
        }

        // render end tag
        if (isset($this->_[HDOM_INFO_END]) && $this->_[HDOM_INFO_END]!=0)
            $ret .= '</'.$this->tag.'>';
        return $ret;
    }

    
    // get dom node's plain text
    function text()
    {
        if (isset($this->_[HDOM_INFO_INNER])) return $this->_[HDOM_INFO_INNER];
        switch ($this->nodetype)
        {
            case HDOM_TYPE_TEXT: return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);
            case HDOM_TYPE_COMMENT: return '';
            case HDOM_TYPE_UNKNOWN: return '';
        }
        if (strcasecmp($this->tag, 'script')===0) return '';
        if (strcasecmp($this->tag, 'style')===0) return '';

        $ret = '';
        // In rare cases, (always node type 1 or HDOM_TYPE_ELEMENT - observed for some span tags, and some p tags) $this->nodes is set to NULL.
        // NOTE: This indicates that there is a problem where it's set to NULL without a clear happening.
        // WHY is this happening?
        if (!is_null($this->nodes))
        {
            foreach ($this->nodes as $n)
            {
                $ret .= $this->convert_text($n->text());
            }

            // If this node is a span... add a space at the end of it so multiple spans don't run into each other.  This is plaintext after all.
            if ($this->tag == "span")
            {
                $ret .= $this->dom->default_span_text;
            }
        }
        return $ret;
    }

    function xmltext()
    {
        $ret = $this->innertext();
        $ret = str_ireplace('<![CDATA[', '', $ret);
        $ret = str_replace(']]>', '', $ret);
        return $ret;
    }

    
    // build node's text with tag
    function makeup()
    {
        // text, comment, unknown
        if (isset($this->_[HDOM_INFO_TEXT])) return $this->dom->restore_noise($this->_[HDOM_INFO_TEXT]);

        $ret = '<'.$this->tag;
        $i = -1;

        foreach ($this->attr as $key=>$val)
        {
            ++$i;

            // skip removed attribute
            if ($val===null || $val===false)
                continue;

            $ret .= $this->_[HDOM_INFO_SPACE][$i][0];
            //no value attr: nowrap, checked selected...
            if ($val===true)
                $ret .= $key;
            else {
                switch ($this->_[HDOM_INFO_QUOTE][$i])
                {
                    case HDOM_QUOTE_DOUBLE: $quote = '"'; break;
                    case HDOM_QUOTE_SINGLE: $quote = '\''; break;
                    default: $quote = '';
                }
                $ret .= $key.$this->_[HDOM_INFO_SPACE][$i][1].'='.$this->_[HDOM_INFO_SPACE][$i][2].$quote.$val.$quote;
            }
        }
        $ret = $this->dom->restore_noise($ret);
        return $ret . $this->_[HDOM_INFO_ENDSPACE] . '>';
    }

    
    // find elements by css selector
    //PaperG - added ability for find to lowercase the value of the selector.
    function find($selector, $idx=null, $lowercase=false)
    {
        $selectors = $this->parse_selector($selector);
        if (($count=count($selectors))===0) return array();
        $found_keys = array();

        // find each selector
        for ($c=0; $c<$count; ++$c)
        {
            // The change on the below line was documented on the sourceforge code tracker id 2788009
            // used to be: if (($levle=count($selectors[0]))===0) return array();
            if (($levle=count($selectors[$c]))===0) return array();
            if (!isset($this->_[HDOM_INFO_BEGIN])) return array();

            $head = array($this->_[HDOM_INFO_BEGIN]=>1);

            // handle descendant selectors, no recursive!
            for ($l=0; $l<$levle; ++$l)
            {
                $ret = array();
                foreach ($head as $k=>$v)
                {
                    $n = ($k===-1) ? $this->dom->root : $this->dom->nodes[$k];
                    //PaperG - Pass this optional parameter on to the seek function.
                    $n->seek($selectors[$c][$l], $ret, $lowercase);
                }
                $head = $ret;
            }

            foreach ($head as $k=>$v)
            {
                if (!isset($found_keys[$k]))
                    $found_keys[$k] = 1;
            }
        }

        // sort keys
        ksort($found_keys);

        $found = array();
        foreach ($found_keys as $k=>$v)
            $found[] = $this->dom->nodes[$k];

        // return nth-element or array
        if (is_null($idx)) return $found;
        else if ($idx<0) $idx = count($found) + $idx;
        return (isset($found[$idx])) ? $found[$idx] : null;
    }
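    // Usage sketch for find() (illustrative only): it assumes $html is a simple_html_dom object
    // that already holds a parsed page; the selectors and attribute values are hypothetical.
    //
    //   $links  = $html->find('a');                          // array of every <a> element
    //   $main   = $html->find('div#main', 0);                // first match, or null if none
    //   $last   = $html->find('li', -1);                     // a negative $idx counts from the end
    //   $titled = $html->find('a[title=News]', null, true);  // case-insensitive value comparison
    //
    // Matches come back in document order, because the node positions are collected as
    // array keys and sorted with ksort() before the nodes are returned.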

    
    // seek for given conditions
    // PaperG - added parameter to allow for case insensitive testing of the value of a selector.
    protected function seek($selector, &$ret, $lowercase=false)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        list($tag, $key, $val, $exp, $no_key) = $selector;

        // xpath index
        if ($tag && $key && is_numeric($key))
        {
            $count = 0;
            foreach ($this->children as $c)
            {
                if ($tag==='*' || $tag===$c->tag) {
                    if (++$count==$key) {
                        $ret[$c->_[HDOM_INFO_BEGIN]] = 1;
                        return;
                    }
                }
            }
            return;
        }

        $end = (!empty($this->_[HDOM_INFO_END])) ? $this->_[HDOM_INFO_END] : 0;
        if ($end==0) {
            $parent = $this->parent;
            while (!isset($parent->_[HDOM_INFO_END]) && $parent!==null) {
                $end -= 1;
                $parent = $parent->parent;
            }
            $end += $parent->_[HDOM_INFO_END];
        }

        for ($i=$this->_[HDOM_INFO_BEGIN]+1; $i<$end; ++$i) {
            $node = $this->dom->nodes[$i];

            $pass = true;

            if ($tag==='*' && !$key) {
                if (in_array($node, $this->children, true))
                    $ret[$i] = 1;
                continue;
            }

            // compare tag
            if ($tag && $tag!=$node->tag && $tag!=='*') {$pass=false;}
            // compare key
            if ($pass && $key) {
                if ($no_key) {
                    if (isset($node->attr[$key])) $pass=false;
                } else {
                    if (($key != "plaintext") && !isset($node->attr[$key])) $pass=false;
                }
            }
            // compare value
            if ($pass && $key && $val  && $val!=='*') {
                // If they have told us that this is a "plaintext" search then we want the plaintext of the node - right?
                if ($key == "plaintext") {
                    // $node->plaintext actually returns $node->text();
                    $nodeKeyValue = $node->text();
                } else {
                    // this is a normal search, we want the value of that attribute of the tag.
                    $nodeKeyValue = $node->attr[$key];
                }
                if (is_object($debugObject)) {$debugObject->debugLog(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}

                //PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
                if ($lowercase) {
                    $check = $this->match($exp, strtolower($val), strtolower($nodeKeyValue));
                } else {
                    $check = $this->match($exp, $val, $nodeKeyValue);
                }
                if (is_object($debugObject)) {$debugObject->debugLog(2, "after match: " . ($check ? "true" : "false"));}

                // handle multiple class
                if (!$check && strcasecmp($key, 'class')===0) {
                    foreach (explode(' ',$node->attr[$key]) as $k) {
                        // Without this, there were cases where leading, trailing, or double spaces led to our comparing blanks - bad form.
                        if (!empty($k)) {
                            if ($lowercase) {
                                $check = $this->match($exp, strtolower($val), strtolower($k));
                            } else {
                                $check = $this->match($exp, $val, $k);
                            }
                            if ($check) break;
                        }
                    }
                }
                if (!$check) $pass = false;
            }
            if ($pass) $ret[$i] = 1;
            unset($node);
        }
        // It's passed by reference so this is actually what this function returns.
        if (is_object($debugObject)) {$debugObject->debugLog(1, "EXIT - ret: ", $ret);}
    }
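    // Note on the "xpath index" branch at the top of seek() (illustrative sketch, hypothetical markup):
    // when a selector segment carries a purely numeric attribute such as li[2], the number is used
    // as a 1-based position among the current node's direct children with that tag, so
    //
    //   $html->find('ul li[2]');   // the second <li> child of each <ul>
    //
    // returns at most one element per <ul>, rather than testing for an attribute named "2".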

    protected function match($exp, $pattern, $value) {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        switch ($exp) {
            case '=':
                return ($value===$pattern);
            case '!=':
                return ($value!==$pattern);
            case '^=':
                return preg_match("/^".preg_quote($pattern,'/')."/", $value);
            case '$=':
                return preg_match("/".preg_quote($pattern,'/')."$/", $value);
            case '*=':
                if ($pattern[0]=='/') {
                    return preg_match($pattern, $value);
                }
                return preg_match("/".$pattern."/i", $value);
        }
        return false;
    }
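    // Summary of the value operators handled by match() (illustrative; attribute names are hypothetical).
    // seek() only calls match() for nodes that have the attribute, so every case assumes it is present:
    //   [rel=nofollow]    '='  : value is exactly "nofollow"
    //   [rel!=nofollow]   '!=' : value differs from "nofollow"
    //   [href^=http]      '^=' : value starts with "http"
    //   [href$=.pdf]      '$=' : value ends with ".pdf" (the pattern is escaped with preg_quote())
    //   [class*=result]   '*=' : "result" is used as a case-insensitive regular expression, which acts
    //                            like a substring test for plain words; characters such as "." or "("
    //                            are NOT escaped, and a pattern starting with "/" goes to preg_match() as-is.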

    protected function parse_selector($selector_string) {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        // pattern of CSS selectors, modified from mootools
        // Paperg: Add the colon to the attribute, so that it properly finds <tag attr:ibute="something" > like google does.
        // Note: if you try to look at this attribute, you MUST use getAttribute since $dom->x:y will fail the php syntax check.
        // Notice the \[ starting the attribute and the @? following it: an html attribute specifier may start with an @ sign that is NOT captured by the expression.
        // Further study is required to determine if this should be documented or removed.
        // The hyphen in [\w\-:] is escaped so that PCRE does not read "\w-:" as an (invalid) character range.
        //        $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
        $pattern = "/([\w\-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
        preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
        if (is_object($debugObject)) {$debugObject->debugLog(2, "Matches Array: ", $matches);}

        $selectors = array();
        $result = array();
        //print_r($matches);

        foreach ($matches as $m) {
            $m[0] = trim($m[0]);
            if ($m[0]==='' || $m[0]==='/' || $m[0]==='//') continue;
            // for browser generated xpath
            if ($m[1]==='tbody') continue;

            list($tag, $key, $val, $exp, $no_key) = array($m[1], null, null, '=', false);
            if (!empty($m[2])) {$key='id'; $val=$m[2];}
            if (!empty($m[3])) {$key='class'; $val=$m[3];}
            if (!empty($m[4])) {$key=$m[4];}
            if (!empty($m[5])) {$exp=$m[5];}
            if (!empty($m[6])) {$val=$m[6];}

            // convert to lowercase
            if ($this->dom->lowercase) {$tag=strtolower($tag); $key=strtolower($key);}
            //elements that do NOT have the specified attribute
            if (isset($key[0]) && $key[0]==='!') {$key=substr($key, 1); $no_key=true;}

            $result[] = array($tag, $key, $val, $exp, $no_key);
            if (trim($m[7])===',') {
                $selectors[] = $result;
                $result = array();
            }
        }
        if (count($result)>0)
            $selectors[] = $result;
        return $selectors;
    }
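    // What parse_selector() returns (illustrative sketch): a list of comma-separated selector groups,
    // each group a list of descendant levels, and each level an array($tag, $key, $val, $exp, $no_key).
    // For example, 'div.g a, span' should parse into two groups:
    //   group 0: level 0 -> tag 'div', key 'class', value 'g', exp '='
    //            level 1 -> tag 'a' (no attribute condition)
    //   group 1: level 0 -> tag 'span' (no attribute condition)
    // find() walks the levels of each group in turn and then merges the matches of all groups.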

    function __get($name) {
        if (isset($this->attr[$name]))
        {
            return $this->convert_text($this->attr[$name]);
        }
        switch ($name) {
            case 'outertext': return $this->outertext();
            case 'innertext': return $this->innertext();
            case 'plaintext': return $this->text();
            case 'xmltext': return $this->xmltext();
            default: return array_key_exists($name, $this->attr);
        }
    }

    function __set($name, $value) {
        switch ($name) {
            case 'outertext': return $this->_[HDOM_INFO_OUTER] = $value;
            case 'innertext':
                if (isset($this->_[HDOM_INFO_TEXT])) return $this->_[HDOM_INFO_TEXT] = $value;
                return $this->_[HDOM_INFO_INNER] = $value;
        }
        if (!isset($this->attr[$name])) {
            $this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
            $this->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
        }
        $this->attr[$name] = $value;
    }

    function __isset($name) {
        switch ($name) {
            case 'outertext': return true;
            case 'innertext': return true;
            case 'plaintext': return true;
        }
        //no value attr: nowrap, checked selected...
        return (array_key_exists($name, $this->attr)) ? true : isset($this->attr[$name]);
    }

    function __unset($name) {
        if (isset($this->attr[$name]))
            unset($this->attr[$name]);
    }

    
    // PaperG - Function to convert the text from one character set to another if the two sets are not the same.
    function convert_text($text)
    {
        global $debugObject;
        if (is_object($debugObject)) {$debugObject->debugLogEntry(1);}

        $converted_text = $text;

        $sourceCharset = "";
        $targetCharset = "";

        if ($this->dom)
        {
            $sourceCharset = strtoupper($this->dom->_charset);
            $targetCharset = strtoupper($this->dom->_target_charset);
        }
        if (is_object($debugObject)) {$debugObject->debugLog(3, "source charset: " . $sourceCharset . " target charset: " . $targetCharset);}

        if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
        {
            // Check if the reported encoding could have been incorrect and the text is actually already UTF-8
            if ((strcasecmp($targetCharset, 'UTF-8') == 0) && ($this->is_utf8($text)))
            {
                $converted_text = $text;
            }
            else
            {
                $converted_text = iconv($sourceCharset, $targetCharset, $text);
            }
        }

        // Let's make sure that we don't have that silly BOM issue with any of the utf-8 text we output.
        if ($targetCharset == 'UTF-8')
        {
            if (substr($converted_text, 0, 3) == "\xef\xbb\xbf")
            {
                $converted_text = substr($converted_text, 3);
            }
            if (substr($converted_text, -3) == "\xef\xbb\xbf")
            {
                $converted_text = substr($converted_text, 0, -3);
            }
        }

        return $converted_text;
    }

    
    /**
    * Returns true if $str is valid UTF-8 and false otherwise.
    *
    * @param mixed $str String to be tested
    * @return boolean
    */
    static function is_utf8($str)
    {
        $c=0; $b=0;
        $bits=0;
        $len=strlen($str);
        for($i=0; $i<$len; $i++)
        {
            $c=ord($str[$i]);
            if($c > 128)
            {
                if(($c >= 254)) return false;
                elseif($c >= 252) $bits=6;
                elseif($c >= 248) $bits=5;
                elseif($c >= 240) $bits=4;
                elseif($c >= 224) $bits=3;
                elseif($c >= 192) $bits=2;
                else return false;
                if(($i+$bits) > $len) return false;
                while($bits > 1)
                {
                    $i++;
                    $b=ord($str[$i]);
                    if($b < 128 || $b > 191) return false;
                    $bits--;
                }
            }
        }
        return true;
    }
    /*
    function is_utf8($string)
    {
        //this is buggy
        return (utf8_encode(utf8_decode($string)) == $string);
    }
    */
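    // Illustrative checks for is_utf8() (a sketch, not exhaustive):
    //   simple_html_dom_node::is_utf8("caf\xc3\xa9");  // true:  "é" as the two-byte UTF-8 sequence C3 A9
    //   simple_html_dom_node::is_utf8("caf\xe9");      // false: the Latin-1 byte E9 announces a 3-byte sequence, but the string ends first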

    /**
     * Function to try a few tricks to determine the displayed size of an img on the page.
     * NOTE: This will ONLY work on an IMG tag. Returns FALSE on all other tag types.
     *
     * @author John Schlick
     * @version April 19 2012
     * @return array|false an array containing the 'height' and 'width' of the image on the page (-1 for any dimension we can't figure out), or FALSE if this is not an img tag.
     */
    function get_display_size()
    {
        global $debugObject;

        $width = -1;
        $height = -1;

        if ($this->tag !== 'img')
        {
            return false;
        }

        // See if there is a height or width attribute in the tag itself.
        if (isset($this->attr['width']))
        {
            $width = $this->attr['width'];
        }

        if (isset($this->attr['height']))
        {
            $height = $this->attr['height'];
        }

        // Now look for an inline style.
        if (isset($this->attr['style']))
        {
            // Thanks to user gnarf from stackoverflow for this regular expression.
            $attributes = array();
            preg_match_all("/([\w-]+)\s*:\s*([^;]+)\s*;?/", $this->attr['style'], $matches, PREG_SET_ORDER);
            foreach ($matches as $match) {
                $attributes[$match[1]] = $match[2];
            }

            // If there is a width in the style attributes:
            if (isset($attributes['width']) && $width == -1)
            {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['width'], -2)) == 'px')
                {
                    $proposed_width = substr($attributes['width'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_width, FILTER_VALIDATE_INT))
                    {
                        $width = $proposed_width;
                    }
                }
            }

            // If there is a height in the style attributes:
            if (isset($attributes['height']) && $height == -1)
            {
                // check that the last two characters are px (pixels)
                if (strtolower(substr($attributes['height'], -2)) == 'px')
                {
                    $proposed_height = substr($attributes['height'], 0, -2);
                    // Now make sure that it's an integer and not something stupid.
                    if (filter_var($proposed_height, FILTER_VALIDATE_INT))
                    {
                        $height = $proposed_height;
                    }
                }
            }

        }

        // Future enhancement:
        // Look in the tag to see if there is a class or id specified that has a height or width attribute to it.

        // Far future enhancement
        // Look at all the parent tags of this image to see if they specify a class or id that has an img selector that specifies a height or width
        // Note that in this case, the class or id will have the img subselector for it to apply to the image.

        // ridiculously far future development
        // If the class or id is specified in a SEPARATE css file that's not on the page, go get it and do what we were just doing for the ones on the page.

        $result = array('height' => $height,
                        'width' => $width);
        return $result;
    }
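    // Illustrative example for get_display_size() (hypothetical markup): for
    //   <img src="logo.png" width="120" style="height: 80px">
    // the width comes from the width attribute and the height from the inline style, so the call
    // should return array('height' => '80', 'width' => '120'). The values are the strings found in
    // the markup; any dimension that cannot be determined stays at -1.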

    
    // camel naming conventions
    function getAllAttributes() {return $this->attr;}
    function getAttribute($name) {return $this->__get($name);}
    function setAttribute($name, $value) {$this->__set($name, $value);}
    function hasAttribute($name) {return $this->__isset($name);}
    function removeAttribute($name) {$this->__set($name, null);}
    function getElementById($id) {return $this->find("#$id", 0);}
    function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}
    function getElementByTagName($name) {return $this->find($name, 0);}
    function getElementsByTagName($name, $idx=null) {return $this->find($name, $idx);}
    function parentNode() {return $this->parent();}
    function childNodes($idx=-1) {return $this->children($idx);}
    function firstChild() {return $this->first_child();}
    function lastChild() {return $this->last_child();}
    function nextSibling() {return $this->next_sibling();}
    function previousSibling() {return $this->prev_sibling();}
    function hasChildNodes() {return $this->has_child();}
    function nodeName() {return $this->tag;}
    function appendChild($node) {$node->parent($this); return $node;}

}

/**
 * simple html dom parser
 * Paperg - in the find routine: allow us to specify that we want case insensitive testing of the value of the selector.
 * Paperg - change $size from protected to public so we can easily access it
 * Paperg - added ForceTagsClosed in the constructor which tells us whether we trust the html or not.  Default is to NOT trust it.
 *
 * @package PlaceLocalInclude
 */
class simple_html_dom
{
    public $root = null;
    public $nodes = array();
    public $callback = null;
    public $lowercase = false;
    // Used to keep track of how large the text was when we started.
    public $original_size;
    public $size;
    protected $pos;
    protected $doc;
    protected $char;
    protected $cursor;
    protected $parent;
    protected $noise = array();
    protected $token_blank = " \t\r\n";
    protected $token_equal = ' =/>';
    protected $token_slash = " />\r\n\t";
    protected $token_attr = ' >';
    // Note that this is referenced by a child node, and so it needs to be public for that node to see this information.
    public $_charset = '';
    public $_target_charset = '';
    protected $default_br_text = "";
    public $default_span_text = "";

    // use isset instead of in_array, performance boost about 30%...
    protected $self_closing_tags = array('img'=>1, 'br'=>1, 'input'=>1, 'meta'=>1, 'link'=>1, 'hr'=>1, 'base'=>1, 'embed'=>1, 'spacer'=>1);
    protected $block_tags = array('root'=>1, 'body'=>1, 'form'=>1, 'div'=>1, 'span'=>1, 'table'=>1);
    // Known sourceforge issue #2977341
    // B tags that are not closed cause us to return everything to the end of the document.
    protected $optional_closing_tags = array(
        'tr'=>array('tr'=>1, 'td'=>1, 'th'=>1),
        'th'=>array('th'=>1),
        'td'=>array('td'=>1),
        'li'=>array('li'=>1),
        'dt'=>array('dt'=>1, 'dd'=>1),
        'dd'=>array('dd'=>1, 'dt'=>1),
        'dl'=>array('dd'=>1, 'dt'=>1),
        'p'=>array('p'=>1),
        'nobr'=>array('nobr'=>1),
        'b'=>array('b'=>1),
        'option'=>array('option'=>1),
    );

    function __construct($str=null, $lowercase=true, $forceTagsClosed=true, $target_charset=DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        // Forcing tags to be closed implies that we don't trust the html, but it can lead to parsing errors if we SHOULD trust the html.
        // Set this up before loading so that it also applies to $str below.
        if (!$forceTagsClosed) {
            $this->optional_closing_tags = array();
        }
        $this->_target_charset = $target_charset;

        if ($str)
        {
            if (preg_match("/^http:\/\//i",$str) || is_file($str))
            {
                $this->load_file($str);
            }
            else
            {
                $this->load($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
            }
        }
    }

    function __destruct()
    {
        $this->clear();
    }

    
    // load html from string
    function load($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        global $debugObject;

        // prepare
        $this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
        // strip out comments
        $this->remove_noise("'<!--(.*?)-->'is");
        // strip out cdata
        $this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
        // Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
        // Script tags removal now precedes style tag removal.
        // strip out <script> tags
        $this->remove_noise("'<\s*script[^>]*[^/]>(.*?)<\s*/\s*script\s*>'is");
        $this->remove_noise("'<\s*script\s*>(.*?)<\s*/\s*script\s*>'is");
        // strip out <style> tags
        $this->remove_noise("'<\s*style[^>]*[^/]>(.*?)<\s*/\s*style\s*>'is");
        $this->remove_noise("'<\s*style\s*>(.*?)<\s*/\s*style\s*>'is");
        // strip out preformatted tags
        $this->remove_noise("'<\s*(?:code)[^>]*>(.*?)<\s*/\s*(?:code)\s*>'is");
        // strip out server side scripts
        $this->remove_noise("'(<\?)(.*?)(\?>)'s", true);
        // strip smarty scripts
        $this->remove_noise("'(\{\w)(.*?)(\})'s", true);

        // parsing
        while ($this->parse());
        // end
        $this->root->_[HDOM_INFO_END] = $this->cursor;
        $this->parse_charset();

        // make load function chainable
        return $this;

    }
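    // Usage sketch for load() (illustrative only; the markup and variable names are hypothetical):
    //
    //   $html = new simple_html_dom();
    //   $html->load('<div id="main"><a href="/x">X</a></div>');
    //   echo $html->find('#main a', 0)->href;   // should print /x
    //   $html->clear();                          // release the node graph when done
    //
    // Because load() returns $this, loading and querying can also be chained:
    //   $first = $html->load($markup)->find('a', 0);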

    
    // load html from file
    function load_file()
    {
        $args = func_get_args();
        $this->load(call_user_func_array('file_get_contents', $args), true);
        // Clear the dom and return false if the last PHP error indicates that the file could not be loaded.
        if (($error=error_get_last())!==null) {
            $this->clear();
            return false;
        }
    }

    
    // set callback function
    function set_callback($function_name)
    {
        $this->callback = $function_name;
    }

    
    // remove callback function
    function remove_callback()
    {
        $this->callback = null;
    }
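    // Usage sketch for set_callback() (illustrative; the function name strip_scripts is hypothetical):
    // outertext() calls the callback with each node as it is rendered, so the callback can rewrite
    // or blank out elements before save() produces the final markup.
    //
    //   function strip_scripts($element) {
    //       if ($element->tag == 'script') $element->outertext = '';
    //   }
    //   $html->set_callback('strip_scripts');
    //   $clean = $html->save();   // markup without the <script> elements
    //   $html->remove_callback();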

    
    // save dom as string
    function save($filepath='')
    {
        $ret = $this->root->innertext();
        if ($filepath!=='') file_put_contents($filepath, $ret, LOCK_EX);
        return $ret;
    }

    
    // find dom node by css selector
    // Paperg - allow us to specify that we want case insensitive testing of the value of the selector.
    function find($selector, $idx=null, $lowercase=false)
    {
        return $this->root->find($selector, $idx, $lowercase);
    }

    
    // clean up memory due to php5 circular references memory leak...
    function clear()
    {
        foreach ($this->nodes as $n) {$n->clear(); $n = null;}
        // This next line is documented in the sourceforge repository (2977248) as a fix for ongoing memory leaks that occur even with the use of clear.
        if (isset($this->children)) foreach ($this->children as $n) {$n->clear(); $n = null;}
        if (isset($this->parent)) {$this->parent->clear(); unset($this->parent);}
        if (isset($this->root)) {$this->root->clear(); unset($this->root);}
        unset($this->doc);
        unset($this->noise);
    }

    function dump($show_attr=true)
    {
        $this->root->dump($show_attr);
    }

    
    // prepare HTML data and init everything
    protected function prepare($str, $lowercase=true, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
    {
        $this->clear();

        // set the length of content before we do anything to it.
        $this->size = strlen($str);
        // Save the original size of the html that we got in.  It might be useful to someone.
        $this->original_size = $this->size;

        // before we save the string as the doc... replace each \r and \n with a space if we are told to.
        if ($stripRN) {
            $str = str_replace("\r", " ", $str);
            $str = str_replace("\n", " ", $str);

            // set the length of content since we have changed it.
            $this->size = strlen($str);
        }

        $this->doc = $str;
        $this->pos = 0;
        $this->cursor = 1;
        $this->noise = array();
        $this->nodes = array();
        $this->lowercase = $lowercase;
        $this->default_br_text = $defaultBRText;
        $this->default_span_text = $defaultSpanText;
        $this->root = new simple_html_dom_node($this);
        $this->root->tag = 'root';
        $this->root->_[HDOM_INFO_BEGIN] = -1;
        $this->root->nodetype = HDOM_TYPE_ROOT;
        $this->parent = $this->root;
        if ($this->size>0) $this->char = $this->doc[0];
    }
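
    // Worked example for $stripRN=true: every "\r" and every "\n" becomes one space,
    // so the input "a\r\nb" (4 bytes) is stored as "a  b" (4 bytes, two spaces).
    // Because the replacement is one-for-one, $this->size does not change.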

    
    // parse html content
    protected function parse()
    {
        if (($s = $this->copy_until_char('<'))==='')
        {
            return $this->read_tag();
        }

        // text
        $node = new simple_html_dom_node($this);
        ++$this->cursor;
        $node->_[HDOM_INFO_TEXT] = $s;
        $this->link_nodes($node, false);
        return true;
    }

    
    // PAPERG - dkchou - added this to try to identify the character set of the page we have just parsed so we know better how to spit it out later.
    // NOTE:  IF you provide a routine called get_last_retrieve_url_contents_content_type which returns the CURLINFO_CONTENT_TYPE from the last curl_exec
    // (or the content_type header from the last transfer), we will parse THAT, and if a charset is specified, we will use it over any other mechanism.
    protected function parse_charset()
    {
        global $debugObject;

        $charset = null;

        if (function_exists('get_last_retrieve_url_contents_content_type'))
        {
            $contentTypeHeader = get_last_retrieve_url_contents_content_type();
            $success = preg_match('/charset=(.+)/', $contentTypeHeader, $matches);
            if ($success)
            {
                $charset = $matches[1];
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'header content-type found charset of: ' . $charset);}
            }
        }

        if (empty($charset))
        {
            $el = $this->root->find('meta[http-equiv=Content-Type]', 0);
            if (!empty($el))
            {
                $fullvalue = $el->content;
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag found' . $fullvalue);}

                if (!empty($fullvalue))
                {
                    $success = preg_match('/charset=(.+)/', $fullvalue, $matches);
                    if ($success)
                    {
                        $charset = $matches[1];
                    }
                    else
                    {
                        // If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
                        if (is_object($debugObject)) {$debugObject->debugLog(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
                        $charset = 'ISO-8859-1';
                    }
                }
            }
        }

        // If we couldn't find a charset above, then let's try to detect one based on the text we got...
        if (empty($charset))
        {
            // Have php try to detect the encoding from the text given to us.
            $charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
            if (is_object($debugObject)) {$debugObject->debugLog(2, 'mb_detect found: ' . $charset);}

            // and if this doesn't work...  then we need to just wrongheadedly assume it's UTF-8 so that we can move on - cause this will usually give us most of what we need...
            if ($charset === false)
            {
                if (is_object($debugObject)) {$debugObject->debugLog(2, 'since mb_detect failed - using default of utf-8');}
                $charset = 'UTF-8';
            }
        }

        // Since CP1252 is a superset, if we get one of its subsets, we want it instead.
        if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1')))
        {
            if (is_object($debugObject)) {$debugObject->debugLog(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
            $charset = 'CP1252';
        }

        if (is_object($debugObject)) {$debugObject->debugLog(1, 'EXIT - ' . $charset);}

        return $this->_charset = $charset;
    }
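
    // Sketch of the optional hook mentioned above parse_charset() (hypothetical code,
    // not part of this library; the global variable name is an assumption). Define it
    // before parsing so the charset from the HTTP header wins over <meta> tags:
    //
    //   function get_last_retrieve_url_contents_content_type() {
    //       return isset($GLOBALS['last_content_type']) ? $GLOBALS['last_content_type'] : '';
    //   }
    //
    //   // after each curl_exec($ch):
    //   $GLOBALS['last_content_type'] = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    //
    // Resolution order implemented above: header charset, then
    // <meta http-equiv=Content-Type> (ISO-8859-1 if that tag names no charset), then
    // mb_detect_encoding() with a UTF-8 fallback; an ISO-8859-1 / Latin1 / Latin-1
    // result is finally replaced by CP1252.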

    
    // read tag info
    protected function read_tag()
    {
        if ($this->char!=='<')
        {
            $this->root->_[HDOM_INFO_END] = $this->cursor;
            return false;
        }
        $begin_tag_pos = $this->pos;
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next

        // end tag
        if ($this->char==='/')
        {
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            // This represents the change in the simple_html_dom trunk from revision 180 to 181.
            // $this->skip($this->token_blank_t);
            $this->skip($this->token_blank);
            $tag = $this->copy_until_char('>');

            // skip attributes in end tag
            if (($pos = strpos($tag, ' '))!==false)
                $tag = substr($tag, 0, $pos);

            $parent_lower = strtolower($this->parent->tag);
            $tag_lower = strtolower($tag);

            if ($parent_lower!==$tag_lower)
            {
                if (isset($this->optional_closing_tags[$parent_lower]) && isset($this->block_tags[$tag_lower]))
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $org_parent = $this->parent;

                    while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
                        $this->parent = $this->parent->parent;

                    if (strtolower($this->parent->tag)!==$tag_lower) {
                        $this->parent = $org_parent; // restore original parent
                        if ($this->parent->parent) $this->parent = $this->parent->parent;
                        $this->parent->_[HDOM_INFO_END] = $this->cursor;
                        return $this->as_text_node($tag);
                    }
                }
                else if (($this->parent->parent) && isset($this->block_tags[$tag_lower]))
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $org_parent = $this->parent;

                    while (($this->parent->parent) && strtolower($this->parent->tag)!==$tag_lower)
                        $this->parent = $this->parent->parent;

                    if (strtolower($this->parent->tag)!==$tag_lower)
                    {
                        $this->parent = $org_parent; // restore original parent
                        $this->parent->_[HDOM_INFO_END] = $this->cursor;
                        return $this->as_text_node($tag);
                    }
                }
                else if (($this->parent->parent) && strtolower($this->parent->parent->tag)===$tag_lower)
                {
                    $this->parent->_[HDOM_INFO_END] = 0;
                    $this->parent = $this->parent->parent;
                }
                else
                    return $this->as_text_node($tag);
            }

            $this->parent->_[HDOM_INFO_END] = $this->cursor;
            if ($this->parent->parent) $this->parent = $this->parent->parent;

            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        $node = new simple_html_dom_node($this);
        $node->_[HDOM_INFO_BEGIN] = $this->cursor;
        ++$this->cursor;
        $tag = $this->copy_until($this->token_slash);
        $node->tag_start = $begin_tag_pos;

        // doctype, cdata & comments...
        if (isset($tag[0]) && $tag[0]==='!') {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until_char('>');

            if (isset($tag[2]) && $tag[1]==='-' && $tag[2]==='-') {
                $node->nodetype = HDOM_TYPE_COMMENT;
                $node->tag = 'comment';
            } else {
                $node->nodetype = HDOM_TYPE_UNKNOWN;
                $node->tag = 'unknown';
            }
            if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
            $this->link_nodes($node, true);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // text
        if (($pos = strpos($tag, '<'))!==false) {
            $tag = '<' . substr($tag, 0, -1);
            $node->_[HDOM_INFO_TEXT] = $tag;
            $this->link_nodes($node, false);
            $this->char = $this->doc[--$this->pos]; // prev
            return true;
        }

        // The hyphen is placed last in the character class: PCRE2 (PHP 7.3+) rejects
        // "[\w-:]" as an invalid range, which would turn every tag into a text node.
        if (!preg_match("/^[\w:-]+$/", $tag)) {
            $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until('<>');
            if ($this->char==='<') {
                $this->link_nodes($node, false);
                return true;
            }

            if ($this->char==='>') $node->_[HDOM_INFO_TEXT].='>';
            $this->link_nodes($node, false);
            $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
            return true;
        }

        // begin tag
        $node->nodetype = HDOM_TYPE_ELEMENT;
        $tag_lower = strtolower($tag);
        $node->tag = ($this->lowercase) ? $tag_lower : $tag;

        // handle optional closing tags
        if (isset($this->optional_closing_tags[$tag_lower]) )
        {
            while (isset($this->optional_closing_tags[$tag_lower][strtolower($this->parent->tag)]))
            {
                $this->parent->_[HDOM_INFO_END] = 0;
                $this->parent = $this->parent->parent;
            }
            $node->parent = $this->parent;
        }

        $guard = 0; // prevent infinite loop
        $space = array($this->copy_skip($this->token_blank), '', '');

        // attributes
        do
        {
            if ($this->char!==null && $space[0]==='')
            {
                break;
            }
            $name = $this->copy_until($this->token_equal);
            if ($guard===$this->pos)
            {
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                continue;
            }
            $guard = $this->pos;

            // handle endless '<'
            if ($this->pos>=$this->size-1 && $this->char!=='>') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = '<' . $tag . $space[0] . $name;
                $node->tag = 'text';
                $this->link_nodes($node, false);
                return true;
            }

            // handle mismatch '<'
            if ($this->doc[$this->pos-1]=='<') {
                $node->nodetype = HDOM_TYPE_TEXT;
                $node->tag = 'text';
                $node->attr = array();
                $node->_[HDOM_INFO_END] = 0;
                $node->_[HDOM_INFO_TEXT] = substr($this->doc, $begin_tag_pos, $this->pos-$begin_tag_pos-1);
                $this->pos -= 2;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $this->link_nodes($node, false);
                return true;
            }

            if ($name!=='/' && $name!=='') {
                $space[1] = $this->copy_skip($this->token_blank);
                $name = $this->restore_noise($name);
                if ($this->lowercase) $name = strtolower($name);
                if ($this->char==='=') {
                    $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                    $this->parse_attr($node, $name, $space);
                }
                else {
                    // no value attr: nowrap, checked selected...
                    $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
                    $node->attr[$name] = true;
                    if ($this->char!='>') $this->char = $this->doc[--$this->pos]; // prev
                }
                $node->_[HDOM_INFO_SPACE][] = $space;
                $space = array($this->copy_skip($this->token_blank), '', '');
            }
            else
                break;
        } while ($this->char!=='>' && $this->char!=='/');

        $this->link_nodes($node, true);
        $node->_[HDOM_INFO_ENDSPACE] = $space[0];

        // check self closing
        if ($this->copy_until_char_escape('>')==='/')
        {
            $node->_[HDOM_INFO_ENDSPACE] .= '/';
            $node->_[HDOM_INFO_END] = 0;
        }
        else
        {
            // reset parent
            if (!isset($this->self_closing_tags[strtolower($node->tag)])) $this->parent = $node;
        }
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next

        // If it's a BR tag, we need to set its text to the default text.
        // This way when we see it in plaintext, we can generate formatting that the user wants.
        // Since a br tag never has sub nodes, this works well.
        if ($node->tag == "br")
        {
            $node->_[HDOM_INFO_INNER] = $this->default_br_text;
        }

        return true;
    }
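
    // What read_tag() produces for a few inputs (traced from the code above; the
    // optional_closing_tags, block_tags and self_closing_tags tables are declared
    // elsewhere in this class, so the last two cases depend on their contents):
    //
    //   "<!-- x -->"        node with tag 'comment' (HDOM_TYPE_COMMENT)
    //   "<!DOCTYPE html>"   node with tag 'unknown' (HDOM_TYPE_UNKNOWN)
    //   "<br/>"             element ended at once (HDOM_INFO_END = 0) whose inner
    //                       text is $this->default_br_text
    //   "</b>" when neither the current parent nor its parent is a <b>:
    //                       kept as the text node "</b>"
    //   "<p>one<p>two"      two sibling <p> elements, provided 'p' is listed as
    //                       closing an open 'p' in optional_closing_tags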

    
    // parse attributes
    protected function parse_attr($node, $name, &$space)
    {
        // Per sourceforge: http://sourceforge.net/tracker/?func=detail&aid=3061408&group_id=218559&atid=1044037
        // If the attribute is already defined inside a tag, only pay attention to the first one as opposed to the last one.
        if (isset($node->attr[$name]))
        {
            return;
        }

        $space[2] = $this->copy_skip($this->token_blank);
        switch ($this->char) {
            case '"':
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('"'));
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                break;
            case '\'':
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_SINGLE;
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                $node->attr[$name] = $this->restore_noise($this->copy_until_char_escape('\''));
                $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
                break;
            default:
                $node->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_NO;
                $node->attr[$name] = $this->restore_noise($this->copy_until($this->token_attr));
        }
        // PaperG: Attributes should not have \r or \n in them, that counts as html whitespace.
        $node->attr[$name] = str_replace("\r", "", $node->attr[$name]);
        $node->attr[$name] = str_replace("\n", "", $node->attr[$name]);
        // PaperG: If this is a "class" selector, let's get rid of the preceding and trailing space since some people leave it in the multi class case.
        if ($name == "class") {
            $node->attr[$name] = trim($node->attr[$name]);
        }
    }
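
    // Worked example, traced through read_tag() and parse_attr() with
    // $this->lowercase = true (the default in prepare()), assuming the token_*
    // delimiter strings declared elsewhere in this class treat a space and '>'
    // as separators:
    //
    //   <div ID=main class=" a b " title='x' hidden data-n=5>
    //
    //   attr['id']     => 'main'  (name lowercased, unquoted value)
    //   attr['class']  => 'a b'   (class values are trimmed)
    //   attr['title']  => 'x'
    //   attr['hidden'] => true    (no value; handled in read_tag())
    //   attr['data-n'] => '5'
    //
    // If the same attribute name appears twice, only the first value is kept.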

    
    // link the node to the current parent (and to the parent's child-element list when $is_child is true)
    protected function link_nodes(&$node, $is_child)
    {
        $node->parent = $this->parent;
        $this->parent->nodes[] = $node;
        if ($is_child)
        {
            $this->parent->children[] = $node;
        }
    }

    
    // keep an end tag that could not be matched as a plain text node
    protected function as_text_node($tag)
    {
        $node = new simple_html_dom_node($this);
        ++$this->cursor;
        $node->_[HDOM_INFO_TEXT] = '</' . $tag . '>';
        $this->link_nodes($node, false);
        $this->char = (++$this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        return true;
    }

    // advance past every consecutive character that appears in $chars
    protected function skip($chars)
    {
        $this->pos += strspn($this->doc, $chars, $this->pos);
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
    }

    // like skip(), but also return the characters that were skipped
    protected function copy_skip($chars)
    {
        $pos = $this->pos;
        $len = strspn($this->doc, $chars, $pos);
        $this->pos += $len;
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        if ($len===0) return '';
        return substr($this->doc, $pos, $len);
    }

    // return the text up to the first character found in $chars and stop on that character
    protected function copy_until($chars)
    {
        $pos = $this->pos;
        $len = strcspn($this->doc, $chars, $pos);
        $this->pos += $len;
        $this->char = ($this->pos<$this->size) ? $this->doc[$this->pos] : null; // next
        return substr($this->doc, $pos, $len);
    }

    // like copy_until() for a single stop character; if $char never occurs, the rest of the document is returned
    protected function copy_until_char($char)
    {
        if ($this->char===null) return '';

        if (($pos = strpos($this->doc, $char, $this->pos))===false) {
            $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
            $this->char = null;
            $this->pos = $this->size;
            return $ret;
        }

        if ($pos===$this->pos) return '';
        $pos_old = $this->pos;
        $this->char = $this->doc[$pos];
        $this->pos = $pos;
        return substr($this->doc, $pos_old, $pos-$pos_old);
    }

    // like copy_until_char(), but a $char preceded by a backslash does not end the copy
    protected function copy_until_char_escape($char)
    {
        if ($this->char===null) return '';

        $start = $this->pos;
        while (1)
        {
            if (($pos = strpos($this->doc, $char, $start))===false)
            {
                $ret = substr($this->doc, $this->pos, $this->size-$this->pos);
                $this->char = null;
                $this->pos = $this->size;
                return $ret;
            }

            if ($pos===$this->pos) return '';

            if ($this->doc[$pos-1]==='\\') {
                $start = $pos+1;
                continue;
            }

            $pos_old = $this->pos;
            $this->char = $this->doc[$pos];
            $this->pos = $pos;
            return substr($this->doc, $pos_old, $pos-$pos_old);
        }
    }
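
    // Example: for  title="say \"hi\""  parse_attr() calls copy_until_char_escape('"')
    // right after the opening quote. Both inner quotes follow a backslash and are
    // skipped, so the value returned is  say \"hi\"  with the backslashes kept
    // (nothing is unescaped).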

    
    // remove noise from html content
    // save the noise in the $this->noise array.
    protected function remove_noise($pattern, $remove_tag=false)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        $count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);

        for ($i=$count-1; $i>-1; --$i)
        {
            $key = '___noise___'.sprintf('% 5d', count($this->noise)+1000);
            if (is_object($debugObject)) { $debugObject->debugLog(2, 'key is: ' . $key); }
            // replace the whole match (group 0) or only the first capture group (group 1)
            $idx = ($remove_tag) ? 0 : 1;
            $this->noise[$key] = $matches[$i][$idx][0];
            $this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
        }

        // reset the length of content
        $this->size = strlen($this->doc);
        if ($this->size>0)
        {
            $this->char = $this->doc[0];
        }
    }

    
    // restore noise to html content
    function restore_noise($text)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        while (($pos=strpos($text, '___noise___'))!==false)
        {
            // Sometimes there is a broken piece of markup, and we don't GET the pos+11 etc... token which indicates a problem outside of us...
            if (strlen($text) > $pos+15)
            {
                $key = '___noise___'.$text[$pos+11].$text[$pos+12].$text[$pos+13].$text[$pos+14].$text[$pos+15];
                if (is_object($debugObject)) { $debugObject->debugLog(2, 'located key of: ' . $key); }

                if (isset($this->noise[$key]))
                {
                    $text = substr($text, 0, $pos).$this->noise[$key].substr($text, $pos+16);
                }
                else
                {
                    // do this to prevent an infinite loop: only the 5-character suffix of the key is kept,
                    // because re-inserting the full '___noise___' key would be found again by the while loop.
                    $text = substr($text, 0, $pos).'UNDEFINED NOISE FOR KEY: '.substr($key, 11).substr($text, $pos+16);
                }
            }
            else
            {
                // There is no valid key being given back to us... We must get rid of the ___noise___ or we will have a problem.
                $text = substr($text, 0, $pos).'NO NUMERIC NOISE KEY' . substr($text, $pos+11);
            }
        }
        return $text;
    }
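
    // Placeholder layout (worked example): remove_noise() builds each key as
    // '___noise___' . sprintf('% 5d', count($this->noise)+1000), so the first key is
    // '___noise___ 1000': an 11-character prefix plus 5 space-padded digits, 16 characters
    // in total. That fixed width is why restore_noise() reads $text[$pos+11] through
    // $text[$pos+15] and resumes at $pos+16. remove_noise() also walks its matches from
    // last to first, so the offsets captured by preg_match_all() stay valid while the
    // document is being rewritten.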

    
    // Sometimes we NEED one of the noise elements: return the first saved noise string that contains $text, or null if none does.
    function search_noise($text)
    {
        global $debugObject;
        if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

        foreach($this->noise as $noiseElement)
        {
            if (strpos($noiseElement, $text)!==false)
            {
                return $noiseElement;
            }
        }
    }

    function __toString()
    {
        return $this->root->innertext();
    }

    function __get($name)
    {
        switch ($name)
        {
            case 'outertext':
                return $this->root->innertext();
            case 'innertext':
                return $this->root->innertext();
            case 'plaintext':
                return $this->root->text();
            case 'charset':
                return $this->_charset;
            case 'target_charset':
                return $this->_target_charset;
        }
    }

    
    // camel naming conventions
    function childNodes($idx=-1) {return $this->root->childNodes($idx);}
    function firstChild() {return $this->root->first_child();}
    function lastChild() {return $this->root->last_child();}
    function createElement($name, $value=null) {return @str_get_html("<$name>$value</$name>")->first_child();}
    function createTextNode($value) {return @end(str_get_html($value)->nodes);}
    function getElementById($id) {return $this->find("#$id", 0);}
    function getElementsById($id, $idx=null) {return $this->find("#$id", $idx);}
    function getElementByTagName($name) {return $this->find($name, 0);}
    // $idx defaults to null so that all matches are returned (find() treats -1 as "last match")
    function getElementsByTagName($name, $idx=null) {return $this->find($name, $idx);}
    // forward the caller's arguments to load_file() individually instead of as one array
    function loadFile() {$args = func_get_args(); return call_user_func_array(array($this, 'load_file'), $args);}
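
    // Usage sketch for the DOM-style aliases above (assumes $html came from
    // str_get_html(); 'main' is a hypothetical id):
    //
    //   $main  = $html->getElementById('main');          // same as $html->find('#main', 0)
    //   $links = $html->getElementsByTagName('a', null); // null $idx: every <a> element
    //   $span  = $html->createElement('span', 'hi');     // new <span>hi</span> node, not attached to $html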
}

?>