March 3, 2013

How to extract HTML tags from a web page generated at runtime

Question by shailbenq

I am using the SimpleHTMLDOM parser to extract HTML data from web pages, but I came across websites such as www.coursera.com where the page is generated at runtime.

Has anyone tried parsing such pages?

I am new to this field, so some theory on this topic would help me understand how to parse web pages.

Answer by Starx

John Resig wrote a pure JavaScript HTML parser.

Demo: http://ejohn.org/blog/pure-javascript-html-parser/

This may work for you.
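
For some background on why such pages are hard to parse, here is a minimal sketch of what a static parser sees. It uses Python's standard html.parser module as a stand-in for SimpleHTMLDOM. The page source, the TagCollector class, and the extract_tags helper are made up for illustration and are not taken from any real site or library.

```python
from html.parser import HTMLParser

# Hypothetical page source: the "courses" container is empty in the raw HTML.
# The list item is only created when a browser runs the script.
PAGE_SOURCE = """
<html>
  <body>
    <div id="courses"></div>
    <script>
      var item = document.createElement("li");
      item.textContent = "Course A";
      document.getElementById("courses").appendChild(item);
    </script>
  </body>
</html>
"""


class TagCollector(HTMLParser):
    """Records the name of every start tag found in the raw source."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def extract_tags(source):
    """Return the start tags present in the source text, in document order."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags


tags = extract_tags(PAGE_SOURCE)
print(tags)
```

The parser only reads the text it is given and never executes the script, so the li element that the script would create does not appear in the result. To extract content generated at runtime, the scripts have to run first, for example in a real or headless browser, and the resulting HTML can then be passed to a parser.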

Author: Nabin Nepal (Starx)

