RegEx for <title> with leading, trailing, linebreak
Question by user1377738
For most websites I can parse the title easily with the RegEx “<title>(.*)</title>” or “<title>\s*(.+?)\s*</title>”. However, some sites format it a bit differently, like http://www.youtube.com (see below), and the expressions above do not work there. Any help catching this kind of format, and any other HTML formats?
Thanks
-Tim.
<title>
YouTube - Broadcast Yourself.
Answer by Fèlix Galindo Allué
If you want the regular expression to include the line break, in most cases you only need to use \n inside the expression. That said, which language/interpreter are you using? Some of them don't allow multiline expressions.
If they are permitted, something like (.|\n|\r)* would suffice.
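As a rough illustration of matching across line breaks, here is a minimal sketch in Python (chosen only for illustration, since the asker's language is not stated). It assumes the engine has a "dot matches newline" flag, which in Python is re.DOTALL; with that flag the explicit (.|\n|\r) alternation is not needed. The sample markup, including its closing </title> tag, and the helper name extract_title are assumptions made for this example.

```python
import re

# Sample markup modeled on the question; the closing tag is assumed.
html = "<html><head><title>\n  YouTube - Broadcast Yourself.\n</title></head></html>"

# re.DOTALL lets "." match newline characters as well.
# \s* on both sides keeps leading/trailing whitespace out of the captured group.
TITLE_RE = re.compile(r"<title[^>]*>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)

def extract_title(markup):
    """Return the text inside the first <title> element, or None if absent."""
    match = TITLE_RE.search(markup)
    return match.group(1) if match else None

title = extract_title(html)
```

Note that a repeated group such as (.|\n|\r)* typically captures only the last character it matched, so wrapping it in an outer group, or using a flag like the one above, is usually needed to get the whole title back.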
If your language or interpreter does not support multiline regular expressions, you can always replace the newline characters with spaces and then pass the resulting string to the regular expression parser. Again, that depends on your programming environment.
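The replace-then-match workaround could look roughly like this in Python (again only an illustrative choice of language). The function name extract_title_flattened is hypothetical, and the pattern is the single-line one from the question.

```python
import re

def extract_title_flattened(markup):
    """Flatten line breaks to spaces, then apply a single-line pattern."""
    # Step 1: replace carriage returns and newlines with spaces.
    flattened = markup.replace("\r", " ").replace("\n", " ")
    # Step 2: "." no longer has to cross a line break, so no flags are needed.
    match = re.search(r"<title>\s*(.+?)\s*</title>", flattened)
    return match.group(1) if match else None

title = extract_title_flattened("<title>\r\n  YouTube - Broadcast Yourself.\r\n</title>")
```

One side effect of this approach is that a title spanning several lines comes back with spaces where the line breaks were.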
Hope this helps!
Answer by Starx
There are various ways to get this done. If you only need the title, the PHP Simple HTML DOM Parser is more than enough.
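// Requires the PHP Simple HTML DOM Parser library.
include 'simple_html_dom.php';
$html = file_get_html('http://www.youtube.com/');
// find() with an index returns a single element; innertext holds its inner markup.
$title = trim($html->find('title', 0)->innertext);
echo $title;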
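For readers not using PHP, the same parser-based idea can be sketched with Python's standard-library html.parser module. This is not the Simple HTML DOM API, just an analogous approach. The class name TitleParser, the helper parse_title, and the inline sample markup are assumptions for illustration, and no page is fetched here.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is recorded.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse line breaks and runs of whitespace into single spaces.
        return " ".join("".join(self._parts).split())

def parse_title(markup):
    parser = TitleParser()
    parser.feed(markup)
    parser.close()
    return parser.title

sample = "<html><head><title>\n  YouTube - Broadcast Yourself.\n</title></head><body><p>x</p></body></html>"
title = parse_title(sample)
```

Because the parser handles tags and whitespace itself, the formatting of the title inside the markup does not matter.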