February 28, 2012

Better Way to write this to escape HTML content

Question by Viren

I have a string containing rich-text content, something like this for example:

<p>Hello</p>

<br/>

<p> Christian </p>

<pre> Don't Know what to do </pre>

Now, I don't want a script to be present in the above content, and if one is present I want to escape it.

So if I have content which looks like this:

<p>Hello</p>

<br/>

<p> Christian </p>
<script type="text/javascript"> alert("Hello")</script>
<pre> Don't Know what to do </pre>

it needs to be replaced with:

<p>Hello</p>

<br/>

<p> Christian </p>
&lt;script type="text/javascript"&gt; alert("Hello")&lt;/script&gt;
<pre> Don't Know what to do </pre>

I have currently developed a regex for it, so my code looks something like this:

// Escape the first opening <script ...> tag, if there is one
var openTag = content.match(/<script(.+?)>/);
if (openTag) {
  content = content.replace(openTag[0], openTag[0].replace("<", "&lt;").replace(">", "&gt;"));
}
// Escape the first closing </script> tag, if there is one
var closeTag = content.match(/<\/script\s*>/);
if (closeTag) {
  content = content.replace(closeTag[0], closeTag[0].replace("<", "&lt;").replace(">", "&gt;"));
}

So the resulting content will have the script tags escaped.

Can anyone suggest a cleaner way to achieve this?

Answer by jensgram

Cleaner:

content = content.replace(/<(script[^>]*|\/script)>/g, '&lt;$1&gt;');

However, this is probably not the way to go about this. Why are these <script> tags in the JS string in the first place?
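
For illustration, here is a minimal sketch of the one-line replace wrapped in a function and run against the sample content from the question. The function name escapeScriptTags is made up for this example, and the pattern is slightly widened (case-insensitive, optional whitespace before the closing bracket) as an assumption about what the asker wants. It only neutralises literal script tags; it is not a general HTML sanitizer.

```javascript
// Hypothetical helper (name invented for this sketch): escapes the angle
// brackets of every opening and closing script tag, leaving other markup alone.
function escapeScriptTags(content) {
  // <script ...> or </script ...>, any letter case, captured without the brackets
  return content.replace(/<(\/?script\b[^>]*)>/gi, '&lt;$1&gt;');
}

var sample =
  '<p> Christian </p>\n' +
  '<script type="text/javascript"> alert("Hello")</script>\n' +
  "<pre> Don't Know what to do </pre>";

console.log(escapeScriptTags(sample));
```

The p and pre tags pass through unchanged, while the script line comes out with &lt; and &gt; around both of its tags.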

Answer by Starx

Not the answer you are looking for, but what if JavaScript is disabled? Are you going to let the unescaped content show up on the page? Hopefully not.

Escaping must be done on the server side, in a language like PHP, ASP.NET, etc.

In PHP, for example, htmlentities() will do just fine:

$escaped = htmlentities($content);
...
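
If the server happens to run JavaScript (Node.js, say) rather than PHP, the same idea could be sketched roughly as below. The escapeHtml helper is hypothetical and hand-written for this example; it covers only the five HTML-special characters, which is closer to PHP's htmlspecialchars() than to the full entity table of htmlentities(). Note that, like the PHP call, it escapes every tag, including the p and pre tags the question wanted to keep.

```javascript
// Hypothetical helper (not from the thread): escapes the five characters
// that are special in HTML text and attribute values.
function escapeHtml(text) {
  var entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;"
  };
  // Replace each special character with its entity in a single pass
  return String(text).replace(/[&<>"']/g, function (ch) {
    return entities[ch];
  });
}

console.log(escapeHtml('<script type="text/javascript"> alert("Hello")</script>'));
```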
