
How to turn messy copied web text into clean plain text

You copy a few paragraphs from a web page or a Word doc, paste them into a plain-text field (a CMS box, an email, a code comment), and what arrives is a disaster: strange line breaks, double spaces, leftover bullet characters, sometimes literal HTML tags. The text looked fine on the page; pasted, it is a mess. Here is why, and a cleanup order that fixes it reliably.

Why copied web text is dirty

When you copy from a rendered web page, you are not copying the clean words you see. You are copying a fragment of the underlying HTML, complete with tags, inline styling, and the whitespace the browser was collapsing when it drew the page. Most editors try to paste as rich text and strip some of it, but they leave behind artifacts: stray tags, non-breaking spaces, and the hard line breaks baked into how the text was laid out.

Step 1: strip the markup

If your pasted text shows visible tags like <p> or <span>, or you copied raw HTML source, start by removing the tags. Paste it into Strip HTML Tags and you get just the words, markup gone. This alone solves the worst cases.
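If you would rather script this step, here is a minimal sketch in Python using only the standard library. It is not how the Strip HTML Tags tool works internally; the names strip_html_tags, TextExtractor, BLOCK_TAGS and SKIPPED_TAGS are made up for this example, and the list of block-level tags is an assumption you may want to extend.

```python
from html.parser import HTMLParser

# Assumption: these tags start a new line in the output; extend as needed.
BLOCK_TAGS = {"p", "div", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Content inside these tags is never visible text, so it is dropped.
SKIPPED_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content and discards the markup around it."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "br" or tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_html_tags(markup):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_html_tags("<p>Hello <span>world</span></p>").strip())  # Hello world
```

A parser is used here rather than a regular expression because it copes with attributes that contain ">" and decodes entities along the way. The output still carries uneven whitespace, which the next step deals with.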

Step 2: fix the spacing

Stripping tags often leaves uneven spacing where elements used to be — double or triple spaces, gaps that were really margins. Run the result through Remove Extra Spaces to collapse runs of whitespace into single spaces and trim the ends of each line.
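The same cleanup can be sketched in a few lines of Python. This is an illustration, not the Remove Extra Spaces tool's own code; the function name remove_extra_spaces is hypothetical, and treating non-breaking spaces as ordinary spaces is an assumption that suits most pasted web text.

```python
import re


def remove_extra_spaces(text):
    """Collapse runs of spaces and tabs, then trim each line."""
    # Assumption: non-breaking spaces (U+00A0) should become normal spaces.
    text = text.replace("\u00a0", " ")
    cleaned_lines = []
    for line in text.split("\n"):
        # Collapse any run of spaces or tabs into a single space.
        line = re.sub(r"[ \t]+", " ", line)
        # Trim whitespace from both ends of the line.
        cleaned_lines.append(line.strip())
    return "\n".join(cleaned_lines)


print(remove_extra_spaces("  Hello   world \nfoo\u00a0\u00a0bar  "))
# Hello world
# foo bar
```

Working line by line keeps the line breaks untouched, so this step does not interfere with the ones that follow.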

Step 3: remove the empty lines

Web layouts love vertical space, which copies across as blank lines between paragraphs — sometimes several in a row. Remove Empty Lines compacts those down so your text flows the way you want it.
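As a rough sketch of this step in Python: the function name remove_empty_lines and its keep_one option are inventions for this example, not settings of the Remove Empty Lines tool. The option is there because you may prefer to keep a single blank line between paragraphs, for instance if you plan to rejoin wrapped lines afterwards.

```python
def remove_empty_lines(text, keep_one=False):
    """Drop blank lines, or squeeze each run of them to one if keep_one is set."""
    kept = []
    previous_blank = False
    for line in text.splitlines():
        blank = not line.strip()  # whitespace-only lines count as blank
        if not blank:
            kept.append(line)
        elif keep_one and not previous_blank and kept:
            kept.append("")  # first blank line of a run, kept as a separator
        previous_blank = blank
    # A separator left dangling at the end serves no purpose.
    while kept and kept[-1] == "":
        kept.pop()
    return "\n".join(kept)


sample = "First paragraph\n\n\n\nSecond paragraph\n\n"
print(repr(remove_empty_lines(sample)))
# 'First paragraph\nSecond paragraph'
print(repr(remove_empty_lines(sample, keep_one=True)))
# 'First paragraph\n\nSecond paragraph'
```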

Step 4 (optional): rejoin broken lines

If the source wrapped text at a fixed width (common with copied code comments, emails, or PDF-derived pages), you will have a hard line break at the end of every visual line. Run Remove Line Breaks with "join with a space" and "preserve paragraph breaks" turned on to rebuild proper paragraphs. There is a whole post on why this happens if you are curious.
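Here is one way the rejoining could look in Python. It is a sketch under stated assumptions, not the Remove Line Breaks tool's implementation: the function name and its joiner and preserve_paragraphs parameters are hypothetical, and a paragraph break is assumed to mean at least one blank line between blocks of text.

```python
import re


def remove_line_breaks(text, joiner=" ", preserve_paragraphs=True):
    """Join hard-wrapped lines, optionally keeping blank-line paragraph breaks."""

    def join_lines(block):
        # Skip whitespace-only lines so no doubled joiners appear.
        return joiner.join(
            line.strip() for line in block.splitlines() if line.strip()
        )

    if not preserve_paragraphs:
        return join_lines(text)
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(join_lines(paragraph) for paragraph in paragraphs)


wrapped = "The quick\nbrown fox\n\njumps over\nthe dog"
print(repr(remove_line_breaks(wrapped)))
# 'The quick brown fox\n\njumps over the dog'
```

With preserve_paragraphs turned off, the same input comes back as a single line, which is rarely what you want for prose.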

The order matters

Tags first, then spaces, then empty lines, then line breaks. In that sequence each step works on progressively cleaner input, and you never have to redo one. The whole chain takes under a minute and runs in your browser, so internal documents stay private. Bookmark the four tools together and messy paste will never cost you more than a minute again.
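To see why the order matters, here is a small Python experiment with only two of the steps. Both functions are deliberately crude regex stand-ins written for this example, not the tools themselves; the names strip_tags, fix_spaces and run are hypothetical.

```python
import re


def strip_tags(text):
    # Crude stand-in: each tag becomes a space so neighbouring words do not fuse.
    return re.sub(r"<[^>]+>", " ", text)


def fix_spaces(text):
    # Collapse runs of spaces, tabs and non-breaking spaces, then trim each line.
    return "\n".join(
        re.sub(r"[ \t\u00a0]+", " ", line).strip() for line in text.split("\n")
    )


def run(text, steps):
    """Apply each cleanup step in turn, feeding one step's output to the next."""
    for step in steps:
        text = step(text)
    return text


sample = "<p>Hello</p> <p>world</p>"
tags_first = run(sample, [strip_tags, fix_spaces])
spaces_first = run(sample, [fix_spaces, strip_tags])

print(repr(tags_first))    # 'Hello world'
print(repr(spaces_first))  # ' Hello   world '
```

Fixing spaces before the tags are gone achieves nothing, because the gaps only appear once the markup is removed. You would have to run the spacing step a second time.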
