The following gives multiple approaches in EmacsLisp code for managing duplicate lines in a file.  Try them out by EvaluatingExpressions or by adding to your InitFile.  For duplicating lines see CopyFromAbove.

== delete-duplicate-lines ==

See this blog post by Bozhidar Batsov on how to use `delete-duplicate-lines':

http://emacsredux.com/blog/2014/03/01/a-peek-at-emacs-24-dot-4-delete-duplicate-lines/

== Interactive search and replace won't work ==

You can try to remove duplicate lines with ReplaceRegexp or `##C-M-%##' (`query-replace-regexp'), by replacing ##\([^ C-q C-j ]+ C-q C-j \)\1+## with ##\1##.  A newline is entered with a `C-q C-j'.  The Lisp version of this command would be

  (replace-regexp "\\([^\n]+\n\\)\\1+" "\\1")

Since it doesn't backtrack, there's no way remove a line with more than one duplicate.

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
Duplicate line 1
}}}

It will leave duplicate lines.

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
}}}

== Search and replace in Lisp that works ==

See [https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13032 bug #13032] for delete-duplicate-lines in core emacs.

Here's some Lisp to find duplicate lines and keep only the first occurrence by starting each search and replace at the start of the last duplicate -- ##(goto-char start)##.

  (defun uniquify-all-lines-region (start end)
    "Find duplicate lines in region START to END keeping first occurrence."
    (interactive "*r")
    (save-excursion
      (let ((end (copy-marker end)))
        (while
            (progn
              (goto-char start)
              (re-search-forward "^\\(.*\\)\n\\(\\(.*\n\\)*\\)\\1\n" end t))
          (replace-match "\\1\n\\2")))))
  
  (defun uniquify-all-lines-buffer ()
    "Delete duplicate lines in buffer and keep first occurrence."
    (interactive "*")
    (uniquify-all-lines-region (point-min) (point-max)))

So for this buffer:

{{{
Duplicate line 1
Unique line 1


Duplicate line 1
Unique line 2
Unique line 3

Duplicate line 1


Duplicate line 2
Duplicate line 2
Unique line 4

}}}

running `M-x uniquify-all-lines-buffer' produces:

{{{
Duplicate line 1
Unique line 1

Unique line 2
Unique line 3
Duplicate line 2
Unique line 4
}}}

== A Lisp command using a list containing each line ==

Another implementation that adds distinct lines to a temporary list, and checks each line in the list with `assoc' will give the same result as the previous.

  (defun uniquify-all-lines-region (start end)
    "Find duplicate lines in region START to END keeping first occurrence."
    (interactive "*r")
    (save-excursion
      (let ((lines) (end (copy-marker end)))
        (goto-char start)
        (while (and (< (point) (marker-position end))
                    (not (eobp)))
          (let ((line (buffer-substring-no-properties
                       (line-beginning-position) (line-end-position))))
            (if (member line lines)
                (delete-region (point) (progn (forward-line 1) (point)))
              (push line lines)
              (forward-line 1)))))))

== The uniq command with consecutive lines ==

The unix utility called ##uniq##
removes duplicate consecutive lines, keeping only one instance.
The output of ##uniq## on the example above is:

{{{
Duplicate line 1
Unique line 1

Duplicate line 1
Unique line 2
Unique line 3

Duplicate line 1

Duplicate line 2
Unique line 4

}}}

Note that non-consecutive duplication of the first line are not removed.

== The sort -u command for non-consecutive duplicates ==

The unix utility ##sort## and its ##-u## argument can give the same result of unique lines as `M-x uniquify-all-lines-buffer' .  However, the lines are sorted rather than kept in the order of first appearance.

The output of ##sort -u## on the example above is:

{{{

Duplicate line 1
Duplicate line 2
Unique line 1
Unique line 2
Unique line 3
Unique line 4
}}}

== Lisp commands removing consecutive duplicates ==

The command `M-x uniquify-buffer-lines' will remove identical adjacent lines
in the current buffer, similar to what is obtained with the unix ##uniq## command.

  (defun uniquify-region-lines (beg end)
    "Remove duplicate adjacent lines in region."
    (interactive "*r")
    (save-excursion
      (goto-char beg)
      (while (re-search-forward "^\\(.*\n\\)\\1+" end t)
        (replace-match "\\1"))))
  
  (defun uniquify-buffer-lines ()
    "Remove duplicate adjacent lines in the current buffer."
    (interactive)
    (uniquify-region-lines (point-min) (point-max)))

It is important to note that functions which find duplicate lines don't always sort lines before looking for dups as this may or may not be what one expects or desires of a particular function. 


== Lisp command to retrieve duplicates  ==

Where the lines of a file are presorted it can be convenient to use something like this:

  (defun find-duplicate-lines (&optional insertp interp)
    (interactive "i\np")
    (let ((max-pon (line-number-at-pos (point-max)))
	  (gather-dups))
      (while (< (line-number-at-pos) max-pon) (= (forward-line) 0)
	     (let ((this-line (buffer-substring-no-properties (line-beginning-position 1) (line-end-position 1)))
		   (next-line (buffer-substring-no-properties (line-beginning-position 2) (line-end-position 2))))
	       (when  (equal this-line next-line)  (setq gather-dups (cons this-line gather-dups)))))
      (if (or insertp interp)
	  (save-excursion (new-line) (princ gather-dups (current-buffer)))
	gather-dups)))

This function, while inefficient (note cons in tail of while form) is quite handy for locating duplicates _before_ removing them, i.e. situations of type: `uniquify-maybe'. Extend `find-duplicate-lines' by comparing its result list with one or more of the list comparison procedures `set-difference', `union', `intersection', etc. from the CL package (require 'cl).

== Indexed lines ==

I had a file that looks like 
{{{
0001 line1 original
0001 line1 modified
0002 line2 
0003 line3 original
0003 line3 modified
}}}

Use this regexp in ##query-replace-regexp## to find the duplicated lines : 
##\(\([0-9]\{6,7\}\).*\)<newline, insert it with C-q C-j>\(\2.*\)##
then you can replace it with ##\1## to keep the original lines, or ##\3## to keep the modified ones

----

CategoryEditing