This page gives several approaches, in EmacsLisp, to managing duplicate lines in a file. Try them out by EvaluatingExpressions or by adding them to your InitFile. For duplicating lines, see CopyFromAbove.

== delete-duplicate-lines ==

See this blog post by Bozhidar Batsov on how to use `delete-duplicate-lines': http://emacsredux.com/blog/2014/03/01/a-peek-at-emacs-24-dot-4-delete-duplicate-lines/

== Interactive search and replace won't work ==

You can try to remove duplicate lines with ReplaceRegexp or `##C-M-%##' (`query-replace-regexp'), by replacing ##\([^ C-q C-j ]+ C-q C-j \)\1+## with ##\1##. A newline is entered with `C-q C-j'. The Lisp version of this command would be

    (replace-regexp "\\([^\n]+\n\\)\\1+" "\\1")

This regexp only matches a line that is immediately followed by copies of itself, so it cannot remove duplicates that are separated by other lines. Given this buffer:

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
Duplicate line 1
}}}

it still leaves a duplicate line:

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
}}}

== Search and replace in Lisp that works ==

See [https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13032 bug #13032] for `delete-duplicate-lines' in core Emacs.

Here's some Lisp to find duplicate lines and keep only the first occurrence. After each replacement the search restarts from the beginning of the region (the ##(goto-char start)## in the loop), so every remaining duplicate is eventually found.

    (defun uniquify-all-lines-region (start end)
      "Find duplicate lines in region START to END keeping first occurrence."
      (interactive "*r")
      (save-excursion
        ;; Use a marker for END so it stays valid as lines are deleted.
        (let ((end (copy-marker end)))
          (while
              (progn
                (goto-char start)
                (re-search-forward "^\\(.*\\)\n\\(\\(.*\n\\)*\\)\\1\n" end t))
            (replace-match "\\1\n\\2")))))

    (defun uniquify-all-lines-buffer ()
      "Delete duplicate lines in buffer and keep first occurrence."
      (interactive "*")
      (uniquify-all-lines-region (point-min) (point-max)))

So for this buffer:

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
Unique line 2
Unique line 3
Duplicate line 1
Duplicate line 2
Duplicate line 2
Unique line 4
}}}

running `M-x uniquify-all-lines-buffer' produces:

{{{
Duplicate line 1
Unique line 1
Unique line 2
Unique line 3
Duplicate line 2
Unique line 4
}}}

== A Lisp command using a list containing each line ==

Another implementation collects the distinct lines in a temporary list and checks each line against that list with `member'. It gives the same result as the previous one.

    (defun uniquify-all-lines-region (start end)
      "Find duplicate lines in region START to END keeping first occurrence."
      (interactive "*r")
      (save-excursion
        (let ((lines)
              (end (copy-marker end)))
          (goto-char start)
          (while (and (< (point) (marker-position end))
                      (not (eobp)))
            (let ((line (buffer-substring-no-properties
                         (line-beginning-position) (line-end-position))))
              (if (member line lines)
                  ;; Already seen: delete the whole line, newline included.
                  (delete-region (point) (progn (forward-line 1) (point)))
                (push line lines)
                (forward-line 1)))))))

== The uniq command with consecutive lines ==

The Unix utility ##uniq## removes duplicate consecutive lines, keeping only one instance. The output of ##uniq## on the example above is:

{{{
Duplicate line 1
Unique line 1
Duplicate line 1
Unique line 2
Unique line 3
Duplicate line 1
Duplicate line 2
Unique line 4
}}}

Note that the non-consecutive duplicates of the first line are not removed.

== The sort -u command for non-consecutive duplicates ==

The Unix utility ##sort## with its ##-u## argument gives the same set of unique lines as `M-x uniquify-all-lines-buffer'. However, the lines are sorted rather than kept in the order of first appearance.
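The difference in ordering can be sketched in Lisp on a list of lines rather than on a buffer. This is only an illustration under stated assumptions: the names `my-dup-example-lines', `my-unique-lines-in-order' and `my-unique-lines-sorted' are made up for this example and are not part of Emacs, while `delete-dups', `copy-sequence' and `sort' are standard functions.

{{{
;; Hypothetical variable: the example buffer from this page, one string per line.
(defvar my-dup-example-lines
  '("Duplicate line 1" "Unique line 1" "Duplicate line 1"
    "Unique line 2" "Unique line 3" "Duplicate line 1"
    "Duplicate line 2" "Duplicate line 2" "Unique line 4"))

(defun my-unique-lines-in-order (lines)
  "Return LINES without duplicates, in order of first appearance."
  ;; `delete-dups' is destructive and keeps the first of several
  ;; `equal' elements, so work on a copy of the list.
  (delete-dups (copy-sequence lines)))

(defun my-unique-lines-sorted (lines)
  "Return LINES without duplicates, sorted with `string<'."
  ;; Sorting discards the original order; this only approximates
  ;; `sort -u', whose ordering depends on the locale.
  (sort (my-unique-lines-in-order lines) #'string<))
}}}

Evaluating ##(my-unique-lines-in-order my-dup-example-lines)## keeps "Unique line 1" ahead of "Duplicate line 2", whereas ##(my-unique-lines-sorted my-dup-example-lines)## moves both "Duplicate" lines to the front.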
The output of ##sort -u## on the example above is:

{{{
Duplicate line 1
Duplicate line 2
Unique line 1
Unique line 2
Unique line 3
Unique line 4
}}}

== Lisp commands removing consecutive duplicates ==

The command `M-x uniquify-buffer-lines' removes identical adjacent lines in the current buffer, similar to the Unix ##uniq## command.

    (defun uniquify-region-lines (beg end)
      "Remove duplicate adjacent lines in region."
      (interactive "*r")
      (save-excursion
        ;; Use a marker for END so the bound shrinks with the region.
        (let ((end (copy-marker end)))
          (goto-char beg)
          (while (re-search-forward "^\\(.*\n\\)\\1+" end t)
            (replace-match "\\1")))))

    (defun uniquify-buffer-lines ()
      "Remove duplicate adjacent lines in the current buffer."
      (interactive)
      (uniquify-region-lines (point-min) (point-max)))

Note that functions which find duplicate lines do not necessarily sort the lines before looking for duplicates, since sorting may or may not be what you expect or want from a particular function.

== Lisp command to retrieve duplicates ==

Where the lines of a file are presorted, it can be convenient to use something like this:

    (defun find-duplicate-lines (&optional insertp interp)
      "Return the lines from point onward that equal the line after them."
      (interactive "i\np")
      (let ((max-pon (line-number-at-pos (point-max)))
            (gather-dups))
        (while (< (line-number-at-pos) max-pon)
          (let ((this-line (buffer-substring-no-properties
                            (line-beginning-position 1) (line-end-position 1)))
                (next-line (buffer-substring-no-properties
                            (line-beginning-position 2) (line-end-position 2))))
            (when (equal this-line next-line)
              (setq gather-dups (cons this-line gather-dups))))
          ;; Advance only after comparing, so the line at point is checked too.
          (forward-line))
        (if (or insertp interp)
            (save-excursion (newline) (princ gather-dups (current-buffer)))
          gather-dups)))

This function, while inefficient (note the cons in the tail of the while form), is quite handy for locating duplicates _before_ removing them, i.e. situations of type `uniquify-maybe'. Extend `find-duplicate-lines' by comparing its result list with one or more of the list comparison procedures `set-difference', `union', `intersection', etc.
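from the CL package. With ##(require 'cl-lib)## they carry a `cl-' prefix, e.g. `cl-set-difference'; the older ##(require 'cl)## is obsolete since Emacs 27.

== Indexed lines ==

I had a file that looked like this:

{{{
0001 line1 original
0001 line1 modified
0002 line2
0003 line3 original
0003 line3 modified
}}}

Use this regexp in ##query-replace-regexp## to find the duplicated lines: ##\(\([0-9]\{4\}\).*\) C-q C-j \(\2.*\)##. Here `C-q C-j' enters the newline that separates the two lines, and ##\{4\}## matches the four-digit index of the example. The regexp originally posted here used ##\{6,7\}##, which requires six or seven digits, so adjust the count to the width of your index. You can then replace the match with ##\1## to keep the original lines, or with ##\3## to keep the modified ones.

----
CategoryEditing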