Implementing Undo For Text Editors

HomePage | RecentChanges | EditorIndex | TextEditorFamilies | Preferences

No diff available--this is the first major revision. (minor diff)
Implementing Undo for Text Editors

 Craig Durland
 craig@cv.hp.com

 Copyright 1991
 February 1991
Revised
 September 1991   
 February 1992

Probably anyone who has used a text editor has, at one time or another, made a boo boo and immediately wished they could to go back in time and not do it all over again. The Undo command gives that ability by moving backwards in time, reversing the effects of the most recent changes to the text. Some undo commands can go back to the beginning of time, undoing all changes that have occurred to the text.

The undo described in this article is a subset of the idealized undo - it only reverses changes to text. It doesn't keep track of cursor movements, buffer changes or the zillions of other things we do while editing text. While this can make undoing changes disconcerting because it can jump from change to change differently from what you remember, it also reduces the amount of stuff that the editor needs to remember (saving memory) and helps keep the editor from bogging down trying to keep track of whats been going on. Finally, the material presented is for text editors. Editors of other types may find some of the discussion relevant but that will be a happy accident.

Jonathan Payne (author of JOVE) has stated that "Undo/redo is trivial to implement" (comp.editors, Tue, 2 Jan 1990). Having just implemented undo for the editor I am working on, I think that straight forward is a better term than trivial. Since it took me a long time to figure out a workable undo scheme, I thought others might be interested in what I did.

Caveats: The algorithms, code and data structures used in this article are based on working and tested code that has been "sanitized" to remove obscuring details (like what stack data structures I used). If you use the code, make sure you understand it so that you can properly fill in the missing details.

Conditions: You are free to use the algorithms, code and data structures contained in this article in anyway you please. If you use large chunks, I would appreciate an acknowledgment. If you distribute this article, in whole or part, please retain the authorship notice.

What is Not Discussed

Redo is a undo for undo. Very handy if you undo one time to many and need to undo that last undo. I haven't implemented it yet but I think that it will be easy to modify the algorithms to support it. I also think some dragons are there but they are hiding from me. We'll see when I actually get around to doing it.

Editor Implementations and How They Affect Undo

Two popular ways text editors store text are "buffer gap" and "linked list of lines" (described in detail else where - see comp.editors). As you might imagine, different implementations for storing text can affect undo implementations. Fortunately for this article, the affects are relatively minor. As I discuss data structures and algorithms, I will point out the differences.

Note: I will refer to buffer gap editors as BGE and linked list editors as LLE.

What You Need to Save to Implement Undo

I use Undo Events to represent text changes. These events are kept in a undo stack, most recent event at the top of the stack. There is one stack per text object (buffer in Emacs) so undo can only track changes on a per buffer basis. This simplifies things quite a bit over tracking all changes globally in a multibuffer editor.

I found that four event types seem to be enough describe all text changes. They are:

  I also store the buffer modified state here so I know to mark the buffer as unchanged if it was before this change happened.

It should be obvious by now that much editing would generate a LOT of undo events. Unless your computer has an infinite amount of memory, you need some rules about when to start throwing away events. Since different people will have different ideas about this and the amount of memory can greatly affect this, you need to let the user control it. There are three parameters that need to be specified:

It can be a real guessing game to figure out how to balance the number of events saved against text saved. Different editing operations use up one or the other at different rates. I'm currently using 600 events and a 5K save buffer.

When an Editor Does Undo

Given the undo event types and rules, the editor just has to save undo events when needed and add sequence points at the "proper" times. Now is the time you will be glad you funneled all you buffer changes through just a couple of insert/delete routines. Here is where I save events for my Emacs-like editor:

This routine is used when doing things like a kill buffer yank. It is also used to undo a delete so care has to be taken not to get into a do/undo tug of war.

We mark the buffer as unchanged because it has been cleared and filled with new text. Note that it is up the buffer clear routine to save the delete text. Also note that to do a "real" undo, we should also save_for_undo(BU_INSERT,<characters in file>). I don't because I don't think it makes sense and because of the clear buffer problem mentioned below.

This is used for the self inserting characters.

Have to be careful here because this routine is also used to undo character inserts. Actually, the undo routine is the one who cares and protects against this.

Where I should (but don't) save undo events:

To do undo, the editor just calls the undo routine.

Data Structures

Each text object (buffer) has a pointer to a undo header. This header contains a undo event stack, number of events in the stack (so I can easily check to see if the stack has grown too big), a flag that indicates whether or not the BU_UNCHANGED events are valid and an area in which to save deleted text.

In C, this looks like:

     typedef struct
     {
       Undo *undo_stack;	/* where the undo events are */
       int undo_count;		/* number of undos in the stack */
       int there_is_a_valid_unchanged_event;
       Bag *bag;		/* deleted text saved here */
     } UndoHeader?;

An undo stack is a linked list of undo events. I used a linked list because it was easy and worked well but there are many other ways to store the data that would probably work as well. An event contains a pointer to the next (older) event, the event type (BU_INSERT, etc), a marker to where the event took place, the number of characters involved and a pointer to the saved text (if any). Note that some events may not use some of these fields. The clever programmer could probably save quite a bit of space by having different (sized) structures for each event type.

LLEs (Linked List Editors) can run into problems when asked to save delete events. Its quite possible that the we could be asked to save more text than buffer contains. For example, in Emacs, the user could hit control-U a bunch of times and hit the delete key - delete 16384 characters when the buffer only contains 53. With a BGE (Buffer Gap Editor), this is very easy to check but is very time consuming with a LLE. What I do with LLE is "trust but verify". I make enough room to hold that much deleted text, then go ahead and try and save it. I then figure out how much I really saved (a side affect of copying the text) and save that number. The big drawback with this is that its easy to clear out most of the undo stack when you didn't need to and looks very fishy when you can't undo very much. The other way around this is to put the "save-deleted-text" stuff in the routine that actually deletes the text but this would add a lot of noise to a already hard to understand routine.

A problem related to the I-don't-know-how-much-is-going-to-be-deleted problem is the clear-buffer problem. With BGE, you know how much you are going to remove, with LLE, you don't. So, I just clear the undo stack when the buffer is cleared because 1) most buffers will probably be bigger than the undo buffer anyway and cause it to be cleared and 2) buffer clears don't happen very often - usually for buffer deletes and file reads.

C code:

     typedef struct Undo
     {
       struct Undo *next;	/* pointer to next older event */
       int type;
       UnMark mark;		/* where the event took place */
       int n;			/* number of characters involved */
       char *text;		/* where deleted text is saved */
     } Undo;

In my editor, I have the concept of bags. Bags hold various types of text objects and I have a lot of builtin support for them. It was natural and very easy to use these to hold deleted text. Your editor is probably different so you'll have to use something else (like malloc()ing some space). I use one per buffer and holds it all the deleted text. The undo delete events point into the bag (actually, I use offsets but the concept is the same).

The undo markers are where in the buffer the event took place. These only matter for insert and delete events. This is one of the areas where editor implementation really matters. A BGE (buffer gap editor) is a real win here. Since undo events are effectively snapshots of buffer history, as you undo, the buffer returns to its exact state back in time. A BGE mark is just an offset into the buffer. We can just store this offset in the event and never mess with it because we know that it will be correct when we undo to the point where we want to use it. We can't do this with a LLE (linked list editor). A LLE mark is a pair: (line pointer, offset in line). Both editor types have to adjust marks as text is inserted and deleted but unlike the BGE, LLE marks can become invalid as time goes on (and lines unlinked). One example of this is if you insert text into the middle of a block and then delete that block. The delete has unlinked the lines that were inserted so the marks with insert are invalid. When you undo the delete, you can't undo the insert because you no longer know where it took place. This is not a problem with BGE because even though the delete made the insert mark point to the wrong place, when the delete was undoed, the mark became valid again.

To get around this problem a LLE has to either 1) track all cursor movement or 2) calculate the pair (line number, offset in line).

1) Pros: Can undo cursor movements.

Cons:

2) Pros:

Cons:

I chose to do 2) (since my editor is a LLE). Here is the C code:

     typedef struct
     {
       int line;
       int offset;
     } UnMark;

For a BGE, just use the mark type you already have.

Some combinations of events occur together frequently and it might be a win to create an event to take advantage of them. For example, replace operations involve a delete and insert at the same place. If you packed these two events into one, you could potentially save lots of space in a big query replace.

Alternative Data Structures

I've glossed over how I actually store the undo stack. This is because there many (good) ways to do it. I use linked lists because they are simple. A more clever and compact method would be to pack the deleted text into the event structure (using the "clever" technique mentioned above) and store the event plus text in a fixed size data area. This has several advantages: You don't have to worry the number of events - when you run out space to store them, start removing the old events. It is much easier to garbage collect the undo stack during undo. Its also easier to make room for a new event. The disadvantages are that you have to worry about structure alignment (and machine architecture) and the some of the code will get messy (in my opinion). The old trade off of fast and small versus big and simple (or, as Julie Brown would say, "I like 'em big and stupid").

Global Variables

I use a few global variables to make controlling undo a little easier.

Bool do_undo;

  This variable is TRUE if undo is turned on for the current buffer.
  The switch buffer code sets this so people who need to know can do so
  quickly.  The user can change it by toggling a buffer flag.

Buffer *current_buffer;

  Pointer to the buffer where changes are taking place.  The current
  undo header hangs off this buffer.

int max_undos = 600;

  The maximum number of undo events.

int max_undo_saved = 5000;

  The maximum amount of deleted characters that can be saved.

Algorithm for Saving Undo Events

I save undo events as they occur. I leave it to the undo routine to figure out how to undo this type of event because its easy, undo has more time to do it and this routine should be as fast as possible because most undo events will never be used and you want the editor to keep up with you.

     save_for_undo(type,n)
       int type;	/* event type:  BU_INSERT, etc */
       int n;		/* number of characters involved (if any) */
     {
       Undo undo;
       UnMark mark;

       if (!do_undo) return;	/* don't save undos for this buffer */

       switch (type)
       {
	 case BU_INSERT:			/* Characters inserted */
	   set_unmark(&mark);
	   pack_sequential_insert_events();

	   undo.n = n;
	   undo.mark = mark;

	   break;
	 case BU_DELETE:			/* Characters deleted */
	 {
	   if (n == 0) return;			/* that was easy! */

	   set_unmark(&mark);
	   undo.mark = mark;
	   open_bag(n);		     /* make sure there is enough space */
	   undo.n = save_text(n);    /* returns number of characters saved */

	   break;
	 }
	 case BU_SP:				/* Set sequence point */
	   if (last_event_was_a_sequence_point()) return;
	   break;
	 case BU_UNCHANGED:		/* Buffer has been marked safe */
	   if (last_event_was_a_save()) return;
	   header->there_is_a_valid_unchanged_event = TRUE;
	   break;
       }

       undo.type = type;

       push_undo(&undo);
     }

Algorithm for Undoing Events

This routine performs undo by reversing the effects of undo events from most recent until it finds a sequence point or runs out of events.

Note that undo is turned off so we don't try and re-save undo events as we march through the stack.

If the last event undid was a UNCHANGED, we need to add a new one to the undo list (to replace the one we popped). If we don't, when we undo back to this point in the future, there won't be a UNCHANGED event to tell us the buffer is unchanged. Ditto if we empty the undo stack.

UNCHANGED events are a no-op if the buffer is not modified. This makes <save buffer><undo> work "nicer" and lets the above work (if it wasn't, undo would stop at the UNCHANGED event and you couldn't undo any more).

     undo()
     {
       int undo_flag, num_events, something_happened, push_an_unchanged;
       Undo *ptr;

       if (no_undos()) return;

       undo_flag = do_undo; do_undo = FALSE;

       num_events = 0;
       something_happened = FALSE;
       push_an_unchanged = FALSE;

       while (ptr = pop_undo())
       {
	 num_events++;
	 switch (ptr->type)
	 {
	   case BU_INSERT:	/* To undo: Delete inserted characters */
	     something_happened = TRUE;
	     goto_unmark(&ptr->mark);
	     delete_text(ptr->n);
	     break;
	   case BU_DELETE:	/* To undo: Insert deleted characters */
	     something_happened = TRUE;
	     goto_unmark(&ptr->mark);
	     insert_block(ptr->text, ptr->n);
	     break;
	   case BU_UNCHANGED:	/* At this point, buffer might be unchanged */
	     if (header->there_is_a_valid_unchanged_event &&
		 buffer_is_modified())
	     {
		buffer_modified(FALSE);
		something_happened = TRUE;
		push_an_unchanged = TRUE;
	     }
	     header->there_is_a_valid_unchanged_event = FALSE;
	     /* fall through:  a BU_UNCHANGED acts like a sequence point */
	   case BU_SP:		/* Hit a sequence point, stop undoing */
	     if (something_happened) goto done;
	     break;
	 }
       }

     done:

       gc_undos(num_events);	/* clean up the undo stack needed */

       do_undo = undo_flag;

       if (push_an_unchanged || (undo_stack_empty() && buffer_not_modified()))
	     save_for_undo(BU_UNCHANGED,0);
     }

Miscellaneous Algorithms

Here are some miscellaneous routines to flesh out undo. Note that there are a lot missing, for example undo stack management and garbage collection. The routines presented here aren't as "clean" as I'd like because they have to know how the undo stack is implemented.

Note: When I say walk the stack, I mean move towards the older events. I do this because it is easy and in the case of undo() and that's what you want.

The next routine is used to make space available to hold a copy of the text that will be deleted. Basically, it figures out which (if any) of the older events need to be removed from the stack to make room in the bag. It does this by looking at delete events (the only event that saves text) and finding an event such that by deleting that event and all older events, it will remove the minimum number of events necessary to free enough space so that the bag can hold space_needed more characters. The key here is that the number of characters in the bag is the sum of all delete events (the n in the Undo structure). Note it would be faster, in general, to walk the stack backwards (from oldest event to youngest) and find the first event that met the requirements but that is not possible the way I implemented the stacks.

	 /* Check to see if space_needed is available in the bag.  If not,
	  *   try and pack the bag and stack to make enough room.  If more
	  *   space is requested than we allow, clear the stack because if
	  *   we can't save this undo, every undo before this one is
	  *   invalid.
	  * Returns:
	  *   TRUE:  Bag (now) has enough room.
	  *   FALSE:  You want too much space and can't have it.  Undo stack
	  *     cleared.  Try again with a more reasonable request.
	  */
     static int open_bag(space_needed) int space_needed;
     {
       int tot, da_total, total_so_far;
       Undo *ptr, *da_winner;
       UndoHeader? *header;

       header = <the undo header for the current buffer>;

       total_so_far = bag_size(header->bag); /* number of characters in bag */
       tot = space_needed -(max_undo_saved -total_so_far);

       if (tot <= 0) return TRUE;		/* Plenty of room! */

       if (space_needed > max_undo_saved)	/* You want too much! */
       {
	 clear_undos();
	 return FALSE;
       }

       da_winner = NULL;
       for (ptr = <walk undo stack starting at the most recent event>)
       {
	 if (ptr->type == BU_DELETE)
	 {
	   if (total_so_far >= tot)	/* found enough space here */
	   {
	     da_winner = ptr;
	     da_total = total_so_far;
	   }
	   else break;		/* the last one was the one we wanted */
	   total_so_far -= ptr->n;
	 }
       }

	   /* We KNOW (and can prove it) that da_winner != NULL because
	    *   bag_size >= tot > max - bag_size
	    *   (from
	    *   max_undo_saved >= space_needed > max_undo_saved -bag_size
	    *   among other things).  Since bag_size is the sum of all the
	    *   BU_DELETEs, there must be one or more events that are
	    *   winners.
	    */
       for (ptr = <walk undo stack starting at da_winner>)
       {
	 free_undo(ptr);
       }
       pack_bag(da_total);	/* remove text from bag, fix up pointers */

       return TRUE;
     }

	 /* Go though the undo stack and adjust the text indices to reflect
	  * n characters being removed from the front of the bag.  Shift n
	  * characters off the front of the bag.
	  */
     static void pack_bag(n) int n;
     {
       Undo *ptr;
       UndoHeader? *header;

       header = <the undo header for the current buffer>;

       for (ptr = <walk undo stack starting at the most recent event>)
	 if (ptr->type == BU_DELETE)
	 {
	   <adjust ptr->text back by n characters>
	 }

		/* remove n characters from front of bag */
       slide_bag(header->bag,n);
     }

Here are a couple of routines that you will need if you have a LLE. This is how to calculate and use undo marks.

     static void goto_unmark(mark) UnMark *mark;
     {
       goto_line(mark->line);
       next_character(mark->offset);
     }

     static void set_unmark(mark) UnMark *mark;
     {
       extern Mark *the_dot;	/* the dot in the current buffer */

       register Line *lp, *dot_line;
       register int line;

       dot_line = the_dot->line;
       for (line = 1, lp = BUFFER_FIRST_LINE(current_buffer);
		lp != dot_line; lp = lp->l_next)
	 line++;

       mark->line = line;
       mark->offset = the_dot->offset;
     }


HomePage | RecentChanges | EditorIndex | TextEditorFamilies | Preferences
Edit text of this page | View other revisions
Last edited February 20, 2016 2:34 pm (diff)
Search: