Data-Chunking Example

Introduction

Here's the problem. I tell you that object1 is a large, red car. Then later, I ask you: what was object1? You're meant to be able to tell me that it's a large, red car.
In Soar, it turns out this is hard to learn. In particular, it's difficult to learn a chunk which does this:
	^item object1 -> ^recall (O1 ^isa car ^color red ^size large)
The basic reason is that when you try to learn the chunk, you need to test the object that is to be recalled, so you get a chunk like this:
	^item object1 ^object (O1 ^isa car ^color red ^size large)
	-->
	^recall (O2 ^isa car ^color red ^size large)
which is known as a recognition chunk. If Soar later sees object1 together with a large, red car, this chunk will fire, which is enough to tell Soar that it has seen this object before. If you showed it object1 and a small, blue car, the chunk wouldn't fire: the object wouldn't be recognized. The trick is to learn the recall rule, and the method for doing this goes by the name "data-chunking". This set of code is an example of how to do it, based on ideas Rick Lewis presented at an earlier Soar workshop. The key thing is that, unlike the traditional way to data-chunk, it doesn't rely on the fact that Soar fails to backtrace through search control when building chunks. This way seems better.
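The difference between the two kinds of chunk can be sketched outside Soar. The Python below is only an illustration: the function names and the dictionary representation are assumptions made for this sketch and are not part of the demo's productions.

```python
# Hypothetical Python stand-ins for the two kinds of chunk; this is not Soar code.
CAR = {"isa": "car", "color": "red", "size": "large"}

def recognition_rule(cue, obj):
    """Fires only when the cue AND the complete object are both present."""
    if cue == "object1" and obj == CAR:
        return dict(obj)  # can only echo back what it was shown
    return None

def recall_rule(cue):
    """Fires on the cue alone and produces the object."""
    if cue == "object1":
        return dict(CAR)
    return None

small_blue_car = {"isa": "car", "color": "blue", "size": "small"}
print(recognition_rule("object1", CAR))             # fires: object recognized
print(recognition_rule("object1", small_blue_car))  # does not fire
print(recall_rule("object1"))                       # object produced from the cue alone
```

The recognition rule is useless when only the cue is available, which is exactly the situation in a recall run.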

Loading & Running

  1. Source the file "data.soar".
    This will load the other files. It's only 18 productions total.
  2. Then "run".
    Soar will prompt for the type of run, instance or recall. During an instance run, Soar is given the cue "object1" and the features "large, red car". It then tries to remember them.
    During a recall run, Soar is only given the cue "object1" and it has to recall the features.
    On the first run, give it an instance (i.e. type "i"). This is when you tell Soar "object1 is a large, red car".
    You should see:
         0: ==>S: S1 
    Type of run (i - instance or r - recall) : i
         1:    O: O1 (instance)
         2:    ==>S: S2 (operator no-change)
         3:       ==>S: S3 (operator tie)
    Building chunk-1
    Building chunk-2
    Building chunk-3
         4:       O: O5 (build) (color red)
    Retracting chunk-1
    
         5:       O: O4 (build) (size large)
    Retracting chunk-2
    
         6:       O: O3 (build) (isa car)
    Retracting chunk-3
    
         7:       O: O6 (finished)
    Building chunk-4
    System halted.
    
    The key products of this run are chunk-1, chunk-2 and chunk-3, which propose best preferences for the build operators. Interestingly, chunk-4 is like the recognition rule I described above:
    (sp chunk-4
      :chunk
      (state <s1> ^operator <o1>)
      (<o1> ^recall-cue object1 ^object <o2>)
      (<o2> ^isa car ^size large ^color red)
    -->
      (<o1> ^recall-obj <o3> +)
      (<o3> ^color red + ^size large + ^isa car +))
    
  3. Then "init-soar" and "run" again.
    This time choose recall ("r").
         0: ==>S: S1 
    Type of run (i - instance or r - recall) : r
         1:    O: O1 (recall)
         2:    ==>S: S2 (operator no-change)
    Firing chunk-2
    Firing chunk-1
    Firing chunk-3
    
         3:       O: O47 (build) (isa car)
    Retracting chunk-3
    
         4:       O: O38 (build) (color red)
    Retracting chunk-1
    
         5:       O: O31 (build) (size large)
    Retracting chunk-2
    
         6:       O: O2 (finished)
    Building chunk-5
    System halted.
    
    The key this time is chunk-5, which looks like this:
    (sp chunk-5
      :chunk
      (state <s1> ^operator <o1>)
      (<o1> -^object <o2> ^recall-cue object1)
    -->
      (<o1> ^recall-obj <o3> +)
      (<o3> ^isa car + ^color red + ^size large +))
    
    Now Soar has learned to recall this object. Compare this chunk to chunk-4 to see the point of doing "data-chunking": chunk-4 has to test the object's features before it can fire, while chunk-5 needs only the cue.

  4. Test it by "init-soar" and "run".
    This time when you choose "r", chunk-5 fires immediately, recalling the object:
         0: ==>S: S1 
    Type of run (i - instance or r - recall) : r
    
         1:    O: O1 (recall)
    Firing chunk-5
    
    System halted.
    
    There's not much else you can do with this demo, except take a look at the productions and try to understand them.

Method

The basic method is to take the initial set of features (red, large, car) and force a tie between build operators, one for each feature. Why do that, you say? So that you can learn chunks which recognize each feature individually; these are chunks 1-3. They take the form of search-control rules for the build operators, making them "best" (and also indifferent, but that's not important).
(sp chunk-3
  :chunk
  (state <s1> ^recall-cue object1 ^operator <o1> +)
  (<o1> ^name build ^value car ^attribute isa)
-->
  (<s1> ^operator <o1> > ^operator <o1> =))
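As a rough model of what chunks 1-3 contribute, the sketch below treats each learned chunk as a (cue, attribute, value) triple that upgrades a matching build operator to "best". The function names and the triple representation are assumptions made for illustration; Soar's actual preference machinery is richer than this.

```python
# Hypothetical model of chunks 1-3: one "best" preference per (cue, attribute, value).
def learn_feature_chunks(cue, features):
    """Instance run: learn one recognition chunk for each feature of the object."""
    return {(cue, attribute, value) for attribute, value in features.items()}

def preference(chunks, cue, attribute, value):
    """A build operator is 'best' only if a learned chunk matches it."""
    return "best" if (cue, attribute, value) in chunks else "acceptable"

chunks = learn_feature_chunks(
    "object1", {"isa": "car", "color": "red", "size": "large"}
)
print(preference(chunks, "object1", "isa", "car"))  # build operator for the seen feature
print(preference(chunks, "object1", "isa", "van"))  # build operator for an unseen feature
```

Each chunk tests a single feature, so none of them needs the whole object to be present in order to fire.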
During the recall phase, Soar proposes every possible feature it can (red, blue, yellow, green, etc.; small, medium, large; cars, vans, trucks). These take the form of build operator proposals. Chunks 1-3 then make just the right build operators best. Those operators get chosen, and they build up the object piece by piece. Once no more best operators are proposed, the finished operator comes in to signal that the object is complete and returns the completed object to the top operator, which leads to chunk-5:
(sp chunk-5
  :chunk
  (state <s1> ^operator <o1>)
  (<o1> -^object <o2> ^recall-cue object1)
-->
  (<o1> ^recall-obj <o3> +)
  (<o3> ^isa car + ^color red + ^size large +))
which is the chunk we want.
All data-chunking takes this form: learn a recognition rule (or, in this case, a series of recognition rules), then go through a big generation stage (where you generate every possible feature), select the correct features (using the recognition rules), and then learn the recall rule.
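The whole cycle described above can be sketched as a generate-and-recognize loop. This is a minimal Python illustration, not a translation of the demo's productions: the vocabulary is taken from the examples in the text, and every name here (VOCABULARY, instance_run, recall_run) is hypothetical.

```python
# Assumed feature vocabulary, taken from the examples in the text.
VOCABULARY = {
    "color": ["red", "blue", "yellow", "green"],
    "size": ["small", "medium", "large"],
    "isa": ["car", "van", "truck"],
}

def instance_run(cue, features, recognition):
    """Learn one per-feature recognition rule (the analogue of chunks 1-3)."""
    recognition.update((cue, a, v) for a, v in features.items())

def recall_run(cue, recognition, recall_rules):
    """Recall the object for a cue, learning a recall rule on the first attempt."""
    if cue in recall_rules:
        # The learned recall rule fires immediately (the analogue of chunk-5).
        return dict(recall_rules[cue])
    # Generation stage: propose every possible feature.
    candidates = [(a, v) for a, values in VOCABULARY.items() for v in values]
    # Selection stage: keep only the features the recognition rules mark as best.
    obj = {a: v for a, v in candidates if (cue, a, v) in recognition}
    # Learn the recall rule: cue -> object, with no test of the object itself.
    recall_rules[cue] = dict(obj)
    return obj

recognition, recall_rules = set(), {}
instance_run("object1", {"isa": "car", "color": "red", "size": "large"}, recognition)
first = recall_run("object1", recognition, recall_rules)   # generate and select
second = recall_run("object1", recognition, recall_rules)  # recall rule fires directly
print(first, second)
```

The first recall pays for the full generation stage; every later recall of the same cue is a direct lookup, mirroring the third run of the demo.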



Douglas Pearson [2007]