## Difficulty with tree restructuring

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

### Difficulty with tree restructuring

I'm making a CAS program in python and I have a class for expressions that is organized as a tree with a node for each operator ('+', '-', '*', '/', '^') and one for several functions ('sin', 'cos', 'log', etc) It works for a lot of purposes, but I would really like to make a function to "restructure" an expression. To elaborate:

The function would take an expression, a skeleton of an "old" structure, and a skeleton of a "new" structure. Lets say the expression is (5/1 + 3 + a)/1 where 'a' can be anything, including another node. Presented in a more visual form, this would be:

The 'old' structure argument would be a/1, where 'a' can be anything, and the 'new' structure would be just ''a, or in a more visual form:

The function would then output 5 + 3 + a, as it would have gone through and replaced every part of the form a/1 with just 'a'. This function would have several applications, such as replacing anything of the form a^log(a, b) with b, or replacing anything of the form a/c + b/c with (a + b)/c. The problem is, I can't figure out how to write it. It has some aspects that should be programmed recursively, and some that shouldn't. I have tried breaking it into two functions, one that traces the given expression for occurrences of the 'old' structure, and another that replaces them, but I still can't seem to do it. I think it would be a lot easier if operators such as '+' and '*' weren't communicative, because as they are, I have to make the program search pretty much every single branch on that node for one that coincides with the correct structure. Can anyone think of a relatively simple way to write this function? I feel like there is one, but I can't think of it.

I'm not sure how well I explained this. If anyone would like me to elaborate more, I'd be happy to. Thanks.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: Difficulty with tree restructuring

I get the problem, but I'm not sure how to suggest writing it because I don't really have a good idea of what you tried and what you think doesn't work (and what does). One idea: look at something called Bottom-Up Rewrite Systems. If you look around you'll see lots of people talking about it for instruction generation in compilers, but this seems like a decent application of it as well. I'm not sure how overkill the idea is -- I don't think you need the full power of BURS, which excels when there are multiple solutions and you want to pick one with least cost. But you might be able to find a library or something you can use that will allow you to give a nice declarative specification of the rewriting rules.

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

### Re: Difficulty with tree restructuring

I don't really have a good idea of what you tried and what you think doesn't work (and what does)

I can't imagine a solution to this problem that isn't recursive, so I started by making a program that starts at the top, and goes down every branch looking for a chain of nodes that is the same as the structure given. The problem is, once it traces the branches all the way down the structure to finally discover the it is in fact the same as the "old" structure argument, it has no way of retracing it's steps to regain all of the information it needs to fill in the blanks in the "new" structure. So I tried writing two functions, one that would go through the expression and map all of the occurrences of the "old" structure, and then another that would use the map to fill in the new structure, but there is another problem with this: Some restructures such as replacing a/c + b/c with (a + b)/c rely on the two "c"s being equal, which the mapping function has no way of determining because once again, by the time the function has followed the expression to the bottom branches, a/c and b/c, the two branches are seperate and the program has no way of determining that they are in fact equal.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: Difficulty with tree restructuring

Here's how I would structure it, starting with a big digression.

First, I'll toss out another buzzword, which is a functional data structure. (Also called applicative data structures and, with caveats, persistent data structures.) Consider a typical binary search tree implementing a dictionary or something. If you want to change a value in the dictionary, you go and change the data structure so that the node has the new value. In a functional data structure, you're not allowed to change things, so you have to make a "copy" of the old data structure with whatever you want changed. So instead of writing mytree.update(key, val), you write newtree = mytree.update(key, value). Existing references to the old version won't be updated, but now you've got a new copy saved in a new reference. (I'm using the term "reference" here, but of course analogous things can happen in CC++.) This sounds like it'd be really expensive, but it actually isn't: the new version of the tree will be able to share references to most of the old tree. Only nodes that are on the path from the root to key's node will need to be copied; this is called the "spine". (If you need to rebalance, that can widen the spine by a node, but not more.) See the picture; the red node is what is being updated, the green nodes and links are what gets copied, and the grey stuff is untouched.

So how do you implement this? It'd be something like (untested)

Code: Select all

`def update(node, key, value):    assert node is not None     # precondition: key is in tree.    newnode = node.clone()    if key == node.key:        newnode.value = value    if key < node.key:        newnode.left = update(node.left, key, value)    if key > node.key:        newnode.right = update(node.right, key, value)    return newnode`
(As an optimization, if there is no actual change the tree shouldn't be rewritten and the old reference returned.)

Why do I bring this up? Because it's got a similar structure as to what I'd do for your rewriting problem, but is hopefully simpler. First, don't rewrite on the way down, rewrite on the way up. That means: (1) rewrite the current node's children then (2) look for each pattern and see if it matches.

So suppose you wanted to match x + 0, x * 1, and a/c+b/c (to (a+b)/c). You could do something like the following. Assume that node.op holds a string with the operator for internal nodes and an integer for integers.

Code: Select all

`def simplify(node):    copy = node.clone()    if node.op == "*":        copy.left = simplify(node.left)        copy.right = simplify(node.right)        if copy.left.op == 1:            return copy.right  # 1 * x = x        if copy.right.op == 1:            return copy.left # x * 1 = x        return copy    if node.op == "+":        copy.left = simplify(node.left)        copy.right = simplify(node.right)        if copy.left == 0:            return copy.right # 0 + x = x        if copy.right == 0:            return copy.left # x + 0 = x        if (copy.left.op == "/"                and copy.right.op == "/"                and copy.left.right == copy.right.right):            # a/c + b/c  => (a + b)/c            a = copy.left.left            b = copy.right.left            c = copy.left.right            a_plus_b = Node("+", a, b)            sum_div_c = Node("/", a_plus_b, c)            return sum_div_c        return copy    ...`

(I would write that differently in real code (more meta to avoid repetition) but the above is intended to more explicitly show what I'm doing. I'd also look for a way to declaritively specify patterns.)

It should be possible to write a mutating version too... but the functional one seems simpler. Maybe I am weird. But again, to make the functional one more reasonable you should also check that you're actually changing something before returning a new copy, to save on memory (or at least allocations).

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

### Re: Difficulty with tree restructuring

That's not exactly what I'm going for, It would be helpful if there was a way to do it such that the function is not made to make only one replacement, but instead is capable of making any specified replacement it is given. However, this is a nice new way to look at the entire problem, so I'll see if that gets me anywhere. Thanks for the the help!

heatsink
Posts: 86
Joined: Fri Jun 30, 2006 8:58 am UTC

### Re: Difficulty with tree restructuring

I've done arithmetic expression simplification before. EvanED's description points out the right recursion pattern, but I don't think that tree insertion is helpful for showing how simplification works. I'll use diagrams instead. I will draw triangles to stand for expression trees, like this:
trees.png (11.37 KiB) Viewed 2933 times

Every expression tree consists of a root node with subtrees, which is a hint that the problem should be solved by recursion over expression trees. Inside an expression, there may be simplifiable subexpressions. These should all be found and simplified. We want a simplify function that takes a tree and returns the equivalent simplified tree.
simplify_type.png (12.41 KiB) Viewed 2933 times

The input to simplify is a tree, which consists of a root node and subtrees. The subtrees are recursively simplified first. This puts the subtrees in simplified form.
simplify_children.png (8.41 KiB) Viewed 2933 times

Now that the subtrees are simplified, the only possible place for further simplification is at the root node. So, simplify_root tries to match the root node to any simplification pattern, and transforms it if it matches. Typical simplification rules don't modify subtrees, so the subtrees stay in simplified form.
simplify_root.png (11.94 KiB) Viewed 2933 times

Putting these two steps together gives you a recursive simplify function.
simplify_composed.png (25.13 KiB) Viewed 2933 times