Let's first fill up the sack with as small items as possible, until we can't fill it anymore. The difference between current weight and max weight can be at most 8 (with the exception of trivial inputs) and there are only 8 distinctive items. So if we need to create a sequence of actions which takes us from the current weight to the optimal weight, we don't need many actions to do that. Now, here comes the funny part: instead of constructing a short, smart sequence of actions, we can simply construct a long, dumb sequence of actions. If we choose every action in a greedy, randomized way, we will probably visit the optimal weight at some point in the sequence. What do I mean by greedy? If we are under max weight, let's add a random item to the sack. If we are over max weight, let's remove a random item from the sack. That's it!
There's still time to hack my solution. Have fun!