Learning Options Through Inference-Based Policy Gradients