Visionary Lab | Ali Soliman

A full-stack app for generating and managing visual content using Azure OpenAI’s image and video models. Built to show what’s possible when you combine GPT-Image-1, Sora, and GPT-4.1 in a single workflow.

Generate images from text prompts or input images. Create videos from text, images, or both, with audio included, at up to 1080p and in 4, 8, or 12 second durations. Refine prompts using AI best practices. Analyze outputs for quality control and metadata tagging. Manage everything in an organized asset library with folder support.
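As a rough illustration of the video options listed above, the request constraints could be checked before a job is submitted. This is only a sketch, not the app's actual code: the `VideoRequest` class, its field names, and the defaults are hypothetical, and reading "up to 1080p" as a maximum frame height of 1080 pixels is an assumption.

```python
from dataclasses import dataclass

# Values taken from the description above; the names are hypothetical.
ALLOWED_DURATIONS_S = (4, 8, 12)
MAX_HEIGHT_PX = 1080  # assumption: "up to 1080p" means frame height <= 1080


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for one video generation request."""

    prompt: str
    duration_s: int = 4
    height_px: int = 720
    with_audio: bool = True

    def __post_init__(self) -> None:
        # Reject requests that fall outside the advertised options.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.duration_s not in ALLOWED_DURATIONS_S:
            raise ValueError(
                f"duration_s must be one of {ALLOWED_DURATIONS_S}, got {self.duration_s}"
            )
        if not 0 < self.height_px <= MAX_HEIGHT_PX:
            raise ValueError(
                f"height_px must be between 1 and {MAX_HEIGHT_PX}, got {self.height_px}"
            )
```

With this sketch, `VideoRequest(prompt="a red kite over dunes", duration_s=8, height_px=1080)` is accepted, while a 5 second duration raises `ValueError` before any API call is made.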

The backend handles prompt enhancement, brand protection guardrails, and automatic video analysis. The app is built with a Python backend (FastAPI + uv) and a React frontend, and ships with Jupyter notebooks for exploring the APIs directly.
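One way the guardrail and enhancement steps might fit together is sketched below. Everything here is an assumption made for illustration: the function names, the simple term blocklist, and the ordering (guardrail first, then enhancement) are hypothetical, and `enhance` is a placeholder callable standing in for whatever model call the backend actually makes.

```python
import re
from typing import Callable, Iterable


def find_protected_terms(prompt: str, protected_terms: Iterable[str]) -> list[str]:
    """Return the protected terms that appear in the prompt.

    Matching is case-insensitive and on whole words; it assumes each term
    starts and ends with a letter or digit.
    """
    hits = []
    for term in protected_terms:
        if re.search(rf"\b{re.escape(term)}\b", prompt, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def prepare_prompt(
    prompt: str,
    protected_terms: Iterable[str],
    enhance: Callable[[str], str],
) -> str:
    """Apply the (hypothetical) guardrail, then pass the prompt to the enhancer."""
    hits = find_protected_terms(prompt, protected_terms)
    if hits:
        raise ValueError(f"prompt mentions protected terms: {hits}")
    return enhance(prompt)
```

A caller would supply its own enhancer, for example `prepare_prompt("a red kite", ["contoso"], my_enhancer)`, where `my_enhancer` wraps the model request. A real guardrail would likely be richer than a blocklist; the sketch only shows where such a check could sit in the flow.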

Built at Microsoft. Runs on Azure OpenAI, Blob Storage, and Cosmos DB.