Fluid solvers are essential in simulating and studying physical systems. Parallelizing these solvers is often straightforward for structured problems, however, unstructured domains typically introduce complex data access patterns and load imbalance. These lead to poor performance and resource utilization. To address load imbalance we study the use of adaptive mesh refinement and Charm++ to partition work between processing elements. At a finer granularity we use novel GPU based algorithms to accelerate the numerics in spite of the data access patterns. Our research in this area aims to solve these issues for high-order numerical methods with non-standard stencils.