What you're asking about is called an "establishing shot" sequence: you show the turtle in 3 shots: 1.) the turtle and its surroundings, 2.) the whole turtle, 3.) the face of the turtle.
You don't shoot all 3 shots in one go. First, take a custom white balance when you are about 3 meters away from the turtle to get the color correct (if you have a wide angle lens, you could be closer than 3 meters). Set focus, then shoot for about 30 seconds. Stop recording, swim closer to about 1.5 meters, and take a custom white balance again (or turn on your light). Depending on the size of the turtle, you might have to remove the wide angle lens. Set focus, then take another 30-second shot. Then swim to within arm's length of the turtle's face, take a white balance or turn on your light, extend your arm, set focus, and shoot the face of the turtle.
Once you have those 3 shots, just put them together in post. The reason I shoot 30 seconds or so is that the first 5 and last 5 seconds tend to be shaky, since I'm pressing the record button.
This is how I would shoot the scene you asked about, because I only have point-and-shoot experience. Other people may have a different or better way, especially those who use a dSLR with lenses that have a limited focal length range.
I think I didn't express myself well.
What I meant was that I might not choose option 1. Let's say I want it all in a single shot.