Giving a Hoot about Square-roots
Calculators don't typically facilitate rational arithmetic, merely an uninspiring blend of floating-point & integer (though potentially also in binary or hexadecimal). Perhaps because of this, & their ubiquitous use in imperial units, it's tempting to think of fractional numerical representations as rather quirky & old-fashioned; arithmetic before the Sinclair Executive fixed it. I can't prove that nothing is further from the truth, but I don't need to, because rational arithmetic, in contrast to floating-point, doesn't suffer from rounding-errors, & is therefore the only viable representation when a precise operation on a real number is required.
Haskell's datatype "Data.Ratio.Ratio" facilitates the exploration of problem-domains which previously would have fallen into the "mañana" category; for example, square-roots. The square-root of all but a few privileged numbers (those which can be expressed as a ratio of perfect squares) is actually irrational, but rational arithmetic allows one to approximate their value to an arbitrary precision, far exceeding the most precise standardised floating-point arithmetic. One might naïvely question the burning requirement to estimate the square-root of a number to an arbitrarily precise extent, but many of the most effective algorithms to generate Pi to an arbitrary precision rely on the ability to determine square-roots to a similar precision (e.g. Ramanujan's formula, the Chudnovsky brothers' formula, the Borwein brothers' formula & the Brent-Salamin algorithm). One might then question why Pi is so interesting, but since this philosophical line could extend indefinitely, I'll end where it starts by saying: it isn't, unless mathematics is.
There are almost as many algorithms by which the square-root of a number can be estimated as there are reasons to use any of them. Each takes an estimate of the required value & improves it, but the mechanism by which this improvement is achieved falls broadly into two categories: series-expansions & iterations.
Defining the error as the arithmetic discrepancy between the estimated value & the exact solution, a convergent algorithm will progressively reduce this error; but since the required result is typically irrational (& therefore can't be represented precisely as a rational number), the error can never reduce to zero, merely tending towards zero as the algorithm is applied ad infinitum. As one approaches this limit, the ratio of successive errors is called the "rate of convergence", which quantifies the number of terms in a series, or iterations, required to improve the current estimate to the required degree.
The rate of convergence may be subcategorised into orders of convergence. If the ratio of successive errors settles to a constant value, then the algorithm is said to have a linear order of convergence, whereas if each error settles to a power of the previous error, it is said to converge "superlinearly". Superlinearly convergent algorithms in which the error is the square of the previous error are said to have a "quadratic" order of convergence, whereas those in which the error is the cube of the previous are said to have a "cubic" order of convergence … you get the idea. It's less complicated than my explanation makes it: the linear algorithms add a fixed number of correct digits to the estimate at each step, whereas the superlinear algorithms multiply the number of correct digits by their order of convergence. Of the two broad classifications into which the available algorithms may be categorised, the series-expansions converge linearly with the addition of each successive term, whereas the iterations are typically superlinear. Things get more complicated if one evaluates a linearly convergent series to a fixed number of terms, then uses the resulting improved estimate to redefine iteratively a more rapidly convergent series.
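The digit-counting above can be sketched in Haskell; the function-names below are mine, purely for illustration:

```haskell
-- A linearly convergent algorithm adds a fixed number of correct digits to
-- the estimate at each step, whereas a superlinearly convergent one
-- multiplies the count of correct digits by its order of convergence.
correctDigitsLinear :: Int -> Int -> Int
correctDigitsLinear digitsPerStep steps = digitsPerStep * steps

correctDigitsSuperlinear :: Int -> Int -> Int -> Int
correctDigitsSuperlinear order initialDigits steps = initialDigits * order ^ steps
```

So a quadratically convergent iteration, starting from one correct digit, yields 2, 4, 8, 16, … correct digits with successive iterations.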
This is all pretty standard stuff, & may even be largely correct, but what's missing is an analysis of which algorithm is most efficient in practice. Whilst it's tempting to assume that a higher order of convergence is better, because it sounds like someone resembling the Mekon furrowed their brow & ploughed some serious thought into the algorithm, the thought that went in didn't necessarily address the number of steps required to implement the algorithm. To redress this, I've gathered a motley selection of algorithms, implemented them in Haskell (any conclusion drawn from this effort must account for my limited capabilities) using its Rational type, & measured the performance.
| Algorithm                | Order of Convergence | Notes                                                                                       |
|--------------------------|----------------------|---------------------------------------------------------------------------------------------|
| Continued Fraction       | Linear               |                                                                                             |
| Newton-Raphson iteration | Quadratic            | AKA the Babylonian or Heron's method.                                                       |
| Bakhshali Approximation  | Quartic              |                                                                                             |
| Halley's Method          | Cubic                |                                                                                             |
| Taylor-series            | Linear               | Converges superlinearly, with an order equal to the number of terms, when applied iteratively. |
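Newton-Raphson iteration translates naturally into Haskell's Rational arithmetic; the following is a minimal sketch (the names & the termination-condition are mine, not those of the measured implementation):

```haskell
import Data.Ratio ((%))

-- One Newton-Raphson step for the square-root of y: x' = (x + y / x) / 2.
newtonStep :: Rational -> Rational -> Rational
newtonStep y x = (x + y / x) / 2

-- Iterate from an initial estimate of 1, until successive estimates agree
-- to within the requested tolerance.
newtonSqrt :: Rational -> Rational -> Rational
newtonSqrt tolerance y = go (1 % 1)
  where
    go x
      | abs (x' - x) < tolerance = x'
      | otherwise                = go x'
      where x' = newtonStep y x
```

E.g. `newtonSqrt (1 % 10 ^ 50) 2` approximates the square-root of two far beyond the precision of any standardised floating-point format.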
Testing

My usual test-platform was used.

Though the algorithms were implemented to accept any Real operand, all tests were actually performed, rather arbitrarily, on the integer "2". Subsequent testing suggests that whilst this choice makes little difference to performance, the representation of some rational numbers can be very large (irrespective of whether they represent a number of large magnitude), & this would affect the memory-requirement.
Results
I've fitted polynomials to the data for each algorithm, so that one can estimate their complexity-class. On the log-log plot, these polynomials appear as approximately straight lines.

From this one can see that Newton-Raphson iteration, the Bakhshali Approximation & Halley's Method have similar performance.

The Taylor-series, evaluated rather arbitrarily to both 4 & 16 terms & then applied iteratively to yield quartic & order-16 convergence respectively, performs similarly in either case, but uncompetitively. It seems reasonable to conclude that it would also be uncompetitive when evaluated to any other number of terms.
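For reference, such a series can be sketched as follows: the square-root of y is expanded about an initial estimate x0, as x0 * sqrt (1 + h) where h = (y - x0^2) / x0^2, & sqrt (1 + h) is evaluated via its binomial series (the names are mine & this is merely an illustrative version, not the measured implementation):

```haskell
import Data.Ratio ((%))

-- Evaluate the binomial series for sqrt (1 + h) to the specified number of
-- terms; each term is derived incrementally from its predecessor, since
-- t_0 = 1 & t_k = t_(k-1) * h * (3/2 - k) / k.
taylorSqrt :: Int -> Rational -> Rational -> Rational
taylorSqrt nTerms x0 y = x0 * sum (take nTerms terms)
  where
    h     = (y - x0 * x0) / (x0 * x0)
    terms = scanl (\t k -> t * h * (3 % 2 - k) / k) 1 (map fromIntegral [1 :: Integer ..])
```

Feeding the result back in as the next x0 multiplies the number of correct digits by roughly nTerms at each application, which is the iterative recomposition described above.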
Performance is sensitive to the speed at which lists of Rational numbers can be added, & significant gains were made after replacing use of the standard polymorphic function "Data.List.sum" (the implementation of which results in lazily folding cross-multiplication over the list) with a function specialised for Rational numbers (which calculates the GHC.Real.lcm of the denominators in the list, scales the numerators to this common denominator, & adds them). This is effective for the short lists of numbers encountered here, but it has the potential disadvantage that it blocks garbage-collection of the whole list until the final sum has been determined. Should the list be longer, or contain much larger rational numbers, then this issue will raise its ugly head. I had hoped that further performance-gains could be achieved, since in any one iteration, one can evaluate all the terms of the series in parallel. Regrettably, this doesn't seem to be an effective strategy on the two-core hardware on which the testing was conducted.
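The specialised sum might be sketched as follows (a minimal version under my reading of the description above, not necessarily the measured implementation):

```haskell
import Data.List (foldl')
import Data.Ratio (denominator, numerator, (%))

-- Sum a list of Rationals by finding the lcm of their denominators, scaling
-- each numerator to that common denominator, adding the scaled numerators,
-- then performing a single division (& normalisation) at the end.
sumRationals :: [Rational] -> Rational
sumRationals [] = 0
sumRationals xs = foldl' (+) 0 scaledNumerators % commonDenominator
  where
    commonDenominator = foldl' lcm 1 (map denominator xs)
    scaledNumerators  = [numerator x * (commonDenominator `div` denominator x) | x <- xs]
```

Note that commonDenominator can't be determined until the whole list has been traversed, which is precisely why the list can't be garbage-collected incrementally.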
Performance of all the iteration-algorithms was greatly improved by the elimination of unquantifiable accidental excess precision in the initial Rational estimate on which they operate. It might seem counter-intuitive that excess precision could be problematic, but it is when it results in gross inefficiency. One has to start the iteration with an initial estimate, & that estimate must guarantee a minimum level of accuracy, in order that one may determine the minimum number of iterations required to improve it to the required accuracy. In any specific example, the accuracy of this initial estimate may, by chance, exceed expectations, & when precisely converted to Rational form, may then be burdened by an unfeasibly large numerator & denominator, making arithmetic operations inefficient. This inefficiency can be reduced if one can find a Rational number of similar magnitude which has sufficient accuracy, but a significantly simpler representation. For this purpose, the Haskell-function "Data.Ratio.approxRational" was used.
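For example (the values & error-bound here are mine, purely illustrative), "Data.Ratio.approxRational" finds the simplest Rational within a specified distance of its first argument:

```haskell
import Data.Ratio (approxRational, denominator, (%))

-- An over-precise estimate of the square-root of two, as might result from
-- precisely converting a floating-point value to Rational form:
overPreciseEstimate :: Rational
overPreciseEstimate = 1414213562373095 % 1000000000000000

-- The simplest Rational within 1/1000 of it; this has a vastly smaller
-- numerator & denominator, so subsequent arithmetic on it is much cheaper.
simplifiedEstimate :: Rational
simplifiedEstimate = approxRational overPreciseEstimate (1 % 1000)
```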

Finally, the implementation using continued fractions is not only slower than the others at any tested precision-requirement, but given the apparent time-complexity, one may reasonably conclude that it is also the worst at all greater precision-requirements.
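For the specific case of the square-root of two, the continued-fraction expansion is [1; 2, 2, 2, …], which might be sketched as follows (an illustrative version of the technique, not the measured implementation):

```haskell
-- Evaluate n terms of the continued fraction sqrt 2 = 1 + 1/(2 + 1/(2 + ...)),
-- from the inside outwards; each additional term adds roughly a fixed number
-- of correct digits, i.e. linear convergence.
sqrt2ContinuedFraction :: Int -> Rational
sqrt2ContinuedFraction n = 1 + go n
  where
    go 0 = 0
    go k = 1 / (2 + go (k - 1))
```

The successive results (3/2, 7/5, 17/12, …) are the classic convergents to the square-root of two.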
The memory-requirement is a second factor in the suitability of these algorithms. In this respect Continued Fractions joined the most competitive performers: Newton-Raphson iteration, the Bakhshali Approximation & Halley's Method; with the two instances of the Taylor-series bringing up the rear (though with so few samples, it's rather difficult to draw any reliable conclusions about how much worse they were).
Conclusions
Algorithms with a low order of convergence have an unexpected benefit: the lower it is, the smaller the likely accuracy-overshoot, & the smaller the wasted effort for the task in hand. In the case of the Taylor-series, evaluated to 16 terms then iteratively recomposed using the improved estimate to form a more rapidly convergent series, convergence is so rapid that within very few iterations the available memory is exhausted, leaving little choice in the resultant accuracy, & a correspondingly high probability that this choice is unsatisfactory.
In this implementation, Newton-Raphson iteration is marginally the best:
 it was amongst the fastest, & its time-complexity suggested that it would remain so at higher accuracies;
 its memory-requirement was lower than that of competitors of similar speed;
 it has a very manageable quadratic order of convergence;
 it was the simplest to implement.