LU decomposition

From Rosetta Code
Task
LU decomposition
You are encouraged to solve this task according to the task description, using any language you may know.

Every square matrix can be decomposed into a product of a lower triangular matrix and a upper triangular matrix , as described in LU decomposition.

It is a modified form of Gaussian elimination. While the Cholesky decomposition only works for symmetric, positive definite matrices, the more general LU decomposition works for any square matrix.

There are several algorithms for calculating L and U. To derive Crout's algorithm for a 3x3 example, we have to solve the following system:

We now would have to solve 9 equations with 12 unknowns. To make the system uniquely solvable, usually the diagonal elements of are set to 1

so we get a solvable system of 9 unknowns and 9 equations.

Solving for the other and , we get the following equations:

and for :

We see that there is a calculation pattern, which can be expressed as the following formulas, first for

and then for

We see in the second formula that to get the below the diagonal, we have to divide by the diagonal element (pivot) , so we get problems when is either 0 or very small, which leads to numerical instability.

The solution to this problem is pivoting , which means rearranging the rows of , prior to the decomposition, in a way that the largest element of each column gets onto the diagonal of . Rearranging the columns means to multiply by a permutation matrix :

Example:

The decomposition algorithm is then applied on the rearranged matrix so that


Task description

The task is to implement a routine which will take a square nxn matrix and return a lower triangular matrix , a upper triangular matrix and a permutation matrix , so that the above equation is fullfilled. You should then test it on the following two examples and include your output.

Example 1:

A

1   3   5
2   4   7
1   1   0

L

1.00000   0.00000   0.00000
0.50000   1.00000   0.00000
0.50000  -1.00000   1.00000

U

2.00000   4.00000   7.00000
0.00000   1.00000   1.50000
0.00000   0.00000  -2.00000

P

0   1   0
1   0   0
0   0   1

Example 2:

A

11    9   24    2
 1    5    2    6
 3   17   18    1
 2    5    7    1

L

1.00000   0.00000   0.00000   0.00000
0.27273   1.00000   0.00000   0.00000
0.09091   0.28750   1.00000   0.00000
0.18182   0.23125   0.00360   1.00000

U

11.00000    9.00000   24.00000    2.00000
 0.00000   14.54545   11.45455    0.45455
 0.00000    0.00000   -3.47500    5.68750
 0.00000    0.00000    0.00000    0.51079

P

1   0   0   0
0   0   1   0
0   1   0   0
0   0   0   1

Ada

Works with: Ada 2005

decomposition.ads: <lang Ada>with Ada.Numerics.Generic_Real_Arrays; generic

  with package Matrix is new Ada.Numerics.Generic_Real_Arrays (<>);

package Decomposition is

  -- decompose a square matrix A by PA = LU
  procedure Decompose (A : Matrix.Real_Matrix; P, L, U : out Matrix.Real_Matrix);

end Decomposition;</lang>

decomposition.adb: <lang Ada>package body Decomposition is

  procedure Swap_Rows (M : in out Matrix.Real_Matrix; From, To : Natural) is
     Temporary : Matrix.Real;
  begin
     if From = To then
        return;
     end if;
     for I in M'Range (2) loop
        Temporary := M (M'First (1) + From, I);
        M (M'First (1) + From, I) := M (M'First (1) + To, I);
        M (M'First (1) + To, I) := Temporary;
     end loop;
  end Swap_Rows;
  function Pivoting_Matrix
    (M : Matrix.Real_Matrix)
     return Matrix.Real_Matrix
  is
     use type Matrix.Real;
     Order     : constant Positive := M'Length (1);
     Result    : Matrix.Real_Matrix := Matrix.Unit_Matrix (Order);
     Max       : Matrix.Real;
     Row       : Natural;
  begin
     for J in 0 .. Order - 1 loop
        Max := M (M'First (1) + J, M'First (2) + J);
        Row := J;
        for I in J .. Order - 1 loop
           if M (M'First (1) + I, M'First (2) + J) > Max then
              Max := M (M'First (1) + I, M'First (2) + J);
              Row := I;
           end if;
        end loop;
        if J /= Row then
           -- swap rows J and Row
           Swap_Rows (Result, J, Row);
        end if;
     end loop;
     return Result;
  end Pivoting_Matrix;
  procedure Decompose (A : Matrix.Real_Matrix; P, L, U : out Matrix.Real_Matrix) is
     use type Matrix.Real_Matrix, Matrix.Real;
     Order : constant Positive := A'Length (1);
     A2 : Matrix.Real_Matrix (A'Range (1), A'Range (2));
     S : Matrix.Real;
  begin
     L := (others => (others => 0.0));
     U := (others => (others => 0.0));
     P := Pivoting_Matrix (A);
     A2 := P * A;
     for J in 0 .. Order - 1 loop
        L (L'First (1) + J, L'First (2) + J) := 1.0;
        for I in 0 .. J loop
           S := 0.0;
           for K in 0 .. I - 1 loop
              S := S + U (U'First (1) + K, U'First (2) + J) *
                L (L'First (1) + I, L'First (2) + K);
           end loop;
           U (U'First (1) + I, U'First (2) + J) :=
             A2 (A2'First (1) + I, A2'First (2) + J) - S;
        end loop;
        for I in J + 1 .. Order - 1 loop
           S := 0.0;
           for K in 0 .. J loop
              S := S + U (U'First (1) + K, U'First (2) + J) *
                L (L'First (1) + I, L'First (2) + K);
           end loop;
           L (L'First (1) + I, L'First (2) + J) :=
             (A2 (A2'First (1) + I, A2'First (2) + J) - S) /
             U (U'First (1) + J, U'First (2) + J);
        end loop;
     end loop;
  end Decompose;

end Decomposition;</lang>

Example usage: <lang Ada>with Ada.Numerics.Real_Arrays; with Ada.Text_IO; with Decomposition; procedure Decompose_Example is

  package Real_Decomposition is new Decomposition
    (Matrix => Ada.Numerics.Real_Arrays);
  package Real_IO is new Ada.Text_IO.Float_IO (Float);
  
  procedure Print (M : Ada.Numerics.Real_Arrays.Real_Matrix) is
  begin
     for Row in M'Range (1) loop
        for Col in M'Range (2) loop
           Real_IO.Put (M (Row, Col), 3, 2, 0);
        end loop;
        Ada.Text_IO.New_Line;
     end loop;
  end Print;
  Example_1 : constant Ada.Numerics.Real_Arrays.Real_Matrix :=
    ((1.0, 3.0, 5.0),
     (2.0, 4.0, 7.0),
     (1.0, 1.0, 0.0));
  P_1, L_1, U_1 : Ada.Numerics.Real_Arrays.Real_Matrix (Example_1'Range (1),
                                                        Example_1'Range (2));
  Example_2 : constant Ada.Numerics.Real_Arrays.Real_Matrix :=
    ((11.0, 9.0, 24.0, 2.0),
     (1.0, 5.0, 2.0, 6.0),
     (3.0, 17.0, 18.0, 1.0),
     (2.0, 5.0, 7.0, 1.0));
  P_2, L_2, U_2 : Ada.Numerics.Real_Arrays.Real_Matrix (Example_2'Range (1),
                                                        Example_2'Range (2));

begin

  Real_Decomposition.Decompose (A => Example_1,
                                P => P_1,
                                L => L_1,
                                U => U_1);
  Real_Decomposition.Decompose (A => Example_2,
                                P => P_2,
                                L => L_2,
                                U => U_2);
  Ada.Text_IO.Put_Line ("Example 1:");
  Ada.Text_IO.Put_Line ("A:"); Print (Example_1);
  Ada.Text_IO.Put_Line ("L:"); Print (L_1);
  Ada.Text_IO.Put_Line ("U:"); Print (U_1);
  Ada.Text_IO.Put_Line ("P:"); Print (P_1);
  Ada.Text_IO.New_Line;
  Ada.Text_IO.Put_Line ("Example 2:");
  Ada.Text_IO.Put_Line ("A:"); Print (Example_2);
  Ada.Text_IO.Put_Line ("L:"); Print (L_2);
  Ada.Text_IO.Put_Line ("U:"); Print (U_2);
  Ada.Text_IO.Put_Line ("P:"); Print (P_2);

end Decompose_Example;</lang>

Output:
Example 1:
A:
  1.00  3.00  5.00
  2.00  4.00  7.00
  1.00  1.00  0.00
L:
  1.00  0.00  0.00
  0.50  1.00  0.00
  0.50 -1.00  1.00
U:
  2.00  4.00  7.00
  0.00  1.00  1.50
  0.00  0.00 -2.00
P:
  0.00  1.00  0.00
  1.00  0.00  0.00
  0.00  0.00  1.00

Example 2:
A:
 11.00  9.00 24.00  2.00
  1.00  5.00  2.00  6.00
  3.00 17.00 18.00  1.00
  2.00  5.00  7.00  1.00
L:
  1.00  0.00  0.00  0.00
  0.27  1.00  0.00  0.00
  0.09  0.29  1.00  0.00
  0.18  0.23  0.00  1.00
U:
 11.00  9.00 24.00  2.00
  0.00 14.55 11.45  0.45
  0.00  0.00 -3.47  5.69
  0.00  0.00  0.00  0.51
P:
  1.00  0.00  0.00  0.00
  0.00  0.00  1.00  0.00
  0.00  1.00  0.00  0.00
  0.00  0.00  0.00  1.00

BBC BASIC

<lang bbcbasic> DIM A1(2,2)

     A1() = 1, 3, 5, 2, 4, 7, 1, 1, 0
     PROCLUdecomposition(A1(), L1(), U1(), P1())
     PRINT "L1:" ' FNshowmatrix(L1())
     PRINT "U1:" ' FNshowmatrix(U1())
     PRINT "P1:" ' FNshowmatrix(P1())
     
     DIM A2(3,3)
     A2() = 11, 9, 24, 2, 1, 5, 2, 6, 3, 17, 18, 1, 2, 5, 7, 1
     PROCLUdecomposition(A2(), L2(), U2(), P2())
     PRINT "L2:" ' FNshowmatrix(L2())
     PRINT "U2:" ' FNshowmatrix(U2())
     PRINT "P2:" ' FNshowmatrix(P2())
     END
     
     DEF PROCLUdecomposition(a(), RETURN l(), RETURN u(), RETURN p())
     LOCAL i%, j%, k%, n%, s, b() : n% = DIM(a(),2)
     DIM l(n%,n%), u(n%,n%), b(n%,n%)
     PROCpivot(a(), p())
     b() = p() . a()
     FOR j% = 0 TO n%
       l(j%,j%) = 1
       FOR i% = 0 TO j%
         s = 0
         FOR k% = 0 TO i% : s += u(k%,j%) * l(i%,k%) : NEXT
         u(i%,j%) = b(i%,j%) - s
       NEXT
       FOR i% = j% TO n%
         s = 0
         FOR k% = 0 TO j% : s += u(k%,j%) * l(i%,k%) : NEXT
         IF i%<>j% l(i%,j%) = (b(i%,j%) - s) / u(j%,j%)
       NEXT
     NEXT j%
     ENDPROC
     
     DEF PROCpivot(a(), RETURN p())
     LOCAL i%, j%, m%, n%, r% : n% = DIM(a(),2)
     DIM p(n%,n%) : FOR i% = 0 TO n% : p(i%,i%) = 1 : NEXT
     FOR i% = 0 TO n%
       m% = a(i%,i%)
       r% = i%
       FOR j% = i% TO n%
         IF a(j%,i%) > m% m% = a(j%,i%) : r% = j%
       NEXT
       IF i%<>r% THEN
         FOR j% = 0 TO n% : SWAP p(i%,j%),p(r%,j%) : NEXT
       ENDIF
     NEXT i%
     ENDPROC
     
     DEF FNshowmatrix(a())
     LOCAL @%, i%, j%, a$
     @% = &102050A
     FOR i% = 0 TO DIM(a(),1)
       FOR j% = 0 TO DIM(a(),2)
         a$ += STR$(a(i%,j%)) + ", "
       NEXT
       a$ = LEFT$(LEFT$(a$)) + CHR$(13) + CHR$(10)
     NEXT i%
     = a$</lang>
Output:
L1:
1.00000, 0.00000, 0.00000
0.50000, 1.00000, 0.00000
0.50000, -1.00000, 1.00000

U1:
2.00000, 4.00000, 7.00000
0.00000, 1.00000, 1.50000
0.00000, 0.00000, -2.00000

P1:
0.00000, 1.00000, 0.00000
1.00000, 0.00000, 0.00000
0.00000, 0.00000, 1.00000

L2:
1.00000, 0.00000, 0.00000, 0.00000
0.27273, 1.00000, 0.00000, 0.00000
0.09091, 0.28750, 1.00000, 0.00000
0.18182, 0.23125, 0.00360, 1.00000

U2:
11.00000, 9.00000, 24.00000, 2.00000
0.00000, 14.54545, 11.45455, 0.45455
0.00000, 0.00000, -3.47500, 5.68750
0.00000, 0.00000, 0.00000, 0.51079

P2:
1.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.00000, 1.00000, 0.00000
0.00000, 1.00000, 0.00000, 0.00000
0.00000, 0.00000, 0.00000, 1.00000

C

Compiled with gcc -std=gnu99 -Wall -lm -pedantic. Demonstrating how to do LU decomposition, and how (not) to use macros. <lang C>#include <stdio.h>

  1. include <stdlib.h>
  2. include <math.h>
  1. define foreach(a, b, c) for (int a = b; a < c; a++)
  2. define for_i foreach(i, 0, n)
  3. define for_j foreach(j, 0, n)
  4. define for_k foreach(k, 0, n)
  5. define for_ij for_i for_j
  6. define for_ijk for_ij for_k
  7. define _dim int n
  8. define _swap(x, y) { typeof(x) tmp = x; x = y; y = tmp; }
  9. define _sum_k(a, b, c, s) { s = 0; foreach(k, a, b) s+= c; }

typedef double **mat;

  1. define _zero(a) mat_zero(a, n)

void mat_zero(mat x, int n) { for_ij x[i][j] = 0; }

  1. define _new(a) a = mat_new(n)

mat mat_new(_dim) { mat x = malloc(sizeof(double*) * n); x[0] = malloc(sizeof(double) * n * n);

for_i x[i] = x[0] + n * i; _zero(x);

return x; }

  1. define _copy(a) mat_copy(a, n)

mat mat_copy(void *s, _dim) { mat x = mat_new(n); for_ij x[i][j] = ((double (*)[n])s)[i][j]; return x; }

  1. define _del(x) mat_del(x)

void mat_del(mat x) { free(x[0]); free(x); }

  1. define _QUOT(x) #x
  2. define QUOTE(x) _QUOT(x)
  3. define _show(a) printf(QUOTE(a)" =");mat_show(a, 0, n)

void mat_show(mat x, char *fmt, _dim) { if (!fmt) fmt = "%8.4g"; for_i { printf(i ? " " : " [ "); for_j { printf(fmt, x[i][j]); printf(j < n - 1 ? " " : i == n - 1 ? " ]\n" : "\n"); } } }

  1. define _mul(a, b) mat_mul(a, b, n)

mat mat_mul(mat a, mat b, _dim) { mat c = _new(c); for_ijk c[i][j] += a[i][k] * b[k][j]; return c; }

  1. define _pivot(a, b) mat_pivot(a, b, n)

void mat_pivot(mat a, mat p, _dim) { for_ij { p[i][j] = (i == j); } for_i { int max_j = i; foreach(j, i, n) if (fabs(a[j][i]) > fabs(a[max_j][i])) max_j = j;

if (max_j != i) for_k { _swap(p[i][k], p[max_j][k]); } } }

  1. define _LU(a, l, u, p) mat_LU(a, l, u, p, n)

void mat_LU(mat A, mat L, mat U, mat P, _dim) { _zero(L); _zero(U); _pivot(A, P);

mat Aprime = _mul(P, A);

for_i { L[i][i] = 1; } for_ij { double s; if (j <= i) { _sum_k(0, j, L[j][k] * U[k][i], s) U[j][i] = Aprime[j][i] - s; } if (j >= i) { _sum_k(0, i, L[j][k] * U[k][i], s); L[j][i] = (Aprime[j][i] - s) / U[i][i]; } }

_del(Aprime); }

double A3[][3] = {{ 1, 3, 5 }, { 2, 4, 7 }, { 1, 1, 0 }}; double A4[][4] = {{11, 9, 24, 2}, {1, 5, 2, 6}, {3, 17, 18, 1}, {2, 5, 7, 1}};

int main() { int n = 3; mat A, L, P, U;

_new(L); _new(P); _new(U); A = _copy(A3); _LU(A, L, U, P); _show(A); _show(L); _show(U); _show(P); _del(A); _del(L); _del(U); _del(P);

printf("\n");

n = 4;

_new(L); _new(P); _new(U); A = _copy(A4); _LU(A, L, U, P); _show(A); _show(L); _show(U); _show(P); _del(A); _del(L); _del(U); _del(P);

return 0; }</lang>

Common Lisp

Uses the routine (mmul A B) from Matrix multiplication.

<lang lisp>;; Creates a nxn identity matrix. (defun eye (n)

 (let ((I (make-array `(,n ,n) :initial-element 0)))
   (loop for j from 0 to (- n 1) do
         (setf (aref I j j) 1))
   I))
Swap two rows l and k of a mxn matrix A, which is a 2D array.

(defun swap-rows (A l k)

 (let* ((n (cadr (array-dimensions A)))
        (row (make-array n :initial-element 0)))
   (loop for j from 0 to (- n 1) do
         (setf (aref row j) (aref A l j))
         (setf (aref A l j) (aref A k j))
         (setf (aref A k j) (aref row j)))))
Creates the pivoting matrix for A.

(defun pivotize (A)

 (let* ((n (car (array-dimensions A)))
        (P (eye n)))
   (loop for j from 0 to (- n 1) do
         (let ((max (aref A j j))
               (row j))
           (loop for i from j to (- n 1) do
                 (if (> (aref A i j) max)
                     (setq max (aref A i j)
                           row i)))
           (if (not (= j row))
               (swap-rows P j row))))
 ;; Return P.
 P))
Decomposes a square matrix A by PA=LU and returns L, U and P.

(defun lu (A)

 (let* ((n (car (array-dimensions A)))
        (L (make-array `(,n ,n) :initial-element 0))
        (U (make-array `(,n ,n) :initial-element 0))
        (P (pivotize A))
        (A (mmul P A)))
   (loop for j from 0 to (- n 1) do
         (setf (aref L j j) 1)
         (loop for i from 0 to j do
               (setf (aref U i j)
                     (- (aref A i j)
                        (loop for k from 0 to (- i 1)
                              sum (* (aref U k j)
                                     (aref L i k))))))
         (loop for i from j to (- n 1) do
               (setf (aref L i j)
                     (/ (- (aref A i j)
                           (loop for k from 0 to (- j 1)
                                 sum (* (aref U k j)
                                        (aref L i k))))
                        (aref U j j)))))
 ;; Return L, U and P.
 (values L U P)))</lang>

Example 1:

<lang lisp>(setf g (make-array '(3 3) :initial-contents '((1 3 5) (2 4 7)(1 1 0))))

  1. 2A((1 3 5) (2 4 7) (1 1 0))

(lu g)

  1. 2A((1 0 0) (1/2 1 0) (1/2 -1 1))
  2. 2A((2 4 7) (0 1 3/2) (0 0 -2))
  3. 2A((0 1 0) (1 0 0) (0 0 1))</lang>

Example 2:

<lang lisp>(setf h (make-array '(4 4) :initial-contents '((11 9 24 2)(1 5 2 6)(3 17 18 1)(2 5 7 1))))

  1. 2A((11 9 24 2) (1 5 2 6) (3 17 18 1) (2 5 7 1))

(lup h)

  1. 2A((1 0 0 0) (3/11 1 0 0) (1/11 23/80 1 0) (2/11 37/160 1/278 1))
  2. 2A((11 9 24 2) (0 160/11 126/11 5/11) (0 0 -139/40 91/16) (0 0 0 71/139))
  3. 2A((1 0 0 0) (0 0 1 0) (0 1 0 0) (0 0 0 1))</lang>

D

Translation of: Common Lisp

<lang d>import std.stdio, std.algorithm, std.typecons, std.numeric,

      std.array, std.conv, std.string, std.range;

bool isRectangular(T)(in T[][] m) pure nothrow @nogc {

   return m.all!(r => r.length == m[0].length);

}

bool isSquare(T)(in T[][] m) pure nothrow @nogc {

   return m.isRectangular && m[0].length == m.length;

}

T[][] matrixMul(T)(in T[][] A, in T[][] B) pure nothrow in {

   assert(A.isRectangular && B.isRectangular &&
          !A.empty && !B.empty && A[0].length == B.length);

} body {

   auto result = new T[][](A.length, B[0].length);
   auto aux = new T[B.length];
   foreach (immutable j; 0 .. B[0].length) {
       foreach (immutable k, const row; B)
           aux[k] = row[j];
       foreach (immutable i, const ai; A)
           result[i][j] = dotProduct(ai, aux);
   }
   return result;

}

/// Creates the pivoting matrix for m. T[][] pivotize(T)(immutable T[][] m) pure nothrow in {

   assert(m.isSquare);

} body {

   immutable n = m.length;
   auto id = iota(n)
             .map!((in j) => n.iota.map!(i => T(i == j)).array)
             .array;
   foreach (immutable i; 0 .. n) {
       // immutable row = iota(i, n).reduce!(max!(j => m[j][i]));
       T maxm = m[i][i];
       size_t row = i;
       foreach (immutable j; i .. n)
           if (m[j][i] > maxm) {
               maxm = m[j][i];
               row = j;
           }
       if (i != row)
           swap(id[i], id[row]);
   }
   return id;

}

/// Decomposes a square matrix A by PA=LU and returns L, U and P. Tuple!(T[][],"L", T[][],"U", const T[][],"P") lu(T)(immutable T[][] A) pure nothrow in {

   assert(A.isSquare);

} body {

   immutable n = A.length;
   auto L = new T[][](n, n);
   auto U = new T[][](n, n);
   foreach (immutable i; 0 .. n) {
       L[i][i .. $] = 0;
       U[i][0 .. i] = 0;
   }
   immutable P = A.pivotize!T;
   immutable A2 = matrixMul!T(P, A);
   foreach (immutable j; 0 .. n) {
       L[j][j] = 1;
       foreach (immutable i; 0 .. j+1) {
           T s1 = 0;
           foreach (immutable k; 0 .. i)
               s1 += U[k][j] * L[i][k];
           U[i][j] = A2[i][j] - s1;
       }
       foreach (immutable i; j .. n) {
           T s2 = 0;
           foreach (immutable k; 0 .. j)
               s2 += U[k][j] * L[i][k];
           L[i][j] = (A2[i][j] - s2) / U[j][j];
       }
   }
   return typeof(return)(L, U, P);

}

void main() {

   immutable a = [[1.0, 3, 5],
                  [2.0, 4, 7],
                  [1.0, 1, 0]];
   immutable b = [[11.0, 9, 24, 2],
                  [1.0,  5,  2, 6],
                  [3.0, 17, 18, 1],
                  [2.0,  5,  7, 1]];
   auto f = "[%([%(%.1f, %)],\n %)]]\n\n".replicate(3);
   foreach (immutable m; [a, b])
       writefln(f, lu(m).tupleof);

}</lang>

Output:
[[1.0, 0.0, 0.0],
 [0.5, 1.0, 0.0],
 [0.5, -1.0, 1.0]]

[[2.0, 4.0, 7.0],
 [0.0, 1.0, 1.5],
 [0.0, 0.0, -2.0]]

[[0.0, 1.0, 0.0],
 [1.0, 0.0, 0.0],
 [0.0, 0.0, 1.0]]


[[1.0, 0.0, 0.0, 0.0],
 [0.3, 1.0, 0.0, 0.0],
 [0.0, 0.3, 1.0, 0.0],
 [0.2, 0.2, 0.0, 1.0]]

[[11.0, 9.0, 24.0, 2.0],
 [0.0, 14.5, 11.5, 0.5],
 [0.0, 0.0, -3.5, 5.7],
 [0.0, 0.0, 0.0, 0.5]]

[[1.0, 0.0, 0.0, 0.0],
 [0.0, 0.0, 1.0, 0.0],
 [0.0, 1.0, 0.0, 0.0],
 [0.0, 0.0, 0.0, 1.0]]

Fortran

<lang Fortran> program lu1

 implicit none
 call check( reshape([real(8)::1,2,1,3,4,1,5,7,0                  ],[3,3]) )
 call check( reshape([real(8)::11,1,3,2,9,5,17,5,24,2,18,7,2,6,1,1],[4,4]) )

contains

 subroutine check(a)
   real(8), intent(in)   :: a(:,:)
   integer               :: i,j,n
   real(8), allocatable  :: aa(:,:),l(:,:),u(:,:)
   integer, allocatable  :: p(:,:)
   integer, allocatable  :: ipiv(:)
   n = size(a,1)
   allocate(aa(n,n),l(n,n),u(n,n),p(n,n),ipiv(n))
   forall (j=1:n,i=1:n)
     aa(i,j) = a(i,j)
     u (i,j) = 0d0
     p (i,j) = merge(1  ,0  ,i.eq.j)
     l (i,j) = merge(1d0,0d0,i.eq.j)
   end forall
   call lu(aa, ipiv)
   do i = 1,n
      l(i, :i-1) = aa(ipiv(i), :i-1)
      u(i,i:   ) = aa(ipiv(i),i:   )
   end do
   p(ipiv,:) = p
   call mat_print('a',a)
   call mat_print('p',p)
   call mat_print('l',l)
   call mat_print('u',u)
   print *, "residual"
   print *, "|| P.A - L.U || =  ", maxval(abs(matmul(p,a)-matmul(l,u)))
 end subroutine

 subroutine lu(a,p)

! in situ decomposition, corresponds to LAPACK's dgebtrf

   real(8), intent(inout) :: a(:,:)
   integer, intent(out  ) :: p(:)
   integer                :: n, i,j,k,kmax
   n = size(a,1)
   p = [ ( i, i=1,n ) ]
   do k = 1,n-1
       kmax = maxloc(abs(a(p(k:),k)),1) + k-1
       if (kmax /= k ) p([k, kmax]) = p([kmax, k])
       a(p(k+1:),k) = a(p(k+1:),k) / a(p(k),k) 
       forall (j=k+1:n) a(p(k+1:),j) = a(p(k+1:),j) - a(p(k+1:),k) * a(p(k),j)
   end do
 end subroutine
 subroutine mat_print(amsg,a)
   character(*), intent(in) :: amsg
   class    (*), intent(in) :: a(:,:)
   integer                  :: i
   print*,' '
   print*,amsg
   do i=1,size(a,1)
     select type (a)
       type is (real(8)) ; print'(100f8.2)',a(i,:)
       type is (integer) ; print'(100i8  )',a(i,:)
     end select
   end do
   print*,' '
 end subroutine

end program </lang>

Output:
 a
    1.00     3.00     5.00
    2.00     4.00     7.00
    1.00     1.00     0.00
 p
    0.00     1.00     0.00
    1.00     0.00     0.00
    0.00     0.00     1.00
 l
    1.00     0.00     0.00
    0.50     1.00     0.00
    0.50    -1.00     1.00
 u
    2.00     4.00     7.00
    0.00     1.00     1.50
    0.00     0.00    -2.00
 residual
 || P.A - L.U || =     0.0000000000000000     
 a
   11.00     9.00    24.00     2.00
    1.00     5.00     2.00     6.00
    3.00    17.00    18.00     1.00
    2.00     5.00     7.00     1.00
 p
    1.00     0.00     0.00     0.00
    0.00     0.00     1.00     0.00
    0.00     1.00     0.00     0.00
    0.00     0.00     0.00     1.00
 l
    1.00     0.00     0.00     0.00
    0.27     1.00     0.00     0.00
    0.09     0.29     1.00     0.00
    0.18     0.23     0.00     1.00
 u
   11.00     9.00    24.00     2.00
    0.00    14.55    11.45     0.45
    0.00     0.00    -3.47     5.69
    0.00     0.00     0.00     0.51
 residual
 || P.A - L.U || =     0.0000000000000000