In many cases, elementwise assembly is computationally cheaper, because cell-related quantities can be calculated together once per cell and need not be recalculated in each step. However, most of these cellwise calculations can also be stored in the cells as a preprocessing step, so that linewise assembly can reuse them as well.
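A minimal Python sketch of this preprocessing idea, assuming a hypothetical 1D mesh of linear elements and a Laplace operator. The cellwise quantity (here the coefficient 1/h of each cell) is computed once and cached. The linewise assembly then reads it from the cache instead of recomputing it for every matrix line that touches the cell. The mesh layout and all function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical 1D mesh: vertex coordinates and cells as (left, right) vertex pairs.
coords = [0.0, 0.5, 1.5, 2.0]
cells = [(0, 1), (1, 2), (2, 3)]

evaluations = 0  # counts how often the cell quantity is actually computed


def cell_coefficient(cell):
    """Cellwise quantity 1/h of a linear element (stands in for costly work)."""
    global evaluations
    evaluations += 1
    a, b = cells[cell]
    return 1.0 / (coords[b] - coords[a])


# Preprocessing step: store the cellwise quantity in the cells once.
cell_cache = [cell_coefficient(c) for c in range(len(cells))]

# Vertex -> incident cells, needed for the linewise traversal.
vertex_cells = defaultdict(list)
for c, (a, b) in enumerate(cells):
    vertex_cells[a].append(c)
    vertex_cells[b].append(c)


def assemble_line(v):
    """Assemble matrix line v of the stiffness matrix from cached cell data."""
    row = defaultdict(float)
    for c in vertex_cells[v]:
        k = cell_cache[c]  # read from the cache, no recomputation
        a, b = cells[c]
        other = b if v == a else a
        row[v] += k
        row[other] -= k
    return dict(row)


matrix = {v: assemble_line(v) for v in range(len(coords))}
```

Although interior cells are visited once per incident vertex, `evaluations` stays equal to the number of cells.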
Another feature that makes elementwise assembly faster than linewise assembly is that the number of operations to be performed for a cell is easier to determine in advance, because cells usually have the same topological shape throughout the simulation domain. If the number of operations is constant, the compiler can apply various optimizations, e.g., execute the operations in parallel or treat the computation as a loop with a fixed number of iterations.
For large equation systems, parallel assembly is easier to perform linewise: matrix insertions are performed line by line, each line by a single process, so no concurrent access mechanisms are required.
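The following Python sketch illustrates why linewise assembly parallelizes without synchronization, assuming a hypothetical 1D chain of unit-length linear elements. Each worker computes one complete matrix line and returns it. Every line is stored under its own key, so no two workers ever write to the same entry. The thread pool merely stands in for whatever parallel mechanism a real framework would use.

```python
from concurrent.futures import ThreadPoolExecutor

N_VERTICES = 6  # hypothetical chain 0-1-2-3-4-5 of unit-length cells


def assemble_line(v):
    """Compute matrix line v of the 1D Laplace stiffness matrix (h = 1)."""
    row = {}
    for neighbor in (v - 1, v + 1):
        if 0 <= neighbor < N_VERTICES:
            # each incident cell adds +1 on the diagonal and -1 off it
            row[v] = row.get(v, 0.0) + 1.0
            row[neighbor] = -1.0
    return v, row


# Lines are disjoint, so the workers need no locks: each one returns its
# own line, and the results are stored under distinct keys.
with ThreadPoolExecutor(max_workers=3) as pool:
    matrix = dict(pool.map(assemble_line, range(N_VERTICES)))
```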
Compared to elementwise assembly, linewise assembly imposes higher requirements on the underlying data structures, while the matrix assembly itself is straightforward. An implementation typically needs two traversal functions, for instance one that yields all vertices incident to a given cell and another that yields all cells incident to a given vertex. The assembly of the system matrix is simple, because the system matrix is separated into disjoint regions, namely lines, each of which can be assembled separately, which makes this approach well suited for parallel computing.
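The two traversal functions can be sketched in Python as follows, assuming a hypothetical mesh stored only as a cell-to-vertex list. `cell_vertices` is the stored incidence. `vertex_cells` is its inverse, which linewise assembly has to build as an additional data structure. The example mesh of two triangles sharing an edge and all names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical 2D mesh: two triangles sharing the edge (1, 2).
CELLS = [(0, 1, 2), (1, 3, 2)]


def cell_vertices(cell):
    """Traversal 1: all vertices incident to a given cell."""
    return CELLS[cell]


def build_vertex_cells(cells):
    """Invert the cell -> vertex incidence once (extra data structure)."""
    incidence = defaultdict(list)
    for c, verts in enumerate(cells):
        for v in verts:
            incidence[v].append(c)
    return incidence


_VERTEX_CELLS = build_vertex_cells(CELLS)


def vertex_cells(vertex):
    """Traversal 2: all cells incident to a given vertex."""
    return _VERTEX_CELLS[vertex]


def line_pattern(vertex):
    """Column indices of matrix line `vertex`: vertex -> cells -> vertices."""
    return sorted({w for c in vertex_cells(vertex) for w in cell_vertices(c)})
```

Vertex 1 lies in both triangles, so `line_pattern(1)` couples it to all four vertices, whereas vertex 0 only couples to the vertices of the first triangle.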
In contrast, elementwise assembly requires only one traversal function, which provides all incident subelements of a given element, for instance the set of all vertices incident to a given cell. In return, it requires the possibility of inserting element submatrices, or local matrices, into the system matrix. In this case two local matrices may overlap. When the assembly is carried out by parallel processes, two or more processes may require access to the same matrix element, which imposes various difficulties of synchronization and resource management.
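A minimal Python sketch of elementwise assembly, assuming a hypothetical 1D chain of three unit-length linear elements. Each cell produces a 2x2 local matrix that is added into the global matrix. Neighboring local matrices overlap at shared vertices. When cells are processed in parallel, the scatter-add therefore has to be synchronized. A single lock is used here, which is the simplest choice, not necessarily the most efficient one.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

CELLS = [(0, 1), (1, 2), (2, 3)]    # hypothetical chain of unit-length cells
LOCAL = [[1.0, -1.0], [-1.0, 1.0]]  # local stiffness matrix for h = 1

matrix = defaultdict(float)     # global matrix as {(row, col): value}
write_count = defaultdict(int)  # how often each global entry is written
lock = threading.Lock()


def assemble_cell(cell):
    """Scatter-add the local matrix of one cell into the global matrix."""
    verts = CELLS[cell]  # the only traversal needed: cell -> vertices
    with lock:           # local matrices overlap at shared vertices
        for i, gi in enumerate(verts):
            for j, gj in enumerate(verts):
                matrix[(gi, gj)] += LOCAL[i][j]
                write_count[(gi, gj)] += 1


with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(assemble_cell, range(len(CELLS))))
```

The entries `(1, 1)` and `(2, 2)` are written by two cells each. Without the lock, concurrent read-modify-write updates on them could be lost.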
These different models of assembly lead to completely different matrix interfaces. Whereas linewise assembly only requires each nonzero matrix element to be accessed at most once, elementwise access requires different operations, even the deletion of single matrix rows [83]. Other solver interfaces require single matrix elements to be added to the system matrix, where each matrix element may be accessed several times [84,36,85]. For the sake of completeness, it should be noted that the frameworks investigated for elementwise assembly do not require the topological operations needed for linewise assembly. Figure 4.4 shows the application of linewise assembly for a finite volume method with an initial vertex and five neighboring vertices. The assembly yields one matrix line, which is inserted directly into the system matrix.
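The linewise finite volume case of Figure 4.4 can be sketched as follows. The sketch assumes a hypothetical two-point flux approximation in which each neighbor j of the initial vertex i contributes a coupling coefficient A_ij / d_ij (interface area over distance). The numerical values are invented for illustration. The complete matrix line is computed first and then written into the system matrix in one step, so each nonzero entry of the system matrix is accessed exactly once.

```python
def assemble_fv_line(vertex, neighbors):
    """Return matrix line `vertex` as {column: value}.

    neighbors: list of (neighbor_index, interface_area, distance) tuples.
    """
    line = {vertex: 0.0}
    for j, area, dist in neighbors:
        coupling = area / dist
        line[j] = -coupling       # off-diagonal entry, set once
        line[vertex] += coupling  # diagonal accumulates in the local line
    return line


def insert_line(system, row, line):
    """Insert a complete line; refuses to assemble the same line twice."""
    if row in system:
        raise ValueError(f"line {row} already assembled")
    system[row] = dict(line)


# Initial vertex 0 with five neighboring vertices (hypothetical values).
neighbors = [(1, 1.0, 1.0), (2, 0.5, 1.0), (3, 1.0, 2.0), (4, 2.0, 1.0), (5, 1.0, 0.5)]
system = {}
insert_line(system, 0, assemble_fv_line(0, neighbors))
```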
Linear solver environments such as Trilinos [4] and PETSc [48] support elementwise access to single matrix entries. A given value can either be added to a matrix entry or overwrite it. Furthermore, local submatrices can be inserted as a whole for optimized finite element assembly.
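The two access modes can be modeled with a small, hypothetical Python class. Its `set_values` method mirrors the idea, not the exact API, of PETSc's `MatSetValues`, whose `InsertMode` argument selects `ADD_VALUES` or `INSERT_VALUES`. Trilinos/Epetra offers the analogous `SumIntoGlobalValues` and `ReplaceGlobalValues`. Passing lists of row and column indices inserts a whole local submatrix in one call.

```python
from collections import defaultdict
from enum import Enum


class Mode(Enum):
    ADD = "add"        # add the given value to the existing entry
    INSERT = "insert"  # overwrite the existing entry


class SparseMatrix:
    """Hypothetical sparse matrix with elementwise access to single entries."""

    def __init__(self):
        self.entries = defaultdict(float)

    def set_values(self, rows, cols, values, mode):
        """Write the dense len(rows) x len(cols) block `values` at (rows, cols)."""
        for i, r in enumerate(rows):
            for j, c in enumerate(cols):
                if mode is Mode.ADD:
                    self.entries[(r, c)] += values[i][j]
                else:
                    self.entries[(r, c)] = values[i][j]

    def get(self, r, c):
        return self.entries.get((r, c), 0.0)


A = SparseMatrix()
local = [[1.0, -1.0], [-1.0, 1.0]]
A.set_values([0, 1], [0, 1], local, Mode.ADD)  # local submatrix of cell (0, 1)
A.set_values([1, 2], [1, 2], local, Mode.ADD)  # overlaps at entry (1, 1)
# Overwrite line 0, e.g. to impose a boundary condition (illustrative only).
A.set_values([0], [0, 1], [[1.0, 0.0]], Mode.INSERT)
```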

Michael 20080116