Python Coding Guidelines


Forecast Systems Laboratory
Modernization Division
Enhanced Forecaster Tools Branch

Introduction


The intent of this document is to define a set of conventions that make software written at FSL readily usable by others, both within and outside the organization.  These are implementation conventions, and they apply to software developed in Python, including software that uses the Numeric Python extension.  They attempt to follow general practice within the Python programming community, but with a progressive rather than accommodating emphasis.

Convention Goals

The life-cycle of typical software involves many people, including an original team of software developers, any number of maintenance programmers, and a diverse group of other development teams attempting to reuse existing code. Turnover in personnel within all of these groups is likely to require that many people attempt to understand, maintain, enhance, or otherwise reuse this software. Software must therefore communicate ideas readily to many, possibly very different, people if it is to be used effectively.

The speed and reliability with which implementation, maintenance, and enhancement of software may be performed is dependent upon the degree to which software effectively communicates implementation information, shows stylistic consistency, and is amenable to modification. These characteristics, in combination with accuracy and efficiency of execution, indicate the degree to which software will be useful, and thus provide good criteria for evaluating software quality.

Before considering a detailed list of convention definitions, it is profitable to consider the nature of the qualities intended to be brought forth by adherence to these conventions.

Clarity

The combination of source code and comments within any source file effectively communicates information to readers only to the extent that its contents are made comprehensible. Clarity is enhanced through writing comments in complete sentences consisting of correctly spelled words. Placing declarations and directives so that they are easy to find, using clean and simple logic, and using meaningful names all enhance clarity. Clarity allows desired information to be easily located and extracted from software at various levels of detail with a minimum of effort. Three levels of detail are important:

Visual Coherence

Software placing related information in close visual proximity, and visually separating unrelated information, is more readable than code which does not. This follows from a reader's need to understand the relationships between various pieces of information. This simple technique communicates much more effectively than the extensive commenting which would otherwise be required to communicate the same information.

Consistency

Consistency requires that all software within a project conform to a single set of syntactic and stylistic conventions. Consistency allows the reader to be led into an expectancy regarding the form of syntax to be encountered, enhancing the efficiency and accuracy of interpretation. These requirements span project boundaries when code is reused and shared amongst projects.

Meaningful Form

A meaningful form of syntax enhances a reader's ability to accurately interpret meaning from code. Statements serving a similar purpose are presented in a similar form, while statements having inherent differences are given distinguishing appearances. Visual cues, when used judiciously and consistently, are perceived much more readily than comments, and are thus more effective.

Integrity

The most obvious form of integrity is physical integrity. Source files must be editable, viewable, and printable, using standard utilities available on a variety of platforms. A more subtle aspect of integrity is integrity of design. While software design is beyond the scope of this document, its importance to achieving a good implementation cannot be overstated. A high-level design document, whether formal or informal, describes the software architecture being used for the project, and can eliminate misunderstandings which might otherwise develop between team members regarding the design. It also plays a critical role in keeping design documentation out of source files.

Modularity

Maintenance of software is easier when the software is composed of pieces which may be rearranged and/or modified independently of each other. A modular system is composed of components which interact in simple, well-defined ways, and which encapsulate detail in a logical fashion.

Portability

Highly portable software makes little use of system dependencies, and isolates existing dependencies from most software components.

Efficiency

The efficiency with which a software team is able to develop and maintain software is at least as important as the efficiency of the executable code. It is generally possible to style code and comments to be lean yet complete, concise yet readable. Use of excessively verbose code and emphasis on elaborate form reduce the pace of development. Use of excessively terse code and/or comments reduces readability, resulting in difficult and unreliable software maintenance. Neither extreme is desirable. There is a balance to be struck between verbosity and brevity in both code syntax and internal documentation.

Flexibility

Nearly all software is periodically modified or enhanced in some way throughout its lifetime. This may reflect changing requirements, or the need to improve efficiency or responsiveness in some way. Modularity goes a long way towards making such changes possible. Flexibility is also important. If software is designed and implemented with enhancement in mind, the resulting software is much more likely to be extensible with a minimum of effort. If desired changes are in continual conflict with the initial implementation, then it is likely to take a long time to root out inherent inconsistencies between old code and new requirements.

Conformance

It is not profitable to develop software in isolation. The sophistication of software today, and the rapid pace at which systems and software are evolving, is such that developers must make use of existing code if they are to keep pace. Code which is to be reusable by diverse groups, and which is intended to promote clarity and meaningful form, must conform to a reasonable standard. Conformance allows us to communicate easily, and to make efficient use of each other's efforts.


Use of Standards

1) Standard Python programming syntax is used.  Use of vendor-specific language extensions is avoided.
2) Use of standard libraries or extensions is permitted and encouraged, provided the extensions are available on ALL platforms and are not vendor-specific.  An example is Numeric Python.


Statement Formatting

1) All source code consists of printable ASCII characters, separated into lines by standard control characters. 
2) It is recommended that the line length not exceed 80 characters.  Exceptions may be made where a longer line is clearer.
3) One level of indentation corresponds to four columns.  It is recommended that continuation lines be indented two columns, or aligned with the opening parenthesis of a function call, provided the structure of the statement remains clear.

Note: Python uses indentation to delimit control blocks, so programmers are forced to indent consistently.
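The following sketch illustrates the indentation and continuation rules above: four-column block indentation, one continuation line indented two columns, and one aligned with the opening parenthesis of a call. The function names and values are hypothetical, chosen only for illustration.

```python
def weighted_sum(values, weights):
    # One indentation level is four columns.
    total = 0.0
    for value, weight in zip(values, weights):
        total = total + value * weight
    return total


def describe_grid(name, rows, columns):
    # Continuation line indented two columns past the statement.
    return "%s: %d rows x %d columns" % (
      name, rows, columns)


def sample_total():
    # Continuation line aligned with the opening parenthesis of the call.
    return weighted_sum([1.0, 2.0, 3.0],
                        [0.5, 0.25, 0.25])
```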


Spacing and Delimiters

1) Parentheses are used within complex expressions, and whenever there is any possibility of precedence ambiguity.
2) Spaces are used within statements to decompose the statement into logical pieces which enhance the readability of the statement.
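As a short illustration of these two rules (the function names are hypothetical), parentheses below make the grouping explicit where precedence could be misread, and spaces separate operands from operators:

```python
def fahrenheit_to_celsius(tempF):
    # Without parentheses, tempF - 32.0 * 5.0 / 9.0 would subtract
    # 17.77... from tempF instead of converting it.
    return (tempF - 32.0) * 5.0 / 9.0


def is_extreme(tempC, lowLimit, highLimit):
    # Each comparison is grouped so the combination reads at a glance.
    return (tempC < lowLimit) or (tempC > highLimit)


def negative_squared():
    # ** binds more tightly than unary minus, so -2 ** 2 evaluates to -4.
    # Parentheses make the intended grouping explicit.
    return (-2) ** 2
```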


Identifiers

1) Identifier conventions apply to identifiers defined within locally developed software.  They are not intended to imply that identifiers within libraries supplied by vendors or third parties should be modified or converted.
2) Identifiers (names) consist of alphanumeric and underscore characters.  Digits are used sparingly.  Identifiers are composed of some number of word fragments.  A word fragment is some combination of characters conveying a relatively unambiguous meaning to a reader.  Such a fragment may be an English word, a syllable, or a contraction.
3) Identifiers are chosen to clarify the ways in which entities are used, and to relate or distinguish entities to or from each other in a meaningful way. Suitability of a particular identifier generally depends upon the context in which it is used.
4) Variable and method names typically begin with a lowercase letter and are usually built of complete words for readability.  If a name consists of multiple words, the words can be delineated either by capitalizing the first letter of each new word ("displayName") or by placing an underscore between words ("phrase_connector_dict").  In general, variable names use the capitalization scheme, while method names use either scheme.
5) Public class variables and methods typically do not begin with an underscore.  "Protected" class variables and methods typically begin with a single underscore, and "private" ones typically begin with two underscores.  Global variables typically begin with an upper case character.
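A minimal sketch of these naming conventions, using a hypothetical GridField class. Note that Python does not enforce the "protected" level; the single underscore is only a convention. A double underscore does trigger name mangling, so __cacheValid is actually stored on the instance as _GridField__cacheValid.

```python
# Global variables begin with an upper case character.
DefaultUnits = "kelvin"


class GridField:
    def __init__(self, fieldName, values):
        self.fieldName = fieldName       # public
        self.units = DefaultUnits        # public
        self._values = list(values)      # "protected": one underscore
        self.__cacheValid = False        # "private": two underscores

    def maxValue(self):
        # Public method named with the capital-letter scheme.
        return max(self._values)

    def _rescale(self, factor):
        # "Protected" helper intended for this class and its subclasses.
        self._values = [v * factor for v in self._values]
        self.__cacheValid = False
```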


Data Types and Declarations

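1) Python is dynamically typed: a variable is a name bound to an object, and it may be rebound to an object of any type at any time.  (In Python, "mutable" means something different: it describes objects, such as lists, that can be changed in place.)  The language does not require declaring the type of a variable, and does not enforce declared types at run time.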
2) Use of "is" is the correct way of comparing a data value to None, rather than using "==".  For example, if x is None.
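The sketch below (the AlwaysEqual class is hypothetical) shows both points: a name can be rebound to objects of different types, and a class can override == while it cannot override is, which is why "is None" is the reliable test.

```python
class AlwaysEqual:
    # A class whose == operator reports equality with any object.
    def __eq__(self, other):
        return True


def compare_to_none():
    # A name may be rebound to an object of a different type at any time.
    reading = 42
    reading = "missing"
    reading = AlwaysEqual()
    # == defers to the class's __eq__ method, so it can be misleading;
    # "is" tests object identity and cannot be overridden.
    return (reading == None), (reading is None)
```

Here compare_to_none() returns (True, False): the == test wrongly reports a match, while the is test correctly does not.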

Control, Comparison, and other Statements

1) Program flow (control) statements should be written in a readable way.  For example, a C++ loop typically steps an index (for (int i = 0; i < 10; i++)); Python can do the same with range() (xrange() in Python 2), or can simply take each value from a sequence directly (e.g., for a in list).  Use the "for a in list" form where it is clearer.
2) Comparison of variables to None should be done using the "is None" rather than "== None" syntax.
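3) Within a loop, dictionary accesses should generally be limited to one level: rather than repeating a nested lookup such as d["a"]["b"] on every iteration, look up the inner dictionary once before the loop.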
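A sketch of these rules, assuming a hypothetical configuration dictionary with "grid", "thresholds", "low", and "high" keys: the nested lookup is hoisted out of the loop, the loop iterates directly over the values, and enumerate() is used where the position is also needed.

```python
def find_out_of_range(config, readings):
    # Look up the nested dictionary once, before the loop, rather than
    # evaluating config["grid"]["thresholds"] on every iteration.
    thresholds = config["grid"]["thresholds"]
    lowLimit = thresholds["low"]
    highLimit = thresholds["high"]

    # Iterate directly over the values rather than over an index.
    outOfRange = []
    for reading in readings:
        if (reading < lowLimit) or (reading > highLimit):
            outOfRange.append(reading)

    # When the position is also needed, enumerate() supplies both.
    positions = [i for i, reading in enumerate(readings)
                 if (reading < lowLimit) or (reading > highLimit)]
    return outOfRange, positions
```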

Comments

1) Comments are used to summarize the purpose of associated code, and to clarify the implementation by providing information not contained in the code itself.  They present a sufficient level of detail to make the code easy to follow without being so detailed that they dwarf the code.
2) Comments for a function or a block of code generally begin each line with the # delimiter, and are placed and indented the same way as the statements they describe.  Short comments can occasionally be placed to the right of a statement when they refer only to that statement.
3) Excessive commenting, such as individually commenting each of a long series of statements, should be avoided.
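A brief illustration of the comment placement rules; MissingValue and averageValid are hypothetical names, and the sentinel value is assumed for the example. The block comment summarizes purpose and behavior that the code alone does not state, and the one short trailing comment refers only to its own statement.

```python
MissingValue = -9999.0   # hypothetical sentinel for missing data


def averageValid(values):
    # Compute the mean of the values that are not flagged as missing.
    # Return None when every value is missing, so that callers can
    # distinguish "no data" from a genuine mean of zero.
    validValues = [v for v in values if v != MissingValue]
    if not validValues:
        return None
    return sum(validValues) / len(validValues)
```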


Source File Organization

1) Typically, a single source file contains a single Python class.  When a Python class relies on another class that is used only by that class, the definition of this helper class may be placed in the same source file.
2) Within each class, the constructor (__init__) usually comes first.  It is recommended that related methods be kept together in the class, for example all utility methods in one group and all data processing methods in another.
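A minimal sketch of this layout, with hypothetical classes: a helper class used only by GridDomain shares its source file, the constructor comes first, and related methods are grouped together.

```python
class _BoundingBox:
    # Helper used only by GridDomain, so it lives in the same file.
    # The leading underscore marks it as internal to this module.
    def __init__(self, west, south, east, north):
        self.west = west
        self.south = south
        self.east = east
        self.north = north

    def contains(self, lon, lat):
        return ((self.west <= lon <= self.east) and
                (self.south <= lat <= self.north))


class GridDomain:
    # The constructor comes first.
    def __init__(self, name, west, south, east, north):
        self.name = name
        self._box = _BoundingBox(west, south, east, north)

    # Data processing methods are kept together.
    def selectPoints(self, points):
        return [p for p in points if self._box.contains(p[0], p[1])]

    def countPoints(self, points):
        return len(self.selectPoints(points))

    # Utility methods are kept together.
    def describe(self):
        return "%s domain" % self.name
```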


Error Handling

1) Diagnostics are generated for the benefit of software developers and maintenance personnel.  Error handling and the display of error messages are performed for the benefit of the end user.
2) The designer/programmer should implement as much error handling in each function as the algorithm warrants.  Error cases need not be handled if the function clearly cannot receive data that could cause that error condition, but error cases must be considered for all "realistic" data values passed to a function.
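One possible sketch, assuming a hypothetical parseWindSpeed function and using the standard logging module for diagnostics: the developer-facing diagnostic and the user-facing error message are kept separate, and only realistic bad inputs (non-numeric text, negative values) are checked.

```python
import logging

Logger = logging.getLogger(__name__)


def parseWindSpeed(text):
    # Convert a wind speed string to a float.  Realistic bad input
    # (non-numeric text, negative values) is handled; inputs this
    # function cannot receive are not checked.
    try:
        speed = float(text)
    except ValueError:
        # Diagnostic for developers: records the exact input received.
        Logger.debug("parseWindSpeed: cannot convert %r", text)
        # Message for the end user.
        raise ValueError(
          "Wind speed must be a number, not %r" % (text,)) from None
    if speed < 0.0:
        raise ValueError("Wind speed cannot be negative: %s" % (text,))
    return speed
```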


Algorithm Efficiencies

1) Algorithms should be coded efficiently.  Because Python passes arguments as references to objects rather than copying them, passing arguments, even large ones, is generally inexpensive.
2) Double-nested and triple-nested loops should be avoided when possible. 
3) Repeatedly indexing a list inside a loop when the indexed value does not change is inefficient.  Assign the value to a local variable outside the loop instead.
4) Be extremely careful when using LogStream to print out data values.  Even if the "stream" is not enabled, the Python implementation of LogStream can affect performance when very large data sets are "printed".
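The sketch below illustrates items 3 and 4. LogStream's interface is not described here, so the standard logging module stands in for it; the point carries over, since the arguments to a logging call are evaluated before the call is made, whether or not the output is enabled. The function names are hypothetical.

```python
import logging

Logger = logging.getLogger(__name__)


def scale_row(grid, factors, row):
    # factors[row] does not change inside the loop, so it is looked up
    # once instead of on every iteration.
    factor = factors[row]
    return [value * factor for value in grid[row]]


def dump_grid(grid):
    # Builds a (potentially very large) text representation of the grid.
    return "\n".join(" ".join("%.2f" % v for v in row) for row in grid)


def log_grid(grid):
    # dump_grid() would run before Logger.debug() even if debug output
    # were disabled, so check the level first and skip it entirely.
    if Logger.isEnabledFor(logging.DEBUG):
        Logger.debug("grid values:\n%s", dump_grid(grid))
        return True
    return False
```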


Code Reuse and Structuring

1) The code/module should be modular, and algorithms should be decomposed into common functions for readability, code size, and code reuse.
2) There is no particular guideline for the maximum number of lines in a module or function.  However, the purpose of each module should be clear from the set of functions it contains, and the purpose of each function should be clear and the function itself readable.  An excessive number of functions or lines in a module or function can reduce readability and maintainability.
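As a small, hypothetical example of decomposing algorithms into common functions, the clipping logic below is written once and reused by two callers; the temperature limits are illustrative only.

```python
def clip(value, lower, upper):
    # Common helper shared by the functions below instead of repeating
    # the same comparison logic in each of them.
    return max(lower, min(value, upper))


def clip_temperatures(tempsK):
    # Hypothetical physical limits in kelvin, for illustration only.
    return [clip(t, 180.0, 340.0) for t in tempsK]


def clip_humidities(humidities):
    # Relative humidity expressed as a fraction between 0 and 1.
    return [clip(h, 0.0, 1.0) for h in humidities]
```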