Python Coding Guidelines
Forecast Systems Laboratory
Modernization Division
Enhanced Forecaster Tools Branch
Introduction
The intent of this document is to define a set of conventions to make
the software written at FSL readily usable by others both within and
outside the organization. These are implementation conventions that
apply to software developed in Python and Numeric Python. They attempt
to follow general practice within the Python programming community,
but with a progressive rather than accommodating emphasis.
Convention Goals
The life-cycle of typical software involves many people, including an
original team of software developers, any number of
maintenance programmers, and a diverse group of other development teams
attempting to reuse existing code. Turnover in personnel within all of
these groups is likely to require that many people attempt to
understand, maintain, enhance, or otherwise reuse this software.
Software must thus be made to readily communicate ideas to many people
(and possibly a wide variety of people) if it is to be used
effectively.
The speed and reliability with which implementation,
maintenance, and enhancement of software may be performed depend
upon the degree to which software effectively communicates
implementation information, shows stylistic consistency, and is
amenable to modification. These characteristics, in combination with
accuracy and efficiency of execution, indicate the degree to which
software will be useful, and thus provide good criteria for evaluating
software quality.
Before considering a detailed list of convention definitions,
it is profitable to consider the nature of the qualities intended to be
brought forth by adherence to these conventions.
Clarity
The combination of source code and comments within any source file
effectively communicates information to readers only to the extent that
its contents are made comprehensible. Clarity is enhanced through
writing comments in complete sentences consisting of correctly spelled
words. Placing declarations and directives so that they are easy to
find, using clean and simple logic, and using meaningful names all
enhance clarity. Clarity allows desired information to be easily
located and extracted from software at various levels of detail with a
minimum of effort. Three levels of detail are important:
- Locating specific entities within files;
- Developing a general understanding of an implementation;
- Following the detailed nuances of the executable code.
Visual Coherence
Software placing related information in close visual proximity,
and visually separating unrelated information, is more readable than
code which does not. This follows from a reader's need to understand
the relationships between various pieces of information. This simple
technique communicates much more effectively than the extensive
commenting which would otherwise be required to communicate the same
information.
Consistency
Consistency requires that all software within a project conform
to a single set of syntactic and stylistic conventions. Consistency
allows the reader to be led into an expectancy regarding the form of
syntax to be encountered, enhancing the efficiency and accuracy of
interpretation. These requirements span project boundaries when code is
reused and shared amongst projects.
Meaningful Form
A meaningful form of syntax enhances a reader's ability to
accurately interpret meaning from code. Statements serving a similar
purpose are presented in a similar form, while statements having
inherent differences are given distinguishing appearances. Visual cues,
when used judiciously and consistently, are perceived much more readily
than comments, and are thus more effective.
Integrity
The most obvious form of integrity is physical integrity.
Source files must be editable, viewable, and printable, using standard
utilities available on a variety of platforms. A more subtle aspect of
integrity is integrity of design. While software design is beyond the
scope of this document, its importance to achieving a good
implementation cannot be overstated. A high-level design document,
whether formal or informal, describes the software architecture being
used for the project, and can eliminate misunderstandings which might
otherwise develop between team members regarding the design. It also
plays a critical role in keeping design documentation out of source
files.
Modularity
Maintenance of software is easier when the software is composed
of pieces which may be rearranged and/or modified independently of each
other. A modular system is composed of components which interact in
simple, well-defined ways, and which encapsulate detail in a logical
fashion.
Portability
Highly portable software makes little use of system
dependencies, and isolates existing dependencies from most software
components.
Efficiency
The efficiency with which a software team is able to develop and
maintain software is at least as important as the efficiency of the
executable code. It is generally possible to style code and comments to
be lean yet complete, concise yet readable. Use of excessively verbose
code and emphasis on elaborate form reduce the pace of development. Use
of excessively terse code and/or comments reduces readability,
resulting in difficult and unreliable software maintenance. Neither
extreme is desirable. There is a balance to be struck between verbosity
and brevity in both code syntax and internal documentation.
Flexibility
Nearly all software is periodically modified or enhanced in
some way throughout its lifetime. This may reflect changing
requirements, or the need to improve efficiency or responsiveness in
some way. Modularity goes a long way towards making such changes
possible. Flexibility is also important. If software is designed and
implemented with enhancement in mind, the resulting software is much
more likely to be extensible with a minimum of effort. If desired
changes are in continual conflict with the initial implementation, then
it is likely to take a long time to root out inherent inconsistencies
between old code and new requirements.
Conformance
It is not profitable to develop software in isolation. The
sophistication of software today, and the rapid pace at which systems
and software are evolving, are such that developers must make use of
existing code if they are to keep pace. Code which is to be reusable by
diverse groups, and which is intended to promote clarity and meaningful
form, must conform to a reasonable standard. Conformance allows us to
communicate easily, and to make efficient use of each other's efforts.
Use of Standards
1) Standard Python programming syntax is used. Use of
vendor-specific language extensions is avoided.
2) Use of standard libraries or extensions is permitted and encouraged
if the extensions apply to ALL platforms and are not
vendor-specific. An example is Numeric Python.
Statement Formatting
1) All source code consists of printable ASCII characters, separated
into lines by standard control characters.
2) It is recommended that the line length not exceed 80
characters. Exceptions may be made for code clarity.
3) One level of indentation corresponds to four columns.
Continuation lines should be indented two columns, or aligned with the
opening parenthesis of the function call, provided the code remains
clear.
Note: Python uses indentation to delimit control blocks, so the
programmer is forced to indent properly.
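The two continuation styles described in item 3 can be sketched as follows. This is only an illustration; the function and its parameter names are hypothetical and do not come from any FSL module.

```python
def format_station_report(station_id, temperature_c, wind_speed_kt,
                          wind_direction_deg):
    # One indentation level is four columns. The parameter list above
    # continues on a line aligned with the opening parenthesis.
    template = "%s: %.1f C, wind %d kt from %d deg"
    # The continuation line below is indented two columns instead.
    report = template % (station_id, temperature_c,
      wind_speed_kt, wind_direction_deg)
    return report


if __name__ == "__main__":
    print(format_station_report("KDEN", 21.5, 12, 270))
```

Either style is acceptable under the guideline as long as the continuation is easy to tell apart from the start of a new statement.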
Spacing and Delimiters
1) Parentheses are used within complex expressions, and whenever there
is any possibility of precedence ambiguity.
2) Spaces are used within statements to decompose the statement into
logical pieces which enhance the readability of the statement.
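A minimal sketch of both points, assuming hypothetical function names and threshold values chosen only for illustration:

```python
def needs_advisory(wind_kt, gust_kt, visibility_mi):
    # Without the outer parentheses around the "or" test, "and" would
    # bind more tightly than "or" and the expression would be grouped
    # differently. The parentheses state the intended grouping.
    return ((wind_kt >= 25) or (gust_kt >= 35)) and (visibility_mi < 3.0)


def scaled_average(a, b, scale):
    # Spaces around the operators separate the expression into its
    # logical pieces: the sum, the halving, and the scaling.
    return ((a + b) / 2.0) * scale
```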
Identifiers
1) Identifier conventions apply to identifiers defined within locally
defined software. They are not intended to imply that identifiers
within libraries supplied by vendors or third-parties should be
modified or converted.
2) Identifiers (names) consist of alphanumeric and underscore
characters. Digits are used sparingly. Identifiers are
composed of some number of word fragments. A word fragment is
some combination of characters conveying a relatively unambiguous
meaning to a reader. Such a fragment may be an English word, a
syllable, or a contraction.
3) Identifiers are chosen to clarify the ways in which entities are
used, and to relate or distinguish entities to or from each other in a
meaningful way. Suitability of a particular identifier generally
depends upon the context in which it is used.
4) Typically, variable and method
names begin with a non-capital letter and are usually built of complete
words for code readability. If a name consists of multiple words,
they can be delineated by using a capital letter at the beginning of
the new word ("displayName") OR by using an underscore between words
("phrase_connector_dict"). In general, variable names
use the capital letter scheme while methods use either the capital
letter or underscore.
5) Public class variables and methods typically do not begin with an
underscore. "Protected" class variables and methods typically begin
with a single underscore. "Private" class variables and methods
typically begin with two underscores. Global variables typically begin
with an upper-case character.
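The naming conventions in items 4 and 5 might look like this in practice. The class and all of its names are hypothetical, invented only to show each convention once.

```python
# Global variable: begins with an upper-case character.
DefaultUnits = "kt"


class WindReport:
    """Hypothetical class used only to illustrate the naming rules."""

    def __init__(self, stationName, speed):
        self.stationName = stationName   # public: no leading underscore
        self._speed = speed              # "protected": one underscore
        self.__rawText = None            # "private": two underscores

    def displayName(self):               # method, capital-letter scheme
        return self.stationName.upper()

    def speed_with_units(self):          # method, underscore scheme
        return self._format_speed()

    def _format_speed(self):             # "protected" helper method
        return "%d %s" % (self._speed, DefaultUnits)
```

Note that Python does not enforce "protected" access; the single underscore is a signal to readers. Names with two leading underscores are mangled by the interpreter (here to _WindReport__rawText), which discourages but does not prevent outside access.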
Data Types and Declarations
1) Python is dynamically typed: a variable may be bound to a value of
any data type at any time. The language does not require a declared
type for a variable, and does not enforce one.
2) Comparing a value to None is done with "is" rather than "==". For
example: if x is None.
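One reason to prefer "is" is that "==" can be redefined by a class, while "is" always tests identity. The sketch below uses a hypothetical class, AlwaysEqual, to show the difference; it also shows a name being rebound to a value of a different type.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


def describe(value):
    # "is" tests identity, so a custom __eq__ cannot affect the result.
    if value is None:
        return "missing"
    return "present"


# A name may be rebound to a value of a different type at any time.
value = 42              # bound to an int
value = "forty-two"     # now bound to a str
```

With this class, the expression AlwaysEqual() == None evaluates to True even though the object is not None, whereas describe() reports it correctly as "present".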
Control, Comparison, and other Statements
1) Program flow (control) statements should be written in a readable
way. For example, C++ typically loops with an index counter
(for (int i = 0; i < 10; i++)), while Python can do the same with
xrange() (range() in Python 3) or simply take each value from a list
(e.g., for a in list). Use the "for a in list" syntax where it makes
the code clearer.
2) Comparison of variables to None should be done using the "is None"
rather than "== None" syntax.
3) Dictionary accesses within a control loop should generally be
limited to one level.
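The loop guidelines above can be sketched as follows. The data layout and function names are hypothetical, and item 3 is read here as "resolve nested dictionaries before the loop rather than inside it", which is one reasonable interpretation. The example uses range(), the Python 3 spelling of xrange().

```python
# Hypothetical nested data, for illustration only.
stations = {
    "KDEN": {"obs": {"temps": [21.5, 22.0, 19.5]}},
}


def mean_temperature(station_id):
    # Resolve the nested dictionaries once, outside the loop, so the
    # loop body performs no multi-level lookups.
    temps = stations[station_id]["obs"]["temps"]
    total = 0.0
    for temp in temps:              # direct iteration, no index needed
        total = total + temp
    return total / len(temps)


def indexed_labels(names):
    # When the index itself is needed, iterate over range().
    labels = []
    for i in range(len(names)):
        labels.append("%d:%s" % (i, names[i]))
    return labels
```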
Comments
1) Comments are used to summarize the purpose of associated code, and
to clarify the implementation by providing information not contained in
the code itself. They present a sufficient level of detail to
make the code easy to follow without being so detailed that they dwarf
the code.
2) Comments, for a function or a block of code, are generally
delimited on each line, and are placed and indented the same way as are
statements. Short comments can occasionally be placed to
the right of a statement when they refer to only one statement.
3) Excessive commenting, such as individually commenting each of a long
series of statements, should be avoided.
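As an illustration of these comment guidelines, the hypothetical function below carries one block comment that summarizes purpose and states something the code alone does not (why None is passed through), plus a single short trailing comment.

```python
def celsius_to_fahrenheit(temps_c):
    # Convert a list of Celsius temperatures to Fahrenheit.
    # Values of None mark missing observations and are passed
    # through unchanged so that list positions are preserved.
    temps_f = []
    for temp in temps_c:
        if temp is None:
            temps_f.append(None)
            continue
        temps_f.append((temp * 9.0 / 5.0) + 32.0)   # F = C * 9/5 + 32
    return temps_f
```

Commenting each of the remaining statements individually ("create an empty list", "loop over the values") would add length without adding information.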
Source File Organization
1) Typically a single source file contains a single Python
class. Where a class uses a helper class that is used by no other
class, the definition of that helper ("sub-class") is permitted within
the same source file.
2) Within each class, the constructor is usually first. It
is recommended that "similar" functions be kept together in the class,
such as all utility functions together, all data processing functions
together, etc.
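A source file laid out along these lines might look like the sketch below. Grid and _GridCell are hypothetical names; the grouping comments are one possible way to mark the sections.

```python
class _GridCell:
    """Helper used only by Grid, so it is defined in the same file."""

    def __init__(self, value):
        self.value = value


class Grid:
    # The constructor comes first.
    def __init__(self, values):
        self._cells = [_GridCell(v) for v in values]

    # Data processing functions are kept together.
    def total(self):
        return sum(self._values())

    def maximum(self):
        return max(self._values())

    # Utility functions are kept together.
    def _values(self):
        return [cell.value for cell in self._cells]
```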
Error Handling
1) Diagnostics are generated for the benefit of software developers and
maintenance personnel. Error handling and the display of error
messages are performed for the benefit of the end user.
2) The designer/programmer should implement error handling in each
function as appropriate for the algorithm. Error cases need not be
handled if the function clearly cannot receive data that could cause
that error condition. Error cases must be considered for all
"realistic" data values passed to a function.
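The distinction between diagnostics and error messages can be sketched as below. All names are hypothetical; the example assumes that malformed text and an empty list are "realistic" inputs, and writes the developer diagnostic to standard error while returning a plain message for the end user.

```python
import sys


def parse_temperature(text):
    # Text from a user or data file can realistically be malformed.
    try:
        return float(text)
    except ValueError:
        raise ValueError("'%s' is not a valid temperature." % text)


def mean(values):
    # An empty list is a realistic input, so it is checked explicitly.
    if len(values) == 0:
        raise ValueError("Cannot average an empty list of values.")
    return sum(values) / float(len(values))


def report_mean(texts, diagnostics=sys.stderr):
    try:
        values = [parse_temperature(t) for t in texts]
        return "Mean temperature: %.1f" % mean(values)
    except ValueError as error:
        # Diagnostic for developers and maintenance personnel.
        diagnostics.write("report_mean failed for input %r\n" % (texts,))
        # Error message for the end user.
        return "Error: %s" % error
```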
Algorithm Efficiencies
1) Algorithms should be coded efficiently for performance reasons.
Since Python passes virtually everything by reference rather than by
value, passing arguments is generally inexpensive.
2) Double-nested and triple-nested loops should be avoided when
possible.
3) Repeatedly referencing an indexed list element within a loop when
that element does not change is inefficient. Put the reference outside
of the loop.
4) Be extremely careful when using LogStream to print out data
values. Even if the "stream" is not enabled, the Python
implementation of LogStream can affect performance when very large data
sets are "printed".
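Items 2 through 4 can be illustrated with the sketch below. The function names are hypothetical. The LogStream interface is not shown in this document, so the standard library logging module is used here purely as a stand-in to show the idea of checking whether output is enabled before formatting large data.

```python
import logging

# Stand-in for LogStream (an assumption; LogStream's API is not shown).
logger = logging.getLogger("example")


def add_offset(values, offsets, station_index):
    # The indexed value does not change inside the loop, so it is
    # fetched once, before the loop starts.
    offset = offsets[station_index]
    result = []
    for value in values:
        result.append(value + offset)
    return result


def common_ids(first_ids, second_ids):
    # A set lookup replaces the inner loop of a double-nested search.
    second = set(second_ids)
    return [item for item in first_ids if item in second]


def log_values(values):
    # Check that debug output is enabled before handing a potentially
    # very large data set to the logger.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("values: %s", values)
```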
Code Reuse and Structuring
1) The code/module should be modular, and algorithms should be
decomposed into common functions for readability, code size, and code
reuse.
2) There is no particular guideline for the maximum number of lines in
a module or function. However, the purpose of each module should
be clear from the set of functions in the module, and the purpose of
each function and the "readability" of each function should be
clear. An excessive number of functions in a module, or of lines in a
function, can reduce readability and maintainability.