Software Engineering and Programming Fundamentals

Our modern digital creations are complex: a key job of programmers is to express that complexity as simply as possible.

"The best programs are written so that computing machines can perform them quickly and so that human beings can understand them clearly. A programmer is ideally an essayist who works with traditional aesthetic and literary forms as well as mathematical concepts, to communicate the way that an algorithm works and to convince a reader that the results will be correct." -- Donald Knuth

Lesson 1: What is software engineering?

Do we have to apply software engineering practices to every project?
No, we do not.
On a small program you are writing for your own convenience, or just for fun, you can pretty much ignore SE.
When should we apply SE?
As projects increase along three dimensions:

  1. The number of people involved.
  2. The amount of code involved.
  3. The amount of time we expect the program to be in use.

Why? Well, as the number of people increases, we need to make our code easier to understand, to bring people up to speed quickly.
As the amount of code increases, we must apply SE to keep it comprehensible, even for a single coder.
As the expected life of the program increases, we must make the program easier to change.

Modularity: Our Main Tool

The number one friend of the software engineer is modularity. Good software is composed of components, or modules, that are as independent as possible. They interact with each other through narrow interfaces.

Consider the spark plug.

The spark plug is labeled 'S'.

The spark plug is a modular component in your car. When you need new spark plugs, you just plug them in, and off you go. Even if a new, higher-grade spark plug becomes available, it will work as long as it fits the same interface to the rest of the system.

Imagine your surprise if, when you went to replace the spark plugs in your car, you were told by the garage that you'd also have to replace the brakes, the radio, and the windshield wipers!
"Why," you'd ask, "are they faulty?"
"Nope: they just aren't compatible with the new spark plugs."
Badly engineered software is like that: to change one part, you often need to change many others. This makes each change expensive, and likely to introduce bugs, since it is hard to keep track of all the parts a change might affect.

Focus on Interfaces!

Again, let's think about a car. This time, we will look at the gas pedal.

The gas pedal is the one on the right.

The gas pedal is an outstanding example of a narrow interface. The driver (the 'user'!) interfaces with the system that accelerates the car with only two inputs:
* Push harder, go faster!
* Ease up, slow down!

The great thing about narrow interfaces is that the engineers can change almost everything about a component without its users noticing, so long as they don't change the interface. Say, one night, a team of dedicated eco-activists breaks into your garage and replaces your gasoline engine with an electric one. So long as the "gas" pedal still works the same way, you might not even notice, at least until you tried to refill your gas tank.
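
The engine swap can be sketched in Python. This is only an illustration of a narrow interface, not a real automotive API: the names Engine, GasolineEngine, ElectricEngine, Car, set_throttle, and press_pedal, along with the power figures, are all hypothetical and invented here.

```python
from typing import Protocol


class Engine(Protocol):
    """The narrow interface: one method, one input."""

    def set_throttle(self, fraction: float) -> float:
        """Take a pedal position (0.0 to 1.0); return power delivered in kW."""
        ...


class GasolineEngine:
    MAX_POWER_KW = 110.0  # hypothetical figure, for illustration only

    def set_throttle(self, fraction: float) -> float:
        # Fuel injection, spark timing, and so on would be hidden in here.
        return self.MAX_POWER_KW * fraction


class ElectricEngine:
    MAX_POWER_KW = 150.0  # hypothetical figure, for illustration only

    def set_throttle(self, fraction: float) -> float:
        # Battery and inverter control would be hidden in here.
        return self.MAX_POWER_KW * fraction


class Car:
    def __init__(self, engine: Engine) -> None:
        self.engine = engine

    def press_pedal(self, fraction: float) -> float:
        """The driver's only input: how far the pedal is pressed."""
        # Clamp to the valid pedal range before passing it on.
        fraction = max(0.0, min(1.0, fraction))
        return self.engine.set_throttle(fraction)


car = Car(GasolineEngine())
gas_power = car.press_pedal(0.5)

# Swap the engine overnight; the driver-facing code does not change.
car.engine = ElectricEngine()
electric_power = car.press_pedal(0.5)
```

The driver only ever calls press_pedal(). Everything behind set_throttle() can be rewritten freely, which is the property the gas pedal example describes.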

Early musical synthesizers were examples of an engineered product that did not have a narrow interface:

Keith Emerson playing an early Moog synthesizer.

By contrast, in modern synths, the player typically just presses a few buttons to get a new sound.

One reason we are going to focus on building an API server is that doing so puts our focus squarely on interfaces. We will put our early design efforts into thinking about what interface we should provide to the user of our API server.

Lesson 2: Some programming fundamentals

Humans have been programming computers for seven or so decades now. Let us look at some findings on how to write the best software possible!

  • DRY
    This stands for Don't Repeat Yourself! It means that any part of your system that might ever need to change should have a single place where you can make the change. Don't copy blocks of code to wherever you need them in your program: write a function and call it from each of those places. Don't define your data tables in your database and also in your code: find a way (like the Django models.py file) to define your data in one place, and use that definition to generate both the database and the code that uses the DB.
  • No magic constants.
    This is a special case of DRY. It is very tempting, when coding your NYU scheduling app, to write code assuming there are two (major) semesters per year. This will be fine... until NYU adopts a trimester system. Instead, define a constant NUM_SEMS = 2 and use it wherever the number of semesters matters. You might get away with writing day_of_week = day mod 7, since that number probably will never change. But you really ought to write hour_of_day = hour mod CLOCK_PERIOD, since both 12-hour and 24-hour timekeeping methods exist.
  • Make functions do one job.
    Functions that perform a single job are simpler to understand, easier to change or eliminate, and render the overall system more comprehensible. For instance, if the county writes a tax program with a function called calc_taxes, it would be natural to eliminate that function if the job is later passed off to a microservice running on the cloud. But, if the coders also happened to include the code to clear tax liens (county claims against the property for unpaid taxes) in the same function... Oops! No one who ever had a tax lien can sell their property, because the lien never gets cleared.
  • Keep functions short.
    This is related to the previous principle, but focuses on the size of the one job that should be done. A function named handle_yearly_taxes() is doing one job, but probably way too big a job. It would make more sense to have create_tax_roll(), calculate_taxes(), send_bills(), record_payments(), and perhaps more.
  • Format and indent properly.
    Different languages have different conventions for how to name variables (camelCase, with_underscores, MixedCase, and so on), how to space operators, where to put braces, and so on. You should follow those conventions, unless there is a strong reason not to. Consistent indentation is especially important: it allows a reader of your code to easily line up blocks of control. Irregular indentation is a significant source of bugs, as people modifying the code will make mistakes, for example, about which else goes with which if.
  • Comment judiciously.
    Code should contain some comments, especially things like docstrings for classes that can be extracted to produce a guide to the system, and comments explaining what particularly tricky or unusual bits of code do. But commenting is no substitute for writing clear, readable code in the first place! The best explanation of what your code does is, if you write it correctly, your code itself. Remember that we could, and once did, write code just as a sequence of 1s and 0s. And all higher-level languages need to be translated into such code in the end. So why bother with C, Java, or Python? These languages exist for humans, not for computers: they make it easier for us to understand and reason about what a program will do. The upshot: you should look at your code as being every bit as much about communicating to humans as about directing a computer.
  • Go for the golden mean in naming.
    Sometimes, names of functions and variables can be way too cryptic: there are examples in the widely used CLRS algorithms book where I have found as many as six single-letter variable names used at once. On the other hand, naming a function something like take_input_of_employee_w2_and_calculate_employee_tax_rate() is absurdly long: please remember, other programmers will have to type your function names in order to call your functions! Such immense names also make it extremely difficult to stay within guidelines like PEP 8's rule to limit all lines to a maximum of 79 characters. A more reasonable middle ground might be something like calc_tax_rate(), where an employee's W2 might be a parameter for the function.
  • Break your code up into modules that each handle one aspect of the program: an accounting program, for instance, might have modules for taxes, payroll, invoices, bills, and bank accounts.
  • Keep interfaces between modules narrow (as little data has to pass between them as possible) and clearly defined. Change these interfaces as little as possible.
  • Test, test, test!
    Test small pieces of code as you go along. Write an automated test to go with every program or new feature you write. Test as completely by hand as you can: don't just test that your code fetches the data from the DB correctly: test that it still works properly if there is no data in the DB, or, indeed, if there is no DB! ("Properly" here could mean "Display an informative error message instead of crashing.")
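
Several of the principles above can be seen together in one small sketch, loosely based on the county tax example. Everything in it is hypothetical: the flat TAX_RATE, the Property record, and the function bodies are invented for illustration, and real tax rules are far more involved.

```python
from dataclasses import dataclass

# Named constants instead of magic numbers (the values are made up).
TAX_RATE = 0.02
CLOCK_PERIOD = 24  # change to 12 for a 12-hour clock


@dataclass
class Property:
    assessed_value: float
    has_lien: bool = False


def calc_tax(prop: Property) -> float:
    """One job: compute the tax owed on a single property."""
    return prop.assessed_value * TAX_RATE


def clear_lien(prop: Property) -> None:
    """A separate job, so retiring calc_tax() cannot break it."""
    prop.has_lien = False


def total_tax(roll: list[Property]) -> float:
    """Reuse calc_tax() rather than repeating the formula (DRY)."""
    return sum(calc_tax(p) for p in roll)


def hour_of_day(hour: int) -> int:
    """Wrap an hour count using the named clock period."""
    return hour % CLOCK_PERIOD


def test_total_tax_handles_empty_roll() -> None:
    # Edge case: no data at all should give zero, not a crash.
    assert total_tax([]) == 0


def test_clear_lien_is_independent_of_tax() -> None:
    prop = Property(assessed_value=100_000.0, has_lien=True)
    clear_lien(prop)
    assert prop.has_lien is False


test_total_tax_handles_empty_roll()
test_clear_lien_is_independent_of_tax()
```

If the tax rate changes, only TAX_RATE needs editing, and if calc_tax() is later handed off to another service, clear_lien() keeps working because it never depended on it.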

Lesson 3: Python coding standards

For this lesson, please read the Python coding standard, PEP 8. It is a very good example of what a coding standard is like, and most of the guidelines can be applied in other languages.
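
As a taste of what the standard covers, the short sketch below follows a few PEP 8 conventions: UPPER_CASE constants, CapWords class names, snake_case function and variable names, and a line break placed before each binary operator, which is the style PEP 8 suggests for new code. The payroll-style names and figures are hypothetical.

```python
# Constants: UPPER_CASE_WITH_UNDERSCORES (values are made up).
GROSS_WAGES = 5000.00
TAXABLE_INTEREST = 120.50
DIVIDENDS = 80.25
IRA_DEDUCTION = 300.00


class IncomeSummary:  # classes: CapWords
    def __init__(self, income: float) -> None:
        self.income = income


def compute_income() -> float:  # functions and variables: snake_case
    # Breaking before each binary operator keeps every operator
    # next to the operand it applies to.
    income = (GROSS_WAGES
              + TAXABLE_INTEREST
              + DIVIDENDS
              - IRA_DEDUCTION)
    return income


summary = IncomeSummary(compute_income())
```

Code-style tools can check most of these conventions automatically, but PEP 8 itself stresses consistency within a project over rigid rule-following.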

Lesson 4: Language styles: important or not?

Dave Farley on styles of programming languages.

Other Readings

Quiz

    The most important reason to use consistent indentation is...?

    1. it makes it easy for readers to see the control blocks
    2. to pass the tests of code-style tools
    3. it looks prettier that way
    4. all of the above

    "No magic constants" means you should not...?

    1. scatter constants like 7, 12, or 3.14 through your program
    2. define constants to have values like "Harry Potter"
    3. use integer values in your code
    4. use an irrational number in your code

    DRY means your code should ...?

    1. not be all wet
    2. Delete Random Years
    3. Dally Really Youthfully
    4. not repeat itself

    Modularizations include design decisions which must be made...?

    1. after the workable system has been built
    2. before the work on independent modules begins
    3. any old time
    4. dependent on the module in question

    The main consequence of using "go to" statements is...?

    1. they create too much repetition within our program
    2. it is hard to implement them on modern-day equipment
    3. they prevent us from effectively using recursive procedures within our program
    4. it becomes hard to find a meaningful set of coordinates in which to describe the progress of a process

    The best way to make your code understandable is...?

    1. to use as many comments as possible
    2. to use the longest variable names possible
    3. to write clear code
    4. none of the above

    Comments should be...?

    1. employed only when necessary
    2. scattered through the code as widely as possible
    3. used for every line of code
    4. never used at all

    According to PEP 8, if we need a line break near a binary operator, it should come...?

    1. after the operator
    2. before the operator
    3. wherever looks good
    4. nowhere: rewrite the code so you don't need it

    In the second modular decomposition described in the paper, the criterion used was...?

    1. to increase dependency between different modules compared to the first decomposition
    2. to have as much processing done in the Alphabetizer module
    3. information hiding
    4. flowchart

    When it comes to naming variables, we should...?

    1. always use camel case
    2. always employ underscores
    3. keep everything in ALL CAPS
    4. use the convention most widespread in our coding language