Jan 5, 2019

Select for update

It is easy to overlook select for update when using Django. I am currently working on a project that uses Django 1.11.x and Django REST Framework 3.8.2 (an upgrade to Django 2.1 is planned, but we need to move to Python 3.7.1 first) and ran into an issue (again) of missing .select_for_update() where it is required. The symptom looks as if your changes to the database never happen, but what actually happens is that two parallel requests (or other parallel tasks) overwrite each other's data. Both requests retrieve the same record from the database, and the ORM reconstructs a separate model instance for each of them. At some moment one of the requests modifies its instance attributes and saves the changes to the database, but the copy held by the other request is not affected. Later, the second request modifies its own copy's attributes and saves the changes to the database. This may not be an issue (in some cases) if both requests modify the same attributes - the database then contains the most up-to-date values, which is probably what you actually want - but if they modify different attributes, older values end up being stored to the database.

With Django's default behavior (at least up to version 1.11) all model attributes are saved to the database even if they were not modified. Imagine you have a model A with attributes b and c. Two parallel requests fetch the same instance from the database: instance = A(b='b1', c='c1'). Request 1 then changes b = 'b2' and saves the instance with instance.save(), so A(b='b2', c='c1') is stored. Note that although request 1 did not change attribute c, it is written to the database anyway. At this moment request 2 still holds its own copy, A(b='b1', c='c1'). It then changes c = 'c2' and saves with instance.save(), so A(b='b1', c='c2') is stored. Again, attribute b was not changed by this request, but its value is saved by default, overwriting the value written by request 1 (b = 'b2') with the older value b = 'b1' retrieved earlier. This is the classic lost update problem (also see write-write conflict).
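The sequence above can be sketched in plain Python, with dicts standing in for database rows and a save() that blindly writes back every field, the way Model.save() does by default (this is a toy simulation, not real Django code):

```python
# Toy simulation of the lost-update problem: two "requests" each hold
# their own copy of the same record and blindly write back all fields.
database = {1: {"b": "b1", "c": "c1"}}

def fetch(pk):
    return dict(database[pk])          # each caller gets its own copy

def save(pk, instance):
    database[pk] = dict(instance)      # writes ALL fields, like Model.save()

req1 = fetch(1)                        # request 1's copy
req2 = fetch(1)                        # request 2's copy

req1["b"] = "b2"
save(1, req1)                          # database now {'b': 'b2', 'c': 'c1'}

req2["c"] = "c2"
save(1, req2)                          # overwrites b back to 'b1'!

print(database[1])                     # {'b': 'b1', 'c': 'c2'} - 'b2' is lost
```

Request 1's write to b survives only until request 2 saves its stale copy.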

In development and test environments this issue rarely reproduces, because the level of concurrency there is very low. In a production environment, however, it may very well happen, and when it does it is really hard to debug (because it is production, and because it is hard to figure out the conditions for reproduction). Issues of this kind should therefore be prevented during development: make it a habit to query objects with .select_for_update() if they are going to be modified.

But developers still forget to do it (I do, at least sometimes). There are two things that could be done here. First, if Django REST Framework is used, it could apply .select_for_update() to all modifying operations by default, or at least to PATCH and PUT (my question to the core developer). Second, Django could save only the attributes that were actually modified instead of saving all of them blindly; this would cover the cases where parallel requests modify different attributes and would also improve performance.
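Django already offers an opt-in form of the second idea via save(update_fields=[...]). Extending the toy simulation from above, a save that writes back only the listed fields makes the two requests stop clobbering each other (again, plain dicts standing in for the database):

```python
# Toy sketch of saving only modified fields, the idea behind
# Django's save(update_fields=[...]); a dict stands in for the table.
database = {1: {"b": "b1", "c": "c1"}}

def fetch(pk):
    return dict(database[pk])

def save(pk, instance, update_fields):
    # Write back only the listed fields instead of the whole row.
    for field in update_fields:
        database[pk][field] = instance[field]

req1 = fetch(1)
req2 = fetch(1)

req1["b"] = "b2"
save(1, req1, update_fields=["b"])     # only b is written

req2["c"] = "c2"
save(1, req2, update_fields=["c"])     # only c is written

print(database[1])                     # {'b': 'b2', 'c': 'c2'} - nothing lost
```

This covers the different-attributes case; when both requests modify the same attribute, row locking with .select_for_update() is still required.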

UPDATE: While we are waiting for a reply from the core developer, here is a snippet for Django REST Framework:
from rest_framework import mixins
from rest_framework.generics import GenericAPIView
from rest_framework.viewsets import ViewSetMixin


class CustomGenericAPIView(GenericAPIView):
    def get_queryset(self):
        qs = super(CustomGenericAPIView, self).get_queryset()
        # Lock the selected rows for modifying requests. Note that
        # select_for_update() must be evaluated inside a transaction
        # (e.g. with ATOMIC_REQUESTS = True or transaction.atomic()),
        # otherwise Django raises TransactionManagementError.
        if self.request.method in ('PATCH', 'PUT'):
            qs = qs.select_for_update()

        return qs


class CustomGenericViewSet(ViewSetMixin, CustomGenericAPIView):
    pass


class CustomModelViewSet(mixins.CreateModelMixin,
                         mixins.RetrieveModelMixin,
                         mixins.UpdateModelMixin,
                         mixins.DestroyModelMixin,
                         mixins.ListModelMixin,
                         CustomGenericViewSet):
    pass


Apr 22, 2017

Refactor me

This repository represents a step-by-step refactoring of dirty code given to me as a test task to assess my coding skills. The only remark about the code was: "refactor_me.py is expected to contain Python 3.5.x code" (the file name itself was actually not provided in the task).

I did it in such a way that every commit contains one particular change, described in the commit message. The original dirty code can be found in this commit: 1036c091cb70ef110b4e56702bdc012c8a110336

Remarks on final result:

  • The 100-character line length limit is used on purpose


Please do not hesitate to submit pull requests for improvements if you feel that I missed something.

Apr 16, 2017

My repo stats

My prospective clients and employers often ask me to show some code before agreeing to work with me. Unfortunately (or fortunately), most of the code I have written was written professionally and is therefore covered by NDAs and closed-source. I cannot disclose the code, but it is OK to publish some stats to shed light on what I have done in the past as a software developer.

Lamoda / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
apilib | SQLAlchemy | Apr, 2013 - May, 2015 | 40,695 | 18,738 | 46% | +32,063 / -12,891 | 30,002 (74%)
apigateway | Spyne | Apr, 2013 - May, 2015 | 5,768 | 2,725 | 47% | +6,464 / -4,830 | 4,631 (80%)

Saprun / Team Leader / Architect

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA_core | Django, Celery | Jun, 2015 - Aug, 2016 | 43,972 | 7,223 | 16% | +26,497 / -31,412 | 36,819 (84%)
NDA_communication | Autobahn / Crossbar | Aug, 2015 - Jul, 2016 | 1,657 | 589 | 36% | - | 779 (47%)

Diamondmine / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
diamondmine-server | Django | Jul, 2016 - Oct, 2016 | 5,393 | 5,204 | 96% | +6,284 / -775 | 1,801 (33%)
diamondmine-processor | Celery | Jul, 2016 - Sep, 2016 | 1,689 | 1,689 | 100% | +2,501 / -812 | 1,367 (81%)

Semilimes / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA | Flask | Nov, 2016 - Dec, 2016 | 21,466 | 2,595 | 12% | +4,589 / -4,029 | 11,506 (54%)

Trounceflow / Team Leader

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
website | Django | Dec, 2016 - Mar, 2017 | 22,212 | 2,797 | 13% | +13,195 / -11,123 | 19,035 (86%)

Acura Capital / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA-poller | gevent | Jan, 2017 - Apr, 2017 | 6,012 | 6,012 | 100% | +18,441 / -12,429 | 5,381 (90%)
NDA-bidder | gevent | Mar, 2017 - Apr, 2017 | 1,730 | 1,730 | 100% | +2,366 / -636 | 1,272 (74%)
NDA-common | - | Mar, 2017 - Apr, 2017 | 1,991 | 1,991 | 100% | +2,280 / -289 | 1,737 (87%)

Open source projects

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
pascal_triangle | - | Apr, 2015 - Apr, 2017 | 2,840 | 2,840 | 100% | +10,546 / -7,706 | 1,239 (44%)
dmu-utils | - | Mar, 2017 - Apr, 2017 | 372 | 372 | 100% | +379 / -7 | 261 (70%)

Mar 25, 2017

HackerRank stats

My current HackerRank stats:
           | Contest (World) | Contest (Russia) | Practice (World) | Practice (Russia)
Top        | 10% | 30% | 10% | -
Percentile | 91.47 | 71.08 | 92 | -
Rank       | 10 247 (out of 120 108) | 504 (out of 1 743) | 3 049 (out of 757 851) | 82 (out of 7 748)

Jan 7, 2017

Definition of Done

This is a sample definition of done that I use in mature development process environments.

Task is considered done if all of the below conditions are met:

  1. Source code that corresponds to the task description has been developed
  2. Unit tests that cover the developed source code have been written
  3. All unit tests (including those developed for the task) pass successfully
  4. New features are covered by integration tests
  5. Integration tests pass successfully
  6. Changes describing the installation and migration process related to the task are provided in the corresponding documentation or deployment scripts
  7. The change log is updated
  8. The pull request has passed code review (all comments are either covered by fixes or otherwise resolved with the responsible reviewer) and has been merged to the upstream repository
  9. Notes for QA people are added to the task
  10. The task has been deployed to the test environment
  11. The task has passed manual testing successfully: discovered bugs are either fixed or split out into separate issues for later fixing
  12. The task has been deployed to live, does not expose bugs, and shows correct behavior in live


Dec 15, 2016

Strict dependencies

UPDATE 2019-02-21: It seems that pipenv solves the problem completely.

Use strict version dependencies to prevent unexpected upgrades. Apply the same rule to the dependencies of your dependencies, recursively. Example: dependency-package-name==x.y.z
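For instance, a fully pinned requirements.txt following this rule might look like the fragment below (the package versions here are illustrative, not a recommendation):

```
Django==1.11.20
djangorestframework==3.8.2
pytz==2018.9
```

Every line names an exact version, so pip installs the same set in every environment, today or a year from now.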

What happens when you do not follow the above rule? Let us see a development cycle on a time line.

Bob is a developer who does not follow the rule above. When he needs to add a new dependency he just adds it to setup.py or requirements.txt without specifying a version. Then he runs pip install -e . or pip install -r requirements.txt to install the dependency. Since an exact version is not specified, pip installs the latest version available at that moment. Everything works fine and Bob happily continues development, using the dependency in his code. The dependency will not be upgraded when a new version is released, because pip does not upgrade by default: if no version is specified, it checks only for the presence of the dependency, not its version.

Time passes, say a month or two or a year, and at some point Alice joins the team to help Bob with the development. Alice sets up her development environment and runs pip install -e . or pip install -r requirements.txt to install all dependencies. Since exact versions are not specified, pip installs the latest versions available at the moment of installation. Modern development cycles are short and releases are frequent, so it is very likely that Alice gets newer versions than Bob has.

The first consequence is that Bob and Alice are now developing in different environments, which is bad: they may experience different behavior from different versions of the same dependencies. Something that works for Alice will not work for Bob, and vice versa. This leads to a loss of time investigating the "magical" behavior.

Another consequence is that a newer version of a dependency may turn out to be backward incompatible - intentionally, by error, or through improper usage. I have had several such cases in my experience. The program will then fail with an exception or, worse, run with a logical error. Alice will still need to contact Bob and ask him to run pip freeze to learn which version of the dependency works, so she can install exactly that one.

The worst case is when a working combination of dependencies is lost, for example because Bob left the company before Alice joined. In this case Alice will need to downgrade a dependency, or a combination of dependencies, version by version until she finds a working set.

The same happens when a production environment has to be deployed. What versions of dependencies should be installed if the developers set up their environments half a year ago? Which developer's environment represents the master set of dependency versions?

One more problem with non-strict dependencies: they prevent managed dependency upgrades. Upgrades happen randomly, whenever a new environment is set up, which may lead to unplanned work caused by backward incompatibilities or by the need to upgrade your own code along with the dependencies.

My other Python development practices