Jan 5, 2019

Select for update

It is easy to overlook select for update when using Django. I am currently working on a project that uses Django 1.11.x and Django REST Framework 3.8.2 (an upgrade to Django 2.1 is planned, but we need to move to Python 3.7.1 first) and ran into an issue (again) of missing .select_for_update() where it is required. The symptom looks as if your changes to the database never happen, but what actually happens is that two parallel requests (or other parallel tasks) overwrite each other's data. Both requests retrieve the same record from the database, and the ORM reconstructs a separate model instance for each of them. At some moment one of the requests modifies its instance attributes and saves the changes to the database, but the copy held by the other request is not affected. Later, the second request modifies its own copy's attributes and saves the changes to the database. This may not be an issue (in some cases) if both requests modify the same attributes - the database then contains the most up-to-date values, which is probably what you actually want - but if they modify different attributes, older values end up being stored to the database.

With Django's default behavior (at least up to version 1.11) all model attributes are saved to the database even if they were not modified. Imagine you have a model A with attributes b and c. Two parallel requests fetch the same instance from the database: instance = A(b='b1', c='c1'). Request 1 then changes b = 'b2' and saves the instance with instance.save(), so A(b='b2', c='c1') is stored. Note that although request 1 did not change attribute c, it is written to the database anyway. At this moment request 2 still holds its own copy, A(b='b1', c='c1'). It then changes c = 'c2' and saves with instance.save(), so A(b='b1', c='c2') is stored. Again, attribute b was not changed by this request, but its value is saved by default, overwriting the value written by request 1 (b = 'b2') with the older value b = 'b1' retrieved earlier. This is the classic lost update problem (also see write-write conflict).
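The sequence above can be sketched in plain Python, with dicts standing in for database rows and a save() that blindly writes back every field, the way Model.save() does by default (this is a toy simulation, not real Django code):

```python
# Toy simulation of the lost-update problem: two "requests" each hold
# their own copy of the same record and blindly write back all fields.
database = {1: {"b": "b1", "c": "c1"}}

def fetch(pk):
    return dict(database[pk])          # each caller gets its own copy

def save(pk, instance):
    database[pk] = dict(instance)      # writes ALL fields, like Model.save()

req1 = fetch(1)                        # request 1's copy
req2 = fetch(1)                        # request 2's copy

req1["b"] = "b2"
save(1, req1)                          # database now {'b': 'b2', 'c': 'c1'}

req2["c"] = "c2"
save(1, req2)                          # overwrites b back to 'b1'!

print(database[1])                     # {'b': 'b1', 'c': 'c2'} - 'b2' is lost
```

Request 1's write to b survives only until request 2 saves its stale copy.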

In development and test environments this issue rarely reproduces, because the level of concurrency there is very low. In a production environment, however, it may very well happen, and when it does it is really hard to debug (because it is production, and because it is hard to figure out the conditions for reproduction). Issues of this kind should therefore be prevented during development: make it a habit to query objects with .select_for_update() if they are going to be modified.

But developers still forget to do it (I do, at least sometimes). There are two things that could be done here. First, if Django REST Framework is used, it could apply .select_for_update() to all modifying operations by default, or at least to PATCH and PUT (my question to the core developer). Second, Django could save only the attributes that were actually modified instead of saving all of them blindly; this would cover the cases where parallel requests modify different attributes and would also improve performance.
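Django already offers an opt-in form of the second idea via save(update_fields=[...]). Extending the toy simulation from above, a save that writes back only the listed fields makes the two requests stop clobbering each other (again, plain dicts standing in for the database):

```python
# Toy sketch of saving only modified fields, the idea behind
# Django's save(update_fields=[...]); a dict stands in for the table.
database = {1: {"b": "b1", "c": "c1"}}

def fetch(pk):
    return dict(database[pk])

def save(pk, instance, update_fields):
    # Write back only the listed fields instead of the whole row.
    for field in update_fields:
        database[pk][field] = instance[field]

req1 = fetch(1)
req2 = fetch(1)

req1["b"] = "b2"
save(1, req1, update_fields=["b"])     # only b is written

req2["c"] = "c2"
save(1, req2, update_fields=["c"])     # only c is written

print(database[1])                     # {'b': 'b2', 'c': 'c2'} - nothing lost
```

This covers the different-attributes case; when both requests modify the same attribute, row locking with .select_for_update() is still required.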

UPDATE: While we are waiting for a reply from the core developer, here is a snippet for Django REST Framework:
from rest_framework import mixins
from rest_framework.generics import GenericAPIView
from rest_framework.viewsets import ViewSetMixin


class CustomGenericAPIView(GenericAPIView):
    def get_queryset(self):
        qs = super(CustomGenericAPIView, self).get_queryset()
        # Lock the selected rows for modifying requests. Note that
        # select_for_update() must be evaluated inside a transaction
        # (e.g. with ATOMIC_REQUESTS = True or transaction.atomic()),
        # otherwise Django raises TransactionManagementError.
        if self.request.method in ('PATCH', 'PUT'):
            qs = qs.select_for_update()

        return qs


class CustomGenericViewSet(ViewSetMixin, CustomGenericAPIView):
    pass


class CustomModelViewSet(mixins.CreateModelMixin,
                         mixins.RetrieveModelMixin,
                         mixins.UpdateModelMixin,
                         mixins.DestroyModelMixin,
                         mixins.ListModelMixin,
                         CustomGenericViewSet):
    pass


Apr 22, 2017

Refactor me

This repository represents a step-by-step refactoring of dirty code given to me as a test task to assess my coding skills. The only remark about the code was: "refactor_me.py is expected to contain Python 3.5.x code" (the file name itself was actually not provided in the task).

I did it in such a way that every commit contains one particular change, described in the commit message. The original dirty code can be found in this commit: 1036c091cb70ef110b4e56702bdc012c8a110336

Remarks on final result:

  • The 100-character line length limit is used on purpose


Please do not hesitate to submit pull requests for improvements if you feel that I missed something.

Apr 16, 2017

My repo stats

My prospective clients and employers often ask me to show some code before agreeing to work with me. Unfortunately (or fortunately), most of the code I have written was written professionally and is therefore covered by NDAs and closed-source. I cannot disclose the code, but it is OK to publish some stats to shed light on what I have done in the past as a software developer.

Lamoda / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
apilib | SQLAlchemy | Apr, 2013 - May, 2015 | 40,695 | 18,738 | 46% | +32,063 / -12,891 | 30,002 (74%)
apigateway | Spyne | Apr, 2013 - May, 2015 | 5,768 | 2,725 | 47% | +6,464 / -4,830 | 4,631 (80%)

Saprun / Team Leader / Architect

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA_core | Django, Celery | Jun, 2015 - Aug, 2016 | 43,972 | 7,223 | 16% | +26,497 / -31,412 | 36,819 (84%)
NDA_communication | Autobahn / Crossbar | Aug, 2015 - Jul, 2016 | 1,657 | 589 | 36% | - | 779 (47%)

Diamondmine / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
diamondmine-server | Django | Jul, 2016 - Oct, 2016 | 5,393 | 5,204 | 96% | +6,284 / -775 | 1,801 (33%)
diamondmine-processor | Celery | Jul, 2016 - Sep, 2016 | 1,689 | 1,689 | 100% | +2,501 / -812 | 1,367 (81%)

Semilimes / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA | Flask | Nov, 2016 - Dec, 2016 | 21,466 | 2,595 | 12% | +4,589 / -4,029 | 11,506 (54%)

Trounceflow / Team Leader

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
website | Django | Dec, 2016 - Mar, 2017 | 22,212 | 2,797 | 13% | +13,195 / -11,123 | 19,035 (86%)

Acura Capital / Senior Developer

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
NDA-poller | gevent | Jan, 2017 - Apr, 2017 | 6,012 | 6,012 | 100% | +18,441 / -12,429 | 5,381 (90%)
NDA-bidder | gevent | Mar, 2017 - Apr, 2017 | 1,730 | 1,730 | 100% | +2,366 / -636 | 1,272 (74%)
NDA-common | - | Mar, 2017 - Apr, 2017 | 1,991 | 1,991 | 100% | +2,280 / -289 | 1,737 (87%)

Open source projects

Repository | Frameworks | Period | Total lines | My lines at the latest commit | My contribution | My lines + / - | Python lines
pascal_triangle | - | Apr, 2015 - Apr, 2017 | 2,840 | 2,840 | 100% | +10,546 / -7,706 | 1,239 (44%)
dmu-utils | - | Mar, 2017 - Apr, 2017 | 372 | 372 | 100% | +379 / -7 | 261 (70%)

Mar 25, 2017

HackerRank stats

My current HackerRank stats:
           | Contest (World) | Contest (Russia) | Practice (World) | Practice (Russia)
Top        | 10% | 30% | 10% | -
Percentile | 91.47 | 71.08 | 92 | -
Rank       | 10 247 (out of 120 108) | 504 (out of 1 743) | 3 049 (out of 757 851) | 82 (out of 7 748)

Jan 7, 2017

Definition of Done

This is a sample definition of done that I use in mature development process environments.

Task is considered done if all of the below conditions are met:

  1. Source code that corresponds to the task description has been developed
  2. Unit tests that cover the developed source code have been written
  3. All unit tests (including those developed for the task) pass successfully
  4. New features are covered by integration tests
  5. Integration tests pass successfully
  6. Changes describing the installation and migration process related to the task are provided in the corresponding documentation or deployment scripts
  7. The change log is updated
  8. The pull request has passed code review (all comments are either covered by fixes or otherwise resolved with the responsible reviewer) and has been merged to the upstream repository
  9. Notes for QA people are added to the task
  10. The task has been deployed to the test environment
  11. The task has passed manual testing successfully: discovered bugs are either fixed or split out into separate issues for later fixing
  12. The task has been deployed to live, does not expose bugs, and shows correct behavior in live


Dec 15, 2016

Strict dependencies

UPDATE 2019-02-21: It seems that pipenv solves the problem completely.

Use strict version dependencies to prevent unexpected upgrades. Apply the same rule to the dependencies of your dependencies, recursively. Example: dependency-package-name==x.y.z
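For instance, a fully pinned requirements.txt following this rule might look like the fragment below (the package versions here are illustrative, not a recommendation):

```
Django==1.11.20
djangorestframework==3.8.2
pytz==2018.9
```

Every line names an exact version, so pip installs the same set in every environment, today or a year from now.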

What happens when you do not follow the above rule? Let us see a development cycle on a time line.

Bob is a developer who does not follow the rule above. When he needs to add a new dependency he just adds it to setup.py or requirements.txt without specifying a version. Then he runs pip install -e . or pip install -r requirements.txt to install the dependency. Since an exact version is not specified, pip installs the latest version available at that moment. Everything works fine and Bob happily continues development, using the dependency in his code. The dependency will not be upgraded when a new version is released, because pip does not upgrade by default: if no version is specified, it checks only for the presence of the dependency, not its version.

Time passes, say a month or two or a year, and at some point Alice joins the team to help Bob with the development. Alice sets up her development environment and runs pip install -e . or pip install -r requirements.txt to install all dependencies. Since exact versions are not specified, pip installs the latest versions available at the moment of installation. Modern development cycles are short and releases are frequent, so it is very likely that Alice gets newer versions than Bob has.

The first consequence is that Bob and Alice are now developing in different environments, which is bad: they may experience different behavior from different versions of the same dependencies. Something that works for Alice will not work for Bob, and vice versa. This leads to a loss of time investigating the "magical" behavior.

Another consequence is that a newer version of a dependency may turn out to be backward incompatible - intentionally, by error, or through improper usage. I have had several such cases in my experience. The program will then fail with an exception or, worse, run with a logical error. Alice will still need to contact Bob and ask him to run pip freeze to learn which version of the dependency works, so she can install exactly that one.

The worst case is when a working combination of dependencies is lost, for example because Bob left the company before Alice joined. In this case Alice will need to downgrade a dependency, or a combination of dependencies, version by version until she finds a working set.

The same happens when a production environment has to be deployed. What versions of dependencies should be installed if the developers set up their environments half a year ago? Which developer's environment represents the master set of dependency versions?

One more problem with non-strict dependencies: they prevent managed dependency upgrades. Upgrades happen randomly, whenever a new environment is set up, which may lead to unplanned work caused by backward incompatibilities or by the need to upgrade your own code along with the dependencies.

My other Python development practices