Migrating Relational Databases Using Pandas and Django

Migrating data between systems is a mess, especially when confronted with

I tested this on a system running Python 3.6, pandas, the Django 2.0 ORM, and PostgreSQL 10. Most of the problems and solutions will also apply to your favorite programming language, object-relational mapping framework, and relational database.

Potential issues

Primary key collisions

Primary keys need to be unique, which is usually not a problem within a single dataset. It becomes a problem when two datasets are merged that both use an incrementing integer primary key, such as the PostgreSQL serial data type. Distributed systems solve this by using a universally unique identifier (UUID), for which the chance of collisions is negligible. UUIDs do, however, have a small storage and performance impact.
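To make the collision concrete, here is a minimal sketch using only the standard library. The two datasets, the `assign_uuids` helper, and the `"a"`/`"b"` source labels are hypothetical and exist only for illustration. The idea is to give every row a fresh UUID and keep a map from the old key to the new one, so foreign keys can be rewritten afterwards.

```python
import uuid

# Two hypothetical datasets whose serial primary keys overlap.
customers_a = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
customers_b = [{"id": 1, "name": "Carol"}, {"id": 2, "name": "Dave"}]

# Integer keys that exist in both datasets would collide on a naive merge.
colliding = {row["id"] for row in customers_a} & {row["id"] for row in customers_b}


def assign_uuids(rows, source):
    """Return copies of the rows keyed by a fresh UUID, plus an old-to-new key map."""
    key_map = {}
    migrated = []
    for row in rows:
        new_id = uuid.uuid4()
        # Remember where each old key went so related rows can be updated later.
        key_map[(source, row["id"])] = new_id
        migrated.append({**row, "id": new_id})
    return migrated, key_map


merged_a, map_a = assign_uuids(customers_a, "a")
merged_b, map_b = assign_uuids(customers_b, "b")
merged = merged_a + merged_b
key_map = {**map_a, **map_b}
```

The `key_map` is the important part: any table that referenced customer 1 from dataset "b" can look up `key_map[("b", 1)]` to find the new key.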

Primary key sequences

If primary keys are forced by specifying them by hand in Django's models, the underlying sequence will not advance. The auto-increment simply has to be reset to the highest primary key available.
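As a rough sketch of what such a reset could look like on PostgreSQL, the helper below builds a `setval` statement that moves a serial sequence to the current maximum primary key. The function name `sequence_reset_sql` and the table name `shop_customer` are my own assumptions for illustration. Django's `sqlsequencereset` management command prints similar statements for the models of an app, which is probably the safer route in a real project.

```python
def sequence_reset_sql(table, pk_column="id"):
    """Build a PostgreSQL statement that sets a serial sequence to MAX(pk).

    The names are interpolated without escaping, so only pass trusted,
    hard-coded table and column names.
    """
    template = (
        "SELECT setval(pg_get_serial_sequence('\"{table}\"', '{pk}'), "
        "COALESCE(MAX(\"{pk}\"), 1), MAX(\"{pk}\") IS NOT NULL) "
        "FROM \"{table}\";"
    )
    return template.format(table=table, pk=pk_column)


# Hypothetical table name, used only for illustration.
statement = sequence_reset_sql("shop_customer")
print(statement)
```

The third argument to `setval` covers the empty-table case: when there are no rows, the sequence is set so that the next generated value is 1 instead of 2. The resulting statement can be run with any database cursor once the migration has finished.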

performance, bulk create, n+1 problem ORM

intermediate fields/tables

Django example description / dataset

add constraints

pandas parsing power

drop duplicates on the same constraints fields
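A minimal sketch of this step, assuming the target table has a unique constraint on a single `email` column. The column names and sample rows are hypothetical. Passing the constrained fields as `subset` to `DataFrame.drop_duplicates` removes rows that would violate the constraint before they reach the database.

```python
import pandas as pd

# Hypothetical source data; the last row repeats an email address.
rows = pd.DataFrame(
    {
        "email": ["a@example.com", "b@example.com", "a@example.com"],
        "name": ["Alice", "Bob", "Alice A."],
    }
)

# Fields covered by the unique constraint in the target table (assumed).
unique_fields = ["email"]

# Keep the first row for every combination of the constrained fields.
deduplicated = rows.drop_duplicates(subset=unique_fields, keep="first")
```

Which duplicate to keep is a data decision rather than a technical one; `keep="last"` may be the better choice if later rows are more up to date.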

optimizing using bulk create

performance comparison vs loops

conclusion

never fun
