Migrating Relational Databases Using Pandas and Django
Migrating data between systems is a mess, especially when confronted with
I tested this on a system running Python 3.6, pandas, the Django 2.0 ORM and PostgreSQL 10. But most problems and solutions will also apply to your favorite programming language, object-relational mapping framework and relational database.
Potential issues
Primary key collisions
Primary keys need to be unique. Within a single dataset this is rarely an issue, but it becomes one when two datasets that both use an incrementing integer primary key, such as PostgreSQL's serial type, are merged: both will contain rows with id 1, 2, 3 and so on. Distributed systems solve this by using universally unique identifiers (UUIDs), for which the chance of a collision is negligible. UUIDs do, however, have a small storage and performance cost: a PostgreSQL uuid value takes 16 bytes, compared with 4 bytes for the integer behind a serial column, so primary key and foreign key indexes grow accordingly.
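A minimal pandas sketch of one way around the collision when the target keeps integer keys: shift the incoming keys past the highest existing key and keep an old-to-new mapping for rewriting foreign keys. The function names, columns and sample data are hypothetical, and the offset approach assumes positive serial-style keys. The `uuid_primary_key` helper shows the UUID alternative; deriving the UUID from the source system and old key with `uuid5` is an assumption of this sketch, chosen so that re-running the migration produces the same keys.

```python
import uuid

import pandas as pd


def offset_primary_keys(existing, incoming, pk="id"):
    """Shift the incoming integer keys so they start after the highest existing key.

    Assumes positive integer keys, as produced by a PostgreSQL serial column.
    Returns the re-keyed frame plus an old -> new mapping, which is needed to
    rewrite foreign keys that point at the incoming rows.
    """
    offset = int(existing[pk].max()) if len(existing) else 0
    rekeyed = incoming.copy()
    rekeyed[pk] = incoming[pk] + offset
    mapping = dict(zip(incoming[pk].tolist(), rekeyed[pk].tolist()))
    return rekeyed, mapping


def uuid_primary_key(source, old_pk):
    """UUID alternative: a deterministic key derived from source system and old key.

    uuid5 hashes its input, so re-running the migration yields the same UUIDs.
    The namespace choice is arbitrary; it only has to stay fixed.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{source}/{old_pk}")


# Hypothetical datasets that both start counting at 1.
existing = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
incoming = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})

rekeyed, mapping = offset_primary_keys(existing, incoming)
merged = pd.concat([existing, rekeyed], ignore_index=True)
print(mapping)                 # {1: 4, 2: 5}
print(merged["id"].is_unique)  # True
```

Foreign key columns in related frames can then be translated with `Series.map(mapping)`, for example on a hypothetical `orders["customer_id"]` column.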
Primary key sequences
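If primary keys are forced by specifying them by hand in Django's models, the database sequence behind the auto-incrementing column is not updated. The sequence then has to be reset to the highest primary key available; otherwise the next row created without an explicit key receives an id that is already taken, and the insert fails with an integrity error.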
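Django already ships a command for this: `python manage.py sqlsequencereset <app_label>` prints the reset statements for an app's models, and its output can be piped into `python manage.py dbshell`. When tables are filled outside Django, a hand-written sketch like the one below builds a PostgreSQL statement of the same shape. The helper names and the `shop_product` table are hypothetical, and identifiers are not escaped, so only pass trusted table names.

```python
def sequence_reset_sql(table, pk="id"):
    """Build PostgreSQL SQL that moves the sequence behind table.pk past MAX(pk).

    setval(seq, value, is_called): with is_called = true the next nextval()
    returns value + 1. For an empty table the call becomes setval(seq, 1, false),
    so the next nextval() returns 1.
    Identifiers are interpolated without escaping: only pass trusted names.
    """
    quoted_table = '"{}"'.format(table)
    quoted_pk = '"{}"'.format(pk)
    return (
        "SELECT setval(pg_get_serial_sequence('{t}', '{pk}'), "
        "COALESCE(MAX({c}), 1), MAX({c}) IS NOT NULL) FROM {t};"
    ).format(t=quoted_table, pk=pk, c=quoted_pk)


def reset_sequences(cursor, tables):
    """Run the reset for each table through any DB-API cursor (e.g. psycopg2)."""
    for table in tables:
        cursor.execute(sequence_reset_sql(table))


print(sequence_reset_sql("shop_product"))
```

Run the reset once, after all rows with hand-picked keys have been inserted and before the application starts creating rows of its own.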
Performance: bulk create and the ORM N+1 problem
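The N+1 problem shows up during a migration when every incoming row triggers its own query, for example to look up the primary key of a related object. The sketch below uses the standard library's `sqlite3` as a stand-in for PostgreSQL and the ORM, with a hypothetical `category` table and a small wrapper that counts the statements sent. With the Django ORM, the single-query variant would be something like `dict(Category.objects.values_list("name", "id"))` for a hypothetical `Category` model.

```python
import sqlite3


class QueryCounter:
    """Wraps a sqlite3 connection and counts every statement sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.executemany(
        "INSERT INTO category (name) VALUES (?)",
        [("books",), ("music",), ("games",)],
    )
    conn.commit()
    return QueryCounter(conn)


def resolve_per_row(db, names):
    """N+1 pattern: one SELECT for every incoming row."""
    ids = []
    for name in names:
        row = db.execute("SELECT id FROM category WHERE name = ?", (name,)).fetchone()
        ids.append(row[0])
    return ids


def resolve_with_lookup(db, names):
    """One SELECT that loads the whole name -> id mapping, then in-memory lookups."""
    lookup = dict(db.execute("SELECT name, id FROM category"))
    return [lookup[name] for name in names]


names = ["music", "books", "music", "games"] * 25  # 100 hypothetical incoming rows

slow, fast = make_db(), make_db()
assert resolve_per_row(slow, names) == resolve_with_lookup(fast, names)
print(slow.count, fast.count)  # 100 1
```

Loading the whole mapping only pays off while the related table fits comfortably in memory; for very large tables, querying in chunks of keys is a middle ground.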
Intermediate fields and tables
Django example: description and dataset
Adding constraints
The parsing power of pandas
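A small illustration of what `pandas.read_csv` can take care of while parsing a legacy export: explicit dtypes, date parsing and control over which values count as missing. The CSV content and column names are made up. Note the `keep_default_na=False` setting: by default pandas treats strings such as `NA` and `NULL` as missing, which would silently turn the ISO country code for Namibia into NaN.

```python
import io

import pandas as pd

# Hypothetical export from a legacy system: stray whitespace, an empty
# field and dates stored as text.
LEGACY_CSV = """id,email,signup_date,country
1, Alice@Example.com ,2017-03-01,NL
2,bob@example.com,2017-03-15,NA
3,,2018-01-20,BE
"""

df = pd.read_csv(
    io.StringIO(LEGACY_CSV),
    dtype={"id": "int64", "email": str, "country": str},
    parse_dates=["signup_date"],
    keep_default_na=False,  # otherwise "NA" (Namibia) would be read as NaN
    na_values=[""],         # treat only empty fields as missing
)

# Normalise e-mail addresses before they are compared or stored.
df["email"] = df["email"].str.strip().str.lower()
print(df.dtypes)
```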
Dropping duplicates on the same fields as the constraints
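A sketch of deduplicating on exactly the fields a unique constraint covers before inserting, so the insert does not fail halfway with an integrity error. It uses `sqlite3` as a stand-in for PostgreSQL; the composite `UNIQUE (sku, store)` plays the role of `Meta.unique_together` on a Django 2.0 model. The table, columns and data are hypothetical, and `keep="last"` assumes that later rows in the export are the most recent ones.

```python
import sqlite3

import pandas as pd

# Hypothetical target table; UNIQUE (sku, store) plays the role of
# Meta.unique_together = ("sku", "store") on a Django 2.0 model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock ("
    " id INTEGER PRIMARY KEY, sku TEXT, store TEXT, quantity INTEGER,"
    " UNIQUE (sku, store))"
)

# Hypothetical source rows: ("A1", "ams") appears twice.
rows = pd.DataFrame(
    {
        "sku": ["A1", "A1", "B2", "A1"],
        "store": ["ams", "ams", "ams", "rtm"],
        "quantity": [5, 7, 3, 1],
    }
)

# Deduplicate on exactly the constrained fields; keep="last" assumes the
# later rows in the export are the most recent ones.
deduplicated = rows.drop_duplicates(subset=["sku", "store"], keep="last")

with conn:  # one transaction: either every row is inserted or none is
    conn.executemany(
        "INSERT INTO stock (sku, store, quantity) VALUES (?, ?, ?)",
        # int() makes sure numpy integers, which sqlite3 cannot bind, become plain ints.
        [(sku, store, int(qty)) for sku, store, qty in
         deduplicated.itertuples(index=False, name=None)],
    )

print(conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0])  # 3
```

Inserting `rows` without the `drop_duplicates` step would raise `sqlite3.IntegrityError` on the second `("A1", "ams")` row and roll back the whole transaction.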
Optimizing using bulk create
Performance comparison against loops
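A rough benchmark sketch comparing a per-row insert loop with batched inserts inside one transaction. It uses `sqlite3` as a stand-in; in Django the batched variant corresponds to `Model.objects.bulk_create(objs, batch_size=...)`, which, unlike a loop over `save()`, does not call `save()` and does not send the `pre_save`/`post_save` signals. The table and function names are hypothetical, and absolute timings depend heavily on the database: an in-memory SQLite database has no network round trips or disk flushes, so the gap is likely to be larger against PostgreSQL.

```python
import sqlite3
import time


def make_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_in_loop(conn, names):
    """One INSERT and one commit per row, similar to calling save() in a loop."""
    for name in names:
        conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
        conn.commit()


def insert_in_bulk(conn, names, batch_size=1000):
    """Batched inserts inside a single transaction, in the spirit of bulk_create."""
    with conn:  # commits once at the end, rolls back on error
        for start in range(0, len(names), batch_size):
            batch = names[start:start + batch_size]
            conn.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in batch])


def timed(func, names):
    """Run one insert strategy on a fresh table; return (row count, seconds)."""
    conn = make_table()
    start = time.perf_counter()
    func(conn, names)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    return count, elapsed


names = [f"item {i}" for i in range(10_000)]
for func in (insert_in_loop, insert_in_bulk):
    count, elapsed = timed(func, names)
    print(f"{func.__name__}: {count} rows in {elapsed:.3f}s")
```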
Conclusion
Migrating data is never fun.