Efficiently Create Multiple Related Rows in DBIx::Class
This article will talk about something seemingly simple and boring -
inserting new rows into your database. If you frequent
irc.perl.org#dbix-class
you probably have heard of the mysterious
MultiCreate, yet you are not entirely sure if it can bring you fame and money.
Are you watching closely?
The Pledge
Let's examine the dreaded Artist/CDs example:
package MySchema::Artist; use base 'DBIx::Class::Core'; __PACKAGE__->table ('artist'); __PACKAGE__->add_columns ( id => { data_type => 'int', is_auto_increment => 1, }, name => { data_type => 'varchar', }, ); __PACKAGE__->set_primary_key ('id'); __PACKAGE__->has_many (cds => 'MySchema::CD', 'artist_id');
and
package MySchema::CD; use base 'DBIx::Class::Core'; __PACKAGE__->table ('cd'); __PACKAGE__->add_columns ( id => { data_type => 'int', is_auto_increment => 1, }, title => { data_type => 'varchar', }, artist_id => { data_type => 'int', }, ); __PACKAGE__->set_primary_key ('id'); __PACKAGE__->belongs_to (artist => 'MySchema::Artist', 'artist_id');
The Turn
Say some XML describing a modest CD collection needs to be parsed and stuffed in the above tables, while preserving relational integrity. A seasoned programmer will sigh, drink some coffee, and write the following code:
for my $chunk (@data) { my $artist = $schema->resultset ('Artist')->create ({ name => $chunk->artist_name }); for my $cd_title (@{$chunk->{cds}}) { $schema->resultset ('CD')->create ({ title => $cd_title, artist_id => $artist->id, }); } }
After the coffee kicks in, perhaps the programmer will remember that he already has declared relationships between Artist and CD. So he will reduce the second inner loop to:
... for my $cd_title (@{$chunk->{cds}}) { $artist->create_related ('cds', { title => $cd_title }); } ...
This is better, but it suffers from a number of deficiencies:
-
If the XML contained Tracks of individual CDs we would need a second nested loop, which may result in another loop, and so on.
-
The entire operation probably needs to be wrapped in a transaction - that's yet more boilerplate to cargo cult from code you've already written elsewhere.
-
It just doesn't feel like DRY
So the depressed developer sets out to write even more boring dull code.
The Prestige
An enlightened coworker arrives, looks at the code and changes it to:
for my $chunk (@data) { my $artist_with_cds = $schema->resultset ('Artist')->create ($chunk); }
Then looks closer at @data, makes sure it doesn't contain anything DBIx::Class wouldn't know how to handle and changes the entire loop to just:
$schema->resultset ('Artist')->populate (\@data);
The Trick Revealed
Just like being able to turn ResultSets into nested structures, DBIx::Class can turn such structures back to related rows. All you have to do is augment the hashref you normally pass to create in DBIx::Class::ResultSet with a structure representing the related row(s), keyed off the relationship name. The structure will be either another hashref (for single-type relationships, i.e. belongs_to, has_one, might_have) or an arrayref of hashrefs for has_many (relationship of type multi).
When the enlightened coworker did ->create ($chunk)
he effectively
supplied:
{ name => 'somename', cds => [ { title => 'cd1', }, { title => 'cd2', }, ... ], }
which was properly taken apart and inserted in the appropriate tables using the appropriate foreign keys as one would expect. In addition, since create() is a single call, DBIx::Class takes care of making it atomic by dutifully wrapping the entire operation in a transaction.
The same logic applies to populate in DBIx::Class::ResultSet which happily accepts an arrayref of hashrefs as the creation argument. In fact, you can bootstrap a poor man's replication by simply dumping a ResultSet via DBIx::Class::ResultClass::HashRefInflator, and feeding the result straight back to create(), using a ResultSet derived from the target Schema.
But does it scale?
The example above is rather simple, but what if we have a complex chain of relationships? For example:
Artist->has_many ('cds', ...) CD->has_many ('cd2producer', ...) CD2Producer->belongs_to ('producer', ...) ...and so on
Won't DBIx::Class attempt to create the same producer several times? Fortunately, no: any creation attempt over a belongs_to relationship will be executed via a find_or_create in DBIx::Class::ResultSet, thus allowing everything to Just Work. For the brave there is an example pushing all conceivable limits.
But there must be *some* caveats...?
-
Currently this works for create() only. While find_or_create() will be executed on belongs_to relationships, the final goal is nothing more than new row creation. This means, for example, that passing a nested structure to update_or_create in DBIx::Class::ResultSet will not do anything close to what you would expect.
-
The returned object does not always have the proper related objects inserted at the expected slots (similar to the effect of prefetch). We need a good amount of tests before we attempt to make this work as expected - patches welcome!
-
The find_or_create() use mentioned above can result in some strange behavior. Consider:
my $cd_data = { artist_id => $some_artist->id, title => 'my cd', genre => { name => 'vague genre', cds => [ { title => 'oddball cd', artist_id => $some_other_artist->id, }, ], } };
If the vague genre genre already exists, DBIx::Class will not descend to check if the oddball cd is in fact created. This has not yet caused anyone grief, thus there is no motivation for the (non-trivial) fix.
What The Future Holds
While the new row creation is mostly done and over with (in fact it delayed the release of the 0.081xx series by several months), there are more ways to expand the existing codebase.
-
Port DBIx::Class::ResultSet::RecursiveUpdate to leverage the (substantial) infrastructure behind MultiCreate. While some conceptual differences exist between both modules, they can certainly be resolved during the test-writing phase.
-
Merge create and update into the logical multi_update_or_create !
If you have tuits or simply ideas - you are always welcome to hang out on IRC, and start a discussion (or snatch a commit-bit and start hacking).
Happy search()ing!
Author
Peter Rabbitson <ribasushi@cpan.org>