Commit b9930b8a authored by François Ferry's avatar François Ferry
Browse files

doc(tutorials): redaction of "data-management/import" part of the museum tuto.

related: #148
parent 826328052e46
......@@ -10,10 +10,141 @@ Import data
With our application customized, let's see how to import more data.
There is several ways to import data in CubicWeb. In our tutorial, we want to import our
museums from a `csv` file.
museums from a `csv` file. This file is provided by the France's Ministry of Culture, and is
available here_.
Query data with RQL
~~~~~~~~~~~~~~~~~~~
.. _here: https://data.culture.gouv.fr/explore/dataset/liste-et-localisation-des-musees-de-france/export/
There are several ways to import data in CubicWeb; in this tutorial, we will use one of them,
the others are described here: :ref:`dataimport`.
First of all, we define in :file:`cubicweb-tuto/dataimport.py` a function which will read a file
from a `filepath` and create from read data entities, using a `CubicWeb connection`:
.. sourcecode:: python
import csv
def import_museums(cnx, filepath):
existing_cities = dict(cnx.execute("Any Z, C Where C is City, C zip_code Z"))
existing_cities_nb = len(existing_cities)
created_museum_nb = 0
with open(filepath) as fileobj:
reader = csv.DictReader(fileobj, delimiter=";")
for record in reader:
museum_name = record["NOM DU MUSEE"]
street = record["ADR"]
zip_code = record["CP"]
city_name = record["VILLE"]
try:
lat, lng = record["coordonnees_finales"].split(",")
lat_long = {
"latitude": lat,
"longitude": lng,
}
except (AttributeError, ValueError):
lat_long = {}
try:
city = existing_cities[zip_code]
except KeyError:
city = cnx.create_entity("City", name=city_name, zip_code=zip_code)
existing_cities[zip_code] = city.eid
cnx.create_entity(
"Museum",
name=museum_name,
postal_address=f"{street}, {zip_code} {city_name}",
is_in=city,
**lat_long,
)
created_museum_nb += 1
print(
"Import finished! {} existing cities, {} cities created, {} museums created.".format(
existing_cities_nb,
len(existing_cities) - existing_cities_nb,
created_museum_nb,
)
)
To be sure we don't have several time the same city, we first query CubicWeb to ask for all
existing city. To do this, we use a specific language called **RQL**. As for SPARQL, it's a
query language designed to query linked data. See :ref:`rql_intro` for more information about it.
Then, we put existing cities in a dictionary, using zip code as key (a zip code can concern
several cities, but it's not really an issue in this tutorial).
For each line of our `csv` file, we will see if we already have the city in our base (then in
our dictionary). If not, we create it. Then, we create our Museum entity with all needed arguments.
To create an entity, we use `create_entity` method of our CubicWeb connection. This method take
as first argument the type of the entity (ie: the name of the corresponding class previously
defined in :file:`cubicweb-tuto/schema.py`), and then all arguments of the entity type.
In our example, a city need a name and a zip code; and a museum need name, a postal address,
a latitude, a longitude and a city. As `is_in` is a relation, we give to the corresponding argument
the eid of the city.
.. Note::
As we have defined Museum in the schema, we have to link each instance of Museum to a City,
that's why we create city before museum, and give this city as argument of the museum.
If the city wasn't mandatory, we could add it later, using:
.. sourcecode:: python
museum_entity.cw_set(is_in=city)
Now we have our function we need to create a CubicWeb command using it. First, we create a file
:file:`cubicweb-tuto/ccplugin.py` (the name doesn't matter, but it is commonly used for all new
CubicWeb commands). Then, we write the following code:
.. sourcecode:: python
from cubicweb.cwctl import CWCTL
from cubicweb.toolsutils import Command
from cubicweb.utils import admincnx
from cubicweb_tuto.dataimport import import_museums
@CWCTL.register
class ImportMuseums(Command):
"""
Import Museums and Cities from a CSV from:
https://data.culture.gouv.fr/explore/dataset/liste-et-localisation-des-musees-de-france/export/
"""
arguments = "<instance> <csv_file>"
name = "import-museums"
min_args = max_args = 2
def run(self, args):
appid, csv_file = args[:2]
with admincnx(appid) as cnx:
import_museums(cnx, csv_file)
cnx.commit()
* ``@CWCTL.register`` allows to register the command and then make it available with
``cubicweb-ctl`` command by its ``name``.
* ``arguments`` defines which arguments take our command.
* ``name`` defines the name of the command.
* ``with admincnx(appid) as cnx`` allows to have an admin access to our instance, and then
be able to create new entities.
Thus, to execute our import command, we just have to enter in our shell (within our virtual env):
.. code-block:: console
cubicweb-ctl import-museums tuto_instance <path_to_the_csv>
After this script, we should be able to see that we have much more cities and museums by
visiting the homepage of our CubicWeb instance:
.. image:: ../../images/tutos-museum_finished_import.png
:alt: A CubicWeb instance with several cities and museums.
RDF serialisation
~~~~~~~~~~~~~~~~~
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment