Infrastructure at your Service

In my previous post, I talked about the Lifecycle of Alfresco Nodes. You may have noticed that I tried to insert in my explanations some elements that are specific to databases (tables, fields, aso…). These elements are quite essential to prepare a post like this one: more database oriented. I already explained what exactly are the consequences on the database side when a node is removed and I will try in this post to share some useful queries regarding these points but not only!

For this post, I used my local Alfresco Community 4.2.c installation with a PostgreSQL database. For your information, it just take 30 minutes to get this test environment ready with the Alfresco’s installer (Windows, Mac or Unix – Not for Production use!). Of course, use the Database only for your daily administration work is certainly not the best idea but in some cases, it can really be faster and easier to just run some SQL commands at the DB level…

 

I. Document information

So let start this post with some generic queries that can be used to retrieve some information about documents. In this part, all columns of the results will be the same because I just pick up the same fields in my queries but the filter part (the WHERE clause) changes a little bit to be able to retrieve some information from different elements.

The first command I would like to show you is how to retrieve some information about documents based on the size of the content. Here, I just uploaded the document “Test_Lifecycle.docx” with a size of 52MB. So based on that, let’s say that I want to retrieve all elements on my Alfresco installation with a content that is bigger than 40MB. In the same approach, you can select all elements with a content that is smaller than or between XX and YYMB. The conversion in MB is done using the round() function. Therefore, if you want this value to be in KB instead, just remove one division by 1024 in each round() function:

  • All documents bigger than 40MB (across all stores so you can have different versions, aso… A filter can be used on the store if needed)
SELECT n.id AS "Node ID",
  n.store_id AS "Store ID",
  concat(s.protocol, '://', s.identifier) AS "Store Name",
  round(u.content_size/1024/1024,2) AS "Size (MB)",
  n.uuid AS "Document ID (UUID)",
  n.audit_creator AS "Creator",
  n.audit_created AS "Creation Date",
  n.audit_modifier AS "Modifier",
  n.audit_modified AS "Modification Date",
  p1.string_value AS "Document Name",
  u.content_url AS "Location"
FROM alf_node AS n,
  alf_node_properties AS p,
  alf_node_properties AS p1,
  alf_namespace AS ns,
  alf_qname AS q,
  alf_content_data AS d,
  alf_content_url AS u,
  alf_store AS s
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND q.local_name='content'
  AND n.store_id=s.id
  AND round(u.content_size/1024/1024,2)>40
ORDER BY u.content_size DESC;

 

I will just put it once but here is the result of this command in my case:

Node ID | Store ID |       Store Name        | Size (MB) |          Document ID (UUID)          | Creator |         Creation Date         | Modifier |       Modification Date       |    Document Name    |                            Location                            
--------+----------+-------------------------+-----------+--------------------------------------+---------+-------------------------------+----------+-------------------------------+---------------------+----------------------------------------------------------------
 131856 |        6 | workspace://SpacesStore |     52.00 | eb267742-c018-4ba5-8ca4-75ca23c860f0 | Morgan  | 2015-04-30T12:05:50.613+02:00 | Morgan   | 2015-04-30T12:05:50.613+02:00 | Test_Lifecycle.docx | store://2015/4/30/12/5/0e111f05-7fcf-4a44-b719-b94cd04dd5ab.bin

 

So why did I selected these fields?!

  • Node ID: can be useful to join different tables
  • Store ID: a Store ID of 6 means that your document is in its active life (in my case, check the query below on alf_store to see the different stores). Still in my case, a Store ID of 5 means that this document has been deleted by a user and is now in the global trashcan
  • Store Name: I added the display of the store so it is easier to know what the Store ID corresponds to. It’s formatted as “PROTOCOL://IDENTIFIER”
  • Size (MB): what we are searching for…
  • Document ID (UUID): the unique identifier of this document. The simplest way to preview this document is just to open the following url in any browser: http://HOSTNAME:PORT/share/page/document-details?nodeRef=PROTOCOL://IDENTIFIER/eb267742-c018-4ba5-8ca4-75ca23c860f0 (workspace://SpacesStore for store_id=6)
  • Creator, Modifier, Dates: well…
  • Document Name: can be useful to know the type of document without opening an URL (file extension)
  • Location: the actual location of the content’s file on the File System. The “store://” refers to $ALF_DATA/contentstore/

The second command I would like to show you is how to retrieve some information based on the actual UUID of a document. As explained above, the UUID of a document can be found in the URL of its detail’s page:

  • A document using its UUID
SELECT n.id AS "Node ID",
  n.store_id AS "Store ID",
  round(u.content_size/1024/1024,2) AS "Size (MB)",
  n.uuid AS "Document ID (UUID)",
  n.audit_creator AS "Creator",
  n.audit_created AS "Creation Date",
  n.audit_modifier AS "Modifier",
  n.audit_modified AS "Modification Date",
  p1.string_value AS "Document Name",
  u.content_url AS "Location"
FROM alf_node AS n,
  alf_node_properties AS p,
  alf_node_properties AS p1,
  alf_namespace AS ns,
  alf_qname AS q,
  alf_content_data AS d,
  alf_content_url AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND n.uuid='eb267742-c018-4ba5-8ca4-75ca23c860f0';

 

Another possible command would be to find some information based on the File System location. That can be useful for example if there is a big document on the File System and you want to know the type of this document with the extension, its name or maybe some other information about the creator/modifier:

  • A document using its path on the File System
SELECT n.id AS "Node ID",
  n.store_id AS "Store ID",
  round(u.content_size/1024/1024,2) AS "Size (MB)",
  n.uuid AS "Document ID (UUID)",
  n.audit_creator AS "Creator",
  n.audit_created AS "Creation Date",
  n.audit_modifier AS "Modifier",
  n.audit_modified AS "Modification Date",
  p1.string_value AS "Document Name",
  u.content_url AS "Location"
FROM alf_node AS n,
  alf_node_properties AS p,
  alf_node_properties AS p1,
  alf_namespace AS ns,
  alf_qname AS q,
  alf_content_data AS d,
  alf_content_url AS u
WHERE n.id=p.node_id
  AND ns.id=q.ns_id
  AND p.qname_id=q.id
  AND p.long_value=d.id
  AND d.content_url_id=u.id
  AND p1.node_id=n.id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND u.content_url='store://2015/4/30/12/5/0e111f05-7fcf-4a44-b719-b94cd04dd5ab.bin';

 

II. Number of…

From a reporting point of view, let’s say that you need some information regarding the number of… something. In this case and if you want to use your DB directly, then there is a really simple solution (simple but is it the best?) because Alfresco provide a Database architecture that is quite simple to understand and to use to get what you need. Indeed, if you take a look at the “alf_qname” table, you will see that every element that is part of Alfresco has its QName listed here. A QName is the Qualified Name of a repository item. This can be seen as a kind of “Super-Type”:

alfresco=> SELECT * FROM alf_qname;
 id | version | ns_id | local_name
----+---------+-------+------------
  1 |       0 |     1 | store_root
  2 |       0 |     1 | aspect_root
  3 |       0 |     1 | container
  4 |       0 |     1 | children
  5 |       0 |     2 | user
...
 24 |       0 |     6 | folder
...
 51 |       0 |     6 | content
...
133 |       0 |     6 | thumbnail
134 |       0 |    13 | rendition
...

 

As you can see above, if you are searching for something that has a content, it can be done quite easily using the id or the local_name that correspond to that. So based on this table, here are some queries that can be useful:

  • Retrieve the number of users in the Repository
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='user';

 

  • Retrieve the number of elements with a content in the Repository (include system’s documents)
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='content';

 

  • Retrieve the number of thumbnails in the Repository
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='thumbnail';

 

  • Retrieve the number of renditions in the Repository
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q
WHERE n.type_qname_id=q.id
  AND q.local_name='rendition';

 

Of course you can do that for all QNames but you can also be more precise! So based on the query to retrieve the number of elements with a content, if you only want the number of documents of a specific type, then you can simply complete your query (this suppose the documents kept their extension… which should be the case):

  • Retrieve the number of XML documents in the Repository
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q,
  alf_node_properties AS p
WHERE n.type_qname_id=q.id
  AND p.node_id=n.id
  AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND q.local_name='content'
  AND p.string_value LIKE '%.xml';

 

  • Retrieve the number of PDF documents in the Repository
SELECT count(*)
FROM alf_node AS n,
  alf_qname AS q,
  alf_node_properties AS p
WHERE n.type_qname_id=q.id
  AND p.node_id=n.id
  AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND q.local_name='content'
  AND p.string_value LIKE '%.pdf';

 

As the creation date, creator, modification date and modifier information are also stored on the “alf_node” table, you can also very easily filter your query based on the creation/update date of an Alfresco Node. That’s pretty cool, right?! 😉

 

III. Lifecycle specific

To complete the relation between this blog post and the previous one, I wanted to share some queries that can be used to identify the current state of a document. As explained in my previous post, a document that is not yet deleted will be in the store named “workspace://SpacesStore”. A document that has been deleted by a user will be in the sotre named “archive://SpacesStore” and when this document is removed from the global trashcan, the orphan_time is set to the current timestamp. With all these information and with the “alf_node” and “alf_content_url” tables we can easily build our own queries to find what is needed.

alfresco=> SELECT * FROM alf_store;
 id | version | protocol  |       identifier        | root_node_id
----+---------+-----------+-------------------------+--------------
  1 |       1 | user      | alfrescoUserStore       |            1
  2 |       1 | system    | system                  |            5
  3 |       1 | workspace | lightWeightVersionStore |            9
  4 |       1 | workspace | version2Store           |           10
  5 |       1 | archive   | SpacesStore             |           11
  6 |       1 | workspace | SpacesStore             |           12
(6 rows)

 

So let’s find all documents that have been created and aren’t deleted yet. In the queries below, you can replace the store_id= and type_qname_id= with some SELECT queries as I showed you previously in this blog. Since there are already some examples, I will use the IDs directly:

  • All documents created in their active life
SELECT *
FROM alf_node
WHERE store_id=6
  AND type_qname_id=51;

 

The next step on the lifecycle is when the documents have been deleted by a user but aren’t deleted from the global trashcan (orphan_time is still NULL):

  • All documents created that are in the global trashcan (deleted by users)
SELECT *
FROM alf_node
WHERE store_id=5
  AND type_qname_id=51;

 

Finally, when the documents are removed from the global trashcan, some references/fields are removed, the QName of these documents change from “content” (51) to “deleted” (140) on the “alf_node” table and the orphan_time is set to the current timestamp on the “alf_content_url” table:

  • All elements that have been removed from the global trashcan and that are now orphaned
SELECT *
FROM alf_content_url
WHERE orphan_time IS NOT NULL;

 

IV. Other examples

This section of the blog has been added starting in 2017. There were several comments on this blog asking for help on how to get information about several things so I thought I would just add a new section and complete the blog with new queries that can be useful for some people. So here we go:

  • Listing all existing Sites with names, descriptions and visibility
SELECT n.id AS "Node ID",
  n.store_id AS "Store ID",
  n.uuid AS "Site ID (UUID)",
  p.string_value AS "Site Name",
  p1.string_value AS "Site Description",
  p2.string_value AS "Site Visibility",
  n.audit_creator AS "Creator",
  n.audit_created AS "Creation Date",
  n.audit_modifier AS "Modifier",
  n.audit_modified AS "Modification Date"
FROM alf_node AS n,
  alf_node_properties AS p,
  alf_node_properties AS p1,
  alf_node_properties AS p2,
  alf_qname AS q
WHERE n.type_qname_id=q.id
  AND p.node_id=n.id
  AND p1.node_id=n.id
  AND p2.node_id=n.id
  AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='description')
  AND p2.qname_id IN (SELECT id FROM alf_qname WHERE local_name='siteVisibility')
  AND q.local_name='site';

 

  • Size and number of documents created by all users
SELECT n.audit_creator AS "Creator",
  round(sum(s.bin_size)/1024/1024,2) AS "Total Size (MB)",
  count(*) AS "Nb Document"
FROM alf_node AS n,
  alf_qname AS q,
  (SELECT n1.id AS "id",
     u1.content_size AS "bin_size"
   FROM alf_node AS n1,
     alf_node_properties AS p1,
     alf_content_data AS d1,
     alf_content_url AS u1
   WHERE n1.id=p1.node_id
     AND p1.long_value=d1.id
     AND d1.content_url_id=u1.id
     AND p1.qname_id=(SELECT id FROM alf_qname WHERE local_name='content')) AS s
WHERE n.type_qname_id=q.id
  AND n.id=s.id
  AND q.local_name='content'
GROUP BY n.audit_creator
ORDER BY sum(s.bin_size) DESC;

 

  • Retrieve all users with their username and status:
SELECT n.id,
  n.uuid,
  q.local_name AS "Type",
  p1.string_value AS "Username",
  p2.boolean_value AS "Enabled"
FROM alf_node AS n,
  alf_qname AS q,
  alf_node_properties AS p1,
  alf_node_properties AS p2
WHERE n.type_qname_id=q.id
  AND n.id=p1.node_id
  AND p1.node_id=p2.node_id
  AND q.local_name='user'
  AND p1.string_value!=''
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='username')
  AND p2.qname_id IN (SELECT id FROM alf_qname WHERE local_name='enabled');

 

  • Disabling all users (except ‘admin’). Please note that you really (REALLY) shouldn’t do this on the DB as it skips all caches, aso… You would need to restart Alfresco for that and it would only be for the Alfresco inline users, it wouldn’t prevent them to login using an AD, if configured. This should really be done from the API layer instead.
UPDATE alf_node_properties
  SET boolean_value=false
WHERE node_id IN (SELECT n1.id
    FROM alf_node AS n1,
      alf_node_properties AS p1
    WHERE n1.type_qname_id IN (SELECT q1.id FROM alf_qname AS q1 WHERE q1.local_name='user')
      AND n1.id=p1.node_id
      AND p1.qname_id IN (SELECT q2.id FROM alf_qname AS q2 WHERE q2.local_name='username')
      AND p1.string_value NOT IN ('admin'))
 AND qname_id IN (SELECT q3.id
    FROM alf_qname AS q3
    WHERE q3.local_name='enabled');

 

For the next commands, let’s use the document that I created in my previous blog regarding the lifecycle and that I mentioned at the beginning of this blog. The Document name was “Test_Lifecycle.docx” and its node_id was “131856”. With the very first SQL from this blog, I could find its location on the file system… But what about its location on Alfresco?

  • Retrieve the parent node_id of an Alfresco node from the node’s name (files, folders, aso…) – Obviously the name needs to be unique otherwise you will see several parents node_id
SELECT child_node_id AS "Parent node_id",
  qname_localname AS "Parent name"
FROM alf_child_assoc
WHERE type_qname_id IN (SELECT id FROM alf_qname WHERE local_name='contains')
  AND child_node_id IN (SELECT parent_node_id
    FROM alf_child_assoc
    WHERE qname_localname='Test_Lifecycle.docx');

 

  • Retrieve the Parent node_id of an Alfresco node from the child node’s id (files, folders, aso…)
SELECT child_node_id AS "Parent node_id",
  qname_localname AS "Parent name"
FROM alf_child_assoc
WHERE type_qname_id IN (SELECT id FROM alf_qname WHERE local_name='contains')
  AND child_node_id IN (SELECT parent_node_id
    FROM alf_child_assoc
    WHERE child_node_id='131856');

 

From this point, you could obviously continue to go up & up again on the folder’s hierarchy by just changing the child_node_id with the Parent node_id and you would at some point end-up at the root of the Alfresco tree: the “company_home”. Now if you don’t want to do it manually, I also have something for you:

  • Recursively find the parents of an Alfresco node (from its id, uniqueness is important here) – 1 node per line
WITH RECURSIVE parents AS
(SELECT a1.id,
    a1.child_node_id,
    a1.qname_localname AS "child_name",
    a1.parent_node_id
  FROM alf_child_assoc AS a1
  WHERE a1.child_node_id='131856'
  UNION
  SELECT a2.id,
    a2.child_node_id,
    a2.qname_localname AS "child_name",
    a2.parent_node_id
  FROM alf_child_assoc AS a2
  JOIN parents p ON a2.child_node_id=p.parent_node_id
)
SELECT * FROM parents;

 

If the file “Test_Lifecycle.docx” is present in Alfresco under a folder named “MyFolder” in a site name “MySite”, then this would be a possible outcome of the query:

   id   | child_node_id |      child_name      | parent_node_id
--------+---------------+----------------------+----------------
 956325 |        131856 | Test_Lifecycle.docx  |         131800
 957208 |        131800 | MyFolder             |         131732
 943025 |        131732 | documentLibrary      |         131704
 943002 |        131704 | MySite               |         130207
  72781 |        130207 | sites                |             13
      7 |            13 | company_home         |             12
 (6 rows)

 

If you rather prefer to display the full path directly, I also have something for you!

  • Recursively find the parents of an Alfresco node (from its id, uniqueness is important here) – full path
WITH RECURSIVE parents AS
(SELECT a1.id,
    a1.child_node_id,
    a1.qname_localname AS "child_name",
    a1.parent_node_id
  FROM alf_child_assoc AS a1
  WHERE a1.child_node_id='131856'
  UNION
  SELECT a2.id,
    a2.child_node_id,
    a2.qname_localname AS "child_name",
    a2.parent_node_id
  FROM alf_child_assoc AS a2
  JOIN parents p ON a2.child_node_id=p.parent_node_id
)
SELECT string_agg(child_name,' > ') AS "file_path"
FROM (SELECT *,
    ROW_NUMBER () OVER ()
  FROM parents
  ORDER BY ROW_NUMBER DESC)
AS ordered_parents;

 

This time the outcome would be the following one:

                                    file_path
----------------------------------------------------------------------------------
 company_home > sites > MySite > documentLibrary > MyFolder > Test_Lifecycle.docx
 (1 row)

 

Then you could also do it the other way around (up->down instead of down->up). So starting with a parent and retrieving all its childs. There are some things you need to be aware of first: all the Alfresco nodes will be retrieved and not just folders/files! You can put a restriction on the “contains” alf_qname like I did in a few queries above but you will still have some hidden files and stuff like that, especially for the sites (surf-config, user’s dashboards setup, aso…)

  • Recursively find the number of childs (Alfresco nodes) of a specific Alfresco node (included in the count – in the example below, I’m using “company_home” = 13 (don’t run that on Billions of nodes!))
WITH RECURSIVE childs AS
(SELECT a1.id,
    a1.child_node_id,
    a1.qname_localname AS "child_name",
    a1.parent_node_id,
    a1.type_qname_id
  FROM alf_child_assoc AS a1
  WHERE a1.child_node_id='13'
  UNION
  SELECT a2.id,
    a2.child_node_id,
    a2.qname_localname AS "child_name",
    a2.parent_node_id,
    a2.type_qname_id
  FROM alf_child_assoc AS a2
  JOIN childs c ON a2.parent_node_id=c.child_node_id
)
SELECT count(*)
FROM childs
WHERE type_qname_id IN (SELECT id
  FROM alf_qname
  WHERE local_name='contains');

 

  • Listing the permissions assigned to a specific Alfresco node (it’s possible to use n.id instead of n.uuid if you have the Node ID)
SELECT n.id AS "Node ID",
  p.string_value AS "Document Name",
  ap.id AS "Permission ID",
  ap.name AS "Permission Name",
  aa.authority AS "User/Group",
  aace.allowed AS "Allowed",
  aacl.inherits AS "Inheritance Enabled",
  CASE WHEN aam.pos=0 THEN 'f'
    ELSE 't'
  END AS "Inherited"
FROM alf_node AS n,
  alf_node_properties AS p,
  alf_acl_member AS aam,
  alf_access_control_entry AS aace,
  alf_authority as aa,
  alf_access_control_list AS aacl,
  alf_permission AS ap
WHERE n.id=p.node_id
  AND n.acl_id=aam.acl_id
  AND aam.ace_id=aace.id
  AND aa.id=aace.authority_id
  AND aacl.id=aam.acl_id
  AND ap.id=aace.permission_id
  AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND n.uuid='eb267742-c018-4ba5-8ca4-75ca23c860f0';

 

The “Inheritance Enabled” column is whether or not the inheritance is enabled on this specific node while the “Inherited” column is whether or not this specific permission has been set on the node locally (value = f) or is present here because of the inheritance (value = t) in which case the permission has been set on a parent (direct or not). The outcome would be something like that:

Node ID |    Document Name    | Permission ID | Permission Name |    User/Group     | Allowed | Inheritance Enabled | Inherited
--------+---------------------+---------------+-----------------+-------------------+---------+---------------------+-----------
 131856 | Test_Lifecycle.docx |            21 | dbi_Security    | GROUP_Accounting  | t       | t                   | t
 131856 | Test_Lifecycle.docx |            20 | dbi_Delete      | GROUP_Management  | t       | t                   | t
 131856 | Test_Lifecycle.docx |            18 | dbi_Read        | GROUP_Consultants | t       | t                   | t
 131856 | Test_Lifecycle.docx |            19 | dbi_Write       | GROUP_EVERYONE    | t       | t                   | f
 131856 | Test_Lifecycle.docx |            18 | dbi_Read        | MorganPatou       | t       | t                   | f
(5 rows)

 

  • Listing Workflow Tasks that are opened (end_time_ IS NULL) with the Workflow Instance details (initiator, id, start time) as well as the current Task details (start time, assignee, type):
SELECT DISTINCT pi.proc_def_id_ AS "WF Definition",
  pi.proc_inst_id_ AS "Instance ID",
  pi.start_user_id_ AS "Instance Initiator",
  pi.start_time_ AS "Instance Start",
  ai.task_id_ AS "Task ID",
  ai.act_name_ AS "Task Name",
  ai.act_type_ AS "Task Type",
  ai.assignee_ AS "Task Assignee",
  ai.start_time_ AS "Task Start",
  d1.time_ AS "Task Update",
  d1.text_ AS "Task Status",
  (SELECT d2.text_
    FROM act_hi_detail AS d2
    WHERE d2.id_=(SELECT MAX(id_) FROM act_hi_detail WHERE task_id_=ai.task_id_ AND name_='bpm_comment')
  ) AS "Task Comment"
FROM act_hi_procinst AS pi,
  act_hi_actinst AS ai,
  act_hi_detail AS d1,
  act_hi_detail AS d2
WHERE pi.proc_inst_id_=ai.proc_inst_id_
  AND ai.end_time_ IS NULL
  AND d1.id_=(SELECT MAX(id_) FROM act_hi_detail WHERE task_id_=ai.task_id_ AND name_='bpm_status')
  AND d1.task_id_=d2.task_id_
ORDER BY "Task Update";

 

  • Listing all locked documents

There are two types of locks in Alfresco: Persistent and Ephemeral. A Persistent lock is written to the Database so, it remains after a restart and it can be created with a CheckOut for example. On the other hand, an Ephemeral lock is volatile so, if you restart Alfresco, it won’t be there anymore and it can be created with the AOS integration (“Edit in M$ Office” action). Therefore, on the database, you can only see Persistent locks:

SELECT n.id AS "Node ID",
  n.uuid AS "Document ID (UUID)",
  p1.string_value AS "Document Name",
  p2.string_value AS "Lock Owner",
  p3.string_value AS "Lock Type",
  p4.string_value AS "Lock Lifetime",
  p5.string_value AS "Lock Expiry",
  p6.string_value AS "Lock Info"
FROM alf_node AS n,
  alf_node_properties AS p1,
  alf_node_properties AS p2,
  alf_node_properties AS p3,
  alf_node_properties AS p4,
  alf_node_properties AS p5,
  alf_node_properties AS p6
WHERE n.id=p1.node_id
  AND n.id=p2.node_id
  AND n.id=p3.node_id
  AND n.id=p4.node_id
  AND n.id=p5.node_id
  AND n.id=p6.node_id
  AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
  AND p2.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockOwner')
  AND p3.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockType')
  AND p4.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockLifetime')
  AND p5.qname_id IN (SELECT id FROM alf_qname WHERE local_name='expiryDate')
  AND p6.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockAdditionalInfo');

 

With this query, you should be able to see all documents that are currently locked with the UUID, Document Name as well as some information about the lock itself. This is the original document, it’s not the WorkingCopy! When you perform a Persistent lock, Alfresco will create a copy of the original document (a WorkingCopy) and you will be able to work on this duplicate. Once you CheckIn the document, it will replace the original one with the WorkingCopy. To find details about the WorkingCopy, you can use pretty much the same command but with other qnames: “name”, “workingCopyOwner”, “workingCopyLabel” or “workingCopyMode”. Here are the details printed for my original test document that has been locked:

Node ID |          Document ID (UUID)          |    Document Name    | Lock Owner |   Lock Type    | Lock Lifetime | Lock Expiry | Lock Info
--------+--------------------------------------+---------------------+------------+----------------+---------------+-------------+-----------
 131856 | eb267742-c018-4ba5-8ca4-75ca23c860f0 | Test_Lifecycle.docx | Morgan     | READ_ONLY_LOCK | PERSISTENT    |             |

 

These are a few examples… If you have some that you think should be shared, don’t hesitate to post them in the comments!

I hope you enjoyed this blog post because it was quite hard for me to write something about database queries without giving up my soul to the DB world! See you soon ;).

51 Comments

  • myriam says:

    Thank you very much!! this helped a lot

  • Mallesham says:

    HI Morgan,

    this is really pretty good for learners and appreciated your effort.

    Can you please help me out to understand how the version nodes will have a link/reference with workspace/spaceStore noderefs from DB side.

    All nodes i can see in alf_node with different store_id so what is the common property for store-id 4 & 6 to get all the current nodes and version nodes.

    Just want to see all the current and version nodes relationship in DB side. if possible please provide me the sql

    Thanks in advance!!!!!

    Regards,
    Mallesham

    • Morgan Patou says:

      Hi Mallesham,

      When the versionable aspect is applied to a node, there is a version history that is being created into the Version Store which is named “lightWeightVersionStore” (N°3 in my example above). Is this what you were looking for?

      Regards,
      Morgan

  • Wim says:

    Thanks lots for these examples.
    We have a multi-tenanted server, and in order to get the sizes per tenant I combined a few of your sample queries to break down content count & size by created user. Then I could split the usernames to find the tenant and use Excel to report on usage by tenant.
    Not the cleanest code but it works!

    See below:
    SELECT max(audit_creator) creator, sum(s.bin_size) size, count(*) cnt
    FROM alf_node AS n,
    alf_qname AS q,
    (SELECT n.id AS “id”,
    n.store_id AS “Store ID”,
    round(u.content_size/1024/1024,2) AS “bin_size”,
    n.uuid AS uuid,
    n.audit_creator AS “Creator”,
    n.audit_created AS “Creation Date”,
    n.audit_modifier AS “Modifier”,
    n.audit_modified AS “Modification Date”,
    p1.string_value AS “Document Name”,
    u.content_url AS “Location”
    FROM alf_node AS n,
    alf_node_properties AS p,
    alf_node_properties AS p1,
    alf_namespace AS ns,
    alf_qname AS q,
    alf_content_data AS d,
    alf_content_url AS u
    WHERE n.id=p.node_id
    AND ns.id=q.ns_id
    AND p.qname_id=q.id
    AND p.long_value=d.id
    AND d.content_url_id=u.id
    AND p1.node_id=n.id
    AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name=’name’)) s
    WHERE n.type_qname_id=q.id
    and n.id=s.id
    AND q.local_name=’content’
    group by n.audit_creator;

    • Morgan Patou says:

      Thanks for this precision Wim! I must admit that I never tried that on a multi-tenant installation. I would have expected another solution to separate the documents per tenant since Alfresco is able to separate them properly and I don’t think that Alfresco is checking which user created the document for that but that’s still a solution so thanks ;).

      Regards,
      Morgan

  • Very helpful information. Thank you.

  • Julian says:

    Hi Morgan,

    Thanks for the examples.

    Anyway, at the first one (Document Information) you’re assuming that the unique property that has a long type is the content size. I think it would be better to filter explicity by the size property, adding something like this toe the where clause:

    p.qname_id = (SELECT id FROM alf_qname WHERE local_name=’content’ and ns_id = (select id from alf_namespace where uri =”http://www.alfresco.org/model/content/1.0″) )

    My two cents.

    Kind regards,
    Julian

  • Ilse says:

    Hi,

    thanks for the example. Where do I have to put the sql queries?

    Regards,
    Ilse

  • Mozart says:

    You saved my day! Thank you!

  • Parth says:

    This is very useful. Thanks Morgan.

    Many times users would edit “Name” of document and remove file extension. Is there a way to capture extension of the file even if document name has been changed from say ‘Training.docx’ to ‘Training’?

    • Morgan Patou says:

      Hi Parth,

      It is true that on the queries above, I used the name of the file but if you check the full properties of a file, it should contain a field saying that this file is a Microsoft Word 2007 file for example. Therefore there is a field in the Database that store this information, it isn’t in the name of the file but I’m afraid this might not be exactly what you are searching for.

      Of course, you can also create your own extension which would link a new property to all documents of your Alfresco installation and that would store the extension.

      Now the problem with renaming the file inside Alfresco without extension… If new users are coming and downloading these files without extensions, then it might cause issues. I already saw corrupted files because someone renamed a file without its extension and overwritten something inside the repository. A best solution for me would be to actually hide the extension on the Share UI or making it a non-editable field, allowing just the users to rename the files without touching the extension at all… That would require a lot of customization from my point of view.

      Regards,
      Morgan

  • Rendra says:

    Thank you so much Morgan for sharing this one.
    I need learn more and more about Alfresco, I wanna implement it into my company.

  • Manuel says:

    Thank you very much!!

  • DSr says:

    Nice Post.

    Can anyone share the SQL/PostgreSQL query to retrieve all sites created in Alfresco.

    • Morgan Patou says:

      Hi DSr,

      Something like this should do the trick:

      SELECT n.id AS "Node ID",
      n.store_id AS "Store ID",
      n.uuid AS "Site ID (UUID)",
      p.string_value AS "Site Name",
      p1.string_value AS "Site Description",
      p2.string_value AS "Site Visibility",
      n.audit_creator AS "Creator",
      n.audit_created AS "Creation Date",
      n.audit_modifier AS "Modifier",
      n.audit_modified AS "Modification Date"
      FROM alf_node AS n,
      alf_node_properties AS p,
      alf_node_properties AS p1,
      alf_node_properties AS p2,
      alf_qname AS q
      WHERE n.type_qname_id=q.id
      AND p.node_id=n.id
      AND p1.node_id=n.id
      AND p2.node_id=n.id
      AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
      AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='description')
      AND p2.qname_id IN (SELECT id FROM alf_qname WHERE local_name='siteVisibility')
      AND q.local_name='site';

      => When I will have some time, I will update my blog to add more example like this one!

      Regards,
      Morgan

  • Sandeep Reddy says:

    Hi Sir,
    I need to show most viewed documents in alfresco and most liked documents.
    help me out please

  • Sandeep Reddy says:

    Hi Everyone,

    By using this query

    SELECT distinct n.audit_creator as creator,p.string_value as document_name,date(n.audit_created) as created_on
    From alf_node AS n,alf_qname AS q,alf_node_properties as p
    WHERE n.type_qname_id=q.id AND p.node_id=n.id AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name=’name’)
    AND n.audit_creator not in(‘admin’) AND n.audit_creator not in(‘System’) AND q.local_name=’content’ AND p.string_value NOT LIKE ‘%.xml’ order by created_on ASC;

    i am getting like this

    creator | document_name | created_on
    ———-+————————–+————
    Sandeep | Alfresco_User_Guide.docx | 2018-03-20
    (1 row)

    Like this i need to show list of most viewed documents in alfresco, can anyone help me please

  • Raji says:

    thanks for the example. Where do I have to put the sql queries and how to view database tables?

    • Morgan Patou says:

      Hi Raji,

      It all depends on your Database. I’m personally using PostgreSQL above because it’s the better DB for Alfresco from my point of view so for this DB, you would need for example to start a psql session (here the DB name is “alfresco”): “psql alfresco”. You could potentially add more parameters like -h for the hostname if it is running remotely, aso… You can find the full list here for example: https://www.postgresql.org/docs/9.6/static/app-psql.html

      Then for listing, it is pretty simple with PostgreSQL: “\l” can be used to list all databases and “\dt” to list all databases tables.

      Regards,
      Morgan

  • Chris says:

    Is the a query to list documents per site? I can list docs and I can list sites but what is the relationship?

    Thanks
    Chris

    • Morgan Patou says:

      Hi Chris,

      It is rather difficult to find this kind of things on the Database directly because all you have, as far as I know, is a parent/child association and therefore you would need to loop through them.

      Let’s say that you have a file “SiteFile.txt” present on a folder “SiteFolder” which is under the “documentLibrary” folder of a site named “dbi-services”. Then you could find the parent of the file “SiteFile.txt” like that:
      alfresco=> SELECT child_node_id, qname_localname AS child_name, parent_node_id FROM alf_child_assoc WHERE qname_localname='SiteFile.txt';
      child_node_id | child_name | parent_node_id
      ---------------+--------------+----------------
      471898 | SiteFile.txt | 471911
      (1 row)

      From there, you have the child node id and name (the document) as well as the direct parent id (the folder “SiteFolder”) which you can then retrieve as follow (slightly different query):
      alfresco=> SELECT child_node_id, qname_localname AS child_name, parent_node_id FROM alf_child_assoc WHERE child_node_id='471911';
      child_node_id | child_name | parent_node_id
      ---------------+------------+----------------
      471911 | SiteFolder | 337867
      (1 row)

      And you can continue up 1 folder:
      alfresco=> SELECT child_node_id, qname_localname AS child_name, parent_node_id FROM alf_child_assoc WHERE child_node_id='337867';
      child_node_id | child_name | parent_node_id
      ---------------+-----------------+----------------
      337867 | documentLibrary | 337850
      (1 row)

      and again:
      alfresco=> SELECT child_node_id, qname_localname AS child_name, parent_node_id FROM alf_child_assoc WHERE child_node_id='337850';
      child_node_id | child_name | parent_node_id
      ---------------+--------------+----------------
      337850 | dbi-services | 110196
      (1 row)

      And when finally you reach the “company_home”, that’s the root of your directory.

      You could create a loop to reconstruct this path. You could also do it the other way around… Since you want the content of sites, you know that you need to find the childs of the “sites” folder (in my case ‘110196’). Beware of the difference between files and folders through…

      Regards,
      Morgan

    • Morgan Patou says:

      I forgot to mention it but the simplest way would most probably be to use the Solr Index for that, either via the Solr Stats or doing a search query.

      Regards,
      Morgan

    • Morgan Patou says:

      Hi Chris,

      I updated this blog with some queries regarding child/parents including recursive lookups from the DB. The last one in particular might be something you would want to check.

      Regards,
      Morgan

  • brahim soumer says:

    Please can you help me to have an SQL query to have the user/groups permissions on folders
    thank you

    • Morgan Patou says:

      Hi Brahim,

      This should do the trick (I will update my blog with this new query):
      SELECT n.id AS "Node ID",
      p.string_value AS "Document Name",
      ap.id AS "Permission ID",
      ap.name AS "Permission Name",
      aa.authority AS "User/Group",
      aace.allowed AS "Allowed",
      aacl.inherits AS "Inheritance Enabled",
      CASE WHEN aam.pos=0 THEN 'f'
      ELSE 't'
      END AS "Inherited"
      FROM alf_node AS n,
      alf_node_properties AS p,
      alf_acl_member AS aam,
      alf_access_control_entry AS aace,
      alf_authority as aa,
      alf_access_control_list AS aacl,
      alf_permission AS ap
      WHERE n.id=p.node_id
      AND n.acl_id=aam.acl_id
      AND aam.ace_id=aace.id
      AND aa.id=aace.authority_id
      AND aacl.id=aam.acl_id
      AND ap.id=aace.permission_id
      AND p.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
      AND n.uuid='eb267742-c018-4ba5-8ca4-75ca23c860f0';

      Obviously you will need to change the UUID at the last line or use instead the Node ID (n.id) to retrieve the information that you want.

      Regards,
      Morgan

  • Sanjay Singh says:

    How can I get last modified project or Last use

  • Tanks a lot, so functionals SQL for alfresco 5.2

  • Brian says:

    Hi Morgan,

    Thanks for these queries – they are very useful.

    However I now know that my account (I am the admin) has 3900+ deleted items – but there are zero in my trashcan???

    There are hundreds of deleted items – but they were deleted by other users, not my account.

    Any ideas as to how I can find (and permanently delete) those 3900+ items???

    This query shows me the count (I think):
    SELECT audit_creator,COUNT(*) FROM alf_node WHERE store_id=5 AND type_qname_id=53 GROUP BY audit_creator;
    audit_creator | count
    —————+——-
    ITSupport | 3907

    (9 rows)

    In our case, store_id=5 is the archive spacesstore (deleted items?) and content has id=53:
    SELECT * FROM alf_qname WHERE local_name=’content’;
    id | version | ns_id | local_name
    —-+———+——-+————
    53 | 0 | 6 | content

    Many thanks in advance for your help on this.

    • Morgan Patou says:

      Hi Brian,

      The query you are using doesn’t show how many items are in your trashcan, it shows how many documents in the trashcan have been created initially by you. When a document is deleted, the audit_creator doesn’t change, it still has the same value as when the document was still active: it’s the person that created the document in Alfresco.

      So these 3900+ items were initially created by you but they might have been deleted by any user.

      If you are a member of the ADMIN group, then you should be able to see all items in the trashcan (deleted by any user). If you are only a normal user, you will only see the items that you deleted personally, not even the documents that you created or modified, only the ones that you deleted.

      Regards,
      Morgan

  • Hi Morgan,

    This is Makarand (Mak) working as PM with Trafigura. We use Alfresco DMS.

    May I request some help from you.

    I am trying to retrieve following information from Alfresco Database.

    Basically I want to find all users names who Drag emails from Outlook & drop the mails in Alfresco (via Fred UI).
    I also want to find the count of emails uploaded by every user in last one year.

    1. All users names who uploaded emails (with attachments) from Outlook & Dropped the emails in Alfresco through FRED.
    2 Count of emails Dragged & dropped in Alfresco.
    Could you please help me give some tips / SQL query to retrieve above data.

    I am looking for above information a bit urgently so will be grateful if you provide me SQL query early if possible.

    Thank you very much.
    Regards,
    Makarand Sinkar (Mak)

    • Morgan Patou says:

      Hi Makarand,

      I’m afraid this won’t be an easy task if you have a default setup. If I’m not mistaken, by default, the emails imported into Alfresco or send to Alfresco will have a mimetype of “plain/text”, which means that it’s the same mimetype as any .txt file for example. So unless you created a specific mimetype in your Alfresco installation such as “mail/text” for example or setup some rules to switch all incoming emails to a specific mimetype, then you might not find all uploaded emails that easily.
      => You could try to find all documents in Alfresco with a name that ends with “.msg” (the extension for emails). For this, you can use the query from my blog (search for “Retrieve the number of PDF documents in the Repository”) and just replace “.pdf” with “.msg”. But as a warning, this suppose that all emails uploaded to Alfresco have a name that ends with .msg!
      And this specific query doesn’t know if there is an attachment for this email. Depending on your implementation/solution, attachments might stay inside the .msg file or they can be extracted as a separate document with an association to the .msg file, aso… So depending on your implementation/solution, it is possible or not to find the number of .msg with attachment. If you have an association between the .msg and the attachment, you can find the number of emails with attachments for example. And from there, it’s easy to find the usernames and when it was imported into Alfresco. If the attachments are still inside the .msg document, I believe it won’t be possible to find what you want.

      As a side note, Alfresco doesn’t care where the document was imported from so you cannot differentiate if the emails have been imported from FRED or from Outlook or from something else. Unless you have a different implementation that would change something identifiable (like a different mimetype or a specific parameter).

      Regards,
      Morgan

  • aldrin says:

    Hi Morgan,

    Is there any query that i can view all the open task on a specific site. And who is the assignee and creator of the task.

    thank you very much!

    • Morgan Patou says:

      Hi aldrin,

      Unfortunately there are no direct links between sites of documents and the workflow tasks. You can retrieve a lot of information about workflow tasks and link that to people or documents like the folder is resides in. But then you would need to find out where that folder is exactly and that isn’t that easy from a DB point of view. There are some commands in this blog to do that recursively.

      For the Workflow Task details, here is an example to show all Tasks not done yet with the initiator and the current assignee:

      SELECT DISTINCT pi1.proc_def_id_ AS "WF Definition",
      pi1.start_user_id_ AS "Instance Initiator",
      pi1.start_time_ AS "Instance Start",
      pi1.end_time_ AS "Instance End",
      pi1.proc_inst_id_ AS "Instance ID",
      ai1.task_id_ AS "Task ID",
      ai1.act_name_ AS "Task Name",
      ai1.act_type_ AS "Task Type",
      ai1.assignee_ AS "Task Assignee",
      ai1.start_time_ AS "Task Start",
      d1.text_ AS "Task Status",
      d1.time_ AS "Task Update",
      (SELECT d2.text_ FROM act_hi_detail AS d2 WHERE d2.id_=(SELECT MAX(id_) FROM act_hi_detail WHERE task_id_=ai1.task_id_ AND name_='bpm_comment')) AS "Task Comment"
      FROM act_hi_procinst AS pi1,
      act_hi_actinst AS ai1,
      act_hi_detail AS d1,
      act_hi_detail AS d2
      WHERE pi1.proc_inst_id_=ai1.proc_inst_id_
      AND ai1.end_time_ IS NULL
      AND d1.id_=(SELECT MAX(id_) FROM act_hi_detail WHERE task_id_=ai1.task_id_ AND name_='bpm_status')
      AND d1.task_id_=d2.task_id_;

      Regards,
      Morgan

  • Hi Morgan,

    Thank you for detailed answer.

    Please accept my sincere apologies. I was away hence could not acknowledge.

    Regards,
    Makarand Sinkar

  • LEON TSATY says:

    I would like to list in Alfresco all files locked by users and the name of the user who blocked it.
    Thank you

    • Morgan Patou says:

      Hi Leon,

      I updated my blog with the command you were looking for:

      SELECT n.id AS "Node ID",
      n.uuid AS "Document ID (UUID)",
      p1.string_value AS "Document Name",
      p2.string_value AS "Lock Owner",
      p3.string_value AS "Lock Type",
      p4.string_value AS "Lock Lifetime",
      p5.string_value AS "Lock Expiry",
      p6.string_value AS "Lock Info"
      FROM alf_node AS n,
      alf_node_properties AS p1,
      alf_node_properties AS p2,
      alf_node_properties AS p3,
      alf_node_properties AS p4,
      alf_node_properties AS p5,
      alf_node_properties AS p6
      WHERE n.id=p1.node_id
      AND n.id=p2.node_id
      AND n.id=p3.node_id
      AND n.id=p4.node_id
      AND n.id=p5.node_id
      AND n.id=p6.node_id
      AND p1.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name')
      AND p2.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockOwner')
      AND p3.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockType')
      AND p4.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockLifetime')
      AND p5.qname_id IN (SELECT id FROM alf_qname WHERE local_name='expiryDate')
      AND p6.qname_id IN (SELECT id FROM alf_qname WHERE local_name='lockAdditionalInfo');

      Cheers,
      Morgan

  • ami says:

    Hi Morgan,
    Can you share query for no of users last logged in alfresco for 30 days. Audit is enabled for our application.

    Thanks in advance.

  • Peter Szladovics says:

    Dear Morgan,

    Thanks your help! But…
    One of my customers have a big document library under alfresco 4.2.f – at the moment it has more then 14TB data. I’m afraid of the front-end application is buggy, because it’s database have less available ID than the number of files in the alfresco spacestore directory. That application has a separated database with alfresco noderefs.
    Example… That app has N record, but under the alfresco have 2N (or more) files in the SpaceStore (the app not use versioning).
    What I need…
    How can I display the all of available noderefs what are point to any uploaded file (not a folder)?
    How can I display the count of them?
    Thank you so much

    Peter

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Morgan Patou
Morgan Patou

Senior Consultant & Technology Leader ECM