Change datasource in a saved project / data preprocess workflow

Regarding “Change datasource in a saved project”:

Could you please let me know if it is possible to change only the file to be loaded for an existing project? Currently, the same analysis screen can be reproduced by applying a layout file, but this becomes cumbersome when there are many files to load or many analysis screens.
It would be very convenient to be able to do this, especially in projects that use Data Sync. I would appreciate it if you could consider developing it.

Regarding “data preprocess workflow”:
I would like to hear your opinion on this matter. Our workflow is generally based on loading data into Datagrok after preprocessing, but sometimes we need to start the analysis from raw data. In such cases, is there a way to register a function in Datagrok in advance that splits values with inequality signs (e.g., >10) in activity columns into “xxx_sign: >” and “xxx_value: 10” (xxx being the original column name)?
This is probably functionality that Datagrok already provides, but if possible, I would appreciate it if you could share a simple example.
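A minimal sketch of the splitting logic in plain Python, assuming values arrive as strings such as ">10", "<=0.5" or "3.2". The function name `split_qualified`, the set of recognized qualifiers and the choice of an empty sign for plain numbers are assumptions for illustration. To register something like this in Datagrok it would need to be wrapped as a script with the platform's header annotations (check the scripting documentation for the exact syntax); that wrapper is not shown here.

```python
import re

# Optional qualifier followed by a number. The qualifier set is an assumption;
# two-character operators are listed first so ">=" is not read as ">".
_QUALIFIED = re.compile(
    r"^\s*(>=|<=|>|<|=|~)?\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*$"
)


def split_qualified(values, name):
    """Split strings like '>10' into '<name>_sign' and '<name>_value' lists.

    Plain numbers get an empty sign (an assumption); empty or unparseable
    entries get None for both sign and value.
    """
    signs, numbers = [], []
    for raw in values:
        match = _QUALIFIED.match(raw or "")
        if match is None:
            signs.append(None)
            numbers.append(None)
        else:
            signs.append(match.group(1) or "")
            numbers.append(float(match.group(2)))
    return {f"{name}_sign": signs, f"{name}_value": numbers}
```

Calling `split_qualified([">10", "3.2"], "IC50")` returns `{"IC50_sign": [">", ""], "IC50_value": [10.0, 3.2]}`. Keeping unparseable entries as None, rather than raising, lets a partially messy column still be processed.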

Hi,

Not sure I understand correctly.
Do you want to open the project, change a single file and save it?
I think it’s impossible, but let me understand the use case better.

I would like to make a common “Project” as a template for multiple teams. The data tables for each team have many common columns and some team-specific columns.

Although this idea might not align with Datagrok’s philosophy, if I could replace the files to load, I could reuse a common Project and create team-specific Projects efficiently.

I would like to hear your opinion on my question. Thank you in advance for your consideration.

So, you have multiple data sources for different teams, but you want to use the same visualizations?
We have the Data Sync feature now, but unfortunately it does not support parameters, only a fixed script.

I’m thinking of a system that asks the user something before the project loads and then loads the project according to the user’s input, passing it to the Data Sync data source function.

Does it make sense to you?

Dear Alex,
Thanks for your comment; that’s exactly what I wanted. I am confident it would be very useful, because one Project could then serve multiple scenarios.

To make this easier to picture, here are example data source files (column names listed):
File for team A: activity_A1, activity_A2, common1, common2, common3
File for team B: activity_B1, activity_B2, activity_B3, common1, common2, common3
File for team C: activity_C1, activity_C2, activity_A1, common1, common2, common3

For clarification, team C is focusing on selectivity between activity_A1 and activity_C1 so activity_A1 is included in the file for team C.

Additionally, I expect to load multiple files for each Project. That’s why selecting the files to load before opening a Project would be a nice feature.

What about file names? Who determines which files to use?
Do teams make that decision, or should it be pre-configured by an admin?
For instance, how does a user decide whether they need activity_A1 or activity_B1?

I need to ask these questions to understand your problem better and propose a better solution.

Dear Alex,
Thank you for your questions. I should clarify my idea further.

File names and common column names are fixed by the admin. Let’s say we start by loading main.sdf and suppl.csv.

In main.sdf, some columns are team-specific and some are common. The number and names of the columns for activity against the therapeutic target vary from team to team. There are also common columns, whose names are fixed by the admin.
File for team A: Cmpd_ID, activity_A1, activity_A2, common1, common2, common3
File for team B: Cmpd_ID, activity_B1, activity_B2, activity_B3, common1, common2, common3
File for team C: Cmpd_ID, activity_C1, activity_C2, activity_A1, common1, common2, common3
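Since a shared template can only reuse visualizations bound to columns that every team file has, it may help to check which columns are common and which are team-specific. A small sketch, where `COMMON_COLUMNS`, `partition_columns` and `missing_common` are hypothetical names and the common list simply mirrors the example above:

```python
# Admin-fixed shared columns; this tuple is an assumption mirroring the example.
COMMON_COLUMNS = ("Cmpd_ID", "common1", "common2", "common3")


def partition_columns(columns, common=COMMON_COLUMNS):
    """Return (common columns present, team-specific columns), keeping file order."""
    common_set = set(common)
    shared = [c for c in columns if c in common_set]
    specific = [c for c in columns if c not in common_set]
    return shared, specific


def missing_common(columns, common=COMMON_COLUMNS):
    """List admin-fixed columns a team file lacks; a template could check this first."""
    present = set(columns)
    return [c for c in common if c not in present]
```

For team C’s file, `partition_columns` would put `activity_C1`, `activity_C2` and `activity_A1` in the team-specific list, which is where team-specific visualizations (such as the selectivity view) would attach.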

In suppl.csv, all column names are common, as below, and are fixed by the admin. Each suppl.csv is linked to the corresponding main.sdf by Cmpd_ID.
File for team A: Cmpd_ID, common4, common5, common6
File for team B: Cmpd_ID, common4, common5, common6
File for team C: Cmpd_ID, common4, common5, common6
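The Cmpd_ID link between the two files amounts to a left join. Inside Datagrok this would more likely be handled with the platform’s built-in table linking or joining tools; the plain-Python sketch below, with the hypothetical helper `link_by_id` and rows represented as dicts, only illustrates the relationship:

```python
def link_by_id(main_rows, suppl_rows, key="Cmpd_ID"):
    """Left-join supplementary rows onto main rows by the shared key.

    Main rows without a matching supplementary row keep only their own columns.
    If several supplementary rows share a key, the last one wins (an assumption).
    """
    suppl_index = {row[key]: row for row in suppl_rows}
    linked = []
    for row in main_rows:
        merged = dict(row)  # copy so the input rows are not modified
        for col, val in suppl_index.get(row[key], {}).items():
            if col != key:
                merged[col] = val
        linked.append(merged)
    return linked
```

Because suppl.csv has the same columns for every team, the same join works unchanged for each team’s folder.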

So, the bottom line is:
You have N different files for teams, and they contain similar data. You want similar visualizations for files.

Last (I think) questions:
Is the choice of file up to each team? Should access be restricted?
Are files one-to-one with teams, or many-to-many?
Who decides which files a team has access to?

Thanks!

You have N different files for teams, and they contain similar data. You want similar visualizations for files.

Exactly!

Is the choice of file up to each team? Should access be restricted?
Who decides which files a team has access to?

Only administrators choose the files used to prepare each team’s visualizations, but end users can access files freely (no access restrictions). Hence, end users can load files when opening the Project.

Are files one-to-one with teams, or many-to-many?

I plan to prepare a separate folder for each team, as below, with similar files stored in each. File names will be almost the same (team name + common name), or I can align them completely if you prefer.
Once similar visualizations are created, all teams can start their analysis from the same point, and each team will evolve its visualizations as its work progresses, like team C.

team A folder

  • A_main.sdf
  • A_suppl.csv

team B folder

  • B_main.sdf
  • B_suppl.csv

team C folder

  • C_main.sdf
  • C_suppl.csv
  • C_addional_data.csv (← added later, when the team wants an additional visualization)

I am happy to share my ideas and discuss. If you have any questions, please let me know.

I think I will come up with a POC of a system like this in a while.
I see it as a project with Data Sync enabled, linked to a function that asks the user which folder they want to access each time they open the project.
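One way the folder-resolution step of such a system could look, sketched in Python under several assumptions: the user’s answer is a team identifier such as "A", files follow the `<team>_<name>` convention above, and `DATA_ROOT`, `REQUIRED_SUFFIXES` and `team_files` are hypothetical names. Since Data Sync currently supports only fixed scripts, not parameters, this shows only how a parameter could map to file paths, not how Datagrok would receive it.

```python
from pathlib import PurePosixPath

# Hypothetical root under which the admin keeps one folder per team.
DATA_ROOT = PurePosixPath("/teams")
# Files every team folder is expected to contain, named "<team>_<suffix>".
REQUIRED_SUFFIXES = ("main.sdf", "suppl.csv")


def team_files(team, available):
    """Resolve the file paths a team's Project should load.

    `available` lists the file names found in the team folder (how they are
    listed is outside this sketch). Required files must exist; any other
    "<team>_*" file, such as a later addition, is appended in sorted order.
    """
    folder = DATA_ROOT / team
    prefix = f"{team}_"
    required = [f"{prefix}{suffix}" for suffix in REQUIRED_SUFFIXES]
    missing = [name for name in required if name not in available]
    if missing:
        raise FileNotFoundError(f"{folder}: missing {', '.join(missing)}")
    extra = sorted(n for n in available if n.startswith(prefix) and n not in required)
    return [str(folder / name) for name in required + extra]
```

With the team C folder above, the two required files come first, followed by any later additions such as the extra CSV, so a template Project can always rely on the main and supplementary files being present.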