Accessing Files from Python and R Scripts

How can a Datagrok script developer access files from Python or R code? The Datagrok documentation describes several ways to access files through the JS API, whether they are contained in a package or in a user's home directory.

Our use case is to load a custom pretrained ML model and run predictions across a large set of molecules, providing scientists with a convenient script that adds a new column of predicted values. The model is trained outside of Datagrok, so we need a way to bring it in; loading it from S3 would be ideal.
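For context, here is roughly the shape of the script we have in mind. This is only a sketch: the header annotations follow Datagrok's scripting convention as we understand it, and the model path, column name, and the assumption that the model is a scikit-learn pipeline saved with joblib are all hypothetical.

```python
#name: PredictProperty
#description: Adds a column of predictions from a pretrained model
#language: python
#input: dataframe df [Table with molecules]
#input: column smiles [Column with SMILES strings]
#output: dataframe result

import joblib

# Hypothetical local path; getting the model file here from S3 is the open question.
model = joblib.load('/tmp/property_model.joblib')

# Assumes the saved pipeline accepts the raw SMILES strings as input.
df['predicted'] = model.predict(df[smiles])
result = df
```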

Thank you for your advice!

Tom

Hello, Tom! You’ve given us an interesting challenge. We’re currently working on various solutions and hope to get back to you soon with an optimal one.

Hi Tom.
Earlier we had an MVP technology that allowed passing files from any file connector to Python scripts. Unfortunately, we don't support it right now, but it's a great idea to bring it back.

I assume you need to pass an S3 file as a parameter to a Python or R script?

Hello, Tom! We have created a ticket for your request, and you can track its status here.

Yes, we need to access files hosted on S3 and use them in either Python or R scripts. What does MVP stand for? Thanks, Alex, for bringing this technology back.

Minimum viable product.

It was a demo that showed how to access files in S3 from Python. I'll try to make it work again.

Hi Tom, I wonder if we can't make this a lot simpler. Why wouldn't your Python script just download the necessary files from S3 using the AWS SDK (boto3)?

Hi Andrew, hosting files in S3 and downloading them is certainly an option. How can we handle AWS credentials within Datagrok Python functions in a manner that doesn’t embed the credentials in code?

This was solved using AWS roles/policies that allow S3 bucket access. With the boto3 library, you can connect to S3 natively from Datagrok Python scripts.
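For anyone finding this thread later, here is a minimal sketch of that approach, assuming an IAM role/policy granting S3 access is attached to the machine running the Datagrok compute; the bucket name and object key are hypothetical placeholders.

```python
import boto3

# boto3 resolves credentials through its default chain, so with an IAM role/policy
# granting S3 access attached to the host, no keys appear in the script itself.
s3 = boto3.client('s3')

# Download the pretrained model to a local path for the script to load.
s3.download_file('my-models-bucket', 'models/property_model.joblib',
                 '/tmp/property_model.joblib')
```

From there the model can be loaded and used exactly as in the script sketch earlier in the thread.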
